Data visualization

    How to create a data visualization of presented information

Creating effective data visualizations from presented information involves a systematic process. It's not just about picking a chart type; it's about understanding your data, your audience, and your message.

Here's a step-by-step guide on how to create data visualizations, assuming "presented information" means you already have your raw data:

Phase 1: Understand Your Data and Your Goal

  1. Understand Your Data:

    • What type of data do you have?
      • Quantitative (Numerical): Discrete (counts, e.g., number of students) or Continuous (measurements, e.g., temperature, sales revenue).
      • Categorical (Nominal/Ordinal): Nominal (labels, no order, e.g., gender, city) or Ordinal (labels with order, e.g., education level, satisfaction ratings).
      • Temporal (Time-based): Dates, times.
      • Geographical (Spatial): Locations, coordinates.
    • What are the key variables? Identify the columns/fields you want to visualize.
    • What are their ranges and distributions? (e.g., min/max values, averages, outliers).
    • Are there any missing values or inconsistencies? Clean your data if necessary.
  2. Define Your Goal and Message:

    • What question are you trying to answer? (e.g., "Which product sells the most?", "How has our revenue changed over time?").
    • What insight do you want to convey? (e.g., "Sales peaked in Q3," "Product A consistently underperforms").
    • Who is your audience? (e.g., executives, technical team, general public). This influences complexity, jargon, and visual style.
    • What action do you want your audience to take? (e.g., "Invest in Product X," "Adjust marketing strategy").
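
The data checks in step 1 (ranges, averages, missing values) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name profile_column, the sample temperatures, and the convention that None marks a missing value are all assumptions made for the example, not part of any particular tool. Libraries such as pandas provide comparable summaries out of the box.

```python
from statistics import mean, median


def profile_column(values):
    """Summarize one numeric column; None is assumed to mark a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),                     # rows, including missing ones
        "missing": len(values) - len(present),    # how many values are absent
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
    }


# Hypothetical column with one missing value and one suspiciously high reading.
temperatures = [21.5, 22.0, None, 21.8, 35.2]
summary = profile_column(temperatures)

for name, value in summary.items():
    print(f"{name}: {value}")
```

A mean that sits well above the median, as it does here, hints at an outlier worth inspecting before charting. The sketch assumes at least one non-missing value per column.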

Phase 2: Choose the Right Visualization Type

This is often the most consequential step: the wrong chart can obscure insights or mislead your audience. Here's a quick guide based on common data relationships:

  • Comparing Values (How much/many?):

    • Bar Chart: Comparing discrete categories (e.g., sales by region, number of customers by product).
    • Column Chart: A bar chart with vertical bars; works for categories, or for a time series with only a few data points.
    • Ranked Bar/Column Chart: For showing top N or bottom N items.
    • Bullet Chart: For comparing actual performance against a target.
  • Showing Composition (Parts of a whole?):

    • Pie Chart/Donut Chart: Use sparingly, and only for 2-5 categories that sum to 100%. With more slices, the sizes become hard to compare.
    • Stacked Bar/Column Chart: For showing how composition changes across categories or over time.
    • Treemap: For hierarchical data, showing proportion by area.
  • Illustrating Distribution (How is data spread?):

    • Histogram: Showing the frequency distribution of a continuous variable.
    • Box Plot: Showing the spread, median, quartiles, and outliers of a numerical dataset, good for comparing distributions across groups.
    • Density Plot (Kernel Density Estimate): Showing the probability density function of a continuous variable.
  • Analyzing Trends Over Time (How has it changed?):

    • Line Chart: Showing trends of one or more continuous variables over time (most common for time series).
    • Area Chart: Similar to a line chart, but with the area under the line filled; useful for emphasizing volume or, when stacked, cumulative totals over time.
  • Showing Relationships/Correlations (What's the connection?):

    • Scatter Plot: Showing the relationship between two continuous variables. Useful for identifying clusters, outliers, and correlations.
    • Bubble Chart: Similar to scatter, but adds a third dimension (size of bubble).
    • Heatmap: Showing a value across two categorical variables, or the strength of each pairwise correlation in a correlation matrix.
  • Displaying Geographical Data (Where is it?):

    • Choropleth Map: Shading regions based on a data value (e.g., population density by county).
    • Symbol Map (Proportional Symbol Map): Placing symbols on a map where size or color indicates a value.
  • Text/Qualitative Data:

    • Word Cloud: (Use with caution, often more decorative than informative).
    • Bar Chart of Frequencies: Categorizing text data and counting occurrences.
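
The guide above can be condensed into a small lookup table, which is handy when the choice has to be made repeatedly or documented for a team. This is only a sketch: the key names ("comparison", "trend", and so on) and the function suggest_charts are hypothetical labels chosen for the example, and the lists simply restate the options listed above.

```python
# Chart suggestions condensed from the guide above.
# The dictionary keys are hypothetical labels for each kind of question.
CHART_GUIDE = {
    "comparison": ["bar chart", "column chart", "ranked bar chart", "bullet chart"],
    "composition": ["pie/donut chart", "stacked bar chart", "treemap"],
    "distribution": ["histogram", "box plot", "density plot"],
    "trend": ["line chart", "area chart"],
    "relationship": ["scatter plot", "bubble chart", "heatmap"],
    "geography": ["choropleth map", "symbol map"],
    "text": ["bar chart of frequencies", "word cloud"],
}


def suggest_charts(goal):
    """Return candidate chart types for a goal such as 'trend' or 'comparison'."""
    key = goal.strip().lower()
    if key not in CHART_GUIDE:
        known = ", ".join(sorted(CHART_GUIDE))
        raise ValueError(f"Unknown goal {goal!r}; expected one of: {known}")
    return CHART_GUIDE[key]


print(suggest_charts("trend"))
print(suggest_charts("Distribution"))
```

A lookup like this only narrows the options; the final choice still depends on the audience and message defined in Phase 1.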

Phase 3: Design and Refine Your Visualization

  1. Choose Your Tool:

    • Spreadsheets: Excel, Google Sheets (good for basic charts).
    • Business Intelligence (BI) Tools: Tableau, Power BI, Qlik Sense (powerful, interactive dashboards).
    • Programming Libraries: Python (Matplotlib, Seaborn, Plotly, Altair), R (ggplot2) (most flexible, for custom and complex visualizations).
    • Online Chart Makers: Canva, Piktochart, Datawrapper (user-friendly for quick, shareable charts).
  2. Follow Design Best Practices:

    • Clarity over Clutter: Remove unnecessary elements (chart junk).
    • Appropriate Scaling: Start Y-axes at zero for bar/column charts to avoid misleading comparisons. For line charts, choose scales that highlight trends without distortion.
    • Clear Labels and Titles:
      • Chart Title: Concise and descriptive, conveying the main message.
      • Axis Labels: Clearly indicate what each axis represents, including units.
      • Data Labels: Add labels to bars/points if necessary, especially for precise values.
      • Legend: Only if multiple data series are present, placed logically.
    • Strategic Color Use:
      • Use color to highlight key data points or categories.
      • Be mindful of colorblindness (use color-blind friendly palettes).
      • Avoid using too many colors; it can be distracting.
      • Use consistent color schemes across related visualizations.
    • Font Choice: Legible fonts, consistent sizing.
    • Whitespace: Allow for breathing room around elements.
    • Consistency: Maintain consistent styling across multiple charts in a report/dashboard.
  3. Iterate and Get Feedback:

    • Create a draft.
    • Ask someone from your target audience to review it.
    • Is the message clear? Is it easy to understand? Are there any ambiguities?
    • Refine based on feedback.
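
Several of the design practices above (zero-based Y-axis for bars, a message-bearing title, axis labels with units, restrained color, minimal clutter) can be applied together in code. The following is a minimal sketch assuming Matplotlib 3.4 or later is installed; the regions, sales figures, and output file name are made up for illustration, and the highlight color is taken from the Okabe-Ito palette, which is commonly recommended as color-blind friendly.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only.
regions = ["North", "South", "East", "West"]
sales = [120, 95, 143, 88]

HIGHLIGHT = "#D55E00"  # vermillion from the Okabe-Ito palette
NEUTRAL = "#999999"    # gray for everything that is not the key point

# Strategic color use: one accent color for the top region, gray for the rest.
top = sales.index(max(sales))
colors = [HIGHLIGHT if i == top else NEUTRAL for i in range(len(sales))]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(regions, sales, color=colors)

ax.set_ylim(bottom=0)                    # bar charts start at zero
ax.set_title("East region leads sales")  # title states the message
ax.set_xlabel("Region")
ax.set_ylabel("Sales (units)")           # axis label includes units
ax.bar_label(bars)                       # data labels for precise values

# Clarity over clutter: drop the top and right borders.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

fig.tight_layout()
fig.savefig("sales_by_region.png", dpi=150)
```

No legend is drawn because there is only one data series, in line with the labeling advice above.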

Example Walkthrough: Sales Data

Presented Information (Raw Data):

Month   Product A Sales   Product B Sales   Product C Sales
Jan     120               80                50
Feb     130               90                60
Mar     110               85                55
Apr     140               95                70
May     150               100               75

Phase 1: Understand Data & Goal

  • Data Type: Monthly sales (temporal, quantitative). Products (categorical).
  • Goal: Show product sales trends over time, and compare performance between products.
  • Audience: Sales team, interested in performance and identifying top/bottom performers.
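
Before picking a chart, it can help to summarize the example data numerically. The sketch below uses the figures from the table above with only the standard library; the variable names and the choice of summary measures (total, January-to-May change, months with a decline) are assumptions made for illustration.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = {
    "Product A": [120, 130, 110, 140, 150],
    "Product B": [80, 90, 85, 95, 100],
    "Product C": [50, 60, 55, 70, 75],
}

summary = {}
for product, units in sales.items():
    summary[product] = {
        # Total units over the five months.
        "total": sum(units),
        # Percentage change from the first month to the last.
        "change_pct": (units[-1] - units[0]) / units[0] * 100,
        # Months where sales fell compared with the previous month.
        "dips": [months[i] for i in range(1, len(units)) if units[i] < units[i - 1]],
    }

for product, stats in summary.items():
    print(
        f"{product}: total={stats['total']}, "
        f"Jan-May change={stats['change_pct']:.0f}%, dips={stats['dips']}"
    )
```

The summary shows Product A with the highest total (650 units), Product C with the fastest growth (50%), and a decline in March for all three products, which are the points a chart should make easy to see.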

Phase 2: Choose Visualization Type

  • Line Chart: Best for showing trends over time for multiple series. We can see how each product performs month-to-month and how they compare against each other.

Phase 3: Design and Refine

  • Tool: Excel or any BI tool.
  • Design:
    • Title: "Monthly Sales Performance by Product (Jan-May)"
    • X-axis: "Month" (Jan, Feb, Mar, Apr, May)
    • Y-axis: "Sales Units"
    • Lines: Separate lines for Product A, B, and C.
    • Colors: Distinct colors for each product.
    • Legend: Clear legend to identify each product line.

What the visualization would show: A line chart where the X-axis is months and the Y-axis is sales units. Three distinct lines, one for each product, show their individual sales trajectories. You would immediately see that Product A has the highest sales in every month, and that all three products trend upward over the five months despite a shared dip in March.
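
The walkthrough names Excel or a BI tool, but the same chart can be drawn with one of the programming libraries listed in Phase 3. The following is a minimal sketch assuming Matplotlib is installed; the colors (from the Okabe-Ito palette), figure size, and output file name are choices made for this example rather than requirements.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = {
    "Product A": [120, 130, 110, 140, 150],
    "Product B": [80, 90, 85, 95, 100],
    "Product C": [50, 60, 55, 70, 75],
}
# One distinct, color-blind friendly color per product.
colors = ["#0072B2", "#D55E00", "#009E73"]

fig, ax = plt.subplots(figsize=(7, 4))
for (product, units), color in zip(sales.items(), colors):
    # One line per product; markers make the individual months easy to read.
    ax.plot(months, units, marker="o", color=color, label=product)

ax.set_title("Monthly Sales Performance by Product (Jan-May)")
ax.set_xlabel("Month")
ax.set_ylabel("Sales Units")
ax.set_ylim(bottom=0)  # optional for line charts; keeps relative sizes honest
ax.legend(title="Product")

fig.tight_layout()
fig.savefig("monthly_sales.png", dpi=150)
```

Starting the Y-axis at zero is a judgment call here: it keeps the gap between products in proportion, at the cost of flattening the month-to-month changes slightly.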

By following these steps, you can transform raw data into powerful visual stories that communicate insights effectively.