“A picture is worth a thousand words.” This is certainly true when you’re presenting and explaining data. You can provide tables setting out the figures, and you can talk about numbers, percentages, and relationships forever. However, the chances are that your point will be lost if you rely on these alone. Put up a graph or a chart, and suddenly everything you’re saying makes sense!
Graphs or charts help people understand data quickly. Whether you want to make a comparison, show a relationship, or highlight a trend, they help your audience “see” what you are talking about.
The trouble is there are so many different types of charts and graphs that it’s difficult to know which one to choose. Click on the chart option in your spreadsheet program and you’re presented with many styles. They all look smart, but which one is appropriate for the data you’ve collected?
Can you use a bar graph to show a trend? Is a line graph appropriate for sales data? When do you use a pie chart? The spreadsheet will chart anything you tell it to, whether the end result makes sense or not. It just takes its orders and executes them!
To figure out what orders to give, you need to have a good understanding of the mechanics of charts, graphs and diagrams. We’ll show you the basics using five very common types:
- Line graph
- Scatter plot
- Bar graph
- Pie chart
- Venn diagram
First we’ll start with some basics.
X and Y Axes – Which is which?
To create most charts or graphs, excluding pie charts, you typically use data that is plotted in two dimensions, as shown in Figure 1.
- The horizontal dimension is the x-axis.
- The vertical dimension is the y-axis.
To remember which axis is which, think of the x-axis as going along the corridor and the y-axis as going up the stairs. The “a” of “along” comes before the “u” of “up” in the alphabet, just as “x” comes before “y”.
When you come to plot data, the known value goes on the x-axis and the measured (or “unknown”) value on the y-axis. For example, if you were to plot the measured average temperature for a number of months, you’d put the months along the x-axis and the temperature up the y-axis, as shown in Figure 2.
The next issue you face is deciding what type of graph to use.
2.3.1 Scatter plot or Scatter graph
A scatter plot is a type of mathematical diagram that uses Cartesian coordinates to display values for two variables in a set of data.
The data is displayed as a collection of points, with the value of one variable determining each point's position on the horizontal axis and the value of the other variable determining its position on the vertical axis.[2] This kind of plot is also called a scatter chart, scattergram, scatter diagram or scatter graph.
Scatter plots are used to investigate the possible relationship between two variables that both relate to the same “event.” A straight line of best fit (using the least squares method) is often included.
Things to look for:
If the points cluster in a band running from lower left to upper right, there is a positive correlation (if x increases, y increases).
If the points cluster in a band from upper left to lower right, there is a negative correlation (if x increases, y decreases).
Imagine drawing a straight line or curve through the data so that it “fits” as well as possible. The more the points cluster closely around the imaginary line of best fit, the stronger the relationship that exists between the two variables.
If it is hard to see where you would draw a line, and if the points show no significant clustering, there is probably no correlation.
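The visual checks above can also be backed by a number. The sketch below computes the Pearson correlation coefficient, one common measure of how closely points follow a straight line: values near +1 suggest a positive correlation, values near -1 a negative one, and values near 0 little or no linear relationship. The function names `pearson_r` and `describe`, and the 0.3 cut-off, are assumptions made for illustration and do not come from the article.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each point from the mean of its series.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def describe(r, threshold=0.3):
    """Turn r into a rough label. The threshold is an arbitrary
    illustrative cut-off, not a statistical rule."""
    if r >= threshold:
        return "positive correlation"
    if r <= -threshold:
        return "negative correlation"
    return "little or no linear correlation"


# Points running from lower left to upper right.
print(describe(pearson_r([1, 2, 3, 4], [2, 4, 6, 8])))
```

Note that this coefficient only captures straight-line relationships; points that follow a clear curve can still score close to zero, so it complements the plot rather than replacing it.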
Caution!
There is a maxim in statistics that says, “Correlation does not imply causality.” In other words, your scatter plot may show that a relationship exists, but it does not and cannot prove that one variable is causing the other. There could be a third factor involved which is causing both, some other systemic cause, or the apparent relationship could just be a fluke. Nevertheless, the scatter plot can give you a clue that two things might be related, and if so, how they move together.
Scatter Plot statistics:
For scatter plots, the following statistics are calculated:
- Mean X and Y: the average of all the data points in the series.
- Maximum X and Y: the maximum value in the series.
- Minimum X and Y: the minimum value in the series.
- Sample Size: the number of values in the series.
- X Range and Y Range: the maximum value minus the minimum value.
- Standard Deviations for X and Y values: indicates how widely the data is spread around the mean.
- Line of Best Fit, Slope: the slope of the line which fits the data most closely (generally using the least squares method).
- Line of Best Fit, Y Intercept: the point at which the line of best fit crosses the Y axis.
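As a rough illustration, the statistics listed above could be computed along the following lines. This is a minimal sketch rather than the method of any particular charting tool: the function name `scatter_stats` and the dictionary keys are hypothetical, the standard deviation shown is the sample version (dividing by n - 1), and the line of best fit uses ordinary least squares.

```python
import statistics


def scatter_stats(xs, ys):
    """Summary statistics for a scatter plot of paired (x, y) values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Least squares: slope = Sxy / Sxx, intercept = mean_y - slope * mean_x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {
        "mean_x": mean_x, "mean_y": mean_y,
        "max_x": max(xs), "max_y": max(ys),
        "min_x": min(xs), "min_y": min(ys),
        "sample_size": n,
        "x_range": max(xs) - min(xs),
        "y_range": max(ys) - min(ys),
        # Sample standard deviation (an assumption; some tools divide by n).
        "stdev_x": statistics.stdev(xs),
        "stdev_y": statistics.stdev(ys),
        "slope": slope,
        "intercept": intercept,
    }


# Made-up points lying exactly on y = 2x + 1.
stats = scatter_stats([1, 2, 3, 4], [3, 5, 7, 9])
print(stats["slope"], stats["intercept"])
```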
2.3.2 Line graphs
One of the most common graphs you will encounter is a line graph. Line graphs simply use a line to connect the data points that you plot. They are most useful for showing trends, and for identifying whether two variables relate to (or “correlate with”) one another.
- Trend data:
- How do sales vary from month to month?
- How does engine performance change as its temperature increases?
- Correlation:
- On average, how much sleep do people get, based on their age?
- Does the distance a child lives from school affect how frequently he or she is late?
You can only use line graphs when the variable plotted along the x-axis is continuous – for example, time, temperature or distance.
Note:
When the y-axis indicates a quantity or percent and the x-axis represents units of time, the line graph is often referred to as a time series graph.
Example: ABC Enterprises’ sales vary throughout the year. By plotting sales figures on a line graph, as shown in Figure 3, it is easy to see the main fluctuations during the course of a year. Here, sales drop off during the summer months, and around New Year.
While some seasonal variation may be unavoidable in the line of business ABC Enterprises is in, it may be possible to boost cash flows during the low periods through marketing activity and special offers.
Line graphs can also depict multiple series. In this example you might have different trend lines for different product categories or store locations, as shown in Figure 4 below. It’s easy to compare trends when they’re represented on the same graph.
2.3.3 Bar Graphs
Another type of graph that shows relationships between different data series is the bar graph. Here the height of the bar represents the measured value or frequency: The higher or longer the bar, the greater the value.
ABC Enterprises sells three different models of its main product: the Alpha, the Platinum, and the Deluxe. By plotting the sales of each model over a three-year period, it becomes easy to see trends that might be masked by a simple analysis of the figures themselves. In Figure 5, you can see that, although the Deluxe is the highest-selling of the three, its sales have dropped off over the three-year period, while sales of the other two have continued to grow. Perhaps the Deluxe is becoming outdated and needs to be replaced with a new model? Or perhaps it’s suffering from stiffer competition than the other two?
Of course, you could also represent this data on a multiple series line graph as shown in Figure 6.
Often the choice comes down to how easy the trend is to spot. In this example the line graph actually works better than the bar graph, but this might not be the case if the chart had to show data for 20 models rather than just three. It’s worth noting, though, that if you can use a line graph for your data you can often use a bar graph just as well.
The opposite is not always true: when your x-axis variables represent discontinuous data (such as different products or sales territories), you can only use a bar graph.
In general, line graphs are used to demonstrate data that is related on a continuous scale, whereas bar graphs are used to demonstrate discontinuous data.
Data can also be represented on a horizontal bar graph as shown in Figure 7. This is often the preferred method when you need more room to describe the measured variable. It can be written on the side of the graph rather than squashed underneath the x-axis.
Note:
A bar graph is not the same as a histogram. On a histogram, the width of the bar varies according to the range of the x-axis variable (for example, 0-2, 3-10, 11-20, 21-40 and so on) and the area of the column indicates the frequency of the data. With a bar graph, it is only the height of the bar that matters.
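To make the "area indicates frequency" idea concrete, here is a small sketch that works out histogram bar heights for bins of unequal width. Each height is the count divided by the bin width, so that width times height gives back the count. The function name `histogram_bars`, the half-open bin edges, and the sample values are all assumptions chosen for illustration.

```python
def histogram_bars(values, edges):
    """Return one (low, high, count, height) tuple per bin.

    Bins are half-open [low, high), except the last, which also includes
    its upper edge. height = count / width, so the bar's area equals the
    number of values in the bin.
    """
    edges = list(edges)
    if len(edges) < 2 or any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError("edges must be strictly increasing, at least two")
    bars = []
    last_index = len(edges) - 2
    for i, (low, high) in enumerate(zip(edges, edges[1:])):
        is_last = i == last_index
        count = sum(
            1 for v in values
            if low <= v < high or (is_last and v == high)
        )
        width = high - low
        bars.append((low, high, count, count / width))
    return bars


# Made-up values grouped into bins of width 3, 8, 10 and 20.
for low, high, count, height in histogram_bars(
    [1, 1, 2, 5, 8, 15, 30], [0, 3, 11, 21, 41]
):
    print(f"{low}-{high}: count={count}, height={height:.3f}")
```

In this sketch the widest bin ends up with the shortest bar even when it holds as many values as a narrower one, which is exactly the difference from a bar graph, where the height alone would carry the count.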