Visualizing Categorical Data as Flows with Alluvial Diagrams

Alluvial diagrams are a type of flow diagram that have traditionally been used to show changes in network structures over time. Density Design has included alluvial diagrams in their RAW online visualization tool and explored their use to show “relations between dimensions of categorical data.”

RAW is such a wonderfully easy tool to use that I wanted to explore the Alluvial diagram functionality with a few different data sets to see how the visualizations would come out.

United States Cancer Statistics

2010 Top 10 Cancers

The CDC has a data set on the top 10 cancer types categorized by gender and race. They break these down into a series of traditional bar charts, one for each gender/race combination (white male, white female, black male, black female, Asian male, Asian female, etc.), each showing the top ten cancer types with the incidence rate per 100,000.

CDC Bar Chart of Top 10 Cancers among white males

 

What if we could show the data for both genders and all 5 races in the data set, along with the cancer types, in one diagram that visually represents the flow of incidence rates from gender to race to cancer type?

Cancer-Alluvial-Diagram

Alluvial diagram showing “flow” of cancer incidence rate from gender to race to cancer type.

While this doesn’t do a good job of showing exact cancer incidence rates, it does give a good quick overview of which cancer types are more associated with a given gender and race combination. An interactive version of this data visualization could also show the incidence rate when hovering over one of the flows between the category nodes. I think this kind of diagram is much more visually striking and engaging than a series of bar charts and does a much better job of providing a high-level summary of the main cancers afflicting people in the US.

Energy Consumption by Source, Ranked by State, 2011

Let’s look at another example. The U.S. Energy Information Administration publishes reports on how much energy each state consumes from which sources: coal, natural gas, petroleum, etc. This is presented in a table.

Energy-Consumption-Table

 

This data also might typically be represented visually in a bar chart or multiple bar charts. But how would it look as a flow diagram? To find out, I copied the data and reformatted it a bit (more about the necessary data format for RAW’s alluvial diagrams in a subsequent article) and quickly generated this.

State-Energy-Consumption-Error
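The reformatting step mentioned above is not spelled out in this article, so what follows is only a rough sketch of one plausible version of it. It assumes the source table has one row per state and one column per energy source, and that the diagram wants one row per state/source pair. The function name `wide_to_long`, the column names, and the sample values are all hypothetical, not taken from the EIA table.

```python
def wide_to_long(header, rows):
    """Turn one-row-per-state data into one row per (state, source) pair.

    Assumes the first column is the state name and every remaining
    column is an energy source (an assumption, not RAW's documented format).
    """
    sources = header[1:]
    long_rows = []
    for row in rows:
        state, values = row[0], row[1:]
        for source, value in zip(sources, values):
            long_rows.append((state, source, value))
    return long_rows


# Made-up sample values, for illustration only.
header = ["State", "Coal", "Natural Gas", "Petroleum"]
rows = [
    ["Example State A", 1.0, 2.0, 3.0],
    ["Example State B", 4.0, 5.0, 6.0],
]
long_rows = wide_to_long(header, rows)
```

Each resulting tuple holds the two categories (state and source) plus the number that would set the thickness of the flow between them.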

The diagram looks kind of cool, but if you’re observant, you’ll notice that something’s wrong. On the right side, the states that are the highest consumers of a resource are at the top and those with the lowest are at the bottom. I found it hard to believe that Michigan, North Carolina, Alabama and others consumed more energy than California. In fact, California isn’t even showing up. It turns out there’s a simple little “gotcha” that can really mess up diagrams in RAW – comma separators for thousands. So, the number 3,511.40 for California’s Petroleum consumption needs to be formatted without the comma separator – 3511.40.
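The same cleanup can also be scripted before pasting the data into RAW. Here is a minimal sketch, assuming the data is tab-separated text copied from a spreadsheet. The function name `strip_thousands_separators` and the sample column names are made up for illustration. Only the 3,511.40 figure comes from the table above.

```python
import csv
import io


def strip_thousands_separators(text, delimiter="\t"):
    """Return delimited text with commas removed from numeric-looking cells.

    A cell is changed only if it parses as a number once its commas are
    removed, so names that contain commas are left alone.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        cleaned = []
        for cell in row:
            candidate = cell.replace(",", "")
            try:
                float(candidate)
            except ValueError:
                cleaned.append(cell)       # not a number: keep as-is
            else:
                cleaned.append(candidate)  # number: keep comma-free form
        writer.writerow(cleaned)
    return out.getvalue()


# Hypothetical column layout; the value is the California figure from above.
sample = "State\tSource\tConsumption\nCalifornia\tPetroleum\t3,511.40\n"
cleaned_sample = strip_thousands_separators(sample)
```

This sketch assumes US-style numbers, where the comma is a thousands separator and the period is the decimal mark. It would mangle data that uses the comma as a decimal mark.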

Going back to the spreadsheet that held the data, turning off the thousands comma separator, and then re-pasting the data into RAW fixed the problem and yielded a diagram that made more sense.

State-Energy-Consumption

Though I think this looks kind of interesting, I don’t think it does a very good job of visually comparing the values of each energy resource for each state. There are so many states that the energy flow just gets lost in a blur of thin lines. So, I think the takeaway may be that for this kind of visualization to work, you need to limit it to a smaller set of categories, as with the cancer visualization. (Note: In this example the image is the initial output from RAW without any editing, which is why the lowest item, District of Columbia, is partially cut off.)
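One way to limit the categories is to keep only the states with the largest totals before building the diagram. I did not do this for the diagram above, so treat the following as a sketch of the idea only. It assumes rows of (state, source, value). The function name `keep_top_states` and the sample numbers are invented for illustration.

```python
from collections import defaultdict


def keep_top_states(rows, n):
    """Keep only rows belonging to the n states with the largest totals."""
    totals = defaultdict(float)
    for state, _source, value in rows:
        totals[state] += value
    # Sort state names by their summed value, largest first, and keep n.
    top = set(sorted(totals, key=totals.get, reverse=True)[:n])
    return [row for row in rows if row[0] in top]


# Made-up numbers, for illustration only.
rows = [
    ("State A", "Coal", 10.0),
    ("State A", "Petroleum", 5.0),
    ("State B", "Coal", 1.0),
    ("State B", "Petroleum", 1.0),
    ("State C", "Coal", 7.0),
]
top_rows = keep_top_states(rows, 2)
```

With far fewer states on the right side, each flow has room to be thick enough to compare by eye, at the cost of leaving the smaller consumers out of the picture.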

Just for Fun – Renewable Energy Production

Since the last example looked at traditional sources of energy, I thought it might be interesting to visualize renewable energy production for each state, specifically wind and solar for March 2014.

wind-solar-1

So, alluvial diagrams for visualizing categorical dimensions as flows – worth considering for the right kind of data set. I’ll be working on a tutorial to show how to create these in RAW and Illustrator including tips on formatting your data in the right way in your spreadsheet to make it all work.

 
