As the recent animated video I created for MD Revolution pointed out, we face an epidemic of chronic disease in America. Many people aren’t aware that a large percentage of chronic disease is preventable through lifestyle changes like exercise and diet. There are direct correlations between physical activity, obesity and the rates of chronic diseases such as heart disease and cancer.

In addition to animated videos, I believe data visualizations, both static and interactive, are a powerful tool for communicating and explaining. So I’ve been wanting to explore some of the available data on physical activity, obesity and chronic disease in America. I’m also a big fan of scatter plots and bubble charts for showing correlations, and RAW, the new data visualization tool from Density Design, is well suited to quick exploration of data.

Much has been written about the burgeoning obesity rates in the United States. On average, about one third of American adults are obese, and the combined obesity/overweight rates in the data used here run closer to two thirds. But this rate is not distributed evenly throughout the country. Some states and regions have a much higher rate than others. And, as the bubble charts below show, the states with the highest rates tend to be the states with the lowest physical activity (Tennessee, Mississippi, Louisiana, Alabama, West Virginia). In the bubble plots that follow, physical activity levels (Percent of Adults who Participated in Moderate or Vigorous Physical Activities) are shown on the x-axis and obesity/overweight rates are shown on the y-axis. The bubble size in each of the three plots encodes a different measure for each state: heart disease deaths per 100,000 in the first, cancer deaths per 100,000 in the second, and cancer rate per 100,000 in the third. The larger the bubble, the higher that state’s heart disease deaths, cancer deaths or cancer rate.

Data Source: All data for these visualizations came from the Kaiser Family Foundation State Health Facts

Correlation of Physical Activity to Obesity and Deaths Due to Heart Disease

Chronic-Disease-Heart-Disease-Deaths-v2

Correlation of Physical Activity to Obesity and Deaths Due to Cancer

Chronic-Disease-Cancer-Deaths-v2

Correlation of Physical Activity to Obesity and Cancer Rate

Chronic-Disease-Cancer-Rate-v2

The bubble charts are a great way to get a very quick sense, state by state, of how obesity relates to physical activity and chronic disease. For example, it’s very easy to see that states like Colorado and Hawaii are doing much better in terms of physical activity, obesity and heart disease deaths than states such as Mississippi, Tennessee, Alabama, and Louisiana. These charts also make it very evident that physical inactivity correlates well with higher obesity rates and a higher number of deaths due to heart disease and cancer.
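To put a rough number on "correlates well," here is a minimal sketch that computes a Pearson correlation between physical activity and obesity/overweight rate. It uses only six states named in this post, with values copied from the data table at the end, so the result is illustrative and will differ from what the full table gives. The helper name pearson_r is my own, not something from RAW.

```python
from math import sqrt

# (physical activity %, obesity/overweight %) for six states named in the
# post, copied from the data table at the end. This is a hand-picked subset
# of extremes, so the correlation will look stronger than for all states.
states = {
    "Alabama": (42.4, 67.7),
    "Colorado": (61.9, 55.7),
    "Hawaii": (58.6, 56.0),
    "Louisiana": (42.1, 69.6),
    "Mississippi": (40.0, 68.9),
    "Tennessee": (39.0, 65.4),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


activity = [a for a, _ in states.values()]
obesity = [o for _, o in states.values()]
r = pearson_r(activity, obesity)
print(f"Pearson r (activity vs. obesity/overweight): {r:.2f}")
```

A value close to -1 means that, within this subset, higher activity goes with lower obesity/overweight rates. It says nothing about cause.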

The charts are also good for pointing out the outliers very quickly. For example, Wisconsin has an activity rate that is nearly as good as California’s but a much higher rate of obesity/overweight.

The downside of bubble charts is that circles are a relatively poor way to compare the absolute values of one state to another. Bubble charts are meant to be read by the area of each circle, but as humans our perceptual bias is to compare circles by their height, according to Alberto Cairo, a leading educator in the data visualization community. As Cairo points out in his book The Functional Art, “… the human brain is not good at calculating surface areas.” [page 52] Regarding bubbles, he continues, “You want readers to compare areas, but they tend to compare heights.” [page 53]
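A small worked example makes the area-versus-height problem concrete. This is only a sketch of the geometry, using the heart disease values for Mississippi and Hawaii from the data table at the end of this post; the variable names are mine.

```python
from math import sqrt

# Heart disease deaths per 100,000, from the data table at the end of the post.
mississippi = 251.1
hawaii = 134.7

# How many times larger Mississippi's value is than Hawaii's.
value_ratio = mississippi / hawaii

# When bubble AREA is proportional to the value, the diameter (and so the
# height a reader tends to compare) grows only with the square root.
height_ratio = sqrt(value_ratio)

# If a chart instead scaled the DIAMETER by the value, the area would grow
# with the square of the value and overstate the difference.
inflated_area_ratio = value_ratio ** 2

print(f"value ratio:         {value_ratio:.2f}")
print(f"height ratio:        {height_ratio:.2f}")
print(f"inflated area ratio: {inflated_area_ratio:.2f}")
```

Mississippi’s value is about 1.86 times Hawaii’s, but in a correctly area-scaled chart its bubble is only about 1.37 times as tall, so a reader comparing heights will underestimate the gap.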

Something else that’s a little harder to see initially is the comparison of cancer rate and deaths due to cancer. For example, cancer deaths for Hawaii and Colorado are among the lowest on the plot. But if you look at the cancer rate for those two states, they’re not quite as exemplary. Is that because the types of cancer that residents of Hawaii and Colorado experience are less lethal than those in other states?
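One quick way to look at this is to divide cancer deaths by cancer rate for a few states. The sketch below does that with values copied from the data table at the end of this post. It assumes the cancer rate column is an incidence-style figure, which the source data does not spell out here, and the resulting ratio is a crude comparison, not a true measure of how lethal cancer is in a state.

```python
# (cancer deaths per 100,000, cancer rate per 100,000) for four states,
# copied from the data table at the end of the post.
cancer = {
    "Hawaii": (140.9, 453.3),
    "Colorado": (149.5, 430.0),
    "Kentucky": (208.3, 508.7),
    "Mississippi": (201.4, 475.7),
}

# Crude deaths-to-rate ratio: lower means fewer deaths relative to the
# reported cancer rate. It ignores cancer type, age mix and reporting years.
ratios = {state: deaths / rate for state, (deaths, rate) in cancer.items()}

for state, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{state:12s} {ratio:.3f}")
```

In this small sample, Hawaii and Colorado come out with lower ratios than Kentucky and Mississippi, which is consistent with the question above but does not answer it.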

While this data may not come as a surprise to seasoned healthcare industry veterans, it raised some questions and issues for me. The main question that comes to my mind relates to how digital health companies like fitness device makers are using this data in their marketing. If you’re FitBit, Jawbone, MisFit Wearables, or some other fitness device maker, how does the physical activity level by state affect your marketing plans? Do you focus on states like Colorado and Oregon where people seem to be more inclined to engage in vigorous physical activity? Do you ignore Tennessee, Mississippi, Louisiana, Alabama, and West Virginia because they seem so disinclined to exercise? And if you do, what are the ethical implications of that decision? Should you focus more on the low physical activity states because that’s where it’s needed most?

The other aspect of this exploration for me was to try out Density Design’s free online data visualization tool, RAW. I had previously experimented with RAW using some baseball data, but I wanted to try something a bit more serious. RAW is a very easy tool to work with, but there’s currently a flaw in its remapping algorithm that can generate some very misleading graphics. In my initial exploration of the heart disease data in RAW, I noticed that the sizes of the bubbles seemed a little odd. For example, in the heart disease plot, the value of the largest bubble is 251.1 (Mississippi, in the top left corner), while the smallest is 119.4 (Minnesota); an easier one to see is Hawaii at 134.7. The disparity in bubble size for these values was grossly exaggerated, as you can see in the following image, which is the original, uncompensated, unedited plot from within RAW.
Heart-Disease-RAW
Notice how small the bubbles are for Colorado and Hawaii compared to Mississippi. They’re much too small. I submitted a ticket to Density Design via GitHub and received the following reply:

I think it is an issue with the remapping function. Seems that it uses the minimum value to set the smallest bubble and set the maximum value to the “max value” inserted, remapping all the areas between these two values. We should find a better way to do it. Meanwhile, a quick trick to solve the issue – put a line with fake values like “test; 0;0;0” to set the minimum values. you’ll see that areas will be correct.

That’s a nice trick that I should have thought of since I’ve used a similar trick with other data visualizations in other tools in the past. I followed the advice and it worked like a charm, re-mapping the bubble sizes to something much more realistic and accurate.
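To see why the fake zero row helps, here is a minimal sketch of the remapping behavior as the reply describes it: a linear map from the data range onto a range of bubble areas. This is not RAW’s actual code. The function name remap_area and the minimum and maximum areas (1 and 100) are assumptions made up for illustration.

```python
def remap_area(value, lo, hi, min_area=1.0, max_area=100.0):
    """Linearly map value from [lo, hi] onto [min_area, max_area].

    min_area and max_area are hypothetical; RAW's real limits may differ.
    """
    t = (value - lo) / (hi - lo)
    return min_area + t * (max_area - min_area)


# Heart disease deaths per 100,000, from the data table at the end of the post.
heart_deaths = {"Mississippi": 251.1, "Hawaii": 134.7, "Minnesota": 119.4}


def area_ratio(lo):
    """Mississippi's bubble area divided by Hawaii's, for a given range start."""
    hi = max(heart_deaths.values())
    big = remap_area(heart_deaths["Mississippi"], lo, hi)
    small = remap_area(heart_deaths["Hawaii"], lo, hi)
    return big / small


# Range starts at the smallest data value (the behavior the reply describes).
data_min_ratio = area_ratio(min(heart_deaths.values()))
# Range starts at 0, as it does once a fake row of zeros is added.
zero_min_ratio = area_ratio(0.0)
# What the ratio of the underlying values actually is.
true_ratio = heart_deaths["Mississippi"] / heart_deaths["Hawaii"]

print(f"area ratio, range from data minimum: {data_min_ratio:.2f}")
print(f"area ratio, range from zero:         {zero_min_ratio:.2f}")
print(f"ratio of the actual values:          {true_ratio:.2f}")
```

Under these assumptions, starting the range at the data minimum makes Mississippi’s bubble several times larger than Hawaii’s relative to the real difference, while starting it at zero brings the area ratio close to the ratio of the values themselves.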

This little trick also works well to overcome another little quirk of RAW that you can see in the image above. RAW tends to plot the bubbles for the extreme values on the edges of the graph. Notice how Tennessee and Colorado are on the edge in the last image, and Colorado is even cut off. Adding some fake values, and then bringing the plot into Adobe Illustrator to delete the geometry generated by that data, fixes the problem quickly and easily, as can be seen in the first three images of this post.

Nevertheless, if you have spreadsheets of data that you’d like to visually explore beyond the standard bar charts, pie charts, and line charts, check out RAW. I’ve also just discovered another online collaborative data analysis and graphing tool, Plotly, and I’m currently exploring this same data set with that tool.

For those interested in the original data that I used in these visualizations (the two Ztest rows at the end are the fake values described above, not real states):

State	Obesity/Overweight Rate	Physical Activity	Heart Disease Deaths per 100,000	Cancer Deaths per 100,000	Cancer Rate per 100,000
Alabama	67.7%	42.4%	236	191.7	472.9
Alaska	64.8%	58%	151.5	176.9	451.4
Arizona	62%	52.4%	146.7	154.2	387.1
Arkansas	68.7%	45.8%	222.5	194.7	426.7
California	60.3%	58.2%	161.9	156.9	434
Colorado	55.7%	61.9%	132.8	149.5	430
Connecticut	62.3%	52.8%	155.7	163.4	509.1
Delaware	66%	48.7%	175.7	185.7	502.6
Florida	62.1%	52.9%	162.3	165.5	439.5
Georgia	64.6%	50.8%	192.6	174.8	461.4
Hawaii	56%	58.6%	134.7	140.9	453.3
Idaho	62.5%	57.3%	159.3	159.9	452.3
Illinois	64%	51.7%	181.7	178.6	479.3
Indiana	65.5%	46%	191.8	188.6	443.9
Iowa	64.7%	47.7%	173.3	171.9	493.4
Kansas	65.6%	46.8%	164.9	171.3	464.5
Kentucky	66.9%	46.9%	210.1	208.3	508.7
Louisiana	69.6%	42.1%	229.4	197.6	493.8
Maine	64.2%	56.8%	151.1	187.9	494.1
Maryland	63.8%	48.8%	182.2	171.2	437.6
Massachusetts	58.8%	56.5%	150	171.3	476.3
Michigan	65.6%	53.6%	204.2	182.9	471.2
Minnesota	63%	54.1%	119.4	167.2	475.9
Mississippi	68.9%	40%	251.1	201.4	475.7
Missouri	65.7%	49.6%	201.8	185.6	461.3
Montana	61.3%	55.4%	154.2	161	462.6
Nebraska	65%	49%	154.2	167.4	444.2
Nevada	62.5%	52.8%	197.3	174.2	459.6
New Hampshire	62.1%	56.2%	152.7	167.9	482.4
New Jersey	61.6%	53.4%	182	169.5	488.8
New Mexico	62.8%	52.3%	151.2	152.4	410.7
New York	60.6%	51.6%	199.9	163.1	493.9
North Carolina	65.8%	46.9%	174.9	179	480.1
North Dakota	66.2%	47.5%	158	157.1	462.3
Ohio	65.3%	51.7%	192.4	187.7	448.1
Oklahoma	67.8%	44.8%	235.2	191.3	466.3
Oregon	61.2%	61.2%	137.9	173.9	449.1
Pennsylvania	64.9%	49.5%	187	181.6	497.3
Rhode Island	62.9%	48.9%	167.1	178.3	497.4
South Carolina	66.1%	50.1%	189.9	183.6	428.7
South Dakota	66.1%	46.2%	155.2	171	415.3
Tennessee	65.4%	39%	217.4	195.7	477.7
Texas	65.1%	48.3%	181.1	165.9	434.7
Utah	57.8%	55.9%	143.2	133.7	391.1
Vermont	60.3%	59.3%	153.6	183.2	480.8
Virginia	63.6%	52.5%	168.5	172.4	434.7
Washington	62.3%	54.3%	151.5	170.5	483.5
West Virginia	68.3%	43.1%	211.2	198	470.9
Wisconsin	66.4%	57.5%	165.1	174.5	NR
Wyoming	63.3%	53.3%	169.8	172.6	436.5
Ztest	54	38	0	0	0
Ztest2	72	64	0	0	0