As the recent animated video I created for MD Revolution pointed out, we face an epidemic of chronic disease in America. Many people aren’t aware that a large percentage of chronic disease is preventable through lifestyle changes like exercise and diet. There are clear correlations between physical activity, obesity and the rates of chronic diseases such as heart disease and cancer.
In addition to animated videos, I believe data visualizations, both static and interactive, are a powerful tool for communicating and explaining. So I’ve been wanting to explore some of the available data on physical activity, obesity and chronic disease in America. I’m also a big fan of scatter plots and bubble charts for showing correlations, and RAW, the new data visualization tool from Density Design, is well suited to quick exploration of data.
Much has been written about the burgeoning obesity rates in the United States. Roughly one third of American adults are obese, and close to two thirds are obese or overweight. But these rates are not distributed evenly throughout the country: some states and regions have much higher rates than others. And, as the bubble charts below show, the states with the highest rates tend to be the states with the lowest physical activity (Tennessee, Mississippi, Louisiana, Alabama, West Virginia). In the bubble charts that follow, physical activity levels (the percent of adults who participated in moderate or vigorous physical activities) are shown on the x-axis and obesity/overweight rates are shown on the y-axis. In each of the three charts, the size of a state’s bubble encodes a third measure: heart disease deaths per 100,000, cancer deaths per 100,000, or cancer rate per 100,000, respectively. The larger the bubble, the higher the state’s rate.
Data Source: All data for these visualizations came from the Kaiser Family Foundation State Health Facts
Correlation of Physical Activity to Obesity and Deaths Due to Heart Disease
Correlation of Physical Activity to Obesity and Deaths Due to Cancer
Correlation of Physical Activity to Obesity and Cancer Rate
The bubble charts give a very quick, state-by-state sense of how obesity, physical activity and chronic disease relate to one another. For example, it’s easy to see that states like Colorado and Hawaii are doing much better in terms of physical activity, obesity and heart disease deaths than states such as Mississippi, Tennessee, Alabama and Louisiana. These charts also make it evident that physical inactivity correlates well with higher obesity rates and higher death rates from heart disease and cancer.
The charts are also good for pointing out the outliers very quickly. For example, Wisconsin has an activity rate that is nearly as good as California but has a much higher rate of obesity/overweight.
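The claim that physical inactivity correlates with obesity and heart disease deaths can be put into a single number using Pearson’s correlation coefficient r. The following is a minimal Python sketch that uses a hand-picked subset of ten states copied from the data table at the end of this post. Because the subset deliberately includes the extremes discussed above (plus the Wisconsin outlier), it will tend to overstate the strength of the relationship. Load all fifty rows for a fairer figure, and keep in mind that correlation alone says nothing about causation.

```python
import math

# (physical activity %, obesity/overweight %, heart disease deaths per 100,000)
# for a hand-picked subset of states from the data table at the end of this post.
subset = {
    "Colorado": (61.9, 55.7, 132.8),
    "Hawaii": (58.6, 56.0, 134.7),
    "Oregon": (61.2, 61.2, 137.9),
    "Utah": (55.9, 57.8, 143.2),
    "Wisconsin": (57.5, 66.4, 165.1),
    "Alabama": (42.4, 67.7, 236.0),
    "Louisiana": (42.1, 69.6, 229.4),
    "Mississippi": (40.0, 68.9, 251.1),
    "Tennessee": (39.0, 65.4, 217.4),
    "West Virginia": (43.1, 68.3, 211.2),
}

def pearson(xs, ys):
    """Pearson's correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Split the tuples into one sequence per measure.
activity, obesity, heart = zip(*subset.values())
print(f"activity vs obesity:      r = {pearson(activity, obesity):+.2f}")
print(f"activity vs heart deaths: r = {pearson(activity, heart):+.2f}")
```

A negative r means that states with more physical activity tend to have lower values of the other measure. On Python 3.10 or later, `statistics.correlation(activity, obesity)` computes the same Pearson coefficient.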
The downside of bubble charts is that circles are a relatively poor way to compare the absolute values of one state to another. According to Alberto Cairo, a leading educator in the data visualization community, bubbles ought to be compared by their areas, but our perceptual bias is to compare them by their heights. As Cairo points out in his book The Functional Art, “… the human brain is not good at calculating surface areas.” [page 52] Regarding bubbles, Cairo continues, “You want readers to compare areas, but they tend to compare heights.” [page 53]
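If you are drawing bubbles yourself, one practical consequence is that the radius should scale with the square root of the value, so that the area (πr²) is what ends up proportional to the data. The sketch below is a minimal Python illustration, not RAW’s implementation. The `radius_for_value` helper and the 50-pixel maximum radius are arbitrary choices made for this example. It uses the heart disease figures for Mississippi (251.1) and Minnesota (119.4). A correctly sized Mississippi bubble has about 2.10 times the area of Minnesota’s bubble but only about 1.45 times its height, which is the gap Cairo describes.

```python
import math

def radius_for_value(value, max_value, max_radius):
    """Radius for a bubble whose *area* is proportional to value.

    Circle area is pi * r**2, so r must grow with the square root of value.
    The largest value (max_value) gets max_radius."""
    return max_radius * math.sqrt(value / max_value)

# Heart disease deaths per 100,000 (from the data table at the end of this post)
mississippi, minnesota = 251.1, 119.4
max_radius = 50.0  # arbitrary size, in pixels, for the largest bubble

r_ms = radius_for_value(mississippi, mississippi, max_radius)
r_mn = radius_for_value(minnesota, mississippi, max_radius)

print(f"value (area) ratio:  {mississippi / minnesota:.2f}")  # 2.10
print(f"radius/height ratio: {r_ms / r_mn:.2f}")              # 1.45
```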
Something else that’s a little harder to see at first is the comparison of cancer rate and deaths due to cancer. For example, cancer deaths for Hawaii and Colorado are among the lowest on the plot. But the cancer rates for those two states are not quite as exemplary. Is that because the types of cancer that residents of Hawaii and Colorado experience are less lethal than those in other states?
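One rough way to put numbers to this question is to divide each state’s cancer deaths per 100,000 by its cancer rate per 100,000. The Python sketch below does this for a few states copied from the data table at the end of this post. The ratio and the `death_to_incidence` helper are hypothetical and illustrative only. The ratio is not a true case-fatality rate, because deaths in a given year are not necessarily among people diagnosed that year. The table also doesn’t say whether the two columns cover the same period.

```python
# Cancer deaths per 100,000 and cancer rate (incidence) per 100,000,
# copied from the data table at the end of this post.
cancer = {
    "Hawaii": (140.9, 453.3),
    "Colorado": (149.5, 430.0),
    "Kentucky": (208.3, 508.7),
    "Mississippi": (201.4, 475.7),
}

def death_to_incidence(deaths, incidence):
    """Crude ratio of cancer deaths to cancer cases; a hypothetical indicator only."""
    return deaths / incidence

ratios = {state: death_to_incidence(*vals) for state, vals in cancer.items()}

# Print states from the lowest ratio to the highest.
for state, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{state:<12} {ratio:.3f}")
```

Hawaii comes out at about 0.31 and Colorado at about 0.35, compared with roughly 0.41 to 0.42 for Kentucky and Mississippi. That is at least consistent with the question above, though it doesn’t answer it.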
While this data may not come as a surprise to seasoned healthcare industry veterans, it raised some questions and issues for me. The main question that comes to mind is how digital health companies, such as fitness device makers, are using this data in their marketing. If you’re Fitbit, Jawbone, Misfit Wearables or some other fitness device maker, how does the physical activity level by state affect your marketing plans? Do you focus on states like Colorado and Oregon, where people seem to be more inclined to engage in vigorous physical activity? Do you ignore Tennessee, Mississippi, Louisiana, Alabama and West Virginia because they seem so disinclined to exercise? And if you do, what are the ethical implications of that decision? Should you focus more on the low physical activity states because that’s where it’s needed most?
The other goal of this exploration was to try out RAW, Density Design’s free online data visualization tool. I had previously experimented with RAW using some baseball data, but I wanted to try something a bit more serious. RAW is a very easy tool to work with, but there’s currently a flaw in its remapping algorithm that can generate some very misleading graphics. In my initial exploration of the heart disease data in RAW, I noticed that the bubble sizes seemed a little odd. In the heart disease plot, the largest value is 251.1 (Mississippi, in the top left corner) and the smallest is 119.4 (Minnesota). Hawaii, at 134.7, is an easier one to spot. Yet the disparity in bubble size between these values was grossly exaggerated, as you can see in the following image, which is the original, uncompensated, unedited plot from within RAW.
Notice how small the bubbles are for Colorado and Hawaii compared to Mississippi. They’re much too small. I submitted a ticket to Density Design via GitHub and received the following reply:
I think it is an issue with the remapping function. Seems that it uses the minimum value to set the smallest bubble and set the maximum value to the “max value” inserted, remapping all the areas between these two values. We should find a better way to do it. Meanwhile, a quick trick to solve the issue – put a line with fake values like “test; 0;0;0” to set the minimum values. you’ll see that areas will be correct.
That’s a nice trick, and one I should have thought of, since I’ve used similar tricks in other visualization tools in the past. I followed the advice and it worked like a charm, remapping the bubble sizes to something much more realistic and accurate.
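The behavior described in the reply can be sketched in a few lines of Python. This is a hypothetical model based only on the GitHub reply, not RAW’s actual source code, and for simplicity it assumes that the smallest value maps to an area of zero. With the raw heart disease data, Minnesota’s 119.4 becomes the zero point, so Hawaii’s bubble gets roughly 12% of Mississippi’s area instead of the roughly 54% its value warrants. Adding a fake row with a value of 0, as in the “test; 0;0;0” trick, moves the zero point back to 0. Each area then becomes proportional to its value.

```python
def remap_areas(values, max_area):
    """Hypothetical min-to-max remapping modelled on the GitHub reply.

    Assumption for simplicity: the smallest value in the data gets zero area,
    the largest gets max_area, and everything in between is interpolated
    linearly."""
    lo, hi = min(values), max(values)
    return [max_area * (v - lo) / (hi - lo) for v in values]

# Heart disease deaths per 100,000 (from the data table at the end of this post)
states = ["Mississippi", "Hawaii", "Minnesota"]
deaths = [251.1, 134.7, 119.4]

# Raw data: Minnesota's 119.4 becomes the zero point, exaggerating differences.
skewed = remap_areas(deaths, 1000.0)

# The "test; 0;0;0" trick: a fake row with value 0 moves the zero point to 0,
# so each area becomes proportional to its value. The fake row is then dropped.
fixed = remap_areas(deaths + [0.0], 1000.0)[:-1]

# Areas relative to Mississippi, the largest bubble.
for name, s, f in zip(states, skewed, fixed):
    print(f"{name:<12} skewed={s / skewed[0]:.3f}  fixed={f / fixed[0]:.3f}")
```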
This little trick also works well to overcome another quirk of RAW that you can see in the image above: RAW tends to plot the bubbles for the extreme values right on the edges of the graph. Notice how Tennessee and Colorado sit on the edge in the last image, and Colorado is even cut off. Adding some fake values just beyond the range of the real data, then bringing the plot into Adobe Illustrator and deleting the geometry generated by those fake rows, fixes the problem quickly and easily, as can be seen in the first three images of this post.
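The data table at the end of this post ends with two such fake rows, Ztest and Ztest2. The following Python sketch shows one hypothetical way to generate comparable rows automatically. It places one row just below and one just above the real x and y extents, rounded out to whole numbers. The `padding_rows` helper, its `margin` parameter and the column names are illustrative and are not part of RAW. The fake rows’ bubble-size columns are set to 0, which also supplies the zero baseline from the remapping trick above. This way, a single pair of fake rows handles both quirks.

```python
import csv
import io
import math

def padding_rows(rows, x_col, y_col, size_cols, margin=2.0):
    """Hypothetical helper: two fake rows just outside the x/y extents.

    Their bubble-size columns are 0; delete their shapes after export."""
    xs = [float(r[x_col]) for r in rows]
    ys = [float(r[y_col]) for r in rows]
    low = {"State": "Ztest",
           x_col: math.floor(min(xs) - margin), y_col: math.floor(min(ys) - margin)}
    high = {"State": "Ztest2",
            x_col: math.ceil(max(xs) + margin), y_col: math.ceil(max(ys) + margin)}
    for row in (low, high):
        for col in size_cols:
            row[col] = 0
    return [low, high]

# A few rows from the data table at the end of this post.
data = [
    {"State": "Colorado", "Obesity": 55.7, "Activity": 61.9, "HeartDeaths": 132.8},
    {"State": "Mississippi", "Obesity": 68.9, "Activity": 40.0, "HeartDeaths": 251.1},
    {"State": "Tennessee", "Obesity": 65.4, "Activity": 39.0, "HeartDeaths": 217.4},
    {"State": "Louisiana", "Obesity": 69.6, "Activity": 42.1, "HeartDeaths": 229.4},
]

# Write the real rows plus the padding rows as CSV, ready to paste into a tool.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["State", "Obesity", "Activity", "HeartDeaths"])
writer.writeheader()
writer.writerows(data + padding_rows(data, "Activity", "Obesity", ["HeartDeaths"]))
print(out.getvalue())
```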
Nevertheless, if you have spreadsheets of data that you’d like to explore visually beyond the standard bar charts, pie charts and line charts, check out RAW. I’ve also just discovered another online collaborative data analysis and graphing tool, Plotly, and I’m currently exploring this same data set with it.
For those interested in the original data that I used in these visualizations:
| State | Obesity/Overweight Rate | Physical Activity | Heart Disease Deaths per 100,000 | Cancer Deaths per 100,000 | Cancer Rate per 100,000 |
|---|---|---|---|---|---|
| Alabama | 67.7% | 42.4% | 236 | 191.7 | 472.9 |
| Alaska | 64.8% | 58% | 151.5 | 176.9 | 451.4 |
| Arizona | 62% | 52.4% | 146.7 | 154.2 | 387.1 |
| Arkansas | 68.7% | 45.8% | 222.5 | 194.7 | 426.7 |
| California | 60.3% | 58.2% | 161.9 | 156.9 | 434 |
| Colorado | 55.7% | 61.9% | 132.8 | 149.5 | 430 |
| Connecticut | 62.3% | 52.8% | 155.7 | 163.4 | 509.1 |
| Delaware | 66% | 48.7% | 175.7 | 185.7 | 502.6 |
| Florida | 62.1% | 52.9% | 162.3 | 165.5 | 439.5 |
| Georgia | 64.6% | 50.8% | 192.6 | 174.8 | 461.4 |
| Hawaii | 56% | 58.6% | 134.7 | 140.9 | 453.3 |
| Idaho | 62.5% | 57.3% | 159.3 | 159.9 | 452.3 |
| Illinois | 64% | 51.7% | 181.7 | 178.6 | 479.3 |
| Indiana | 65.5% | 46% | 191.8 | 188.6 | 443.9 |
| Iowa | 64.7% | 47.7% | 173.3 | 171.9 | 493.4 |
| Kansas | 65.6% | 46.8% | 164.9 | 171.3 | 464.5 |
| Kentucky | 66.9% | 46.9% | 210.1 | 208.3 | 508.7 |
| Louisiana | 69.6% | 42.1% | 229.4 | 197.6 | 493.8 |
| Maine | 64.2% | 56.8% | 151.1 | 187.9 | 494.1 |
| Maryland | 63.8% | 48.8% | 182.2 | 171.2 | 437.6 |
| Massachusetts | 58.8% | 56.5% | 150 | 171.3 | 476.3 |
| Michigan | 65.6% | 53.6% | 204.2 | 182.9 | 471.2 |
| Minnesota | 63% | 54.1% | 119.4 | 167.2 | 475.9 |
| Mississippi | 68.9% | 40% | 251.1 | 201.4 | 475.7 |
| Missouri | 65.7% | 49.6% | 201.8 | 185.6 | 461.3 |
| Montana | 61.3% | 55.4% | 154.2 | 161 | 462.6 |
| Nebraska | 65% | 49% | 154.2 | 167.4 | 444.2 |
| Nevada | 62.5% | 52.8% | 197.3 | 174.2 | 459.6 |
| New Hampshire | 62.1% | 56.2% | 152.7 | 167.9 | 482.4 |
| New Jersey | 61.6% | 53.4% | 182 | 169.5 | 488.8 |
| New Mexico | 62.8% | 52.3% | 151.2 | 152.4 | 410.7 |
| New York | 60.6% | 51.6% | 199.9 | 163.1 | 493.9 |
| North Carolina | 65.8% | 46.9% | 174.9 | 179 | 480.1 |
| North Dakota | 66.2% | 47.5% | 158 | 157.1 | 462.3 |
| Ohio | 65.3% | 51.7% | 192.4 | 187.7 | 448.1 |
| Oklahoma | 67.8% | 44.8% | 235.2 | 191.3 | 466.3 |
| Oregon | 61.2% | 61.2% | 137.9 | 173.9 | 449.1 |
| Pennsylvania | 64.9% | 49.5% | 187 | 181.6 | 497.3 |
| Rhode Island | 62.9% | 48.9% | 167.1 | 178.3 | 497.4 |
| South Carolina | 66.1% | 50.1% | 189.9 | 183.6 | 428.7 |
| South Dakota | 66.1% | 46.2% | 155.2 | 171 | 415.3 |
| Tennessee | 65.4% | 39% | 217.4 | 195.7 | 477.7 |
| Texas | 65.1% | 48.3% | 181.1 | 165.9 | 434.7 |
| Utah | 57.8% | 55.9% | 143.2 | 133.7 | 391.1 |
| Vermont | 60.3% | 59.3% | 153.6 | 183.2 | 480.8 |
| Virginia | 63.6% | 52.5% | 168.5 | 172.4 | 434.7 |
| Washington | 62.3% | 54.3% | 151.5 | 170.5 | 483.5 |
| West Virginia | 68.3% | 43.1% | 211.2 | 198 | 470.9 |
| Wisconsin | 66.4% | 57.5% | 165.1 | 174.5 | NR |
| Wyoming | 63.3% | 53.3% | 169.8 | 172.6 | 436.5 |
| Ztest | 54 | 38 | 0 | 0 | 0 |
| Ztest2 | 72 | 64 | 0 | 0 | 0 |