Thursday, 7 August 2014

Resources on Data Visualization (I)




Yesterday, while facilitating the inaugural workshop for IDA's Data Science MOOC, I mentioned Data Visualization and how important it is to the work of a data scientist. In this blog post, I will share more of my thoughts on it, along with some of the most popular resources on Data Visualization.


What is my definition of data visualization? It is the use of visual aids such as graphs and maps to make important, actionable insights easy for the target audience to comprehend. In this day and age, audiences have very short attention spans, and squeezing so much information into such a short span keeps getting more challenging. Thus it is important that data scientists understand the importance of visualization and know the pros and cons of each visual aid.

Many a time, data visualization is sold as visual analytics, but in my opinion the two should be quite different. Data visualization is much closer to descriptive statistics (describing what has happened), whereas if you look at the definition of 'analytics' alone, it should contain a predictive component in addition to the descriptive one. So visual analytics should mean using visual methods to make certain predictions (at least, that is what I think it takes to be called "Visual Analytics"). While doing research for this blog post, I came across the definition of Visual Analytics on Wikipedia, but alas, it would take me a few hours to digest it. Hopefully someone can go through it and come up with a simpler definition.
  
There are also some cognitive blind spots when we use visual aids to present information, as compared to predictive statistical models. If I plot the outbreak of a disease on a map over time and it seems to move from the east to the centre, our brain starts to link the points up and extrapolate, concluding that the disease will continue to move through the centre of the map towards the west. Really? Do you have data to rule out the outbreak 'suddenly' appearing on the west side of the map and moving towards the centre? A statistical model is different: its predictions rest on independent variables fitted to training data, so we can place more confidence in them than in a simple 'extrapolation' from a visual aid.
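To make the contrast a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the weekly positions are made up (0 is the west edge of the map, 100 the east edge), and a straight-line least-squares fit stands in for "a statistical model". The point is only that a fitted model can attach a spread to its forecast, which the eye does not do when it joins up dots on a map.

```python
import math

# Hypothetical data, invented for illustration only:
# week number and east-west position of new cases (0 = west, 100 = east).
weeks = [1, 2, 3, 4, 5, 6]
positions = [92.0, 85.0, 80.0, 71.0, 66.0, 58.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def prediction_spread(xs, ys, slope, intercept, x_new):
    """Standard error of a single new prediction at x_new."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    residual_variance = sum(r ** 2 for r in residuals) / (n - 2)
    return math.sqrt(residual_variance * (1 + 1 / n + (x_new - mean_x) ** 2 / sxx))


slope, intercept = fit_line(weeks, positions)

for week in (7, 12):
    forecast = intercept + slope * week
    spread = prediction_spread(weeks, positions, slope, intercept, week)
    print(f"week {week}: forecast position {forecast:.1f}, standard error {spread:.1f}")
```

The spread grows the further the forecast week is from the observed weeks, which is exactly the caution that is missing when we mentally extend a trend across a map. A straight line is of course a very crude model of an outbreak; it is used here only to keep the sketch short.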

Now, I am not saying that Data Visualization is not important. It definitely is. After generating so many insights from data, the very last step is to share them so that buy-in can be gained and actions can be taken to generate value from the analytics done. If, at this stage, no planning goes into the visual aids that show the actionable insights, it is like a striker who has the ball in front of a goal with no goalkeeper and fails to score. ALL efforts wasted.

So what resources are available to learn more about visualization? Well, there are two masters of visualization, namely Edward Tufte and Stephen Few.

Their websites are as follows:

Their books:
Stephen Few

Edward Tufte

Have fun learning about Data Visualization in your Data Science Journey!
