At the Gartner Data & Analytics Summit that took place in 2018 in Grapevine, Texas, it was reiterated the importance of data literacy for taking advantage of the emergence of data analytics, artificial intelligence (AI) and machine learning (ML) technologies. Gartner expected then that by 2020, 80% of organizations will initiate deliberate competency development in the field of data literacy [1] — or how they put it — learning to ‘speak data’ as a ‘second language’.
Data literacy is typically defined as the ability to read, work with, analyze, and argue with data. Sure, these form the blocks of data literacy, though what I’m missing from this definition is the ability to understand the data, even if understanding should be the outcome of reading, and the ability to put data into the context of business problems, even if the analyzes of data could involve this later aspect too.
Understanding has several aspects: understanding the data structures available within an organization, understanding the problems with data (including quality, governance, privacy and security), respectively understanding how the data are linked to the business processes. These aspects go beyond the simple ability included in the above definition, which from my perspective doesn’t include the particularities of an organization (data structure, data quality and processes) — the business component. This is reflected in one of the problems often met in the BI/data analytics industry — the solutions developed by the various service providers don’t reflect organizations’ needs, one of the causes being the inability to understand the business on segments or holistically.
Putting data into context means being able to use the respective data in answering stringent business problems. A business problem needs to be first correctly defined and this requires a deep understanding of the business. Then one needs to identify the data that could help finding the answers to the problem, respectively of building one or more models that would allow elaborating further theories and performing further simulations. This is an ongoing process in which the models built are further enhanced, when possible, or replaced by better ones.
Probably the comparison with a second language is only partially true. One can learn a second language and argue in the respective language, though it doesn’t mean that the argumentations will be correct or constructive as long the person can’t do the same in the native language. Moreover, one can have such abilities in the native or a secondary language, but not be able do the same in what concerns the data, as different skillsets are involved. This aspect can make quite a difference in a business scenario. One must be able also to philosophize, think critically, as well to understand the forms of communication and their rules in respect to data.
To philosophize means being able to understand the causality and further relations existing within the business and think critically about them. Being able to communicate means more than being able to argue — it means being able to use effectively the communication tools — communication channels, as well the methods of representing data, information and knowledge. In extremis one might even go beyond the basic statistical tools, stepping thus in what statistical literacy is about. In fact, the difference between the two types of literacy became thinner, the difference residing in the accent put on their specific aspects.
These are the areas which probably many professionals lack. Data literacy should be the aim however this takes time and is a continuous iterative process that can take years to reach maturity. It’s important for organizations to start addressing these aspects, progress in small increments and learn from the experience accumulated.
[1] Gartner (2018) How data and analytics leaders learn to master information as a second language, by Christy Pettey [Online] Available from: https://www.gartner.com/smarterwithgartner/gartner-keynote-do-you-speak-data