The Data Literacy Imperative - Part II: The Data Literacy Body of Knowledge
Read - The Data Literacy Imperative - Part I: Building a Data Literacy Program
In the first article of this series, I described the nature and importance of a data literacy program and presented a process for building the program. Assessment and planning are core activities in that process—assessment of literacy levels both individually and organizationally, and planning to fill knowledge gaps identified through assessment. A comprehensive Data Literacy Body of Knowledge (DLBOK) is the foundation for assessment, gap analysis, and development of learning plans. This article provides an overview of the DLBOK that will soon be published by eLearningCurve in support of their upcoming data literacy certification program.
A common misconception equates data literacy with the ability to understand and create charts and graphs, limiting the concept to literacy as it applies to data visualization. Full data literacy encompasses all of the skills needed to understand, find meaning in, interpret, and communicate with data. A fully data-literate individual has a working knowledge of where data comes from, how it is processed, how it is organized, how it is managed, and how it is used. It is tempting to view body-of-knowledge topics as specialized knowledge, with each area oriented to a specific subset of data stakeholders. Don’t be caught in that trap.
Everyone who works with data needs to understand—at least to a limited degree—all of the topics. Data consumers need to have a working knowledge of what constitutes data management and why it is necessary. Similarly, data managers need to understand the activities, needs, and interests of data consumers. No person is truly data literate without sufficient knowledge to collaborate both in management and in use of data. No organization is truly data literate without collaboration. A fully data-literate organization has practical knowledge of how to manage, organize, consolidate, govern, prepare, analyze, and derive value from data. Figure 1 provides a high-level overview of the DLBOK covering topics from basic data concepts to applied analytics.
Figure 1. The Data Literacy Body of Knowledge (DLBOK)
Let’s take a look at the five major topic areas of data literacy and why each one matters.
Data and Databases
Data Fundamentals. Understanding the basics of data is the foundation for all things connected with data literacy. Distinguishing between data and information and recognizing the many different kinds of data are important first steps. Knowing where your data comes from and how it is organized, and then understanding its contents, are keys to proficiency when working with data.
Database Fundamentals. A database is a collection of data that is organized to be stored, accessed, and processed electronically. A common misconception equates a database management system with a database, referring to Microsoft SQL Server, for example, as a database. SQL Server is a database management system (DBMS): a set of software used to create and manage databases. Another common misconception is the belief that a database must contain relational or structured data. A database is any organized collection of digital data, whether structured, unstructured, semi-structured, or multi-structured. Many different types of databases exist. The most common types in use today include flat files, spreadsheets, relational databases, multi-dimensional databases, and NoSQL databases. Data-literate individuals need to be capable of working with databases of many different kinds.
Data Knowledge and Data Governance
Managing Data Knowledge. Data knowledge is the collective understanding of data by everyone who works with or has a stake in the data. Data knowledge includes understanding of content, meaning, location, structure, quality, privacy and security requirements, and much more. No single individual has all of the knowledge about any data collection, so sharing of data knowledge is essential to achieve maximum data understanding and appropriate use of data. Some data knowledge is managed as metadata in tools such as dictionaries and glossaries, but much of it is “tribal knowledge” held only in the minds and memories of people. Capturing tribal knowledge and collecting it as shareable metadata is one of the many important roles of data cataloging. Shared knowledge is a key factor for growth of organizational data literacy.
Data Governance. Data governance addresses the policies, processes, and practices to actively manage data assets. The ultimate purpose of data governance is to ensure that data is available when needed, usable and appropriately used, secured and protected as needed, and of high quality. Governance methods vary among organizations, with practices differing based on culture, industry, data management maturity, and data management priorities. Data policy management is fundamental to data governance. Policies may focus on any or all of data protection, data utility, and data value. Typical data governance activities include assessing and improving data quality, collecting and sharing rich and reliable metadata, minimizing data risks, protecting and securing sensitive data, and ensuring regulatory compliance. Everyone who works with data is a stakeholder in data governance with responsibilities to comply with policies and use data appropriately.
Data Resource Management
Data Resource Consolidation. Data resources are the collections of data that are shared and reused across organizations and among people who use data for analysis and reporting. Data resource management is the set of architectures, processes, and practices that are used to consolidate disparate data, resolve inconsistencies, support controlled data sharing, and organize data to support many different users and uses. Data science, for example, typically prefers raw and unrefined data while basic reporting works well with data that is consolidated and cleansed. Well-managed data resources are able to meet both kinds of data requirements as well as many variations between the two extremes. Knowledge of various data resource management methods helps data consumers to choose the best sources for their data needs and to understand the characteristics of data that they use.
Managing the Data Resource. Managing the data resource involves processes to develop and apply architectures, and to define and execute the processing that is used to manage data throughout its lifecycle. Well-managed data is organized to be easily found, understood, accessed, and processed to meet the needs of many different users and use cases. There is no one-size-fits-all way to organize data that satisfies all data uses. Data that is optimized for day-to-day business analysis, for example, is not ideally suited for machine learning applications. Data organized for data science doesn’t work well for enterprise reporting. The goal of data resource management is to provide optimal data for all uses, from basic reporting to advanced data science, without unnecessary redundancy, and with as much data sharing and data reuse as is practical.
Using the Data Resource. Data is at the core of business management and operations. Data-driven businesses use data in many different ways to
inform decision-making processes,
monitor progress toward achieving goals,
understand and analyze events and outcomes,
know why things happen and how to shape what happens,
predict future outcomes, and
recommend actions and automate processes.
Business intelligence, performance management, analytics, and data science all depend on managed and shared data resources to deliver information, reduce uncertainty, and enhance organizational learning.
Data Provisioning
Finding and Evaluating Data. Data usage, whether for basic analysis and reporting, advanced data science applications, or somewhere in the middle, isn’t possible until you have the right data. You need to find data that is well matched to analysis goals. Data seeking is not practical without first stating the goals and understanding the business, information, and data requirements. With known requirements, you can search for data. The datasets that you find should be evaluated for quality and trustworthiness and, when several candidate datasets are available, compared to select the best fit.
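To make the idea of evaluating candidate datasets a little more concrete, here is a minimal Python sketch. It is only an illustration, not part of the DLBOK: the function names (profile_dataset, best_fit), the two quality indicators (completeness and uniqueness), and the sample datasets are all assumptions chosen for the example. Real evaluation would also weigh trustworthiness, timeliness, and fit to the stated requirements.

```python
from typing import Any

Row = dict[str, Any]


def profile_dataset(rows: list[Row], required: list[str]) -> dict[str, float]:
    """Return two simple, hypothetical quality indicators for a dataset."""
    total = len(rows)
    if total == 0 or not required:
        return {"completeness": 0.0, "uniqueness": 0.0}
    # Completeness: share of required fields that are present and non-empty.
    filled = sum(
        1
        for row in rows
        for field in required
        if row.get(field) not in (None, "")
    )
    completeness = filled / (total * len(required))
    # Uniqueness: share of rows that are distinct on the required fields.
    distinct = {tuple(row.get(field) for field in required) for row in rows}
    uniqueness = len(distinct) / total
    return {"completeness": completeness, "uniqueness": uniqueness}


def best_fit(candidates: dict[str, list[Row]], required: list[str]) -> str:
    """Pick the candidate with the highest completeness, then uniqueness."""
    scores = {name: profile_dataset(rows, required) for name, rows in candidates.items()}
    return max(
        scores,
        key=lambda name: (scores[name]["completeness"], scores[name]["uniqueness"]),
    )


# Invented sample data: two candidate sources for the same analysis goal.
crm_export = [
    {"customer_id": 1, "region": "West"},
    {"customer_id": 2, "region": ""},
    {"customer_id": 2, "region": ""},
]
billing_extract = [
    {"customer_id": 1, "region": "West"},
    {"customer_id": 2, "region": "East"},
    {"customer_id": 3, "region": "East"},
]

required_fields = ["customer_id", "region"]
chosen = best_fit(
    {"crm_export": crm_export, "billing_extract": billing_extract},
    required_fields,
)
print(chosen)
```

Ranking on completeness first and uniqueness second is an arbitrary choice made for this sketch; the right criteria depend on the requirements stated before the search began.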
Data Preparation. Data preparation may be performed by data analysts working with self-service tools, or by data engineers building data pipelines for analytics and data science processes. Preparing data for analysis is most commonly an iterative process of data exploration and data transformation. Data exploration is necessary to identify and understand the main characteristics of the data. What things (customers, products, employees, accounts, transactions, etc.) are represented by the data? What facts (names, dates, amounts, etc.) about those things are described and how are they described? Data transformation changes data to create the contents, structures, and formats that are needed for analysis. The three primary reasons for data transformation are improving data, enriching data, and formatting data.
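The exploration and transformation steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the order records, field names, and the explore and transform functions are invented for the example, and real preparation work is usually done with self-service tools or pipeline frameworks rather than hand-written loops.

```python
from collections import Counter
from datetime import date

# Invented raw records: everything arrives as text, with inconsistent names.
raw_orders = [
    {"customer": " Acme Corp ", "order_date": "2023-01-15", "amount": "1200.50"},
    {"customer": "acme corp", "order_date": "2023-02-03", "amount": "300"},
    {"customer": "Globex", "order_date": "2023-02-20", "amount": ""},
]


def explore(rows):
    """Exploration: which facts are recorded, and how many values are missing?"""
    fields = sorted({key for row in rows for key in row})
    missing = Counter(
        key for row in rows for key in fields if row.get(key, "") == ""
    )
    return {"fields": fields, "missing": dict(missing)}


def transform(rows):
    """Transformation: improve, enrich, and format the data for analysis."""
    prepared = []
    for row in rows:
        # Improve: skip records with no amount and standardize customer names.
        if row["amount"] == "":
            continue
        customer = " ".join(row["customer"].split()).title()
        # Format: convert text into typed values (date and float).
        order_date = date.fromisoformat(row["order_date"])
        amount = float(row["amount"])
        prepared.append(
            {
                "customer": customer,
                "order_date": order_date,
                "amount": amount,
                # Enrich: derive a new attribute that analysis can group by.
                "order_month": order_date.strftime("%Y-%m"),
            }
        )
    return prepared


summary = explore(raw_orders)
prepared_orders = transform(raw_orders)
print(summary)
print(prepared_orders)
```

Dropping records with a missing amount is one possible improvement rule; depending on the analysis, flagging or imputing those values may be the better choice. In practice the two steps repeat: each transformation is followed by another look at the data.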
Data Analysis
Data Analysis Techniques. Data analysis is a process of organizing data for exploration, using statistical methods to explore and find patterns, visualizing patterns to understand and communicate them, interpreting patterns and visualizations to describe meanings and insights, and acting on insights to achieve results. Descriptive analysis develops statistics to illustrate the shape of the data, describing characteristics such as the distribution of values. Inferential analysis develops conclusions, showing relationships and dependencies among variables and value trends over time.
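As a rough illustration of the difference between describing data and examining relationships among variables, consider the Python sketch below. The figures for ad_spend and sales are invented, and the function names are assumptions made for the example. The correlation coefficient shown is only one simple measure of a relationship; a full inferential analysis would also test whether the relationship is statistically significant, which this sketch does not do.

```python
import statistics
from math import sqrt

# Invented sample data: monthly advertising spend and sales, in thousands.
ad_spend = [10, 12, 15, 18, 20, 24]
sales = [110, 118, 135, 150, 158, 181]


def describe(values):
    """Descriptive statistics: summarize the shape of a single variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson_r(xs, ys):
    """Pearson correlation: strength of the linear relationship between two variables."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


spend_summary = describe(ad_spend)
relationship = pearson_r(ad_spend, sales)
print(spend_summary)
print(round(relationship, 3))
```

A coefficient close to 1 suggests that the two variables rise together, but it says nothing about cause. Interpreting what the pattern means, and whether it justifies action, remains a human task.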
Data Visualization. Data visualization illustrates patterns and trends in data as images (charts, graphs, and maps) that make it easy to see things that are not readily visible in large amounts of tabular or textual data. Charts and graphs are the predominant way of visualizing data. Every data analyst needs to understand the common types of charts and graphs and how they are used to communicate. Every manager, decision-maker, and data-dependent worker should understand how to find the meaning in charts and graphs and how to avoid misinterpreting them. Data visualization skills are a fundamental part of data literacy.
Analysis to Action. Data analysis creates value only when we can translate analytics into action. Conventional thinking typically says that the purpose of analytics is to create insights. But insights not acted upon are pointless and the effort to produce them is wasteful. Getting from analytics to action doesn’t happen automatically or accidentally. It is a proactive process that involves human factors such as trust and buy-in. It all begins with interpreting the analysis and using it to drive conversation, communication, and collaboration.
Using the DLBOK
This article provides a high-level overview of the scope of knowledge for data literacy. In upcoming articles, I’ll discuss putting the DLBOK to work to assess data literacy, identify knowledge gaps, develop and execute learning plans, measure and monitor literacy, and grow data literacy both individually and organizationally.
Read - The Data Literacy Imperative - Part III: Data Literacy Assessment