Data Wrangling, Information Juggling and Contextual Meaning, Part 2

Part 1 presented two pragmatic definitions: information is the subjectively interpreted record of personal experiences, physically stored on (mostly) digital media; data and metadata are information that has been modeled and “deconstructed” for ease of computer processing. These definitions help explain why data wrangling, or preparation, can be so time-consuming for data scientists. If the external material is in the form of data (for example, from devices on the Internet of Things), the metadata may be minimal or non-existent; the data scientist must then complete or deduce the context from any available metadata or from the data values themselves. In the case of external, loosely structured information such as text or images, the data scientist must interpret the context from the content itself and from prior experience.
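As a rough illustration of deducing context from the data values themselves, the following minimal Python sketch guesses a coarse type for each field in a batch of device readings that arrived with no schema. The field names (f1 to f4), the sample readings and the helper functions infer_type and infer_context are all hypothetical, invented for this example; real wrangling tools use far richer heuristics.

```python
from datetime import datetime


def infer_type(values):
    """Guess a coarse type for a list of raw string values."""
    def all_parse(parser):
        # True only if every value can be parsed without error.
        for value in values:
            try:
                parser(value)
            except ValueError:
                return False
        return True

    if not values:
        return "unknown"
    # Try the narrowest interpretation first, then widen.
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    if all_parse(datetime.fromisoformat):
        return "timestamp"
    return "text"


def infer_context(records):
    """Map each field name to a guessed type, based only on the values seen."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return {field: infer_type(values) for field, values in columns.items()}


# Hypothetical device readings that arrived without any schema or units.
readings = [
    {"f1": "2016-03-01T10:15:00", "f2": "21.5", "f3": "7", "f4": "ok"},
    {"f1": "2016-03-01T10:16:00", "f2": "21.7", "f3": "8", "f4": "ok"},
]

guessed = infer_context(readings)
print(guessed)
```

Even when such guesses are right, they recover only the form of the data. Whether f2 is a temperature in degrees Celsius or a voltage is context that the wrangler must still supply from other sources or from experience.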

Figure: The modern meaning model (m3)

The importance of context for data wrangling (and, indeed, analysis) cannot be overstated. As seen above, context may exist both within formal metadata and elsewhere. I coined the phrase

However, context is only one step on the path to data preparation and analysis. We must now move from the physical locus of information storage and manipulation to the realms of the human mind and social interactions that form the real basis of all decision making. The accompanying figure shows the complete modern meaning model (m3) that builds on the information layer pictured in part 1. It adds two further loci: mental and interpersonal.

The mental locus shows that information must first be transformed into tacit and/or explicit human knowledge before it can be acted upon. Explicit knowledge offers an understanding of the information available and an ability to document it directly in a usable form. Tacit knowledge, on the other hand, is more internal and difficult to document; the usual example is knowing how to ride a bike. In decision making or data preparation terms, tacit knowledge equates to the insight one has into the information, beyond the explicit physical information that is immediately available. The term “gut feel” is sometimes used for this kind of insight. It may come from previously digested information or from internal mental heuristics informed or prompted by the information now available.

The name of the discipline “knowledge management” is, in this sense, a misnomer. In practice, it focuses almost entirely on collecting and managing the information artifacts of the physical locus, rather than directly addressing the knowledge that resides in business users’ heads. In reality, that knowledge can only be addressed in the third, interpersonal locus, by collaborative and social means.

This topmost locus reflects the reality that we humans are, at heart, social animals and that business is a social enterprise. Decisions about what information means and what actions should be taken always occur in a social context. Meaning is, in the final reckoning, the stories we tell ourselves and others about the information we gather and the knowledge we hold. Decisions are seldom, if ever, fully rational. They are influenced by our emotional states, social conventions, and especially by our intentions, swayed by our own and others’ expectations.

The modern meaning model implies that data wrangling requires the wrangler to move up and down through these three loci in order to make sense of incoming data or information. The first phase is to work with the data/information itself, exploring possible definitions and relationships. It is this phase that most emerging data wrangling tools address. The second is to apply personal knowledge and experience to the contextual information derived in the first phase. Finally, the social and organizational aspects of the third locus must be addressed. Collaborative functionality in tools can help with some aspects of this. Most importantly, these three phases are highly iterative and deeply personal in nature. Successful data wranglers must be able to think outside the box, but must also bring their own experiential bubbles to bear in order to make the most meaningful interpretation of the information.

In part 3 of this series, we look at this iterative and personalized exploration of information both generally and in the context of some of the tools available or emerging.

Barry Devlin

Dr. Barry Devlin is among the foremost authorities on business insight and one of the founders of data warehousing, having published the first architectural paper on the topic in 1988....