The Data Modeling Conundrum: Do You Create a Model from the Real or the Ideal?
I’ve cribbed the title of this piece from Sarah Polley’s 2012 film Stories We Tell. Stories We Tell has a pretty neat conceit: a filmmaker’s take on storytelling, it’s a project in which Polley enlists the efforts of father and siblings alike to share stories about her mother, Diane, who died when she was just 11. Stories We Tell seems to unfold as a kind of enquiry into the Rashomon effect: as Polley encourages the members of her family to tell stories about Diane, at times prompting them from her perch behind the camera, we come to see that each of these persons knows a different Diane. Nor is that all. At the film’s conclusion, Polley herself hasn’t ended up quite where she thought she would – not, at least, when she first conceived her project. It isn’t just that she now has a larger or more nuanced picture of Diane – e.g., a warts-and-all composite view that’s somehow cohered for her in the process of editing her film – but, instead, that what Polley has discovered about her mother has profoundly altered her understanding of her own life.
To avoid the risk of spoiling this film for would-be watchers, I won’t say more.
I cribbed my title from Stories We Tell because (a) I think it’s a criminally overlooked film and because (b) I’ve seen a similar phenomenon at work in the way organizations are being forced to – there’s no other way to put this – confront themselves in their use of analytics. As they become more data-driven, organizations are finding that what they thought they knew – about themselves, about their customers and competitors, about their business worlds – isn’t actually true.
Thanks to analytics, then, organizations are discovering that they don’t really know themselves. More precisely, they’re discovering that certain kinds of biases – let’s go ahead and call them “epistemic preconceptions” – aren’t just hiding in plain sight, but are, in some cases, hard to let go of. On the plus side, self-discovery of this kind is a prerequisite for self-improvement. But what happens when an organization looks into the analytical mirror and refuses to recognize itself?
Beware Business Folklore
At last year’s Pacific Northwest BI and Analytics Summit, Jill Dyché – an industry thought-leader-slash-force-of-nature who needs no introduction – explored the problem of what she called “folklore” in decision-making. Folklore, Dyché explained, is knowledge about the world that everybody in an organization just assumes is true. What’s essential about folklore is that it’s accepted as true irrespective of empirical corroboration. After all, why would you need to corroborate what you already know to be true? “There’s so much folklore in the [canine] rescue community, so much mythology,” said Dyché, a prominent advocate for canine rescue efforts. She cited the common belief that owner-initiated surrenders of dogs tend to spike in the summer months – and July, especially – because of the prevalence of fireworks and thunderstorms.
According to rescue folklore, some dogs, terrified of loud noises, become difficult to manage, flee their homes, or otherwise prove disruptive. The truth is at once more mundane and (considerably) more troubling: low-income families, unable to afford the cost of boarding their dogs, will surrender them to municipal shelters before they leave for vacation. There’s a kind of awful logic to this: if the dog hasn’t been adopted out – or euthanized – by the time the family returns, there’s a chance they can still reclaim it. This activity is consistent throughout the summer months, but tends to spike in July. Hence the folklore-ish correlation with fireworks.
Non-profits in the rescue community, like for-profit businesses of all kinds, are amassing ever larger volumes of data. And, like their for-profit counterparts, they’re beginning to mine this data in order to better understand themselves and their operations in the context of the world they’re actually operating in. “We’re finally [developing the ability to see] what happens to shelter dogs not just based on the a priori assumptions that are so endemic to the rescue community but based on newly discovered data,” Dyché explained. “We are going to partner with every animal shelter in the U.S. and get their data and roll it up and finally have authoritative information about shelters and rescues and how crowded everybody is and what types of dogs make it out.”
The problem was that shelters didn’t know – and couldn’t imagine – that much of what they “knew” wasn’t actually true. Even though shelter folklore didn’t have any basis in everyday reality, it was nonetheless “true” for people in the rescue community … until it wasn’t.
Folklore in this sense isn’t a new idea. In the book for which she’s probably best known, Jane Jacobs decried the role that folklore played in the city planning orthodoxy of the first half of the 20th Century. “These orthodox ideas are part of our folklore,” she writes. “They harm us because we take them for granted.” Jacobs’ critique of folklore in the context of city planning is trenchant, to the point, and – in a broad sense – applicable to virtually all dimensions of human planning and doing: “[I]t is futile to plan a city’s appearance, or speculate on how to endow it with a pleasing appearance of order, without knowing what sort of innate, functioning order it has. To seek for the look of things as a primary purpose or as the main drama is apt to make nothing but trouble.”
Dysfunctional Data Modeling
This tendency to believe in folklore is by no means confined to non-profits. It’s endemic in and across all sectors and verticals. It affects companies of all sizes. It’s expressed not only in how a company understands itself, its activity, its mission, etc., but in how it represents itself to itself, too.
With this in mind, think about how folklore (or something like it) operates in a practice such as data modeling, where a top-down emphasis on formal order, consistency, and what political scientist and anthropologist James C. Scott has called “legibility” tends to predominate. Think of data modeling as the process by which a business becomes lucid, legible, to itself. The entities, rules, relationships, etc. instantiated in a conceptual model are representations of the business as it sees and understands itself – representations that, moreover, the business uses to manage and control itself. The problem with legibility is twofold, however. In the first place, the business might not adequately understand or apprehend itself, with the result that the models it produces incorporate – how best to put this? – mistakes. In the second place, the process of modeling the business has the potential to (and usually does) change something about the business, not just as a result of the model- or data-driven management of business activities, but even prior to that. As with the observer effect in physics, the process of making something legible changes that thing.
In still another way, data modeling has its own folklore – namely, in the form of orthodoxy and shibboleths. It can begin innocuously enough – e.g., during the conceptual modeling phase – with the way the organization wants to understand itself. It persists in the logical data modeling phase when the organization instantiates this understanding in core data structures and analytics systems. Almost every data modeler can cite at least one experience in which the business has fundamentally misunderstood (and/or misrepresented) its structure or operations.
At the conceptual level, people will disagree about the significance of key concepts and terms, as well as the relationships that obtain between and among them. That’s okay. The premise of conceptual modeling is to get this stuff out in the open. The basic bare fact of disagreement, while jarring or discomfiting, has a positive function, too, because it gives stakeholders a way to address misunderstandings or inconsistencies and to forge a consensus view of the business and its operations. That’s the ideal. The reality tends to be much messier, however.
A logical model translates the conceptual model into facts, dimensions, attributes, and hierarchies, as well as into the business rules that determine the relationships between facts and dimensions. This is where the trouble begins. In many cases, a logical model is designed before modelers have had a chance to explore, discover, and profile the underlying data. It is, then, a projection of how the business understands itself: an ideal representation. In reality, there is no necessary correspondence between business facts and relationships as they’re represented in a conceptual model and the business facts and relationships that are derived from data.
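To make the gap between an ideal representation and the data a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption of mine, made up for illustration: the rule ("every order belongs to exactly one customer"), the `order_rows` extract, and the function name `find_cardinality_violations`. It simply compares a relationship as the model declares it with the relationship the rows actually exhibit.

```python
from collections import defaultdict

# Hypothetical extract of (order_id, customer_id) pairs from an operational system.
# The imagined conceptual model says: every order belongs to exactly one customer.
order_rows = [
    ("O-1", "C-10"),
    ("O-2", "C-10"),
    ("O-3", "C-11"),
    ("O-3", "C-12"),  # the same order linked to a second customer
]


def find_cardinality_violations(rows, max_allowed=1):
    """Return the left-hand keys linked to more distinct right-hand keys than the model allows."""
    linked = defaultdict(set)
    for left_key, right_key in rows:
        linked[left_key].add(right_key)
    return {
        left_key: sorted(right_keys)
        for left_key, right_keys in linked.items()
        if len(right_keys) > max_allowed
    }


violations = find_cardinality_violations(order_rows)
for order_id, customers in violations.items():
    print(f"{order_id} is linked to {len(customers)} customers: {customers}")
```

An empty result would mean that this particular extract agrees with the declared rule. It would not show that the rule holds everywhere, only that this data doesn’t contradict it.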
According to a data architect with a prominent business intelligence (BI) and enterprise data management consultancy, it isn’t uncommon for business people to cling to the validity of cherished ideal representations even when they’re confronted with empirical evidence to the contrary. “It is literally the case that you will show [the business customer] something and they say, ‘No, that cannot be. I see what you’re showing me, but it cannot be.’
“We just had a meeting where we had conversations about the future design [of their data architecture]. We showed them a model of their business as we reconstructed it [by profiling data and processes]. They said, ‘That’s not the way it is.’ I said, ‘Yes, it is,’” this person says.
This data architect cites a litany of other common types of orthodoxy, too, such as an insistence on the completeness and/or consistency of data extracts from operational systems – or (at a higher level) a reification of the canonical data structures that are supposed to represent key terms and concepts. Even when people are presented with evidence that the underlying data doesn’t actually correspond with design orthodoxy or business folklore, they’re still loath to change their preconceptions. “I’ve had people assure me, ‘We never have a financial transaction that doesn’t have a vendor number.’ Then you go to the actual data and you find numerous examples of this [phenomenon],” this person says. Their recommendation is to eschew top-down models or formalizations of how the business should be and to begin with the data itself – by exploring, profiling, and analyzing it. By, in effect, reverse-engineering it from the bottom up.
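What might “beginning with the data itself” look like in practice? The following is a minimal sketch, not the architect’s actual method. The sample `transactions`, the field name `vendor_number`, and the helpers `is_missing` and `profile_missing` are all hypothetical. The idea is to test the assurance (“we never have a financial transaction that doesn’t have a vendor number”) against the rows before building it into a model.

```python
# Hypothetical sample of financial transactions; a real profile would read from the source system.
transactions = [
    {"txn_id": 1, "amount": 120.00, "vendor_number": "V-001"},
    {"txn_id": 2, "amount": 75.50, "vendor_number": None},
    {"txn_id": 3, "amount": 19.99, "vendor_number": "  "},
    {"txn_id": 4, "amount": 300.00, "vendor_number": "V-014"},
]


def is_missing(value):
    """Treat None and empty or whitespace-only values as missing."""
    return value is None or str(value).strip() == ""


def profile_missing(rows, field):
    """Return the rows that lack `field`, plus the share of all rows they represent."""
    missing_rows = [row for row in rows if is_missing(row.get(field))]
    rate = len(missing_rows) / len(rows) if rows else 0.0
    return missing_rows, rate


missing_rows, missing_rate = profile_missing(transactions, "vendor_number")
print(f"{missing_rate:.0%} of transactions have no vendor number")
print("Offending transaction IDs:", [row["txn_id"] for row in missing_rows])
```

Returning the offending rows along with the rate is a deliberate choice: a percentage is easy to wave away, whereas specific transactions are harder to argue with.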
If the business doesn’t take the time to do this at the beginning, when it’s defining its core data structures and analytics systems, it must do so after the fact – i.e., when data modelers and analysts are tasked with “fixing” the mismatch between ideal representation and reality.
This is eerily similar to the problem that Jacobs identified with orthodox city planning.
As a counter to top-down orthodoxy, she championed a similar Rx: the bottom-up exploration and discovery of cities (in our analogy, organizations) as they actually are: how they happen, how they live, how (to put it philosophically) they have their being. “The way to get at what goes on in the seemingly mysterious and perverse behavior of cities is, I think, to look closely, and with as little previous expectation as is possible, at the most ordinary scenes and events, and attempt to see what they mean and whether any threads of principle emerge among them,” Jacobs writes.
(Trigger Alert: This is a wee bit philosophical!)
Human beings have difficulty with evidence-based analysis: when we’re confronted with information that contravenes or contradicts our beliefs, we’re apt to dismiss it. This is true of our average-everyday beliefs – i.e., things we think we know about the world – and it’s especially true of beliefs that are, in some sense, core to our identities. It’s as if our capacity to critically analyze and interpret evidence did not evolve in the same way, or to the same degree, as our capacity to identify and distinguish discrete things (sky), abstract concepts (color) and events (the reddening of the sky) in the physical world – or to synthesize information by imputing relationships between concepts (e.g., if the sky is red at dusk, tomorrow has a good chance of being a clear day).
This isn’t actually surprising. As Ian Hacking, Lorraine Daston, Barbara Shapiro, and others have noted, our modern concept of “evidence” is, at most, a mere five centuries old. And as Mary Poovey has argued, the “fact,” in its modern sense, is of a similarly recent vintage. We have trouble reconciling belief with fact – with evidence – because we’ve only been dealing with facts and evidence for a few hundred years. We’re still learning – and we’ve a long way still to go.
One lesson we’ve learned is that we tend to interpret our beliefs, needs, and preferences, in addition to our prior experience, into our understanding of reality. This is as true in business as it is in everyday life. Donald Farmer, a principal with information management consultancy TreeHive Strategies, calls this quasi-a priori understanding of a business and its world an “ideology.”
All businesses, and all business people, have ideologies, says Farmer. Think of ideology as a kind of automatic perspective – an organizing framework that’s compounded of knowledge, intuition, and, yes, feeling – that helps us quickly process and make sense of experience.
It is a way of seeing – and knowing – that is prior to sense-perception and cognition: a seeing that structures or frames things in a certain way; an Ur perspective that is prior to perspective.
“You have an ideology of business that influences how you understand yourself. It is an organizing framework,” Farmer explained to attendees at the Pacific Northwest BI and Analytics Summit.
Ideologies are at once explicit – e.g., a start-up company might understand and interpret its world on the basis of The Innovator’s Dilemma, among other influences – and implicit. For most of us, ideology is, on balance, a net positive: it makes the world at once intelligible and navigable; it gives us a ready-made framework for interpreting – for acting in and responding to – the stuff of experience. In this way, ideology has the power to determine what is normal or abnormal, appropriate or inappropriate, possible or impossible – what is reasonable, as distinct from unreasonable – in or for our world. Implicit in this is something else, too: the role ideology plays in determining what is thinkable or unthinkable in our world. This last has to do with what I call its “involuntary” dimension, or the way in which an ideology imposes limits on our thinking. For example, reading philosophy in college introduced me to the radical difference between the Greek and the Roman experiences of what we today call “nature.” The Latin word natura is, nominally, a translation of the Greek word phusis (or physis); the Roman understanding of natura is nothing like the Greek experience of phusis, however. Without getting into particulars, it’s the difference between an understanding that’s grounded in a static representation and reification of the natural world (natura) and a sense of or for the dynamism and instability of the world (phusis). The Roman ideology simply could not think the phusis of the Greek experience.
The upshot is that we cannot think certain kinds of thoughts when we’re “inside” an ideology. Nor can we think ourselves “into” another ideology. To understand why this is the case, think of an ideology (in the particular) as analogous to a predictive model. Not only are both quasi-a priori representations of the world, but an ideology, like a predictive model, is constrained by the parameters that determine the makeup of this world. The usefulness of an ideology, like that of a predictive model, tends to diminish over time as real-world conditions diverge from the ways in which they’re instantiated in the model and the world that the model represents. So, for example, a retailer that blithely assumes that people will always need to come to a physical location to shop for goods could be blindsided by the effects of social, economic, and technological change. Ditto for a terror-prevention strategy that cannot conceive of a 9/11-type scenario.
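The retailer example can be sketched in a few lines of Python. The figures are invented for illustration, and `history`, `later`, `assumed_share`, and the 0.05 tolerance are all my assumptions. The sketch shows a frozen assumption (here, the average in-store share of purchases over some earlier period) drifting further from what is observed as conditions change.

```python
from statistics import mean

# Invented yearly share of purchases made in a physical store.
history = [0.95, 0.94, 0.95, 0.96]  # the period the assumption was formed in
later = [0.93, 0.88, 0.80, 0.70]    # what was observed afterwards

# The "model": a single frozen belief about how customers shop.
assumed_share = mean(history)


def absolute_errors(assumption, observations):
    """Distance between the frozen assumption and each later observation."""
    return [abs(observed - assumption) for observed in observations]


def first_breach(errors, tolerance):
    """Index of the first observation whose error exceeds the tolerance, or None."""
    for index, error in enumerate(errors):
        if error > tolerance:
            return index
    return None


errors = absolute_errors(assumed_share, later)
breach = first_breach(errors, tolerance=0.05)
print("Errors by year:", [round(error, 2) for error in errors])
print("First year outside tolerance (0-based index):", breach)
```

Nothing in the assumption itself signals that it has gone stale. The divergence shows up only when someone bothers to compare it against new observations, which is the point of the analogy.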
Ideology as Farmer describes it isn’t quite the same thing as bias. Biases are incorporated into (and operate in) ideology, of course. But ideology is more the coherence of all reflective and pre-reflective modes of understanding, encompassing everything from conscious values, beliefs, and knowledges to non-conscious influences such as biases, folklore, and the assumptions that are implicit in systems of orthodoxy/orthopraxy. Ideology operates pre-reflectively in that it always already determines both the content and the character of the world in which we live.
One challenge posed by ideology is that of reconciling existing knowledge or belief with new experience – new “evidence,” to put it in other terms. Instead of framing the present and/or predicting the future, it’s a problem of knowing when what-is has become what-was. Ironically, human beings tend to have as much trouble with this as do predictive models.
“The representations [an ideology] creates are based on a presumption of what the business should be. Because this is a presumption, a belief, it’s based on what is already known. It can’t represent, it can’t anticipate, phenomena that doesn’t yet exist,” said Farmer.
Farmer, too, could easily have been channeling Jacobs. Elsewhere in The Death and Life of Great American Cities, she writes of the power of folklore, orthodoxy, and other types of epistemic preconceptions to obscure – to hide, to elide, to oversimplify – the incredible complexity of city life. Not just the “life” of the city itself, but the lives of the people who inhabit it. Jacobs found that the requirements, priorities, preferences, and desires of a city’s inhabitants not only did not have to but (in certain respects) clearly did not align with the requirements, priorities, preferences, and desires of orthodox design ideology. In many if not most cases, she found, the ostensible “disorder” of city life masked a highly adapted organic order: “There is a quality even meaner than outright ugliness or disorder, and this meaner quality is the dishonest mask of pretended order, achieved by ignoring or suppressing the real order that is struggling to exist and be served.”
This has salience not just for what we do with analytics but for how we build analytics. At a basic level, an “analytic” is always designed: i.e., it’s the product of the critical, interpretive, and speculative work of one or more human beings. “Analytics” is actually an umbrella term for a multi-step interdisciplinary process that involves (1) the analysis and decomposition of a problem; (2) the identification of that problem’s core or constitutive elements; (3) the enumeration of these elements, along with an interpretation of how they relate to one another and to the problem itself; and (4) the instantiation of this interpretation, usually in software and usually in the form of a model. There’s a sense, then, in which the task of using analytics to drive business transformation can be reduced to the problem of modeling (an) experience and representing (a) reality.
In other words, the better our ability to apprehend and model experience, the better – the richer, the more true-to-life – the reality we’re able to represent in the analytics we produce.
Postscript and Conclusions, of a Sort
This gets at something else, too. Analytics aren’t magical. They aren’t miraculous, infallible, or oracular. And because analytical technologies are created and maintained by human beings, they aren’t in any real sense “unbiased” or “objective,” either. Biases and other kinds of epistemic preconceptions are baked, as it were, into the analytics we produce. Always have been, always will be. (This recent paper provides an absolutely fascinating demonstration of this.) And the challenge of ferreting these out isn’t akin to a game of epistemological Whac-a-mole, such that we have only to identify a (finite) set of epistemic preconceptions, correlate them with (e.g.) the epistemic structures or human drives that are their causal antecedents, and, voila, we’ve modeled and represented our reality. We can all go home now. Our work here is done. Komplett. Fin.
As Farmer’s work with ideology shows, perspective itself comprises a kind of Ur bias that we aren’t even aware of and which we can’t control for. In the 1950s, philosopher of science Norwood Russell Hanson introduced the concept of “theory-ladenness” to describe this problem: to paraphrase Hanson, the practice of scientific observation is itself inescapably “theory-laden,” with the result that what he (later) called a “thematic framework” is always already operating in the very experience of seeing or observing. (Hanson, like his contemporaries, was concerned with the problem of obtaining objective access to observational phenomena, although he stopped somewhat short of questioning the assumption that the human mind is congruent with or sufficient to the apprehension of “objective” reality. Friedrich Nietzsche, on the other hand, did just this – 70 years before Hanson.) We can’t escape perspectival framing. Or, rather, we can – but only to the degree that we take up residence in another, (hopefully) richer perspectival frame.
I’ll conclude where I began. Thanks to technologies such as predictive analytics and machine learning (and, especially, their application in advanced methods such as deep learning and AI), more and more organizations are discovering that they don’t really know themselves. I’ve described a few real-life cases in which an organization discovered that what it thought it knew about itself and its business world wasn’t actually true. I’ve also described a couple of real-world cases in which stakeholders in an organization refused to confront their use of analytics. Presented with the evidence of analytics, they refused to recognize themselves.
What are we to make of this?
I keep going back to Sarah Polley and Stories We Tell. It’s as if we (like Polley) set out to tell a certain kind of story with a certain foreseeable outcome only to discover that we not only didn’t know and couldn’t have anticipated the whole story, but that the whole story, once known, would fundamentally change who and what we are: how we understand ourselves. That the whole story, so to speak, would give us an entirely new (richer) perspective. And that, however rich it might be, maybe this isn’t a perspective we necessarily want to have. That didn’t deter Polley, however.
Will it deter us? Should it deter us? This is the question of our time as I see it, and it shouldn’t surprise you that I have some thoughts on this. But I’d welcome your feedback, too.
So what think you?
The debate over the validity or invalidity of human-generated climate change is a more accessible example of what I’m talking about: a person who understands herself as a movement conservative simply cannot accept that climate change is real and/or that human activity is its cause. Similarly, a person who understands herself as an ecosocialist cannot accept that this isn’t the case. Rejection or acceptance of climate change (and its link to human activity) is, then, a thematic element of both movement conservatism and ecosocialism. But the practical effect of this thematic element is to make certain possibilities of thinking inaccessible. This inaccessibility arises out of the (voluntary and involuntary) implications of that which, ideologically, is known without question to be true.
“Pre-reflective” is a philosophical term that means prior to reflection or to conscious, intentional thought.
Our biases aren’t just the products of physiological and/or evolutionary processes. Thought itself produces biases in the form (again) of what is thinkable, imaginable, or possible – as against its opposite.
James Scott has a great discussion of this in relation to the planned city of Brasília, capital of Brazil. See Scott, James C. (1998) Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. New Haven: Yale University Press. “Virtually all the needs of Brasília’s future residents were reflected in the plan. It is just that these needs were the same abstract, schematic needs that produced the formulas for Le Corbusier’s plans. Although it was surely a rational, healthy, rather egalitarian, state-created city, its plans made not the slightest concession to the desires, history, and practices of its residents. In some important respects, Brasília is to São Paulo or Rio as scientific forestry is to the unplanned forest. Both plans are … planned simplifications devised to create an efficient order that can be monitored and directed from above. Both plans, as we shall see, miscarry in comparable respects. Finally, both plans change the city and the woods to conform to the simple grid of the planner.” (p. 125)
See, e.g., Hanson, Norwood Russell (1958) Patterns of Discovery: An Enquiry into the Conceptual Foundations of Science. New York: Cambridge University Press. “There is a sense, then, in which seeing is a ‘theory-laden’ undertaking. Observation of x is shaped by prior knowledge of x.” (p. 19)
See, especially, On the Genealogy of Morals (1887), Third Essay, Section 12, where Nietzsche skewers the rationalist-Kantian position (characterizing it as a will to the “intelligible character of all things”) and gives a quasi-perspectivist account of experience and knowing: ‘There is only a perspective seeing, only a perspective “knowing”; and the more affects we allow to speak about one thing, the more eyes, different eyes, we can use to observe one thing, the more complete will our “concept” of this thing, our “objectivity,” be.’ [This is excerpted from Walter Kaufmann’s 1967 edition. There are newer translations, but Kaufmann’s is still my favorite.] In Nietzsche’s account, “objectivity” doesn’t correspond to or with anything real, so to speak. Our sense of or for what is objective is more or less nuanced, more or less rich, more or less subtle depending on the contributions of the many, varied, and sometimes contradictory perspectives that determine its shape, character, and content. These perspectives are themselves always already constituted by many, varied, and sometimes contradictory “lineages” (theory- and practice-laden inheritances), too. Nietzsche is less a perspectivist than a philosophical genealogist.