Data is Data. It is not Oil or Gold or Labour or anything else!
This is also published on LinkedIn and Medium.
Words, in general, are a creative symbolic linguistic invention through which people invoke concepts and meanings that are flexible enough to enable us Homo sapiens to shortcut detailed explanations. A dog = mammal, furry, four legs, barks, teeth etc. However, because words are a shortcut, they often lack the context and relationship that add “meaning”. Words are “data” which requires the addition of meaning derived from context to “inform” the listener - to become “inform-ation.”
Love, for example, can mean, or be interpreted to mean, many things depending on context and relationship. The 2019 update to the New Oxford Dictionary brings in the words agender and intersexual to help define better and enable more nuanced conversations about sexuality and gender identity. Society has words that lack specific context, and better words help avoid conflict and confrontation.
Words allow us to explore and debate wider and deeper concepts, but their misunderstanding leads to fights, war, turmoil and anger, or to innovation, problem solving and creativity. Sometimes we don’t have a word for something and therefore need to spend a lot of time using metaphors and adding context and relationships. The rhetorical question is, say, how did we describe competition before the word competition existed? In 1996 Nicholas Negroponte wrote a book called “Being Digital”. In the book he spends an entire chapter explaining Broadband, another explaining Social Media, another explaining what e-commerce is. It took a lot of time for everyone.
Humans process words depending upon a rich tapestry of context, relationship, mood, how it was said, when it was said and by whom. We interpret combinations of words with our own bias so they make sense to us individually. This abstract view of sense making comes from and is baked into our experience and the order and weight we give to such experience. Shakespeare's plays are all written using the same 26 letters of the alphabet. Knowing the symbols and even the words does not allow easy access to meaning, and at best reading is a starting point in trying to determine the message.
Economics, biology, physics, psychology and maths have all created their own language and words to explain the order of things. This has allowed us to explain better and create value, wealth and prosperity. However, in economics for example, the order given by the words we have is based on the general concepts of scarcity and abundance, with the equations of supply and demand. Therefore the words have certain limitations and assumptions, which also means that these established words may not work well to describe new models, theories or markets. The words and descriptions break down.
The reason for this long introduction and context is that we lack words to describe the new activities, models and functions in a data-driven digital world. Our current wordset may constrain us and slow us down because of some of the ambiguities inherent in the words we currently use. For example, consider the word identity as in ‘name’ vs identity as in ‘provider’ and identity as used to mean ‘access’ - context and relationship matter. Or consider the internet descriptor language we have. We say there are sites, domains and locations that we visit and browse, thereby framing the Internet as real estate, which is something we relate to. When we speak of pages that we author, publish and syndicate, we are framing the Web as a publishing system. When we speak of content comprised of packets that we move, upload, download and store with addresses, we're framing the underlying infrastructure as freight forwarding between storage facilities. Analogies such as these inevitably have their limitations.
The word data is a particular problem, as it is a word that we want to constrain by context and relationships, but ‘data’ does not conform to the same boundaries, field, domain, graph, market or constraints. As much as we would like to explain data and its functions with a metaphor or analogy – it is unique. Data is closer to the discovery of a new core element for the periodic table with new properties, a new energy concept for quantum that allows us to understand something we could not explain, a new model for dark matter.
Every model we use to explain data fails. Data is not oil; we don’t mine or refine it. Data is not gold; there is more data than there are atoms in the universe. Data is not labour; it does not pass with time. Generically, data is not a commodity. Commodities, at least of the sort that get bought and sold in stores and in commodities markets, are both rivalrous and excludable by nature. Data is, by its nature, non-rivalrous and non-excludable. The simple fact is that you cannot declare ownership of data (though many people try), you cannot control it, and you lose nothing when you copy it. That is why data is data.
By design all metaphors are wrong. For example, time is not money, but we use money to frame our understanding of time. That's why we save it, spend it, waste it, invest it and put it aside. Likewise life is not travel, yet birth is arrival, death is departure, choices are crossroads, we get stuck in a rut, lost in the woods, get back on track, and so on.
Those metaphorical frames make full sense to us as humans because our experiences of time and life are very much ones of valuable commodities (time as money) and movement (life as travel). As explored in “The Mind is Flat” by Nick Chater, our brains are built to create and make sense: we need the metaphor to make the jump to making sense, but then we need words to move us forward.
But the words and metaphors we use for the Internet, Web and Data insult all three, and that's a problem. Our digital world is too radically new and different to be fully conceptualised, understood, explained and honored by the metaphors we apply to it, which are limited by words that have the wrong meaning. Therefore it is time to build a new wordset!
We talk about data as a commodity, just as we talk about time as one. But while our experience of time is of a finite non-thing, our experience of data is something like the Sorcerer's Apprentice's experience of magic: it gets way out of control.
Joyce Searls points out that our experience in the Web is one of “no gravity” (because the Internet isn't a place, and we are incorporeal chimeras to each other there: damn fine ghosts or holograms, but not physically real) and of no distance. She also thinks we'll adapt to those conditions, but it's still too early to generalize with full confidence from our experience so far.
Creating value from data requires an entirely new wordset. Just as the concept that “data is oil” breaks down, so do data storage, data consent and data analysis, to take a few example functions: when put into context and relationship, they all fall apart. As an example, data storage is not the same as it was when we had an economic model for the storage of documents in 1980. In 2018 digital data storage has a relationship and context to security, access, rights, liability, control, sharing, conflicting national compliance laws and privacy changes. However, we continue to use old economic framing, thinking and words to describe these new data functions, and those descriptions then fail.
There is a wider point with data storage: is storing data useful or useless? Is data as useful or as useless as a bad memory that stops a person from falling in love, taking a promotion or starting their own business? These questions add depth and show how our words fall apart all too quickly.
As Doc Searls puts it, “We are now digital as well as physical beings, and our habitat as digital beings is very new, strange and has no history so we are forming new human experiences, even though we live in a digital world almost as much as we live in the natural world.”
From the 10th to the 21st century our thinking built a word set that was based on an economic model that existed in the here and now: physical, limited by space and time. The relationships and formulae could be discovered, explained and modelled. In our new data world these word sets that described the constrained physical world are holding us back, as we are having to explain more and more context and relationship. The objective value of a word is to create a shortcut, and using the wrong words that have the wrong meaning means we spend more time explaining than creating. Our new data world needs new words to describe the new functions, as the world of data is not constrained by the vocabulary we have developed to understand the relationship between time and space (we will probably discover that vocabulary to be limited too!). Our new world is messy, interwoven, interconnected, interdependent and causal, driven by relationships, immediacy and feedback.
Our history has provided context and relevance, which have become baked into laws based on experience over a very long time. TRUST in a data world has new meanings, where new dependent relationships and contexts change the understanding. The MIT study that led to the “Privacy Paradox” is a good example of how the word PRIVACY is broken when we talk about data. Do people want Privacy to avoid exploitation and danger? To mitigate their sense of vulnerability?
Are people willing to trade Privacy for the control to manipulate their data and live out their life fantasies in a digital world? Are people so addicted to the control they have in shaping their digital lives that only a privacy breach, security breach or fraud brings them back to their analogue life? A person owns (maybe) their body, mind and thoughts, but do they own their data?
Let’s explore one idea that needs a word, as a starter: DATA OWNERSHIP. If we had a better word that describes the context and relationship, we could save pages of debate. Can you actually own data? It would be good if the answer was “yes”; however, the reality is “no”. You can, though, own the machine and software that store data, and different players do have different rights to the data.
In fact, the non-rivalrous nature of data plays havoc with modern notions of ownership. The Romans had a much subtler understanding of the nuances of ‘ownership,’ when they created separate legal rights and processes for ‘usus’, ‘fructus’ and ‘abusus’.
Usus (use) was the right to use or enjoy a thing directly, without altering it: for example, to walk on a piece of land or eat a fig off a fig tree. Fructus (fruit, in a figurative sense) was the right to derive profit from a thing possessed: for instance, by selling crops (but not the land on which they were produced), taxing for entry, etc. And abusus (literally abuse) was the right to alienate the thing possessed, either by consuming or destroying it or by transferring it to someone else (e.g. sale, exchange, gift). These notions of usus, fructus and abusus, when applied to territory, imply not ‘private property’ but the notion that different rights apply inside and outside clearly delineated boundaries.
When the 18th century constitutionalist William Blackstone observed that an Englishman’s home was his castle, he wasn’t talking about absolute rights of private property. Rather, he was talking about Englishmen defending a piece of territory where they were safe. Englishmen didn’t just have their castles, they also shared the fruits and benefits of commons, public rights of way and so on. Each of these different territories had different rules and rights associated with them. In contrast to this subtle ecosystem of rights and responsibilities, Blackstone characterised modern notions of private property as “the sole and despotic dominion, which one man claims and exercises over the external things of the world, in total exclusion of the right of any other individual in the universe.”
In a new data age we need to establish a new concept in digital and data that builds appropriate boundaries, each with their own rules, rights and responsibilities. Critically, individuals’ rights to ‘usus’, ‘fructus’ as well as ‘abusus’ in relation to their own data need to be clearly delineated. This is very different to current debates about ‘control’, virtually all of which relate to individuals trying to control what other parties do with their data rather than having the right and ability to use their own data for their own purposes.
As a footnote, DATA itself as a word is also a problem, and by extension so is the entire emerging world of Cryptocurrency. “Data” has many different definitions (a quick search gives well over 50 to play with) and individual “labels” and “biases”. However, we can be specifically clear about what we are talking about with data types, as long as a qualifier is included: flat, big, meta, real-time, old, static, new, current, statistical, empirical, computer, binary, linked, etc. However, not all “data” is created equal, and as such data is contextual to where value may lie, which can be either good for humanity or good for the value of one of the players who are able to exploit it. As yet we have not been able to add context to types of data such as rights, ownership, provenance, trust, privacy, security, faithfulness, correctness.
Is there a similar problem elsewhere that provides a precedent?
Whilst risk, beauty or compassion are useful thought experiments, they lack the direct linkage to value creation which data provides. Data is Data!
From Wikipedia: Risk is the possibility of losing something of value. Values (such as physical health, social status, emotional well-being, or financial wealth) can be gained or lost when taking risk resulting from a given action or inaction, foreseen or unforeseen (planned or not planned). Risk can also be defined as the intentional interaction with uncertainty. Uncertainty is a potential, unpredictable, and uncontrollable outcome; risk is a consequence of action taken in spite of uncertainty.
It is possible that “Risk” as a conceptual framework has some similar properties which could help us. Risk cannot be owned or held physically (it can be accounted for), it cannot be controlled or touched, it changes continually, it has no value in and of itself, it cannot be weighed or measured in the real world, it can be passed, sold and assigned, but cannot be “copied”; we can only describe outcomes and assign a risk measure. Risk is totally subjective and we all make different judgments about the severity and probability of any and all risks. All human endeavour carries risk, but some can be defined as being much riskier than others depending on the lens.
So What?
Creating words will not happen; however, there are companies that are solving and delivering solutions in our new data world and that have ideas of privacy, consent, rights, ownership, sharing and storage as core functions. Hoover became a generic name for a function, as did Google for search, Text for messaging and many others. Should we (the digital community) start to adopt names of companies that have a pure single function delivering context and relationship in this new data world, to allow us to describe functions in a clear and crisp way?
Would such adoption get us to value, models, growth and fun a whole lot quicker; avoiding the words that prevent us from agreeing the same solution because we insist on using the same language with different words?
Extending this to AI - given that AI needs data.
As a further thought, does the lack of current wordset descriptors for data provide a rationale as to why AI will be slower to be adopted than the technology itself would allow? Is the timing such that we will spend too much time debating words that cannot describe the concepts, and therefore we cannot provide the assurance or governance?
/# please contribute #/
We need you to offer suggestions, ideas, wordsets and brands - these will get debated at forums and meetings. All input will be open and shared through our community and members at mydata.org, IIW, VRM, Kantara and other initiatives such as W3C, Open Intelligence, Open Knowledge forum, WEF and other committees where our members meet and collaborate.
Let’s start to find better words but it can only happen if you contribute and help.
Please share, make suggestions and help refine. As a best practice, please provide a mandate for any company, brand or word set you propose, so that there can be a public debate about its claims and about the constraints of the new function.
Someone may want to take on a role and survey the players in the space, asking what notions they are trying to convey in their platform or software that they are having trouble explaining. We could then come up with a list that everyone could work on reducing to the new language. For example, we could take our Consent Access Contracts and Consent Receipts and outline the individual items that make them up, then try to explain each as simply as possible and finally assign a word as a shortcut to it.
Antti Jogo Poikola