What is the purpose of data? V2

We continually move towards better data-led decisions; however, we can easily ask our datasets the wrong question. Without understanding the purpose of the data on which we are basing decisions and judgements, it is easy to get an answer that is not in the data. How can we understand if our direction, North Star or decision is a good one?  Why am I interested in this? I am focusing on how we improve governance and oversight in a data-led world.

I wrote a lengthy article on Data is Data. It was a kickback at the analogies that data is oil, gold, labour, sunlight - data is not. Data is unique; it has unique characteristics. That article concluded that the word “Data” is also part of the problem, and that we should think of data as if we were discovering a new element with unique characteristics.

Data is a word, and it is part of the problem.  Data doesn’t have meaning or shape, and data will not have meaning unless we can give it context. As Theodora Lau eloquently put it: if her kiddo gets 10 points in a test today (data as a state), the number 10 has no meaning unless we say she scored 10 points out of 10 in the test today (data is information). And even then, we still need to explain the type of test (data is knowledge) and what to do next or how to improve (data is insights).  Each of these is a “data” point, and we don’t differentiate the use of the word “data” in these contexts.
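To make the four uses of the word concrete, here is a minimal sketch of the test-score example. The class, the field names and the advice strings are my own illustrative assumptions, not anything from Theodora's example; the only point is that the same number means more at each step as context is added.

```python
from dataclasses import dataclass


@dataclass
class ScoreRecord:
    """Hypothetical record for the test-score example."""
    score: int        # state: the bare number
    max_score: int    # context that turns state into information
    test_type: str    # context that turns information into knowledge


def as_state(r: ScoreRecord) -> str:
    # The number on its own has no meaning.
    return str(r.score)


def as_information(r: ScoreRecord) -> str:
    # The number relative to what was possible.
    return f"{r.score} out of {r.max_score}"


def as_knowledge(r: ScoreRecord) -> str:
    # The result together with the type of test.
    return f"{as_information(r)} in a {r.test_type} test"


def as_insight(r: ScoreRecord) -> str:
    # What to do next (the wording here is an assumption for illustration).
    missed = r.max_score - r.score
    if missed == 0:
        return "nothing missed; move on to harder material"
    return f"{missed} points missed; revisit those questions"
```

All four functions return something we would casually call “data”, which is exactly the ambiguity described above.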

Data’s most fundamental representation is “state”, where it represents the particular condition something is in at a specific time.  I love Hugh’s @gapingvoid representation (below).  Information is knowing that there are different “states” (on/off). Knowledge is finding patterns and connections.  Insight is knowing comparatives to state. Wisdom is the journey.  We live in the hope that the data we have will have an impact.

For a while, the data community has rested on two key characteristics of data: non-rivalrous (which plays havoc with our current understanding of ownership) and non-fungible (which is true if you assume that data carries information).  Whilst these are both accurate observations, they are not that good as universal characteristics.

  • Non-rivalrous. Economists call an item that can only be used by one person at a time "rivalrous." Money and capital are rivalrous. Data is said to be non-rivalrous, as a single item of data can simultaneously fuel multiple algorithms, analytics, and applications.  This is, however, not strictly true: it is numerous perfect copies of “data” that are used simultaneously, which is possible because the marginal cost of reproduction and sharing is effectively zero.

  • Non-fungible. When you can substitute one item for another, they are said to be fungible.  One sovereign bill can be replaced by another sovereign bill of the same value; one barrel of oil is the same as the next.  So the thinking goes, data is non-fungible and cannot be substituted because it carries information.  However, if your view is that data carries state (the particular condition that something is in at a specific time), then data is fungible. Higher-level forms of processed data (information, knowledge, insights) are increasingly non-fungible.

Money as a framework to explore the purpose of data  

Sovereign currency (fiat), money in this setting, has two essential characteristics.  It is rivalrous and fungible.  Without these foundational characteristics, money cannot fulfil its original purpose (it has many others now): a trusted exchange medium.  Money removes the former necessity of direct barter, where equal value had to be established, and the two or more parties had to meet for an exchange.  What is interesting is that there are alternatives to fiat which exploit other properties.  Because of fraud, we have to have security features, and there is a race to build the most secure wall.


[Just as a side note - money is an abstraction and part of the rationale for a balance sheet was to try to connect the abstraction back to real things. Not sure that works any more]

Revising the matrix “what problem is to be solved?” 

Adding these other options of exchange onto the matrix, we have a different way to frame what problem each type of currency solves as an exchange mechanism. This is presented in the chart below.  Sand and beans can be used, but they provide a messy tool compared to a sovereign currency.  Crypto works, and it solves the problem, but without exchange to other currencies, it has fundamental limits.

If we now add digital data and other aspects of our world onto the matrix, we have a different perspective. We all share gravity, sunsets and broadcast TV/radio on electromagnetic waves.  However, only one atom can be used at a time, and that atom is not interchangeable (to get the same outcome).  The point is that digital data is not in the same quadrant as sovereign currency and electrons, which are a beautiful solution based on being fungible and rivalrous.

In the broadest definition of data, which is “state”, chemicals, atoms, gravity and electrons have state and are therefore also data.  To be clear, I will now use Digital Data to define our focus, and not all data.

These updates to the matrix highlight that, if data is non-rivalrous and non-fungible, it is very unclear what problem digital data is solving.  We see this all the time in the digital data market: because we cannot agree on what “data” is, it is messy.
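The placements described above can be written down as a small lookup. This is only a sketch: the names are my own, and I have included only the items whose position on both axes is stated in the text (gravity, sunsets and broadcast are shared, but I have not said where they sit on the fungible axis, so they are left out).

```python
from typing import NamedTuple


class Traits(NamedTuple):
    rivalrous: bool   # can only be used by one party at a time
    fungible: bool    # one item can be substituted for another


# Positions as described in the text; the dictionary itself is illustrative.
ITEMS = {
    "sovereign currency": Traits(rivalrous=True, fungible=True),
    "electrons": Traits(rivalrous=True, fungible=True),
    "atoms": Traits(rivalrous=True, fungible=False),
    "digital data": Traits(rivalrous=False, fungible=False),
}


def quadrant(traits: Traits) -> str:
    """Return a readable label for the quadrant an item falls into."""
    r = "rivalrous" if traits.rivalrous else "non-rivalrous"
    f = "fungible" if traits.fungible else "non-fungible"
    return f"{r} / {f}"


def same_quadrant(a: str, b: str) -> bool:
    """True when two named items share both characteristics."""
    return ITEMS[a] == ITEMS[b]
```

Asking `same_quadrant("digital data", "sovereign currency")` returns False, which is the whole argument in one line: on these two axes, digital data sits in the opposite corner from money.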

The question for us as a digital data community is: “what are the axes [characteristics] that mean digital data is in the top corner of a matrix?” This is where digital data is a beautiful solution to a defined problem, given that digital data is, at its core, “knowing state.”  I explored this question on a call with Scott David, and we ended up resting on “Rights and Attestation” as the two axes.

Rights in this context are that you have gained rights from the Parties.  What and how those rights were acquired is not the question; it is just that you have the rights you need to do what you need to do.

Attestation in this context is the evidence or proof of something.  It is that you know what you have is true and that you can prove the state exists. How you do this is not the point; it is just that you know it is provable.

As we saw with the money example, data will never have these (rights and attestation) characteristics exclusively; it is just that when it has them, data is most purposeful.  Without attestation, the data you have is compromised, and any conclusions you reach may not be true or real. We have to continually test both our assumptions and the provability of our digital data.   Rights are different, as rights are not correlated with data quality, but rights may help resolve ownership issues.  A business built without rights to the data it is using is not stable or sustainable.  How and if those Rights were obtained ethically are matters to be investigated.  Interestingly, these characteristics (rights and attestation) would readily fit into existing risk and audit frameworks.
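As a rough sketch of how these two characteristics could sit in a risk or audit check, consider the following. The class, the function names and the wording of the concerns are hypothetical; how rights were gained or how attestation is proven is deliberately left out, as above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DigitalData:
    """Hypothetical description of a data set on the two axes."""
    name: str
    has_rights: bool    # you hold the rights you need to do what you need to do
    is_attested: bool   # you can prove the state exists


def concerns(data: DigitalData) -> List[str]:
    """List the concerns raised in the text for a missing characteristic."""
    found = []
    if not data.is_attested:
        found.append("no attestation: conclusions may not be true or real")
    if not data.has_rights:
        found.append("no rights: use is not stable or sustainable")
    return found


def is_most_purposeful(data: DigitalData) -> bool:
    """Data is most purposeful when it has both rights and attestation."""
    return not concerns(data)
```

A data set with both flags set passes with no concerns; remove either one and the check returns the matching warning, which is the kind of yes/no evidence an audit framework already knows how to handle.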

I have a specific focus on ESG, sustainability, data for decision making, and better data for sharing.  Given that most comparative ESG data is from public reports (creative commons or free of rights), it is essential to note there is a break in the attestation.  ESG data right now is in the least useful data bucket for decision making, but we are making critical investment decisions based on analysis of this data set. It is something that we have to address.


In summary

If the purpose of data is “to share state” then the two essential characteristics data must have are rights and attestation.   Further, as data becomes information (knowing state), knowledge (patterns of states), insight (context in states) and wisdom - these characteristics of rights and attestation matter even more.  If you are making decisions on data without knowing whether it is true or whether you have the rights to it, you are in a dangerous place.

As an aside, there are lots of technologies and processes to know if the state is true (as in correct - not truth); if the state sensing is working and the level of accuracy; if the state at both ends has the same representation (provenance/lineage); if it is secure; if we can gain information; if we can combine data sets and what the ontology is.  But these are not fundamental characteristics; they are supportive and ensure we have a vibrant ecosystem of digital data.

I am sure there are other labels for such a matrix, and I am interested in your views, thoughts and comments.