When complex multi-dimensional data creates users that cannot exist

It is all about averages. 


If you have a one-dimensional data set, say height, and the data set is large, it is probable that at least one user will sit at the average of the entire set.

In a two-dimensional data set, height and weight, it is still probable that one user will have both characteristics at the average of the entire set.

In a three-dimensional data set, inside leg measurement, height and weight, it is tending toward impossible that any one user will have all three characteristics at the average of the entire set.

More data makes an average user more probable, but every extra characteristic makes an average user less probable, whether the average is taken as the mean, the mode or the median.
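A rough way to see the two forces pulling against each other is a back-of-envelope calculation. The sketch below is mine, not a measurement: it assumes each characteristic is independent of the others (height and weight clearly are not, so treat the numbers as illustrative only) and that "average" means falling in a middle band holding 30% of users. The band width, the function names and the user count are all assumptions chosen for illustration.

```python
import math


def prob_average_on_all(band: float, dims: int) -> float:
    """Chance that one user is in the 'average' band on every characteristic.

    Assumes the characteristics are independent, which real data rarely is.
    """
    return band ** dims


def prob_at_least_one(band: float, dims: int, users: int) -> float:
    """Chance that at least one of `users` people is average on all `dims`."""
    p = prob_average_on_all(band, dims)
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)^users, written with log1p/expm1 so a tiny p is not rounded away.
    return -math.expm1(users * math.log1p(-p))


if __name__ == "__main__":
    band = 0.3         # assumed share of users counted as "average" on one characteristic
    users = 1_000_000  # assumed size of the data set
    for dims in (1, 2, 3, 10, 80):
        chance = prob_at_least_one(band, dims, users)
        print(f"{dims:>2} characteristics: {chance:.3g}")
```

Raising `users` pushes the chance up, and raising `dims` pushes it down far faster, which is the tension described above.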


Facial recognition uses about 80 nodal points. It is all but impossible that a single data subject in a large data set will be average on all of them.
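To put a scale on "all but impossible", you can turn the question around and ask how large the data set would need to be before an all-round average subject becomes likely. This is a sketch under my own assumptions, not a model of any real facial recognition system: the 80 points are treated as independent, "average" on one point means falling in a middle band holding 30% of subjects, and the function name is made up for the example.

```python
import math


def users_needed(band: float, dims: int, target: float = 0.5) -> float:
    """Data set size at which the chance of at least one subject being
    'average' on every one of `dims` independent points reaches `target`.

    Solves 1 - (1 - p)^n = target for n, where p = band ** dims.
    """
    p = band ** dims
    if p >= 1.0:
        return 1.0
    return math.log1p(-target) / math.log1p(-p)


if __name__ == "__main__":
    band = 0.3  # assumed share of subjects counted as "average" on a single point
    for dims in (1, 3, 10, 80):
        print(f"{dims:>2} points: about {users_needed(band, dims):.3g} subjects")
```

Under these assumptions the answer for 80 points is a number with dozens of digits, far beyond the roughly 8 billion people alive, so no data set we could ever collect would be expected to contain that subject.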

The point is that we think more data will create a better understanding of our users. This is unlikely to be the case.

What we need to determine are the boundary conditions within which the data we have access to enables better decisions. To do that, we need to determine the sum of all the biases in the data set and the sum of all the possible errors in it, and then identify the known and unknown gaps in the data. Whilst this is qualitative and not quantitative, it will at least help us start to frame the idea that more data does not get us better insights about all users.