What is the difference between voice, data and Google Buzz?

This post is more about a question that I am struggling with than insight; the rhetorical question is “can you hear me thinking?”. Ignoring the obvious differences and all the TCP/IP arguments (yes, voice can be VoIP and data such as clicks is carried in IP packets), I am interested in why we respond to voice recording (the recording and interpretation of what you said) differently from data gathering (location, attention, clicks, content creation, etc.).

It seems that we generally accept that our digital footprints will be recorded (collected and stored), that this data will be analysed, and that value will be created from new service discovery or improvements to existing services. We give up the rights to our click data, our blog posts and our Facebook entries in exchange for free services (in general).

There appear to be (broadly) three types of data that can be gathered or harvested from your conversations (voice):

·         “Meaning”: what is the meaning of the words spoken? This covers hearing and recording the whole conversation, or using human-mediated TTS or STT, as in the case of SpinVox or Relay Services. The issue here is that humans, not machines, interpret the meaning; they can not only understand the information but could also act on it. Often we are forced to cede our rights if we want access to services, e.g. financial services call centres recording calls for ‘training purposes’. As always with identity data, if the value exchange benefits the customer enough, he or she will cede their privacy. SpinVox is therefore quite odd in reality: I pay for a service, and that service allows me to record your voice and get your message. I am asking you to give up your rights so that I get a more convenient service.

·         “Identification”: using your voice to open up locked services. This is understood in the same privacy framework as other biometrics and is accepted by users. After all, they keep control of the key data in this instance, so the privacy parameters are different.

·         Intent, phrases, volume, speed, mood and redundant information. These can be interpreted by machines and can be de-personalised to extract value. For example, you might use a combination of these to interpret mood, and add further contextual information such as location, last action and social context, to personalise services or offer next best actions.


When we consider something listening to our conversations in order to provide a value-added service, which is the same argument as for digital footprint data, we go off the deep end about privacy. Examples are Voice to Text or Text Relay Services. “Although there has been controversy about the company’s use of humans to train and support the technology, it is clear that the level of privacy infringement experienced by SpinVox customers is nothing compared to that which deaf and hard of hearing customers experience every time they place a call.”


Why is this? Is it that the voice is ours and can be uniquely identified, that it could be replayed out of context, that we say things for effect that we don’t mean, that it is more personal, or is it simply that this is what we are used to?

Voice is certainly personal. I think that, from a user perspective, it goes back to voice being a human interaction, a conversation, a relationship; hence our expectation is that you will not be able to recall word for word what was said, because voice/audio is non-persistent in the human mind. For this reason phone bugging, recording, or listening and then using what was said out of context is a very clear risk, but this is also not where the value is for most companies. When we type we try to be explicit; social media is changing that, as it becomes more like voice: it is reactive or not really meant, and could therefore easily lose context later in time.

Do we believe that something listening to our conversations is more easily misinterpreted than the binary click? The question is whether equal value could be created from looking at our email and from listening to our voice calls and, if so, why we hate with such passion the idea of listening or snooping.

Most would agree that more personal, human information can be gleaned from voice. There is a discipline of Voice Analytics, but it is mostly focused on call centres and so is out of scope here. Content mining is also scary for customers if we know something they don’t know, and they know we know it. There are the usual privacy arguments around this, but awareness appears fairly low in the (mass) market. Is anyone really doing this and applying the patents in this area, e.g. for targeting advertising based on mood?

This appears to be a very complex area, as an individual’s perception of privacy changes depending on social context and the nature of the transaction or information conveyed, and there is a need to retain a sense of privacy. If there are clear user consent and controls, all will be well, but are the potential abuses of voice more significant?

So what is the link to Google Buzz, and is it important? More thinking required.


Jean-Paul Sartre:  “…as speech is the expression of thought.  Speech is thought made outward in sound.  Can you hear thinking?  Perhaps speech is the nearest possibility.”  Book Link