Will sentiment analysis break through in 2012? Probably not

Current estimates are that there are over 500 tools that will listen, monitor, track or analyse your business, product, service, brand, PR, reach, influence or customer digital interactions and deliver a dashboard of “data analysis”. The question is no longer whether we should do it, or what to listen for, but how to read the analysis and decipher what customers are saying or trying to tell you, without going as far as assuming that customers know what they want. Hence the interest in “sentiment analysis”, which aims to give better output analysis, delivering better marketing, detection of opportunities and threats, protection of reputation and brand, and maintained or improved margin.

We know that analyzing natural language is difficult (even without accents); sarcasm and other forms of derisive language add further complexity, and we cannot assume that customer statements are true, as we know that context has a great bearing on meaning.

Having worked on video phones and natural speech algorithms in 1990/1991, I know these are tricky coding issues. One problem we faced then, the lack of a zero-cost method of collecting large quantities of data, has gone, but working out the algorithm and making the code efficient is in a different league. My view is that sentiment analysis tools will continue to evolve, but am I expecting a major breakthrough in 2012? No.

What is needed is beautiful data representation so we can look at the data and spot the trend.  Tools will get there but the jump is still too big for the level of trust and reliability we demand.

When Big Data says "Happy Christmas", what is the sentiment?

I always say "Happy Christmas"; however, this year, as I write my chosen Christmas messages, I am forced to consider what someone else's algorithm will infer about me, based on my use of digital words.

I want to explore in this ViewPoint, through the use of a "Happy Christmas" message, the level of TRUST already granted to something we cannot touch in a digital world.

Scene setting - Trust and Sentiment

Let's consider the word 'happy' and what it could imply.  If we take the word 'happy' out of its context in "Happy Christmas", we could wrongly infer from its current abundance of use that everyone is now happier.  That would not only be misleading but could lead to personalisation errors later. The same principle applies to the word 'merry': it would be wrong to assume that its current use means we have all drunk more. This simplistic view does demonstrate how such simple words can create complex sentiment analysis problems.
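To make the 'happy' problem concrete, here is a minimal sketch in Python. It is not how any particular tool works: the word list, the greeting list and both function names are assumptions made up for illustration. It only shows how a context-free word count reads a seasonal greeting as emotion, and how removing known greetings first changes the answer.

```python
import re

# Hypothetical mini-lexicon; a real tool would use a far larger one.
POSITIVE_WORDS = {"happy", "merry", "love", "great"}

# Seasonal greetings whose words carry convention rather than emotion (assumed list).
GREETINGS = ["happy christmas", "merry christmas", "happy new year"]


def naive_score(text: str) -> int:
    """Count positive words with no regard for context."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(1 for word in words if word in POSITIVE_WORDS)


def greeting_aware_score(text: str) -> int:
    """Remove known greetings first, then count whatever is left."""
    lowered = text.lower()
    for greeting in GREETINGS:
        lowered = lowered.replace(greeting, " ")
    return naive_score(lowered)


message = "Happy Christmas. The year has been hard and I miss you."
print(naive_score(message))           # 1: 'happy' is counted as a positive signal
print(greeting_aware_score(message))  # 0: the greeting no longer counts
```

Even the second version is still guessing: it knows nothing about who sent the message, to whom, or why, which is exactly the context problem.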

Just to stretch the thinking further, let's consider the ethics of the person who wrote the computer program (code) on the device you are using to view this or the algorithm behind your favourite search index.  Not only can we easily misunderstand the words you use and take them out of context, but applying the analysis to determine or suggest something about you can be flawed either because the algorithm is flawed or the person who writes the code may have a different outlook or culture.  Therefore, just imagine how much we need to TRUST someone who is trying to provide a SENTIMENT analysis based on what you have written without the context of human signals and other environmental data.

What does data really tell me?

The honest and truthful answer is not much, but I like to pretend that only the smallest snippet of data can tell me everything and that, with some insightful tools, code and algorithms, I can predict what you will think next.  If you think about your DNA as a tiny snippet of data, it can indicate many things about the physical you but it will never tell me what you are doing right now, who your friends are, what dreams you have or what you will eat tomorrow.  An important question is what I can really extract or infer from data or your digital footprint. From this we need to determine what crosses the creepy line, and whose culture and ethics we are working to.

What data can I collect from your Christmas time digital interactions?

I can collect your words (verbal and written), who you send messages to, who responds, what time you sent and responded, how often, the location, time to prepare messages, web sites visited, clicks, links, data volumes, who influences you, TV viewing, music listened to, which device .... in reality everything you do in a digital world I can gather/ harvest/ collect or be given. It is not easy but I can do it.

Given that this ViewPoint is exploring TRUST from the stance of data analysis and the ability to derive your sentiment or intent from your data, and knowing that gathering data is possible, where next? To be clear, when I use sentiment I am seeking to understand and present your emotion: what you really mean (meant) or what you wanted to imply, and what level of TRUST is assumed in my interpretation (how close am I?).

All of this only has value to you and me if I deliver a personal report after Christmas saying how many cards you sent and received, from whom, and the sentiment of what you said and of what was said to you.  Hence my interest in TRUST: do you think I got it right and, if so, do you believe what I am saying about others' sentiment towards you?

It now gets complex.

Let's assume you have presented on some social network a faith or religious preference.  Using this snippet of data (knowledge) and "Happy Christmas", what could I infer, and at what point does a digital interpretation of your data become creepy and dangerous? Here is a scenario....

An Orthodox Jewish friend of mine responds to my "Happy Christmas" message. Does the algorithm that analyses my data say that I am not sensitive to someone else's views, or that their wishing me "Happy Christmas" back is undermining their belief? What happens when my friend's post-Christmas report finds its way to the Chief Rabbi, who now wants to know why he is wishing everyone "Happy Christmas"? Was my friend being sensitive to me, enjoying the warm wishes, happy to hear from me or something else?  Would/ should/ can the analysis be different if my friend is a fellow Christian, a progressive Jew, a Muslim, a Hindu or an atheist?  Consider the same issues when I am wishing my friends a Happy Diwali or asking how Ramadan is going.

Writing an algorithm to understand human nature which takes into account our own experiences and personal history and considering others is not simple. The algorithm (even if it worked) is also likely to diverge from reality as we tend to deny the output (reality) if it is too close to being true. But….

Who wrote the algorithm and who wrote the code? 

One aspect we get worried about is the collection and storage of data, because we can see, touch and understand it. The range is very wide and includes those worried about CCTV and data from our mobile phones. I can easily gather data about your “Happy Christmas” messages. Some worry for you about PII (Personally Identifiable Information): where it is and how it is protected. Others get concerned about how anonymous data is, and even a few about how I can reconstruct data to identify you. We should all be very grateful that some great minds worry about these important issues and debate the impacts of your data. However, I am currently thinking about who is writing the algorithm and code that takes this data and creates value for you and someone else.  Should we do this analysis? Is your sentiment more private than your public views?

Valuable, intrusive, creepy or wrong.

Is sentiment analysis valuable, intrusive, creepy or wrong? Everyone will have a view and I am sure that we can segment the market and with data I can tell where you fit in the range. However, your view could be based on what you don't want to face.

Imagine you are about to buy a present for your partner and, based on your location or the web sites you used immediately before the one where you make the purchase, I could determine a sentiment towards that person, and they could have access to this analysis.  Does understanding how you spent the time before an action help in a decision about your sentiment of care, love or affection?

Do you want Apple, Google, Samsung, your bank, your mobile operator or your loyalty card provider to know that you are single and that at the office party you sent a text that was fun at the time, but that the person who got it and their service providers know the sentiment due to the circumstance, and that your credibility, influence or reputation has been increased or reduced? Now you can argue with me that this is not possible, is invasive, removes all human dignity and that what you do is unique; so when you have read "Predictably Irrational", let's have that chat.

The Semantic Web

If the next phase of the web (Web 3.0, the intelligent web, the semantic web) is where the web knows what you want to do before you do, there are some complexities we need to face when we wish someone a "Happy Christmas!"  Even if we could ignore the global economic crisis, we live in tricky digital times where we now have the data, but are we ready to understand how to use the data and accept it for what it is? When the web has an understanding, insight, view, opinion or knowledge about you, can we accept that it may tell us something we don't want to face up to? Sentiment is more than a word or a phrase and is linked to what we do, when we did it, with whom, with thought and with time.  One issue is the algorithm that takes this data and creates a view about sentiment; another is the bias/ culture/ views/ opinions/ motivations of the programmer/ data scientist or coder, about whom you can do nothing but TRUST.

Politically we ask who polices the police; maybe it is time to ask how we confirm that our Trust is correctly placed in those building web services. The power is not with a regulator or in public or private law but in how we accept transparency and live with the fact that we as humans are all different but all the same. Much like DNA, data is all the same at one level (ones and zeros) but the bigger the data gets the more unique it becomes, just like us.

This issue is not about what is Private or Public but Rights

A New Year provides time to reflect and look forward.  From my narrow view of digital identity, data, reputation, sentiment, devices and networking, I would say that 2011 was driven by privacy issues at many different levels.  Going forward will, I believe, be a time of much wider realisation and acceptance that private, privacy and public are not the debate; the issue is that no-one has control of my, your, our data and that we need to start thinking about rights: who grants them, who provides command and governance, who has access, how your data can be used and how digital citizens can get value from their data.

You cannot control it and your data is out there, however, should you have the rights to revoke your phone number from someone else’s phone book or should they be able to access your sentiment for the message just sent to you?

I wanted to explore in this ViewPoint, through the use of a "Happy Christmas" message, how much TRUST we have granted to something (the algorithm and coder) which we cannot touch in a digital world. I hope that you can see that there are those who worry about privacy of data, but in so many ways this is just the tip of the iceberg.

Here is your chance to vote on the integrity of my "Happy Christmas". Do you believe in the sentiment of my "Happy Christmas" message - please vote here !


 

 

What are we worth if 1M Facebook fans only turn up c.826 likes and 309 comments per post?

Simplify360 has been exploring the relationship between the number of Facebook fans and engagement level, and reveals that, on average, each new post generates 826 likes and 309 comments. The starting point was 50 Facebook fan pages with a random mix of brands from all over the world, from consumer brands to sports teams to celebrities.

What it tells us is that there is some engagement but not a lot.  I would like to see the correlation of the noisy ones to see if it is many people or just a few.

This also misses sentiment and they miss a lot by defining Liking Rate and Commenting Rate as the average ‘likes’ and ‘comments’ a post would generate if the number of fans for the page is normalized to one million. Only posts by the page admin are considered for the study.
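The normalisation described above is simple arithmetic, and a short Python sketch makes it explicit. The function name and the example page are my own assumptions for illustration, not Simplify360's code or data: take the average interactions per admin post and scale it to a page of one million fans.

```python
def rate_per_million(total_interactions: int, posts: int, fans: int) -> float:
    """Average interactions per post, scaled to a page with one million fans."""
    if posts <= 0 or fans <= 0:
        raise ValueError("posts and fans must both be positive")
    average_per_post = total_interactions / posts
    return average_per_post * 1_000_000 / fans


# Hypothetical page: 250,000 fans, 10 admin posts, 2,000 likes in total.
liking_rate = rate_per_million(total_interactions=2_000, posts=10, fans=250_000)
print(liking_rate)  # 800.0
```

The same function gives a Commenting Rate if you pass in comment counts instead of likes. Note that nothing in the calculation looks at what was said, which is the point about missing sentiment.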

Overall – it says that the content was not written to generate comment, engagement, conversation or relationship….

Missing the point by analysing the data and not sentiment

"I love being able to pay bills"  doesn't mean I like actually paying them.

I am on the search for phrases and word sets that allow me to test a number of algorithms to see if there is actual understanding/ correct interpretation.  The point is to look at the data (metadata) and determine insight and not facts.  The trick to engagement based on what the data tells you is how the insight is presented back.

All ideas welcome.

What are the definitions for social signals, pulses and waves?

This is about social media using engineering terms to try and define/ categorise patterns being seen or looked for in your data.

For the purposes of this blog I am currently defining the following:-

Social signal (physical) - think physical behavioural signals you give off when interacting. A seemingly erratic behaviour that is routine, regular, repeatable and actually has a defined pattern irrespective of who you are.

Social signal (digital) - think digital behavioural signals that are a continual feed from digital interactions.

Social spike - think spike from a crowd doing the same thing for a short time and then moving on.

Social pulse - think regular pattern or behaviour from a crowd when stimulated.

Social wave - think growing sentiment of change from a crowd doing something different and moving to a new normal.

Social trend - think underlying slow change in the crowd.

Viewpoint - Generating wealth from the Web. Is follow the new economic model poised to take on search?

I wrote that Social filtering is deeply human at the beginning of November, and I knew then that there was more to the topic/ theme/ thought, but I could not articulate it. Since then I have been juggling with various ideas, often driven by my need to justify Twitter. Twitter, get it or not, provides a function called “follow”: you can follow who you like, and you get updates/ insight/ information/ attention from them. However, can you turn “follow” into value, and is following your social filter based on those you trust?

Follow has an obvious value to the person who follows the leader. You gain free insights/ selection/ value/ updates. This social filter is based on trust and it is different from curators and editors who have specific agendas and income/ profit requirements. In the original post I quoted David Armano: “Often times the quality of links and information I get on Twitter is better than what I would have gotten from Google because the knowledge of the human feed is deep, niche, and fickle.”

 

Scenarios

Here are several scenarios to consider when thinking how we could turn follow into value and comparing outcomes from search and social networking, they are not exhaustive but should provide a good place to start a train of thought.

1. I am looking for a great Thai restaurant

Option 1. Search.   Type “Great Thai restaurant” into Google; my mobile sends my location and Google takes a guess that I want food tonight and near to where I am searching, reasonable assumptions driven from our need for context and personalisation. From the “unknown algorithm based results” that favour Google, I then read some third party reviews which I cannot judge to be paid, biased or just vocal. Is the selection any better than walking past and seeing how many people are sitting in the restaurant?

Option 2. Post to Facebook and ask my friends and my network where a “Great Thai restaurant” is – there is more work to this one and I am wholly dependent on someone helping. Size of network helps at this point.

Option 3. Twitter/ follow. I love Thai and I am already following others who love Thai. I Tweet to my network of same minded followers who can deliver a recommendation. 

In option 1 – Google wins. In option 2 – Facebook wins. In option 3 – the community wins and the person who helped me may get a discount on their next meal.

 

2. I want to invest some money

Option 1. Search.   Type “Great Investment fund” into Google. From the “unknown algorithm based results” that favour Google I will click on some links and read, subject to many legal notices, about the performance of various funds. If I invest I will have to watch and wait for the results.

Option 2. Post to Facebook and ask my friends and my network about their experiences with “Investment funds.” Not sure I would really be that happy with this for many reasons including telling the world about my desire to invest.

Option 3. Twitter/follow.  I love to invest and I am already following others who love investment. I follow a service that allows me to manage my own money (never give up control) and I invest based on what the best in class is doing (www.covestor.com). To follow the best investor I share some of the upside. No management fees, no overheads, risk on my terms, stop and start when I like. Worth noting that J.P. Morgan funds investment advice is now on iTunes.

In option 1 – Google wins. In option 2 – no-one wins. In option 3 – the person who I follow gets a share of my upside, assuming that they want to create value over time and not destroy it once.

 

3. What is hot in tech/ service/ my industry

Option 1. Search.   Type “what is hot in tech” into Google. From the “unknown algorithm based results” that favour Google I will click on some links and read. The top tech web sites are there with breaking news. I can use various tools to determine what is hot and trending or I can use my “reader” to filter from my own favourites.

Option 2. Post to Facebook and ask my friends and my network about what they think is hot. Day 1; I will get a few views. Day 10; I will get less help and probably a polite note telling me not to ask again.

Option 3. Twitter/ follow. I look at what is trending and select a few “trusted” people to follow and follow updates as and when they occur. I add value to my network by adding my own opinion, or pay to sit there and listen.

In option 1 – Google wins. In option 2 – no-one wins. In option 3 – the community/ cluster wins.

 

Logical response

The obvious objection to these three very simple scenarios is, to quote Paul Rodriguez, who commented: “lemmings, pied piper, following somebody the wrong way up a one way street, jump off a cliff if I told you, following the falling domino in front and having the falling domino behind follow you, following somebody you trust, who is following somebody they trust who is following somebody they trust who is following somebody stupid, the list is endless...the risk is that instead of having the madness of crowds, maybe the 21st century equivalent is the madness of tweets? Laws such as the snowball effect and the law of unintended consequences become far more amplified in an interconnected world. In which case market (and wealth) fluctuations become more volatile, but then you only *truly* make money on the gradient.”

I expect that there is a lot of empathy for the logic of this response; however, is follow (Twitter or other tech based follow services) any different from what we have today with editors/ press/ celebrity and broadcast, given that we all believe everything from the red top tabloids and Sky/Fox News?

 

Context

However, to put follow into context, researchers at HP Labs discovered that Twitter can predict, with astonishing accuracy, how well a movie will sell. The researchers at HP started by monitoring movie mentions in 2.9 million tweets from 1.2 million users over three months. These included 24 movies in all, ranging from Avatar to Twilight: New Moon.  Then they took two different approaches, dealing with two very different performance metrics: the first weekend performance, which is largely built on buzz, and the second weekend performance, which is largely built on whether people actually like the movie. To predict first weekend performance, they built a computer model which factored in two variables: the rate of tweets around the release date and the number of theatres the movie was released in. Lo and behold, that model was 97.3% accurate in predicting opening weekend box office. By contrast, the Hollywood Stock Exchange, which has been the gold standard for opening box-office predictions, had a 96.5% accuracy.

What should be even more alluring to business strategists and CEOs is that, as Tech Review points out, Twitter might be more than just a mirror of mass sentiment - the service might also influence it. In other words, could you actually make a product launch far more successful with a really smart Twitter/ Follow strategy?   However, are we measuring or observing the results of a system in motion and, in the process, influencing those results? For anyone with a science background this will bring up Werner Heisenberg and the Uncertainty Principle.

Heisenberg determined that “both the position and momentum of a particle cannot be known simultaneously.”   The dichotomy raises the mind-boggling prospect that unless we observe an event or thing, it hasn’t really happened, that all possible futures are quantum probability functions waiting for someone to notice them - trees falling unheard in a forest. Maybe this Viewpoint never existed until you searched for it and Google created it as you wanted it!

(Yes, for those who have mastered QM, I am confusing the observer effect with the uncertainty principle. Technically the uncertainty principle has nothing to do with "observing", it has to do with measuring. The observer effect is a supposed effect of observing an event and the influence of your observations on the event. No one would ever have to actually observe a particle's position to obfuscate its momentum, the mere act of using the photons to measure its position, even if nobody ever observed it, would suffice. It's the act of measuring, not actually observing that causes the uncertainty principle, but when observation requires something that may cause change the problems occur.)

Anyway, how does this relate to the analysis and feedback within my framework of thinking about Follow?  Think about it this way:  The mere act of observing a social change, changes the behaviour of that social object.  In “reality TV” they put cameras in front of “real” people for the viewer to watch how “real” people behave, date, compete, etc.  But this in fact makes those on camera less and less real.   They’re not actors, nor are they behaving like normal people.  They are somewhere in between the two. 

In the case of Twitter predicting a movie success, could an editor or critic have the same effect, if they could do it in real time and not on paper? How does Google real time search affect your searching habits and techniques?  You no longer have freedom in the web, as the recommendation is based on what the crowd says is important and therefore we are actually just lemmings.

 

Restating the Problem

Therefore the problem (Generating wealth from the web) is far more complex, multifaceted and inter-twangled, as there is unlikely to be a single source.

  • Do I want to be directed by people I trust but I may not be able to determine their source – Follow
  • Do I want to be directed by an unknown algorithm that can change at any time and could be biased to their own needs – Search
  • Do I want to be directed by Brands – Marketing/Ads
  • Do I want to be directed by the media/ editors/ critics where I may be able to determine their bias – Broadcast/ News
  • Do I want to be directed by the fashion/ celebrity – Sales

 

This complex dependency is an issue which editors and bloggers have faced time and again. Do I write based on what people want to read, based on clicks and response data, or what I find interesting? Are we (am I) adaptive or reactive; do we want to be individual or loved or make money or provide democracy or lead?

I really don’t need to know what you had for lunch and I don’t have to follow you.  Follow would put me in control, and I can seek out value from the community and not some bland algorithm that controls what part of the web I can see. However, the issue facing follow is how will I pay the platform that underpins the service?

Effort

A reasonable concern would be that the 'follow' theory is weakened if the 'followed' account generates little content, or generates it at the wrong time. For example, if I follow five Thai restaurants but only one puts messages out, at 1am, I am not going to be that delighted with the experience (unlike search, which does not provide real-time as everything has to have been indexed). This low level of activity from Follow has two effects: I give up early, or I personally have to exert a lot of effort as I need to continually add/prune/curate.  This takes time; as humans we are inherently lazy and would therefore prefer someone else to do this for us.  Editors rule!

Wrapping up

This long Viewpoint started with the idea that “follow” is the new economic model poised to take on “search” and I believe that there is substantial value in “follow.” Reading that Google offered $3bn for Twitter makes me believe that there are other strategists who are struggling with the same issues and the value!

Researchers at HP Labs discover that Twitter can predict, with astonishing accuracy, how well a movie will sell.

Original Article is from Fast Company

“Asur and Huberman started by monitoring movie mentions in 2.9 million tweets from 1.2 million users over three months. These included 24 movies in all, ranging from Avatar to Twilight: New Moon.

Then they took two different approaches, dealing with two very different performance metrics: the first weekend performance, which is largely built on buzz and the second weekend performance, which is largely built whether people actually like the movie.

To predict first weekend performance, they built a computer model, which factored in two variables: the rate of tweets around the release date and the number of theaters its released in. Lo and behold, that model was 97.3% accurate in predicting opening weekend box office. By contrast, the Hollywood Stock Exchange, which has been the gold standard for opening box-office predictions, had a 96.5% accuracy.”

But what should be even more alluring to marketers: As Tech Review points out, Twitter might be more than just a mirror of mass sentiment - the service might also influence it. In other words, could you actually make a product launch far more successful with a really smart Twitter strategy?
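To get a feel for what a two-variable model of this kind might look like, here is a minimal Python sketch. It assumes an ordinary least-squares linear fit, which the quoted article does not spell out, and every number in `SAMPLE_ROWS` is made up for illustration (generated from a known straight-line rule so the fit can be checked). None of it is HP Labs' data, code or results.

```python
def fit_two_variable_model(rows):
    """Least-squares fit of y = a + b*x1 + c*x2 using the normal equations.

    rows is a list of (x1, x2, y) tuples; returns the tuple (a, b, c).
    """
    # Build the 3x3 system (X^T X) beta = X^T y, with an intercept column.
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for x1, x2, y in rows:
        features = (1.0, x1, x2)
        for i in range(3):
            xty[i] += features[i] * y
            for j in range(3):
                xtx[i][j] += features[i] * features[j]

    # Solve by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return tuple(aug[i][3] / aug[i][i] for i in range(3))


def predict(coefficients, tweet_rate, theatres):
    """Apply the fitted line to a new film."""
    a, b, c = coefficients
    return a + b * tweet_rate + c * theatres


# Made-up rows: (tweets per hour, theatres, opening weekend in $m),
# generated from y = 1 + 0.02 * tweets + 0.005 * theatres.
SAMPLE_ROWS = [
    (100, 2000, 13.0),
    (500, 3000, 26.0),
    (50, 1000, 7.0),
    (900, 3500, 36.5),
    (300, 1500, 14.5),
]

coefficients = fit_two_variable_model(SAMPLE_ROWS)
print(predict(coefficients, tweet_rate=400, theatres=2500))
```

With real data the fit would not be exact, and the interesting question raised above remains: once people act on the prediction, the tweet rate itself stops being an independent measurement.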

@WeFeelFine: will sentiment lift with the news of a Royal Wedding?

Harvesting the data we submit to the social web and using it to make a judgement about how “we” feel. 

We Feel Fine has been harvesting human feelings from a large number of weblogs. Every few minutes, the system searches the world's newly posted blog entries for occurrences of the phrases "I feel" and "I am feeling". When it finds such a phrase, it records the full sentence, up to the period, and identifies the "feeling" expressed in that sentence (e.g. sad, happy, depressed, etc.). Because blogs are structured in largely standard ways, the age, gender, and geographical location of the author can often be extracted and saved along with the sentence, as can the local weather conditions at the time the sentence was written. All of this information is saved.
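The harvesting step described above can be sketched in a few lines of Python. This is only my reading of the description, not We Feel Fine's actual code: the `FEELINGS` vocabulary and the function name are assumptions, and splitting sentences on full stops is a deliberate simplification.

```python
import re

# A tiny, assumed vocabulary of feelings; the project's real list is not given here.
FEELINGS = {"sad", "happy", "depressed", "fine", "excited", "lonely"}

# The two trigger phrases named in the description.
PHRASE = re.compile(r"\bI (?:feel|am feeling)\b", re.IGNORECASE)


def extract_feelings(text: str):
    """Return (sentence, feeling) pairs for sentences containing a trigger phrase."""
    results = []
    for raw in text.split("."):
        sentence = raw.strip()
        if not PHRASE.search(sentence):
            continue
        words = re.findall(r"[a-z]+", sentence.lower())
        # First known feeling word in the sentence, or None if none is recognised.
        feeling = next((word for word in words if word in FEELINGS), None)
        results.append((sentence + ".", feeling))
    return results


post = (
    "Today was long. I feel happy about the news. "
    "We went out. Honestly I am feeling a bit sad today."
)
for sentence, feeling in extract_feelings(post):
    print(feeling, "<-", sentence)
```

Notice how little it takes to turn a sentence into a labelled "feeling", and how easily "I feel happy for her, I suppose" would be filed under happy.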

I wonder how it changes with the announcement of a Royal Engagement?

Have a play at their web site

You can buy the book, We Feel Fine: An Almanac of Human Emotion, by Sep Kamvar and Jonathan Harris.

Peer Index - who are the authorities on the web.

Just been reviewing  @PeerIndex  http://www.peerindex.net/ - this is a web site that looks to rate "Who are the authorities on the web?"   Their claim is that PeerIndex helps you discover the authorities and opinion formers on a given topic.

 

I cannot work out how much is based on what you say about yourself vs how much it is biased towards what others say about you, nor how sentiment is taken into account.  What is obvious is that it uses only material that is public....