New text mining applications

December 5, 2011

Linguamatics is probably best known for our applications in the pharma industry, and perhaps for our Twitter mining project during the May 2010 UK election. Applications of I2E have expanded significantly over the last 18 months, so we’ve highlighted a couple of interesting examples below.

Healthcare: Linguamatics’ high-performance text mining platform, I2E, allows healthcare providers to identify, extract, synthesize and analyze relevant facts, connections and correlations from unstructured or semi-structured textual information with precision and at scale, radically improving speed to insight. More information…

Text Mining within a Biotech Setting: Access to advanced text mining capabilities for making target selection decisions was identified as a key informatics technology to integrate into Syntaxin’s selection process. This case study outlines how combined text mining queries in I2E allowed an informed disease selection process to be implemented. More information…


Linguamatics Spring Users Conference 2011

May 16, 2011

This year’s Spring Users Conference is being held in Cambridge, UK, from 17th to 19th May. It promises to be another great event, with many of the top names in the sciences industry attending. As usual, it provides a great forum to share ideas and learn about developments in text mining, as well as a great opportunity to mingle with professionals from the industry. Read this to find out more


Linguamatics announces the release of I2E 3.2

May 16, 2011

Linguamatics has released I2E 3.2 – a faster, more powerful version of its award-winning text mining software.
The company continues its record of innovation and growth in life sciences with the latest release of its market-leading text mining solution. Read more…

Find out what’s new in I2E 3.2


Trend Analysis – Can a prediction be made?

November 1, 2010

By looking at the popularity of the leaders during each of the televised debates, is it possible to predict who would be the eventual winner of the general election? This is a difficult question to answer, but the statistics suggest that some conclusions can be drawn by extrapolating the data that was mined.

These graphs show the percentage popularity of each of the leaders during the three televised debates, as well as the final election result in terms of percentage of overall votes cast.

The most striking thing to note is that if the Twitter data is extrapolated out (linearly), it corresponds very closely to the actual election results. The extrapolated result for Cameron was 37%; the actual election result was 36% (Conservatives’ share of total votes cast). The extrapolated result for Clegg was 25%; the actual election result was 23% (Liberal Democrats’ share of total votes cast).
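For readers who want to try this kind of extrapolation themselves, here is a minimal sketch in Python. It is not the calculation used for the graphs above: it assumes the debates are equally spaced points on a line and that election day is simply the next point, and the function name `extrapolate_linear` is our own. The numbers in the usage example are placeholders, not the mined Twitter figures.

```python
def extrapolate_linear(values, steps_ahead=1):
    """Fit a least-squares straight line through equally spaced values
    and return the value predicted `steps_ahead` points after the last one.

    Assumption: the observations (e.g. one popularity share per debate)
    are evenly spaced, so they sit at x = 0, 1, 2, ...
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")

    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n

    # Ordinary least-squares slope and intercept.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x

    return intercept + slope * (n - 1 + steps_ahead)


# Placeholder shares for three debates (NOT real data).
print(extrapolate_linear([10.0, 20.0, 30.0]))
```

With only three points per leader, a straight-line fit is very sensitive to each individual value, so the output is best read as a trend indicator rather than a forecast.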

Clegg’s declining popularity and Cameron’s correspondingly increasing popularity stand out quite clearly from the number of positive tweets about each potential leader. Clegg’s TV popularity did not translate into actual votes, but the trend analysis, if extrapolated, would have predicted that. The increasing popularity of Cameron was conclusive, as history now shows.

This case study shows how NLP with the I2E software platform can be used to gain powerful insights into what is likely to happen, based on opinions expressed by people on social media platforms.

Linguamatics’ I2E text mining software was used to find and summarize tweets that have the same meaning, however they are worded. I2E identifies the range of vocabulary used in tweets and uses linguistic analysis to collect and summarize the different ways opinion is expressed.


Positive or Negative?

May 4, 2010

It was nice to see our Twitter analysis of the final UK election debate make it onto the BBC’s Rory Cellan-Jones’ blog. RCJ’s post is an interesting one which reports several analyses from different sources, raising the question of what to publish from the Twitter feeds.

In his blog, RCJ was surprised that we found “Nick Clegg had 37% of positive tweets, followed by Gordon Brown with 32% and David Cameron in 31%,” finishing with “My hunch is that the volume of tweets may have been higher for Gordon Brown than David Cameron – and for all those positive ones, there were actually more that were negative.”

Our issues-based analysis found nearly *4 times as many positive tweets* as negative ones. Rory was correct that Gordon Brown had a higher volume than David Cameron overall but not on the positive vs negative scores.

Here is a quick breakdown of what we found:


Look before you leap – avoiding irrelevant Tweets in your analysis

April 30, 2010


The recent UK election debates have provided a great platform for monitoring social media networks to get ‘instant’ reaction to both the personalities of the prime ministerial candidates and the issues. This analysis adds a new perspective to the tried and trusted methods of opinion poll analysis. Social media monitoring by no means competes with the traditional opinion polls, which are backed up by rigorous methodology and a proven record. However, to ignore what people are saying in an unconstrained and free environment is to miss out on an important dimension.

Twitter provides this sort of environment and, as you can see below, we have been doing a lot of work to find out what Twitterers were saying during the recent election debates.

Having done the debate analysis over the past few weeks, combined with months of research on other subjects (Haiti earthquake, Swine flu vaccine) before that, we’ve learnt a lot about the nature of tweets and how to extract meaning from them. One of the challenges of Twitter text analysis is to remove irrelevant noise. Take a look at the graph below, which plots positive sentiment for each of the leaders during the BBC debate on 29th April 2010:

It seems there was a huge surge for David Cameron at about 8.45pm. What did he say or do that caused such an upswell of public opinion? Did he make a cutting comment? Did he reveal a revolutionary new policy? If only…

Twitter, by its very nature, is prone to ironic and witty remarks, which can easily be misinterpreted by some systems (and humans). This spike was caused by someone tweeting “@mrchrisaddison: Sky poll just in! David Cameron won the debate! …”. It was a sarcastic remark that sparked a huge amount of re-tweeting. Some people were re-tweeting it as a joke, and some must have thought it was real, groundbreaking news. Chris Addison is a comedian who has around 24,000 followers.

Here is what the graph looks like with the irrelevant noise filtered out:

The moral of the story? Use the right tools to filter out rubbish. Make sure you have a good understanding of the data. Finally, never forget that human analysis is always needed as a final step: good quality tools will reduce the human effort but not take it away completely.
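To make the idea of filtering out this kind of noise concrete, here is a small Python sketch. This post does not describe how the spike was actually removed, so the approach below (dropping any message that has been copied more than a few times) is purely our illustration, and `normalise`, `drop_viral_copies` and the `max_copies` threshold are hypothetical names and values.

```python
import re
from collections import Counter

# Matches a leading "RT @user:" or "@user:" style prefix.
_PREFIX = re.compile(r"^(?:rt\s+)?@\w+:?\s*", re.IGNORECASE)


def normalise(text):
    """Strip leading re-tweet/mention prefixes, collapse whitespace, lowercase."""
    text = text.strip()
    previous = None
    while text != previous:  # prefixes can be nested: "RT @a: RT @b: ..."
        previous = text
        text = _PREFIX.sub("", text).strip()
    return re.sub(r"\s+", " ", text).lower()


def drop_viral_copies(tweets, max_copies=3):
    """Remove every tweet whose normalised text occurs more than
    `max_copies` times, i.e. messages that are mostly re-tweets of one remark.
    The original tweet is dropped along with its copies."""
    counts = Counter(normalise(t) for t in tweets)
    return [t for t in tweets if counts[normalise(t)] <= max_copies]


# Illustrative input only; the first string abbreviates the remark quoted above.
sample = ["RT @mrchrisaddison: Sky poll just in! David Cameron won the debate!"] * 5
sample.append("Clegg made a good point on tax")
print(drop_viral_copies(sample))
```

A threshold like this is crude: it would also discard a genuine opinion that many people happen to re-tweet, which is one more reason a human check on the output remains necessary.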


Linguamatics reveals instant reactions on Twitter to final televised election debate

April 30, 2010

Linguamatics’ linguistic analysis provides immediate insight into tweet sentiment towards party leaders during the final televised UK election debate, April 29 2010. The preliminary results from tweets sent during the debate, including a new view on the instant reactions to particular issues (Figure 1), showed a further narrowing of the gap between the leaders’ performances (Figure 2) but with Nick Clegg still performing best overall.

Figure 1 – How twitterers reacted to particular issues in the final debate

The overall tweet analysis (Figure 2) for the three debates shows the percentage of tweets in favour of each of the leaders. Nick Clegg’s share dropped from 43% in the second debate to 37%, Gordon Brown’s fell from 35% to 32%, while David Cameron’s rose from 22% to 31%.

Figure 2 – Number of tweets showing positive sentiment towards each party leader

Top issues for the twitterers in the third debate (Figure 3) were immigration, banking, economy and tax. Clegg and Brown shared the lead on immigration, Clegg was ahead on banking and tax, whilst Brown clearly won on the economy.

Figure 3 – Winner per topic from number of relevant positive tweets

Tracking positive sentiment towards each of the leaders during all three debates (Figure 4) also reflects the narrowing gap between their performances.

Figure 4 – Positive sentiment towards leader over time during the debate

The published results come from the deep analysis of 187,000 tweets sent by 43,656 twitterers from 8.30pm – 10.00pm on the night of the third televised UK election debate.

Linguamatics’ I2E text mining software was used to find and summarize tweets that have the same meaning, however they are worded. I2E identifies the range of vocabulary used in tweets and uses linguistic analysis to collect and summarize the different ways opinion is expressed.

Description of the figures in the press release

Figure 1 shows how the twitterers reacted to particular issues during the debate.

This is a timeline showing the positive tweets made about each leader in relation to audience questions or key statements made by a leader.

Figure 2 shows the number of tweets that expressed a positive sentiment towards each of the party leaders.

The analysis identified tweets saying that a particular leader was doing well or made a good point, or that they like the leader, etc. Linguistic filtering removed examples which were about expectations, e.g. “I hope the leader will do well”, questions, such as “anyone think the leader is doing well?”, and negations, such as “the leader did not do well” or “the leader made no sense”.
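As a rough illustration of the three exclusion rules just described, here is a keyword-based sketch in Python. It is not how I2E works (I2E uses linguistic analysis, whereas this uses simple pattern matching), and the word lists and the function name `keep_positive_candidate` are our own assumptions. It expects tweets that have already been flagged as possibly positive.

```python
import re

# Assumed cue words; a real system would need far richer linguistic rules.
EXPECTATION = re.compile(r"\b(?:hope|hoping|expect|expecting|wish)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:not|no|never)\b|n['’]t\b", re.IGNORECASE)


def keep_positive_candidate(tweet):
    """Return False for tweets that look like expectations, questions
    or negations, and True otherwise."""
    text = tweet.strip()
    if text.endswith("?"):          # question, e.g. "anyone think ...?"
        return False
    if EXPECTATION.search(text):    # expectation, e.g. "I hope ..."
        return False
    if NEGATION.search(text):       # negation, e.g. "did not do well"
        return False
    return True


examples = [
    "I hope the leader will do well",
    "anyone think the leader is doing well?",
    "the leader did not do well",
    "the leader made no sense",
    "the leader is doing well",
]
for example in examples:
    print(keep_positive_candidate(example), example)
```

Pattern matching like this is deliberately blunt: "no doubt the leader did well" would be rejected because of the word "no", which is exactly the kind of case where proper linguistic analysis earns its keep.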

Figure 3 shows winner per topic from number of relevant positive tweets.

The analysis identified a list of topics by identifying words or phrases which described the discussion subject, for example Trident, nuclear weapons, armed forces, military, and Eurofighter are assigned to defence. The tweets were then analyzed to find out who was saying positive things about each leader in relation to a specific topic.
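The topic assignment described above can be sketched in a few lines of Python. Only the defence terms come from this description; the term lists for the other topics simply reuse the topic name and are placeholders of ours, as are the names `TOPIC_TERMS`, `topics_for` and `positive_counts_per_topic`. The sample tweets are invented, and the sketch assumes each tweet has already been labelled with the leader it is positive about.

```python
import re
from collections import Counter, defaultdict

TOPIC_TERMS = {
    # Terms taken from the description above.
    "defence": ["Trident", "nuclear weapons", "armed forces", "military", "Eurofighter"],
    # Placeholder term lists (assumptions, not the real vocabulary).
    "immigration": ["immigration"],
    "banking": ["banking", "banks"],
    "economy": ["economy"],
    "tax": ["tax"],
}


def _compile(terms):
    """Build one case-insensitive whole-word pattern from a list of terms."""
    return re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )


TOPIC_PATTERNS = {topic: _compile(terms) for topic, terms in TOPIC_TERMS.items()}


def topics_for(text):
    """Return the set of topics whose terms appear in the text."""
    return {topic for topic, pattern in TOPIC_PATTERNS.items() if pattern.search(text)}


def positive_counts_per_topic(positive_tweets):
    """Count positive tweets per leader for each topic.

    `positive_tweets` is an iterable of (leader, text) pairs.
    """
    counts = defaultdict(Counter)
    for leader, text in positive_tweets:
        for topic in topics_for(text):
            counts[topic][leader] += 1
    return counts


# Invented sample tweets, for illustration only.
sample = [
    ("Clegg", "Clegg is right about tax"),
    ("Brown", "Brown strong on the economy"),
    ("Clegg", "Clegg made sense on Trident"),
]
for topic, leaders in positive_counts_per_topic(sample).items():
    print(topic, leaders.most_common(1))
```

The leader with the highest count for a topic would be that topic's "winner", with ties (such as a shared lead) needing to be handled explicitly.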

Figure 4 shows Figure 1 (positive sentiment towards leaders over time during the debate) compared with the positive sentiment results from the two earlier debates.

