NLP & Big Data Symposium in San Francisco

June 29, 2015

Life sciences and healthcare professionals gathered at the UCSF Mission Bay campus for the West Coast Natural Language Processing (NLP) & Big Data Symposium on June 18th. The symposium, co-hosted by UCSF, featured presenters from UCSF, Merck, City of Hope, Copyright Clearance Center and Linguamatics, and delegates from a diverse range of organizations.

The central theme of this year’s symposium was “From bench-to-bedside, unlocking key insights from your data”. Healthcare delegates were keen to find new ways to address meaningful use and accountable care leveraging NLP text mining of electronic health records. Life sciences delegates were keen to increase the efficiency and effectiveness of their business operations by mining real world data. There was also a strong interest in forging partnership opportunities between pharma/biotech and hospitals/cancer centers.

Sorena Nadaf, CIO and Director of Translational Informatics at the UCSF Helen Diller Family Comprehensive Cancer Center, delivered the welcome address and highlighted the foundation of clinical NLP and its common uses for extracting and transforming narrative information in EMRs to support and accelerate clinical research.

Sorena Nadaf at the NLP & Big Data Symposium in San Francisco.

Wendy Cornell, retired from Merck, described Merck’s development of a natural language processing (NLP) workflow to extract conclusions and interpretations from their large corpus of internal reports using the Linguamatics I2E software and the integration and analysis of the data using the ANZO platform from Cambridge Semantics. Automated extraction of conclusions and interpretations from internal preclinical safety reports using I2E was the primary use case discussed and generated a lot of interest and discussion.

Joyce Niland, Chief Research Information Officer, and Rebecca Ottesen, Biostatistician, from City of Hope (COH) presented a recent project with Linguamatics in which they created a disease registry using Iterative Interactive Enrichment (IIE) of NLP queries shared across institutions. The I2E queries, initially written by the Huntsman Cancer Institute (HCI) and Linguamatics, identify immunohistochemistry (IHC) marker results from unstructured pathology dictations on malignant Non-Hodgkin’s Lymphoma patients. The queries were shared with COH to assess their exportability from one institution to another. Linguamatics, COH, and HCI applied an IIE process through several phases to improve the IHC queries while sharing the improvements between institutions. Precision and recall were measured for each phase to assess the completeness and accuracy of information extraction, and to identify the most critical NLP features that impact these results. Final F-scores for COH and HCI were 0.91 and 0.94, respectively. This impressive level of precision and recall across two institutions validates the Linguamatics approach of sharing its wealth of existing healthcare queries with I2E customers to help accelerate research and improve patient outcomes.
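For readers less familiar with these metrics, here is a minimal sketch of how precision, recall and an F-score can be computed from counts of correct and incorrect extractions. It assumes the balanced F1 measure (the presentation did not state which F-score variant was used), and the function name and example counts are purely illustrative, not figures from the COH or HCI evaluations.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from raw extraction counts.

    Assumption: the balanced F1 score, i.e. the harmonic mean of
    precision and recall. Counts are hypothetical inputs.
    """
    extracted = true_positives + false_positives   # everything the query returned
    relevant = true_positives + false_negatives    # everything it should have returned

    # Guard against division by zero when nothing was extracted or expected.
    precision = true_positives / extracted if extracted else 0.0
    recall = true_positives / relevant if relevant else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only: 90 correct marker results, 10 spurious, 8 missed.
p, r, f1 = precision_recall_f1(90, 10, 8)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Precision reflects the accuracy of what a query extracts, while recall reflects its completeness; the F-score combines the two into a single number that can be compared across phases or institutions.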

Chris Hilbert from Copyright Clearance Center presented CCC’s new RightFind XML for Mining service and its integration with Linguamatics’ I2E, and explained how the combined solution improves the results of text and data mining queries and mitigates infringement risk. Chris demonstrated how customers can obtain and index full-text XML articles from multiple scientific publishers in I2E and avoid many of the data format and licensing issues associated with working with PDFs. As existing licensed literature does not have to be repurchased, delegates saw this service as a highly effective way of leveraging existing full-text investments and extracting more value via I2E text mining.

To complement our customer and partner presentations, Linguamatics led presentations including an introduction to NLP text mining; healthcare NLP strategies to improve patient care, reduce costs and enhance population health; and Real World Data and text analytics.

It was wonderful to catch up with many of our customers, meet some new ones and help foster introductions and discussions between the various delegates. Keep an eye on our events page for upcoming opportunities to meet with Linguamatics, including our Princeton seminar on July 16 and the Text Mining Summit and I2E Healthcare Hackathon in October.


Linguamatics I2E users lead the way in text mining for patents, safety and more at this year’s Spring Users Conference

April 28, 2015

We are always amazed and impressed at the inventiveness of Linguamatics customers, in their applications of text analytics to address their information challenges. Our annual Linguamatics Spring Users Conference showcased some examples of their innovation, with presentations on text mining used for patent analytics, chemical pharmacokinetics and pharmacodynamics data extraction, creating value from legacy safety reports, and integrating open source tools for advanced entity recognition. We had a record-breaking number of attendees this year, representing over 20 organizations, ranging from our most experienced I2E users to text mining novices.

A record-breaking number of attendees enjoyed the opportunity to experience Cambridge and share insights with one another at this year's conference.

Patent analytics featured in two of the presentations, demonstrating the value of NLP in extracting critical information from opaque and lengthy patent documents. Julia Heinrich (Senior Patent Analyst, Biotechnology at Bristol-Myers Squibb, Princeton, New Jersey) asked the question: “Can the infoglut of biotech patent publications be quickly reviewed to enable timely business decisions?” She admirably demonstrated that, with smart use of I2E’s NLP queries, BMS have been able to search the patent body for information on antibody-drug conjugates and convert “unstructured data” into user-friendly, analysis-ready data sets. Thorsten Schweikardt (Senior Information Scientist, Boehringer Ingelheim) gave an overview of workflows developed using KNIME to create patent landscapes for specific disease areas, target identification, and discovery of tool compounds.

Wendy Cornell (former head of the Merck Proprietary Information and Knowledge Management Group), like Julia Heinrich, flew over from the US for the meeting. Wendy presented on the automated extraction of conclusions from internal preclinical safety reports using I2E. These internal safety assessment reports contain a wealth of historical data around safety and toxicity of developmental compounds, and many pharma organizations have sought ways to gain benefits from these valuable legacy documents. Wendy’s group developed a strategy to access Documentum-based safety assessment reports, and were able to pull out histopath findings, organ toxicities, haematological and blood biochemistry results, even pulling out toxicokinetic parameters from tabular sections. Three use cases were presented, showing significant business impact within the Safety Assessment organization.

Wendy Cornell details how she used I2E to create an NLP-driven workflow tapping into the large body of valuable knowledge located in structured and unstructured internal documents.

Linguamatics’ speakers gave an update on future innovations in the I2E roadmap, the new features in I2E 4.3, and the software’s applications in the life sciences and healthcare. Guy Singh showed how I2E 4.3’s Connected Data Technology allows users to make better use of big data wherever the data are located (on premise or in the cloud) and whatever structure they have, at speed and with digestible results. Phil Hastings gave a brief overview of Linguamatics I2E in Healthcare, and NLP Specialist James Cormack took us through Linguamatics’ approach and results for our submission to the i2b2 2014 Cardiac Risk Factors challenge. You can find out more about what we’re doing in healthcare via this short video.

We heard from a few of our partners in 5-minute lightning round presentations: IFI Claims Patent Services, ChemAxon, Copyright Clearance Center, Thomson Reuters and KNIME discussed their solutions and how they integrate with Linguamatics I2E.

In addition to the presentations, the Linguamatics Spring Users Conference provided opportunities for hands-on training, with workshops aimed at different levels of text mining experience. And of course, there was plenty of time for networking and idea sharing. Our evening events were hosted in the Old Combination Room at Corpus Christi College and the Pembroke College Old Library. We enjoyed beautiful, warm spring evenings at two of Cambridge University’s oldest colleges. One delegate remarked ‘It’s so nice to be shown hidden Cambridge treasures like these, which we would never know about if it wasn’t for the events at the Linguamatics conference.’

Evening social events at Cambridge University's historic colleges

The whole event was a great success that brought together the text mining community from across Europe (and across the pond!).

Presentations which have been approved to share are available on I2Edia and by email request.

Thanks to everyone who attended and contributed to the Linguamatics Spring Users Conference 2015. We look forward to seeing you in October at the Text Mining Summit in Newport, RI, or in Cambridge, UK, next year.


CENTILE Colloquium

July 14, 2014

Ask Jonathan!

Last year, Georgetown University Medical Center launched the Center for Innovation in Leadership and Education (CENTILE). In June I presented a poster at the first CENTILE Colloquium for GUMC Educators in the Health Professions. My poster, “Using iPads to Enhance Teaching and Learning on Patient Rounds,” explained how I have used iPads over the last four years on patient rounds to improve the education of medical students and residents at GUMC. I plan to stay involved with CENTILE as I explore further innovative uses of technology in education.


News: Queen’s Award for Cambridge Text Analytics Software Company Linguamatics

April 23, 2014

New text mining applications

December 5, 2011

Linguamatics are probably best known for our applications in the pharma industry, and perhaps for our Twitter mining project during the May 2010 election. Applications of I2E have expanded significantly over the last 18 months, so we’ve highlighted a couple of interesting examples below.

Healthcare: Linguamatics’ high-performance text mining platform, I2E, allows healthcare providers to identify, extract, synthesize and analyze relevant facts, connections and correlations from unstructured or semi-structured textual information with precision and at scale, radically improving speed to insight. More information…

Text Mining within a Biotech Setting: Having access to advanced text mining capabilities to make target selection decisions was identified as a key informatics technology to integrate into Syntaxin’s selection process. This case study outlines how combined text mining queries via I2E allowed an informed disease selection process to be implemented. More information…


Linguamatics Spring Users Conference 2011

May 16, 2011

This year’s Spring Users Conference is being held in Cambridge, UK, from 17th to 19th May. It promises to be another great event, with many of the top names in the sciences industry attending. As usual, it provides a great forum to share ideas, learn about developments in text mining and mingle with professionals from the industry. Read this to find out more.


Linguamatics announces the release of I2E 3.2

May 16, 2011

Linguamatics has released I2E 3.2 – a faster, more powerful version of its award-winning text mining software
The company continues its record of innovation and growth in life sciences with the latest release of its market leading text mining solution. Read more…

Find out what’s new in I2E 3.2

