Patent literature is a hugely valuable source of novel information for life science research and business intelligence. Much of the knowledge disclosed in patents may not be found in other information sources, such as MEDLINE or full-text journal articles.
Patent landscape reports (also known as patent mapping or IP landscaping) provide a snapshot of the patent situation for a specific technology. They can be used to understand freedom-to-operate issues, to identify in- and out-licensing opportunities, to examine competitor strengths and weaknesses, or as part of a more comprehensive market analysis.
These are valuable searches, but they demand advanced search and data visualization techniques, as any given landscape report requires the examination of hundreds or thousands of patent documents. Patent text is unstructured: the information needed is often embedded within the body of the patent and may be scattered throughout lengthy descriptions, and the language is often complex and designed to obfuscate.
Text analytics can provide the key to unlocking this value. A recent paper by a team at Bristol Myers Squibb (BMS) describes a novel workflow for discovering trends in kinase assay technology. The aim was to strengthen BMS’ internal kinase screening technology. The first step was to analyze industry trends and benchmark BMS’ capabilities against other pharmaceutical companies, guided by key questions including:
- What are the kinase assay technology trends?
- What are the trends for different therapeutic areas?
- What are the trends for technology platforms used by the big pharmaceutical companies?
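Once the key facts have been extracted from each patent, all three questions largely reduce to counting. The sketch below is a minimal illustration, not the BMS workflow: it assumes each patent has already been reduced to a hypothetical `Fact` record (assignee, technology, therapeutic area, year), and the records are invented purely to show the grouping.

```python
from collections import Counter, namedtuple

# Hypothetical record: one extracted fact per patent. Field names are assumptions.
Fact = namedtuple("Fact", ["assignee", "technology", "area", "year"])


def tally(facts, *fields):
    """Count facts grouped by the given field names, e.g. ("year", "technology")."""
    return Counter(tuple(getattr(f, name) for name in fields) for f in facts)


# Invented example records, for illustration only.
facts = [
    Fact("Pfizer", "TR-FRET", "oncology", 2010),
    Fact("Pfizer", "TR-FRET", "oncology", 2011),
    Fact("Novartis", "radiometric", "inflammation", 2011),
    Fact("Pfizer", "mobility shift", "oncology", 2011),
]

by_tech_year = tally(facts, "year", "technology")         # assay technology trends
by_area_year = tally(facts, "year", "area")               # therapeutic area trends
by_company_tech = tally(facts, "assignee", "technology")  # platforms per company
```

Tables like these are the kind of input a visualization tool would then chart over time.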
The BMS team built a workflow using several tools: Minesoft’s PatBase for collecting the initial set of patent documents, Linguamatics I2E for knowledge extraction, and TIBCO’s Spotfire for data visualization. The project used I2E to create precise, effective search queries that extracted key information on around 500 kinases, 5 key screening technologies and 5 therapeutic areas, across 14 pharmaceutical companies. I2E allowed queries to be built from domain-specific vocabularies for these entities; for example, over 10,000 synonyms were used for the kinases, greatly improving the recall of the patent searches. These I2E “macros” enabled information to be extracted regardless of how inventors described the facts. The vocabularies also provided semantic normalization: however a concept was described, the output was standardized to a preferred term. For example, patents assigned to Wyeth, Warner-Lambert, etc. were all reported under Pfizer.
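The idea behind vocabulary-based matching and normalization can be sketched in a few lines of plain Python. This is not I2E’s macro syntax; it is a stand-in that assumes each vocabulary is a hypothetical dict mapping a preferred term to its synonyms (a tiny subset, where the real kinase vocabulary had over 10,000 entries). Every synonym is matched case-insensitively and reported as its preferred term.

```python
import re

# Hypothetical mini-vocabularies: preferred term -> synonyms (illustrative only).
KINASES = {
    "EGFR": ["EGFR", "ErbB1", "HER1", "epidermal growth factor receptor"],
    "ABL1": ["ABL1", "c-Abl", "ABL"],
}
ASSIGNEES = {
    "Pfizer": ["Pfizer", "Wyeth", "Warner Lambert", "Warner-Lambert"],
}


def build_matcher(vocab):
    """Compile one regex over all synonyms plus a lookup back to the preferred term."""
    lookup = {syn.lower(): pref for pref, syns in vocab.items() for syn in syns}
    # Try longer synonyms first so "c-Abl" wins over its substring "ABL".
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(map(re.escape, alternatives)) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern, lookup


def normalize(text, vocab):
    """Return the preferred terms mentioned in text, however they were written."""
    pattern, lookup = build_matcher(vocab)
    return {lookup[m.group(1).lower()] for m in pattern.finditer(text)}
```

For example, `normalize("Inhibitors of ErbB1 and c-Abl are claimed.", KINASES)` gives `{"EGFR", "ABL1"}`, and `normalize("Assignee: Wyeth LLC", ASSIGNEES)` gives `{"Pfizer"}`.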
I2E also allowed searches to be focused on specific regions of the patent documents; for example, kinase information was extracted from the claims, which increased the precision of the search.
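Region-restricted search can be illustrated with a small sketch. It assumes a hypothetical patent record stored as a dict of section name to text (how the sections are obtained is outside its scope) and simply runs the term match against one section. A kinase mentioned only in the background description is then not counted as the subject of the invention.

```python
import re


def mentions_in_region(patent, terms, region="claims"):
    """Return the terms found in one region of a hypothetical patent record."""
    text = patent.get(region, "")
    return {
        t for t in terms
        if re.search(rf"(?<!\w){re.escape(t)}(?!\w)", text, re.IGNORECASE)
    }


# Invented example record.
patent = {
    "description": "Background: EGFR and JAK2 have been studied widely.",
    "claims": "1. A method of inhibiting JAK2 in a subject in need thereof.",
}
```

Here `mentions_in_region(patent, ["EGFR", "JAK2"])` returns `{"JAK2"}`, whereas searching the description returns both kinases.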
Using this approach, the patent analysis team mined over 7,100 full-text patents, approximately half a million pages, for relevant kinase technology trends and the corresponding therapeutic area information. To put the business value into perspective: it takes about 1 hour to read one patent manually for data extraction, so a set this large would require around 175 person-weeks (nearly 3.5 years!) of effort. The authors state that the innovative use of I2E enabled a 99% efficiency gain in delivering the relevant information. They also report that the project took 2 patent analysts 3 months (about 25 person-weeks), a 7-fold saving in FTE time.
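The effort figures above can be checked with simple arithmetic. The 40-hour working week, 52-week year and 52/12 weeks per month are assumptions made for this sketch, not values taken from the paper.

```python
# Back-of-envelope check of the effort figures.
# Assumptions: 40-hour week, 52-week year, 52/12 weeks per month.
PATENTS = 7100
HOURS_PER_PATENT = 1  # about 1 hour to read one patent manually
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 52

manual_weeks = PATENTS * HOURS_PER_PATENT / HOURS_PER_WEEK  # person-weeks
manual_years = manual_weeks / WEEKS_PER_YEAR

analysts, months = 2, 3
actual_weeks = analysts * months * WEEKS_PER_YEAR / 12  # person-weeks
saving = manual_weeks / actual_weeks  # fold reduction in analyst time
```

This gives 177.5 manual person-weeks (about 3.4 years) against 26 person-weeks actually spent, a saving of roughly 6.8-fold, consistent with the rounded 175, 25 and 7-fold figures.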
The deliverables provided key actionable insights that enabled timely business decisions for BMS researchers, and the paper demonstrates that the rich information contained in full-text patents can be analyzed when innovative tools and methods are used.