
Work towards a quantitative model of risk in patent litigation

Proceedings of the American Society for Engineering Management 2016 International Annual Conference

S. Long, E-H. Ng, C. Downing, & B. Nepal eds.

Copyright, American Society for Engineering Management, 2016

ANALYTICS IN POST-GRANT PATENT REVIEW: POSSIBILITIES AND

CHALLENGES (PRELIMINARY REPORT)

Kripa Rajshekhar Metonymy Labs

Walid Shalaby*

Wlodek Zadrozny University of North Carolina at Charlotte

*[email protected] ____________________________________________________________________________________________

Abstract
Recent analysis of litigation outcomes suggests that nearly half of the patents litigated to judgment were held invalid. Commonly available patent search software is predominantly keyword-based and takes a "one-size-fits-all" approach, leaving much to be desired from a practitioner's perspective. We discuss opportunities for using text mining and information retrieval in the domain of patent litigation. We focus on the post-grant inter partes review process, where a company can challenge the validity of an issued patent in order, for example, to protect its product from being viewed as infringing on the patent in question. We discuss both possibilities and obstacles to assisting with such a challenge using a text analytic solution. A range of issues need to be overcome for semantic search and analytic solutions to be of value, ranging from text normalization and support for semantic and faceted search to predictive analytics. In this context, we evaluate our novel and top-performing semantic search solution. For experiments, we use data from the USPTO database of Final Decisions of the Patent Trial and Appeal Board. Our experiments and analysis point to limitations of generic semantic search and text analysis tools. We conclude by presenting some research ideas that might help overcome these deficiencies, such as interactive semantic search, support for a multi-stage approach that distinguishes between divergent and convergent modes of operation, and textual entailment.

Keywords: patent management, litigation, text analytics, semantic search

Introduction
With patent data being publicly available and practitioners trained to think in terms of search, a number of solutions exist in the market that provide patent research and analysis services (Outsell, 2014). However, the potential of multistage semantic search is yet to be fully realized in this domain, particularly in high-value areas of patent litigation and licensing work. Commonly available patent search software is predominantly keyword-based and takes a "one-size-fits-all" approach for practitioners across areas as diverse as prosecution, licensing, portfolio management and litigation. A popular litigation support tool, LexMachina.com, now part of LexisNexis, shows several relevant statistics, but has yet to integrate semantic features that could more closely align with the workflow of litigation practitioners in the field. Recent analysis of litigation outcomes suggests that "nearly half of all patents litigated to judgment were held invalid" (Allison et al., 2014). Furthermore, the need for more thorough research and preparation of quality patents is perhaps as strong as ever: US patent quality appears to be lagging behind international peers, and the USPTO initiated its quality improvement initiative with pilots announced in May 2016. Semantic search can potentially help quality improvement in select pilot areas, for example, determining whether similar claims are being treated dissimilarly in different art units.

The purpose of this paper is to illustrate the gap between practitioner requirements and current search technology in select tasks, for example, understanding licensing opportunities and providing litigation support. Given the known limitations of commercial tools, we will focus on emerging technologies of semantic search and text analytics. For our tests of semantic search we use a collection of patents explicitly mentioned in decisions of the


USPTO Patent Trial and Appeal Board (PTAB); our analysis of text analytics is based on about 4000 patents in the

domain of resilience and sustainability and on review of relevant scientific literature. We report three results; the

first two concern finding prior art potentially invalidating a patent at hand. We observe

Result 1. At least 20% of highly relevant patents do not share explicit technical terms (concepts) with a patent at

hand.

This result suggests that concept expansion (adding broader sets of related concepts to queries, and not only synonyms) might help find such patents. However, using Mined Semantic Analysis (MSA) (Shalaby and Zadrozny, 2015), concept expansion on the descriptions of highly related patents revealed over 40% of pairs without matching semantic concepts. This suggests the need for additional methods of association, a greater variety of semantic connections, and perhaps more sophisticated interpretation of the language of patent claims. For instance, two patents responsive to the topic "foundry", both dealing with metal forming, may not be highly related to each other (in the sense of contributing to patent invalidation), although they may be a strong semantic match. However, a patent that deals with slurrying and foundry processes, which is related but not a strong semantic conceptual match, does indeed invalidate one of the patents in our test case.
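To make the idea of concept expansion concrete, here is a minimal sketch. It is not the MSA algorithm; the concept graph and the terms in it (including the "foundry"/"slurrying" link) are toy placeholders invented for illustration, where a real system would mine such associations from a large corpus:

```python
# Illustrative sketch of concept expansion for a patent query.
# NOTE: this is NOT the MSA algorithm; the concept graph below is a
# hypothetical toy example invented for illustration.

# Toy graph mapping a concept to broader/related concepts (not just synonyms).
CONCEPT_GRAPH = {
    "foundry": ["metal forming", "casting", "slurrying"],
    "casting": ["mold design", "metal forming"],
}

def expand_query(terms, graph, depth=1):
    """Expand query terms with related concepts up to `depth` hops."""
    expanded = set(terms)
    frontier = set(terms)
    for _ in range(depth):
        nxt = set()
        for term in frontier:
            for related in graph.get(term, []):
                if related not in expanded:
                    expanded.add(related)
                    nxt.add(related)
        frontier = nxt
    return expanded

print(sorted(expand_query(["foundry"], CONCEPT_GRAPH)))
```

Expanding beyond synonyms in this way is what surfaces a related-but-not-synonymous term such as "slurrying" for a "foundry" query.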

Result 2. Even advanced semantic search with provably best conceptual query expansion retrieves only about 10% of highly relevant prior art.

Thus not only do we fail to retrieve the 20% of Result 1, but we are missing about 90% of all highly relevant patents. Results 1 and 2 suggest that we need deeper semantic understanding techniques, which, when combined with semantic search, might give us a chance to solve the retrieval problem. However, current natural language processing tools would likely have to be reengineered first. Namely,

Result 3. We estimate that the structure of the majority of main claims (Claim 1) cannot be correctly analyzed by existing natural language parsing tools.

We conclude that providing intelligent litigation support is going to be very challenging, and many research problems would have to be solved before such technologies become a reality. Some of these promising research avenues are discussed in the Conclusions.

The rest of the paper is organized as follows: we first discuss practitioners' requirements in searching for

prior art. Afterwards we provide an overview of the data we use for our experiments, namely USPTO Patent Trial

and Appeal Board (PTAB) proceedings. Then we provide a description of our experiments. Since the experiments

are about limitations of current approaches, the question is how we go about building better tools. We sketch a

possible approach in Conclusions.

Practitioner Requirements
While limited scholarly attention has been given to the requirements of practitioners in patent litigation and related areas, we were able to use informal interviews and literature in the area of complex search to identify a few themes of interest. We seek to explore these themes further in this paper and in future work.

Prior work on the information search requirements of lawyers revealed that their complex tasks required a

constructive process of interpreting, learning and creating. These requirements were found to be neglected by current search approaches, which, for instance, emphasize well-specified requests without offering an option to examine a wide range of information at one time. "These lawyers called for an active potential role for mediators in `just for

me’ services ... [that] provide a wider range of access more compatible with the process of construction, applying

and developing principles of classification that would offer a more uniform system for organizing and accessing files,

and providing direction in altering the overwhelming amount of information available on electronic resources”

(Kuhlthau and Tama, 2001). Consistent with these observations, the work of other scholars (Huurdeman and Kamps, 2014) suggests that current search systems predominantly use a "one-size-fits-all" approach, even in complex domains where a "multistage" search informed by practical multi-step information-seeking models appropriate for a particular field may be better.

Patent cases have substantial uncertainty (Schwartz, 2012), primarily due to the challenges implicit in

knowing the entire universe of prior art before litigation commences and reconciling the case at hand with relevant

prior case law: “difficulty in knowing the relevant facts to the dispute and difficulty in knowing how a trier of fact

will evaluate the facts… knowing the entire universe of prior art is impossible before litigation commences”


To address the demanding information landscape in patent litigation, we note that the typical workflow is

accompanied by a diverse set of requirements at different phases of the process -- for instance, exploration of case

law and the technology landscape at the outset of a case, followed by an analysis of semantically and contextually

linked outcomes relevant to the matter at hand, and then assistance in selecting and narrowing in on more specific

artifacts (for example, highly relevant patents) to be used in the preparation for potential litigation. Restricting

ourselves to the last step of the patent litigation workflow, identifying highly relevant prior art is a particular use case

of interest in this paper. A search request (query) to find relevant prior art can be a sequence of keywords or a longer

text document, e.g., even a whole patent application. In this paper we focus on the situation where the purpose of the search is finding related work that could help invalidate one or more claims of a particular patent.

After a patent is granted, its validity can be challenged in litigation or in several post-grant proceedings. In the majority of these challenges, it is necessary to find and examine a number of documents from a potentially very large pool of patent and technical literature; that is, to establish the relationship of the invention to the prior art.

Note: For the purpose of this paper, we do not need to get into the legal differences between these

proceedings. The USPTO has a number of resources describing different types of proceedings, for example,

http://www.uspto.gov/sites/default/files/ip/boards/bpai/aia_trial_comparison_chart.pptx,

http://www.uspto.gov/patents-application-process/appealing-patent-decisions/trials/post-grant-review and

http://www.uspto.gov/patent/laws-and-regulations/america-invents-act-aia/america-invents-act-aia-frequently-

asked#heading-8. In addition, many law firms provide brief overviews of similar topics, for example

http://www.pillsburylaw.com/post-grant-proceedings or http://fishpostgrant.com/post-grant-review/. See also

https://en.wikipedia.org/wiki/Patent_Trial_and_Appeal_Board and http://www.uspto.gov/patents-application-

process/patent-trial-and-appeal-board-0. Also, we do not need to attend to the differences between different patent

jurisdictions, because the technical problems of text analytics and information retrieval are the same.

Finding references potentially invalidating a patent is perhaps more challenging than finding (some) relevant prior art. For example, the average number of cited references in a patent is about 40 (http://patentlyo.com/patent/2015/08/citing-references-alternative.html), while the number cited in invalidation decisions is usually less than 5. Arguably, any patent search supporting invalidation has to be very precise.

Finding such relevant documents is non-trivial, because many documents refer to the same concepts that

describe the invention at hand, and these documents can appear in multiple patent classes and broad scientific and

technical literature. Moreover, similar concepts, relations and functionalities might be expressed in different words,

so keyword search is not sufficient to find all relevant documents. This search process is therefore labor-intensive, costly, and possibly error-prone, even with the support of modern information retrieval tools.

Analyzing a collection of patents and related product or scientific literature is also costly, mostly because it takes time and requires a highly trained workforce (lawyers and domain experts). Importantly from our perspective, there are few analytic tools that can support this process. Most patent analytics tools analyze metadata (e.g. https://lexmachina.com/legal-analytics/), for example probabilities of finding a patent invalid based on statistics on trial location, examination art-unit, etc. Allison et al. (2014) provide an in-depth analysis of the

“Realities of Modern Patent Litigation” relating “the outcomes (…) to a host of variables, including variables related

to the parties, the patents, and the courts”.

Our goal as technology developers lies in improving patent analytic tools; our goal as researchers is to understand the obstacles on this path and to find ways of avoiding them. In this paper we report on some initial

experiments and analyses. As a data set we use the collection of the Patent Trial and Appeal Board (PTAB)

proceedings (https://bulkdata.uspto.gov/data2/patent/trial/appeal/board/). Given the relatively structured form of the

data available and the more streamlined process used in adjudication, we believe that PTAB data represents a unique

training corpus to develop and improve customized tools used in the areas of patent litigation and licensing.

Patent Trial and Appeal Board (PTAB) Data Sets
Post-grant review and Inter Partes Review (IPR) are conducted at the USPTO Patent Trial and Appeal Board (PTAB) and are aimed at reviewing the patentability of one or more claims in a patent. They begin with a third-party petition to which the patent owner may respond. A post-grant review is instituted if it is more likely than not that at least one challenged claim is unpatentable. If the petition is not dismissed, the Board issues a final decision within 1 to 1.5 years

(http://www.uspto.gov/patents-application-process/appealing-patent-decisions/trials/post-grant-review). Chien and Helmers (2015) discuss "Inter Partes Review and the Design of Post-Grant Patent Reviews" processes and key statistics, including the statistics of case dispositions. The USPTO notes that 80% of IPR reviews end with some or all claims invalidated (http://www.uspto.gov/patents-application-process/patent-trial-and-appeal-board/statistics).

What is in the PTAB data? The publicly available Patent Trial and Appeal Board (PTAB) dataset, as of May 2016, comprises 64 .zip files containing 10.5 GB of data. These files are either image or text .pdf files with PTAB decisions. Each decision pertains to the validity of claims of one patent.

Why care about PTAB data? Because each case has a relatively small collection of highly relevant documents used as evidence, the outcomes are clear, and the reasoning can be modeled. There is enough data for statistical inference (although perhaps not enough to train a neural net from scratch).

Roadblocks to Overcome for Effective Patent Retrieval and Analytics A range of issues need to be overcome for patent analytic solutions to be of greater value. They range from

document conversion (from images and pdf to semi-structured text), recognition of patent numbers and other named

entities, and text normalization, to supporting semantic and faceted search and predictive analytics. The problems of data preparation, even when non-trivial, are solvable, and predictive analytics based on metadata was discussed earlier. Thus the main open question we focus on is the value of semantic search. Earlier we showed how visualization can improve the quality of semantic search for relevant prior art patents (Shalaby et al., 2016; Shalaby and

Zadrozny, 2016). In this context, we evaluate our novel semantic search solution focusing on highly relevant prior

art based on PTAB data. The results we are presenting are preliminary but informative. We plan to repeat the experiments on larger data sets, and perhaps better quantify some of the conclusions, but we are not expecting substantial changes in the results.

Experimental Results
For two experiments, we use a sample of data from the USPTO Final Decisions of the Patent Trial and Appeal Board. We have taken ~90 patent pairs appearing in PTAB proceedings. We wanted to see to what extent we can use semantic search to find patents cited in the decisions. That is, given a patent whose validity was questioned,

will the other patents quoted in the decision appear among 100 semantic search results, and in what position?

Because of the limitations of our patent corpus we restricted ourselves to patents and applications published from

2002 to 2015. The searches used the text of the abstract combined with claim 1 as the query. The retrieval

results were filtered to make sure only the patents or applications published earlier were taken into consideration.

Experiment 1. We used 88 patent pairs appearing in PTAB proceedings. Each query consisted of the text of the

abstract and claim 1 of the patent under litigation at PTAB. We measured the position (rank) of the patents quoted in

the PTAB decision among the 1000 retrieved results. Averaging over different parameters of the MSA semantic search, we obtained the following: the overall recall was 21% (that is, a highly relevant patent has a 21% chance of being found among the 1000 retrieved documents). However, only 10% can be found among the top 100 retrieved results; 9% among the top 50;

and 8% among the top 25. The mean reciprocal rank for semantic search was 0.037, with no vetting of query terms by a human. The baseline for keyword search was 0.042. Thus neither keyword search nor semantic search gives us the capability to retrieve all, or even the majority of, the highly relevant documents.
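The metrics reported above, recall at a cutoff and mean reciprocal rank (MRR), can be computed as in the following sketch; the ranks used here are toy values, not our experimental data:

```python
# Recall@k and mean reciprocal rank (MRR) over ranked retrieval results.
# The ranks below are toy data for illustration, not our PTAB results.

def recall_at_k(ranks, total_relevant, k):
    """Fraction of relevant documents retrieved at rank <= k.
    `ranks` holds the 1-based rank of each relevant document that was
    found; relevant documents never retrieved are simply absent."""
    return sum(1 for r in ranks if r <= k) / total_relevant

def mean_reciprocal_rank(first_ranks):
    """Average of 1/rank of the first relevant hit per query;
    queries with no hit (None) contribute 0."""
    return sum(0.0 if r is None else 1.0 / r for r in first_ranks) / len(first_ranks)

# Three toy queries: first relevant hit at rank 10, rank 50, and not found.
print(mean_reciprocal_rank([10, 50, None]))   # (0.1 + 0.02 + 0) / 3 = 0.04

# One toy query with 5 relevant documents, found at these ranks (two missed).
print(recall_at_k([3, 27, 480], total_relevant=5, k=100))  # 2/5 = 0.4
```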

Experiment 2. To evaluate the semantic relatedness or “conceptual similarity” of the patent pairs in Experiment 1,

we utilized the MSA semantic search algorithm to generate a 20-concept representation of each of the patents. We did this using different parameters of the algorithm while utilizing the abstract and claim 1 of the patent text as one input scenario, and then using the first 5000 characters of the patent description as another input scenario. We then intersected the concept representations of the two patents to evaluate how "conceptually similar" they were. Across a range of algorithm parameters and input representations, we found that the fraction of pairs with conceptual overlap ranged from 40% to 80%. Using the highest number of overlapping concepts for a pair across trials, the extent of overlap varied significantly: fewer than 5% of the pairs had a perfect match (all 20 concepts the same), and over 50% of the pairs had at least 5 matching concepts. As a baseline reference, no overlap was

found between the concept representations of the abstract and claim 1 text and the Description text (first 5000

characters) in over 20% of the patents evaluated.
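The overlap computation in this experiment reduces to intersecting two fixed-size concept sets; a minimal sketch, with hypothetical five-concept representations standing in for the 20-concept MSA output:

```python
# Conceptual-similarity check between two patents, each represented by a
# fixed-size list of concepts. The concept lists below are hypothetical
# stand-ins for the 20-concept MSA representations used in the experiment.

def concept_overlap(concepts_a, concepts_b):
    """Return the set of shared concepts between two representations."""
    return set(concepts_a) & set(concepts_b)

patent_a = ["casting", "mold", "alloy", "cooling", "slurry"]
patent_b = ["alloy", "cooling", "extrusion", "die", "press"]

shared = concept_overlap(patent_a, patent_b)
print(sorted(shared))   # the concepts the two representations have in common
print(len(shared))      # size of the overlap

# A pair counts as having "conceptual overlap" if the intersection is non-empty.
print(bool(shared))
```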


We had hypothesized a much higher semantic or "conceptual" relatedness than was discovered. This negative result, using concept overlap as a measure of relatedness, suggests the need for additional approaches to capture the likely nuanced judgement of human practitioners in patent litigation cases.

Experiment 3. In this experiment we investigated whether deeper text analysis could help in identifying highly

relevant concepts, which in turn could automatically provide a better focus for a semantic search engine. We started

by parsing a small number (20) of patent claims (Claim 1) using both on-line and server-based parsing tools (Chen and Manning, 2014, ver. 3.5.2; https://opennlp.apache.org/ ver. 1.6; http://nlp.stanford.edu/software/lex-parser.shtml; http://demo.ark.cs.cmu.edu/parse; http://www.alchemyapi.com/; https://www.textrazor.com/demo; http://www.link.cs.cmu.edu/link/submit-sentence-4.html). Our initial impression seems to confirm extrapolations from the computational linguistics literature that parsing of patent claims would be challenging. None of the parses was correct; however, many segments were analyzed correctly.

We attribute these results to the fact that (a) leading parsers are not being developed using patents as the

underlying corpus of data, and (b) patent claims are long, so even with a small probability of an error per segment of text, the multiplicative effect of these errors is large.

Average sentence length in the Wall Street Journal Corpus is 19.3 words, ranging from 3 to 20 (Strzalkowski, 1999), and most natural language parsers are trained on similar corpora. In contrast, patent claims are on average much longer. For example, on a sample of about 4000 patents related to resilience and sustainability, the average

length of Claim 1 is 147.3 words, ranging from 7 to 1449, with a standard deviation of 91 words, as depicted in this

distribution:

Exhibit 1. Distribution of Claim 1 length in 4000 patents related to sustainability
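Summary statistics of the kind shown in Exhibit 1 (mean, range and standard deviation of Claim 1 word counts) can be computed with a short script; the claim texts below are toy placeholders, not patents from our sample:

```python
# Word-count statistics for Claim 1 texts. The claims below are toy
# placeholders, not patents from our resilience/sustainability sample.
import statistics

claims = [
    "A method comprising receiving data and storing the data in a buffer",
    "An apparatus for casting metal the apparatus comprising a mold and a cooling channel",
    "A system comprising a sensor a processor and a memory storing instructions",
]

# Length of each claim in whitespace-delimited words.
lengths = [len(claim.split()) for claim in claims]

print("mean:", round(statistics.mean(lengths), 1))
print("stdev:", round(statistics.stdev(lengths), 1))
print("range:", min(lengths), "-", max(lengths))
print("share longer than 10 words:",
      sum(1 for n in lengths if n > 10) / len(lengths))
```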

Note that 93% of the claims in this sample are longer than 50 words. We see similar results looking at a week of granted patents (5200 patents from 2015), where the average Claim 1 length is close to 190 words.

Prior research in this area shows that parsing accuracy decreases with sentence length. For

example, Fig. 2 in (McDonald and Nivre, 2007) shows a parsing accuracy drop of 10 points or more per 40 words. This means that an analysis of the structure of an average Claim 1 is likely to be wrong; similar results appear in Fig. 4 of (Choi et al., 2015). In fact, the situation might be worse than these sources suggest: an analysis of the parsing of sentences of up to 156 words by (Boullier and Sagot, 2005) entertains the possibility that "(full) parsing of long sentences would be intractable".

Semantic similarity. These experiments point to the difference between different kinds of semantic similarity. In particular, they indicate that the highly positive results from experiments with word similarity (Shalaby and Zadrozny, 2015) do not carry over to the semantic similarity of patent segments quoted in PTAB legal decisions.

Understanding the semantic similarity of patent segments is on the agenda of our future work, and we imagine it could be used to support textual entailment (Dagan et al., 2013) analysis on PTAB or litigation data. Thus it is an


open research problem how to make parsing of patent claims reliable, and similarly it is an open problem how to

combine deeper semantic analysis of claims with semantic search.

Conclusions: Towards a Multistage Approach in Semantic Search
Based on the results of our preliminary semantic search experiments and on interviews with practitioners, we believe that a one-size-fits-all semantic search approach is incapable of capturing the highly nuanced relevance

judgements made in the domain of patent litigation. We propose instead to map the patent litigation/licensing workflow to relevant multistage information-seeking models, for example the ISP model (Kuhlthau and Tama, 2001), to inform a multistage approach to search, building on user-driven semantic, pragmatic and interpretive considerations. In particular, we propose to:

● Experiment with different modes of taking the user through the search process - facet based filtering,

visualization of results, classification of claims content, etc.

● Evaluate options for incorporating user feedback into the operations of the relevance ranking of results -

e.g. clustering of output concepts, other recursive/iterative semantic tagging

● Establish the value of interactivity. Have a human use the tool to limit the scope of query expansion, search

documents and gather new statistics
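The interactivity item above can be sketched as a vetting loop in which a human accepts or rejects candidate expansion terms before the query is re-run. The candidate terms and the approval callback below are hypothetical placeholders, not part of our system:

```python
# Human-in-the-loop query expansion: a person vets candidate expansion
# terms before they are added to the query. The candidate terms and the
# vetting function are hypothetical placeholders.

def vetted_expansion(query_terms, candidates, approve):
    """Keep only candidate terms that the human approves.
    `approve` is a callback standing in for interactive review."""
    accepted = [term for term in candidates if approve(term)]
    return list(query_terms) + accepted

query = ["foundry", "casting"]
candidates = ["metal forming", "slurrying", "semiconductor fab"]

# Simulated reviewer: rejects the off-topic sense of "foundry".
approve = lambda term: term != "semiconductor fab"

print(vetted_expansion(query, candidates, approve))
```

In a real interface, the `approve` callback would be a practitioner's click-through decision rather than a hard-coded rule.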

The above will be pursued in the context of specific practitioner use cases, for instance considering applications of

our work to: ● Cognitive assistance for obviousness claim assessment based on relevant point-in-time knowledge proxies

(e.g., given the state of industry knowledge in 2003, an average skilled practitioner would know that a novelty in Bluetooth could be conceptually extended to WiFi)

● Claim-level predictive inputs for adjudication (probability of institution/denial), to inform filing and

defense strategies based on prior case law and technical knowledge inference.

● Petition quality assessment and drafting support based on rule checking and classification using prior PTAB

case history and expert guided feature engineering in context of prior rulings, statutes and guidelines.

All of this requires solving several problems, chief among them figuring out what to do in the subsequent turns of the interaction.

The other problem area is support for entailment analysis. PTAB data provides examples of entailment; that is, it gives examples of arguments invalidating one or more claims based on specific prior art. It also gives examples where the claims survive despite relevant prior art. These are examples of sophisticated textual entailment and reasoning processes. The question is to what extent (a) we can find relevant prior art, (b) find potential avenues of argument, and (c) help the lawyers explore and judge these avenues.

A research question to be asked is whether doing so will require completely new methods, or whether prior work on textual entailment (Dagan et al., 2013) and/or medical diagnosis support (Lally et al., 2014) might be relevant.

References
Allison, J.R., M.A. Lemley & D.L. Schwartz (2014). Understanding the Realities of Modern Patent Litigation. Texas Law Review 1769 (2014). Available at SSRN: http://ssrn.com/abstract=2442451
Boullier, P. & B. Sagot (2005). Efficient and robust LFG parsing: SXLFG. Proceedings of the Ninth International Workshop on Parsing Technologies (IWPT). http://www.aclweb.org/anthology/W05-1501
Chen, D. & C.D. Manning (2014). A Fast and Accurate Dependency Parser using Neural Networks. Proceedings of EMNLP 2014.
Chien, C.V. & C. Helmers (2015). Inter Partes Review and the Design of Post-Grant Patent Reviews. Santa Clara Univ. Legal Studies Research Paper No. 10-15. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2601562
Choi, J.D., J. Tetreault & A. Stent (2015). It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. http://www.aclweb.org/anthology/P/P15/P15-1038.pdf
Dagan, I., D. Roth, M. Sammons & F.M. Zanzotto (2013). Recognizing textual entailment: Models and applications. Synthesis Lectures on Human Language Technologies, 6(4), 1-220.
Eisinger, D. et al. (2014). Developing Semantic Search for the Patent Domain. Proceedings of the First International Workshop on Patent Mining and Its Applications (IPAMIN) 2014. Hildesheim.
Huurdeman, H.C. & J. Kamps (2014). From multistage information-seeking models to multistage search systems. Proceedings of the 5th Information Interaction in Context Symposium (IIiX '14). ACM, New York, NY, USA, 145-154. DOI=http://dx.doi.org/10.1145/2637002.2637020
Kuhlthau, C.C. & S.L. Tama (2001). Information Search Process of Lawyers: A Call for 'Just for Me' Information Services. Journal of Documentation.
Lally, A. et al. (2014). WatsonPaths: scenario-based question answering and inference over unstructured information. IBM Research RC25489 (WAT1409-048), September 17, 2014.
Mangold, C. (2007). A survey and classification of semantic search approaches. Int. J. Metadata, Semantics and Ontology, Vol. 2, No. 1, pp. 23-34.
McDonald, R. & J. Nivre (2007). Characterizing the Errors of Data-Driven Dependency Parsing Models. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. http://www.aclweb.org/anthology/D07-1013
Outsell (2014). Patent Research and Analysis Services (Outsell Report). Retrieved from http://ip.thomsonreuters.com/sites/default/files/m/outsell-patent-research-analysis-rankings.pdf
Schwartz, D.L. (2012). The Rise of Contingent Fee Representation in Patent Litigation. Alabama Law Review 335 (2012). Available at SSRN: http://ssrn.com/abstract=1990651
Shalaby, W. & W. Zadrozny (2015). Measuring Semantic Relatedness using Mined Semantic Analysis. arXiv preprint arXiv:1512.03465.
Shalaby, W., K. Rajshekhar & W. Zadrozny (2016). A Visual Semantic Framework for Innovation Analytics. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence.
Shalaby, W. & W. Zadrozny (2016). Innovation Analytics using Mined Semantic Analysis. Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference (FLAIRS-2016).
Strzalkowski, T. (ed.) (1999). Natural Language Information Retrieval. Springer.

About the Author(s)
Kripa Rajshekhar founded Metonymy Labs to develop automated text understanding systems that help knowledge workers perform 10X better. Previously, Kripa was the cofounder of EY's Corporate Finance Strategy practice, and the group's lead Tech Partner. He led over 150 projects for the world's leading corporations and private equity investment funds, helping them with transformative growth planning, M&A strategy development and business model diligence. Kripa holds an MSEE in Pattern Recognition (Statistical Signal Processing) from Rensselaer Polytechnic Institute.

Walid Shalaby is pursuing his PhD in Computer Science at the University of North Carolina at Charlotte. He graduated from Cairo University with a BSc degree in Computer Science. His work focuses on information retrieval and text mining of technical data. His research interests also include machine learning. He has published about a dozen papers on various problems in text analytics and data science. Walid has also worked as a data scientist at CareerBuilder.

Wlodek Zadrozny joined the faculty of the University of North Carolina at Charlotte in 2013, after a 27-year career at the IBM T.J. Watson Research Center. His research focuses on natural language understanding and its applications in business. From 2008 to 2013, he was responsible for textual resources in the Watson project, the Jeopardy! playing machine. He was also the technical leader in building the first application of Watson (for customer care). Dr. Zadrozny has published over fifty refereed papers on various aspects of text processing, and he is an author of over forty patents. Wlodek Zadrozny received his PhD in Mathematics (with distinction) from the Polish Academy of Sciences.