Xpriori Technology Assisted Review including Predictive Coding

"OrcaTec's powerful, visual tools, including predictive coding and concept search, have proven themselves to be time, cost and effort savers for P&G in an IP dispute. Ultimately, in addition to helping meet the deadline, using OrcaTec meant fewer, more responsive documents were reviewed  by fewer attorneys, providing P&G with a tremendous cost savings in that matter as well."

— Lynne M. Miller
Senior Counsel, Litigation
The Procter & Gamble Company

Featuring OrcaTec's Patented Near-Dupe Processing
Now with Redaction!



Technology Assisted Review has been independently validated.

See Maura R. Grossman and Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII RICH. J.L. & TECH. 11 (2011), jolt.richmond.edu/v17i3/article11.pdf. This article will change the way you think about eDiscovery and automated legal document review!

NEW: RAND Corporation report endorses the use of Predictive Coding. See http://www.rand.org/pubs/monographs/MG1208.html.


Use Predictive Coding to Scale and Speed the eDiscovery Process

With document productions often exceeding hundreds of thousands, even millions, of documents, manual review is too time-consuming and expensive. Studies show that manual review is far less effective than predictive coding, and it is vastly slower. Without predictive coding, you risk failing to manage discovery on a large dataset to any effective standard, and that can cost you the case.

Think of it. Predictive coding has been shown empirically to achieve more than four times the recall of manual review. Recall (usually measured by document count) is a measure of how much of the potentially relevant information in a dataset is actually found. Predictive coding has been shown to identify more than 80% of the potentially relevant information. By contrast, humans, even in iterative processes, were shown to identify no more than 20% of the potentially relevant documents. By coupling a disciplined methodology with predictive coding, recall can approach the elusive 100%.
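To make the recall measure concrete, here is a minimal Python sketch that computes recall by document count. The document IDs and the two review outcomes are hypothetical, chosen only to mirror the 80% and 20% figures quoted above; they are not data from any study.

```python
def recall(found_ids, relevant_ids):
    """Fraction of the truly relevant documents that a review actually found."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(relevant & set(found_ids)) / len(relevant)


# Hypothetical collection: documents 0-99 are the relevant ones.
relevant_docs = range(100)

# Hypothetical review outcomes (illustrative only).
manual_found = range(0, 20)       # manual review finds 20 of the 100
predictive_found = range(0, 80)   # predictive coding finds 80 of the 100

manual_recall = recall(manual_found, relevant_docs)
predictive_recall = recall(predictive_found, relevant_docs)

print(f"manual recall:     {manual_recall:.0%}")
print(f"predictive recall: {predictive_recall:.0%}")
```

Recall says nothing about how many non-relevant documents were also returned; that is a separate measure (precision), which this sketch does not compute.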

And predictive coding is only one aspect of Technology Assisted Review.

What is Technology Assisted Review?

For Xpriori, TAR is a broad term which encompasses Predictive Coding, Concept Search and Clustering, and (potentially) other algorithmically based technologies that associate files based upon content similarity. Such technologies are a significant improvement over earlier tools which only allowed users to search by keyword, name, proximity or file location.

Our Concept Search and Clustering fully automate the exploration of content and view content without regard to file location. In effect, folders and hierarchies are stripped away so the content of each document can be identified and compared without restriction. Context and other spatial concerns are brought back into the picture only after the content associations have been made.

Content is organized into new datasets to support an argument, help prove a case, or decide an issue. Such facts, status updates, and scenarios are unlikely to exist in the source documents, or in reports generated from the source data, simply because those documents were originally created for a different purpose.

Using Concept Searching and Clustering To Explore and Assess Content

Concept Searches

Search on a keyword, say Jones, and you find the documents that contain the word “Jones”. Do a concept search on the keyword and the system will associate the content of the “Jones” documents with the content of all other documents in the data set. Documents containing similar or associated content will be returned even if they do not contain the word “Jones”. Suppose there was an email from Smith to Williams about the same subject. That email would be returned as well.

Concept Searching technology obviates the need for external thesauri, taxonomies, or ontologies which, while educated, were just guesses anyway. Using its internal language model, the system can uncover previously unknown concepts, and the relationships among them, without guessing. Concept searching enables a more complete inquiry, allowing researchers to retrieve relevant information more quickly and with greater accuracy than keyword search alone.
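The Jones example can be sketched in a few lines of Python. This is only an illustration of the general idea, assuming a simple word-count comparison in place of the system's internal language model, which is not described here. The function names, the stop-word list, the sample documents, and the 0.3 threshold are all hypothetical.

```python
import math
import re
from collections import Counter

# Small hypothetical stop-word list so common words do not drive similarity.
STOP_WORDS = {"a", "an", "and", "for", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase a document and keep its non-stop-word tokens."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def concept_search(keyword, documents, threshold=0.3):
    """Return (keyword hits, related documents that lack the keyword)."""
    keyword = keyword.lower()
    vectors = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    seeds = [doc_id for doc_id, vec in vectors.items() if keyword in vec]

    # Pool the keyword documents into one content profile, minus the keyword itself.
    profile = Counter()
    for doc_id in seeds:
        profile.update(vectors[doc_id])
    profile.pop(keyword, None)

    related = [doc_id for doc_id, vec in vectors.items()
               if doc_id not in seeds and cosine(profile, vec) >= threshold]
    return seeds, related


documents = {
    "memo-1": "Jones approved the pricing change for the patent license",
    "email-2": "Smith to Williams: the patent license pricing change is approved",
    "note-3": "Lunch menu for the cafeteria on Friday",
}
seeds, related = concept_search("Jones", documents)
print("keyword hits:", seeds)
print("related by content:", related)
```

Here the Smith-to-Williams email is returned because it shares content with the Jones memo, even though it never mentions Jones, while the unrelated cafeteria note is left out.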



Clustering

The Xpriori system independently produces clusters of documents with associated content; the reviewer looks at the contents of each cluster and rates it as responsive or not. This is especially useful where the reviewer has little knowledge of the dataset. It has been used successfully where reviewers take the time to review several documents in each of the clusters and rate them. The difference between Clustering and Predictive Coding is that, in Clustering, the system does the first-pass review and independently creates the associations.

The system presents “clusters” in the form provided in the figure below. Reviewers select a “cluster” and review several of the documents presented. The clusters are presented in order based upon the number of associations recognized. Reviewers assess the clusters and retain those that are relevant.
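As a rough illustration of how documents can be grouped without any reviewer input, the sketch below clusters documents by word overlap and lists the largest cluster first. It is an assumption-laden stand-in, not the Xpriori algorithm: the greedy single-pass grouping, the Jaccard overlap measure, the 0.4 threshold, and the use of cluster size as the ordering are all hypothetical choices made for the example.

```python
def word_set(text):
    """The set of distinct lowercase words in a document."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(documents, threshold=0.4):
    """Greedy single-pass clustering: each document joins the first cluster
    whose founding document it resembles, or else starts a new cluster."""
    clusters = []  # each entry: {"seed": word set, "members": [doc ids]}
    for doc_id, text in documents.items():
        words = word_set(text)
        for cluster in clusters:
            if jaccard(words, cluster["seed"]) >= threshold:
                cluster["members"].append(doc_id)
                break
        else:
            clusters.append({"seed": words, "members": [doc_id]})
    # Present the clusters with the most members first.
    return sorted((c["members"] for c in clusters), key=len, reverse=True)


documents = {
    "d1": "quarterly budget forecast meeting",
    "d2": "patent license royalty terms",
    "d3": "patent license royalty draft",
    "d4": "patent royalty terms revised",
}
clusters = cluster_documents(documents)
for members in clusters:
    print(members)
```

A reviewer would then open a few documents from each cluster, starting at the top of the list, and keep or discard the cluster as a whole.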


Using Predictive Coding To Build Discrete Sets of Documents with Relevance

Reviewers are presented with randomly generated subsets of the document set, 100 documents at a time, and grade each document as responsive or not to particular issues. The system then uses the content of these graded exemplars to find similar documents elsewhere in the set and build sets of responsive documents. Exemplars from other data can be used as well.

As the review continues, the system begins to show its own prediction of the responsiveness of each document presented. Differences between the system's predictions and the reviewer's decisions are isolated and resolved by the reviewer. The reviewer is always in control. Results are presented in tables on a dashboard.


Typically, after a few hours of review on large document sets, the remaining differences are resolved and the reviewer can stop reviewing and rely upon the system.
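The review loop described above can be sketched as follows. This is a minimal illustration, assuming a tiny word-count classifier in place of the actual predictive coding engine; the class name, the scoring rule, and the sample batches are hypothetical, and the batches are far smaller than the 100-document batches used in practice.

```python
import math
from collections import Counter


class ExemplarModel:
    """Tiny word-count classifier standing in for a predictive coding engine."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}

    def learn(self, text, responsive):
        """Record one reviewer decision as an exemplar."""
        self.counts[responsive].update(text.lower().split())

    def predict(self, text):
        """Guess responsiveness from smoothed word log-odds."""
        vocab = len(set(self.counts[True]) | set(self.counts[False])) or 1
        totals = {label: sum(c.values()) for label, c in self.counts.items()}
        score = 0.0
        for word in text.lower().split():
            p_yes = (self.counts[True][word] + 1) / (totals[True] + vocab)
            p_no = (self.counts[False][word] + 1) / (totals[False] + vocab)
            score += math.log(p_yes / p_no)
        return score > 0


def review_batch(model, batch, reviewer_decision):
    """Run one batch; return the ids where system and reviewer differed."""
    disagreements = []
    for doc_id, text in batch.items():
        predicted = model.predict(text)
        actual = reviewer_decision(text)   # the reviewer always has the final say
        if predicted != actual:
            disagreements.append(doc_id)
        model.learn(text, actual)          # every decision becomes an exemplar
    return disagreements


def reviewer(text):
    """Hypothetical reviewer: anything about patents is responsive."""
    return "patent" in text


model = ExemplarModel()
first = review_batch(model, {
    "a1": "patent license dispute",
    "a2": "holiday party schedule",
    "a3": "patent infringement claim",
    "a4": "parking garage notice",
}, reviewer)
second = review_batch(model, {
    "b1": "patent royalty dispute",
    "b2": "holiday parking schedule",
}, reviewer)
print("disagreements in batch 1:", first)
print("disagreements in batch 2:", second)
```

In this toy run the system disagrees with the reviewer only on the very first document, before it has seen any exemplars, and agrees on the whole second batch. That drop in disagreements is the signal a reviewer would watch for before deciding to rely on the system.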




Social Sonar Visualization 

Social Sonar Visualization combines a timeline with social network analysis, connecting the senders and recipients of emails as a function of time as the emails were distributed. Red dots are senders; white dots are recipients; blue lines are emails; and concentric circles represent dates.


Social Network Visualization

Social Network Visualization automatically displays the communication patterns in emails.




▶ Search Suggestions automatically return a selectable list of words and phrases from the collection (including alternative spellings), significantly improving the “findability” of documents.

Near Dupe — Featuring OrcaTec's Patented Near-Dupe Processing

▶ Near Duplicate Identification brings together documents that are nearly identical, enabling users to compare and make faster, more consistent document decisions.
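
For readers who want a feel for how near-duplicate detection can work in general, here is a short Python sketch using word "shingles" (overlapping three-word sequences) and set overlap. This is a generic textbook technique shown under stated assumptions; it is not OrcaTec's patented method, and the function names, sample documents, and 0.5 threshold are hypothetical.

```python
from itertools import combinations


def shingles(text, size=3):
    """Set of overlapping word sequences ("shingles") in a document."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}


def near_duplicates(documents, threshold=0.5):
    """Pairs of document ids whose shingle sets overlap at or above threshold."""
    sets = {doc_id: shingles(text) for doc_id, text in documents.items()}
    pairs = []
    for a, b in combinations(sets, 2):
        overlap = len(sets[a] & sets[b]) / len(sets[a] | sets[b])
        if overlap >= threshold:
            pairs.append((a, b))
    return pairs


documents = {
    "v1": "please review the attached draft license agreement before friday",
    "v2": "please review the attached draft license agreement before monday",
    "v3": "the cafeteria will be closed on friday for maintenance",
}
pairs = near_duplicates(documents)
print(pairs)
```

The two drafts differ by a single word and are paired, so a reviewer can compare them side by side and code them consistently; the unrelated notice is not paired with either.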

Email Threading

▶ Email Threading organizes emails into conversations, revealing the context of the communication and reducing review time by 50% or more.
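
A simple way to picture email threading is to group messages by subject line after removing reply and forward prefixes, then order each group by send time. The Python sketch below does exactly that and nothing more. It is a simplified assumption for illustration: the field names and sample emails are hypothetical, and production threading typically also relies on message headers rather than subjects alone.

```python
import re
from collections import defaultdict

# Matches one or more leading "Re:", "Fw:" or "Fwd:" prefixes.
PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def thread_key(subject):
    """Normalize a subject line by stripping reply/forward prefixes."""
    return PREFIX.sub("", subject).strip().lower()


def build_threads(emails):
    """Group emails into conversations, each ordered by send date."""
    grouped = defaultdict(list)
    for email in emails:
        grouped[thread_key(email["subject"])].append(email)
    return {key: [e["id"] for e in sorted(msgs, key=lambda e: e["sent"])]
            for key, msgs in grouped.items()}


emails = [
    {"id": "m3", "subject": "RE: Fwd: License terms", "sent": "2012-03-03"},
    {"id": "m1", "subject": "License terms", "sent": "2012-03-01"},
    {"id": "m4", "subject": "Holiday schedule", "sent": "2012-03-02"},
    {"id": "m2", "subject": "Re: License terms", "sent": "2012-03-02"},
]
threads = build_threads(emails)
for subject, ids in threads.items():
    print(subject, ids)
```

With the conversation assembled in order, a reviewer can read the thread once from start to finish instead of meeting each message separately.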



Please contact us for more information, or register to get access to the product demonstration.