
Will "predictive coding" really trim legal bills?

Fact of life: Banks get sued. New method may help, but it's not perfect

  • Written by  Karen L. Stevenson, Esq. with J. Peter Del Valle


    Automated "predictive coding" of records subject to legal discovery doesn't really cut out the human factor completely. Sometimes, the old-fashioned way may be more effective.

"Big Data" is the latest buzzword at the forefront of e-discovery discussions. But "Big Data" is not just for "techies." Lawyers have discovered it too.

Massive quantities of electronically stored information, created on a myriad of digital devices, have enabled institutions to accumulate an unprecedented volume of information. But it's one thing to have all this electronic data; it's quite another to have effective ways of using the information--especially when litigation arises. Big Data can mean a big headache when bankers and their counsel must search for, analyze, and produce potentially relevant information in discovery.

That's where "predictive coding" comes in.

Predictive coding can be a powerful tool to assist the search for relevant information in huge data sets and, more importantly, as a hedge against spiraling discovery costs by reducing the number of hours of individual attorney review needed to analyze large volumes of data. But this isn't as simple as uploading materials to the cloud for evaluation.

What is predictive coding? 

Predictive coding is a computer-assisted review process that uses algorithms to identify, organize, and prioritize documents based on their relative responsiveness to discovery requests and designated issues in litigation. (Discovery, in this context, is the fact-finding process of providing documents and records to the other side in a lawsuit, at its request and according to the parameters of that request.)

Now, if all that sounded like gibberish, here's a simpler explanation of the process: 

Traditional linear review for discovery only returns documents found by basic keyword searches. Predictive coding technology, on the other hand, automatically categorizes and prioritizes documents based not only on keyword frequency, but also on other factors such as document type, language, content, party, timeframe, and conceptual meanings to cluster and prioritize similar documents.

As a result, the most relevant, responsive documents are ranked for attorney review at the outset, while otherwise non-responsive and irrelevant groups of documents are culled from the review process.

Predictive coding addresses the shortcomings of traditional manual document review and keyword searches by combining an automated review process with a very involved human review element.

How predictive coding works

Algorithms used for predictive coding vary in sophistication, but their operation is similar. First, the coding tool creates a "seed set" of documents from a small randomized representative sampling of documents. Here the initial human touch becomes essential.
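For readers who want to see the idea in concrete terms, here is a minimal sketch of drawing a random seed set. It is an illustration only, not any vendor's method: the document IDs, the sample size of 500, and the function name `draw_seed_set` are all hypothetical.

```python
import random

def draw_seed_set(doc_ids, sample_size, rng_seed=0):
    """Return a simple random sample of document IDs for human coding."""
    rng = random.Random(rng_seed)  # fixed seed so the same draw can be reproduced later
    sample_size = min(sample_size, len(doc_ids))
    return sorted(rng.sample(list(doc_ids), sample_size))

# Hypothetical collection of 10,000 document IDs.
collection = [f"DOC-{i:05d}" for i in range(1, 10001)]

# Assumed seed-set size; real projects choose this with their vendor and counsel.
seed_set = draw_seed_set(collection, sample_size=500)
print(len(seed_set), "documents selected for reviewer coding")
```

Fixing the random seed is a design choice made here so the selection can be repeated and explained if the other side questions how the sample was drawn.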

A team of reviewers evaluates the seed set and codes the documents for responsiveness. Using the input from the seed set reviewers, the algorithms can begin to assign predictions of responsiveness to the remaining documents in the database. This step is critical. The coding assigned to the seed set of documents--after the parameters have been entered into the computer to cull iterative sets of documents--serves as the basis for the software to "teach itself" to identify similar types of documents.
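The following sketch shows, in miniature, how reviewer coding on a seed set can become predictions for documents no one has read yet. It uses a basic word-counting technique (naive Bayes), chosen here only because it is easy to follow; commercial tools use their own, typically more sophisticated, algorithms. The sample documents and all function names are invented for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(seed_set):
    """seed_set: list of (text, is_responsive) pairs coded by human reviewers."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def responsiveness_score(text, model):
    """Return an estimated probability (0 to 1) that a document is responsive."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_score = {}
    for label in (True, False):
        # Start from the share of seed documents with this label (add-one smoothed).
        log_p = math.log((doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in the seed set are ignored
                log_p += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_score[label] = log_p
    # Turn the two log scores into a probability; clamp to avoid overflow.
    diff = max(min(log_score[False] - log_score[True], 700), -700)
    return 1 / (1 + math.exp(diff))

# Hypothetical seed set: reviewers marked each document responsive (True) or not (False).
seed = [
    ("loan covenant default notice", True),
    ("loan modification agreement borrower", True),
    ("office holiday party lunch", False),
    ("fantasy football lunch pool", False),
]
model = train(seed)

# Hypothetical documents that no reviewer has seen yet.
unreviewed = {"A": "borrower default on loan", "B": "lunch party friday"}
scores = {doc_id: responsiveness_score(text, model) for doc_id, text in unreviewed.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Even in this toy version, the quality of the output depends entirely on how carefully the seed documents were coded, which is why the human step matters so much.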

Predictive coding tools generally include an iterative function that allows the program to refine the categorization process along the way. If the initial prediction calls are inadequate, the tool will identify further samples for human review in order to refine the coding criteria. This process continues until the tool can categorize most of the documents in the data set with relative confidence. Once this point is reached, the algorithm can then score or rank the documents in the review set based on likely responsiveness. But this is not the end of the review.
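The back-and-forth between tool and reviewers can be sketched as a simple loop. Everything in this example is an assumption made for illustration: the toy scoring rule, the thresholds of 0.3 and 0.6 that define an "uncertain" prediction, the batch size, and the sample documents. It starts with the first two documents rather than a random sample to keep the example short.

```python
def tokens(text):
    return text.lower().split()

def train(coded_texts):
    """coded_texts: list of (text, is_responsive) pairs from human reviewers."""
    model = {True: set(), False: set()}
    for text, is_responsive in coded_texts:
        model[is_responsive].update(tokens(text))
    return model

def score(model, text):
    """Toy responsiveness estimate between 0 and 1; 0.5 means the tool has no basis to guess."""
    words = tokens(text)
    resp_hits = sum(w in model[True] for w in words)
    nonresp_hits = sum(w in model[False] for w in words)
    return (resp_hits + 1) / (resp_hits + nonresp_hits + 2)

def iterative_review(documents, human_code, batch_size=2, low=0.3, high=0.6, max_rounds=10):
    coded, scores, rounds = {}, {}, 0
    batch = list(documents)[:batch_size]  # initial seed batch
    while batch and rounds < max_rounds:
        rounds += 1
        for doc_id in batch:
            coded[doc_id] = human_code(doc_id)  # a reviewer codes each sampled document
        model = train([(documents[d], label) for d, label in coded.items()])
        scores = {d: score(model, t) for d, t in documents.items() if d not in coded}
        # Documents the tool cannot call with confidence go back to the reviewers,
        # least certain first.
        uncertain = [d for d, s in scores.items() if low < s < high]
        uncertain.sort(key=lambda d: abs(scores[d] - 0.5))
        batch = uncertain[:batch_size]
    return coded, scores, rounds

# Hypothetical documents and the calls a reviewer would make on them.
documents = {
    "D1": "loan default notice",
    "D2": "lunch menu friday",
    "D3": "loan covenant waiver",
    "D4": "friday lunch order",
    "D5": "wire transfer dispute",
    "D6": "parking garage closed",
}
reviewer_calls = {"D1": True, "D2": False, "D3": True, "D4": False, "D5": True, "D6": False}

coded, scores, rounds = iterative_review(documents, reviewer_calls.get)
print("Coded by reviewers:", sorted(coded))
print("Scored by the tool:", scores, "after", rounds, "rounds")
```

In this run the reviewers code four of the six documents, and the tool scores the remaining two on its own once none of its predictions falls in the uncertain range.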

With a data set that has been greatly reduced from its initial volume and ranked to identify the most likely responsive materials, attorneys, skilled in the facts and issues of the case, will then conduct a review of this reduced data set for privilege, responsiveness, and, ultimately, production. By prioritizing relevant documents for review, predictive coding can increase the speed and consistency of the attorney review and thereby reduce the costs the bank would face for legal discovery.

Pitfalls include training and technology costs

Significant cost savings can be achieved by identifying and culling out patently non-responsive documents. These include, for example, emails from non-business, non-relevant, and "spam" senders and non-relevant file types. This would be done prior to application of the predictive coding technology. While upfront costs depend upon the vendor used and the volume of data to be analyzed, the significant reduction of the attorney review time greatly reduces costs.

If Gabriel Technologies Corp. v. Qualcomm Inc. is any indication of the costs associated with predictive coding, a manual review may, in some instances, be more cost-effective. [Your counsel can find the case as follows: 2013 WL 410103 (S.D. Cal. Feb. 1, 2013), Case Number 08cv1992 AJB MDD]

In that case, the court awarded Qualcomm $2.8 million for fees associated with computer-assisted, algorithm-driven document review (predictive coding) and $392,000 for contract attorneys who reviewed the documents that the predictive coding technology determined were likely to be responsive based on the training it received. Note that that award of close to $3.2 million has been widely cited as an example of the "costs" of predictive coding. It does not include the cost of the front-end work necessary to make the predictive coding possible, for which Qualcomm incurred fees with its outside counsel, which developed and implemented the predictive coding solution.

Qualcomm, through its vendor, had a population of close to 12 million records and used predictive coding technology (in lieu of search terms or other more traditional approaches) to cull down the data set. A subset of documents was subjected to human review for further analysis (i.e., relevance), and the relevant grouping of documents was then used as the "seed set" for predictive culling. The result: fewer than 10% of the documents in the entire population were determined likely to be responsive.
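The figures quoted above can be checked with a few lines of arithmetic. The inputs are the rounded numbers from this article ("$2.8 million," "$392,000," "close to 12 million," "less than 10%"), so the results are approximations rather than exact amounts from the court's order.

```python
# Rounded figures as quoted in this article (not exact amounts from the order).
algorithm_review_fees = 2_800_000    # computer-assisted document review
contract_attorney_fees = 392_000     # human review of likely responsive documents
total_award = algorithm_review_fees + contract_attorney_fees

population = 12_000_000              # "close to 12 million records"
responsive_ceiling = population * 10 // 100  # "less than 10%" gives an upper bound

contract_share = contract_attorney_fees / total_award

print(f"Total award: ${total_award:,}")
print(f"Contract attorneys' share of the award: {contract_share:.1%}")
print(f"Likely responsive documents: fewer than {responsive_ceiling:,}")
```

The total comes to $3,192,000, which matches the "close to $3.2 million" figure, and the responsive set tops out at roughly 1.2 million documents out of 12 million.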

Vendor selection is another crucial element in the process. The legal and financial consequences for mishandling electronic data are quite severe, as reflected in the rise of spoliation and discovery sanctions opinions. Yet, vendors have been quick to jump on the predictive coding bandwagon, only to have clients find that the offered solutions may fall far short of the expected results.

Cost considerations, while important, should not be the only driving force in vendor selection.

Experience with the software, the professional backgrounds of the parties handling the data, and a proven track record of working with populations of complex documents and complex clients must all be carefully considered when evaluating the capabilities of a vendor and the likelihood of receiving the expected value for the services rendered.

Predictive coding doesn't eliminate attorney review

Predictive coding requires significant human involvement, both in training the technology and evaluating its results. It does not replace attorney review, but is designed to expedite the process and cull the number of documents subject to attorney review. As with any technology, the results of predictive coding will reflect the quality of the human input that teaches the algorithms.

Some remain hesitant to utilize predictive coding as a tool for large-scale document reviews, preferring the traditional manual review of all documents, usually by legions of contract attorneys. But in February 2012, Magistrate Judge Andrew M. Peck became the first judge, state or federal, to approve the use of computer-assisted coding, in Monique Da Silva Moore v. Publicis Groupe & MSL Group, S.D.N.Y. Case No. 11 Civ. 1279 (ALC)(AJP). After Judge Peck's Da Silva Moore opinion, skepticism about using computer-assisted review may be waning.

As "Big Data" continues to expand, more and more parties are beginning to look to predictive coding technology as a means of grappling with the scope and cost of massive data review in litigation. Still, predictive coding is not a "one-size-fits-all" solution. Not every case involves large enough volumes of data to warrant the use of the technology or its associated costs.

And even when the volume warrants predictive coding technology, the lawyers and vendors you hire to do it must know how to do it right.

About the authors

Karen L. Stevenson is of counsel with the Litigation Practice of Buchalter Nemer, Los Angeles, Calif. She represents clients in e-discovery, complex business litigation, and insurance coverage matters.

J. Peter Del Valle is Practice Supporter Specialist for Buchalter Nemer, Los Angeles, California.

[This article was posted on April 19, 2013, on the website of Banking Exchange, and is copyright 2013 by the American Bankers Association.]
