Wednesday, November 27, 2013

Data Analytics Steal the Show at DC Technology in the Law Symposium

I was delighted to serve as a panelist at the Technology in the Law Symposium, held earlier this month by the DC Bar Association and McDermott Will & Emery, LLP.  Three panels discussed predictive technologies and analytics and their use in the courtroom, in eDiscovery and well beyond.  Panelists ranged from outside counsel litigators, to DOJ government attorneys, to service providers and consultants, with Hon. John Facciola presenting the keynote.  The lively, and at times contentious, event featured three broad topics:  What the Courts are Saying About Predictive Coding, Predictive Coding Pessimists v. Optimists, and the Use of Data Analytics in Other Areas.

“There’s no greater compliance training than grand jury subpoena.” John Kocoras, Partner, McDermott Will & Emery

While much of the early discussion around Predictive Coding was, well, somewhat predictable by now (key messages:  PC is good, it is proven for doc review, courts are on board, don’t be reckless with it), the panelists and audience really became animated about the uses of predictive analytics beyond simple relevance review.  Panelists John Kocoras (MWE), Kristian Werling (MWE), Sandra Serkes (Valora) and Kurt Michel (Content Analyst) described scenarios where predictive technologies were being used in multiple corporate settings to assess M&A documents, contracts or financial statements and to seek out areas of corporate compliance exposure.  At one point, moderator Jason R. Baron (Drinker Biddle) jokingly asked the panel whether such technologies could accurately predict legal case outcomes.  Answer:  Yes, within reasonable margins of error.

In all, the inaugural Symposium was a roaring success for the 150+ attendees from all over Washington, DC and beyond.  MWE is hoping to expand on their initial success and present several more related symposia in 2014.  Well worth the free attendance!

Thursday, October 17, 2013


Some people might wonder why a “lit support vendor” would be attending the ARMA National Conference in Las Vegas.  Truth is, Valora’s capabilities have been exceeding “lit support” for a long time.  We find kindred spirits in ARMA, because we are looking at the larger world of corporate documents – for lots of purposes, litigation being just one of them.  In the last 18 months, we have seen tremendous convergence of traditional litigation and eDiscovery with Records Information Management and Information Governance.  In fact, last month I gave a presentation to the NYC chapter of ARMA on “5 Things Litigation Can Teach Records Management and 5 Things You Can Teach Them.”  (Let me know if you’d like a copy of the slides.)  We are going to see more and more of this kind of global information management, where litigation is but one use of an organized and controlled data governance strategy.  Watch this space for more on this topic in the weeks to come.

Thursday, October 3, 2013

25 Cool Valora Things

I am often asked, “what’s the coolest thing Valora has ever done?”  That’s a toughie because Valora does a lot of cool things and I would be hard-pressed to pick just one.  Having just gotten yet another totally awesome request yesterday, I decided to compile a list of The 25 Coolest Things Valora Has Ever Done.  

If you think we forgot some, email me.  And, if you’d like to learn more about any of the real-world scenarios on that list, just email or call.  We’d be happy to share our stories with you (to the extent we are able).

And, finally, here's a bonus cool thing:  an Auto-Generated Word Cloud for the content on this page.

  1. Re-orient and AutoCode documents presented in "mirror writing"
  2. Capture the Japanese "Showa" Date off of documents
  3. Assess long distance spending habits by analyzing multiple years of corporate phone records
  4. Create metadata for (paper) documents from 1901 - 1925, including Near Dupes
  5. Analyze credit card receipts to determine his & hers spending habits for a high profile divorce
  6. Automatically determine if documents are Classified
  7. Identify buildings by address, building number or building name (e.g., "Trump Tower")
  8. Index video files, with generated stills that correspond to key phrases & topics
  9. Uncover an "inappropriate relationship" within standard business communications
  10. Identify likely missing documents from email chains, custodians and shared drives
  11. Code work product documents that included Valora invoices and emails in them (talk about recursive self-reference!)
  12. Determine which applicants were lying on their hiring applications
  13. Translate documents among Japanese, German, French & English, from each language into the other 3
  14. Identify bodies of water in documents
  15. Analyze shipping records to identify unusual purchasing behavior
  16. Select "best" versions from multiple reports and coverage of the same event
  17. Audit the results of Onshore Doc Review vs. Offshore Doc Review vs. AutoReview
  18. Determine what type of information was likely underneath document redactions (blackouts)
  19. Identify the cell phone of an NBA player
  20. Match 25,000 index cards with appropriate database records
  21. Redact out ages of minors (no redactions for 21+)
  22. Review documents for 162 unique "Issues"
  23. AutoUnitize a 300,000 page PDF into "logical" documents
  24. Graph potential smuggling routes based on email traffic and news reporting
  25. Index 30 million records in 3 months (that's over 300,000 records every 24 hours)

Wednesday, July 17, 2013

Specialized Knowledge, Skill, Training and Education

This entry is provided by guest blogger, Aaron Goodisman, Valora’s Chief Technology Officer.

Oh, I feel for D4; I really do. Let me explain:

Law Technology News reports on a case in which D4 Discovery acted as litigation support vendor for both defendant and plaintiff, albeit at different times and performing different functions. Naturally, when defendants Nixon Peabody (working for Kaleida Health) found out, they objected to U.S. Magistrate Judge Leslie Foschio, but he refused to disqualify D4 as a vendor for the plaintiffs.

Sounds like a win for D4, no? As a vendor with many clients in the litigation support space, Valora doesn't like to turn away work any more than the next guy. And, as professionals with over a decade of experience in the legal field, we could, I'm confident, maintain appropriate walls of confidentiality between project teams, as D4 asserts it has done.

The problem lies in Judge Foschio's rationale for the refusal to disqualify. What the judge essentially said is that D4's scanning and objective coding for Nixon Peabody does not include expertise or consulting, and that it did not expose D4 to any confidential information about the case or Nixon Peabody's case strategy. As an experienced scanning and coding provider, I can say that this is simply incorrect.

“Objective” coding refers to tagging documents with information that can be objectively determined, without rendering any kind of opinion. In this regard, at least, Judge Foschio's rationale makes some sense. That type of information is sufficiently objective that Valora uses software to determine it for most documents. No opinions there.

On the other hand, the design of a scanning and coding project is absolutely a consulting activity: which information is captured for which types of documents, which collections get extra information tagged, which are fast-tracked, which get an extra quality control pass, and how the various containment and attachment relationships are captured among documents, folders, binders and boxes. To an experienced litigation support person, those decisions speak volumes about the case.

For proper and accurate scanning and coding to have occurred, D4 had to have access to, and indeed had to look at, every single one of the documents in the case, including any that Nixon Peabody later withheld as privileged.

Again, I have no reason to believe that D4 violated their confidentiality responsibilities to either party, nor does it appear that Nixon Peabody is claiming that. Rather, what's happening here is that a judge has said that the services D4 provides do not require “specialized knowledge, skill, training or education.” That's just wrong.

Valora's clients come to us precisely because we provide those things. Kaleida continues to maintain that D4 should have been disqualified from working with the plaintiffs. I'm sure it's standard legal practice, but it feels like somebody's defending the value of such services, at least a little.

Thursday, May 9, 2013

Technology-Assisted Essay Grading

The NY Times recently reported on the growing use of automated essay grading systems, what we in the legal & records space might call "Technology-Assisted Grading," or "TAG." This is yet another instance of the rest of the world utilizing predictive technologies in conjunction with statistical pattern-matching to create an ultimately subjective judgment of the content of a document. Even more interesting than the fact that MIT & Harvard are making this technology available for free via edX (my alumni donations at work??) is that the higher education community is having the same heated arguments that are occurring right now in the legal arena. Here is the best comment from the piece:

"Although automated grading systems for multiple-choice and true-false tests are now widespread, the use of artificial intelligence technology to grade essay answers has not yet received widespread endorsement by educators and has many critics."

Sound familiar? It should. This is exactly the argument now being made by outside counsel (playing the part of professors in the article) attempting to hold onto their turf, once considered "untouchable" by technology. While it is true that computers can't "read" either student essays or litigation emails, they can be trained to recognize the salient elements that make the essay strong or the litigation email privileged. Those traits are easily describable as Rules. Either a document fits the Rules, or it doesn't. Nuances are accounted for with confidence scoring and sampling for accuracy (precision & recall). As long as there is sufficient auditing and exception handling, the work quality should be outstanding at a fraction of the time and expense of the purely manual method.
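For readers who have not worked with precision and recall, here is a minimal Python sketch of how they might be computed from an audited sample. The function name, the sample data and the idea of pairing each automated call with a human reviewer's call are assumptions made for illustration; this is not a description of any particular vendor's method.

```python
def precision_recall(audited_sample):
    """Compute precision and recall from (predicted, actual) boolean pairs.

    predicted: the automated call (e.g., "privileged" = True)
    actual:    the human auditor's call on the same document
    """
    true_pos = sum(1 for predicted, actual in audited_sample if predicted and actual)
    false_pos = sum(1 for predicted, actual in audited_sample if predicted and not actual)
    false_neg = sum(1 for predicted, actual in audited_sample if not predicted and actual)

    # Precision: of the documents the system flagged, how many were right?
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: of the documents that should have been flagged, how many were found?
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical audit of six sampled documents: (system said, auditor said)
audited = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False),
]
precision, recall = precision_recall(audited)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In practice the sample would be drawn at random and be far larger than six documents; the point is only that both measures fall out of a simple comparison between the automated calls and the audited ones.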

If recent advances have taught us anything, it is that nothing, and certainly no job function, is immutable. It doesn't matter whether the work task is rote (like tightening bolts), cerebral (like computation) or subjective (like analysis); it can all be done by the right algorithms, utilizing the proper training, feedback and statistical sampling.

Furthermore, when subjective work product is automated, society gains impartiality, consistency, speed and reduction of cost for the same services. That allows us to do more, work faster and create better results with fewer resources – the very definition of progress.

It is time to stop fighting the obvious, accept the reality, incorporate the efficiency gain and move on. I'm ready for my essay grade, please.

Thursday, March 14, 2013

Sprechen sie deutsch? Parlez-vous français? You do now!

Remember in Star Trek when the "away team" would encounter a new civilization and be instantly able to communicate with the alien beings by using their handy "Universal Translator"? No Tower of Babel in science-fiction! Well, there needn't be one in today's document environment either! With the recent great strides in pattern recognition and content translation, we effectively have a Universal Translator for documents and files written in virtually any world language. With support for 65 world languages, Google is to thank for the raw translation effort, while Valora has taken things to the next level by implementing the raw capabilities into complex litigation and records management workflows.

For example, on a recent matter we rapidly AutoTranslated documents from 5 foreign languages into English, so that they can now be easily understood and managed by the US litigation team. The whole effort took under a week and was 1/10th the cost of manually translating the same material! As with most Automated Solutions Valora offers, a little technology goes a long way! Learn about Valora's AutoTranslation services by clicking here.

Wednesday, January 9, 2013

12 Tips To Get The Most Out of Technology-Assisted Review ("TAR")

  1. Decide which TAR approach best fits your needs and how you plan to deploy the solution: Do you want the seed set, predictive coding approach or the pattern-matching, rules-based approach? Seed set is good if you don't really know what you want, or you like to "decide on the fly." Rules-based is good if you know what you're looking for and can explain it (similar to how you would train contract attorneys for a large-scale review).

  2. Similarly, do you want TAR as a service or do you want to install a product? Products are typically lower-cost, but less full-featured or customizable to your specific needs. Services typically cost more, but usually include expert analysis and consulting as part of the package. One consideration in product vs. service is how frequently you encounter a need for TAR and how similar each instance is to the next. Higher frequencies would lead you towards a product, but low similarities across needs would lead you towards services. Remember to include both hard costs (typically dollars outlaid) and soft costs (such as training time and expenses, storage needs, platform support, etc.) in your analysis.

  3. Be realistic about how much you will rely on the coding performed by the tool or process and what level of QC you will require. Will you eventually have "eyes on" every document or will you only put "eyes on" subsets of the documents based on relevance or issue criteria? Understanding this early will help you to make the right decisions on pricing, implementation and staffing.

  4. Get comfortable with pricing metrics conversions. Some solutions are sold per document or file, some per GB and some per hour. Here's how to translate between those metrics. Assume: ~ 6,000 docs/files per GB (post processing), and ~ 50 docs/files reviewed per person per hour. Now you can compare pricing for different solutions!

  5. Be explicit about your needs. Do you want a simple yes/no answer for privilege or do you want to know which types of privilege are being invoked? Ex: attorney-client vs. work product. Same for relevance. Is it enough to know simply that a document is relevant or do you need to know why it is relevant (and/or to what degree)?

  6. Map out your workflow and strategy. You (or your client) will need to defend your document production approach. For maximum defensibility, make sure your process is repeatable and transparent. Be wary of TAR solutions that do not disclose why or how propagated decisions are made. Be similarly cautious of solutions that yield different results when different people are "manning" them. Furthermore, make sure that the provider will back you up by providing tangible proof to support the defensibility of the process.

  7. Understand that TAR is an iterative process. The more guidance and feedback you provide, the stronger the results will be. Do not expect the first round to be perfect. You and the systems will both get better over time. As a rough rule of thumb, expect 4-5 iterations.

  8. Think about Exception Handling. Even the best TAR solutions will encounter "problematic" documents from time to time. How will you handle hand-written documents, custom application files or documents written in foreign languages? A good TAR solution should be able to easily identify the docs/files it cannot handle and remove them from the automated processing queue. In other words, don't pay twice for documents that will ultimately need manual processing.

  9. Make good use of Issue Codes. Most sophisticated TAR solutions can handle multiple Issue Codes, providing very helpful tagging and organizational information for Hot or Responsive documents. A consultative TAR solution provider can help you maximize your Issue Codes protocol so that it complements and enhances the production.

  10. Be aware of potential privacy concerns. Many document collections have sensitive or personally identifying information (PII) in their contents that cannot be openly shared. Sophisticated TAR techniques identify, cull and/or automatically redact this information prior to production. TAR approaches can save many hours of manual effort to cleanse data for production.

  11. Choose your solution carefully. Expect that your needs will change over time, both in general and across a single matter. Ideally, the solution provider has full, unfettered access to the TAR engines, so that they can be custom-tailored to your (or your client's) exact circumstances. Be wary of "one size fits all" solutions.

  12. TAR Beyond Document Productions. TAR has uses far beyond review for responsiveness and privilege. Consider utilizing the techniques when you (or your client) are on the receiving end of a large volume of data. TAR processes can be extremely cost-effective at organizing, cataloging and identifying trends and data threads in incoming material.
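
The pricing conversion in Tip 4 is easy to put into a few lines of Python. This is only a sketch: the two conversion factors are the rough assumptions stated in that tip (about 6,000 docs/files per GB post-processing, about 50 docs/files reviewed per person per hour), and the vendor names and prices are hypothetical numbers invented for the example.

```python
# Rough conversion factors from Tip 4 (rules of thumb, not measured values).
DOCS_PER_GB = 6_000    # docs/files per GB, post-processing
DOCS_PER_HOUR = 50     # docs/files reviewed per person per hour


def per_gb_to_per_doc(price_per_gb):
    """Convert a per-GB price into an approximate per-document price."""
    return price_per_gb / DOCS_PER_GB


def per_hour_to_per_doc(hourly_rate):
    """Convert an hourly rate into an approximate per-document price."""
    return hourly_rate / DOCS_PER_HOUR


def per_doc_to_per_gb(price_per_doc):
    """Convert a per-document price into an approximate per-GB price."""
    return price_per_doc * DOCS_PER_GB


# Hypothetical quotes, for illustration only.
quotes = {
    "Vendor A (quoted per GB)": per_gb_to_per_doc(300.00),
    "Vendor B (quoted per hour)": per_hour_to_per_doc(60.00),
    "Vendor C (quoted per doc)": 0.10,
}

# Print the quotes on a common per-document basis, cheapest first.
for name, cost in sorted(quotes.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.2f} per document")
```

If your own collections run denser or sparser than 6,000 docs per GB, or your reviewers move faster or slower than 50 docs per hour, change the two constants at the top and the comparison adjusts accordingly.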