Monday, November 12, 2012

Redaction? There’s an App for That.

Raise your hand if you've either participated in or managed a group of contract attorneys sitting in rows of cubicles (or "stations") redacting documents for production. Whether your tool of choice was a black marker or a cursor, I'm betting it was a thankless, tedious, painstaking job and you hated it. Wouldn’t it have been nice if you could've taught the computer how to recognize the patterns of PII (personally identifiable information) and had it make the redactions itself?

Well, guess what: Valora heard your anguished cries, and we've built an AutoRedaction engine that rivals any group of manual redactors. With blazing speeds, impressive accuracy and astounding savings, our PowerHouse™ system literally autoredacts paper and ESI documents in seconds.

Capitalizing on our extensive experience with pattern-matching technology[1], Valora has built a custom software program that automatically determines the presence of sensitive PII, confidential or privileged information, and then redacts that information on the image. Each AutoRedaction takes the form of a black block, with or without a representative stamp, such as "Redacted" or "Employee 123." Redactions can be made permanent, such as for production purposes, or kept temporary, with a technique for "lift and peek," when desired. Redactions can also be made to the underlying text, or to both text and image, if desired.
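To make the idea concrete, here is a minimal sketch of pattern-based redaction on the underlying text. It is not Valora's engine: it swaps in plain regular expressions for the probabilistic grammars mentioned above, and the names `PII_PATTERNS`, `find_pii` and `redact` are hypothetical, chosen only for illustration. Real PII detection would need far more robust patterns than these.

```python
import re

# Hypothetical, deliberately simple patterns (an assumption for illustration).
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_pii(text):
    """Return sorted (start, end, label) tuples for every pattern hit."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), label))
    return sorted(hits)


def redact(text, stamp=None):
    """Cover each hit with a black block, or with a stamp such as "Redacted"."""
    pieces = []
    pos = 0
    for start, end, _label in find_pii(text):
        if start < pos:
            # Skip a hit that overlaps one already covered.
            continue
        pieces.append(text[pos:start])
        if stamp is not None:
            pieces.append("[" + stamp + "]")
        else:
            # One block character per redacted character keeps the layout.
            pieces.append("\u2588" * (end - start))
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces)


sample = "SSN 123-45-6789, call 555-867-5309."
print(redact(sample))
print(redact(sample, stamp="Redacted"))
```

Because `find_pii` leaves the original text untouched and only reports positions, a caller could keep the original alongside the hit list to support a temporary, "lift and peek" style redaction, and discard it when the redaction must be permanent.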

[1] For more on Probabilistic Hierarchical Context-Free Grammars, see this link on Google Scholar.

Thursday, November 8, 2012

Statistical Pattern Matching Accurately Predicts Presidential Winners and Electoral College Counts, Why Not Privilege and Responsiveness in Litigation?

The technology utilized by political statisticians is finally getting the attention it deserves. Not because it is partisan, but because it is accurate. The excellent article in today’s LA Times explains how mathematical models predicted the election outcome well before the first polls had opened. How? By taking the information from numerous sample sets and re-modeling over and over again with different assumptions and weightings. If this sounds a lot like statistical sampling and pattern-matching, then you have been paying attention! The techniques used by the Nate Silvers of the world to classify and label voting patterns are being used right now in litigation to “predict” (or diagnose, if you prefer) privilege, responsiveness and issues.
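For readers who want to see what "re-modeling over and over again with different assumptions and weightings" can look like, here is a toy sketch. It is not the model any forecaster actually used, and the poll numbers are made up purely for illustration. The function name `simulate_win_probability`, the weighting range and the noise model are all assumptions.

```python
import random

# Made-up poll samples for illustration: (share for candidate A, sample size).
POLLS = [(0.52, 800), (0.49, 1200), (0.51, 600), (0.53, 1000)]


def simulate_win_probability(polls, runs=10000, seed=0):
    """Estimate how often candidate A leads across many re-weighted runs."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    wins = 0
    for _ in range(runs):
        total_weight = 0.0
        weighted_share = 0.0
        for share, size in polls:
            # A different assumption each run: how much to trust this poll.
            weight = rng.uniform(0.5, 1.5) * size
            # Sampling noise, using the standard error of a proportion.
            std_error = (share * (1 - share) / size) ** 0.5
            draw = rng.gauss(share, std_error)
            weighted_share += weight * draw
            total_weight += weight
        if weighted_share / total_weight > 0.5:
            wins += 1
    return wins / runs


print(simulate_win_probability(POLLS))
```

The output is a fraction between 0 and 1: the share of simulated runs in which candidate A comes out ahead. The same loop applies in principle to documents, with sampled review decisions standing in for polls and a label such as "privileged" standing in for the winner.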

At Valora, we call this technique Probabilistic Hierarchical Context-Free Grammars, but others have shortened it to Statistical Pattern Matching, which works just fine. The point is that information about documents (or voter behavior or music choices) has been available for a long time. The only missing piece is the human comfort level with statistics and probabilistic systems.

If the statisticians can call elections, baseball winners and consumer preferences, isn’t it time we let them loose on document analysis and review? If you’d like a primer on or a demonstration of Probabilistic Hierarchical Context-Free Grammars in litigation, contact us at