The technology used by political statisticians is finally getting the attention it deserves. Not because it is partisan, but because it is accurate. An excellent article in today’s LA Times explains how mathematical models predicted the election outcome well before the first polls had opened. How? By taking the information from numerous sample sets and re-modeling it over and over again with different assumptions and weightings. If this sounds a lot like statistical sampling and pattern-matching, then you have been paying attention! The techniques used by the Nate Silvers of the world to classify and label voting patterns are being used right now in litigation to “predict” (or diagnose, if you prefer) whether documents are privileged, whether they are responsive, and which issues they relate to.
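For readers who like to see the idea in code, here is a minimal sketch of "re-modeling over and over again with different assumptions and weightings." It is not the model described in the LA Times article, and it is not Valora's method. The function name `win_probability`, the weighting range, and the poll numbers are all hypothetical, chosen only for illustration.

```python
import random


def win_probability(polls, n_runs=10_000, seed=0):
    """Estimate how often candidate A leads across many simulated runs.

    polls: list of (share_for_a, sample_size) tuples. Illustrative values only.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_runs):
        weighted_sum = 0.0
        weight_total = 0.0
        for share, n in polls:
            # Sampling error of a proportion: sqrt(p * (1 - p) / n).
            std_err = (share * (1 - share) / n) ** 0.5
            # Re-draw the poll result within its sampling error.
            simulated_share = rng.gauss(share, std_err)
            # Assumption: each run trusts each poll a little differently.
            weight = rng.uniform(0.5, 1.5) * n
            weighted_sum += weight * simulated_share
            weight_total += weight
        if weighted_sum / weight_total > 0.5:
            wins += 1
    return wins / n_runs


# Hypothetical polls: (share for candidate A, number of respondents).
example_polls = [(0.52, 800), (0.49, 1200), (0.51, 1000)]
print(win_probability(example_polls))
```

The result is a probability rather than a single yes-or-no call, which is the same kind of answer a statistical review system gives when it scores a document for privilege or responsiveness.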
At Valora, we call this technique Probabilistic Hierarchical Context-Free Grammars, but others have shortened it to Statistical Pattern Matching, which works just fine. The point is that information about documents (or voter behavior or music choices) has been available for a long time. The only missing piece has been human comfort with statistics and probabilistic systems.
If the statisticians can call elections, baseball winners and consumer preferences, isn’t it time we let them loose on document analysis and review? If you’d like a primer on, or a demonstration of, Probabilistic Hierarchical Context-Free Grammars in litigation, contact us at valoratech.com.