Friday, May 4, 2012

3 Drawbacks To Predictive Coding

Valora’s Response to LTN article: Take Two: Reactions to 'Da Silva Moore' Predictive Coding Order

What is missing there, and elsewhere, is a discussion of the specific weaknesses of the overall Predictive Coding technique.  Here are just three drawbacks of the technique: 
  1. PC tagging algorithms are not transparent.  No one really knows why the PC engine "chose" the documents it did; typically, the choosing algorithm is hidden and not disclosed.  All we know is that, in some way, a chosen document is a lot like one a human already tagged.  
  2. PC has no checks or balances on the skill set, education, consistency or motivations of the seed set coder(s).  The entire Predictive Coding approach assumes that the seed set coder(s) know what they are doing, and that they are correct, consistent and honest.  Would you defend that position, particularly given that the "human being as gold standard" concept has been roundly deflated (see Blair & Maron, Grossman, TREC, etc.)?
  3. Typically, seed set creation and audit sampling for PC rely on simple random sampling, the weakest of the sampling techniques.  Other techniques (stratified, cluster, panel, etc.) take document attributes into account and use intelligent groupings to produce a much stronger, more representative sample for seed set coding and auditing purposes.
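To illustrate the third point, here is a minimal sketch of the difference between simple random and stratified sampling. The corpus, the attribute stratified on (document type), and the function names are all hypothetical, chosen only to show why a stratified seed set better reflects the makeup of the collection:

```python
import random

def simple_random_sample(docs, n, seed=0):
    """Draw n documents uniformly at random, ignoring all attributes."""
    rng = random.Random(seed)
    return rng.sample(docs, n)

def stratified_sample(docs, n, key, seed=0):
    """Draw a sample that mirrors the population's mix of an attribute.

    Documents are grouped by `key` (e.g., file type or custodian), and
    each group contributes roughly in proportion to its share of the corpus.
    """
    rng = random.Random(seed)
    groups = {}
    for doc in docs:
        groups.setdefault(key(doc), []).append(doc)
    sample = []
    for members in groups.values():
        # Proportional allocation, with at least one document per stratum.
        k = max(1, round(n * len(members) / len(docs)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Hypothetical corpus: 90% email, 10% spreadsheets.
corpus = [{"id": i, "type": "email"} for i in range(90)] + \
         [{"id": i, "type": "sheet"} for i in range(90, 100)]

strat = stratified_sample(corpus, 10, key=lambda d: d["type"])
# The stratified seed set is guaranteed to include the rare spreadsheet
# stratum; a simple random sample of 10 can easily miss it entirely.
```

The same proportional logic applies to audit sampling: checking each stratum separately surfaces errors concentrated in a minority document type that a purely random audit would likely overlook.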

At present, all Predictive Coding solutions are products, which limits their functionality and flexibility for specific case matters. Perhaps we should instead be thinking about the broader picture of Technology-Assisted Review (TAR) as a service: customizable, measurable and transparent.