Valora’s Response to LTN article: Take Two: Reactions to 'Da Silva Moore' Predictive Coding Order
What is missing there, and elsewhere, is a discussion of the specific weaknesses of the overall Predictive Coding technique. Here are just three drawbacks:
- PC tagging algorithms are not transparent. No one really knows why the PC engine "chose" the documents it did. Typically, the choosing algorithm is hidden and not disclosed. All we know is that, somehow, the engine decided a given document closely resembles one that was already tagged.
- PC has no checks or balances on the skill set, education, consistency or motivations of the seed set coder(s). The entire Predictive Coding approach assumes that the seed set coder(s) know what they are doing, and that they are correct, consistent and honest. Would you defend that position, particularly given that the "human being as gold standard" concept has been roundly deflated (see Blair & Maron, Grossman, TREC, etc.)?
- Typically, seed set creation and audit sampling for PC use simple random sampling, the weakest of the available methods. Other sampling techniques (stratified, cluster, panel, etc.) are aware of document attributes and use intelligent groupings to create a much stronger, more representative sample for seed set coding and auditing purposes.
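To make the contrast in that last bullet concrete, here is a minimal sketch comparing simple random sampling with proportionally allocated stratified sampling over a document collection. The collection, its "stratum" attribute (standing in for any document attribute such as custodian or file type), and the sample size are all hypothetical illustrations, not any vendor's implementation:

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical collection: each document carries an attribute (e.g., custodian,
# file type, or cluster label) that a stratified sampler can exploit.
docs = [{"id": i, "stratum": random.choice(["email", "memo", "contract", "sheet"])}
        for i in range(10_000)]

def simple_random_sample(docs, n):
    """Uniform random draw: ignores document attributes entirely."""
    return random.sample(docs, n)

def stratified_sample(docs, n):
    """Proportional allocation: draw within each stratum so the seed set
    mirrors the composition of the whole collection."""
    by_stratum = {}
    for d in docs:
        by_stratum.setdefault(d["stratum"], []).append(d)
    sample = []
    for members in by_stratum.values():
        k = round(n * len(members) / len(docs))
        sample.extend(random.sample(members, min(k, len(members))))
    return sample

srs = simple_random_sample(docs, 200)
strat = stratified_sample(docs, 200)
print("random:    ", Counter(d["stratum"] for d in srs))
print("stratified:", Counter(d["stratum"] for d in strat))
```

The stratified draw guarantees each document type appears in the seed set in proportion to its share of the collection, whereas a simple random draw can, by chance, under-represent a small but critical stratum.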
Since, at present, all Predictive Coding solutions are products, with limited functionality and flexibility for the specifics of a given matter, perhaps we should be thinking about the broader picture of Technology-Assisted Review (TAR) as a service: customizable, measurable and transparent.