Thursday, June 26, 2014

Why Information Governance is Eclipsing eDiscovery

Everywhere you look right now, Information Governance, “IG,” has taken center stage.  I have had the pleasure of speaking twice in a week on the topic – once to records managers at ARMA’s Northeast Regional Conference and once to litigators at McDermott’s Technology in the Law Symposium.  Why is IG so hot and why is it overtaking the discussion on eDiscovery?

IG is hot, hot, hot
IG is the Next Big Thing because it is a catch-all concept that covers a lot of currently important ground:  compliance, data ethics and breaches, data management and intelligence, workflow, visualization and analytics.  All of these elements play a key part in IG, and always have.  So, why is it hot now?  The biggest factor is the Target data breach, which exposed the personal data of over 70 million customers.  The scope of the breach (nearly 25% of all American citizens were affected) and the wall-to-wall media coverage have helped propel responsible data management to the forefront of society’s concerns.  In fact, a Pew Research study in January found that over 50% of Americans are “worried about the amount of personal information available about them…”  The IG train of responsible data management has left the proverbial station and is speeding its way through the legal system, through Wall St., and through consumers’ concerns and buying behavior.  Nothing speaks louder than consumers and their wallets.  For Target, “Satisfaction with the overall shopping experience was down almost 2 percentage points in March, with declines ‘most acute’ among middle-and-upper-income shoppers” as late as April 2014, four months after the breach was announced.

Why is the IG discussion eclipsing the eDiscovery discussion?
For starters, eDiscovery is old news.  The earliest uses of the phrase stem from 2004, nearly a decade ago, and well before the FRCP changes in late 2006.  Today most litigants, and certainly their outside counsel and advisors, are very familiar with its concepts.  In fact, most service providers in legal, lit support or eDiscovery already have a wealth of tools and solutions to choose from.  Need Early Case Assessment?  ESI Processing?  Predictive Coding for Doc Review?  There is a plethora of solutions, all heavily vying for your attention.  The truth is, it’s just not that complicated anymore, and the solutions have decreased so much in cost that almost all of them are accessible to almost all matters.  In short, eDiscovery has become as exciting as word processing or scanning.

But the eDiscovery blahs are only half the reason for the decline in discussion.  The other half is that intelligent IG encompasses eDiscovery.  eDiscovery is subsumed by smart DDC (data, document & content) management, right along with litigation holds, retention policies, workflow routing, exception handling, data breach response and investigations.  Today, eDiscovery is but one of any number of critical activities undertaken by major corporations all the time.  It’s just not the fire drill it used to be, and those implementing IG will see to eDiscovery’s needs along the way.

So, where does that leave us?
Unfortunately, this leaves us woefully unprepared to handle IG.  The passel of eDiscovery tools does little to solve problems that are much larger than typical litigation matters ever imagined.  The records management side of the house is of little help, with its diminished budgets and dearth of tools for large-scale data mining and management.  Thus there is a promising opportunity for IG-oriented solutions that take the best of both worlds, with an eye towards intelligent DDC management from the outset.  Stay tuned, blog readers, and see where Valora heads next...

Wednesday, May 14, 2014

IBM Watson Runs a Food Truck?!

What to do after winning Jeopardy against the world’s best players?  Open a food truck, of course!  Yes, that Watson is now running a food truck, and apparently it creates some truly delicious dishes!  Confused?  Don’t be.  The intelligence behind the Watson engine that successfully answered hundreds of randomized Jeopardy questions is now the creative engine behind a gourmet food truck.  IBM is endeavoring to show that predictive analytics have uses in the most unusual of places!

As with most predictive analytics, there is still an important role for humans to play in providing balance, judgment and expertise.  Watson does the data-crunching heavy lifting to find interesting and appealing flavor combinations, faster (better?) than human beings can on their own, and then trained chefs implement the Watson directions.

This hybrid approach should have a familiar ring to it.  Let the software do the hard, data-intensive number-crunching and then marry that output with human skill and finesse.  It’s a winning combination and one that we employ here at Valora every day.  We utilize our analytics, indexing, and rules platform, PowerHouse, to organize, catalog and find relationships in content for us and then we add the human skill, the expertise, to refine the output and do custom things for specific projects.  

Here’s an example:  We run 50,000 emails and attachments through PowerHouse, which quickly finds well over 150 attributes about each item.  Then we ask PowerHouse to find important relationships and insights, such as trend data or topic clusters.  From there, we adapt the rules programming to customize the output so it yields middle initials, or zip + 4, or the top 3 issues per document, or whatever it is that any particular customer needs.  We load it up to BlackCat for easy, online review (often by the client’s workforce) and we’re done.  Predictive analytics mastery!
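
For the curious, here is a rough sketch in Python of what that two-step flow might look like in miniature: a machine pass that pulls attributes out of each item, followed by a custom rule that reports the top 3 issues per document.  To be clear, this is not PowerHouse code.  The function names, the tiny issue list and the sample email are all made up for illustration, and a real project would work from far richer attributes and rules.

```python
import re
from collections import Counter

# Hypothetical issue taxonomy (an assumption for this sketch);
# a real project would define its own issues and vocabulary.
ISSUE_KEYWORDS = {
    "pricing": {"price", "discount", "quote"},
    "contracts": {"contract", "agreement", "terms"},
    "staffing": {"hire", "resign", "headcount"},
    "safety": {"recall", "injury", "hazard"},
}

# Matches US zip + 4 codes such as 02110-1234.
ZIP_PLUS_4 = re.compile(r"\b\d{5}-\d{4}\b")


def extract_attributes(text):
    """Machine pass: derive a few simple attributes from raw text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        "word_count": len(words),
        "zip_plus_4": ZIP_PLUS_4.findall(text),
        "word_freq": Counter(words),
    }


def top_issues(attributes, n=3):
    """Custom rule: rank issues by keyword hits and keep the top n."""
    freq = attributes["word_freq"]
    scores = {
        issue: sum(freq[word] for word in keywords)
        for issue, keywords in ISSUE_KEYWORDS.items()
    }
    # Highest score first; ties are broken alphabetically by issue name.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # Issues with no keyword hits are dropped entirely.
    return [issue for issue, score in ranked[:n] if score > 0]


# Made-up sample email, for illustration only.
email = (
    "Please review the contract terms before Friday. The price quote "
    "and discount are attached; the agreement ships to Boston, MA 02110-1234. "
    "We may need to hire a reviewer."
)

attrs = extract_attributes(email)
issues = top_issues(attrs)
print(issues)
print(attrs["zip_plus_4"])
```

The split mirrors the hybrid idea: the extraction step is generic and runs the same way on every item, while the rule at the end is the part a person would adjust for a particular customer's needs.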

Now, if you’ll excuse me, I think pork belly moussaka sounds amazing!

Thursday, May 1, 2014

What Do Self-Driving Cars and Documents Have in Common?

In case you missed it, Google had an announcement earlier this week about the rapidly improving reliability of their self-driving cars.  The cars now automatically recognize pedestrians, trucks, and construction areas, and can even detect when a cyclist suddenly veers in front of the car.  Having logged more than 700,000 accident-free miles, it’s an impressive demonstration of a potentially society-altering capability.

So, why mention it here, other than its super-coolness factor?  Because of how it works.  Google’s self-driving cars function because they are taught to recognize patterns.  Patterns of behavior, actions, appearance, movement and trends.  Once a pattern is recognized, the cars’ on-board computer systems run a series of rapid-fire statistical algorithms to determine what is happening (context), and thus what actions the car should take.  Sound familiar?  It should.  This is the same kind of technology behind IBM’s Watson and Valora’s PowerHouse.

Much like Google teaching its cars to recognize a stroller in a crosswalk, Valora teaches PowerHouse to recognize a patent application in Chinese or a break in privilege from an email string.  Google’s vehicles accurately assess and predict traffic behavior patterns almost 100% of the time, much better than human beings.  Valora’s PowerHouse sees similar marks for accuracy and prediction. 

In addition to one day allowing us to send text messages or read an e-book while we “drive,” the autonomous vehicles have another enormous advantage:  they better utilize roads, gas and electricity.  These types of benefits have broad-reaching impact beyond whether any one person is using or not using the self-driving car.  The same holds true for autonomous data mining.  Once the document house is in order, everyone benefits, from easy, organized search to intuitive data visualization to automated notifications of significant events.

Too bad I couldn't write this blog entry while on my way to work this morning…

Thursday, March 13, 2014

Interesting Predictions about Data Analytics from Gartner

"Traditional vendors of analytic platforms recognize that in order to expand their reach beyond traditional power users, they must deliver packaged domain expertise and applications to enable self-service by a wider range of users. Service providers are seeking to turn custom project work and domain expertise into repeatable solutions that can be adopted by other organizations more easily.
The result is that end-user organizations selecting analytic applications will have a significantly wider variety of possible providers to evaluate. Organizations evaluating software vendors will almost always find a SaaS version of their packaged applications, and the similarity of product concepts will shift the emphasis of competition to the domain expertise embedded by the vendors into the application. Software vendors will increasingly face a co-opetition situation with their traditional service provider channels, forcing them to augment their own professional service capabilities. Service providers will use packaged applications as an integral part of their customer relationships, implying that there is a greater specialization in the services that they provide."
-Gartner Press Release 12.16.2013

Monday, March 10, 2014

Data Vs. Document Vs. Content

Remember letters?  Typeset documents on official-looking letterhead?  When we communicated primarily via letters, no one wondered what to call the media transmitting information.  It was a Document, plain and simple.  Then came the Internet and websites and eyeballs, and suddenly it was all about Content.  Keeping your content fresh, managing your content, re-using content.  Now it is all about the Data – Big Data, of course.  So, what’s the difference?  Data vs. content vs. document – is there a difference?  In theory, not much, but in practice, yes there is.

Let’s start with Documents.  Documents can be physical or virtual, but they typically have a defined start and end, often delineated by page.  Documents have a specific purpose: they were created by someone, for someone, and they are meant to convey information.  Documents carry with them an air of significance, importance and validity.  That’s why we have phrases like, “Legal documents, financial documents and immigration documents.”  Good examples of documents:  Your tax form, your birth certificate, a receipt from a purchase, your boarding pass.

Content is amorphous.  Though it too can be physical or virtual, it is generally thought of as virtual/electronic in nature only.  Content may or may not have a specific purpose.  It may be written by someone, or sometimes auto-generated.  Content is often not meant to stand on its own, but rather be a supporting player.  Content can be ephemeral, biased and taken out of context.  Because of this, content is not always trusted and carries less validity than documents.  Good examples of content:  blog entries, news, chapters in a book.

Data is virtual.  It is reported, stored or derived from other systems and carries with it a factual and scientific nature.  Data is meant to be bias-free and exist for measurement or tracking purposes.  Good examples of data:  your height and weight, stock prices, bank account balances.

To call information data is to expand on the original intent of what we understand data to be.  However, because our information today is generated and stored electronically, it feels like data, and we (or savvy marketers) have started calling it data.  Thus stored information has become data, with all the attached concepts typically assigned to data (factual, bias-free, etc.).  Data, therefore, feels trustworthy and valid – a strong case for managing its exposure.

For more information on the difference between Data and Records, see my article in this month's ARMA newsletter.  When is Data A Record?  (See pages 23-25)