Tuesday, December 1, 2009

Stuck in the Past

Remember rotary phones? Black & white TV? Vinyl albums? While these have all long since been replaced by better solutions, many of them are still in actual use today. It’s hard to imagine, but did you know that the old touch phone (last updated in 1968) is still being produced and sold today[1]?

I mention this because it is analogous to what is going on with electronic discovery and duplicate files. Up to 10% of electronic discovery populations are never de-duped at all, according to a recent survey of top EDD providers. Valora was fortunate to participate in the survey and get early access to its results, and they weren’t pretty. It seems that nearly half of all electronic discovery populations get, at best, a simple within-custodian de-dupe effort, even though the results of cross-custodian de-duping are clearly spelled out in financial terms.

Why is this? Is it just that some customers are obstinate and refuse to try anything new? I doubt it. I think it is the same reason that people still use their rotary phones and listen to records. They are comfortable, familiar, safe and well understood. Simple removal of duplicates inside a single custodian, or no removal of duplicates is comfortable and easy to understand. Furthermore, it cements the idea that all files or documents need to be assessed individually – a belief that the legal community seems terrified to let go, even though all signs point to its eventual demise. Those who would cling to no de-duping or inferior de-duping are the same people who will avoid population analytics and automated review. Be glad they practice law and not medicine.

PS. In case you were wondering: Almost 1 million vinyl albums were sold in 2007, up 15% from 2006[2].


[1] Source: Wikipedia http://en.wikipedia.org/wiki/Model_500_telephone

[2] Source: Time magazine http://www.time.com/time/magazine/article/0,9171,1702369,00.html

Wednesday, August 5, 2009

Doc Review Metamorphosis

Most people today communicate via email far more than they do via telephone (which in turn people use far more than letter-writing). It wasn’t always this way, of course, but most people do not realize that email has been around in one form or another since 1984! It took effectively 25 years for email to come to dominate as the preferred communication method, particularly for business.

How long will the Doc Review industry take to evolve to true non-linear[1] activity? Probably less time than you think. For comparison, the transition from letters to phone dominance took more or less 60 years (1900-1960). The transition from phone to email took 25. And now as we evolve from relatively static email to far more dynamic Twitter, texting and live data/video feeds, the evolution will be even shorter, generally expected at 5-7 years. As a society we are becoming faster at adapting to change. The most sophisticated review teams are just now experimenting with non-linear review. The rest will be there within 3-4 years.

Already we see signs of the evolution to non-linear document review. Most people are at least aware of the simplest form of non-linear review: near duplicate detection. With near & exact duplicates grouped together into “Dupe Groups,” the very first level of non-linear review is taking place. With Near Dupe, savvy reviewers actually look at a group of documents together, rather than one by one. Documents are usually grouped together by content or attribute similarity, with a group “captain” embodying either the fullest set of content or a logical start and end-point to the logic chain grouping the documents together.
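For the technically curious, here is a minimal sketch of how Dupe Groups might be formed. It is an illustration only, not Valora's method: the function names, the three-word “shingle” comparison, the 0.5 similarity threshold and the rule that the longest document becomes the captain are all assumptions made for this example.

```python
def shingles(text, size=3):
    """Break text into overlapping runs of `size` words (hypothetical helper)."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b):
    """Jaccard similarity: shared shingles divided by all shingles."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def dupe_groups(docs, threshold=0.5):
    """Greedily group documents whose similarity to a group's first member
    meets the (assumed) threshold. Returns a list of (captain, members)."""
    groups = []
    for doc_id, text in docs.items():
        for group in groups:
            if similarity(text, docs[group[0]]) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    # Assumption: the captain is the member with the fullest content.
    return [(max(g, key=lambda d: len(docs[d])), g) for g in groups]


docs = {
    "a": "Please review the attached draft contract before Friday",
    "b": "Please review the attached draft contract before Friday and send comments",
    "c": "Lunch menu for the company picnic",
}
print(dupe_groups(docs))  # [('b', ['a', 'b']), ('c', ['c'])]
```

A reviewer would then look at the group captained by "b" once, rather than reading "a" and "b" separately.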

A similar technique is called Email Thread Grouping (ETG), where pieces (“stringlets”) of email conversation threads that might be resident in pockets of document storage mechanisms are brought together logically. Because typical ESI collection involves documents from several custodian sources, who often have important communication relationships with one another, the incidence rate of disassociated email conversations is extremely high. ETG brings together groups of documents from Inboxes, Outboxes, folders, different email storage systems and even different collection sites! By grouping the conversations together from all custodial sources, the review has taken a second step toward non-linear review.
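As a rough illustration of the idea, the sketch below gathers stringlets from different custodians by stripping reply and forward prefixes from the subject line. This is a deliberate simplification and an assumption of this example. A production system would likely also use message headers and content. The names `thread_key` and `group_threads` are hypothetical.

```python
import re
from collections import defaultdict

# Leading "Re:", "Fw:" or "Fwd:" markers, possibly repeated.
PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def thread_key(subject):
    """Normalize a subject line so replies and forwards share one key."""
    return PREFIX.sub("", subject).strip().lower()


def group_threads(messages):
    """Group (custodian, subject) pairs from all sources by thread key."""
    threads = defaultdict(list)
    for custodian, subject in messages:
        threads[thread_key(subject)].append((custodian, subject))
    return dict(threads)


messages = [
    ("Smith inbox", "Q3 pricing"),
    ("Jones outbox", "RE: Q3 pricing"),
    ("Lee archive", "Fwd: Re: Q3 Pricing"),
    ("Smith inbox", "Holiday party"),
]
for key, members in group_threads(messages).items():
    print(key, len(members))
```

Here the three “Q3 pricing” stringlets land in one conversation even though they came from three different custodial sources.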

Finally, the courts and the industry are waking up. It’s not going to be a “doc-by-doc” world much longer. Will you be ready? Where can you turn for smart, unbiased information?

The Electronic Discovery Institute is a non-profit organization dedicated to resolving electronic discovery challenges by conducting studies of litigation processes that incorporate modern technologies. They recently conducted a survey which broaches the subject of electronic document deduplication, the beginnings of non-linear review. Have a look and let us know what you think.


[1] Non-linear review is the concept of reviewing documents in bulk fashion, rather than one by one in sequential order.

Monday, July 27, 2009

Name That Service!

Remember the game show “Name That Tune[1]”? Contestants had to listen to the first few strains of a song and name the title. “I can name that tune in 4 notes!” they would say. Well, now it’s time to name something else – Valora’s new service offering!

After years of “black box” services provision, Valora is finally opening up its interface to our customers. For those of you who have been to Valora’s Processing Center or seen one of our FirstLook Population Analysis Reports, the interface will have a familiar feel.

To those who are new to such services, we are creating a combination Early Case Assessment Tool & Non-Linear Review interface.

The idea is really very simple: show our customers easily and intuitively what we here at Valora already know about their documents. For years, we have been identifying and documenting every possible attribute known to mankind[2] about each and every document that passes our threshold. More recently, we began to use that knowledge in a cross-functional, population analytics way to help diagnose what populations contain and how best to stretch processing dollars accordingly.

Now we are using the same information in a predictive, pre-emptive way to automatically accomplish many of the tasks that today take place manually. Similar to how Valora solved the expensive, inefficient manual coding problem a few years ago, we are now solving the problem of getting both document processing and document review down to a few cents per GB.

While we anticipate public release of our system in January 2010, we are opening up our Beta program to three select matters late in the summer. Two of the three Beta partners have already been selected, but there is room for one more. If you are interested, contact our Marketing Department at 781.229.2265 or mktg@valoratech.com for more information about Beta requirements and expectations.

And finally, the contest! You may have heard via email, and it’s true, that I will personally come to your home (especially if you live in Hawaii, Bermuda, south of France…) and cook you and your family a gourmet dinner! I’m a pretty good cook and I take requests. So, think about Valora, think about our new ECA-NLR platform and most importantly, think about a name! Official submission rules are on our website: Contest Rules! May the best name win!



[1] For a modern-day (and incredibly time-wasting, though hilarious) version of Name That Tune, visit: http://www.namemytune.com/nmt.asp.

[2] Today, Valora captures over 60 “fields” of information about each and every document in our systems.

Monday, June 22, 2009

Why are we so excited about our GSA Award?

Well, for starters, we are able to sell all our services at the top echelons of local, state and federal government. We are now part of a select list of “ok to use” government suppliers. This means Valora will be actively solicited by government agencies across the nation.

But what it really means is that we have garnered an impressive seal of approval on our collective lapel. The General Services Administration (GSA) really does its homework on who may become an official supplier of goods and services. The goal is for government agencies to streamline their purchasing by using a pre-vetted supply source. But the upshot for everyone else is that Valora has earned this special position by performing admirably over a number of years. Here’s what Valora had to show in order to become a GSA supplier:

  1. We had to show several years of outstanding and honorable service delivery and business dealings. GSA surveys a minimum of twenty (!) references to create an “OpenRatings” score for each GSA applicant. We are extremely proud of our OpenRatings score, having scored a whopping 92 out of 100 possible points!
  2. We had to establish fair pricing and justify each and every cost item. We had to indicate every discount, every bundling variation, and every custom software/services configuration.
  3. Our staff had to undergo strong background checks, including financial, criminal and employment verification. We had to prove our company employs American workers, pays fair wages, holds appropriate insurance policies, is up to date in its federal and state tax payments and so on.

In short, we had to prove that we are an organization in good standing that provides high-quality work products at good prices and is made up of decent, hardworking and upstanding people.

Valora received our certification in under 6 months – an outstanding accomplishment, which I think speaks very much for itself. Well done, Valora team! Well done!


Monday, April 27, 2009

Electronic Files Rehashed

This month’s blog entry is provided by guest blogger, Aaron Goodisman. Aaron is Valora’s Chief Technology Officer and Vice President of Engineering.

To a lot of people the word “hash” conjures up visions of leftover corned beef and onions (or, in some parts of the country, barbecued pork) – delicious. In the world of computers and electronic files, however, it’s less about chopped up meat and vegetables, and more about chopped up files and documents.

A “hash” (or more completely a “hash code”) is a kind of electronic signature of a computer file. The data bytes that make up the file are processed through a hash function (chopped up) to produce a short, fixed-length snippet of data that can be used to refer to the original file. Since the resulting hash code is short, it’s easy to store a lot of them. It’s also quick to send them across a network or compare them to each other – much quicker than doing the same thing with the whole file.
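To make this concrete, here is a minimal sketch using Python's standard `hashlib` module. The choice of SHA-256 and the helper name `hash_code` are assumptions made for illustration, not a description of any particular product.

```python
import hashlib


def hash_code(data):
    """Return the hash code of some file contents as a hex string.
    SHA-256 is used here purely as a stand-in hash function."""
    return hashlib.sha256(data).hexdigest()


short_file = b"hello"
long_file = b"hello" * 1_000_000  # roughly 5 MB of data

# The hash code has the same short, fixed length whatever the file size.
print(len(hash_code(short_file)), len(hash_code(long_file)))  # 64 64
```

Comparing two 64-character strings is far quicker than comparing two multi-megabyte files byte by byte.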

Web browsers and web servers use hash codes to decide which web pages to refresh. If the browser has a copy of a file stored locally on your computer’s hard disk (e.g., the header graphic on this page), it sends the hash code of that to the server to ask if the file has changed. The server compares that hash code to the hash code of its version of the file. If they’re different, the server transmits the new header graphic file; otherwise, it tells the browser to go ahead and use its local version. Sending just the hash code takes much less network bandwidth than sending the whole graphic, making pages load faster and reducing the overall load on the network.
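The exchange just described can be sketched in a few lines. This is a simplified model rather than real web server code: `server_response` and its status strings are hypothetical, and SHA-256 is an assumed stand-in for whatever hash function a server actually uses.

```python
import hashlib


def hash_code(data):
    return hashlib.sha256(data).hexdigest()


def server_response(browser_hash, server_file):
    """Compare the browser's hash code with the server's own copy.
    The file itself is sent only when the two hash codes differ."""
    if browser_hash == hash_code(server_file):
        return ("use your local copy", None)
    return ("here is the new file", server_file)


old_graphic = b"header graphic, version 1"
new_graphic = b"header graphic, version 2"

# The browser still holds version 1, and so does the server: nothing is sent.
print(server_response(hash_code(old_graphic), old_graphic)[0])
# The server has moved on to version 2: the new file is transmitted.
print(server_response(hash_code(old_graphic), new_graphic)[0])
```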

The Litigation Support and Computer Forensics industries use hash codes, too. Rather than processing every file on a custodian’s hard drive, savvy practitioners skip over duplicate files, useless files or malicious files by computing the hash code of each file on the disk and comparing it against lists of hash codes of files known to be useless, malicious or already processed. Because the hash codes are small, the lists are easy to manipulate and fast to search.
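A minimal sketch of that filtering step follows. The file names, the contents and the `files_to_process` helper are invented for the example, and SHA-256 is an assumed hash function. Real known-file lists are far larger, but the logic is the same.

```python
import hashlib


def hash_code(data):
    return hashlib.sha256(data).hexdigest()


def files_to_process(files, known_hashes):
    """Return the names of files worth processing: those whose hash code
    is neither on the known list nor already seen earlier on this disk."""
    seen = set(known_hashes)
    keep = []
    for name, data in files.items():
        h = hash_code(data)
        if h in seen:
            continue  # known-useless file or a duplicate: skip it
        seen.add(h)
        keep.append(name)
    return keep


known = {hash_code(b"standard system library")}
drive = {
    "kernel32.dll": b"standard system library",
    "memo.doc": b"quarterly memo",
    "memo_copy.doc": b"quarterly memo",
    "contract.doc": b"signed contract",
}
print(files_to_process(drive, known))  # ['memo.doc', 'contract.doc']
```

Because the lookups are against a set of short strings, the check stays fast even when the known list holds millions of entries.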

For all this to work properly, the hash function needs to have several important characteristics:

  • It needs to be fast to compute the hash code of a file (otherwise it would defeat the purpose of being able to do quick comparisons)
  • The hash code of a file needs to change if you change the file, even a little bit (otherwise the web browser wouldn’t know to download the new version)
  • It needs to be extremely difficult to create a file with a specific hash or to create two files with the same hash (called a “collision”).

The last of those is particularly important for electronic file processing, because it wouldn’t do for the critical document in a case to be removed because it accidentally happened to have the same hash code as a standard Windows library file or some non-critical document already processed. Neither would we want a virus writer to be able to manipulate a file to contain a virus, but have the hash code of a different file known to be safe.
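The second characteristic in the list above, that even a small change to a file changes its hash code, is easy to see for yourself. The sketch below uses SHA-256 from Python's standard library as an assumed example function. The two sample “files” differ by a single character.

```python
import hashlib

original = b"The meeting is on Tuesday."
edited = b"The meeting is on Tuesday!"

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(edited).hexdigest()

# Count how many positions of the two hash codes differ.
differing = sum(a != b for a, b in zip(h1, h2))

print(h1)
print(h2)
print(f"{differing} of {len(h1)} hex characters differ")
```

With a well-designed hash function, the two codes look unrelated even though the inputs are nearly identical.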

Fortunately, over the years a lot of hash functions have been developed and thoroughly tested. One of these hash functions is called MD5, developed in 1991 by MIT professor Ron Rivest. This hash function became extremely popular with the growth of the internet, the proliferation of electronic files, and the widespread use of hashing techniques for various purposes. In fact, MD5 became so pervasive that some people in the Litigation Support industry use it as a synonym for the term “hash code.”

Unfortunately, things change. Computers get faster and MD5 has reached the end of its useful life in this industry. In 1996 it was shown that it was theoretically possible to create two files with identical MD5 hash values (a collision) and in 2007 a group of researchers described how to do it. A recent article in Technology Review magazine describes one example of this.

The good news, of course, is that there are many other hash functions to choose from. Several years ago Valora switched to the SHA-1 hash function, designed by the National Security Agency and published in FIPS 180-1. SHA-1 is similar to MD5, but uses a longer hash code and is orders of magnitude less vulnerable to collisions. Those orders of magnitude don’t mean SHA-1 will be useful forever. Indeed, it’s been shown that SHA-1 is vulnerable to attacks similar to the ones used to bring down MD5, but it will be a while before the necessary computing resources are available. By then we’ll have moved on to other hash functions appropriate to the computing power of the day. But the industry needs to keep moving forward with technology, so when the time comes to switch functions, this discussion doesn’t need to be rehashed.
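For a sense of what “a longer hash code” means, the sketch below prints the hash code length of MD5 and SHA-1 using Python's standard `hashlib` module. SHA-256 is included only as an assumed example of a newer function and is not named in this post. The `hash_code` helper is hypothetical and simply shows that the function can be a setting rather than something hard-wired.

```python
import hashlib


def hash_code(data, algorithm="sha1"):
    """Hash `data` with the named function, so switching functions later
    means changing one argument rather than rewriting the process."""
    return hashlib.new(algorithm, data).hexdigest()


for name in ("md5", "sha1", "sha256"):
    bits = hashlib.new(name).digest_size * 8
    print(f"{name}: {bits}-bit hash code")
# md5: 128-bit hash code
# sha1: 160-bit hash code
# sha256: 256-bit hash code
```

Each extra bit doubles the number of possible hash codes, which is part of why longer codes are harder to collide.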


Aaron Goodisman is a software industry veteran with over 20 years of experience in engineering management, software architecture, and product development. Prior to founding Valora Technologies, Mr. Goodisman served as Vice President of Engineering at SilverStream Software, acting as both manager and visionary for this award-winning application server product and its associated development and deployment tools.

Mr. Goodisman received his undergraduate and Master's degrees in Computer Science from MIT and is considered a world expert in Java industry standards and UI design. He is a frequent industry speaker and has authored several articles for industry publications. Mr. Goodisman is named as the inventor on several U.S. patents and currently pending patent applications.

Friday, March 27, 2009

Budget-based Solution Crafting

There’s an old, old joke whose punchline is, “I know what you are, my dear, we are now just negotiating price!” If you know this joke*, you recognize it as the quintessential business negotiation: the delta between the price someone will sell for and the price someone is willing to pay. It is the ultimate win-lose negotiation, with each party having to yield to or gain ground from the other.

Thanks to the current economic environment, a new approach to price negotiation is emerging. We call it “Budget-Based Solution Crafting” and it is exactly that. Custom solutions are molded and adapted to fit the realities of the customer’s budget. Rather than a win-lose, yield-gain struggle, the discussion centers around value: best options for the dollars available. It is a very smart approach for limited funds and tight credit.

In Budget-based Solution Crafting, the customer tells the proposed vendor exactly what she has available to spend. The vendor then adapts “regular” services into “custom” services to fit the budgetary restriction. It’s not a new concept, but it is typically not used much in lit support, where services tend to be sold on a fixed, transactional basis (per page, per GB, etc.) and where customers are often comparing commodity services. But while it can be scary to tell a vendor exactly how much you have to spend, it is the single best way to gain a solution that meets your client’s needs.

Think of it like buying a wedding gift. You look at the bridal registry and you find a gift that you like, that you already know the bride and groom will like, and that fits your spending budget. No guesswork, no surprises, no haggling. Everybody wins.

The same concept works well for outsourced litigation support services. There are lots and lots of different services you can buy, at many varying price points. Rather than assume apples-to-apples service levels and price points, try putting the price point out first and then understanding what sorts of services (and bundles of services) are available at that level.

Here is a great way to think about Budget-Based Solution Crafting: pretend you are buying a car. Price-first is how we typically buy cars. Typically, a car buyer knows whether they are in the Porsche, Camry or Yugo class. Once they decide the class level, they can evaluate options in that class. Camry or Accord? Sunroof or navigation system? By buying services in this way, the customer is assured of the best value possible for the funds available.

The purchase of outsourced litigation support services lends itself very well to such an approach. Do you really need every document converted to TIF image? Do you really need to host all those non-responsive, irrelevant documents? Do you really need to look at all those duplicates during review? Probably not. So, why pay for them? Or, more importantly, why assume you have to pay for them and thus not select anything which could help? Customizable services can overcome that paralyzing all-or-nothing dynamic.

Quite frankly, your vendor should be your best ally. They should bend over backwards to try to get you what you need at a price you can afford. Like cars, there are lots and lots of options, at all price levels.

So, try it on your next project. Tell your vendor what you realistically have to spend and seek their advice on how best to spend it. You’ll be glad you did.


* If you do not know this joke, please contact me offline. Unfortunately, some of the joke setup is not appropriate for this family-friendly blog forum.

Wednesday, February 25, 2009

Gaining Strength from Automation & Prudent Financial Management

It’s a rough year out there. You probably know someone who’s been laid off or is in danger of it. Maybe that person is you… It’s in uncertain times like these that it pays to be a small, debt-free business. We at Valora are also very fortunate that we rely almost entirely on technology to provision our services. Valora has always tried to be prudent with spending and investment, but even we had no idea that the economy would turn this ugly this quickly. In this environment we find ourselves in a surprisingly strong position against a number of other service providers in our industry. Here’s why:

1. Valora is profitable and has been for over 6 years now. That means that our customers are assured that we will continue to be around for many years to come. It also means that our employees are assured of their salaries and their jobs, so they are focused on our business, not on “looking around.” We are happy simply to be doing our jobs and making our customers happy.

2. Valora does not carry debt, so no one is able to make a claim on us. There is nothing like the threat of a looming margin call to put everyone on edge and change corporate priorities. A surprising number of litigation support service providers are highly leveraged, which means they fund their operations on debt. Sometimes the debt was used to finance an acquisition. Sometimes it was used to buy a lot of advertising, marketing and sales presence. And sometimes it is just being used to make payroll. Debt can take the form of a bank loan, a leveraged buyout, venture capital and a host of other instruments. But no matter how you look at it, debt means the money (and the return) ultimately belongs to someone outside the company. And that someone else gets to call the shots. So, even though you may be speaking to your salesperson, account manager or project manager, they are not the ultimate debt-holder and do not have the final say. In some very ugly cases in this industry, the bank has actually become the lien-holder on the business. At Valora the only ones we answer to are our customers – which is just how we like it.

3. Valora has a service focus on automation, which is inherently low-cost and efficiency-driving. Valora’s competitive advantage is our technology and service provision, specifically our ability to automatically perform population analysis, unitization, coding, smart ESI processing, and review. Our “tagging” technology automatically captures all kinds of relevant information: bibliographic (dates, subject lines, authors/recipients), content (names mentioned, key words, issues, concepts) and categorical (duplicates, near duplicates, email topic groups, privilege, responsiveness). Because we are using automation to perform these tasks, the cost is low and the turnaround is quick. Low cost and immediate results are very important requirements in tight economic times. In fact, cost concerns are beginning to eclipse all other service requirements. Fortunately, with automation, you can easily get to “fast and cheap” without having to sacrifice quality or flexibility.

It is generally accepted that legal services go through the following progression over time: first senior attorneys perform the work, and then train more junior attorneys to perform the work at lower cost to the clients. Next, the work shifts to paralegals and other in-house labor. Then contract laborers perform the work, first onsite, then offsite, and then offshore. The final step is to create software programs that can perform the work more efficiently than any type of human labor at all. We have seen this progression in copying, coding and now in ESI and document review. In fact, the latter two are happening so fast, in large part due to economic pressures for low cost and high speed, that the outsource/offshore steps are being skipped completely. Instead, most firms are jumping straight from performing the work themselves to letting automation do the trick. If software can do the work for 2% of the manual cost and 10,000 times faster, why not? The more we feel the economic pinch, the more automation becomes a necessity.

This last point is indicative of the changes going on in the much broader economy as well. Businesses everywhere are looking to leverage technology to replace variable costs. In fact, a big piece of President Obama’s stimulus package is aimed at moving targeted industry sectors to finally embrace technology as an essential tool for operations.

Unusual times call for unusual measures. Give automation a try and see what you get. You might simultaneously be your client’s hero and save your job!


Monday, January 26, 2009

Inauguration at 35,000 Feet (aka Of Course, Technology)

I had the distinct pleasure of watching the 2009 Presidential Inauguration from the sky-high view at 35,000 feet, while en route to majestic Denver, CO for a meeting. Thanks to the wonders of modern streaming video technology, I was able to watch and listen to this event in real-time even though I wasn’t even actually on Earth’s surface!

While Obama’s inauguration was certainly inspiring, I was even more impressed with how far we have come in utilizing technology in our daily lives. Of course I was going to witness this, even if I would be traveling at 600 mph in mid-air. In fact, I expected to. After all, I create and adapt technology for a living. Why wouldn’t this be possible?

This “of course technology” attitude is a big part of what lies behind Valora and our push into document and population analytics. We often ask ourselves the same kind of why/why not questions: why would someone just automatically convert an entire electronic document population through to text, metadata and image? Why not figure out first what’s in there and whether anyone even cares about it? Why waste time and money hosting documents that will never be of any consequence? If the answer is, “because we don’t know what’s in them,” then we as an industry have collectively missed the “of course technology” idea completely.

If you had stomach pains and didn’t know why, you would not go to a doctor, have him slice you open in expensive and potentially risky surgery, and then try to figure out what ailed you! You’d be asked a lot of questions and you’d use surgery only as a last resort after rigorous examination and other attempted treatments. Indeed, your health insurance is specifically designed to make you and your doctor act that way! And yet, with litigation populations, we continually cut open the patient in the most expensive and time-consuming way possible, just to learn that 2 aspirin and a call in the morning would have sufficed.

Fortunately, the down economy is helping us, you might say forcing us, to re-think how we work and how we make decisions. Early case assessment, pre-conversion for ESI, population analysis, automated review and more are helping us understand what we have in our document populations and how to use it wisely. We are learning to separate out the useful from the not and to spend our limited resources wisely. For more on how Valora can assist in these areas, please see our white papers on Population Analysis and Automated Review, and watch for an extensive article on this topic in this winter’s CASLM newsletter.

Eventually, my flight landed and the painful truth tinged my bright-eyed optimism. Obama’s got a lot of work to do. So do we.