eDiscovery Archives

May 11, 2007

eDiscovery and the data transfer problem

Discovery is a legal device employed by a party in a civil or criminal action, prior to trial, to require the adverse party to disclose information essential to case preparation which only the other party knows or possesses [1]. Electronic discovery, or eDiscovery, is simply discovery applied to electronic media, e.g., email, documents, spreadsheets, schematics, instant messenger logs, voice mail recordings. eDiscovery is growing. It is growing because the legal requirements are expanding [2]. It is growing because technology is making more and more data readily accessible and therefore discoverable. Forrester expects eDiscovery technology to grow from $1.4B in 2006 to $4.8B in 2011; the 2006 Socha-Gelbmann Electronic Discovery Survey estimates $1.8B in 2006 to $3.1B in 2008 [3]. Machine costs should range from 15% to 25%.

I have spent the last two years working for Gallivan, Gallivan & O’Melia (GG&O), a Seattle-based firm that offers both software and consulting to law firms and enterprises for electronic discovery. GG&O’s capabilities (and hence my experience) range from forensic data acquisition to native document processing and review support through imaging for production. I functioned as both the technical and managerial lead for the Silicon Valley office.

During my tenure, a “standard” matter ballooned from several hundred gigabytes and several hundred thousand files to multiple terabytes and multiple-million files. Because of the volume of data involved, data transfer, from drive to drive and from drive to memory to CPU (for hashing, indexing), has become the primary bottleneck holding attorneys back from review. Generally, unlike gaming or many scientific pursuits, eDiscovery is not computationally intensive; performance is not CPU-bound; it is input/output (i/o) bound.
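To make the drive-to-memory-to-CPU path concrete, here is a minimal sketch of hashing a file in fixed-size chunks. It is an illustration only: the function name `hash_file`, the 1 MB chunk size, and the choice of SHA-256 are assumptions, not a description of any particular eDiscovery tool.

```python
import hashlib
import os
import tempfile


def hash_file(path, chunk_size=1024 * 1024):
    """Return (hex digest, bytes read) for a file, reading it in chunks.

    Every byte has to come off the drive before it can be hashed, so on
    large collections the read, not the hash, tends to set the pace.
    """
    digest = hashlib.sha256()  # assumption: SHA-256; other hashes work the same way
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            total += len(chunk)
    return digest.hexdigest(), total


# Demo on a small throwaway file standing in for a collected document.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sample document contents" * 1000)
    demo_path = tmp.name

checksum, size = hash_file(demo_path)
print(size, checksum)
os.remove(demo_path)
```

Reading in chunks keeps memory use flat regardless of file size, which matters when a matter holds millions of files.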

Other applications that are i/o bound include bioinformatics, homeland security, financial (transactional) databases, and enterprise document management systems. For these, having increased data throughput can generally be categorized as “nice-to-have” or “do-it-only-when-it-becomes-cheap-enough”. The requirements of electronic discovery, in contrast, are business critical. The legal and financial pressures are tangible and quantifiable (especially when dealing with government agencies with three-letter acronyms!). As the volume of data increases, the machine time, mostly because of data transfer issues, can involve days. Even when RAM or Flash-based solid state drives (SSD) become available, the time required to transfer data will remain a limiting factor. The interesting thing is that the eDiscovery industry will technologically drive itself. As faster data transfer becomes available, because litigation is competitive and time is always of the essence, there will be uptake.

The data transfer bottleneck. The bottleneck for data transfer comes from the need to access the entire volume through a single interface. IDE and SATA interfaces range from 150 to 300 MB/s. (USB devices operate at approximately 60 MB/s, DRAM at up to 2-3 GB/s.) Besides the interface, there is also the issue of sequential versus random i/o. Operating at the highest rates, a terabyte (TB) takes approximately 1-2 hours to transfer sequentially from one drive to another. When files are accessed randomly (and there are many small files), the transfer time could be extended by as much as 3-4x. The random versus sequential discrepancy can be alleviated by using RAM or Flash SSDs, which have better ways of addressing the data. But this will still be throttled by the interface. Where is my 1 TB RAM computer?
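The 1-2 hour figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes a decimal terabyte (10^12 bytes), a sustained rate equal to the interface rate, and treats the 3-4x random-access slowdown as a simple multiplier; `transfer_hours` is a hypothetical helper, and real drives will rarely sustain the full interface rate.

```python
def transfer_hours(volume_bytes, rate_bytes_per_s, random_penalty=1.0):
    """Hours needed to move volume_bytes at a sustained rate.

    random_penalty is a plain multiplier standing in for the slowdown
    from random (many small files) access; it is an estimate, not a
    measured value.
    """
    return volume_bytes / rate_bytes_per_s * random_penalty / 3600


TB = 10**12  # decimal terabyte (assumption)
MB = 10**6

rates = [
    ("USB, ~60 MB/s", 60 * MB),
    ("interface at 150 MB/s", 150 * MB),
    ("interface at 300 MB/s", 300 * MB),
]

for label, rate in rates:
    sequential = transfer_hours(TB, rate)
    random_worst = transfer_hours(TB, rate, random_penalty=4.0)
    print(f"{label}: {sequential:.1f} h sequential, "
          f"up to {random_worst:.1f} h with a 4x random penalty")
```

At 150 to 300 MB/s this gives roughly 1.9 and 0.9 hours per terabyte, in line with the 1-2 hour estimate; a multi-terabyte matter with many small files quickly stretches into a day or more.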

1. West Publishing Company and West Group. West's Encyclopedia of American Law. 1998, Minneapolis/St. Paul, MN: West Pub. Co. v. 1-12.
2. Federal Rules of Civil Procedure. 2006 [cited 2007 May 10]; Available from:
3. Skjekkeland, A. eDiscovery Market Size. AIIM Knowledge Center Blog 2007 [cited 2007 May 10]; Available from:

Tape restoration (forensic alchemy) and the new rules

Someone wise once jokingly advised me, “if you want to securely destroy data, back it up onto tape”. Having restored (or attempted to restore) a fair sampling of (potentially corrupt) tapes, I say this is not far from the truth. From a technical standpoint, this is especially the case when absolutely no information can be gleaned from the client on what (deprecated) OS, what (legacy) software (or worse, combination of software) and which (ancient) tape drive was used. Indeed, tape restoration can be very much forensic alchemy! That is why it costs so much and can take so long. But, from a legal standpoint, does technically difficult-to-restore imply “off-the-hook”? Well, as your attorney will respond, it depends.

The freshly amended Federal Rules of Civil Procedure treat this issue in Rule 26(b)(2) [1]. It reads…

(2) Limitations.

(A) By order, the court may alter the limits in these rules on the number of depositions and interrogatories or the length of depositions under Rule 30. By order or local rule, the court may also limit the number of requests under Rule 36.

(B) A party need not provide discovery of electronically stored information from sources that the party identifies as not reasonably accessible because of undue burden or cost. On motion to compel discovery or for a protective order, the party from whom discovery is sought must show that the information is not reasonably accessible because of undue burden or cost. If that showing is made, the court may nonetheless order discovery from such sources if the requesting party shows good cause, considering the limitations of Rule 26(b)(2)(C). The court may specify conditions for the discovery.

(C) The frequency or extent of use of the discovery methods otherwise permitted under these rules and by any local rule shall be limited by the court if it determines that: (i) the discovery sought is unreasonably cumulative or duplicative, or is obtainable from some other source that is more convenient, less burdensome, or less expensive; (ii) the party seeking discovery has had ample opportunity by discovery in the action to obtain the information sought; or (iii) the burden or expense of the proposed discovery outweighs its likely benefit, taking into account the needs of the case, the amount in controversy, the parties' resources, the importance of the issues at stake in the litigation, and the importance of the proposed discovery in resolving the issues. The court may act upon its own initiative after reasonable notice or pursuant to a motion under Rule 26(c).

Ronni Abramson has a great article in Legal Tech discussing two recent rulings: Best Buy Stores L.P. v. Developers Diversified Realty Corp. and Ameriwood Industries Inc. v. Liberman [2]. Here are the bullets:

Best Buy Stores L.P. v. Developers Diversified Realty (DDR) Corp.

  • Best Buy alleges overcharges for insurance and maintenance, seeks documentation on how insurance charges were calculated
  • DDR fails to respond and thus waives objections
  • Best Buy files motion to compel
  • DDR argues in brief that processing would exceed $125,000 and asks the court to hold its determination until all issues have been sorted out, but offers no proof to support argument that tapes were not reasonably accessible
  • Magistrate Judge Jeanne Graham states "the Defendants offer no proof, aside from conclusory statements, about the cost to obtain documents from electronic archives. So this concern cannot shield the defendants from discovery here." Orders responsive docs in 28 days.
  • DDR files objection with U.S. District Court Judge David Doty requesting rolling productions
    • submits unsworn statement from Kroll Ontrack advising 102-122 days to restore all tapes,
    • submits affidavit by director of technology one day late (because of illness) stating the number of tapes, 345, that tapes were used solely for disaster recovery, and that an outside vendor would be required,
    • submits cost estimates from Kroll for restoration, filtering and processing (before review) – between $288,300 and $468,100 (~$1,000 per tape).
  • Doty unconvinced, upholds production order and timeline
  • DDR files motion for reconsideration.
  • Graham denies arguments that DDR was not aware of costs or delays and could not have presented evidence to support objections earlier.
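As a quick sanity check on the Kroll figures in the bullets above, the per-tape cost and the implied restoration rate follow directly from the tape count. This is only arithmetic on the numbers reported in the case summary; the variable names are illustrative.

```python
tapes = 345                                # from the late affidavit
cost_low, cost_high = 288_300, 468_100     # Kroll estimate range, in dollars
days_low, days_high = 102, 122             # Kroll restoration time, in days

per_tape_low = cost_low / tapes
per_tape_high = cost_high / tapes
per_tape_mid = (cost_low + cost_high) / 2 / tapes

# Slowest and fastest implied restoration rates.
tapes_per_day_slow = tapes / days_high
tapes_per_day_fast = tapes / days_low

print(f"Cost per tape: ${per_tape_low:,.0f} to ${per_tape_high:,.0f} "
      f"(midpoint ${per_tape_mid:,.0f})")
print(f"Implied rate: {tapes_per_day_slow:.1f} to "
      f"{tapes_per_day_fast:.1f} tapes per day")
```

The midpoint lands near $1,100 per tape, consistent with the rough $1,000 figure, and the implied pace is about three tapes per day.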

Ameriwood Industries Inc. v. Liberman

  • Ameriwood alleges that Liberman, while employed, used confidential information to sabotage business relationships.
  • Liberman claims that lost sales were due to Ameriwood mismanagement, requests production of documents to show mismanagement.
  • Ameriwood produces some documents, objects that requests are “overly broad and unduly burdensome”
  • Liberman moves to compel, requests all responsive documents within a date range
  • Ameriwood argues that request would result in reviewing hundreds of thousands of documents, submits affidavit from forensics firm detailing that
    • the firm had collected responsive emails sent within the date range for 23 former and current employees into a database
    • calculated that the emails within the database numbered in the hundreds of thousands
    • calculated that the emails for the six employees identified by Liberman would result in 60,000 emails and attachments
  • Judge rules that requested information is not reasonably accessible (because of review volume, not necessarily technical concerns), also that Liberman did not have a sufficiently narrow request.

The eDiscovery theme here is crystal clear: know what you have, know what it’ll cost, and for goodness sake, buy and submit the affidavit. These rulings suggest (and set precedent) that the law will not be kind to ignorance or procrastination.

On a different, and slightly skeptical note, consider the following: revenue for Best Buy FY2007 (ending 5/07) was $37B, revenue for DDR FY2006 (ending 12/06) was $0.8B. Big guys are up 2-0. Hmmm.

1. Federal Rules of Civil Procedure. 2006 [cited 2007 May 10]; Available from:
2. Abramson, R. Judges Rule on Hard-to-Discover Data. Legal Technology 2007 May 10 [cited 2007 May 11]; Available from:

About eDiscovery

This page contains an archive of all entries posted to Tim's Journal in the eDiscovery category. They are listed from oldest to newest.

Creative Commons License
This weblog is licensed under a Creative Commons License.