Handling Complex Data Formats: eDiscovery and IG for Schematics, Multilingual, Multimedia and Cloud
TRANSCRIPT
Dealing with Difficult File Types
July 23, 2015
Today’s Speakers
Johannes Scholtes, Chief Strategy Officer, ZyLAB
Mary Mack, Enterprise Technology Counsel, ZyLAB
Agenda
• What is a difficult file type?
• General community practice
• Current cases
• Technical and workflow solutions
• ROI via solving difficult file formats
• Questions
Difficult file types and the 80/20 rule
• 20% of files
• 80% of time
• 80% of budget
• 80% of reactive adrenaline
Difficult file types - let me count the ways
• GroupWise, mbox and other non-Microsoft, Lotus emails
• AutoCAD and other graphics, including no-text PDFs
• Embedded
• Proprietary databases
• Regular databases
• For predictive coding in particular:
– Multiple languages
– Very short or very long documents
– Number-intense documents (Excel, databases)
– Compressed files (archives, forensic files)
Difficult file types - what cases or stages
• Construction, real estate, patent, financial, medical - schematics and graphs
– Huge drawings, sideways text, color, mapping, expensive programs, white on black X-rays, graphs
• Products liability, divorce, insider trading, internal investigations - audio
– Multiple encoding formats, metadata in different file
• International cases, FCPA, Health, Education - multiple languages
– Detection, fonts, indexing side to side or up/down, idioms, grammar
• Damages, contracts, class action certification, employment - databases
– High degree of skill to extract if reports do not exist, may need program to view properly, giving the whole database can expand scope
Difficult files - generally
• Determine if the data source is likely to contain relevant material
• Negotiate away the non-responsive and/or hard stuff
• Argue it away via “not reasonably accessible” or not proportional
• Produce it in the most efficient manner
Cases and References
• GPS/map: United States v. Lizarraga-Tirado, --- F.3d ---, 2015 WL 3772772 (9th Cir. June 18, 2015)
• Proprietary license: Pero v. Norfolk S. Ry., Co., No. 3:14-CV-16-PLR-CCS, 2014 WL 6772619 (E.D. Tenn. Dec. 1, 2014)
• Russian Facebook: United States v. Vayner, --- F.3d ---, 2014 WL 4942227 (2d Cir. Oct. 3, 2014)
• Translation: Kenneth N. Rashbaum, Matthew F. Knouff & Dominique Murray, Admissibility of Non-U.S. Electronic Evidence, XVIII RICH. J. L. & TECH. 9 (2012), http://jolt.richmond.edu/v18i3/article9.pdf
• Audio: Novick v. AXA Network, LLC, 2014 WL 5364100 (S.D.N.Y. Oct. 22, 2014)
• Databases: ediscoverylaw.com (K&L Gates) has over 150 cases
Difficult file type checklist
• Proportionality
• Reasonably accessible
• Legacy
• Alternatives (ex. depo’s, admissions)
• Chain of custody (exception handling, outsource)
• Review (inline, or need special software, process)
• Form of production (labeling, branding, redactions)
• Admissibility (business record exception may not apply)
• Authentication (what it purports to be, true and correct copy)
Audio
– Unified messages, call recording, video, webcasts, customer service
– Reduce via metadata
– Index via speaker independent phonemes
– Search
– Produce natively, BATES and branding in file name
– Formally transcribe documents for admission
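The "BATES and branding in file name" step for natively produced audio could be sketched as below. This is a minimal illustration, not a method taken from the presentation: the naming pattern (prefix, zero-padded number, optional branding label) and the helper names `bates_name` and `assign_bates` are assumptions chosen for the example.

```python
from pathlib import Path


def bates_name(original: str, prefix: str, number: int,
               branding: str = "", width: int = 7) -> str:
    """Build a production file name with a Bates number and optional branding.

    The pattern PREFIX + zero-padded number + "_BRANDING" is an assumed
    convention for illustration; the native file extension is preserved.
    """
    ext = Path(original).suffix  # keep the native extension, e.g. ".wav"
    label = f"{prefix}{number:0{width}d}"
    if branding:
        label += f"_{branding}"
    return label + ext


def assign_bates(files, prefix, start=1, branding=""):
    """Map each native file name to its production name, numbering in order."""
    return {
        name: bates_name(name, prefix, start + i, branding)
        for i, name in enumerate(files)
    }
```

For example, `bates_name("voicemail.mp3", "ABC", 42, "CONFIDENTIAL")` yields `ABC0000042_CONFIDENTIAL.mp3`, so the stamp travels with a file that cannot carry a page-level Bates label.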
Foreign languages
– Sources: Supply chain, HQ/Sub, constituents or clients
– Determine or detect languages, folder
• Language as a privacy trigger
– Reduce via metadata, machine translate, then search.
• Consider native speakers for investigatory search, review
– Produce original file in original language, and/or translation. Admit with affidavit.
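The "determine or detect languages, folder" step could start with something as coarse as the sketch below. It is an assumption-laden illustration: it detects the dominant Unicode script (Latin, Cyrillic, CJK and so on), which is only a rough proxy for language and cannot tell, say, English from Dutch. The function names `dominant_script` and `route_by_script` are hypothetical, and a real workflow would rely on a proper language-identification tool.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return the most common script label among the letters in `text`.

    The label is the first word of each character's Unicode name
    (e.g. "LATIN", "CYRILLIC", "CJK"). Returns "UNKNOWN" when the text
    contains no letters.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    if not counts:
        return "UNKNOWN"
    return counts.most_common(1)[0][0]


def route_by_script(docs: dict) -> dict:
    """Group document ids into per-script 'folders' for later review."""
    folders = {}
    for doc_id, text in docs.items():
        folders.setdefault(dominant_script(text), []).append(doc_id)
    return folders
```

Documents routed to a non-Latin folder could then be queued for machine translation or a native-speaking reviewer.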
Schematics and Graphics
• All, nothing or by file type
• Redaction of PII (medical)
• Redaction of confidential or trade secret
• Sideways and upside down text
• A0 large drawings (construction, chips)
Databases
• Determine if collection via extraction, or reports will work
• Consider limiting, monitoring inspection if necessary
• Reduce via fields, columns or values.
• Can include reports in review set
• Document reduction with contemporaneous affidavit
• Produce report, or if necessary, annotated schema with csv or other common format like Excel.
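Reducing a database "via fields, columns or values" and producing the result as CSV might look like the sketch below. It uses an in-memory SQLite database purely for illustration; the `payroll` table, its columns, and the helper `export_reduced` are hypothetical and stand in for whatever system is actually at issue.

```python
import csv
import io
import sqlite3


def export_reduced(conn, table, columns, where="", params=()):
    """Export only the chosen columns (and optionally filtered rows) as CSV text.

    `table`, `columns` and `where` are interpolated into the SQL, so they
    must come from a reviewed, trusted list; filter values go via `params`.
    """
    col_list = ", ".join(f'"{c}"' for c in columns)
    sql = f'SELECT {col_list} FROM "{table}"'
    if where:
        sql += f" WHERE {where}"
    rows = conn.execute(sql, params).fetchall()
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)  # header row documents which fields were produced
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical example data: the SSN column is never selected for export.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payroll (emp_id INTEGER, name TEXT, ssn TEXT, dept TEXT, salary REAL)"
)
conn.executemany("INSERT INTO payroll VALUES (?, ?, ?, ?, ?)", [
    (1, "A. Smith", "000-00-0001", "Sales", 50000.0),
    (2, "B. Jones", "000-00-0002", "Legal", 70000.0),
])
report = export_reduced(conn, "payroll", ["emp_id", "dept", "salary"],
                        "dept = ?", ("Sales",))
```

Keeping the column list and filter as explicit arguments also gives a record of exactly what was reduced, which is the kind of detail a contemporaneous affidavit would describe.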
Forensic formats
• Forensic extraction has lower expectation of accessibility
• Process inline (normal, automated documentation)
• Separate process
• Extract onto clean media (original software or dongle necessary?)
• Point eDiscovery software at media (label source)
• Collect chain of custody documentation prior to ingestion (fact affidavit)
• Special folder for recovered items (may need expert affidavit)
• Produce whole forensic file or parts
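One common way to support chain-of-custody documentation for extracted media is a hash manifest recorded before ingestion. The sketch below is an assumption, not a procedure described in the presentation: the manifest fields and the helpers `sha256_of`, `build_manifest` and `verify_manifest` are hypothetical names.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large forensic extracts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, source_label):
    """List every file under `root` with its size, SHA-256 and source label."""
    root = Path(root)
    manifest = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        manifest.append({
            "source": source_label,  # assumed label for the originating media
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return manifest


def verify_manifest(root, manifest):
    """Return relative paths whose current hash no longer matches the manifest."""
    root = Path(root)
    return [
        entry["path"] for entry in manifest
        if sha256_of(root / entry["path"]) != entry["sha256"]
    ]
```

Running `verify_manifest` again after processing gives a simple check that the extracted files were not altered between collection and production.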
Approaches for difficult data sources
Audio
• Collect: Unified messages, call recording, video, webcasts, customer service
• Review: Reduce via metadata, index via phonemes, then search
• Produce: Produce natively, formally transcribe documents for admission
Reduce expense of difficult formats-ROI examples
Complex Data Formats - Audio Search
1. No need to transcribe audio recordings.
2. Search through hundreds of hours of recordings within seconds.
3. Immediately identify relevant recordings.
4. Tag as responsive.
DEMO
ROI Manual vs. Searchable Audio
ROI Manual vs. Searchable images
Search in one language, results include foreign
ROI: Manual translation vs. automated translation
Automatic Identification and Redaction of Privacy Sensitive Data
ROI: protect IP
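A very small sketch of automatic identification and redaction of privacy-sensitive text follows. It is illustrative only: the two regular expressions (a US Social Security number shape and a simple email shape) and the `redact` helper are assumptions, cover only plain text, and would miss many real-world variants that a production tool handles.

```python
import re

# Assumed, deliberately simple patterns; real PII detection needs far more.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text):
    """Replace each match with a labeled placeholder and log what was removed.

    Returns the redacted text plus a list of (label, original value) pairs
    that can feed a redaction log for quality control.
    """
    hits = []
    for label, pattern in PATTERNS.items():
        def _mask(match, label=label):
            hits.append((label, match.group(0)))
            return f"[REDACTED {label}]"
        text = pattern.sub(_mask, text)
    return text, hits
```

Returning the hit list alongside the text keeps a record of every redaction, which supports later QC without reopening the original.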
ROI: No manual insertion, qc
ROI: Comprehensive review