Concordance Intro to Admin Training Class
Agenda
• Installation
• Create New Database & Load Data
• Opticon Imagebases
• Administration & Maintenance
• Concordance Preferences
• Security
• Concatenation
• Replication & Synchronization
• FYI Core Server demo
Concordance Basics
• Database software designed for litigation
• Fast and efficient searching of terabytes of text
• Tags
• Easy import and export of text data and e-docs
• A la carte model, no suite required
• Robust, flexible security
• Bigger, cheaper, faster than competitor products
Concordance Capacity
• Each field can hold up to 12 million characters (~7,500 pages)
• Up to 1,875,000 pages per record
• 33 million records per database
• Up to 250 fields per database
• IVT file max of 1/2 terabyte (512 gigabytes)
  – Practical database size 256 - 512 gigabytes
• Can search across 128 databases together
  – 64 terabytes
  – ~4 billion docs
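The capacity figures on this slide follow from one another, which a quick calculation makes visible. The sketch below is only an illustration: it assumes 1 TB = 1024 GB and that pages per record is simply fields times pages per field; the variable names are not Concordance terms.

```python
# Limits quoted on the slide.
CHARS_PER_FIELD = 12_000_000
PAGES_PER_FIELD = 7_500
FIELDS_PER_DB = 250
RECORDS_PER_DB = 33_000_000
MAX_DB_GB = 512
MAX_CONCATENATED_DBS = 128

# Derived figures.
chars_per_page = CHARS_PER_FIELD // PAGES_PER_FIELD      # implied page size
pages_per_record = FIELDS_PER_DB * PAGES_PER_FIELD       # every field full
total_tb = MAX_CONCATENATED_DBS * MAX_DB_GB / 1024       # all databases at max size
total_records = MAX_CONCATENATED_DBS * RECORDS_PER_DB    # all databases full

print(chars_per_page, pages_per_record, total_tb, total_records)
```

The results (1,600 characters per page, 1,875,000 pages per record, 64 TB, about 4.2 billion records) line up with the slide's rounded numbers.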
Concordance 2007 New Features
• Due out soon
• New look and feel
• Dashboard and tab view for open databases
• Simple search
• Expanded tag management
• Enhanced email / edocs importing
• Expanded integrations
  – TotalLitigator
  – CaseMap
  – IPro Thin Client viewer
  – Sanction
Installation
• Standalone means Mobile version
• Network Installation usually multi-seat
  – Concordance installed on Server
  – Workstation.exe installed on all workstations
  – Add to corporate image
• Windows Permissions
  – Users must have full permission to db directory
  – Create/edit/delete, like MSWord
Hardware
• System requirements in help
• Searching uses end-user machine’s RAM
• RAM has most impact on user perception of speed (after search construction)
• Internal network speed
• Internet connection speed
• Can put .dcbs anywhere on LAN
• Can put images anywhere on LAN
Data Load Files
• Class Exercise: IA_Data.dat
• Delimited text format (a.k.a. “Load Files”)
• Metadata (can hold body text also)
• Typically .dat or .txt extension
• Elements to check:
  – Header row with field names
  – Date fields: 8 digits max, any order with slashes, universal “true date” format without slashes
  – Carriage return at end of every record
  – Delimiters
Delimiters
Comma: Field break indicator, default is ¶ (ASCII 20), customizable, avoid characters that appear in the data
Quote: Keeps text together, only required around fields that have text & spaces, default is þ (ASCII 254), customizable, avoid characters that appear in the data
New Line: Manual line break, wrap within a field, default is ® (ASCII 174), customizable, avoid characters that appear in the data
New Record: Starts a new record, carriage return, cannot be changed, industry standard
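To make the four delimiter roles concrete, here is a minimal Python sketch of reading a load file outside Concordance, for example to sanity-check it before import. The function name `parse_load_file` is hypothetical, and the delimiter characters are passed in explicitly because they are customizable; the values in the sample are for illustration only and should be matched to the actual load file.

```python
def parse_load_file(text, comma, quote, newline):
    """Parse delimited load-file text into a list of dicts keyed by the header row."""
    header = None
    records = []
    # Records end with a carriage return / line feed; normalize before splitting.
    for line in text.replace("\r\n", "\n").replace("\r", "\n").split("\n"):
        if not line:
            continue  # e.g. the empty string after the final record terminator
        values = [
            field.strip(quote).replace(newline, "\n")  # drop qualifiers, restore wraps
            for field in line.split(comma)
        ]
        if header is None:
            header = values  # first row carries the field names
        elif len(values) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(values)}")
        else:
            records.append(dict(zip(header, values)))
    return records


# Illustrative sample: ASCII 20 between fields, þ around values, ® as in-field break.
sample = "þBEGNOþ\x14þDOCTITLEþ\r\nþIA000001þ\x14þFirst line®Second lineþ\r\n"
rows = parse_load_file(sample, comma="\x14", quote="þ", newline="®")
print(rows)
```

The sketch splits on the field delimiter without honoring the quote character, so it would miscount fields if the delimiter itself appeared inside a value. That is the practical reason the slide advises picking delimiters that do not occur in the data.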
Database Creation
• Start with blank template
• Templates are empty databases stored in the template folder
• Can make your own (Documents>Export>Structure)
• Start by making fields to match load file header
• Then OCR fields
• Couple of basic administrative fields
Field Names
• Up to 250 fields per database
• No predefined structure, no required fields
• Maximum of 12 characters including:
  – Letters
  – Numbers in middle or at end
  – Underscore in middle only
• Stored in all capital letters
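The naming rules above can be checked mechanically before a database is built from a load-file header. The following is a sketch, not Concordance's own validation: it assumes that "numbers in middle or at end" means a name must start with a letter, and the function name `normalize_field_name` is hypothetical.

```python
import re

# First character: a letter. Middle: letters, digits, underscores.
# Last character: a letter or digit. Total length 1 to 12.
FIELD_NAME = re.compile(r"[A-Za-z](?:[A-Za-z0-9_]{0,10}[A-Za-z0-9])?")


def normalize_field_name(name):
    """Return the upper-cased name if it satisfies the rules, else raise ValueError."""
    if not FIELD_NAME.fullmatch(name):
        raise ValueError(f"invalid field name: {name!r}")
    return name.upper()  # names are stored in all capital letters


for candidate in ["begno", "PROD_BEG", "OCR1", "1OCR", "_ADMIN", "ADMIN_"]:
    try:
        print(candidate, "->", normalize_field_name(candidate))
    except ValueError as err:
        print(err)
```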
Data Types
Data Type   Capacity                Type      Notes
Date        8 bytes                 Fixed     Just for dates, keyed by default
Numeric     1 – 20 digits           Variable  Currency, zero-filled, comma, keyed by default
Text        1 – 60 characters       Variable  Alphanumeric, keyed by default
Paragraph   12 million characters   Fixed     Alphanumeric, indexed by default, rich-text
Field Properties
Image: Indicates which field contains the image name or alias
Key: Speeds sorting (adds values to DB.key file), keying everything dilutes the value
Accession: Copies UUID in system table into visible field, good for sorting by load order, can have gaps
System: Field cannot be seen by users, Concordance creates these for rep/synch information
Indexed: Enables full text searching (adds values to DB.ndx and DB.dct)
OCR Indexed: DANGEROUS, limits dictionary to Webster’s, indexes only English words
Indexing Considerations
• Indexing puts contents in dictionary/index
• Full text searching works only on indexed fields
• Use relational searching on non-indexed fields
• Paragraph fields indexed by default
• Can index anything
• Big dictionaries/indexes search slower
• Avoid serial/Bates number (unique value fields)
• Bloats dictionary/index with non-words
Indexing Considerations
• Punctuation
  – Just have to set this once
  – Pertains to indexed fields only
  – Punctuation indexed only if embedded between alphanumerics
• Stopwords
  – Good idea to set this before indexing
  – Stopwords are ignored during indexing
  – Common prepositions, search operators etc.
  – Keeps the dictionary/index lean
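The punctuation and stopword rules can be pictured with a small tokenizer. This is an illustration of the two rules only, not Concordance's indexer: the function name `index_terms`, the punctuation set, and the stopword list are all assumptions chosen for the example.

```python
import re


def index_terms(text, punctuation="'.", stopwords=frozenset({"THE", "OF", "AND", "OR", "NOT"})):
    """Return the upper-cased terms a simple indexer might keep.

    A punctuation character survives only when it sits between two
    alphanumeric characters; stopwords are dropped.
    """
    if punctuation:
        embedded = "".join(re.escape(c) for c in punctuation)
        # Runs of alphanumerics, optionally joined by single embedded punctuation marks.
        pattern = re.compile(rf"[A-Za-z0-9]+(?:[{embedded}][A-Za-z0-9]+)*")
    else:
        pattern = re.compile(r"[A-Za-z0-9]+")
    terms = [m.group().upper() for m in pattern.finditer(text)]
    return [t for t in terms if t not in stopwords]


print(index_terms("The cost of O'Brien's U.S. trip was $1.50."))
```

In the sample sentence the apostrophes in "O'Brien's" and the decimal point in "1.50" are kept because they sit between alphanumerics, while the trailing period of "U.S." and the sentence-final period are dropped, as are "The" and "of".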
Fields for Exercise
BEGNO      Text       10, keyed, image
ENDNO      Text       10
DOCDATE    Date       Keyed
DOCTYPE    Text       25, keyed, indexed
DOCTITLE   Paragraph  Indexed
TO         Paragraph  Indexed
FROM       Paragraph  Indexed
PAGES      Numeric    Length 10, places 0, plain format
OCR1       Paragraph  Indexed
OCR2       Paragraph  Indexed
Basic Administrative Fields
• Normally, not keyed or indexed
• CREATEDATE: Date
• EDITTRAIL: Paragraph
• Set properties in Edit>Validation
• PRODBEG, PRODEND for production later
• TAGS: for Tag2Field.cpl
• ADMIN1, ADMIN2 just in case
Importing Load File
• Documents>Import>Delimited Text
• Wizard or dialog box – personal preference
• Overlay = Dangerous, use “matching” instead
• Match date format to format used in load file
• Pick delimiters that are used in load file
• Set field order to match load file
• Index
• Check field mapping, date format, edit/create date fields
Import OCR with ReadOCR.cpl
• Body text generated by OCR software
• Stored in text files named with BEGNO
• Concordance Programming Language script
• File>Begin program, CPL folder, ReadOCR.cpl
• Copies individual file contents into designated field in database
• Search for missed records using relational searching
• OCR1 = “” finds empties
• Reindex to make new text searchable
Loading Images in Opticon
• Need TIFs and an Opticon load file
• TIFs are usually scanned from paper
• Each TIF named by BEGNO
• Class Exercise: IA_Image.opt
• Usually .opt, .log, .txt format
• Format: Image alias, volume, path/file name, page break, binder break, box break, page count
• Can edit file path now or after import
• Check for final carriage return
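The seven-part format can be checked line by line before registering a load file. The sketch below follows the field order listed on the slide; it assumes the fields are comma-separated, that a break is marked with `Y` and otherwise left empty, and that the page count may be blank. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpticonLine:
    alias: str
    volume: str
    path: str
    page_break: bool
    binder_break: bool
    box_break: bool
    page_count: Optional[int]


def parse_opticon_line(line):
    """Split one load-file line into its seven fields (assumed comma-separated)."""
    parts = line.rstrip("\r\n").split(",")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {line!r}")
    alias, volume, path, page, binder, box, count = parts
    return OpticonLine(
        alias, volume, path,
        page == "Y", binder == "Y", box == "Y",   # assumed break marker
        int(count) if count else None,            # blank count stays unknown
    )


def ends_with_carriage_return(text):
    """True if the last record is terminated, as the slide advises checking."""
    return text.endswith(("\r\n", "\n", "\r"))


entry = parse_opticon_line("IA000001,VOL01,D:\\IMAGES\\IA000001.TIF,Y,,,3\r\n")
print(entry.alias, entry.page_break, entry.page_count)
```

A file path containing a comma would break this simple split, so such paths would need different handling.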
Loading Images in Opticon
• Click on camera button in Concordance
• Error says “can’t find in imagebase”, click OK
• In Opticon, Tools>Imagebase Management
• Creates .dir and .vol files automatically
• Open database.dir file (that is the “imagebase”)
• Register – Load tab for load files
• Register – Scan tab doesn’t need load file (remember to set doc breaks)
• Register – Directory to fix bad paths
• Change records in Concordance to see image
Importing Electronic Data
• E-Docs
  – Needs source software installed to extract
  – Works great with MS Office
  – PDFs with embedded text
  – Easiest with E-Docs template, handles mapping
  – Hyperlink to native file (not the original!)
• E-mail
  – Outlook PST
  – Has to be associated with an Outlook profile
  – Control Panel>Mail>Profiles to make new
Importing Transcripts
• Import text format files
• Import Livenote’s PTF, PCF files
• Always have to use Transcripts template
  – CPL enables line numbering
• Set up persistent tags
• Index
• Great way to search across all transcripts for keywords
Integrations: Data Extraction
• Guidance Software (GuidanceSoftware.com)
  – Enterprise forensic data collection from LANs
  – Exports collected data directly to dcb
• IPRO Image (IPROTECH.com)
  – Can create Concordance Image load files
• Image Capture Engineering’s Legal Access Ware (LAW) (ImageCap.com)
  – Extracts text contents and metadata from electronic files in native file format
  – Export data directly to dcb
Preparing Database for Users
• Edit stopwords
• Index / reindex
• Make persistent tags
• Set up synonyms
• Enable security
Tags
• Tag Management
• Persistent tags can exist without being applied to a document
• Don’t use the standard right-click method to create a “produce” tag
Synonyms
• File>Dictionary>Synonyms
• Make permanent associations between words
• Associate fuzzy matches to party name
• Associate email addresses with party name
• Drug names: associate brand with generic
• One-to-many relationship
• Getting over-inclusive hits? Check here
Security
• Set only user/password combo for security console on first access
• Share with other DBAs
• Create users, set field and menu rights
• Enable security
• Logon required (leave off to use default)
• Export security settings each time
• Users set own password
Creating Templates
• Export structure or export with data and zap
• Includes
  – Stopwords
  – Edit validation settings
  – Structure
  – Description
• Replica includes security
• Zap deletes all data, dictionary, index, unnecessary subfiles
Concordance Preferences
• Tools>Preferences
• Registry settings are stored on the local machine under the user’s Windows profile
• Same user working on same machine always sees these same settings
• Searching: Default Operator, Wildcard, Quote character
• Highlight color
• Browsing: Tag action, Title bar display
• Camera button “sticky” settings
Cache Preferences
• Index Cache: Local hard disk space used like RAM for indexing and reindexing
• Defaults to 35% of installed RAM
• Increase to Total RAM minus OS RAM
  – Windows XP generally uses 160 MB RAM
• Dictionary cache: Local hard disk space used like RAM for dictionary functions like searching and sorting
  – Defaults to 32 MB
  – Decrease to 4 MB
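As a worked example of the index cache guidance, the sketch below compares the 35% default with the suggested "total RAM minus OS RAM" setting. The function name `index_cache_mb` and the 1024 MB machine are assumptions for illustration; the 160 MB figure is the slide's Windows XP estimate.

```python
def index_cache_mb(total_ram_mb, os_ram_mb=160):
    """Return (default, suggested) index cache sizes in MB."""
    default = round(total_ram_mb * 0.35)   # 35% of installed RAM
    suggested = total_ram_mb - os_ram_mb   # everything the OS does not need
    return default, suggested


default, suggested = index_cache_mb(1024)
print(f"default {default} MB, suggested {suggested} MB")
```

On a machine with 1024 MB of RAM this gives a default of 358 MB and a suggested setting of 864 MB.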
Viewer Preferences
• Viewer Settings
• Database settings are stored in .INI file
• “All Databases” settings are stored in subfile with Concordance.exe installation
• For Opticon, all fields should be blank
  – Sometimes might have to path Opticon.exe
• For IPRO
  – Path to IPRO exe
  – Download cpl from IPROTech.com, path to that
Database Management
• Backing up databases
• Data entry and editing
• Check for duplicates
• Modifying field structure
• Regular maintenance
  – Reindex
  – Pack dictionary
Backing Up a Database
• Couple of options, pick for your situation
• Find out about automatic server backups
  – Often take too long to restore for practical use
• Copy the whole folder elsewhere
• Copy the source files elsewhere
• File>Export save copy elsewhere
  – Replication takes longer, adds fields, but gets security
• Don’t need to back up images often
Data Validation
• Tools to support accurate data entry
• Edit>Validation
  – Data restrictions
  – Authority lists
  – Only applies to manual data entry/edits not imports
• Authority Lists
  – A predefined list of values that can be put into a field
  – List can be static or updateable by User
  – You can specify single or multiple entry field
  – Great for ensuring consistent data entry
Data Entry
• Appending Records
• Editing Records
  – Use tool in Tagging to view all edited Records before you reindex the database
  – Ditto
• Pick the fields whose data you would like to copy to a new record or the existing record
• Copy from previous record or specify a particular record
• Saves time and keystrokes
• Helps ensure more consistent data
Checking for Duplicates
• Tools>Check for Duplicates
• Pick fields to consider
• Name tag to be applied
• Creates a non-persistent tag with that name
• Three possible conditions
  – Unique (get no tag)
  – Original (first time value appears)
  – Duplicate (subsequent times value appears)
• Feature is easy, defining a duplicate is hard
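The three conditions can be sketched in a few lines of Python. This illustrates the classification logic only and is not how Concordance implements the feature; the function name `classify_duplicates` and the sample records are hypothetical.

```python
from collections import Counter


def classify_duplicates(records, fields):
    """Label each record 'unique', 'original', or 'duplicate' by the chosen fields."""
    keys = [tuple(rec.get(f, "") for f in fields) for rec in records]
    counts = Counter(keys)
    seen = set()
    labels = []
    for key in keys:
        if counts[key] == 1:
            labels.append("unique")      # value appears only once
        elif key in seen:
            labels.append("duplicate")   # value has appeared before
        else:
            labels.append("original")    # first appearance of a repeated value
            seen.add(key)
    return labels


records = [
    {"FROM": "Smith", "DOCDATE": "20070301"},
    {"FROM": "Jones", "DOCDATE": "20070301"},
    {"FROM": "Smith", "DOCDATE": "20070301"},
]
print(classify_duplicates(records, ["FROM", "DOCDATE"]))
print(classify_duplicates(records, ["DOCDATE"]))
```

Running the same records against different field selections gives different answers, which is the slide's point: the tool is simple, but choosing which fields define a duplicate is the hard part.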
Delete & Pack
• 2 step process: delete then pack database
• Run your query first
• Edit>Delete and Undelete
• See [DEL] in status line
• Query by deletes in Tag Management
• TIP: Export those records and save as a backup
• File>Pack>Database
• PERMANENTLY removes records
• No undo button
Database Modification
• “Modifying” a database usually means changing the fields
• Can do this after there is data in the database
• BACK IT UP! Then File>Modify
• Some things are more dangerous
• Rule: Full index required immediately after any change
• At least “OK” out between, no more than three
• Modifying is primary cause of corruption!
Indexing
• Indexing makes new dictionary / index
  – Must do this first time
  – Must do this after changes in File>Modify
  – Must do this after changing Stopwords
• Reindexing
  – Updates dictionary / index with changes or new data
• Indexing speed estimated at 40,000 – 70,000 records per hour
• Query by Edited in Tag Management
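The quoted indexing rate can be turned into a rough time estimate when scheduling a full index. The sketch below simply divides the record count by the slide's two rates; the function name `indexing_hours` and the one-million-record example are assumptions, and real throughput will depend on record size and hardware.

```python
def indexing_hours(record_count, slow_rate=40_000, fast_rate=70_000):
    """Return (best_case_hours, worst_case_hours) for a full index."""
    return record_count / fast_rate, record_count / slow_rate


best, worst = indexing_hours(1_000_000)
print(f"{best:.1f} to {worst:.1f} hours")
```

For one million records the estimate is roughly 14.3 to 25.0 hours.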
Packing Dictionary
• Packing removes records marked for deletion
• Only need to do this if there are records marked for deletion
• Packing the Dictionary optimizes it
  – Improves search speed
  – Improves index / reindex speed (up to 6x)
  – Makes dictionary file smaller
Concatenation
• Opening multiple databases
  – Searches up to 128 databases simultaneously
  – Can open transcript and non-transcript databases together
  – Functions with field selection have drop-down database lists
• Creating CAT files
  – Open multiple databases automatically via CAT file
  – Save CAT file in same dir as main db and name it the same to have it open automatically when the DCB is clicked
Replication
• Enabling Replication
  – A one time procedure that adds 2 system fields to your database
• Creating a replica
• Included in a Replica
  – Records in query
  – Table layout
  – Tags
  – Security & Synonyms
  – Data validation & authority lists
Synchronization
• Recommend using default options
• Tag reconciliation takes last status by time
• Field edits produce collision screen
• Pick which value you want
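The two reconciliation rules can be illustrated with a short sketch. This is not Concordance's synchronization code: the function names are hypothetical, and comparing each copy against the value at replication time is an assumption made here to decide when two field edits actually conflict.

```python
from datetime import datetime


def reconcile_tag(local, remote):
    """Each argument is (is_tagged, changed_at); the later change wins."""
    return local if local[1] >= remote[1] else remote


def reconcile_field(original, local, remote):
    """Merge one field; raise if both copies changed it to different values."""
    if local == remote:
        return local
    if local == original:
        return remote          # only the replica changed
    if remote == original:
        return local           # only the source changed
    # Both sides edited: this is where a person has to pick a value.
    raise ValueError(f"collision: {local!r} vs {remote!r}")


tag = reconcile_tag(
    (True, datetime(2007, 3, 1, 9, 0)),
    (False, datetime(2007, 3, 2, 9, 0)),
)
print(tag[0])
print(reconcile_field("Memo", "Memo", "Letter"))
```

In the example the later untag wins, and the field takes the only edited value; a field edited differently on both sides raises the error that stands in for the collision screen.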
Opticon: Produce
• Run production query first in Concordance
• In Opticon, Tools>Produce
• Generate new production numbers and write them back into the Concordance database
• Makes new set of TIFs to send to other side
• Burns in redactions and other selected redlines
• Also can use the production tool to:
  – Create CD/DVD productions
  – Create subsets of imagebases
  – Create log files
FYI Core Server
1. At the Main Office: The FYI Core Server software is installed on your company’s internal server
2. On Your Network: Internal Concordance and Opticon users work on databases locally
3. Satellite Offices & On the Road: Remote users with local versions of the software can access the same centralized databases and imagebases
4. Experts, Co-Counsel, Clients: Using FYI Reviewer via a web browser, occasional users can access case data that they have permission to review
Tale of Two Clients
[Table comparing Concordance and FYI Reviewer on: Local Databases; Remote Databases; Editing, Searching; Importing Transcripts; Admin Features; Runs in Internet Explorer]
FYI Reviewer
• Intuitive User Interface
• “Google®-Style” Simple Search
• Standards from Concordance
  – Tally
  – Send to Excel
  – Sort
  – Reporting
• Robust Transcript Support
• Instantly Search Multiple Databases & Matters
• Tag folders, personal tags, history, statistics
FYI Core Server Demo
• Powerful Administrative Tools
• Advanced Security Features
• Trackable User Activity
• Automated Administration
• Drag & Drop Management
• Fast, Scalable Deployment