importing a delimited ascii text (dat) file


Upload: simone-sterling

Post on 23-Oct-2015



LexisNexis® Concordance® 2007

Creating Databases – Importing a Delimited ASCII Text (DAT) File

Document Overview

• Before You Begin

• Creating a New Database File

• Configuring Fields for Your Data

• Importing Your Data

• Additional Resources


Concordance® 2007 Quick Help


LexisNexis and the Knowledge Burst logo are registered trademarks of Reed Elsevier Properties Inc., used under license. Concordance is a registered trademark and FYI is a trademark of Applied Discovery, Inc. Other products or services may be trademarks or registered trademarks of their respective companies. © 2007 Concordance. All rights reserved.

Concordance®

Concordance® Image

Concordance® FYI™

Copyright © 2007 Concordance. All rights reserved.


Before You Begin

Delimited ASCII text files store 2-dimensional arrays of data by separating the values in each row with specific delimiter characters. Most database and spreadsheet programs are able to read or save data in a delimited format. Delimited-text files may have extensions such as .DAT, .ASC, .CSV or even .TXT, as long as the file is structured properly with text qualifiers, field delimiters and line breaks.
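As a rough illustration of that structure, the Python sketch below splits one delimited record into its values. The delimiter characters are assumptions rather than something this guide specifies (ASCII 20 as the field delimiter and ASCII 254 as the text qualifier), and the helper name split_record is hypothetical.

```python
# Sketch only: the default delimiter values below are assumptions for
# illustration; confirm the real ones by opening your DAT file in a text editor.
FIELD_DELIM = "\x14"      # assumed field delimiter (ASCII 20)
TEXT_QUALIFIER = "\xfe"   # assumed text qualifier (ASCII 254)


def split_record(line, field_delim=FIELD_DELIM, qualifier=TEXT_QUALIFIER):
    """Split one delimited record into a list of field values.

    A field delimiter only separates values when it appears outside a pair
    of text qualifiers, so a qualified value may contain the delimiter itself.
    """
    values = []
    current = []
    inside_qualifiers = False
    for ch in line.rstrip("\r\n"):
        if ch == qualifier:
            # Qualifiers toggle the "inside a value" state and are dropped.
            inside_qualifiers = not inside_qualifiers
        elif ch == field_delim and not inside_qualifiers:
            values.append("".join(current))
            current = []
        else:
            current.append(ch)
    values.append("".join(current))
    return values


# One record with three qualified values, ending in a line break.
record = "\xfeDOC0001\xfe\x14\xfeSmith, John\xfe\x14\xfe01/15/2007\xfe\n"
fields = split_record(record)
```

The same helper handles a comma-delimited line if you pass "," and '"' as the delimiter and qualifier.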

For many Concordance databases, the files will also include optical character recognition (OCR) text and scanned document images. A DAT file containing the metadata for each document will often accompany the OCR text and image files.

The procedure outlined in this document describes how to import a delimited ASCII text (.DAT) file.

You will need…

• Concordance

• Text editor program (TextPad, UltraEdit or similar)

• Delimited ASCII Text (DAT) file

Creating a New Database File

1 Open Concordance.

2 In the File menu select New.

Figure 1: Concordance Menu – File

3 In the Create database from template dialog (see figure 2), select the Blank database type.

Figure 2: Create database from template – General tab


4 Click OK.

5 When prompted, choose a file name and directory (choose to store your database locally or on a network drive).

NOTE – You must have full access to the directory.

6 Click Open to save the database and begin creating and customizing your fields.

Configuring Fields for Your Data

Selecting the Blank database template creates an empty database containing no fields. It is the best choice when you are creating a custom structure for a delimited ASCII text (.DAT) file.

Plan your database structure

Open your DAT file with a text editor.

Note the following:

• Delimiters used in the file (Text qualifier, field, and new line delimiters)

• Field Headers (the first line will usually contain the field headers)

• Type, format, and length of data

• Date fields are 8 digits max, may be in any order with slashes, or in the universal “true date” format without slashes

• Field(s) database users need to search and sort

• Field (if any) to be linked to an image

• OCR content (if any) to be imported
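If the file is large, a short script can help with the "type, format, and length" part of this survey. The Python sketch below reports the longest value found under each header. It assumes the first line holds the field headers and uses a comma-delimited sample for readability; the function name survey_fields is hypothetical, and you would pass your own delimiter and text qualifier.

```python
import csv
import io


def survey_fields(text, delimiter=",", quotechar='"'):
    """Return {header: length of the longest value} for delimited text.

    Assumes the first line of the text contains the field headers.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    headers = next(reader)
    longest = {name: 0 for name in headers}
    for row in reader:
        for name, value in zip(headers, row):
            longest[name] = max(longest[name], len(value))
    return longest


sample = (
    "BEGDOC,AUTHOR,DOCDATE\n"
    'DOC0001,"Smith, John",01/15/2007\n'
    "DOC0002,Lee,02/03/2007\n"
)
lengths = survey_fields(sample)
```

A value longer than the capacity of a fixed-length field type suggests that the field needs a larger type.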

Tip - While you have the DAT file open, scroll to the bottom of the file and ensure that the last record (the last line) ends with a new line delimiter (created by pressing Enter on your keyboard). Without the final return, the last record will not be imported into your database.
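The same check can be scripted instead of done by eye. The Python sketch below tests whether the file contents end with a line break and appends one if not. The function names are hypothetical, and the CR/LF default is an assumption about what pressing Enter produces in your editor.

```python
def ends_with_newline(data: bytes) -> bool:
    """Return True if the file contents end with a carriage return or line feed."""
    return data.endswith((b"\n", b"\r"))


def ensure_final_newline(data: bytes, newline: bytes = b"\r\n") -> bytes:
    """Return the contents unchanged, or with a final line break appended."""
    return data if ends_with_newline(data) else data + newline


# The last record below has no final return, so one is added.
contents = b"DOC0001,Smith\r\nDOC0002,Lee"
fixed = ensure_final_newline(contents)
```

To apply it to a real file, read the file in binary mode, pass the bytes through ensure_final_newline, and write the result to a copy rather than overwriting the original.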

Immediately upon creating a blank database the New field dialog will open prompting you to begin creating and configuring your fields.

1 Type the name of your first field in the Name field (see figure 3).

NOTE – Field names do not need to match the field headers specified in the DAT file. They may be up to 12 characters long and may be entered in upper or lower case letters; the system converts all characters to upper case. They must begin with a letter and may contain only alphanumeric characters and the underscore.
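If you are drafting a field list outside Concordance, the naming rules in this note can be checked ahead of time. The Python sketch below is one way to express them; the function name normalize_field_name is hypothetical, and "alphanumeric" is assumed to mean the ASCII letters and digits.

```python
import re

# A letter first, then up to 11 more letters, digits, or underscores
# (12 characters in total). ASCII-only is an assumption.
_FIELD_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,11}")


def normalize_field_name(name: str) -> str:
    """Validate a proposed field name and return it in upper case."""
    if not _FIELD_NAME.fullmatch(name):
        raise ValueError(f"invalid field name: {name!r}")
    return name.upper()


ok = normalize_field_name("BegDoc")
```

Names such as 1DOC (starts with a digit), DOC-ID (contains a hyphen), or anything longer than 12 characters raise a ValueError.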

2 Select the field type in the Type drop-down, and select the appropriate attributes for the field.

Types and Attributes - To successfully import your DAT file, you must create fields to match the data type and size of your data. Refer to Tables 1 and 2 below for information about Field Types and Attributes.

Field Order - Create your fields in the order in which you will want to view them in Table and Browse views. Use the Insert and Delete buttons (similar to Paste and Cut, respectively, in MS Office products) to arrange fields into the desired order as necessary.


3 Click New to confirm your choices and to create the next field.

NOTE – If you accidentally click OK instead of New to create a new field, the New field definition dialog will close. To access this panel again, select Modify in the File menu.

Figure 3: New field definition dialog

Field Types

Type: Text*
Capacity: 1-60 alpha or numeric characters, keyed by default
Notes: Use for numeric values that are not used mathematically (e.g. phone numbers, social security numbers, and other serial numbers). Note - If you intend to sort records based on this field, zero fill any numeric values stored in it to ensure they sort correctly.

Type: Numeric*
Capacity: 1-20 digits long (including the decimal place, negative sign, and all digits following the decimal place), keyed by default
Notes: Display options available: Currency, Commas, Zero filled, Plain

Type: Date*
Capacity: MMDDYYYY, YYYYMMDD, or DDMMYYYY; 8 bytes in length
Notes: The date format selected here will control how the data appears after it is imported into the database. It does not need to match the date format in the DAT file.

Type: Paragraph
Capacity: 12,000,000 characters (12 MB), indexed by default
Notes: Most flexible and variable in size, not ideal for sorting or searching by comparison. Supports rich text formatting.

*Fixed-length field

Table 1: Field Types
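The zero-fill note for Text fields is easiest to see with a small example. Text values sort character by character, so "10" sorts ahead of "2" unless the values are padded to a common width. The Python sketch below shows the effect; the width of 4 is an arbitrary choice for illustration.

```python
def zero_fill(value: str, width: int) -> str:
    """Left-pad a numeric string with zeros to the given width."""
    return value.zfill(width)


raw = ["10", "2", "1"]

# Sorted as text, the unpadded values come out in the wrong numeric order.
unpadded_order = sorted(raw)

# After zero filling, text order and numeric order agree.
padded_order = sorted(zero_fill(v, 4) for v in raw)
```

Here unpadded_order is ["1", "10", "2"], while padded_order is ["0001", "0002", "0010"].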


Field Attributes

Attribute*: Key
Use: Most commonly applied to fixed-length fields; however, it may be applied to any field (including paragraph fields) to make relational searches faster.
Notes: Keying a field creates a .KEY file. As KEY files grow in size, their efficiency decreases, which may slow relational searches. All keyed fields will appear in the default table view.

Attribute*: Image
Use: Used to link Concordance with an image viewer; it indicates which field contains the image name or alias.
Notes: Select only one field per database as an Image field. Identifying multiple fields in a database as an image field will interfere with the linkage between Concordance and the viewer.

Attribute*: Indexed
Use: Enables full text searching.
Notes: Places every word in the field into a dictionary file (.NDX and .DCT) for fast retrieval.

Attribute*: System
Use: Special field that is hidden with no read or write access to end-users.
Notes: System fields should never be indexed, added, deleted or modified by users. Concordance will create these fields for replication and synchronization information.

Attribute*: Accession
Use: Unique serial numbers internally assigned to each record, managed entirely by Concordance.
Notes: Accession numbers may not be edited or modified. Helpful as a load order identifier. Note – As records are edited, exported or removed, gaps in numbering may occur.

Attribute*: Optical Character Recognition (OCR) Indexing
Use: Will not index text that is not contained in a defined dictionary.
Notes: Not recommended for any fields. Increases indexing times, limits indexing to Webster’s dictionary, and includes only English words. Use Synonyms instead.

*Not every Attribute is available for every field type

Table 2: Field Attributes

4 Repeat steps 1 through 3 as necessary to create a structure to match your DAT file.

5 When you have completed creating all your fields, click OK.

Your database structure is ready for the data import.

Additional Considerations

Embedded Punctuation

Embedded punctuation is provided so that hyphenated words, dates, decimal numbers, and contractions are not split into two or more words. You may add or delete punctuation as needed. By default, Concordance includes the ‘ . , / characters as embedded punctuation for all fields.
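The idea can be pictured with a small tokenizer. The Python sketch below is a conceptual model only, not Concordance's indexing code: it assumes a punctuation character counts as embedded when it sits between two letters or digits, and it uses a plain apostrophe for the quote character. The function name tokenize is hypothetical.

```python
import re


def tokenize(text, embedded="'.,/"):
    """Split text into words, keeping embedded punctuation inside a word.

    A character from `embedded` is kept only when it has a letter or digit
    on both sides; anywhere else it acts as a word separator.
    """
    if not embedded:
        return re.findall(r"[A-Za-z0-9]+", text)
    punct = re.escape(embedded)
    pattern = re.compile(rf"[A-Za-z0-9]+(?:[{punct}][A-Za-z0-9]+)*")
    return pattern.findall(text)


words = tokenize("Don't pay 1,250.50 on 12/31/2007.")
```

With the assumed defaults, the contraction, the decimal number, and the date each stay whole, and the sentence-ending period is dropped. Removing the apostrophe from the list, as in tokenize("don't", embedded="./"), splits the contraction into two words.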

If you will be importing OCR…

Create your OCR fields now, in addition to the fields for your DAT file import.

As a best practice, create at least two OCR fields labeled with ascending numbers (example: OCR1 & OCR2). When you use ReadOCR.cpl to import your OCR text, the CPL will automatically overflow text from the first OCR field, if it is over 12 million characters, into the next sequentially named OCR field.


NOTE – If you do not create a second sequentially named OCR field, you run the risk of losing overflow data. You will not receive an error on the import if your content exceeds the 12 million character limit.
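To picture why the second field matters, the Python sketch below models the overflow as simple slicing. It is an illustration of the behavior described above, not the ReadOCR.cpl implementation; the function name split_across_fields is hypothetical, and the example uses a tiny limit so the result is easy to read.

```python
FIELD_LIMIT = 12_000_000  # paragraph field capacity stated in Table 1


def split_across_fields(text, field_names, limit=FIELD_LIMIT):
    """Fill each named field in order with up to `limit` characters.

    Returns (contents by field name, leftover text that fit nowhere).
    """
    contents = {}
    remaining = text
    for name in field_names:
        contents[name] = remaining[:limit]
        remaining = remaining[limit:]
    return contents, remaining


# With two fields the overflow has somewhere to go.
two_fields, lost_two = split_across_fields("abcdefgh", ["OCR1", "OCR2"], limit=5)

# With one field the overflow is left over.
one_field, lost_one = split_across_fields("abcdefgh", ["OCR1"], limit=5)
```

In the one-field case, the leftover text ("fgh" here) corresponds to the overflow data that would be lost without warning.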

Importing Your Data

1 In the Documents menu, select Import then Delimited text.

Figure 4: Documents Menu – Import> Delimited text…

2 Select the Import/Overlay Wizard in the Import method dialog, and then click OK.

Figure 5: Import Method

3 Accept the default Load option for your initial import of data, and then click Next.

Figure 6: Import Wizard dialog – Load Method


4 Select the delimited format that matches the one used in your DAT file, then click Next.

NOTE – The Import Wizard defaults to the standard Concordance delimiters, but you may also select Comma Delimited (CSV), Tab Delimited, or choose the Custom format and specify your unique ASCII character delimiters in the drop-down menu shown in figure 7.

Figure 7: Import Wizard dialog – Format

5 In the Date format window, select a date format that matches the dates in your DAT file, and then click Next.

NOTE – Selecting the date format will not affect how it will display in table and browse view. That preference was set when the date field was created in the New field definition dialog.

Figure 8: Import Wizard – Date format
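The distinction between the format used to read a date and the format used to display it can be sketched in a few lines of Python. This is an illustration, not Concordance's own logic: the function name convert_date is hypothetical, and the slash-separated display patterns are assumptions chosen for readability.

```python
from datetime import datetime

# Display patterns keyed by the format names in Table 1 (slashes assumed).
DISPLAY_FORMATS = {
    "MMDDYYYY": "%m/%d/%Y",
    "YYYYMMDD": "%Y/%m/%d",
    "DDMMYYYY": "%d/%m/%Y",
}


def convert_date(value, source_format, display_format):
    """Read a date using the DAT file's format and render it for display."""
    parsed = datetime.strptime(value, source_format)
    return parsed.strftime(DISPLAY_FORMATS[display_format])


# A date stored without slashes, shown month first.
shown = convert_date("20070115", "%Y%m%d", "MMDDYYYY")
```

Here shown is "01/15/2007": the source format only tells the import how to read the value, and the display format decides how it appears afterwards.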


6 By default, all of the fields you created will appear in the Selected Fields box. Make sure the order of the fields matches the order in your DAT file.

Figure 9: Import Wizard – Fields

If you need to change the order of the fields:

• Move all the Selected Fields to the Available fields list by clicking on the << button, then add them back in the proper order one by one using the > button.

Or

• Click on a field to reorder and use the Up and Down buttons as needed to correct the order.

NOTE – If the first line of the DAT file contains the field headers, select the Skip first line checkbox so that the header line is not loaded as data and the imported values line up with the fields in the Selected Fields window.

7 Click Next to confirm the Selected Fields and their order.

8 Click Browse in order to navigate to and select your DAT file (delimited ASCII), and then click Next.


Figure 10: Import Wizard – Open

9 Confirm the location of your DAT file in the File field and click Finish to import your data.

Figure 11: Import Wizard – Finish

10 When the import is complete, the dialog will close. Select the Browse view to verify that your data import was successful.
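One simple cross-check, which this guide does not describe, is to count the records in the DAT file and compare the total with the number of records in the database. The Python sketch below assumes one record per line with no line breaks inside values; the function name count_records is hypothetical.

```python
def count_records(lines, skip_first_line=True):
    """Count non-empty lines, optionally ignoring a header line."""
    non_empty = [line for line in lines if line.strip()]
    if skip_first_line and non_empty:
        non_empty = non_empty[1:]
    return len(non_empty)


dat_lines = ["BEGDOC,AUTHOR\n", "DOC0001,Smith\n", "DOC0002,Lee\n"]
expected_records = count_records(dat_lines)
```

Pass skip_first_line=False if the file has no header line. A lower total in the database than in the file may point to a missing final return on the last record.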

If you are not linking to images or loading OCR, you are ready to index your database and get started searching, tagging, and working with your records.


Additional Resources

General Product Information
http://law.lexisnexis.com/concordance

Concordance Technical Support
Phone: 866-495-2397
Email: [email protected]

Concordance Training
Phone: 425-463-3503
Email: [email protected]
