Using Lucene for Search within XIS
DESCRIPTION
Allex Lyons, a programmer at Access Innovations, Inc., talks about the company's decision to replace the random access file used for searching XIS docsets with a faster, more reliable, and more efficient Lucene index.
TRANSCRIPT
![Page 1: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/1.jpg)
XIS Lucene Indexing and Search
![Page 2: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/2.jpg)
What is XIS?
- XIS is an XML schema-based database system used to store user data
- All records are stored in individual XML files
- Option to zip XML files available with XIS Project DTD
![Page 3: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/3.jpg)
How XIS Data Is Stored
- Docsets
  - Stores records with multiple fields (similar to SQL Table)
  - Can also have subfields and lists of field values nested within a record
  - Can look up values from other fields in other Docsets or other tables
- Tables
  - Stores a single list of values
  - Can be referenced by other Docsets
  - Can be directly accessible for editing or kept hidden from user view
![Page 4: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/4.jpg)
How to Create a XIS Project
- Create DTD file for XIS project
  - Specify MAI Thesaurus to link to project
  - Create Docset and Tables
  - Specify ID lengths for each Docset
  - Create fields for Docsets
- Save DTD to dhserver/projects/projects/xml folder
- Create XIS Project folder under dhserver/data
- Create subfolders for each Docset under XIS Project folder as well as Tables directory
- XIS Projects can only be created by administrators
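The folder steps above can be sketched as a small script. This is only an illustration under stated assumptions: the two base paths come from the slide, while the project name, the Docset names, the literal folder name `Tables`, and the helper `create_project_layout` are hypothetical and not part of XIS or Data Harmony.

```python
import tempfile
from pathlib import Path


def create_project_layout(dhserver, project, docsets):
    """Create the folders named on the slide and return them, sorted."""
    dtd_dir = dhserver / "projects" / "projects" / "xml"  # where the DTD is saved
    project_dir = dhserver / "data" / project             # XIS Project folder
    folders = [dtd_dir, project_dir, project_dir / "Tables"]  # "Tables" name is assumed
    folders += [project_dir / name for name in docsets]       # one subfolder per Docset
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return sorted(folders)


# Try it in a throwaway directory rather than on a real server.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dhserver"
    for folder in create_project_layout(root, "Library", ["Books", "Authors"]):
        print(folder.relative_to(root))
```

The script only creates directories; writing the DTD itself and the administrator-only restriction are outside what it models.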
![Page 5: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/5.jpg)
Starting a XIS Project
- Start Data Harmony server where project is located
- Log in to Admin module
  - Start MAI Thesaurus
  - Start XIS Project
  - Index XIS Project, especially if just created
- Run startXis program
- Enter server, port, thesaurus, username, and password to log in
![Page 6: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/6.jpg)
Indexing a XIS Project
![Page 7: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/7.jpg)
XIS Login Screen
![Page 8: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/8.jpg)
XIS Project View
![Page 9: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/9.jpg)
XIS Docset View
![Page 10: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/10.jpg)
XIS Table View
![Page 11: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/11.jpg)
XIS Record Format
- Saved in XML file
- Starts with tag to represent Docset name along with ID as attribute
- Fields are listed within Docset tag along with values
- Subfields are nested within their parent fields
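To make the record layout concrete, here is a hypothetical record and a short parser. The Docset name `Books`, the attribute name `ID`, and every field name are invented for illustration; the actual names would come from the project's DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: root tag = Docset name, ID carried as an attribute.
SAMPLE_RECORD = """\
<Books ID="000042">
  <Title>Information Retrieval Basics</Title>
  <Year>2009</Year>
  <Author>
    <FirstName>Ada</FirstName>
    <LastName>Example</LastName>
  </Author>
</Books>
"""


def flatten(element, prefix=""):
    """Yield (dotted field path, text) pairs for every leaf field."""
    for child in element:
        path = prefix + child.tag
        if len(child):  # the field has subfields: recurse into them
            yield from flatten(child, path + ".")
        else:
            yield path, (child.text or "").strip()


root = ET.fromstring(SAMPLE_RECORD)
docset, record_id = root.tag, root.get("ID")
fields = dict(flatten(root))
print(docset, record_id)
print(fields)
```

Collecting into a `dict` keeps only the last value of a repeated field, so a record with lists of field values would need a list-valued structure instead.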
![Page 12: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/12.jpg)
XIS Search View
![Page 13: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/13.jpg)
XIS Search Results
![Page 14: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/14.jpg)
Current XIS Indexing and Search
- Uses text-based indexes
- Creates large number of index files (one for each field)
- Generates temporary files for results
- Uses less reliable RandomAccessFile search
- Has limited number of search operands
- Does not take into account numerical values
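The last point is easiest to see with a small example. The sketch below shows a typical consequence of comparing numbers as text; it is a general illustration with made-up values and is not taken from the XIS code.

```python
years = ["2009", "987", "10500", "1999"]

# Text-based comparison works character by character.
as_text = sorted(years)
# Numeric comparison works by magnitude.
as_numbers = sorted(years, key=int)


def in_range_text(value, low, high):
    return low <= value <= high  # string comparison


def in_range_numeric(value, low, high):
    return int(low) <= int(value) <= int(high)


print(as_text)     # ['10500', '1999', '2009', '987']
print(as_numbers)  # ['987', '1999', '2009', '10500']
print(in_range_text("10500", "1000", "2000"))     # True, although 10500 > 2000
print(in_range_numeric("10500", "1000", "2000"))  # False
```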
![Page 15: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/15.jpg)
Lucene vs. Current XIS Index
- Fewer index files needed
- Allows for broader searches
  - Fuzzy matching
  - Start and end wildcard searches
- Recognizes numerical and date fields as such
- Can be utilized to remove stopwords
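The search features listed above can be imitated in a few lines of plain Python. This is a toy model of the ideas only, not Lucene's implementation or API: the stopword list, the sample sentence, and all function names are assumptions, and fuzzy matching is approximated here with a plain Levenshtein edit distance.

```python
from fnmatch import fnmatchcase

STOPWORDS = {"a", "an", "and", "of", "the", "to"}  # illustrative list only


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def remove_stopwords(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]


def fuzzy_match(term, words, max_edits=1):
    return [w for w in words if edit_distance(term, w) <= max_edits]


def wildcard_match(pattern, words):
    return [w for w in words if fnmatchcase(w, pattern)]


words = remove_stopwords("The history of the thesaurus and the taxonomy")
print(words)                           # ['history', 'thesaurus', 'taxonomy']
print(fuzzy_match("thesarus", words))  # ['thesaurus']
print(wildcard_match("*nomy", words))  # ['taxonomy'] (wildcard at the start)
print(wildcard_match("tax*", words))   # ['taxonomy'] (wildcard at the end)
```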
![Page 16: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/16.jpg)
New Lucene Search Process
- Establish index reader to perform search
- Submit query string containing fields and parameters
- Return results
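The three steps above can be sketched with an in-memory stand-in. Nothing here is Lucene's API: the records, the `field:term` query format, and the functions `build_index` and `search` are hypothetical and only mirror the flow of opening an index, submitting a fielded query, and returning matching record IDs.

```python
from collections import defaultdict

# Made-up records keyed by record ID.
RECORDS = {
    "000001": {"Title": "Lucene in practice", "Subject": "search indexing"},
    "000002": {"Title": "XML databases", "Subject": "storage search"},
    "000003": {"Title": "Thesaurus construction", "Subject": "indexing"},
}


def build_index(records):
    """Map (field, term) to the set of record IDs containing that term."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[(field, term)].add(record_id)
    return index


def search(index, query):
    """Evaluate space-separated field:term clauses; every clause must match."""
    result = None
    for clause in query.split():
        field, _, term = clause.partition(":")
        matches = index.get((field, term.lower()), set())
        result = matches if result is None else result & matches
    return sorted(result or [])


index = build_index(RECORDS)                          # step 1: open the index
hits = search(index, "Subject:indexing Title:lucene")  # step 2: submit the query
print(hits)                                           # step 3: results -> ['000001']
```

A single index keyed by field and term is also one way to picture why fewer index files are needed than with one file per field.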
![Page 17: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/17.jpg)
Other Lucene Functions
- Will be used for adding, updating, and deleting XIS records
- Indexes will be housed on Data Harmony server
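Keeping an index in step with record changes can be sketched the same way. The class below is a hypothetical in-memory model, not the XIS or Lucene code; it treats an update as a delete followed by an add, which is one simple way to make sure stale terms do not linger.

```python
from collections import defaultdict


class ToyIndex:
    """Minimal in-memory index supporting add, update, and delete."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> record IDs
        self.stored = {}                  # record ID -> fields as indexed

    def add(self, record_id, fields):
        self.stored[record_id] = fields
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(record_id)

    def delete(self, record_id):
        fields = self.stored.pop(record_id, {})
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].discard(record_id)

    def update(self, record_id, fields):
        # Remove the old terms first, then index the new field values.
        self.delete(record_id)
        self.add(record_id, fields)

    def lookup(self, field, term):
        return sorted(self.postings.get((field, term.lower()), set()))


toy = ToyIndex()
toy.add("000001", {"Title": "Draft taxonomy"})
toy.update("000001", {"Title": "Final taxonomy"})
print(toy.lookup("Title", "draft"))     # []
print(toy.lookup("Title", "final"))     # ['000001']
toy.delete("000001")
print(toy.lookup("Title", "taxonomy"))  # []
```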
![Page 18: Using Lucene for Search within XIS](https://reader035.vdocuments.mx/reader035/viewer/2022062319/5550676fb4c905cc0f8b457e/html5/thumbnails/18.jpg)
Any Questions?