JavaEdge09: Java Indexing and Searching
DESCRIPTION
From AlphaCSP's Java conference, JavaEdge09: the presentation by Evgeny Borisov and myself on "Java Indexing and Searching". In this session we discussed the need for Full Text Search (as opposed to regular textual/SQL search), Lucene and its OO mismatches, the solution that Hibernate Search provides to those mismatches, and then a bit about Lucene's scoring algorithm.
TRANSCRIPT
![Page 1: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/1.jpg)
1
Java Indexing and Searching
By: Shay Sofer & Evgeny Borisov
![Page 2: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/2.jpg)
2
Agenda
» Motivation
» Lucene Intro
» Hibernate Search
» Indexing
» Searching
» Scoring
» Alternatives
![Page 3: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/3.jpg)
3
Motivation
What is Full Text Search and why do I need it?
![Page 4: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/4.jpg)
4
Use case: “Book” table

| Id | Title | Price |
| --- | --- | --- |
| 1 | Head First Java | 200 |
| 2 | JBoss in action | 120 |
| 3 | Best jokes about Chuck Norris | 250 |
| 4 | Best of the best of the best | 10 |

Search for: “Good practices for Gava”
![Page 5: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/5.jpg)
5
Full Text Search
» We’d like to:
  - Index the information efficiently
  - Answer queries using that index
» More common than you think
![Page 6: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/6.jpg)
6
Full Text Search Solutions
» Integrated full text search engine in the database
  - e.g. DBSight, recent versions of MySQL, MS SQL Server, Oracle Text, etc.
» Out of the box Search Appliances
  - e.g. Google Search Appliance
» Third party libraries
![Page 7: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/7.jpg)
7
Lucene Intro
![Page 8: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/8.jpg)
8
Apache Lucene
» The most popular full text search library
» Scalable and high performance
» Around for about 9 years
» Open source
» Supported by the Apache Software Foundation
![Page 9: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/9.jpg)
9
Lucene Intro
![Page 10: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/10.jpg)
10
Lucene’s Features
» “Word-oriented” search
» Powerful query syntax
  - Wildcards, typos, proximity search
» Sorting by relevance (Lucene’s scoring algorithm) or any other field
» Fast searching, fast indexing
  - Inverted index
![Page 11: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/11.jpg)
11
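Inverted Index

DB:

| Doc id | Title |
| --- | --- |
| 0 | Head First Java |
| 1 | Best of the best of the best |
| 2 | Chuck Norris in action |
| 3 | JBoss in action |

Inverted index:

| Term | Doc ids |
| --- | --- |
| Head | 0 |
| First | 0 |
| Java | 0 |
| Action | 2, 3 |
| Best | 1 |
| JBoss | 3 |
| Chuck | 2 |
| Norris | 2 |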
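The inverted index on this slide can be approximated in a few lines of plain Java. This is only a sketch of the data structure, not Lucene's implementation: the class `InvertedIndexSketch` and its methods are hypothetical names, and, unlike the slide, it applies no stop-word filtering, so words such as "in" and "of" are indexed too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of an inverted index; not Lucene code.
class InvertedIndexSketch {
    private final List<String> documents = new ArrayList<>();
    // term (lower-cased) -> sorted ids of the documents that contain it
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    /** Adds a document and returns its id (its position in insertion order). */
    int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    /** Returns the ids of the documents containing the term (empty if none). */
    SortedSet<Integer> lookup(String term) {
        SortedSet<Integer> ids = postings.get(term.toLowerCase());
        if (ids == null) {
            return new TreeSet<>();
        }
        return ids;
    }

    /** Builds an index over the four titles from the slide. */
    static InvertedIndexSketch sample() {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Head First Java");              // id 0
        index.add("Best of the best of the best"); // id 1
        index.add("Chuck Norris in action");       // id 2
        index.add("JBoss in action");              // id 3
        return index;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = sample();
        System.out.println("action -> " + index.lookup("action"));
        System.out.println("gava   -> " + index.lookup("gava"));
    }
}
```

A lookup is a single map access per term, which is why searching stays fast regardless of how many documents were indexed.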
![Page 12: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/12.jpg)
12
Basic Definitions
» A Field is a key + value. The value is always represented as a String (textual)
» A Document can contain as many Fields as we’d like
» Lucene’s index is a collection of Documents
![Page 13: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/13.jpg)
13
Using Lucene API…
IndexSearcher is = new IndexSearcher("BookIndex");
QueryParser parser = new QueryParser("title", analyzer);
Query query = parser.parse("Good practices for Gava");
return is.search(query);
![Page 14: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/14.jpg)
14
OO domain model vs. Lucene’s index structure
[Diagram labels: OO Domain Model; Index structure; Extensible type system; Strong type system; Polymorphic]
![Page 15: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/15.jpg)
15
Object vs. flat text mismatches
» The Structural Mismatch
  - Converting objects to strings and vice versa
  - No representation of relations between Documents
» The Synchronization Mismatch
  - The DB must be synchronized with the index
» The Retrieval Mismatch
  - Retrieving documents (= pairs of key + value) and not objects
![Page 16: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/16.jpg)
16
Hibernate Search
Emmanuel Bernard
![Page 17: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/17.jpg)
17
Hibernate Search
» Leverages ORM and Lucene together to solve those mismatches
» Complements Hibernate Core by providing FTS on persistent domain models
» It’s actually a bridge that hides the sometimes complex Lucene API usage
» Open source
![Page 18: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/18.jpg)
18
Mapping
» Document = Class (mapped POJO)
» Hibernate Search metadata can be described by annotations only
» Regardless, you can still use Hibernate Core with XML descriptors (hbm files)
» Let’s create our first mapping – Book
![Page 19: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/19.jpg)
19
@Entity @Indexed
public class Book implements Serializable {
    @Id
    private Long id;

    @Boost(2.0f) @Field
    private String title;

    @Field
    private String description;

    private String imageURL;

    @Field(index = Index.UN_TOKENIZED)
    private String isbn;
    …
}
![Page 20: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/20.jpg)
20
Bridges
» Types will be converted via a “Field Bridge”
» It is a bridge between the Java type and its representation in Lucene (i.e. a String)
» Hibernate Search comes with a set for most standard types (numbers, both primitives and wrappers, Date, Class, etc.)
» They are extendable, of course
![Page 21: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/21.jpg)
21
Custom Bridges
» We can use a field bridge…

@FieldBridge(impl = MyPaddedFieldBridge.class,
             params = {@Parameter(name = "padding", value = "5")})
public Double getPrice() {
    return price;
}

» Or a class bridge, in case the data we want to index is more than just the field itself, e.g. a concatenation of 2 fields
![Page 22: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/22.jpg)
22
Custom Bridges
» In order to create a custom bridge we need to implement the StringBridge interface
» ParameterizedBridge – to inject params
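Why pad a number at all? The index stores strings, and strings compare lexicographically, so "120" sorts before "25" unless both are padded to the same width. The sketch below shows what a bridge such as `MyPaddedFieldBridge` might look like. To keep it compilable without the library, it declares local stand-in interfaces that only mirror the shape of Hibernate Search's `StringBridge` and `ParameterizedBridge`. The generic `Map<String, String>` parameter type, the rounding to whole numbers and the rejection of negative values are assumptions made for this illustration.

```java
import java.util.Map;

// Local stand-ins (assumption): they mimic the library interfaces so that the
// sketch is self-contained. In a real project you would import the originals.
interface StringBridge {
    String objectToString(Object object);
}

interface ParameterizedBridge {
    void setParameterValues(Map<String, String> parameters);
}

// Hypothetical implementation: left-pads a number with zeros to a fixed width.
class MyPaddedFieldBridge implements StringBridge, ParameterizedBridge {
    private int padding = 5; // default width, overridden by the "padding" param

    @Override
    public void setParameterValues(Map<String, String> parameters) {
        String value = parameters.get("padding");
        if (value != null) {
            padding = Integer.parseInt(value);
        }
    }

    @Override
    public String objectToString(Object object) {
        if (object == null) {
            return null;
        }
        // Simplification: the price is rounded to a whole number.
        long rounded = Math.round(((Number) object).doubleValue());
        String digits = Long.toString(rounded);
        if (rounded < 0 || digits.length() > padding) {
            throw new IllegalArgumentException("Cannot pad value: " + object);
        }
        StringBuilder padded = new StringBuilder();
        for (int i = digits.length(); i < padding; i++) {
            padded.append('0');
        }
        return padded.append(digits).toString();
    }

    public static void main(String[] args) {
        MyPaddedFieldBridge bridge = new MyPaddedFieldBridge();
        System.out.println(bridge.objectToString(120.0));
        System.out.println(bridge.objectToString(25.0));
    }
}
```

With padding in place, the string order of the indexed values matches their numeric order, so sorting and range comparisons on the price behave as expected.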
![Page 23: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/23.jpg)
23
Directory Providers
» Directory is where Lucene stores its index structure
» Filesystem Directory Provider
» In-memory Directory Provider
» Clustering
![Page 24: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/24.jpg)
24
Filesystem Directory Provider
» Default
» Most efficient
» Limited only by the disk’s free space
» Can be easily replicated
» Luke support
![Page 25: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/25.jpg)
25
In-memory Directory Provider
» The index dies as soon as the SessionFactory is closed
» Very useful when unit testing (alongside in-memory DBs)
» Data can be made persistent at any moment, if needed
» Obviously, be aware of OutOfMemoryError
![Page 26: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/26.jpg)
26
Directory Providers Config Example
<!-- Hibernate Search Config -->
<property name="hibernate.search.default.directory_provider">
    org.hibernate.search.store.FSDirectoryProvider
</property>
<property name="hibernate.search.com.alphacsp.Book.directory_provider">
    org.hibernate.search.store.RAMDirectoryProvider
</property>
![Page 27: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/27.jpg)
27
Relationships
» Correlated queries: how do we navigate from one entity to another?
» Lucene doesn’t support relationships between documents
» Hibernate Search to the rescue: denormalization
![Page 28: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/28.jpg)
28
Hibernate Search
![Page 29: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/29.jpg)
29
Relationships
@Entity @Indexed
public class Book {
    @ManyToOne @IndexEmbedded
    private Author author;
}

@Entity @Indexed
public class Author {
    private String firstName;
}

» Object navigation is easy (author.firstName)
![Page 30: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/30.jpg)
30
Relationships – Denormalization Pitfall
» Entities can be referenced by other entities.
![Page 33: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/33.jpg)
33
Relationships – Solution
» The solution: the association pointing back to the parent is marked with @ContainedIn

@Entity @Indexed
public class Book {
    @ManyToOne @IndexEmbedded
    private Author author;
}

@Entity @Indexed
public class Author {
    @OneToMany(mappedBy = "author") @ContainedIn
    private Set<Book> books;
}
![Page 34: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/34.jpg)
34
Analyzers
» Responsible for tokenizing and filtering words
» Tokenizing – not as trivial as it seems
» Filtering – clearing the noise (case, stop words, etc.) and applying “other” operations
» Creating a custom analyzer is easy
» The default analyzer is the Standard Analyzer
![Page 35: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/35.jpg)
35
Standard Analyzer
» StandardTokenizer: splits words and removes punctuation
» StandardFilter: removes apostrophes and dots from acronyms
» LowerCaseFilter: lower-cases words
» StopFilter: eliminates common words
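The tokenize, lower-case and stop-filter chain can be imitated in plain Java to see what ends up in the index. This is a rough sketch, not Lucene's StandardAnalyzer: the class `AnalyzerSketch`, its tiny stop-word list and the "split on anything that is not a letter or digit" rule are assumptions for illustration, and the real tokenizer treats apostrophes, acronyms and similar cases more carefully.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified analyzer chain; not Lucene code.
class AnalyzerSketch {
    // A tiny illustrative stop-word list; real analyzers ship a longer one.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Tokenizing: split on anything that is not a letter or a digit.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // Lower-case filter.
            String token = raw.toLowerCase(Locale.ROOT);
            // Stop filter: drop common words that carry little meaning.
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Best of the best of the best"));
        System.out.println(analyze("Chuck Norris in action"));
    }
}
```

The same chain has to run on the query text as well, otherwise a search for "Best" would never match the indexed token "best".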
![Page 36: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/36.jpg)
36
Other cool Filters….
Hibernate Search
![Page 37: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/37.jpg)
37
Approximative Search
» N-Gram algorithm – indexing a sequence of n consecutive characters
» Usually when a typo occurs, part of the word is still correct
  - Encyclopedia in 3-grams = Enc | ncy | cyc | ycl | clo | lop | ope | ped | edi | dia
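A minimal sketch of the n-gram idea in plain Java follows. The names `NGramSketch`, `ngrams` and `shared` are made up for this example, and counting shared grams is just one simple way to show why a misspelled word still overlaps with the indexed one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of n-gram generation; not Lucene code.
class NGramSketch {
    /** Returns every run of n consecutive characters of the word, in order. */
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    /** Counts the distinct n-grams that two words have in common. */
    static int shared(String a, String b, int n) {
        Set<String> common = new HashSet<>(ngrams(a, n));
        common.retainAll(ngrams(b, n));
        return common.size();
    }

    public static void main(String[] args) {
        System.out.println(ngrams("Encyclopedia", 3));
        // The typo "enciclopedia" still shares most 3-grams with the original.
        System.out.println(shared("encyclopedia", "enciclopedia", 3));
    }
}
```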
![Page 38: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/38.jpg)
38
Phonetic Approximation
» Algorithms for indexing words by their pronunciation
» The most widely known algorithm is Soundex
» Other available algorithms: RefinedSoundex, Metaphone, DoubleMetaphone
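To make the phonetic idea concrete, here is a compact sketch of the classic American Soundex rules: keep the first letter, encode the following consonants as digits, collapse adjacent equal codes, and pad to four characters. The class `SoundexSketch` is a hypothetical stand-alone illustration and handles ASCII letters only; it is not the encoder a Lucene phonetic filter would use.

```java
import java.util.Locale;

// Hypothetical stand-alone Soundex; ASCII letters only.
class SoundexSketch {
    // Digit for each letter A..Z; '0' marks letters that are not encoded.
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String word) {
        StringBuilder letters = new StringBuilder();
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                letters.append(c);
            }
        }
        if (letters.length() == 0) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(letters.charAt(0));
        char last = CODES.charAt(letters.charAt(0) - 'A');
        for (int i = 1; i < letters.length() && out.length() < 4; i++) {
            char c = letters.charAt(i);
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate letters with equal codes
            }
            char code = CODES.charAt(c - 'A');
            if (code != '0' && code != last) {
                out.append(code);
            }
            last = code; // a vowel resets 'last', so a repeated code counts again
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Robert") + " " + soundex("Rupert"));
    }
}
```

Words that sound alike collapse to the same key, so indexing the key instead of the word lets "Rupert" match a query for "Robert".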
![Page 39: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/39.jpg)
39
Synonyms & Stemming
» Synonyms
  - You can expand your synonym dictionary with your own rules (e.g. business-oriented words)
» Stemming
  - Stemming is the process of reducing words to their stem, base or root form
  - “Fishing”, “Fisher”, “Fish” and “Fished” → Fish
  - Snowball stemming language – supports over 15 languages
![Page 40: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/40.jpg)
40
Additional Analyzers
» Lucene is bundled with the basic analyzers, tokenizers and filters
» More can be found in Lucene’s contrib section and in Apache Solr
![Page 41: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/41.jpg)
41
Hebrew?
» No free Hebrew analyzer for Lucene
» Itamar Syn-Hershko
  - Involved in the creation of CLucene (the C++ port of Lucene)
  - Creating a Hebrew analyzer as a side project
  - Looking to join forces: [email protected]
![Page 42: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/42.jpg)
42
The Lord of the Rings, first version: The Fellowship of the Ring
![Page 43: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/43.jpg)
43
Agenda
» Motivation
» Lucene Intro
» Hibernate Search
» Indexing
» Searching
» Scoring
» Alternatives
![Page 44: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/44.jpg)
44
Transparent indexing
» When has the data changed?
» Which data has changed?
» When to index the changing data?
» How to do it all efficiently?
Hibernate Search will do it for you!
Indexing
![Page 45: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/45.jpg)
45
Indexing – On Rollback
[Diagram labels: Application; Session (Entity Manager); DB; Queue; Lucene Index. Steps: Start Transaction; Insert/update/delete]
![Page 46: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/46.jpg)
46
Indexing – On Rollback
[Diagram labels: Application; Session (Entity Manager); DB; Queue; Lucene Index. Steps: Start Transaction; Insert/update/delete; Transaction failed; Rollback]
![Page 47: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/47.jpg)
47
Indexing – On Commit
[Diagram labels: Application; Session (Entity Manager); DB; Queue; Lucene Index. Steps: Insert/update/delete; Transaction Committed √]
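The rollback and commit diagrams boil down to one idea: index changes wait in a queue while the transaction is open, reach the index only on commit, and are thrown away on rollback. A minimal plain-Java sketch of that idea follows. The class `TransactionalIndexSketch` and the map standing in for the Lucene index are hypothetical, and the real work queue is considerably more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of commit-time indexing; not Hibernate Search code.
class TransactionalIndexSketch {
    // Stand-in for the Lucene index: entity id -> indexed text.
    private final Map<Long, String> index = new HashMap<>();
    // Index work collected while the transaction is still open.
    private final List<Runnable> queue = new ArrayList<>();

    void addOrUpdate(long id, String text) {
        queue.add(() -> index.put(id, text));
    }

    void delete(long id) {
        queue.add(() -> index.remove(id));
    }

    /** Transaction committed: apply the queued work to the index. */
    void commit() {
        for (Runnable work : queue) {
            work.run();
        }
        queue.clear();
    }

    /** Transaction failed: discard the queued work, the index stays untouched. */
    void rollback() {
        queue.clear();
    }

    boolean isIndexed(long id) {
        return index.containsKey(id);
    }

    public static void main(String[] args) {
        TransactionalIndexSketch tx = new TransactionalIndexSketch();
        tx.addOrUpdate(1L, "Head First Java");
        tx.rollback();
        System.out.println("after rollback: " + tx.isIndexed(1L));
        tx.addOrUpdate(1L, "Head First Java");
        tx.commit();
        System.out.println("after commit:   " + tx.isIndexed(1L));
    }
}
```

Because nothing touches the index before commit, a failed transaction cannot leave the index out of sync with the DB.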
![Page 48: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/48.jpg)
48
hibernate.cfg.xml
<property name="org.hibernate.worker.execution">async</property>
<property name="org.hibernate.worker.thread_pool.size">2</property>
<property name="org.hibernate.worker.buffer_queue.max">10</property>
![Page 49: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/49.jpg)
49
It’s too late! I already have a database without Lucene!
Indexing
![Page 50: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/50.jpg)
50
Manual indexing
» FullTextSession extends the Session of Hibernate Core

Session session = sessionFactory.openSession();
FullTextSession fts = Search.getFullTextSession(session);

» index(Object entity)
» purge(Class entityType, Serializable id)
» purgeAll(Class entityType)
![Page 51: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/51.jpg)
51
Manual indexing
tx = fullTextSession.beginTransaction();
// read the data from the database
// (createCriteria returns a Criteria, not a Query)
Criteria criteria = fullTextSession.createCriteria(Book.class);
List<Book> books = criteria.list();
for (Book book : books) {
    // add or update each loaded entity in the Lucene index
    fullTextSession.index(book);
}
tx.commit();
![Page 52: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/52.jpg)
52
Removing objects from the Lucene index
tx = fullTextSession.beginTransaction();
List<Integer> ids = getIds();
for (Integer id : ids) {
    if (…) {
        fullTextSession.purge(Book.class, id);
    }
}
tx.commit();

» fullTextSession.purgeAll(Book.class);
![Page 53: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/53.jpg)
53
Rrrr!!! I got an OutOfMemoryError!
![Page 54: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/54.jpg)
54
session.setFlushMode(FlushMode.MANUAL);
session.setCacheMode(CacheMode.IGNORE);
Transaction tx = session.beginTransaction();
// scroll through the results instead of loading them all at once
ScrollableResults results = session.createCriteria(Item.class)
        .scroll(ScrollMode.FORWARD_ONLY);
int index = 0;
while (results.next()) {
    index++;
    session.index(results.get(0));
    if (index % BATCH_SIZE == 0) {
        // apply the pending index changes, then free the session's memory
        session.flushToIndexes();
        session.clear();
    }
}
tx.commit();
![Page 55: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/55.jpg)
55
Searching
![Page 56: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/56.jpg)
56
Lucene’s Query Syntax
title:lord title:rings
+title:lord +title:rings
title:lord -author:Tolkien
title:r?ngs
title:r*gs
title:"Lord of the Rings"
title:"Lord Rings"~5
title:rengs~0.8
title:lord author:Tolkien^2
And more…
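The fuzzy operator (as in `rengs~0.8`) builds on edit distance: the number of single-character insertions, deletions and substitutions needed to turn one word into another. The sketch below is a textbook Levenshtein implementation in plain Java, given only to illustrate the measure. `EditDistanceSketch` is a hypothetical name, and how Lucene turns the distance into the similarity threshold after `~` is not shown here.

```java
// Hypothetical illustration of Levenshtein edit distance; not Lucene code.
class EditDistanceSketch {
    static int distance(String a, String b) {
        // prev[j] = distance between the first i-1 chars of a and first j of b
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // build b's prefix from an empty string: j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reduce a's prefix to an empty string: i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute/match
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("rengs", "rings"));
        System.out.println(distance("gava", "java"));
    }
}
```

Both typos from this deck, "rengs" for "rings" and "Gava" for "Java", are a single substitution away from the intended word.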
![Page 57: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/57.jpg)
57
Querying
» To build FTS queries we need to:
  - Create a Lucene query
  - Create a Hibernate Search query that wraps the Lucene query
Why?
» No need to build a framework around Lucene
» Converting documents to objects happens transparently
» Seamless integration with the Hibernate Core API
![Page 58: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/58.jpg)
58
Hibernate Queries Examples
String stringToSearch = "rings";
Term term = new Term("title", stringToSearch);
TermQuery query = new TermQuery(term);
FullTextQuery hibQuery = session.createFullTextQuery(query, Book.class);
List<Book> results = hibQuery.list();
![Page 59: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/59.jpg)
59
WildcardQuery Example
String stringToSearch = "r??gs";
Term term = new Term("title", stringToSearch);
WildcardQuery query = new WildcardQuery(term);
...
List<Book> results = hibQuery.list();
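What the two wildcard characters mean can be shown without Lucene: `?` stands for exactly one character and `*` for any run of characters, including none. The sketch below translates such a pattern into a regular expression. `WildcardSketch` is a hypothetical helper for illustration only, and Lucene itself evaluates wildcards against the terms in the index rather than through `java.util.regex`.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of wildcard semantics; not Lucene code.
class WildcardSketch {
    static boolean matches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '?') {
                regex.append('.');   // exactly one character
            } else if (c == '*') {
                regex.append(".*");  // zero or more characters
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // Pattern.matches requires the whole term to match the pattern.
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        System.out.println(matches("r??gs", "rings"));
        System.out.println(matches("r*gs", "rings"));
    }
}
```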
![Page 60: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/60.jpg)
60
Use case: Book table

| Id | Title | Price |
| --- | --- | --- |
| 1 | Head First Java | 200 |
| 2 | Chuck Norris in action | 120 |
| 3 | Chuck Norris vs JBoss | 120 |
| 4 | JBoss strikes back | 10 |

Search for: “Good practices for Gava”
![Page 61: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/61.jpg)
61
HS Query Flowchart
[Flowchart labels: Client; Hibernate Search Query; Lucene Index; Persistence Context; DB. Steps: Query the index; Receive matching ids; Loads objects from the Persistence Context; DB access (if needed)]
![Page 62: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/62.jpg)
62
Querying tips
» You can use list(), uniqueResult(), iterate(), scroll() – just like in Hibernate Core!
» Multistage search engine
» Sorting
» Explanation object
![Page 63: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/63.jpg)
63
Score
![Page 64: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/64.jpg)
64
» Mostly based on Salton’s Vector Space Model
![Page 66: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/66.jpg)
66
Term Rating
$$ w_i = \log_{10}\left(\frac{N}{n_i}\right) $$
where $w_i$ is the term weight, $N$ is the number of documents in the index, and $n_i$ is the total number of documents containing term $i$.
Search for: “best java in action books”
![Page 67: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/67.jpg)
67
Term Rating Calculation
$$ \log_{10}\left(\frac{500}{500}\right) = 0 $$
$$ \log_{10}\left(\frac{5000}{50}\right) = 2 $$
$$ \log_{10}\left(\frac{5000}{5}\right) = 3 $$
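The three calculations on this slide can be reproduced with a one-line helper. This is a sketch of the term weight only, assuming a base-10 logarithm as the results on the slide imply. `TermWeightSketch` and `weight` are hypothetical names, and Lucene's full scoring formula contains more factors than this.

```java
// Hypothetical illustration of the term weight from the slide.
class TermWeightSketch {
    /** Weight of a term: log10(documents in the index / documents with the term). */
    static double weight(int totalDocs, int docsWithTerm) {
        return Math.log10((double) totalDocs / docsWithTerm);
    }

    public static void main(String[] args) {
        System.out.println(weight(500, 500)); // term appears in every document
        System.out.println(weight(5000, 50));
        System.out.println(weight(5000, 5));  // rare term, highest weight
    }
}
```

A term that appears in every document gets weight 0 and therefore tells us nothing about relevance, while rarer terms weigh more.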
![Page 68: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/68.jpg)
68
Scoring example
1. Head First Java
2. Best of the best of the best
3. Best examples from Hibernate in action
4. The best action of Chuck Norris

Search for: “best java in action books”

| Term | Frequency | Score |
| --- | --- | --- |
| Java | 1 | 0.60206 |
| Best | 3 | 0.12494 |
| Action | 2 | 0.30103 |
69
Lucene’s scoring approach
» Conventional Boolean retrieval
» Calculating the score for matching documents only
» Customizing the similarity algorithm
» Query boosting
» Custom scoring algorithms
![Page 70: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/70.jpg)
70
Alternatives
![Page 71: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/71.jpg)
71
Alternatives
Shay Banon
![Page 72: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/72.jpg)
72
Alternatives
Simple
Lucene based
Configurable via XML or annotations
Local & External TX Manager
Integrates with popular ORM frameworks
Spring support
Distributed
![Page 73: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/73.jpg)
73
Alternatives
![Page 74: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/74.jpg)
74
Apache Solr
» Enterprise Search Server
  - Supports multiple protocols (xml, json, ruby, etc.)
» Runs as a standalone Full Text Search server within a servlet container, e.g. Tomcat
» Heavily based on Lucene
» JSA – Java Search API (based on JPA)
  - ODM (Object/Document Mapping)
  - Spring integration (Transactions)
![Page 75: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/75.jpg)
75
Apache Solr
» Powerful Web Administration Interface
  - Can be tailored without any Java coding!
» Extensive plugin architecture
» Server statistics exposed over JMX
» Scalability – easily replicated
![Page 76: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/76.jpg)
76
Resources
Lucene
Lucene contrib part
Hibernate Search
Hibernate Search in Action / Emmanuel Bernard, John Griffin
Compass
Apache Solr
![Page 77: JavaEdge09 : Java Indexing and Searching](https://reader036.vdocuments.mx/reader036/viewer/2022062513/55557c57b4c9058a5a8b5113/html5/thumbnails/77.jpg)
77
Thank you!
Q & A