What’s new in Apache Solr 5.0

Upload: anshum-gupta

Post on 15-Jul-2015

TRANSCRIPT

Page 1: What's new in Solr 5.0

What’s new in Apache Solr 5.0

Page 2: What's new in Solr 5.0

Who am I?

• Anshum Gupta, Apache Lucene/Solr committer, Lucidworks Employee.

• Search and related stuff for 9+ years.

• Apache Lucene since 2006 and Solr since 2010.

• Organizations I am or have been a part of:

Page 3: What's new in Solr 5.0

Solr - Releases

Page 4: What's new in Solr 5.0

–Someone

Ease of Use: Because usability doesn’t end after the first five minutes!

Page 5: What's new in Solr 5.0

Scripts - Richer, faster, easier!

• Solr Demo:

• bin/post script

• Auto config-set copying

• Create -> Post -> Browse -> Delete

• bin/solr start -e cloud -noprompt ; bin/post -c gettingstarted http://lucidworks -recursive 2; open http://localhost:8983/solr/gettingstarted/browse

Page 6: What's new in Solr 5.0

Example is now Server

• No default collection1

• Configset options

• ant example is now ant server

• post.sh

Page 7: What's new in Solr 5.0

Posting documents was never so easy!

• bin/post script wraps around the improved SimplePostTool

• Index JSON directly, out of the box (OTB)

• Developers: SolrServer is now SolrClient
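
A minimal sketch of what indexing JSON directly can look like over plain HTTP, for readers not using bin/post. The host, port, the collection name gettingstarted (borrowed from the demo command earlier) and the sample document are assumptions; the helper build_json_update is hypothetical and only builds the request, it does not send it.

```python
import json
import urllib.request


def build_json_update(base_url, collection, docs, commit=True):
    """Build (but do not send) a POST that indexes a list of JSON documents."""
    url = f"{base_url}/{collection}/update"
    if commit:
        # Ask Solr to commit right away so the documents become searchable.
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local node and collection; adjust to your setup.
req = build_json_update(
    "http://localhost:8983/solr",
    "gettingstarted",
    [{"id": "1", "title": "What's new in Solr 5.0"}],
)
# urllib.request.urlopen(req) would send it to a running Solr node.
```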

Page 8: What's new in Solr 5.0

Managing Solr

Page 9: What's new in Solr 5.0

Managing Solr Configuration - Application

• Paramsets: Add/Edit

• initParams: Generic appends, invariants and defaults outside of the component

• Schema API: REST API for adding field types and dynamic fields

• Managing requestHandlers through API

• Implicit registration of replication, get and admin Handlers.
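
A rough sketch of the Schema API bullet: the JSON body one might POST to a collection's /schema endpoint to add a field type and a dynamic field in one call. The names text_demo and *_demo are made up for illustration, and this assumes the collection uses a managed (API-editable) schema.

```python
import json


def schema_commands(field_type_name, dynamic_pattern):
    """Return Schema API commands adding a field type and a dynamic field."""
    return {
        "add-field-type": {
            "name": field_type_name,
            "class": "solr.TextField",
            "positionIncrementGap": "100",
            "analyzer": {
                "tokenizer": {"class": "solr.StandardTokenizerFactory"}
            },
        },
        "add-dynamic-field": {
            # Any field matching this pattern gets the new type.
            "name": dynamic_pattern,
            "type": field_type_name,
            "stored": True,
        },
    }


# Hypothetical names; the body would be POSTed as application/json to
# http://localhost:8983/solr/<collection>/schema
payload = json.dumps(schema_commands("text_demo", "*_demo"), indent=2)
print(payload)
```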

Page 10: What's new in Solr 5.0

Managing the cluster - Systems

• Collection APIs

• BALANCESHARDUNIQUE: Even distribution of custom replica properties

• Improved APIs

• Option to not shuffle nodeSet specified during CREATE Collection

• Logging

• Transaction log replay status

• Slow request (optional)

• Support for editing common solrconfig.xml values

• Scripts to support installing and running Solr as a service on Linux.
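
The two Collections API items above can be sketched as plain URLs. The collection name demo, the node names and the choice of preferredLeader as the property to balance are assumptions for illustration; collections_api_url is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action, **params, "wt": "json"}
    return f"{base_url}/admin/collections?{urlencode(query)}"


base = "http://localhost:8983/solr"  # assumed local node

# CREATE on a fixed node list, keeping the given order (no shuffling).
create_url = collections_api_url(base, "CREATE", {
    "name": "demo",
    "numShards": 2,
    "replicationFactor": 2,
    "createNodeSet": "host1:8983_solr,host2:8983_solr",
    "createNodeSet.shuffle": "false",
})

# Spread a custom replica property evenly across the shards' replicas.
balance_url = collections_api_url(base, "BALANCESHARDUNIQUE", {
    "collection": "demo",
    "property": "preferredLeader",
})

print(create_url)
print(balance_url)
```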

Page 11: What's new in Solr 5.0

Keeping Solr Instance(s) Stable

• ReplicationHandler now has an option to throttle the speed of replication

• timeAllowed is now respected more widely: query expansion, collection, and LBHttpSolrClient retries

• Finite default timeouts for select and update requests
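
For the timeAllowed bullet, a small sketch of a query that caps search time at 500 ms and a check for the partialResults flag Solr sets in the response header when the limit is hit. The collection name and query are placeholders; both helpers are hypothetical.

```python
from urllib.parse import urlencode


def select_url(base_url, collection, q, time_allowed_ms):
    """Build a /select URL that asks Solr to stop searching after a time budget."""
    params = {"q": q, "timeAllowed": time_allowed_ms, "wt": "json"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def is_partial(response):
    """True if a parsed JSON response reports that the time budget was exceeded."""
    return bool(response.get("responseHeader", {}).get("partialResults", False))


url = select_url("http://localhost:8983/solr", "gettingstarted", "title:solr", 500)
print(url)
```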

Page 12: What's new in Solr 5.0

Scalability

Page 13: What's new in Solr 5.0

• Splitting of ClusterState

• Every collection has its own cluster state

• No need to watch what everyone else is doing

• Might be the default in 5.0

• Improved Solr - ZK communication

• Speed up Overseer operations by avoiding cluster state reads from ZooKeeper at the start of each loop

• Better default timeouts to operate at a large scale

Page 14: What's new in Solr 5.0

Solr scalability is unmatched.

Page 15: What's new in Solr 5.0

Features

Page 16: What's new in Solr 5.0

Distributed IDF

• Multiple contributors and almost 5 years.

• 4 implementations OTB:

• LocalStatsCache: Local Stats

• ExactStatsCache: One time use aggregation

• ExactSharedStatsCache: Stats shared across requests

• LRUStatsCache: Stats shared in an LRU cache across requests

• Flow:

• Conditionally Send GET_TERM_STATS request to participating nodes

• Compute global values, another request for SET_TERM_STATS + GET_TOP_IDS

• Conditional GET_FIELDS
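
Why this matters, as a toy calculation: with local stats each shard scores a term using only its own document frequency, so the same term can get very different weights on different shards. The sketch below merges per-shard term stats and applies the classic Lucene-style IDF formula. The shard numbers are invented, and this illustrates the idea only; it is not the code behind any of the StatsCache implementations.

```python
import math


def idf(doc_freq, num_docs):
    """Classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def global_idf(shard_stats):
    """IDF computed from term stats summed over all shards."""
    total_df = sum(s["docFreq"] for s in shard_stats)
    total_docs = sum(s["numDocs"] for s in shard_stats)
    return idf(total_df, total_docs)


# Invented stats: the term is rare on shard 0 and common on shard 1.
shards = [
    {"docFreq": 1, "numDocs": 1000},
    {"docFreq": 400, "numDocs": 1000},
]

local = [idf(s["docFreq"], s["numDocs"]) for s in shards]
merged = global_idf(shards)

print("local IDF per shard:", [round(v, 3) for v in local])
print("global IDF:", round(merged, 3))
```

With local stats, a match on shard 0 would outrank an equally good match on shard 1 purely because of where the document lives; the merged value removes that skew.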

Page 17: What's new in Solr 5.0

Stats Component

• stats.field can now be used to generate stats over the numeric results of arbitrary functions, e.g.:

• stats.field={!func}product(price,popularity)

• Stats hang off pivots via tags
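
Both bullets can be sketched as request parameters: one stats.field over a function, and one tagged stats.field that a facet.pivot refers to through its stats local param. The field names cat, inStock, price and popularity and the tag piv are assumptions for illustration; stats_params is a hypothetical helper.

```python
from urllib.parse import urlencode


def stats_params(func, pivot_fields, stat_field, tag="piv"):
    """Build query params for function stats plus stats attached to a pivot."""
    return [
        ("q", "*:*"),
        ("rows", "0"),
        ("stats", "true"),
        # Stats over the numeric result of a function query.
        ("stats.field", "{!func}" + func),
        # Tagged stats that the pivot below refers to by tag name.
        ("stats.field", "{!tag=%s}%s" % (tag, stat_field)),
        ("facet", "true"),
        ("facet.pivot", "{!stats=%s}%s" % (tag, ",".join(pivot_fields))),
    ]


params = stats_params("product(price,popularity)", ["cat", "inStock"], "price")
query = urlencode(params)
print(query)
```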

Page 18: What's new in Solr 5.0

And there are more…

• DateRangeField for indexing date ranges, especially multi-valued ones.

• Spatial fields that used to require units=degrees now take distanceUnits=degrees/kilometers/miles instead.

• MoreLikeThis QueryParser: Works in SolrCloud mode too.

• API for managing blobs

Page 19: What's new in Solr 5.0

and more…

• First class support in SolrJ for Collection API calls

• Upgrade Tika to 1.7: This adds support for parsing Outlook PST and Matlab (MAT) files.

Page 20: What's new in Solr 5.0

Maturity

• Jepsen tests

• More unit tests and more success stories of Solr.

• Protection of ZK content

Page 21: What's new in Solr 5.0

No more WAR!

• Solr is now a standalone app: starting with Solr 5.0, no more shipping a WAR

• Upgrade to Jetty 9 coming soon

• Will allow for a lot of things (e.g. SPDY) that wouldn’t be possible if we had to support Tomcat/Netty/Jetty/everything else.

Page 22: What's new in Solr 5.0

Between 4.10 and 5.0: The new Identity

Page 23: What's new in Solr 5.0

Timeline*

• Release branch cut

• 2nd RC vote in progress.

• Vote - 3 days, 3 votes

• Artifacts propagation to ASF mirrors - 1 day

• Official release note - Right after!

* prospective and subject to how things go

Page 24: What's new in Solr 5.0

Coming soon

• Collections API: REBALANCESHARDS

• Spatial 2D heat-map faceting

• Facet and analytics

• Replication performance

• More API goodness

Page 25: What's new in Solr 5.0

Questions?

Page 26: What's new in Solr 5.0

Connect @

http://www.twitter.com/anshumgupta

http://www.linkedin.com/in/anshumgupta/

[email protected]