Multi-faceted responsive search, autocomplete, feeds engine & logging
DESCRIPTION
Presented by Remi Mikalsen, Search Engineer, The Norwegian Centre for ICT in Education. Learn how utdanning.no leverages open source technologies to deliver a blazing fast, multi-faceted, responsive search experience and a flexible, efficient feeds engine on top of Solr 3.6. Among the key open source projects covered are Solr, Ajax-Solr, SolrPHPClient, Bootstrap, jQuery and Drupal. Notable highlights are ajaxified pivot facets, hierarchical facets with multiple parents, ajax autocomplete with edge n-grams and grouping, integrating our search widgets on any external website, custom Solr logging and using Solr to deliver Atom feeds. utdanning.no is a governmental website that collects, normalizes and publishes study information related to secondary school and higher education in Norway. With 1.2 million visitors each year and 12,000 indexed documents, we focus on precise information and a high degree of usability for students, potential students and counselors.
TRANSCRIPT
Multi-faceted responsive search, autocomplete, feeds engine and logging
Remi Mikalsen, Search Engineer, utdanning.no
Introduction
«Utdanning.no is the official Norwegian national education and career portal, and includes an overview of education in Norway and more than 500 career descriptions» - utdanning.no
« [...] Our main goals are to improve the quality of education and to improve learning outcomes and learning for children, pupils and students through use of ICT in education» - iktsenteret.no
utdanning.no
Drupal 7 & Solr 3.6
~3 million visitors / year
~12,000 documents
~18,000,000 terms
~260 fields
~1 QPS (~9M searches / year)
~8 ms latency
Data integration in the CMS
Fetch data:
- Universities, colleges and community colleges: ~30 different endpoints, ~3500 documents
- Folk high schools (non-academic): 1 national endpoint, ~650 documents
- Secondary schools: 1 national endpoint, ~1100 documents
- Higher education admissions (Samordna opptak): 1 national endpoint, ~1500 documents
- Secondary schools metadata (Grep): 1 national endpoint, ~650 documents
- Higher education metadata (NUS): 1 national endpoint, ~3500 documents
- Professions metadata (STYRK): 2 national endpoints, ~1000 documents
Transform & normalize into Drupal 7 (ER-model)
Added value: editorial staff write professions, interviews, education summaries, etc. (~1500 documents)
Indexing into Solr 3.6: de-normalized, searchable
Drupal 7
Apache Solr Search Integration 7.x-1.1
Customized business logic
Solr 3.6
Pros
- Basic Drupal integration
- Track document changes
- Some facet support
- Easily extendable
Cons
- Lacks deep introspecting
- Little de-normalization
- Hacky hierarchies (Drupal)
Note
- Custom config files!
- schema.xml (mainly dynamic fields)
- solrconfig.xml (mainly a Drupal request handler)
We added
- Deep introspecting
- Data de-normalization
- Solid hierarchy support
- Pivot facet support
- Atomization
- Manual partial re-index
schema.xml
- field types (auto)
- various copy fields
- better spell
- bucket fields
- autocomplete
Drupal DB: an Organization (school) with its Study programs
Solr documents:
- Organization (school) + all its Study programs
- Study program + Organization
<doc>
  <str name="id">394353</str>
  <bool name="bs_mainsearch">true</bool>
  <str name="bundle">org</str>
  <str name="bundle_name">Organization</str>
  <str name="label">ACME University</str>
  <str name="atom">[XML]</str>
  <arr name="related_nodes">
    <str>ACME Rocket Science</str>
    <str>Study program 2</str>
    <str>Study program N</str>
  </arr>
  <arr name="sm_geography_hierarchy">
    <str>1>California</str>
    <str>2>California>San Diego</str>
    <str>3>California>San Diego>Gaslamp Quarter</str>
  </arr>
  <str name="ss_menu_1">orgmenu</str>
  <str name="ss_menu_2">org</str>
</doc>
<doc>
  <str name="id">394354</str>
  <bool name="bs_mainsearch">true</bool>
  <str name="bundle">he</str>
  <str name="bundle_name">Higher Education</str>
  <str name="label">ACME Rocket Science</str>
  <str name="atom">[XML]</str>
  <arr name="sm_offered_by">
    <str>ACME University</str>
  </arr>
  <arr name="sm_study_area">
    <str>Engineering</str>
    <str>Science</str>
  </arr>
  <long name="its_field_semesters">8</long>
  <str name="ss_menu_1">edumenu</str>
  <str name="ss_menu_2">he</str>
</doc>
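The two documents above come from de-normalizing one Drupal relationship in both directions. The sketch below illustrates that step in JavaScript (the real indexing code is PHP in Drupal); the input record shapes and the functions `hierarchyTokens` and `denormalize` are hypothetical, and only the output field names are taken from the example documents.

```javascript
// Hypothetical sketch: input shapes are simplified stand-ins for Drupal entities.

// Encode a hierarchy path as depth-prefixed tokens, e.g.
// ["California", "San Diego"] -> ["1>California", "2>California>San Diego"]
function hierarchyTokens(path) {
  return path.map((_, i) => `${i + 1}>${path.slice(0, i + 1).join(">")}`);
}

// One Solr document per organization and per study program. Labels of related
// entities are copied across, so each document is searchable on its own.
function denormalize(orgs, programs) {
  const orgById = new Map(orgs.map((o) => [o.id, o]));
  const orgDocs = orgs.map((org) => ({
    id: org.id,
    bundle: "org",
    label: org.label,
    related_nodes: programs.filter((p) => p.orgId === org.id).map((p) => p.label),
    sm_geography_hierarchy: hierarchyTokens(org.geography || []),
    ss_menu_1: "orgmenu",
    ss_menu_2: "org",
  }));
  // Assumption for brevity: every study program is a higher education program ("he").
  const programDocs = programs.map((p) => {
    const org = orgById.get(p.orgId);
    return {
      id: p.id,
      bundle: "he",
      label: p.label,
      sm_offered_by: org ? [org.label] : [],
      sm_study_area: p.studyAreas || [],
      ss_menu_1: "edumenu",
      ss_menu_2: "he",
    };
  });
  return orgDocs.concat(programDocs);
}
```

With an organization whose `geography` is `["California", "San Diego", "Gaslamp Quarter"]`, `hierarchyTokens` returns exactly the three `sm_geography_hierarchy` values in the first document. Encoding the depth in each token is what lets a facet request ask for one level at a time.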
Searching
- Site search
- Embedded search
- Feeds engine
Site search
Our goal: students, counselors and teachers must find what they are looking for
How?
- Interaction design (IxD) vs graphical design
- User testing, user testing and user testing (and experience)
- Resulting in a GUI specification we must implement
Ajax-Solr is our JS framework: https://github.com/evolvingweb/ajax-solr/wiki/reuters-tutorial
- manages all querying
- provides widgets for interacting with and displaying results
- events fire search requests, which then update the widgets
We extended it heavily:
- Developed all our widgets (10+)
- Added logging (async, via ajax, local and GA)
- Distributed configuration (server + client)
- Simplified initialization script
But it also works out of the box!
Architecture:
- Our website includes a copy of the JS library, initializes it with its config, and searches
- JS library (~1700 lines), built on ajax-solr (evolvingweb), with a default config
- Logger (~200 lines)
- Solr proxy (~85 lines), built on SolrPhpClient r60
- Solr 3.6
Steps:
- Include JS library
- Initialize
- Set up HTML
- Search! (and log)
Site search – widgets & faceting
Ajax Solr allows defining N widgets
«Everything» is a widget
A facet is an instance of a FacetWidget
Interaction with widgets may fire query
All faceting is combined into a single query
All widgets are updated after Solr response
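A minimal sketch of this request cycle, written in plain JavaScript so it runs without a browser. It is not Ajax-Solr's API: the `Manager` and `FacetWidget` classes are hypothetical, `search` is a stub standing in for the HTTP call through the Solr proxy, and only the hook names `beforeRequest` and `afterRequest` mirror Ajax-Solr's widget methods.

```javascript
// Hypothetical sketch of the manager/widget cycle; not Ajax-Solr's real classes.
class Manager {
  constructor(search) {
    this.search = search; // stub for the HTTP call: params -> Solr-like response
    this.widgets = [];
    this.filters = new Map(); // widget id -> that widget's filter queries
  }
  addWidget(widget) {
    widget.manager = this;
    this.widgets.push(widget);
  }
  // All widgets contribute to ONE request, and all widgets see the response.
  doRequest(q = "*:*") {
    const params = { q, fq: [], "facet.field": [] };
    for (const w of this.widgets) w.beforeRequest(params);
    for (const fqs of this.filters.values()) params.fq.push(...fqs);
    const response = this.search(params);
    for (const w of this.widgets) w.afterRequest(response);
    return params;
  }
}

class FacetWidget {
  constructor(id, field) {
    this.id = id;
    this.field = field;
    this.counts = []; // Solr's default JSON facet format: [value, count, value, count, ...]
  }
  beforeRequest(params) {
    params["facet.field"].push(this.field);
  }
  afterRequest(response) {
    this.counts = response.facet_counts.facet_fields[this.field] || [];
  }
  // User interaction: record this widget's filter, then fire a new search.
  select(value) {
    this.manager.filters.set(this.id, [`${this.field}:"${value}"`]);
    return this.manager.doRequest();
  }
}
```

Because every widget adds its `facet.field` in `beforeRequest`, clicking one facet still produces a single Solr request, after which all widgets redraw from the same response.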
Some facet widgets we have developed:
- Plain: facet values and facet counts in a list; multiple (AND) or single choice
- Hierarchical: facet values and facet counts in a list; clicking on a facet value drills down into the hierarchy (facet.prefix + fq)
- Dropdown: displays facet values in a dropdown list; useful for mobile devices in our responsive theme
- Tagcloud: facet values in a tagcloud
- Pivot facet: our menu system
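For the hierarchical widget, drilling down combines an fq on the clicked node with a facet.prefix that only lets the next level through. Here is a sketch against the depth-prefixed tokens of `sm_geography_hierarchy`; the helper name is hypothetical, labels are assumed to contain no double quotes, and `f.<field>.facet.prefix` is Solr's per-field parameter override.

```javascript
// Hypothetical helper: drill-down parameters for a depth-prefixed hierarchy field.
// selectedPath [] = top level; ["California"] = inside California, and so on.
function drillDownParams(field, selectedPath) {
  const depth = selectedPath.length;
  const p = new URLSearchParams({ facet: "true", "facet.field": field });
  if (depth > 0) {
    // Restrict results (and other facets) to the clicked node.
    p.append("fq", `${field}:"${depth}>${selectedPath.join(">")}"`);
  }
  // Only tokens one level below the current node pass the prefix:
  // "1>" at the top level, otherwise e.g. "2>California>".
  const parent = depth > 0 ? `${selectedPath.join(">")}>` : "";
  p.append(`f.${field}.facet.prefix`, `${depth + 1}>${parent}`);
  return p;
}
```

A document with several parents simply carries several token chains, so the same two parameters work for the multiple-parents case.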
Adding facets
Config
facets['interests'] = new facetobject('tagcloud', 'field_interests', '#interests');
facets['ispublic'] = new facetobject('plain', 'field_ispublic', '#ispublic');
config['facets'] = facets;
HTML
<ul id="interests"></ul>
<ul id="ispublic"></ul>
INITIALIZE
Manager.addFacets(config);
Example widget code
AjaxSolr.PlainFacetWidget = AjaxSolr.AbstractFacetWidget.extend({
  multivalue: true,
  target: null,               // HTML target id
  field: null,                // Solr field
  facet_display_limit: 5,     // Max facets to display before «See more»
  facet_field_sort: null,     // Optional facet sort
  dependencies: null,         // Conditional display of facet
  facet_display_more: 'See more',
  facet_display_less: 'See less',
  ...
  init: function () { ... },
  beforeRequest: function () { ... },
  afterRequest: function () { ... }
});
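The behaviour configured by `multivalue` and `facet_display_limit` can be illustrated outside Ajax-Solr. The `PlainFacetState` class below is hypothetical and not the widget's real code: it only shows how multiple (AND) versus single choice maps to filter queries, and how the «See more»/«See less» toggle could be decided from Solr's flat `[value, count, ...]` facet list.

```javascript
// Hypothetical illustration of the plain widget's options; not the widget's real code.
class PlainFacetState {
  constructor({ field, multivalue = true, facet_display_limit = 5,
                facet_display_more = "See more", facet_display_less = "See less" }) {
    Object.assign(this, { field, multivalue, facet_display_limit, facet_display_more, facet_display_less });
    this.selected = [];
    this.expanded = false;
  }
  // multivalue: selections accumulate (AND); otherwise a click replaces the selection.
  toggle(value) {
    if (this.selected.includes(value)) {
      this.selected = this.selected.filter((v) => v !== value);
    } else {
      this.selected = this.multivalue ? [...this.selected, value] : [value];
    }
  }
  // One fq per selected value; Solr intersects multiple fq parameters (AND).
  filterQueries() {
    return this.selected.map((v) => `${this.field}:"${v}"`);
  }
  // Turn Solr's flat [value, count, ...] list into pairs and apply the display limit.
  visible(flatCounts) {
    const pairs = [];
    for (let i = 0; i + 1 < flatCounts.length; i += 2) pairs.push([flatCounts[i], flatCounts[i + 1]]);
    const overflow = pairs.length > this.facet_display_limit;
    return {
      shown: this.expanded || !overflow ? pairs : pairs.slice(0, this.facet_display_limit),
      toggleLabel: overflow ? (this.expanded ? this.facet_display_less : this.facet_display_more) : null,
    };
  }
}
```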
Site search – pivot facet
Pivot faceting allows you to facet within the results of the parent facet
- http://wiki.apache.org/solr/SimpleFacetParameters
Slight problem: pivot faceting (facet.pivot) requires Solr 4.x, and we don't run Solr 4.x!
Problem: menu facets shouldn't affect each other, but they should affect the search results and the other facets
Our solution
Solr document 1:
  <str name="ss_menu_1">orgmenu</str>
  <str name="ss_menu_2">org</str>
Solr document 2:
  <str name="ss_menu_1">edumenu</str>
  <str name="ss_menu_2">higher_ed</str>
Solr document 3:
  <str name="ss_menu_1">edumenu</str>
  <str name="ss_menu_2">secondary</str>
Solr query when a top-level menu tab is selected:
  fq={!tag=ss_menu_1}ss_menu_1:edumenu
  &facet.field={!ex=ss_menu_1}ss_menu_1
Solr query when a sub-level menu tab is selected:
  fq={!tag=ss_menu_1}ss_menu_1:edumenu
  &fq={!tag=ss_menu_1,ss_menu_2}ss_menu_2:higher_ed
  &facet.field={!ex=ss_menu_1}ss_menu_1
  &facet.field={!ex=ss_menu_2}ss_menu_2
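The two query shapes can be generated by a small helper. This is a hypothetical sketch: it assumes `q` and `facet=true` are supplied elsewhere (for example by the request handler) and emits only the filter and facet parameters shown above.

```javascript
// Hypothetical helper reproducing the two query shapes above (top tab, optional sub tab).
function menuParams(top, sub) {
  const p = new URLSearchParams();
  // Top-level filter, tagged so the ss_menu_1 facet can ignore it.
  p.append("fq", `{!tag=ss_menu_1}ss_menu_1:${top}`);
  if (sub) {
    // Tagged with BOTH names: excluding ss_menu_1 below also drops this filter,
    // so the top-level tab counts do not depend on the selected sub tab.
    p.append("fq", `{!tag=ss_menu_1,ss_menu_2}ss_menu_2:${sub}`);
  }
  p.append("facet.field", "{!ex=ss_menu_1}ss_menu_1");
  if (sub) {
    // Sub-tab counts ignore the sub filter but stay within the selected top tab.
    p.append("facet.field", "{!ex=ss_menu_2}ss_menu_2");
  }
  return p;
}
```

`menuParams("edumenu", "higher_ed").toString()` yields the URL-encoded form of the sub-level query; the tag/exclude local params are what stand in for a real pivot facet on Solr 3.6.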
Drawbacks
- Can be VERY slow on large indexes with many unique terms in the facet
Why do we do it?
- Small index: 18M terms, 12K documents
- Pivot facet fields have very few distinct values (5-8)!
Site search - autocomplete
Our goal: give our users the feeling that we've implemented a mind-reader
How? With relevant, grouped suggestions as they type in a search query
Do we succeed? 50% of our «clicks to content» from searches come from autocomplete
Implementing autocomplete is «easy»:
1) Ajax
2) Detect keystrokes
3) Send one request per keystroke
4) Receive results, populate result list
Techniques we employ:
- Minimal payload (reduced fl)
- But same boosts and qf as «normal» queries
- group=true, group.field=, group.limit=
- start_label^1.5 wild_label^1 wild_other^0.25
- Caching (jsonp, cache=true)
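A sketch of the parameters for one autocomplete request. The qf boosts are the ones listed above; `defType=edismax`, the `fl` list, `group.field`, `group.limit` and `rows` are hypothetical placeholders, since the slide leaves them blank. `json.wrf` is Solr's parameter for wrapping a JSON response in a JSONP callback.

```javascript
// Sketch of one autocomplete request. qf comes from the slide; defType, fl,
// group.field, group.limit and rows defaults are hypothetical placeholders.
function autocompleteParams(input, opts = {}) {
  const p = new URLSearchParams({
    q: input.trim(),
    defType: "edismax", // assumption: a dismax-style parser so qf boosts apply
    qf: "start_label^1.5 wild_label^1 wild_other^0.25",
    fl: opts.fl || "id,label", // minimal payload (hypothetical field list)
    rows: String(opts.rows || 10),
    group: "true",
    "group.field": opts.groupField || "bundle", // hypothetical grouping field
    "group.limit": String(opts.groupLimit || 3), // hypothetical
    wt: "json",
  });
  // JSONP: Solr's JSON response writer wraps the output in this callback name.
  if (opts.callback) p.set("json.wrf", opts.callback);
  return p;
}
```

Keeping the URL deterministic matters for caching: jQuery's `$.ajax` defaults to `cache: false` for JSONP and then appends a `_=` timestamp parameter, so `cache: true` (presumably what the slide refers to) lets the browser reuse responses for repeated prefixes.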
Define field type
<fieldType name="startsWith" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
  </analyzer>
</fieldType>
Define fields
<field name="start_label" type="startsWith" indexed="true" stored="false" multiValued="false"/>
Copy fields
<copyField source="label" dest="start_label"/>
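To see what the startsWith field actually stores, the index-time chain can be emulated in a few lines of JavaScript. This is an illustration of the filter semantics, not Solr code; the function names are hypothetical.

```javascript
// Illustration only: mimics the "startsWith" analyzers in plain JS, not Solr code.
function edgeNGrams(token, min, max, side) {
  const grams = [];
  for (let n = min; n <= Math.min(max, token.length); n++) {
    grams.push(side === "back" ? token.slice(token.length - n) : token.slice(0, n));
  }
  return grams;
}

// KeywordTokenizer keeps the whole value as one token, LowerCaseFilter lowercases it,
// and PatternReplaceFilter removes every character outside a-z (this also removes
// spaces and letters such as æ, ø and å).
function normalize(value) {
  return value.toLowerCase().replace(/[^a-z]/g, "");
}

// Index side: EdgeNGramFilter (front, sizes 2..25) emits the prefixes.
function startsWithIndexTerms(value) {
  return edgeNGrams(normalize(value), 2, 25, "front");
}

// Query side: same normalization but no n-grams, so a hit is an exact term match.
function startsWithMatches(value, query) {
  return startsWithIndexTerms(value).includes(normalize(query));
}
```

For the label "ACME Law" the stored terms are `ac`, `acm`, `acme`, `acmel`, `acmela` and `acmelaw`, so the partial query "Acme L" matches while "law" does not.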
Define field type
<fieldType name="wildCardType" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="front"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="back"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="false"/>
    <filter class="solr.NorwegianLightStemFilterFactory"/>
  </analyzer>
</fieldType>
Define fields
<field name="wild_label" type="wildCardType" indexed="true" stored="false" multiValued="false"/>
<field name="wild_other" type="wildCardType" indexed="true" stored="false" multiValued="true"/>
Copy fields
<copyField source="label" dest="wild_label"/>
<copyField source="teaser" dest="wild_other"/>
<copyField source="body" dest="wild_other"/>
<copyField source="searchwords" dest="wild_other"/>
<copyField source="related_nodes" dest="wild_other"/>
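The same kind of emulation shows why two chained EdgeNGram filters give wildcard-like matching. Again this is a hypothetical illustration, not Solr code; it covers only the index analyzer, not the query-side synonym, keyword-marker and Norwegian stemming filters.

```javascript
// Illustration only: mimics the index-time "wildCardType" analyzer in plain JS.
function edgeNGrams(token, min, max, side) {
  const grams = [];
  for (let n = min; n <= Math.min(max, token.length); n++) {
    grams.push(side === "back" ? token.slice(token.length - n) : token.slice(0, n));
  }
  return grams;
}

// Front edge n-grams of the whole lowercased value, then back edge n-grams of each.
// A suffix of a prefix is a substring, so this yields every substring of length
// 2..70: infix matching without wildcard queries.
function wildCardIndexTerms(value) {
  const terms = new Set(); // Solr would emit duplicates as separate tokens
  for (const prefix of edgeNGrams(value.toLowerCase(), 2, 70, "front")) {
    for (const gram of edgeNGrams(prefix, 2, 70, "back")) terms.add(gram);
  }
  return [...terms].sort();
}
```

Since every substring of the label becomes a term, `wild_label` matches fragments from anywhere in a title, while `start_label` (boosted higher in qf) rewards prefix matches. The number of terms grows quickly with value length, which suits a small index like ours.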
Embedded search
Our goal: let other sites search our data
How? The exact same way we do ourselves
Do we succeed? Two external sites are up and running, and a third is on its way
Architecture:
- ACME website includes a copy of our JS library, loads its ACME config (which overrides our default config), and searches
- JS library (~1700 lines), built on ajax-solr (evolvingweb), with a default config
- Logger (~200 lines)
- Solr proxy (~85 lines), built on SolrPhpClient r60
- Solr 3.6
Steps:
- Register with us
- Include our JS library
- Set up config
- Set up HTML
- Search! (and log)
<html>
<head>
  <title>ACME Website</title>
  <!-- utdanning.no search framework -->
  <script src="/js/jquery.js"></script>
  <script src="http://example.com/solrservice/js-min/solr-search-full-min.js"></script>
  <script src="/js/search-init.js"></script>
</head>
<body>
  <!-- Search form -->
  <form>
    <input id="query" name="query" type="search" />
    <input type="submit" value="Search" />
  </form>
  <!-- Search results -->
  <div><ul class="hits" id="hits"></ul></div>
</body>
</html>
<script type="text/javascript">
// ACME mockup init-script
var Manager; // Search manager object
var uno_config = loadConfig('http://example.com/solrservice/.../acme.config');
// Fully customizable search configuration, e.g.:
uno_config['server']['qf'] = 'label^1.8 content^1.2';
// Search box widget
Manager.addPlainSearch(uno_config);
// Result list widget
Manager.addResults(uno_config);
Manager.finalizeConfig(uno_config);
Manager.doRequest(); // Optional
</script>
Site owners have full control:
- Add, edit and configure widgets
- Query fields, boosts, etc.
- Faceting
- Styling
- Pre-limit search to parts of our index
Because we eat our own dog food!
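The default config plus per-site override can be sketched as a recursive merge in which the embedding site's values win. `mergeConfig` is hypothetical, and apart from `server.qf` the key names are illustrative; the library's own `loadConfig` may behave differently.

```javascript
// Hypothetical: recursive merge where the embedding site's values win.
// Plain objects are merged key by key; arrays and scalars are replaced wholesale.
function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

function mergeConfig(defaults, override) {
  const out = { ...defaults }; // never mutate the shared default config
  for (const [key, value] of Object.entries(override || {})) {
    out[key] = isPlainObject(out[key]) && isPlainObject(value)
      ? mergeConfig(out[key], value)
      : value;
  }
  return out;
}
```

For example, overriding only `{ server: { qf: "label^1.8 content^1.2" } }` changes the query fields while every other default setting stays in place.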
Feeds engine
Our goal: deliver data in bulk to partner organizations
How? A RESTful, searchable data endpoint that returns XML (Atom++)
Do we succeed? A beta partner is up and running with stunning performance
Architecture:
- Consumer sends a query to the Feeds engine (~300 lines), which loads a default config
- Solr proxy (~85 lines), built on SolrPhpClient r60
- Solr 3.6
- Logger (~200 lines)
Feeds engine
- Parses the incoming query
- Loads config (filters, weights, ...)
- Transforms incoming query + config into a Solr URL
- Sends it to the Solr proxy
Solr proxy
- Loads the Solr PHP Client library
- Sends the search request and parses the response
- Returns results to the Feeds engine
Feeds engine
- Loads the logger and logs results
- Picks out the Atom from the response
- Glues the results inside an Atom frame
- Displays the feed
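The Feeds engine steps can be sketched in JavaScript (the real engine is PHP on SolrPhpClient). Everything beyond the slide is assumed: the `FEED_TYPES` mapping, the default page size, and reading the numeric path segments in `/organizations/10/2` as rows per page and page number. Each document is assumed to store its pre-rendered Atom entry in the `atom` field, as in the example documents.

```javascript
// Hypothetical sketch of the feeds-engine steps (the real engine is PHP).
// Assumed, not from the slides: FEED_TYPES, the default page size, and that the
// two numeric path segments mean rows per page and page number.
const FEED_TYPES = { organizations: "bundle:org" };

// Parse the incoming URL, apply config, build Solr parameters.
function feedToSolrParams(url, pageSize = 20) {
  const u = new URL(url);
  // "/data/atom/organizations/10/2" -> ["", "data", "atom", "organizations", "10", "2"]
  const [, , , type, rowsSeg, pageSeg] = u.pathname.split("/");
  const baseFilter = FEED_TYPES[type];
  if (!baseFilter) throw new Error(`Unknown feed type: ${type}`);
  const rows = parseInt(rowsSeg, 10) || pageSize;
  const page = parseInt(pageSeg, 10) || 1;
  const p = new URLSearchParams({
    q: u.searchParams.get("q") || "*:*",
    fl: "atom", // only the pre-rendered Atom entry is needed
    rows: String(rows),
    start: String((page - 1) * rows),
    wt: "json",
  });
  p.append("fq", baseFilter); // configured filter first
  for (const fq of u.searchParams.getAll("fq")) p.append("fq", fq); // consumer filters
  return p;
}

// Glue the stored <entry> elements inside an Atom <feed> frame.
// id, title and updated are required feed elements in Atom (RFC 4287);
// the values are assumed to be XML-safe already.
function wrapAtom(entries, { id, title, updated }) {
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    '<feed xmlns="http://www.w3.org/2005/Atom">',
    `  <id>${id}</id>`,
    `  <title>${title}</title>`,
    `  <updated>${updated}</updated>`,
    ...entries,
    "</feed>",
  ].join("\n");
}
```

With these assumptions, `/organizations/10/2` becomes `rows=10&start=10`, and any `fq` or `q` in the consumer's URL is passed through after the configured filter.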
http://example.com/data/atom/organizations
http://example.com/data/atom/organizations/10/2
http://example.com/data/atom/organizations?fq=type:HE
http://example.com/data/atom/organizations?fq=type:HE&q=law
Consume with feeds reader
Logging
How?
Logging back-end written in PHP that writes to a MySQL database:
- called asynchronously from the JS library
- called inline in the Feeds engine
Google Analytics (ga.js):
- called from the JS library (search words and categories)
What?
- Search terms
- Facets
- User interaction
- List of search results
- Stack latency (JS, PHP, Solr)
- Search domain
- Session
Why?
Most popular queries with no results?
Most popular queries?
How does QPS affect latency?
Follow a user through search (interaction design & user testing)
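The first two questions reduce to counting logged queries. Here is a sketch over hypothetical log records with `query` and `numFound` fields; the real MySQL schema is not shown in the talk.

```javascript
// Hypothetical log records: { query: string, numFound: number }.
// Returns [term, count] pairs, most frequent first (ties broken alphabetically).
function topQueries(records, { zeroHitsOnly = false, limit = 10 } = {}) {
  const counts = new Map();
  for (const r of records) {
    if (zeroHitsOnly && r.numFound > 0) continue; // keep only searches with no results
    const term = r.query.trim().toLowerCase(); // group trivially different spellings
    if (!term) continue;
    counts.set(term, (counts.get(term) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}
```

Calling it once as is and once with `zeroHitsOnly: true` answers «most popular queries» and «most popular queries with no results» from the same log.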
Displaying logs
Charts are generated with Google Chart Tools in Drupal
Other statistics can easily be explored with Drupal Views
Demo (includes responsiveness)
http://utdanning.no/sok
http://utdanning.no/search
http://utdanning.no/solrservice/utdanning.no
- Drupal 7
- Apache Solr Search Integration + custom indexing
- Omega theme (responsiveness with Drupal) + custom js
- Ajax Solr + custom widgets
- Solr Php Client r60 + custom proxy
- Bootstrap (responsiveness without Drupal)
- jQuery
- Google Chart Tools
Remi MikalsenRemi [email protected]@iktsenteret.no
iktsenteret.no