Chapter 14. Solr Search Engine

Table of Contents

Configuring KonaKart to use Solr
Instructions
Customization of Solr
Forcing Solr Usage
Suggested Search
Suggested Search using Solr terms
Suggested Spelling
Functionality
Setup
Catalog Support for Suggested Search and Spelling
Faceted Searching
Prices, Manufacturers and Categories
Custom Faceted Search
Multi-Valued Facets

Since version 3.0.1.0, the Enterprise version of KonaKart can use the Apache Solr search engine. (The Business version of KonaKart also supports Solr from v8.9.0.0.) Solr gives you fully indexed search using the Apache Lucene search engine. As well as being very fast regardless of the size of your catalog, it caters for misspellings, synonyms, plurals and alternate spellings.

Solr can be installed on a dedicated search server or on the same server where KonaKart is running. Prior to v8.5.0.0, the KonaKart Enterprise Extensions included a war called solr.war which installs Solr when placed in the webapps directory of Tomcat. From v8.5.0.0, Solr runs not as a webapp but as an independent process. The standard KonaKart installation already creates a solr directory where Solr can be configured and where the indexed data is stored. After a default installation, the Solr administration application is available on port 8983. For security reasons, it's important that you configure your firewall to block external requests to this port so that Solr cannot be accessed from outside.

From v8.8.0.0 of KonaKart it is possible to choose between a number of different "Solr Access Modes". Their purpose is to allow integration with different configurations of Solr, including set-ups which use a Zookeeper ensemble to manage a cooperating set of Solr instances. The different configuration options are described in more detail below.

Configuring KonaKart to use Solr

Instructions

The first step is to install Solr. As mentioned above, this can be done on a dedicated search server or by using the Solr installation provided in the standard KonaKart Business or Enterprise installation package.

The default set-up in KonaKart is to use the "Solr" Solr Access Mode. This is the Solr access mode used by KonaKart since v8.5.0.0. This requires a Base Solr URL and a Solr path (typically "solr/konakart") to be defined in the Solr configuration parameters.

For high-performance, fault-tolerant systems you can choose the "Zookeeper" Solr Access Mode. This mode requires a Zookeeper connection string that KonaKart uses to look up the "live" Solr instances. To use it you need to set up multiple Zookeepers (three or more, on different hosts, are recommended) and multiple Solr instances. The Zookeeper hosts string should take the form "127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183". KonaKart connects to one of these Zookeepers (chosen at random) and looks up the live Solr nodes to obtain a URL for accessing Solr. From the set of live nodes retrieved from Zookeeper it chooses one at random in order to balance the Solr requests across the instances. For setting up Zookeeper ensembles and multiple Solr instances, it is recommended that you refer to the Apache Solr website and one of the many tutorials available on the Internet, then tune the set-up to meet your local requirements. In brief, you will need to:

  • Install, configure and start your Zookeepers

  • Install, configure and start your Solr instances

  • Load the Solr config to one of the Zookeepers. An example command for this is:

    bin/solr.cmd zk upconfig -d C:/KonaKart/solr/server/solr/configsets/konakart -n konakart -z 127.0.0.1:2181

  • Create a Solr collection for use with KonaKart. An example command for this is:

    bin/solr.cmd create -c konakart -n konakart -shards 3 -replicationFactor 3

When using the "Zookeeper" Solr Access Mode it is recommended that you set "Do not commit in code" to "true", "Delete from index on commit" to "false" and "Distributed terms searching" to "true".

The default setting for "Zookeeper Hosts" defines Zookeeper IP addresses on the same machine. In order to gain maximum fault-tolerance you should run your Zookeepers and Solr instances on separate servers.
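The random selection described above can be illustrated with a small sketch. It is not KonaKart's implementation: the class and method names (ZkHostsSketch, parseHosts, pickRandom) are hypothetical, and the sketch only splits a hosts string of the documented form and picks one entry at random. It does not connect to Zookeeper. The same pickRandom step could be applied to a list of live Solr node URLs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not part of the KonaKart API.
final class ZkHostsSketch {

    /** Splits a hosts string such as "127.0.0.1:2181,127.0.0.1:2182" into host:port entries. */
    static List<String> parseHosts(String hosts) {
        List<String> result = new ArrayList<>();
        for (String part : hosts.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    /** Picks one entry at random; the Random is passed in so that a caller can seed it. */
    static String pickRandom(List<String> candidates, Random random) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates to choose from");
        }
        return candidates.get(random.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        List<String> zookeepers =
                parseHosts("127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183");
        System.out.println("Zookeeper chosen: " + pickRandom(zookeepers, new Random()));
    }
}
```

Choosing at random needs no shared state between requests, which is why it is a simple way to spread load across instances.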

Whatever your chosen configuration for Solr, it is recommended that access to the Solr Admin functionality is restricted or disabled (by default the Solr installation creates a webapp that allows access to the admin UI). Various techniques are possible to restrict access; see the Apache Solr website for recommended solutions.

KonaKart must be told where Solr is located (dependent on the chosen Solr Access Mode) and also instructed to use Solr rather than the standard database search. This can be achieved in the Configuration>>Solr Search Engine panel of the Admin App as shown in the screen shot below:

Configure Solr Search Engine

Once KonaKart has been configured to use Solr, you must instruct it to index the product catalog currently in the KonaKart database. This can be done in the Tools >> Manage Solr Search Engine panel of the Admin App. This tool allows you to index all of the products or to remove all of the products from Solr. The "add" operation can be performed multiple times. Products that already exist will be overwritten and not added twice.

Another way of indexing the products is to use the AddAllProductsToSearchEngine batch job available from the Admin App under Scheduler. This batch job allows you to configure the number of threads being used and the number of products read in a loop from the database, in order to maximize performance. The full source code is available and so can be used as a template for creating your own Solr importer where you may want to add constraints in order to decide which products are added. A log is generated which may be viewed during the import in order to track the progress.

When KonaKart is enabled to use Solr, the Solr index will be updated automatically whenever a new product is added or an existing product is edited or deleted using the Admin App.

Customization of Solr

The Solr search engine behaves differently from a relational database, so we make it straightforward to customize the search behavior to satisfy your requirements. For example, the standard KonaKart behavior when searching for a product using a search string is to add a wild card before and after the string in order to make the search work reliably. Let's say that the name of a product is "Hewlett Packard LaserJet 1100Xi" and a search is made for Laserjet. With a relational database, the product will not be found unless the search string has a leading and trailing wild card, i.e. the search string becomes %laserjet%. With Solr, however, the string is tokenized and the search for laserjet returns a result without requiring any wild cards. By default they are therefore not added, because they make the query slower and affect the behavior of synonyms. On the other hand, if a search is made for Laser, the relational database with its wild cards will return a result, whereas Solr will not return a result unless a wild card is added so that the search string becomes laser*.
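The difference between the two wild card styles can be sketched as follows. This is only an illustration of the strings discussed above, not the code of the standard SolrMgr. The class and method names are hypothetical, and lower-casing the term is an assumption made to match the examples in the text.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of the KonaKart API.
final class WildcardSketch {

    /** Relational search: wrap the term in SQL LIKE wild cards. */
    static String forDatabaseLike(String term) {
        return "%" + term.toLowerCase(Locale.ROOT) + "%";
    }

    /** Solr search: optionally append a trailing wild card for prefix matching. */
    static String forSolr(String term, boolean addTrailingWildcard) {
        String lower = term.toLowerCase(Locale.ROOT);
        return addTrailingWildcard ? lower + "*" : lower;
    }

    public static void main(String[] args) {
        System.out.println(forDatabaseLike("Laserjet")); // prints %laserjet%
        System.out.println(forSolr("Laserjet", false));  // prints laserjet
        System.out.println(forSolr("Laser", true));      // prints laser*
    }
}
```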

Under KonaKart/java_api_examples/src/com/konakart/apiexamples you will find a Java file called MySolrMgr.java with a couple of methods that allow you to define how you want wild cards to be used for Solr searches. One method is used for managing wild cards when searching for products using text searches. The other method is used for managing the wild cards when searching by matching custom fields. If you decide to change the default behavior, you must edit the konakart.properties file in order to use the new manager rather than the standard manager.

konakart.manager.SolrMgr = com.konakart.bl.MySolrMgr

When a search string contains multiple words there is an option to search for an exact match on the string or to tokenize the string into separate keywords and to search for the keywords AND'ed together. For example, rather than searching for "Electric Rotary Lawnmower" the search string becomes Electric AND Rotary AND Lawnmower. In order to activate the tokenizer you must set the tokenizeSolrInput attribute to true in the ProductSearch object which is sent as a parameter to the SearchForProducts API call.
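As a rough illustration of what tokenizing means here, the sketch below turns a multi-word search string into keywords joined with AND. It is not the KonaKart tokenizer. The class and method names are hypothetical, and splitting on whitespace is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the KonaKart API.
final class KeywordTokenizerSketch {

    /** Splits the search text on whitespace and joins the keywords with AND. */
    static String andTogether(String searchText) {
        List<String> words = new ArrayList<>();
        for (String word : searchText.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return String.join(" AND ", words);
    }

    public static void main(String[] args) {
        // prints Electric AND Rotary AND Lawnmower
        System.out.println(andTogether("Electric Rotary Lawnmower"));
    }
}
```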

By default the attributes of a product that are indexed and that can be used for searching are:

  • Product Name

  • Product Description including comparison data and description custom fields for all languages

  • Product Model

  • Product Manufacturer

In some cases you may wish to add other product attributes so that they become searchable through Solr. For example, these may include custom fields or the SKU.

Under KonaKart/java_api_examples/src/com/konakartadmin/apiexamples you will find a Java file called MySolrMgr.java with a method

public String getCustomSearchData(AdminProduct prod, AdminLanguage lang)

that overrides the method of the standard manager. It allows you to choose any data from the product object passed in as a parameter and to add it to a string (returned by the method) which will be indexed in Solr. Note that, as explained in the Javadoc for the method, the indexed data will only be used by the storefront APIs when you set whereToSearch in the ProductSearch object to ProductSearch.SEARCH_IN_PRODUCT_DESCRIPTION.

If you decide to change the default behavior, you must edit the konakartadmin.properties file in order to use the new manager rather than the standard manager.

konakart.admin_manager.AdminSolrMgr = com.konakartadmin.apiexamples.MySolrMgr
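The kind of logic that an overridden getCustomSearchData method might contain can be sketched as below. To keep the example self-contained it uses a hypothetical stand-in class (ProductStandIn) with sku and custom1 fields rather than the real AdminProduct class, so the field names are assumptions. The shipped MySolrMgr.java example and its Javadoc remain the reference for the real signature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the KonaKart API.
final class CustomSearchDataSketch {

    /** Stand-in for the product object; NOT the KonaKart AdminProduct class. */
    static final class ProductStandIn {
        final String sku;
        final String custom1;

        ProductStandIn(String sku, String custom1) {
            this.sku = sku;
            this.custom1 = custom1;
        }
    }

    /** Concatenates the chosen attributes into one space-separated string for indexing. */
    static String buildSearchData(ProductStandIn prod) {
        List<String> parts = new ArrayList<>();
        addIfPresent(parts, prod.sku);
        addIfPresent(parts, prod.custom1);
        return String.join(" ", parts);
    }

    /** Skips null or blank values so that the indexed string has no stray spaces. */
    private static void addIfPresent(List<String> parts, String value) {
        if (value != null && !value.trim().isEmpty()) {
            parts.add(value.trim());
        }
    }

    public static void main(String[] args) {
        ProductStandIn prod = new ProductStandIn("HP-1100XI", "laser printer");
        System.out.println(buildSearchData(prod)); // prints HP-1100XI laser printer
    }
}
```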

In some situations you may have a different Solr set-up, perhaps using a master-slave configuration of some kind. In order to use such custom configurations effectively you can implement a custom "Solr Access Mode". If you wish to do this you need to do the following:

  • Set Solr Access Mode to "Custom" in the Admin Console

  • Create your own versions of the SolrMgr and the AdminSolrMgr as described above

  • Set any configuration variables you need in the refreshConfigs() method, which you would override in your versions of the managers. (You can add new configuration variables of your own if you wish.)

  • In your version of the SolrMgr you need to specialise "public String getSolrUrl()" which needs to return the Solr URL as a String (for example http://localhost:8983/solr/konakart)

  • In your version of the AdminSolrMgr you need to specialise "public URL getSolrUrl()" which needs to return the Solr URL as a URL.
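The two return types mentioned in the last two steps can be illustrated with a small helper that joins a base Solr URL and a Solr path. This is a sketch under assumptions, not KonaKart code. The class and method names are hypothetical, and a real custom manager would take these values from its own configuration variables and return them from getSolrUrl().

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical illustration only; not part of the KonaKart API.
final class SolrUrlSketch {

    /** Joins base URL and path with exactly one slash, e.g. http://localhost:8983/solr/konakart. */
    static String solrUrlAsString(String baseUrl, String path) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        String rel = path.startsWith("/") ? path.substring(1) : path;
        return base + "/" + rel;
    }

    /** Same value as a java.net.URL, the form the admin-side method needs to return. */
    static URL solrUrlAsUrl(String baseUrl, String path) {
        try {
            return URI.create(solrUrlAsString(baseUrl, path)).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Invalid Solr URL", e);
        }
    }

    public static void main(String[] args) {
        // Both lines print http://localhost:8983/solr/konakart
        System.out.println(solrUrlAsString("http://localhost:8983", "solr/konakart"));
        System.out.println(solrUrlAsUrl("http://localhost:8983/", "/solr/konakart"));
    }
}
```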

Forcing Solr Usage

Normally Solr is only used by KonaKart when doing text searches or when returning custom facets. In order to always use Solr, for example even when returning products for a category or manufacturer, you must set the forceUseSolr attribute of the ProductSearch object to true. The advantage in using Solr is that it will return extra information such as the minimum and maximum product prices of the result set, and manufacturer, category and price facets. The storefront application automatically uses Solr for all product search related API calls when it has been enabled.