Configuring KonaKart to use the Solr Search Engine

Introduction

Since version 3.0.1.0, the enterprise version of KonaKart can use the Apache Solr search engine. Solr gives you fully indexed search using the Apache Lucene search engine. As well as being very fast regardless of the size of your catalog, it caters for misspellings, synonyms, plurals and alternate spellings.

Solr can be installed on a dedicated search server or on the same server and servlet engine where KonaKart is running. KonaKart Enterprise Extensions includes a war called solr.war which will install Solr when placed in the webapps directory of Tomcat. The standard KonaKart installation already creates a solr directory where Solr can be configured and where the indexed data is stored.

Configuration Instructions

The first step is to install Solr. As mentioned above, this can be done on a dedicated search server or simply by dropping solr.war into the Tomcat webapps directory. The following instructions assume that Solr is installed in the same container as KonaKart.

Next, KonaKart must be told where Solr is located and also instructed to use Solr rather than the standard database search. This can be achieved in the Configuration>>Solr Search Engine panel of the Admin App as shown in the screen shot below:

Configure Solr Search Engine

Once KonaKart has been configured to use Solr, you must instruct it to index the product catalog currently in the KonaKart database. This can be done in the Tools>>Manage Solr Search Engine panel of the Admin App. This tool allows you to index all of the products or to remove all of the products from Solr. The "add" operation can be performed multiple times: products that already exist in the index are overwritten rather than added twice.

When KonaKart is enabled to use Solr, the Solr index will be updated automatically whenever a new product is added or an existing product is edited or deleted using the Admin App.

Customization of Solr

The Solr search engine behaves differently from a relational database, so KonaKart makes it straightforward to customize the search behavior to suit your requirements. For example, the standard KonaKart behavior when searching for a product using a search string is to add a wild card before and after the string in order to make the search work reliably.

Let's say that the name of a product is "Hewlett Packard LaserJet 1100Xi" and a search is made for Laserjet. With a relational database, the product will not be found unless the search string has a leading and a trailing wild card, i.e. the search string becomes %laserjet%. With Solr, however, the string is tokenized and the search for laserjet returns a result without requiring any wild cards. By default they are therefore not added, because they make the query slower and affect the behavior of synonyms.

On the other hand, if a search is made for Laser, the relational database with its wild cards will return a result, whereas Solr will not return a result unless a wild card is added so that the search string becomes laser*.
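The difference between the two matching styles can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical, the whitespace tokenizer is a rough stand-in for Solr's analysis chain, and nothing here calls KonaKart or Solr.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration only; not KonaKart or Solr code.
class WildcardMatchSketch {

    // Database-style search: %term% matches any substring (case-insensitive here).
    static boolean substringMatch(String productName, String term) {
        return productName.toLowerCase(Locale.ROOT)
                .contains(term.toLowerCase(Locale.ROOT));
    }

    // Simplified token search: the name is split on whitespace and the term
    // must equal a whole token, unless it ends with '*' (prefix match).
    static boolean tokenMatch(String productName, String term) {
        String t = term.toLowerCase(Locale.ROOT);
        boolean prefix = t.endsWith("*");
        String stem = prefix ? t.substring(0, t.length() - 1) : t;
        return Arrays.stream(productName.toLowerCase(Locale.ROOT).split("\\s+"))
                .anyMatch(tok -> prefix ? tok.startsWith(stem) : tok.equals(stem));
    }

    public static void main(String[] args) {
        String name = "Hewlett Packard LaserJet 1100Xi";
        System.out.println(substringMatch(name, "Laser"));  // substring search finds it
        System.out.println(tokenMatch(name, "Laserjet"));   // whole token, no wild card needed
        System.out.println(tokenMatch(name, "Laser"));      // partial token, not found
        System.out.println(tokenMatch(name, "laser*"));     // trailing wild card finds it
    }
}
```

The sketch shows why a trailing wild card is needed only when the search string is a fragment of a token rather than a whole token.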

Under KonaKart/java_api_examples/src/com/konakart/apiexamples you will find a Java file called MySolrMgr.java with a couple of methods that allow you to define how you want wild cards to be used for Solr searches. One method is used for managing wild cards when searching for products using text searches. The other method is used for managing the wild cards when searching by matching custom fields. If you decide to change the default behavior, you must edit the konakart.properties file in order to use the new manager rather than the standard manager.

konakart.manager.SolrMgr = com.konakart.bl.MySolrMgr

Suggested Search using Solr terms

When KonaKart is configured to use Solr (as described above), the store-front application automatically activates a Google GWT search widget which is normally invisible. As you type into the search box, a list of suggested search items appears, matching the typed letters.

The KonaKart suggested search functionality uses Solr terms. For each product a number of terms are stored. The suggested search list is ordered by popularity of the term, so the more times a term has been saved, the greater chance it has of appearing in the search list. The default terms stored for each product are:

  • The category name(s) of the product, e.g. Televisions

  • The name of the product manufacturer, e.g. Sony

  • The name of the product, e.g. Vaio

  • The name of the product model, e.g. VPC-EB42FX

  • The name of the category by manufacturer, e.g. Televisions by Sony, Televisions by Philips. The added word "by" is read from the admin message catalog stored as "label.by".

  • The name of the manufacturer in a category, e.g. Sony in Televisions, Sony in Computers. The added word "in" is read from the admin message catalog stored as "label.in".

Each term is indexed with metadata containing the category, product and manufacturer ids, so when a customer clicks on a term such as "Sony in Televisions" they are directed to a category view displaying Sony products within the Televisions category. At this point the customer can choose another manufacturer while remaining within the category, and / or can apply extra filters such as screen size, LED, LCD etc. if the category has been set up with product tags to allow faceted search.
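As a rough illustration of the default terms listed above, the following Java sketch builds the six terms for a single product. It is not the KonaKart implementation: the class, method and parameter names are hypothetical, only one category per product is handled, and the "by" and "in" labels are passed in as plain strings instead of being read from the message catalog.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the KonaKart term-indexing code.
class SuggestedTermsSketch {

    // Builds the default suggested-search terms for one product in one category.
    static List<String> defaultTerms(String category, String manufacturer,
            String productName, String model, String byLabel, String inLabel) {
        List<String> terms = new ArrayList<>();
        terms.add(category);                                      // category name
        terms.add(manufacturer);                                  // manufacturer name
        terms.add(productName);                                   // product name
        terms.add(model);                                         // product model
        terms.add(category + " " + byLabel + " " + manufacturer); // category by manufacturer
        terms.add(manufacturer + " " + inLabel + " " + category); // manufacturer in category
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(
                defaultTerms("Computers", "Sony", "Vaio", "VPC-EB42FX", "by", "in"));
    }
}
```

Because the same category and manufacturer terms are produced for every product that shares them, those terms are saved more often and so rank higher in a popularity-ordered suggestion list.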

Under KonaKart/java_api_examples/src/com/konakartadmin/apiexamples you will find a Java file called MySolrMgr.java with a method that overrides the addTerm() method of the standard manager. This method allows you to decide which terms you want to index and which you want to exclude. By default, all terms are indexed. However, if for example all of your products belong to the same manufacturer or category, you may want to exclude certain terms. If you decide to change the default behavior, you must edit the konakartadmin.properties file in order to use the new manager rather than the standard manager.

konakart.admin_manager.AdminSolrMgr = com.konakartadmin.apiexamples.MySolrMgr

From version 6.3.0.0 the algorithm used to search for the search string within the term is configurable using regular expressions. As can be seen from the image above, there are two configuration variables which contain the regex to add before the search string and the regex to add after the search string. The default for both configuration variables is ".*", which means that there can be any character (.) any number of times (*) before and after the search string, so that substrings within the term are found. If both of these configuration variables are left empty, the original algorithm is used, where the search string has to match from the start of the term. For example, if the term is "matrox g200 MMS" and the search string is "g200", the new algorithm will find the term whereas the old one won't. The old algorithm will only find the term with a search string of "mat...".
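The effect of the two configuration variables can be sketched with java.util.regex. This is a minimal illustration under stated assumptions, not the KonaKart or Solr code path: the class and method names are hypothetical, matching is made case-insensitive, and the search string is quoted so that it is treated literally.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration only; not how KonaKart queries Solr terms.
class TermRegexSketch {

    // before/after play the role of the two configuration variables.
    // If both are empty, fall back to matching from the start of the term.
    static boolean termMatches(String term, String search, String before, String after) {
        if (before.isEmpty() && after.isEmpty()) {
            return term.toLowerCase(Locale.ROOT)
                    .startsWith(search.toLowerCase(Locale.ROOT));
        }
        Pattern p = Pattern.compile(before + Pattern.quote(search) + after,
                Pattern.CASE_INSENSITIVE);
        return p.matcher(term).matches(); // the whole term must match the pattern
    }

    public static void main(String[] args) {
        String term = "matrox g200 MMS";
        System.out.println(termMatches(term, "g200", ".*", ".*")); // substring is found
        System.out.println(termMatches(term, "g200", "", ""));     // not at start of term
        System.out.println(termMatches(term, "mat", "", ""));      // prefix is found
    }
}
```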

A Solr terms query returns all documents matching the search string, including any documents that have been marked for deletion but not yet removed from the Solr index. In order for the standard Solr Commit command to remove these documents, the expungeDeletes attribute must be set to true, i.e. <commit expungeDeletes="true" />. Setting this attribute degrades the performance of the Commit operation, so it can be controlled through a configuration variable (see image above). The default setting is "true", although if you are not using Suggested Search it is more efficient to set it to false. To completely rebuild the Solr index you can always issue the Optimize command, e.g. http://localhost:8780/solr/update?optimize=true .
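For reference, a commit message like the one above could be posted to the Solr update handler by hand. The sketch below uses the Java 11 java.net.http client; the class and method names are hypothetical, the update URL is taken from the Optimize example above, and this is not how KonaKart itself issues the commit.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical illustration only; KonaKart issues its own commits internally.
class SolrCommitSketch {

    // Builds the XML update message described in the text.
    static String commitMessage(boolean expungeDeletes) {
        return "<commit expungeDeletes=\"" + expungeDeletes + "\" />";
    }

    // Wraps the message in an HTTP POST aimed at the given update URL.
    static HttpRequest commitRequest(String updateUrl, boolean expungeDeletes) {
        return HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(commitMessage(expungeDeletes)))
                .build();
    }

    // Sends the request and returns the HTTP status code.
    static int send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Assumes Solr is running locally on the port used in the text.
        HttpRequest request = commitRequest("http://localhost:8780/solr/update", true);
        System.out.println(send(request));
    }
}
```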

Faceted Search using Solr

KonaKart allows a configurable way to use the powerful faceted search functionality of Solr by allowing different facets to be defined for different product types. An example will be used in order to demonstrate how to use the Administration Application to configure product attributes to be used as facets.

The example uses the standard KonaKart demonstration database which contains three DVD products under the DVD Movies >> Drama category. The first step is to define a set of custom attributes which can be applied to DVDs and can be used to return facet values.

Define Custom Attributes

As can be seen from the image, each attribute is assigned a facet number (from 1 to 10) which is used to map that attribute to a facet field in the Solr schema. In this particular example, since the custom attributes can only take a fixed number of values, a drop list is defined (in the Set Function field) containing the allowed values. Once the custom attributes have been defined, a custom attribute template must be created to group the new attributes:

Define Custom Attribute Template

Once the template has been inserted, the three custom attributes may be added to the template by clicking on the Attributes button. The next step is to select each of the products within the Drama category:

Select Products in Drama Category

As shown below, each product within the Drama category must be associated with the DVD template created earlier.

Add template to product

Once the template has been added to the product, the values of the new custom attributes may be set by selecting allowed values from the drop lists and clicking the Save button.

Set custom attribute values

At this point, we've completed the setup procedure for products. The next step is to create three Tag Groups (one for each custom attribute), each mapped to the same facet number as its custom attribute.

Create tag groups

As can be seen from the above image, the Rating Tag Group is mapped to facet number 2, which is the same as the Rating custom attribute. This mapping also applies to the Genre and Type Tag Groups. Once the Tag Groups have been inserted, they must be associated with the Drama Category as can be seen below. The order is important because this is the order in which they will be read using the KonaKart API.

Add tag groups to category

The reasons for creating a Tag Group for each facet field are twofold. Within KonaKart, Solr faceted search may only be used if a category id is specified in the search query. A category id is mandatory because only similar products belonging to the same category and associated with the same template must be returned. If a DVD and another type of product such as a Keyboard were returned in the same query, it would be impossible to have a set of facets applicable to both products. When a customer clicks on a category in the store-front application, the code must retrieve the Tag Groups associated with that category and pass these to the Product Search API call. Using this information, the KonaKart engine code can determine the sort order for the returned facet data (i.e. the same as the sort order of the Tag Groups) and if Solr doesn't return any data for one or more facets, the API call still returns the facet value with no entries so that it may be displayed on the UI.
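The ordering and gap-filling behavior described above can be sketched in plain Java. This is an assumption-laden illustration rather than the KonaKart engine code: the class and method names are hypothetical, Tag Groups are reduced to a name mapped to a facet number, and the Solr response is reduced to a map from facet number to value counts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the KonaKart engine implementation.
class FacetOrderSketch {

    // tagGroups: Tag Group name -> facet number, in the order set on the category.
    // solrFacets: facet number -> (facet value -> product count), as returned by a search.
    // Returns the facets in Tag Group order, with an empty entry for any
    // facet that the search returned no data for.
    static Map<String, Map<String, Integer>> orderFacets(
            LinkedHashMap<String, Integer> tagGroups,
            Map<Integer, Map<String, Integer>> solrFacets) {
        Map<String, Map<String, Integer>> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> group : tagGroups.entrySet()) {
            Map<String, Integer> values =
                    solrFacets.getOrDefault(group.getValue(), Map.of());
            ordered.put(group.getKey(), values);
        }
        return ordered;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> tagGroups = new LinkedHashMap<>();
        tagGroups.put("Rating", 2);
        tagGroups.put("Type", 3);
        tagGroups.put("Genre", 1);

        // No data came back for facet 3 (Type) in this made-up response.
        Map<Integer, Map<String, Integer>> solrFacets =
                Map.of(1, Map.of("drama", 3), 2, Map.of("R", 1));

        System.out.println(orderFacets(tagGroups, solrFacets).keySet());
    }
}
```

The user interface can then iterate over the result and render every facet heading in a stable order, even when one of them has no values.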

Before testing the configuration with some API calls, the products must be added to Solr as shown below:

Add products to Solr

The next part of this tutorial will demonstrate how the KonaKart Engine API may be used to retrieve products in the Drama Category using queries that return facet information and how to add the facet information as a constraint.


            /*
             * Get the tag groups for the Drama Category
             */
            TagGroupIf[] groups = eng.getTagGroupsPerCategory(dramaCatId,/* getProdCount */false,
                    KKConstants.DEFAULT_LANGUAGE_ID);

            /*
             * Create a ProductSearch object for the search
             */
            ProductSearch search = new ProductSearch();
            search.setReturnCustomFacets(true);
            search.setCategoryId(dramaCatId);
            search.setTagGroups(groups);

            ProductsIf prods = eng.searchForProducts(null, null, search, DEFAULT_LANGUAGE);

            for (int i = 0; i < prods.getCustomFacets().length; i++)
            {
                KKFacetIf facet = prods.getCustomFacets()[i];

                System.out.println(facet.getName() + " - " + facet.getNumber());
                if (facet.getValues() != null)
                {
                    for (int j = 0; j < facet.getValues().length; j++)
                    {
                        NameNumberIf value = facet.getValues()[j];
                        System.out.println("\t" + value.getName() + "(" + value.getNumber() + ")");
                    }
                }
            }

The above code retrieves the Tag Groups for the category. It then creates a ProductSearch object, passing it the Tag Groups, the Category Id and instructions to return custom facets. The print out from running the code can be seen below:


Rating - 2
	G(1)
	PG(1)
	R(1)
Type - 3
	Blu-ray(2)
	HD-DVD(1)
Genre - 1
	drama(3)
	

This is what we would expect because the products have been set up like this:

  • Product1: Rating = PG, Type = Blu-ray, Genre = drama

  • Product2: Rating = G, Type = Blu-ray, Genre = drama

  • Product3: Rating = R, Type = HD-DVD, Genre = drama

We can pass a constraint to the search by adding it to the relevant Tag Group as shown below. The constraint is that the rating must be "R". Note that Rating is the first Tag Group in the list.


            /*
             * Get the tag groups for the Drama Category
             */
            TagGroupIf[] groups = eng.getTagGroupsPerCategory(dramaCatId,/* getProdCount */false,
                    KKConstants.DEFAULT_LANGUAGE_ID);

            /*
             * Create a ProductSearch object for the search
             */
            ProductSearch search = new ProductSearch();
            search.setReturnCustomFacets(true);
            search.setCategoryId(dramaCatId);
            groups[0].setFacetConstraint("R"); // The first tag group is Rating
            search.setTagGroups(groups);

            ProductsIf prods = eng.searchForProducts(null, null, search, DEFAULT_LANGUAGE);

            for (int i = 0; i < prods.getCustomFacets().length; i++)
            {
                KKFacetIf facet = prods.getCustomFacets()[i];

                System.out.println(facet.getName() + " - " + facet.getNumber());
                if (facet.getValues() != null)
                {
                    for (int j = 0; j < facet.getValues().length; j++)
                    {
                        NameNumberIf value = facet.getValues()[j];
                        System.out.println("\t" + value.getName() + "(" + value.getNumber() + ")");
                    }
                }
            }

This constraint forces KonaKart to return only one product, as can be seen from the print out:


Rating - 2
	R(1)
Type - 3
	HD-DVD(1)
Genre - 1
	drama(1)
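The facet counts in both print outs can be reproduced with a small in-memory model of the three DVD products. This sketch only imitates the counting that Solr performs; the class and method names are hypothetical and no KonaKart or Solr API is involved.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; Solr does the real counting.
class FacetCountSketch {

    // Counts facet values over the products that satisfy every constraint.
    // Each product is a map of facet name -> value.
    static Map<String, Map<String, Integer>> facetCounts(
            List<Map<String, String>> products,
            List<String> facetNames,
            Map<String, String> constraints) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (String facet : facetNames) {
            result.put(facet, new TreeMap<>()); // values sorted alphabetically
        }
        for (Map<String, String> product : products) {
            boolean matches = constraints.entrySet().stream()
                    .allMatch(c -> c.getValue().equals(product.get(c.getKey())));
            if (!matches) {
                continue; // product filtered out by a facet constraint
            }
            for (String facet : facetNames) {
                String value = product.get(facet);
                if (value != null) {
                    result.get(facet).merge(value, 1, Integer::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> dvds = List.of(
                Map.of("Rating", "PG", "Type", "Blu-ray", "Genre", "drama"),
                Map.of("Rating", "G", "Type", "Blu-ray", "Genre", "drama"),
                Map.of("Rating", "R", "Type", "HD-DVD", "Genre", "drama"));
        List<String> facets = List.of("Rating", "Type", "Genre");

        System.out.println(facetCounts(dvds, facets, Map.of()));
        System.out.println(facetCounts(dvds, facets, Map.of("Rating", "R")));
    }
}
```

With no constraint all three products are counted; with the Rating constraint set to "R" only the HD-DVD product remains, which matches the second print out.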