Google Search Appliance (GSA) is Google’s search solution for businesses that need to search private data stored in various formats. Most businesses need Google-like crawl and search capability for quicker access to their private data. Without it, people in the organization waste a lot of time finding the relevant document, or end up recreating data and documents that already exist.
GSA is not open source and, depending on your needs, it can cost a significant amount of money. That is why we felt the need for an alternative: a solution built on open source software that offers capabilities similar to GSA. Since fast, accurate, and controlled search is the key criterion, we decided to use one of the most popular open source search engines, ElasticSearch.
ElasticSearch is an open source search and analytics engine built on top of Apache Lucene. It focuses on document storage and retrieval, including searching and sorting of documents. It is designed for distributed environments and provides flexibility and scalability.
In this article, I list the most popular features of GSA and walk you through implementing GSA’s Spell Checker capability using ElasticSearch. In follow-up articles, I will show the implementation of the other features as well.
Major GSA functionalities
The following are the major GSA features that businesses use for different reasons:
- Spell Checker
- Self-learning scorer
- Highlight query terms
- Dynamic navigation
- Query Suggestions
- Query Suggestions Blacklist
- Synonyms
- Related Queries
- Collecting metrics
- Advanced Search
- Sorting by metadata
- Autocomplete
- Wildcard search
In this article, we will focus on Spell Checker!
Problem Statement
We have an e-commerce application where people search for products. Users may mistype a word while searching. To handle this, the application should be smart enough to suggest the correct spelling for the search term.
Prerequisites
- Proficiency in J2SE and J2EE
- Proficiency in ElasticSearch concepts
- Understanding of GSA functionality
Spell Checker implementation using ElasticSearch
As part of this feature, ElasticSearch should check the spelling of search queries and offer spelling suggestions to users.
The Spell Checker should use the data in the ElasticSearch documents to make spelling suggestions. Suggestions should be derived dynamically from the indexed documents, based on the search query.
When the Spell Checker detects a possible misspelling, a single spelling suggestion is returned with the query results. Spelling suggestions are not enabled by default; we need to make certain changes to the ElasticSearch index.
Setup
Create ES Index Settings with Spell Checker Analyzer
Once the index is created with these settings, we can query the Spell Checker analyzer in the ES index for spelling suggestions.
PUT ecommerce_parts
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "stemmer": {
            "type": "stemmer",
            "language": "english"
          },
          "stopwords": {
            "type": "stop",
            "stopwords": ["_english_"]
          }
        },
        "analyzer": {
          "SpellChecker": {
            "type": "custom",
            "char_filter": ["html_strip"],
            "filter": ["lowercase"],
            "tokenizer": "standard"
          },
          "default": {
            "type": "custom",
            "char_filter": ["html_strip"],
            "filter": ["lowercase", "stopwords", "stemmer"],
            "tokenizer": "standard"
          }
        }
      },
      "number_of_replicas": "1",
      "number_of_shards": "5",
      "refresh_interval": "1000"
    }
  }
}
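To see why the SpellChecker analyzer leaves out the stopwords and stemmer filters, it can help to run the same text through both analyzers with the _analyze API. The request below is only a sketch: the sample text is made up for illustration, and the JSON body form assumes ElasticSearch 2.0 or later (earlier versions take the analyzer and text as query-string parameters instead).

```json
# Sketch only: the sample text is hypothetical
GET ecommerce_parts/_analyze
{
  "analyzer": "SpellChecker",
  "text": "The running shoes"
}
```

With SpellChecker, the tokens stay as lowercased whole words. Changing "analyzer" to "default" should drop the stop word "the" and reduce "running" to its stem "run". Suggestions drawn from a stemmed field would be stems rather than real words, which is presumably why the spelling field gets its own, lighter analyzer.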
Create ES Index Mappings
Create one additional field (spell_checker) in the ES index and link it to the SpellChecker analyzer defined above. The values of the other index fields are copied into this field with copy_to, so that they can be used for spelling suggestions. Add the copy_to statement only to the fields that are required for spelling suggestions.
PUT ecommerce_parts/_mapping/ecommerce_parts_type
{
  "properties": {
    "BrandName": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": {
          "type": "string"
        }
      },
      "copy_to": ["spell_checker"]
    },
    "Cat": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": {
          "type": "string"
        }
      },
      "copy_to": ["spell_checker"]
    },
    "Desc": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": {
          "type": "string"
        }
      },
      "copy_to": ["spell_checker"]
    },
    "SubCat": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": {
          "type": "string"
        }
      },
      "copy_to": ["spell_checker"]
    },
    "Term": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": {
          "type": "string"
        }
      },
      "copy_to": ["spell_checker"]
    },
    "spell_checker": {
      "type": "string",
      "analyzer": "SpellChecker"
    }
  }
}
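To make the effect of copy_to concrete, here is a sketch of indexing a single product document into this mapping. The document ID and the Cat and Desc values are hypothetical; only the brand name ‘Sprayaway’ comes from the demonstration in this article.

```json
# Hypothetical sample document: ID, Cat and Desc are made up for illustration
PUT ecommerce_parts/ecommerce_parts_type/1
{
  "BrandName": "Sprayaway",
  "Cat": "Cleaning Supplies",
  "Desc": "Glass cleaner aerosol"
}
```

At index time, the values of BrandName, Cat, and Desc are also fed into spell_checker and analyzed there with the SpellChecker analyzer. The copied values are not added to the stored _source, so spell_checker will not show up in search hits, but its terms are available to the suggester.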
Demonstration:
Search for documents with the BrandName ‘Sprayaway’ and verify the results.
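The original query is not reproduced here, so the following is a sketch of one way to run this check. It assumes a term query, which fits because BrandName is mapped as not_analyzed and therefore holds the exact, case-sensitive value.

```json
# Assumed query: exact match on the not_analyzed BrandName field
POST ecommerce_parts/_search
{
  "query": {
    "term": {
      "BrandName": "Sprayaway"
    }
  }
}
```

Because the field is not analyzed, a lowercase ‘sprayaway’ would not match with this query; the value has to be written exactly as it was indexed.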
Now query the SpellChecker analyzer for spelling suggestions with the search term ‘Sprayaway’ and verify the result. We expect no spelling suggestion, because a brand named ‘Sprayaway’ exists.
In the above query result, the options array holds the spelling suggestions, but it is empty for the search term ‘Sprayaway’. This is the expected behavior.
Now query the SpellChecker analyzer for spelling suggestions with the misspelled brand name ‘Sprayway’ and verify the result. We expect a spelling suggestion, because no brand named ‘Sprayway’ exists.
In the above query result, we can see ‘sprayaway’ as a spelling suggestion, because we gave a wrong brand name (‘Sprayway’). This exercise shows that the Spell Checker is working as expected.
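The exact request is not reproduced in the text, so here is a minimal sketch of how such a suggestion query could look, assuming the term suggester is run against the spell_checker field through the _search endpoint. The label "spell_suggestion" is an arbitrary name chosen for this sketch.

```json
# Assumed request: "spell_suggestion" is an arbitrary label, not a reserved name
POST ecommerce_parts/_search
{
  "size": 0,
  "suggest": {
    "spell_suggestion": {
      "text": "Sprayaway",
      "term": {
        "field": "spell_checker",
        "size": 1,
        "suggest_mode": "missing"
      }
    }
  }
}
```

Setting "size" to 1 inside the suggester limits the response to a single suggestion per term, and "suggest_mode": "missing" (the suggester's default) only proposes corrections for terms that do not occur in the index. In the response, the suggestions appear in the options array under suggest.spell_suggestion. The same request can be reused for other search terms by changing the "text" value.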
Summary
In this article, I listed the most popular features of GSA and explained how one specific GSA use case, the Spell Checker, can be implemented using ElasticSearch. In follow-up articles, I will show the implementation of the other features as well. I hope this article helps you make better use of ElasticSearch.
At WalkingTree, we have been using ElasticSearch and the related product suite for a few years, and we would love to help you take advantage of it.