Design Search AutoCompletion using plain Java

Design Search AutoCompletion using plain Java

·

4 min read

This article intends to discuss the low-level implementation of Search Autocompletion using Java. Let's cut the crap and directly mention that we will be using Trie data structure for the use case. We're not going to discuss in depth about Trie data structure or why it is the right fit here. There are plenty of HLD articles around for you to refer to. So, let's focus on the implementation here.

Here is a sample TrieNode class:

class TrieNode{
    Map<Character,TrieNode> children;
    boolean isEndOfWord;

    TrieNode(){
        children = new HashMap<>();
        isEndOfWord = false;
    }
}

Note that, we're using Map here for storing all characters at a particular level in the Trie. If the question is to return the results in alphabetic order, then we got no option but to use TreeMap. This will ensure the order but will cost you in insertion and search since TreeMap lookups/insertions would take log(n) time.

For this article, we assume that we can return results in any order to make things simple. Also, let's assume we have the cached response from the backend. We can take it as a source for words for the autocompletion task. Let this input be words and let's say we're searching for a word searchWord . So, the method definition boils down to the below:

public List<String> suggestedWords(String[] words,
 String searchWord) {
  for (String word: words) {
    insert(word); // insert to Trie
  }
  return search(searchWord); // search in Trie and return results
}

The method insert(String word) would insert words to the Trie. Below is the sample implementation for the same. Note that, we need to set the flag isEndOfWord to true for the last TrieNode, indicating the last character of the word.

private void insert(String word) {
  int len = word.length();
  TrieNode current = this.root; // member variable pointing to the root of TrieNode
  for (int i = 0; i < len; i++) {
    TrieNode node = current.children.get(word.charAt(i));
    if (node == null) {
      node = new TrieNode();
      current.children.put(word.charAt(i), node);
    }
    current = node;
  }
  current.isEndOfWord = true;
}

Trie can be prepopulated and can be set up in an application server instead. So, real-time queries just have to call search(searchWord) directly. After inserting a bunch of words, it would look like the below (eg: apple, ape, god, good) :

Now comes the major part, which is search auto-completion. To start with, we need to implement prefix search. We should be able to search for a prefix in the Trie. If there is a match found, then we need to search for complete words from that specific TrieNode. Real-time applications would have k number of results returned rather than all words. It makes sense to have k words returned for real-time performance. Here is the implementation for prefix search:

private TrieNode startsWith(String prefix) {
  int len = prefix.length();
  TrieNode current = this.root;

  for (int i = 0; i < len; i++) {
    TrieNode node = current.children.get(prefix.charAt(i));
    if (node == null) {
      return null;
    }
    current = node;
  }
  return current;
}

If there are no prefix found in the Trie, we return null, otherwise we return the TrieNode will be the starting point for our main search. Prefix search will only take O(n) time in the worst case. Let me re-write the search function below with all the defined methods so far:

private List<String> search(String searchWord) {
  List<String> result = new ArrayList <> ();
  TrieNode current = this.root;
  int len = searchWord.length();

  current = startsWith(searchWord);
  if (current != null) {
    List<String> list = new ArrayList <> ();
    StringBuilder sb = new StringBuilder(searchWord);
    // backtrack(list, current, sb, k); yet to implement. 
    result.addAll(list);
  }
  return result;
}

Note that, searchWord may or may not be a complete word. But it doesn't matter and we return all complete words for the searchWord . I have commented on the above code which is yet to be discussed/implemented. We can use backtracking to traverse all following TrieNodes from the prefix TrieNode, until we find a complete word. Refer to the below implementation:

private void backtrack(List<String> list,
 TrieNode current,
 StringBuilder sb,
 int k) {
    if (list.size() == k) {
      return;
    }
    if (current.isEndOfWord) {
      list.add(sb.toString());
    }
    for (Map.Entry<Character,TrieNode> entry: current.children.entrySet()) {
      sb.append(entry.getKey());
      backtrack(list, entry.getValue(), sb);
      sb.setLength(sb.length() - 1);
    }
}

We stop backtracking the Trie when at most k words are collected. Most importantly, we collect the word with the help of isEndOfWord flag. Use of StringBuilder is self-explanatory here. Once backtracking is done, the result will have all the complete words. Now, the same solution can be applied if we need to return a list of phrases instead of a list of single words. The word here is synonymous with a phrase, so it doesn't matter.

This is something you can quickly code up in an actual coding interview and sky is the limit for any improvements! Happy coding!