PolarSPARC |
Introduction to Trie Data Structure - Part 2
Bhaskar S | 08/31/2019 |
👧 👦 Hey Charlie !!! We are really excited and can't wait to hear about the optimizations you alluded to during our previous ( Part-1 ) discussion. Would you have some time now to elaborate on it ???
👲 Hey guys !!! Sure, have some time now to discuss the optimized version of the Trie data structure. It is also known as a Compact Trie or a Patricia Trie, where PATRICIA stands for Practical Algorithm To Retrieve Information Coded In Alphanumeric.
👲 We will leverage the Node implementation as discussed previously.
👲 The following is the implementation of a Node of a Trie using Python for convenience:
👲 Let me dive right into explaining how one can insert a word into a Compact Trie data structure using an example. Assume the word is ant. Starting at the Root Node as the current Node and starting with the letter at index 0 (letter a), lookup the letter in the children of the current Node. It will not be found, so create a new Node and set the value of the item to be the entire word. Also, set the ancestor to be the current Node and add the newly created node as the reference to the letter in the children of the current Node. Finally, set the leaf to be True.
👲 The following illustration depicts the Compact Trie data structure:
👲 Assume the next word is art. Starting at the Root Node as the current Node and starting with the letter at index 0 (letter a), lookup the letter in the children of the current Node. This time around it will be found and will point to the Node whose item value will be the word ant. Assign this as the current Node. Given that we found a Node, we have to find the largest common prefix between the given word art and the existing word ant. The common prefix between the two words is a. Since the length of the item value in the current Node is greater than the length of the common prefix, we create a new Node with the item value set to the suffix nt (from the existing word) and add it as child for the letter l under the current Node. Next, create another Node with the item value set to the suffix rt (for the given word) and add it as child for the letter r under the current Node. Finally, we change the item value of the current Node to the common prefix, which is a. The following illustration depicts the Compact Trie data structure with the two words:
👲 Does this make sense ???
👦 👧 Hmm !!! Not sure ... may be another example would help ???
👲 Ok, sure ... Let us consider the words hunter, hunt, hunted, and hung.
👲 We start with the word is hunter. Starting at the Root Node as the current Node and starting with the letter at index 0 (letter h), we lookup the letter in the children of the current Node. It will not be found, so we create a new Node and set the value of the item to be the entire word. Also, set the ancestor to be the current Node and add the newly created node as the child for the letter h under the current Node. Finally, set the leaf to be True. The following illustration depicts the Compact Trie data structure:
👲 The next word is hunt. Starting at the Root Node as the current Node and starting with the letter at index 0 (letter h), we lookup the letter in the children of the current Node. This time around it will be found and will point to the Node whose item value will be the word hunter. Assign this as the current Node. Given that we found a Node, we have to find the largest common prefix between the given word hunt and the existing word hunter. The common prefix between the two words is hunt. Since the length of the item value in the current Node is greater than the length of the common prefix, we create a new Node with the item value set to the suffix er (from the existing word) and add it as child for the letter e under the current Node. Finally, we change the item value of the current Node to the common prefix, which is hunt. The following illustration depicts the Compact Trie data structure:
👲 So far we good ???
👧 👦 YES !!!
👲 The next word is hunted. Starting at the Root Node as the current Node and starting with the letter at index 0 (letter h), we lookup the letter in the children of the current Node. Again, this time it will be found and will point to the Node whose item value will be the word hunt. Assign this as the current Node. Given that we found a Node, we have to find the largest common prefix between the given word hunted and the existing word hunt. The common prefix between the two words is hunt. Since the length of the item value in the current Node is equal to the length of the common prefix and the length of the new word is greater, we move the index forward from index 0 plus the length of the prefix to the new index 4. Starting at the current Node and starting with the letter at index 4 (letter e), we lookup the letter in the children of the current Node. Once again, it will be found and will point to the Node whose item value will be the suffix er. Assign this as the current Node. Given that we found a Node, we have to find the largest common prefix between the suffix word ed and the existing suffix word er. The common prefix between the two suffix words is e. Since the length of the item value in the current Node is greater than the length of the common prefix, we create a new Node with the item value set to the suffix r (from the existing suffix word er) and add it as child for the letter r under the current Node. Next, create another Node with the item value set to the suffix d (for the given suffix word ed) and add it as child for the letter d under the current Node. Finally, we change the item value of the current Node to the common prefix, which is e. The following illustration depicts the Compact Trie data structure with the three words:
👲 Now for the final word hung. Starting at the Root Node as the current Node and starting with the letter at index 0 (letter h), we lookup the letter in the children of the current Node. It will be found and will point to the Node whose item value will be the word hunt. Assign this as the current Node. Given that we found a Node, we have to find the largest common prefix between the given word hung and the existing word hunt. The common prefix between the two words is hun. Since the length of the item value in the current Node is greater than the length of the common prefix, we create a new Node with the item value set to the suffix t (from the existing word hunt) and add it as child for the letter t under the current Node. Next, create another Node with the item value set to the suffix g (for the given word hung) and add it as child for the letter g under the current Node. Finally, we change the item value of the current Node to the common prefix, which is hun. The following illustration depicts the Compact Trie data structure with the four words:
👧 👦 Ahh !!! Got it. This makes a lot of sense now, WOW !!! This is GREAT !!!
👲 Excellent !!! Glad the example made it clear. The key point to remember is that when add a new word, we need to check for the largest common prefix between the new word and the existing word(s) in the Compact Trie data structure.
👲 The following is the code implementation to insert a word into a Compact Trie data structure using Python:
👲 Assume we inserted the follwoing 10 words into the Trie data structure:
ant
art
ball
bend
car
cart
hunt
hunter
hunted
hung
👲 The following illustration depicts the Nodes in the example Compact Trie:
👲 See how compact this Trie data structure is compared to the basic Trie !!!
👦 👧 Oh Yeah !!! Looks very space efficient as fewer Nodes are created.
👲 Next, let me explain how to look-up a prefix of a word in a Compact Trie data structure. Assume the prefix is ben. The first letter in the prefix is b. Starting at the Root Node as the current Node and starting with the letter at index 0 (letter b), lookup the letter in the children of the current Node. It will be found and will point to the child Node whose item value will be the prefix word b. Since the length of the item value equals 1, move the index forward by 1 and assign the child as the current Node. Starting at the current Node and starting with the letter at index 1 (letter e), lookup the letter in the children of the current Node. Again, it will be found and will point to the child Node whose item value will be the suffix word end. Since the length of the item value is greater than 1, we compare one letter after the other between the item value and the given prefix starting at the current index. On each successful compare, we move forward the index by 1. If the entire length of the prefix is covered successfully, we have found the given prefix. Else, the prefix is not found and we are done.
👲 The following is the code implementation to search a prefix from a Compact Trie data structure using Python:
👧 👦 WoW !!! This is awesome.
👲 Next, let me explain how to look-up a word in a Compact Trie data structure. Assume the word is art. We use the same logic as looking up the prefix art. Except, we add one more check to see if the Node returned by the prefix check has the leaf flag set as True. That is it.
👲 The following is the code implementation to search a word from a Compact Trie data structure using Python:
👲 Finally, let me explain how to remove a word from a Compact Trie data structure. Assume the word is ball. For this, we use the same logic as looking up the word ball. If the word exists, we get a reference to the suffix Node (all). If there are no children in the returned Node, we remove the reference to this Node from its ancestor. The following illustration depicts the state of the Compact Trie data structure after the suffix all is removed:
👲 We are not done yet. If the ancestor of the returned Node has just one remaining child (which is the case with the suffix end), then we need to compress and merge the child Node with the ancestor as shown in the illustration below:
👲 Once the Nodes are compressed and merged, the following illustration depicts the state of the Compact Trie data structure:
👲 The following is the code implementation to remove a word from a Trie data structure using Python:
👲 The following is the full source code for a compact Trie data structure using Python:
👲 Executing the above Python code will generate the following output:
insert_word: word => ant inserted insert_word: word => art inserted insert_word: word => ball inserted insert_word: word => bend inserted insert_word: word => cart inserted insert_word: word => car inserted insert_word: word => hunt inserted insert_word: word => hunter inserted insert_word: word => hunted inserted insert_word: word => hung inserted Is prefix ben present => True Is prefix bet present => False Is prefix hunt present => True Is word ball present => True Is word car present => True Is word care present => False Is word cart present => True Is word hunter present => True remove_word: word => ball deleted remove_word: word => hunter deleted Is prefix ba present => False Is prefix ben present => True Is word art present => True Is word ball present => False Is word hunter present => False All words from the trie => ['ant', 'art', 'bend', 'car', 'cart', 'hunt', 'hunted', 'hung'] -------------------------------------------------- print_details: item => None, children => ['a', 'b', 'c', 'h'], leaf => False print_details: item => a, children => ['n', 'r'], leaf => False print_details: item => nt, children => [], leaf => True print_details: item => rt, children => [], leaf => True print_details: item => bend, children => [], leaf => True print_details: item => car, children => ['t'], leaf => True print_details: item => t, children => [], leaf => True print_details: item => hun, children => ['t', 'g'], leaf => False print_details: item => t, children => ['e'], leaf => True print_details: item => ed, children => [], leaf => True print_details: item => g, children => [], leaf => True --------------------------------------------------
👧 👦 This is just AWESOME !!! Thanks for explaining the Compact Trie data structure Charlie.
👲 You welcome. Oh crap !!! Am late for my next meeting, have to run !!!
References