PolarSPARC |
Introduction to Trie Data Structure - Part 1
Bhaskar S | 08/17/2019 |
👧 Hey Bob !!! What's up dude - you seem to be lost in deep thought ? All okay with you ???
👦 Hey Alice !!! Thanks for noticing and asking. Yeah, have a problem to solve and don't have an anwer yet. May be chatting with you will open up ideas.
👧 Happy to discuss and may be will learn something as well. What is the problem you trying to solve ???
👦 Here is the issue - want to store some words from the English dictionary.
👧 That is easy - why not use a Map data structure ???
👦 Here is the catch though. One could also check if a prefix of the word (from the begining) exists.
👧 Hmm. Interesting. How about we store all the prefixes of the word including the word in the Map data structure ???
👦 Hmm. That is an interesting idea. Let us check it out.
After a few mins on the white-board ...🕐...🕑...
👦 Dank. Does not work. Here is the case. Assume we added the word hunter along with its prefixes h, hu, hun, hunt, and hunte. How does one distinguish between the prefix hunt and the word hunt ??? See the dilemma ???
👧 I see. Damn - thought we figured. This is indeed an interesting problem.
👲 Hey guys !!! Looks like you are having some interesting discussion. Would you mind if me joined ???
👧 Hey Charlie !!! Of course not. Join the party.
Alice and Bob describe the problem to Charlie...
👲 What a timimg !!! Recently read about an interesting data structure called Trie (pronounced as *Try*). It is also known as a Prefix Tree or a Radix Tree.
👦 👧 Interesting - tell us more about it.
👲 Well, it a Tree like data structure and is used for reTrieval (or search) of prefixes or words (or keys) in an efficient manner. It is a type of Search Tree.
👲 Given any word, each letter from the word is represented as a Node in the Trie. The starting point of the Trie is the Root Node. For example, if we are given the words ant and art, the following illustration depicts the Nodes in the example Trie:
👲 For our discussion, we will only assume words from the English dictionary. This means each Node in the Trie can have up to 26 children (26 letters from the English alphabet).
👲 Each Node in the Trie will contain the following members:
item :: This will contain the letter from the word this Node represents
children :: A dictionary of letters. Can have up to 26 children
ancestor :: A reference to the parent Node
leaf :: Represents the leaf Node (last letter of a word)
👲 The following is the implementation of a Node of a Trie using Python:
👲 Next, let me explain how to insert a word into a Trie data structure using an example. Assume the word is ant. The first letter in the word is a. Starting at the Root Node as the current Node, lookup the letter in the children of the current Node. If found, set the current Node to reference the letter Node. Else, create a new Node, set the value of the item to be the letter (a). Set the ancestor to be the current Node and add the newly created node as the reference to the letter in the children of the current Node. Set the current Node to reference the newly created letter Node.
👲 The second letter in the word is n. Lookup the letter in the children of the current Node. If found, set the current Node to reference the letter Node. Else, create a new Node, set the value of the item to be the letter (n). Set the ancestor to be the current Node and add the newly created node as the reference to the letter in the children of the current Node. Set the current Node to reference the newly created letter Node.
👲 The last letter in the word is t. Lookup the letter in the children of the current Node. If found, set the current Node to reference the letter Node. Else, create a new Node, set the value of the item to be the letter (t). Set the ancestor to be the current Node and add the newly created node as the reference to the letter in the children of the current Node. Set the leaf to be True since this is the last letter of the word. And we are done.
👲 The following is the code implementation to insert a word into a Trie data structure using Python:
👲 Assume we inserted the follwoing 10 words into the Trie data structure:
ant
art
ball
bend
car
cart
hunt
hunter
hunted
hung
👲 The following illustration depicts the Nodes in the example Trie:
👲 Next, let me explain how to look-up a prefix of a word in a Trie data structure. Assume the prefix is ben. The first letter in the prefix is b. Starting at the Root Node as the current Node, lookup the letter in the children of the current Node. If found, set the current Node to reference the letter Node. Else, the prefix is not found and we are done.
👲 The second letter in the prefix is e. Lookup the letter in the children of the current Node. If found, set the current Node to reference the letter Node. Else, the prefix is not found and we are done.
👲 The last letter in the prefix is n. Lookup the letter in the children of the current Node. If found, the prefix is found successfully and we are done. Else, the prefix is not found and we are done.
👲 The following is the code implementation to search a prefix from a Trie data structure using Python:
👧 👦 WoW !!! This is awesome.
👲 Next, let me explain how to look-up a word in a Trie data structure. Assume the word is art. We use the same logic as looking up the prefix art. Except, we add one more check to see if the Node corresponding to the last letter t has the leaf flag set as True. That is it.
👲 The following is the code implementation to search a word from a Trie data structure using Python:
👧 👦 This is really COOL that we can leverage the is_prefix method to find a word.
👲 We are not done yet. Next, let me explain how to remove a word from a Trie data structure. Assume the word is ball. For this, we use the same logic as looking up the word ball. If the word exists, we also get a reference to the last letter Node (l). If there are no children in the last letter Node, we remove the reference to this last Node from its ancestor Node. We then move to the ancestor Node and perform the same step of removing the Node (only if children is empty). We continue this process till we hit the Root Node.
👲 The following illustration depicts the state of the Trie after the word ball is removed:
👲 The following is the code implementation to remove a word from a Trie data structure using Python:
👲 The following is the source code for a basic Trie data structure using Python:
👲 Executing the above Python code will generate the following output:
insert_word: word => ant inserted insert_word: word => art inserted insert_word: word => ball inserted insert_word: word => bend inserted insert_word: word => car inserted insert_word: word => cart inserted insert_word: word => hunt inserted insert_word: word => hunter inserted insert_word: word => hunted inserted insert_word: word => hung inserted Is prefix ben present => True Is prefix bet present => False Is prefix hunt present => True Is word ball present => True Is word car present => True Is word care present => False Is word cart present => True Is word hunter present => True remove_word: word => ball deleted remove_word: word => hunter deleted Is prefix ba present => False Is prefix ben present => True Is word art present => True Is word ball present => False Is word hunter present => False All words from the trie => ['ant', 'art', 'bend', 'car', 'cart', 'hunt', 'hunted', 'hung'] -------------------------------------------------- print_details: item = None, children = ['a', 'b', 'c', 'h'], leaf = False print_details: item = a, children = ['n', 'r'], leaf = False print_details: item = n, children = ['t'], leaf = False print_details: item = t, children = [], leaf = True print_details: item = r, children = ['t'], leaf = False print_details: item = t, children = [], leaf = True print_details: item = b, children = ['e'], leaf = False print_details: item = e, children = ['n'], leaf = False print_details: item = n, children = ['d'], leaf = False print_details: item = d, children = [], leaf = True print_details: item = c, children = ['a'], leaf = False print_details: item = a, children = ['r'], leaf = False print_details: item = r, children = ['t'], leaf = True print_details: item = t, children = [], leaf = True print_details: item = h, children = ['u'], leaf = False print_details: item = u, children = ['n'], leaf = False print_details: item = n, children = ['t', 'g'], leaf = False print_details: item = t, children = ['e'], leaf = True print_details: item = e, children = ['d'], leaf = False print_details: item = d, children = [], leaf = True print_details: item = g, children = [], leaf = True --------------------------------------------------
👲 Hope was able to add value to this discussion and share what learnt recently.
👧 👦 This was really super useful and we learnt something new today. Thanks for sharing !!!
👲 You welcome. Well, the algorithm for the basic Trie data structure we just covered isn't space efficient. We should discuss that option as well. Well, am getting late for my next meeting, so let me go ...