Implementing Trie (Prefix Tree): A Complete Guide with Java
The Problem: Fast String Lookups and Autocomplete
Imagine you're building a search engine or a text editor with autocomplete functionality. As the user types "pro", you want to instantly suggest words like "program", "project", and "professional". Or perhaps you're developing a spell checker that needs to quickly verify if a word exists in a dictionary with millions of entries.
How do you efficiently store and retrieve strings for these scenarios? A naive approach might use an array or a hash map, but these structures fall short when dealing with prefix-based operations. Arrays require linear search time, and hash maps, while offering O(1) lookups for exact matches, don't efficiently support prefix-based queries.
This is where the Trie data structure shines.
What is a Trie?
A Trie (pronounced "try" or "tree") is a specialized tree-based data structure designed for efficient string operations, particularly those involving prefixes. The name comes from "retrieval," highlighting its primary purpose.
Unlike binary trees where each node typically contains a single value, in a Trie:
- Each node represents a single character in a string
- The path from the root to a node forms a prefix
- Children of a node are indexed by the next character in the sequence
- Nodes may contain a flag indicating if they represent the end of a complete word
This structure allows Tries to excel at:
- Prefix matching: Find all words starting with a given prefix
- Exact string search: Check if a string exists in the collection
- Lexicographical ordering: Words are naturally stored in sorted order
- Space efficiency: Common prefixes are stored only once
Building a Trie in Java
Let's implement a complete Trie data structure in Java with the following operations:
- Insert a word
- Search for a word
- Delete a word
- Count words with a given prefix
- Get all words with a given prefix
Here's our implementation:
javaimport java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; public class Trie { private TrieNode root; public Trie() { root = new TrieNode(); } // Node class for our Trie private static class TrieNode { Map<Character, TrieNode> children; boolean isEndOfWord; int wordCount; // Count of words passing through this node public TrieNode() { children = new HashMap<>(); isEndOfWord = false; wordCount = 0; } } /** * Inserts a word into the trie. */ public void insert(String word) { if (word == null || word.isEmpty()) { return; } TrieNode current = root; for (char c : word.toCharArray()) { current.children.putIfAbsent(c, new TrieNode()); current = current.children.get(c); current.wordCount++; // Increment count for prefix statistics } current.isEndOfWord = true; } /** * Returns true if the word is in the trie. */ public boolean search(String word) { if (word == null || word.isEmpty()) { return false; } TrieNode node = findNode(word); return node != null && node.isEndOfWord; } /** * Returns true if there is any word in the trie that starts with the given prefix. */ public boolean startsWith(String prefix) { return findNode(prefix) != null; } /** * Returns the count of words with the given prefix. */ public int countWordsWithPrefix(String prefix) { TrieNode node = findNode(prefix); return node != null ? node.wordCount : 0; } /** * Helper method to find a node corresponding to the given string. */ private TrieNode findNode(String str) { TrieNode current = root; for (char c : str.toCharArray()) { if (!current.children.containsKey(c)) { return null; } current = current.children.get(c); } return current; } /** * Removes a word from the trie if present. * Returns true if the word was removed. */ public boolean delete(String word) { if (word == null || word.isEmpty()) { return false; } return delete(root, word, 0); } /** * Recursive helper method for delete operation. */ private boolean delete(TrieNode current, String word, int index) { if (index == word.length()) { // If word doesn't exist if (!current.isEndOfWord) { return false; } // Mark as non-end of word current.isEndOfWord = false; // Return true if this node has no other children return current.children.isEmpty(); } char c = word.charAt(index); TrieNode node = current.children.get(c); if (node == null) { return false; } // Decrement the word count as we're removing a word node.wordCount--; boolean shouldDeleteCurrentNode = delete(node, word, index + 1); // If true is returned from the recursive call and the node is not // the end of another word, we can delete it if (shouldDeleteCurrentNode && !node.isEndOfWord) { current.children.remove(c); return current.children.isEmpty(); } return false; } /** * Returns all words in the trie with the given prefix. */ public List<String> getWordsWithPrefix(String prefix) { List<String> result = new ArrayList<>(); TrieNode node = findNode(prefix); if (node != null) { collectWords(node, prefix, result); } return result; } /** * Helper method to collect all words from a given node. */ private void collectWords(TrieNode node, String prefix, List<String> result) { if (node.isEndOfWord) { result.add(prefix); } for (Map.Entry<Character, TrieNode> entry : node.children.entrySet()) { collectWords(entry.getValue(), prefix + entry.getKey(), result); } } }
How the Trie Works
Let's break down the key operations of our Trie implementation:
Insertion
When inserting a word like code, we:
- Start at the root node
- For each character ('c', 'o', 'd', 'e'):
- Create a new node if the character doesn't exist as a child
- Move to that child node
- Increment the word count for prefix statistics
- Mark the final node as the end of a word
This process gives us O(m) time complexity, where m is the length of the word.
Searching
To search for a word:
- Start at the root and follow the path defined by each character
- If at any point a character doesn't exist in the current node's children, return false
- If we reach the end of the word, check if the current node is marked as an end of word
Like insertion, search has O(m) time complexity.
Deletion
Deletion is more complex as we need to:
- Find the end node of the word
- Mark it as not an end of word
- Remove nodes that are no longer needed (no children and not marking the end of another word)
- Update word counts along the path
Prefix Operations
The real power of Tries comes from prefix operations:
- startsWith(prefix): Check if any word starts with the given prefix
- countWordsWithPrefix(prefix): Count words with the given prefix
- getWordsWithPrefix(prefix): Get all words with the given prefix
Practical Example: Autocomplete System
Let's see how our Trie can be used to build a simple autocomplete system:
javapublic class AutocompleteSystem { private Trie dictionary; public AutocompleteSystem() { dictionary = new Trie(); // Initialize with some words String[] words = { "apple", "application", "apply", "apartment", "book", "bookstore", "bookmark", "code", "coding", "coder", "computer" }; for (String word : words) { dictionary.insert(word); } } public List<String> getSuggestions(String prefix) { return dictionary.getWordsWithPrefix(prefix); } public static void main(String[] args) { AutocompleteSystem system = new AutocompleteSystem(); // Test with different prefixes System.out.println("Suggestions for 'a': " + system.getSuggestions("a")); System.out.println("Suggestions for 'ap': " + system.getSuggestions("ap")); System.out.println("Suggestions for 'co': " + system.getSuggestions("co")); // Count words with prefix System.out.println("Words starting with 'co': " + system.dictionary.countWordsWithPrefix("co")); // Delete a word and check again system.dictionary.delete("code"); System.out.println("After deleting 'code', suggestions for 'co': " + system.getSuggestions("co")); } }
Output:
plaintextSuggestions for 'a': [apple, application, apply, apartment] Suggestions for 'ap': [apple, application, apply, apartment] Suggestions for 'co': [code, coding, coder, computer] Words starting with 'co': 4 After deleting 'code', suggestions for 'co': [coding, coder, computer]
Performance Analysis
Let's analyze the performance of our Trie implementation:
Time Complexity
- Insert: O(m) where m is the length of the word
- Search: O(m) where m is the length of the word
- Delete: O(m) where m is the length of the word
- Prefix operations: O(m + k) where m is the length of the prefix and k is the number of words with that prefix
Space Complexity
- O(n × m) where n is the number of words and m is the average word length
- This can be reduced if many words share common prefixes
Comparison with Other Data Structures
| Operation | Trie | HashMap | ArrayList |
|---|---|---|---|
| Insert | O(m) | O(m) | O(n) |
| Search | O(m) | O(m) | O(n×m) |
| Delete | O(m) | O(m) | O(n) |
| Prefix Search | O(m+k) | O(n×m) | O(n×m) |
Where:
- m = length of the word
- n = number of words
- k = number of words with the given prefix
Real-World Applications
Tries are used in numerous applications:
- Autocomplete and Search Suggestions: As demonstrated in our example
- Spell Checkers: Quickly verify if a word exists in a dictionary
- IP Routing: Longest prefix matching in network routers
- Predictive Text: Mobile keyboards that suggest the next word
- Word Games: Efficiently check if a string forms a valid word
- Contact Lists: Fast lookup as users type a name
Optimizations and Variations
Several optimizations can make Tries even more efficient:
Compressed Tries (Radix Trees)
In a standard Trie, nodes with only one child can be compressed, reducing space requirements:
t -> r -> e -> e
Can be compressed to:
tree
Ternary Search Tries
Instead of using a HashMap for children, a ternary search trie uses three pointers per node:
- Left (for smaller characters)
- Equal (for the next character in the sequence)
- Right (for larger characters)
This can be more memory-efficient for certain character distributions.
Character Set Optimization
If you know your character set is limited (e.g., only lowercase English letters), you can use an array instead of a HashMap:
javaprivate static class TrieNode { TrieNode[] children = new TrieNode[26]; // For lowercase English letters boolean isEndOfWord; }
Conclusion
Tries are powerful data structures that excel at string operations, particularly those involving prefixes. Their efficiency in handling these operations makes them invaluable for applications like autocomplete systems, spell checkers, and search engines.
By understanding how Tries work and implementing them effectively, you can significantly improve the performance of string-intensive applications. The implementation we've covered provides a solid foundation that you can adapt and optimize for your specific use cases.
Whether you're building a text editor, a search engine, or any application that deals with string lookups, consider using a Trie to enhance your solution's efficiency and capabilities.
Further Reading
- Suffix Trees - An extension of Tries for substring searches
- Burst Tries - A hybrid approach combining Tries with other data structures
- Double-Array Tries - A space-efficient Trie implementation
Related Posts
Continue exploring similar topics
Merkle Trees: Implementation in Java and Real-World Applications
A comprehensive guide to Merkle Trees with Java implementation, practical applications in blockchain, distributed systems, and data integrity verification.
Spring Boot 3.x: What Actually Changed (and What Matters)
A practical look at Spring Boot 3.x features that change how you build services - virtual threads, reactive patterns, security gotchas, and performance lessons from production.
Java Multithreading Coding Questions for Interviews: Classic Problems and Solutions
Master the most common multithreading interview questions including Producer-Consumer, Dining Philosophers, and FizzBuzz problems with practical Java implementations.