When preparing for coding interviews, one of the most common questions is how to find the word frequency of a given string paragraph and extract the three highest frequency words. This type of problem tests your understanding of string manipulation, data structures, and algorithms across different programming languages. In this article, we’ll explore solutions in JavaScript, Node.js, Python, PHP, Java, and Go.
Table of Contents
We’ll walk you through the approach, the thought process behind the code, and then provide the actual implementation in each of these popular programming languages.
Understanding the Problem
The task is to find the frequency of each word in a given string and extract the top three words with the highest frequency. For example:
Given the input string:
"The quick brown fox jumps over the lazy dog. The dog was not so lazy after all."
We want to:
- Find how many times each word appears in the paragraph.
- Extract the three words that appear the most frequently.
Key Considerations:
- Case Insensitivity: “The” and “the” should be treated as the same word.
- Punctuation Removal: Periods, commas, and other punctuation should be ignored.
- Handling Ties: If multiple words have the same frequency, choose the words that appear first in the text.
Steps to Solve the Problem
- Normalize the Text: Convert all words to lowercase and remove any punctuation.
- Split the Paragraph into Words: Split the text into individual words using spaces.
- Count the Frequency of Each Word: Use a dictionary, map, or hash table to keep track of the word counts.
- Sort the Words by Frequency: Sort the word-frequency pairs by frequency in descending order.
- Extract the Top 3: Once sorted, retrieve the top three words.
Now, let’s look at how to implement this in various programming languages.
Solution in JavaScript
javascriptCopy codefunction getTopThreeWords(text) {
// Step 1: Normalize the text
const normalizedText = text.toLowerCase().replace(/[^\w\s]/g, '');
// Step 2: Split the text into words
const words = normalizedText.split(/\s+/);
// Step 3: Count the frequency of each word
const wordFrequency = {};
words.forEach(word => {
wordFrequency[word] = (wordFrequency[word] || 0) + 1;
});
// Step 4: Sort by frequency
const sortedWords = Object.entries(wordFrequency).sort((a, b) => b[1] - a[1]);
// Step 5: Extract the top three words
return sortedWords.slice(0, 3).map(entry => entry[0]);
}
const paragraph = "The quick brown fox jumps over the lazy dog. The dog was not so lazy after all.";
console.log(getTopThreeWords(paragraph));
Explanation:
- Step 1: Convert the text to lowercase and remove non-alphabetic characters.
- Step 2: Split the string into individual words.
- Step 3: Use an object to store word frequencies.
- Step 4: Sort the words based on their frequency in descending order.
- Step 5: Return the top three most frequent words.
Solution in Node.js
javascriptCopy codefunction getTopThreeWords(text) {
const normalizedText = text.toLowerCase().replace(/[^\w\s]/g, '');
const words = normalizedText.split(/\s+/);
const wordFrequency = new Map();
words.forEach(word => {
wordFrequency.set(word, (wordFrequency.get(word) || 0) + 1);
});
const sortedWords = Array.from(wordFrequency.entries()).sort((a, b) => b[1] - a[1]);
return sortedWords.slice(0, 3).map(entry => entry[0]);
}
const paragraph = "The quick brown fox jumps over the lazy dog. The dog was not so lazy after all.";
console.log(getTopThreeWords(paragraph));
Explanation:
- In Node.js, we use the Map object to store word frequencies.
- Everything else follows the same logic as the JavaScript solution.
Solution in Python
pythonCopy codeimport re
from collections import Counter
def get_top_three_words(text):
# Step 1: Normalize the text
normalized_text = re.sub(r'[^\w\s]', '', text.lower())
# Step 2: Split the text into words
words = normalized_text.split()
# Step 3: Count the frequency of each word
word_counts = Counter(words)
# Step 4: Get the top 3 words
top_three = word_counts.most_common(3)
return [word for word, count in top_three]
paragraph = "The quick brown fox jumps over the lazy dog. The dog was not so lazy after all."
print(get_top_three_words(paragraph))
Explanation:
- Step 1: The re.sub method removes punctuation and converts text to lowercase.
- Step 2: We split the text into a list of words.
- Step 3: Use Python’s Counter from the collections module to count word occurrences.
- Step 4: The most_common method extracts the three most frequent words.
Solution in PHP
phpCopy codefunction getTopThreeWords($text) {
// Step 1: Normalize the text
$normalizedText = strtolower(preg_replace("/[^\w\s]/", '', $text));
// Step 2: Split the text into words
$words = explode(' ', $normalizedText);
// Step 3: Count the frequency of each word
$wordFrequency = array_count_values($words);
// Step 4: Sort by frequency
arsort($wordFrequency);
// Step 5: Extract the top three words
return array_slice(array_keys($wordFrequency), 0, 3);
}
$paragraph = "The quick brown fox jumps over the lazy dog. The dog was not so lazy after all.";
print_r(getTopThreeWords($paragraph));
Explanation:
- In PHP, we use preg_replace to remove punctuation and array_count_values to count the frequency of each word.
- We then use arsort to sort the words by their frequency in descending order.
Solution in Java
javaCopy codeimport java.util.*;
public class WordFrequency {
public static List<String> getTopThreeWords(String text) {
// Step 1: Normalize the text
String normalizedText = text.toLowerCase().replaceAll("[^a-zA-Z\\s]", "");
// Step 2: Split the text into words
String[] words = normalizedText.split("\\s+");
// Step 3: Count the frequency of each word
Map<String, Integer> wordFrequency = new HashMap<>();
for (String word : words) {
wordFrequency.put(word, wordFrequency.getOrDefault(word, 0) + 1);
}
// Step 4: Sort by frequency
List<Map.Entry<String, Integer>> sortedWords = new ArrayList<>(wordFrequency.entrySet());
sortedWords.sort((a, b) -> b.getValue().compareTo(a.getValue()));
// Step 5: Extract the top three words
List<String> topThree = new ArrayList<>();
for (int i = 0; i < Math.min(3, sortedWords.size()); i++) {
topThree.add(sortedWords.get(i).getKey());
}
return topThree;
}
public static void main(String[] args) {
String paragraph = "The quick brown fox jumps over the lazy dog. The dog was not so lazy after all.";
System.out.println(getTopThreeWords(paragraph));
}
}
Explanation:
- In Java, we use a HashMap to store word frequencies and then sort the entries by value using Comparator.
Solution in Go
goCopy codepackage main
import (
"fmt"
"regexp"
"sort"
"strings"
)
func getTopThreeWords(text string) []string {
// Step 1: Normalize the text
re := regexp.MustCompile(`[^\w\s]`)
normalizedText := strings.ToLower(re.ReplaceAllString(text, ""))
// Step 2: Split the text into words
words := strings.Fields(normalizedText)
// Step 3: Count the frequency of each word
wordFrequency := make(map[string]int)
for _, word := range words {
wordFrequency[word]++
}
// Step 4: Sort by frequency
type wordCount struct {
word string
count int
}
var sortedWords []wordCount
for word, count := range wordFrequency {
sortedWords = append(sortedWords, wordCount{word, count})
}
sort.Slice(sortedWords, func(i, j int) bool {
return sortedWords[i].count > sortedWords[j].count
})
// Step 5: Extract the top three words
var topThree []string
for i := 0; i < 3 && i < len(sortedWords); i++ {
topThree = append(topThree, sortedWords[i].word)
}
return topThree
}
func main() {
paragraph := "The quick brown fox jumps over the lazy dog. The dog was not so lazy after all."
fmt.Println(getTopThreeWords(paragraph))
}
Explanation:
- Go uses the regexp package to remove punctuation, and a map to count word occurrences.
- The words are sorted by frequency using the sort.Slice function.
Conclusion
In this article, we tackled the problem of finding the most frequent words in a given string. We’ve provided solutions in JavaScript, Node.js, Python, PHP, Java, and Go. While the core logic remains similar—normalizing the text, counting word frequencies, and sorting by count—the syntax and methods differ across languages. Being able to solve this problem in multiple languages is an excellent skill for coding interviews and showcases your versatility as a programmer.
FAQs
- What data structure is best for storing word frequencies?
- In most languages, a dictionary or hash map works well to store words and their frequencies.
- How do I handle punctuation in the input text?
- You can use regular expressions to remove any non-alphanumeric characters.
- Can I solve this problem with a database query instead of code?
- Yes, if the text is stored in a database, you could use SQL queries to count word occurrences.
- Is there a way to handle case sensitivity in the counting process?
- Convert the entire string to lowercase before processing to ensure case insensitivity.
- Can I use this approach for languages other than English?
- Yes, but be aware of different punctuation marks and word delimiters in other languages. Adjust the normalization step accordingly.
Hi, this is a comment.
To get started with moderating, editing, and deleting comments, please visit the Comments screen in the dashboard.
Commenter avatars come from Gravatar.