How to Get the Word Frequency of a Given String Paragraph and Extract the 3 Most Frequent Words: Solutions

A Person Doing Computer Programming

When preparing for coding interviews, one of the most common questions is how to find the word frequency of a given string paragraph and extract the three highest frequency words. This type of problem tests your understanding of string manipulation, data structures, and algorithms across different programming languages. In this article, we’ll explore solutions in JavaScript, Node.js, Python, PHP, Java, and Go.

We’ll walk you through the approach, the thought process behind the code, and then provide the actual implementation in each of these popular programming languages.

Understanding the Problem

The task is to find the frequency of each word in a given string and extract the top three words with the highest frequency. For example:

Given the input string:

"The quick brown fox jumps over the lazy dog. The dog was not so lazy after all."

We want to:

  1. Find how many times each word appears in the paragraph.
  2. Extract the three words that appear the most frequently.

Key Considerations:

  • Case Insensitivity: “The” and “the” should be treated as the same word.
  • Punctuation Removal: Periods, commas, and other punctuation should be ignored.
  • Handling Ties: If multiple words have the same frequency, choose the words that appear first in the text.

Steps to Solve the Problem

  1. Normalize the Text: Convert all words to lowercase and remove any punctuation.
  2. Split the Paragraph into Words: Split the text into individual words using spaces.
  3. Count the Frequency of Each Word: Use a dictionary, map, or hash table to keep track of the word counts.
  4. Sort the Words by Frequency: Sort the word-frequency pairs by frequency in descending order.
  5. Extract the Top 3: Once sorted, retrieve the top three words.

Now, let’s look at how to implement this in various programming languages.


Solution in JavaScript

javascriptCopy codefunction getTopThreeWords(text) {
    // Step 1: Normalize the text
    const normalizedText = text.toLowerCase().replace(/[^\w\s]/g, '');
    
    // Step 2: Split the text into words
    const words = normalizedText.split(/\s+/);
    
    // Step 3: Count the frequency of each word
    const wordFrequency = {};
    words.forEach(word => {
        wordFrequency[word] = (wordFrequency[word] || 0) + 1;
    });
    
    // Step 4: Sort by frequency
    const sortedWords = Object.entries(wordFrequency).sort((a, b) => b[1] - a[1]);
    
    // Step 5: Extract the top three words
    return sortedWords.slice(0, 3).map(entry => entry[0]);
}

const paragraph = "The quick brown fox jumps over the lazy dog. The dog was not so lazy after all.";
console.log(getTopThreeWords(paragraph));

Explanation:

  • Step 1: Convert the text to lowercase and remove non-alphabetic characters.
  • Step 2: Split the string into individual words.
  • Step 3: Use an object to store word frequencies.
  • Step 4: Sort the words based on their frequency in descending order.
  • Step 5: Return the top three most frequent words.

Solution in Node.js

javascriptCopy codefunction getTopThreeWords(text) {
    const normalizedText = text.toLowerCase().replace(/[^\w\s]/g, '');
    const words = normalizedText.split(/\s+/);
    
    const wordFrequency = new Map();
    
    words.forEach(word => {
        wordFrequency.set(word, (wordFrequency.get(word) || 0) + 1);
    });
    
    const sortedWords = Array.from(wordFrequency.entries()).sort((a, b) => b[1] - a[1]);
    
    return sortedWords.slice(0, 3).map(entry => entry[0]);
}

const paragraph = "The quick brown fox jumps over the lazy dog. The dog was not so lazy after all.";
console.log(getTopThreeWords(paragraph));

Explanation:

  • In Node.js, we use the Map object to store word frequencies.
  • Everything else follows the same logic as the JavaScript solution.

Solution in Python

pythonCopy codeimport re
from collections import Counter

def get_top_three_words(text):
    # Step 1: Normalize the text
    normalized_text = re.sub(r'[^\w\s]', '', text.lower())
    
    # Step 2: Split the text into words
    words = normalized_text.split()
    
    # Step 3: Count the frequency of each word
    word_counts = Counter(words)
    
    # Step 4: Get the top 3 words
    top_three = word_counts.most_common(3)
    
    return [word for word, count in top_three]

paragraph = "The quick brown fox jumps over the lazy dog. The dog was not so lazy after all."
print(get_top_three_words(paragraph))

Explanation:

  • Step 1: The re.sub method removes punctuation and converts text to lowercase.
  • Step 2: We split the text into a list of words.
  • Step 3: Use Python’s Counter from the collections module to count word occurrences.
  • Step 4: The most_common method extracts the three most frequent words.

Solution in PHP

phpCopy codefunction getTopThreeWords($text) {
    // Step 1: Normalize the text
    $normalizedText = strtolower(preg_replace("/[^\w\s]/", '', $text));
    
    // Step 2: Split the text into words
    $words = explode(' ', $normalizedText);
    
    // Step 3: Count the frequency of each word
    $wordFrequency = array_count_values($words);
    
    // Step 4: Sort by frequency
    arsort($wordFrequency);
    
    // Step 5: Extract the top three words
    return array_slice(array_keys($wordFrequency), 0, 3);
}

$paragraph = "The quick brown fox jumps over the lazy dog. The dog was not so lazy after all.";
print_r(getTopThreeWords($paragraph));

Explanation:

  • In PHP, we use preg_replace to remove punctuation and array_count_values to count the frequency of each word.
  • We then use arsort to sort the words by their frequency in descending order.

Solution in Java

javaCopy codeimport java.util.*;

public class WordFrequency {
    public static List<String> getTopThreeWords(String text) {
        // Step 1: Normalize the text
        String normalizedText = text.toLowerCase().replaceAll("[^a-zA-Z\\s]", "");
        
        // Step 2: Split the text into words
        String[] words = normalizedText.split("\\s+");
        
        // Step 3: Count the frequency of each word
        Map<String, Integer> wordFrequency = new HashMap<>();
        for (String word : words) {
            wordFrequency.put(word, wordFrequency.getOrDefault(word, 0) + 1);
        }
        
        // Step 4: Sort by frequency
        List<Map.Entry<String, Integer>> sortedWords = new ArrayList<>(wordFrequency.entrySet());
        sortedWords.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        
        // Step 5: Extract the top three words
        List<String> topThree = new ArrayList<>();
        for (int i = 0; i < Math.min(3, sortedWords.size()); i++) {
            topThree.add(sortedWords.get(i).getKey());
        }
        
        return topThree;
    }

    public static void main(String[] args) {
        String paragraph = "The quick brown fox jumps over the lazy dog. The dog was not so lazy after all.";
        System.out.println(getTopThreeWords(paragraph));
    }
}

Explanation:

  • In Java, we use a HashMap to store word frequencies and then sort the entries by value using Comparator.

Solution in Go

goCopy codepackage main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

func getTopThreeWords(text string) []string {
	// Step 1: Normalize the text
	re := regexp.MustCompile(`[^\w\s]`)
	normalizedText := strings.ToLower(re.ReplaceAllString(text, ""))
	
	// Step 2: Split the text into words
	words := strings.Fields(normalizedText)
	
	// Step 3: Count the frequency of each word
	wordFrequency := make(map[string]int)
	for _, word := range words {
		wordFrequency[word]++
	}
	
	// Step 4: Sort by frequency
	type wordCount struct {
		word  string
		count int
	}
	var sortedWords []wordCount
	for word, count := range wordFrequency {
		sortedWords = append(sortedWords, wordCount{word, count})
	}
	sort.Slice(sortedWords, func(i, j int) bool {
		return sortedWords[i].count > sortedWords[j].count
	})
	
	// Step 5: Extract the top three words
	var topThree []string
	for i := 0; i < 3 && i < len(sortedWords); i++ {
		topThree = append(topThree, sortedWords[i].word)
	}
	
	return topThree
}

func main() {
	paragraph := "The quick brown fox jumps over the lazy dog. The dog was not so lazy after all."
	fmt.Println(getTopThreeWords(paragraph))
}

Explanation:

  • Go uses the regexp package to remove punctuation, and a map to count word occurrences.
  • The words are sorted by frequency using the sort.Slice function.

Conclusion

In this article, we tackled the problem of finding the most frequent words in a given string. We’ve provided solutions in JavaScript, Node.js, Python, PHP, Java, and Go. While the core logic remains similar—normalizing the text, counting word frequencies, and sorting by count—the syntax and methods differ across languages. Being able to solve this problem in multiple languages is an excellent skill for coding interviews and showcases your versatility as a programmer.


FAQs

  1. What data structure is best for storing word frequencies?
    • In most languages, a dictionary or hash map works well to store words and their frequencies.
  2. How do I handle punctuation in the input text?
    • You can use regular expressions to remove any non-alphanumeric characters.
  3. Can I solve this problem with a database query instead of code?
    • Yes, if the text is stored in a database, you could use SQL queries to count word occurrences.
  4. Is there a way to handle case sensitivity in the counting process?
    • Convert the entire string to lowercase before processing to ensure case insensitivity.
  5. Can I use this approach for languages other than English?
    • Yes, but be aware of different punctuation marks and word delimiters in other languages. Adjust the normalization step accordingly.

1 thought on “How to Get the Word Frequency of a Given String Paragraph and Extract the 3 Most Frequent Words: Solutions”

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top