Exploring Burstiness: Evaluating Language Dynamics in LLM-Generated Texts

Soumya Mukherjee
6 min read · Jun 22, 2023


Clustered For Coherence // generated using Nightcafe

In the realm of natural language processing and generative AI models, Perplexity has long been the go-to metric for assessing model success.

However, as we delve deeper into the intricacies of language generation, we find that Perplexity alone may fall short when it comes to capturing the authentic quality and effectiveness of these models.

Enter Burstiness, a complementary metric that addresses some of Perplexity’s limitations and offers a more comprehensive evaluation.

In this post, we will explore Burstiness, its calculation logic, use cases, limits, and ways to address them.

Understanding the Limitations of Perplexity

As discussed in the previous post, Perplexity, a widely used metric, measures the average uncertainty of predicting the next word in a sequence.

While it provides a valuable assessment of model performance, it fails to capture an essential aspect of natural language — its dynamic nature.

Perplexity treats each word prediction as equally important, disregarding the bursty nature of language, where certain words or phrases occur more frequently in specific contexts.

Introducing Burstiness

Burstiness accounts for words' distribution and occurrence patterns in a generated text.

While Perplexity measures how well an AI model forecasts the next word, Burstiness goes beyond by capturing the intricate dance of words, revealing their hidden patterns and clustering.

It adds a dynamic element to the evaluation of model success, taking it beyond static, per-token measurements.

Calculating Burstiness

Burstiness can be calculated using the following formula:

B = (λ - k) / (λ + k)

Where:

B = Burstiness

λ = Mean inter-arrival time between bursts (here, the average number of words between consecutive occurrences of a term)

k = Mean burst length (here, the number of times the term occurs)
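As a quick sketch, the formula translates directly into code. A minimal Python version (with λ written as lam, since λ is awkward as an identifier) looks like this:

```python
def burstiness(lam, k):
    """Burstiness B = (lam - k) / (lam + k), where lam is the mean
    inter-arrival time between bursts and k is the mean burst length."""
    return (lam - k) / (lam + k)

# For example, a mean gap of 16 words and a burst length of 2:
print(round(burstiness(16, 2), 3))  # 0.778
```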

Let’s take an example: “The sun is shining brightly in the clear blue sky, and birds are chirping happily looking at the sun.”

Burst 1 (for the word “sun”): The word “sun” appears twice in the sentence.

  • Burst Length: 2 (the word “sun” appears twice)
  • Inter-arrival Time: 16 (number of words between the first and second appearance of “sun”)

Burst 2 (for the word “the”): The word “the” appears three times in the sentence.

  • Burst Length: 3 (the word “the” appears three times)
  • Inter-arrival Time: 5 (number of words between the first and second appearance of “the”) and 10 ( between the second and the third appearance)

To calculate the Burstiness, as per the formula:

Burstiness (B) = (λ - k) / (λ + k)

For Burst 1 (sun):

  • Mean inter-arrival time between bursts (λ) = 16 (the average inter-arrival time between the bursts)
  • Mean burst length (k) = 2 (number of times “sun” appears)

Plugging the values into the formula: Burstiness for Burst 1 = (16 - 2) / (16 + 2) = 14 / 18 ≈ 0.778

For Burst 2 (the):

  • Mean inter-arrival time between bursts (λ) = 7.5 (the average of the two gaps: (5 + 10) / 2)
  • Mean burst length (k) = 3 (number of times “the” appears)

Plugging the values into the formula: Burstiness for Burst 2 = (7.5 - 3) / (7.5 + 3) = 4.5 / 10.5 ≈ 0.429

A Burstiness value of 0.778 for Burst 1 indicates a relatively concentrated distribution of the specific term in the generated text. It suggests that the word “sun” appears in distinct, well-separated bursts: the gap between its occurrences (16 words) is long relative to the burst length. Upon reading the sentence, this makes sense. If the model is prompted to write poems, in that context, this repetition of the word “sun” might be acceptable, whereas if it is prompted to write, say, morning news reports, a pronoun would read better (“The sun is shining brightly in the clear blue sky, and birds are chirping happily looking at it.”).

On the other hand, the lower Burstiness value of 0.429 for Burst 2 indicates a more dispersed distribution: the gaps between occurrences of “the” are short relative to how often the word appears, so its occurrences are spread out rather than concentrated into sharp bursts.

By analyzing the Burstiness values across different bursts in a generated text, we can gain insights into the patterns and concentrations of specific terms, providing a deeper understanding of the language generation performance.
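The walk-through above can be automated. Here is a minimal Python sketch following this post’s conventions (inter-arrival time = number of words between consecutive occurrences; burst length = occurrence count); the function name and the simple regex tokenization are my own choices, not part of any standard library:

```python
import re

def burstiness_per_word(text):
    """Compute B = (lam - k) / (lam + k) for every repeated word,
    where lam is the mean number of words between consecutive
    occurrences and k is the number of occurrences (burst length)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positions = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)
    scores = {}
    for tok, pos in positions.items():
        if len(pos) < 2:
            continue  # burstiness is only defined for repeated words
        gaps = [b - a - 1 for a, b in zip(pos, pos[1:])]
        lam = sum(gaps) / len(gaps)  # mean inter-arrival time
        k = len(pos)                 # burst length
        scores[tok] = (lam - k) / (lam + k)
    return scores

sentence = ("The sun is shining brightly in the clear blue sky, "
            "and birds are chirping happily looking at the sun.")
scores = burstiness_per_word(sentence)
print(round(scores["sun"], 3))  # 0.778
print(round(scores["the"], 3))  # 0.429
```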

Let’s consider two scenarios to illustrate examples of good and bad Burstiness:

Example 1: Good Burstiness

Suppose we have a language generation model that aims to create engaging and diverse product descriptions for an e-commerce platform. In this case, a good Burstiness value would indicate that the model generates bursts or clusters of specific terms related to product features, benefits, and uniqueness to create more impact through the description. For instance:

Generated Sentence: “Introducing our new smartphone — with its stunning camera, a lightning-fast processor, and all-day battery life, it’s a game-changer in the world of mobile technology.”

In this example, the Burstiness value would be considered good if terms like “stunning camera,” “lightning-fast processor,” and “all-day battery life” appear clustered together, concentrating the key selling points. This Burstiness pattern reflects the desired outcome of creating compelling product descriptions that highlight key features in a captivating manner.

Example 2: Bad Burstiness

Let’s consider a scenario where we have a language generation model responsible for generating news headlines for a news aggregator application. In this case, a bad Burstiness value would indicate a dispersed distribution of terms, lacking coherent bursts of relevant keywords. For instance:

Generated Sentence: “Sun is shining while birds chirping. Stocks rise. New movie released. Coffee shop opens.”

In this example, a low Burstiness value suggests that there is no significant clustering of terms related to news topics. The headlines read as disconnected fragments rather than a coherent set of news items.

The lack of bursty patterns could make the generated headlines seem disjointed and fail to convey the desired news-specific focus.

In this case, higher Burstiness indicating more cohesive bursts of terms related to news events would be considered preferable.

Burstiness in Action

Here are a few use cases where Burstiness can help language models shine:

  1. Creating engaging content: Burstiness helps assess the coherence and diversity of generated text. Models with higher Burstiness can produce more engaging and varied content, making them valuable assets for content creators, marketers, and social media managers.
  2. Conversations with Chatbots: Burstiness aids in evaluating the responsiveness and conversational flow of AI-powered chatbots. Higher Burstiness indicates the chatbot’s ability to adapt to context, delivering more human-like and contextually appropriate responses.
  3. Praiseworthy Storytelling: Burstiness provides insights into the storytelling capabilities of generative models. Models with optimal Burstiness can create captivating narratives with unexpected twists, enhancing the immersive experience for readers and players of interactive storytelling games.

Limitations of Burstiness and Overcoming Them

  1. Burstiness Complexity: Calculating Burstiness may involve handling large datasets and intricate data analysis. Using specialized libraries, parallel processing, or cloud-based resources can help streamline the computation process.
  2. Subjectivity in Burst Detection: Identifying bursts in the generated text can be subjective. Applying automated burst detection algorithms or leveraging human evaluators can ensure more consistent and reliable results.
  3. Contextual Understanding: Burstiness alone may not capture the complete context and semantic meaning of the generated text. Integrating Burstiness with other metrics (like fluency, relevance, diversity, semantic meaning etc.) and incorporating domain-specific knowledge can provide a more holistic evaluation of model performance.
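On point 2, one simple way to make burst detection less subjective is a fixed gap threshold: consecutive occurrences within max_gap words of each other are grouped into one burst. The sketch below is only an illustrative rule with a hand-picked threshold; production systems often use a principled method such as Kleinberg’s burst detection algorithm instead:

```python
def detect_bursts(positions, max_gap=3):
    """Group sorted token positions into bursts: consecutive
    occurrences at most max_gap words apart fall into one burst.
    Assumes a non-empty, sorted list of positions."""
    bursts = [[positions[0]]]
    for pos in positions[1:]:
        if pos - bursts[-1][-1] <= max_gap:
            bursts[-1].append(pos)   # extend the current burst
        else:
            bursts.append([pos])     # start a new burst
    return bursts

print(detect_bursts([2, 3, 5, 20, 21, 40]))  # [[2, 3, 5], [20, 21], [40]]
```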
Enriching Patterns // generated using Nightcafe

As we strive to create increasingly sophisticated language models, Burstiness emerges as a valuable complementary metric to evaluate the success of LLMs and generative AI models.

By considering the bursty nature of language and assessing the distribution and occurrence patterns of words, Burstiness enriches our understanding of model performance.

With wide-ranging use cases, Burstiness provides a more comprehensive assessment of generative AI models. By acknowledging its limitations and leveraging advanced techniques, Burstiness can be a powerful tool in harnessing the true potential of AI-powered language generation.
