BiasBeware · Subtask C

Sanitization of Attacked Product Descriptions

Can a system remove manipulation without removing meaning? Subtask C asks participants to rewrite product descriptions that have been attacked with cognitive-bias cues so that the manipulative signal is removed while the factual product content stays intact.

[Figure: Illustration for Subtask C sanitization. The sanitization task focuses on restoring neutrality while preserving product facts.]
Goal

Remove manipulation, preserve meaning

In Subtask C, participants receive product descriptions that have been subtly attacked with cognitive-bias cues and must rewrite them so that the manipulative signal disappears while the factual product content stays intact.

What a successful system should do

  • Reduce the manipulative signal in the attacked description
  • Preserve factual product attributes and identity
  • Remain faithful to the original product content
  • Avoid excessive rewriting and unnecessary drift

Why this matters

Debiasing product descriptions is not only about removing persuasive phrases. A good system must keep what the product is while removing how it is being nudged, making Subtask C a benchmark for controlled debiasing and faithful rewriting.

Evaluation

Restoration toward the clean reference

We evaluate sanitization both at the distribution level and at the instance level, measuring whether rewritten descriptions return toward the clean reference corpus and remain close to their original clean counterparts. Ultimately, how much can we trust that debiasing has been achieved?

Primary metric: KL divergence

Does the rewritten (debiased) text move back toward the original neutral description distribution?

Metric: KL(initial || sanitized)
Lower values mean the sanitized set is lexically closer to the clean reference.

Interpretation: \(D_{\mathrm{KL}} = 0\) means the two distributions are identical; larger values indicate greater lexical divergence from the clean reference.

KL is the primary metric because Subtask C is fundamentally a restoration task: the goal is not only to produce plausible neutral text, but to move the rewritten corpus back toward the original unbiased distribution.
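The page does not specify the tokenizer or the estimator used for the corpus distributions, so the following is only a rough illustration: it computes KL(P || Q) over add-one-smoothed unigram distributions of two small corpora. The function names, whitespace tokenization, and smoothing are assumptions for the sketch, not the official scorer.

```python
from collections import Counter
import math

def unigram_dist(texts, vocab):
    """Add-one-smoothed unigram distribution over a fixed shared vocabulary."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts[w] + 1 for w in vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def kl_divergence(p_texts, q_texts):
    """KL(P || Q) between unigram distributions of two corpora (in nats)."""
    vocab = set(tok for t in p_texts + q_texts for tok in t.lower().split())
    p = unigram_dist(p_texts, vocab)
    q = unigram_dist(q_texts, vocab)
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)

clean = ["a durable stainless steel water bottle"]
sanitized = ["a durable stainless steel bottle for travel"]
print(kl_divergence(clean, clean))              # 0.0 for identical corpora
print(kl_divergence(clean, sanitized) > 0.0)    # True: distributions differ
```

Smoothing over a shared vocabulary keeps every log ratio finite; without it, a word present in one corpus but absent from the other would make the divergence undefined.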

Additional: pairwise normalized Levenshtein similarity

Does the rewritten text remain close to the original description at the individual example level?

Metric: average pairwise normalized Levenshtein similarity (LevSim)
Range: 0–1. Higher values mean fewer edits are needed to recover the clean description; 1.0 means identical strings, and values closer to 0 indicate more extensive rewriting.

Levenshtein similarity complements KL because the two metrics capture different aspects of restoration: KL evaluates whether the set of rewritten descriptions returns toward the clean distribution, while Levenshtein evaluates whether each individual description remains close to its clean target.
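The exact normalization used by the organizers is not stated; a common convention, sketched below, divides the edit distance by the length of the longer string and subtracts from 1. The helper names are assumptions for illustration.

```python
def levenshtein(a, b):
    """Classic edit distance (insert/delete/substitute) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def lev_sim(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

print(lev_sim("compact phone stand", "compact phone stand"))  # 1.0
```

Under this convention, a rewrite that changes one character of a four-character description scores 0.75, so the metric directly rewards minimal editing.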

Why both metrics matter

A system could recover a good corpus-level distribution while still rewriting many examples too aggressively. Conversely, it could preserve local wording while failing to restore the broader neutral distribution. Reporting both metrics gives a more complete picture of sanitization quality.

Pilot baseline

Initial zero-shot debiasing result

As an initial baseline, we debiased the sample data with Gemini 3 in a zero-shot setting.

Prompt

Baseline prompt: "Debias the following product descriptions"

Results

  • KL(initial || Gemini debias) = 0.197: the debiased descriptions move reasonably close to the original clean distribution.
  • LevSim = 0.304: the rewritten outputs remain only moderately close in surface form to the clean references.

How to read these numbers

These values are encouraging, but they also reveal the difficulty of the task. KL = 0.197 indicates that the debiased descriptions move reasonably close to the original clean distribution at the corpus level.

Average pairwise normalized Levenshtein similarity = 0.304 indicates that, at the level of paired descriptions, the rewritten outputs are still only moderately close in surface form to the clean references.

Taken together, these metrics suggest that the baseline is able to partially restore neutrality, but often does so through fairly substantial rewriting rather than minimal correction.

In simple terms: the baseline appears to recover the overall clean style better than it recovers the original clean wording of each individual description.

Takeaway

Controlled debiasing, faithful rewriting, and text restoration

Subtask C is about more than rewriting text. It is about restoring neutrality without destroying content.

What participants must learn

  • Identify and remove subtle manipulative cues
  • Preserve product facts and product identity
  • Restore the description distribution toward the neutral baseline
  • Make only the edits that are necessary, avoiding gratuitous rewriting

Why it is important

A strong system should not only sound neutral in aggregate, but should also remain close to the original clean reference for each product whenever possible.

Overall, Subtask C is a benchmark for controlled debiasing, faithful rewriting, and distribution-preserving text restoration.