BiasBeware · Subtask B

Defense Against Attack

Can a system maintain fair recommendations under manipulated descriptions? Subtask B evaluates whether recommendation behavior remains robust when one or more product descriptions are attacked with cognitive-bias cues.

Subtask B focuses on protecting recommendation behavior from bias-driven ranking distortions.

Goal

Maintain fair recommendations under manipulated descriptions

In Subtask B, participants are given recommendation settings with competing products where one or more descriptions may have been attacked using cognitive-bias cues. The recommender to be used is Qwen3-0.6B, and the original pre-attack ranking is provided. The objective is to reduce unfair rank shifts caused by manipulated language.

What a successful system should do

Detect or neutralize the effect of biased product descriptions
Preserve the integrity of the original recommendation ordering
Remain robust across recommendation settings and attack styles
Generalize beyond a single prompt-specific defense trick

Why this matters

Cognitive-bias attacks can unfairly boost or suppress products without changing their underlying technical content. Subtask B measures whether a system can resist such distortions and preserve fair recommendation behavior.

Evaluation

Rank restoration under attack

We evaluate defense through rank restoration using the sum of squared rank displacement.

Primary metric

sum(Δ²)

Lower is better. Values closer to 0 indicate rankings closer to the original unbiased ordering.

Interpretation

0 → best

Larger deviations are penalized more strongly than with simple absolute displacement.

Primary metric: \(\sum_{i=1}^{n}\left(r_{\mathrm{before}}(i)-r_{\mathrm{after}}(i)\right)^2\)

Additional descriptive metrics

avg|Δ|: average absolute rank displacement
Spearman correlation between defended and original rankings
Kendall tau correlation
Kendall distance

These complementary metrics help distinguish between small local shifts and more global ranking disruption.

Pilot observations

Simple prompt-based defense is not enough

Preliminary experiments with Qwen3-0.6B show that a lightweight prompt-based defense is not sufficient to reliably preserve the original ranking under cognitive-bias attacks.

Overall pilot averages

avg|Δ|

2.5755

The average product still moves by about 2.6 ranking positions.

sum(Δ²)

68.2583

Substantial ranking distortion remains after defense.

Spearman

0.0081

Very weak global ranking agreement.

Kendall tau

0.0081

Pairwise order recovery remains near zero.

Kendall distance

0.4959

The defended rankings remain far from preserving the original pairwise order.

What these numbers mean

Even after defense, rankings remain substantially distorted. The average product still shifts by about 2.6 positions, while both Spearman and Kendall tau remain near zero, indicating that the original recommendation structure is only weakly recovered.

Across attack families

Some families are slightly easier, but none are well controlled

At the attack-family level, the defense performs similarly poorly overall, though some differences appear in the pilot.

Directory-level results

Scarcity — hardest by the primary metric: sum(Δ²) = 69.5125
Discount framing — slightly less disruptive: sum(Δ²) = 67.5062
Exclusivity — very close in aggregate: sum(Δ²) = 67.6424
Social proof — comparatively easiest for the current defense: sum(Δ²) = 68.3375, Spearman = 0.0514, Kendall tau = 0.0404, Kendall distance = 0.4798

Interpretation

The overall picture is consistent: the current defense does not strongly recover the original ranking for any attack family. Social proof is comparatively the easiest case, while scarcity is the most difficult by the primary metric.

Variation across attacks

Some attack instances are much more disruptive than others

Within each family, individual attacks vary substantially in how recoverable they are.

Examples of difficult cases

Discount framing / Attack 0: sum(Δ²) = 86.00, Spearman = -0.1644, Kendall tau = -0.1111
Scarcity / Attack 6: sum(Δ²) = 78.35
Social proof / Attack 4: sum(Δ²) = 81.25

Examples of milder / more recoverable cases

Exclusivity / Attack 2: sum(Δ²) = 50.65, one of the mildest cases in the pilot
Social proof / Attack 0: sum(Δ²) = 56.45, Spearman = 0.3290, Kendall tau = 0.2603, the most recoverable case under the current defense

Takeaway

This variability suggests that some biased interventions are easier to counteract than others, even within the same attack family. Robust defense therefore requires more than generic instruction-based prompting.

Pilot takeaway

Robust defense under biased language remains an open challenge

The pilot confirms that Subtask B is non-trivial and meaningful. A simple defense prompt does not reliably restore fair recommendation behavior: products still move substantially, and overall ranking agreement remains weak.

What this shows

The task is not solved by generic instruction-based prompting
Robust defense requires more than superficial prompt hardening
Future systems should be evaluated on whether they truly preserve the original ranking structure

Why it matters

Overall, the pilot supports Subtask B as a challenging benchmark for robust recommendation under cognitively manipulated language.

Defense Against Attack

Maintain fair recommendations under manipulated descriptions

What a successful system should do

Why this matters

Rank restoration under attack

Additional descriptive metrics

Simple prompt-based defense is not enough

Overall pilot averages

What these numbers mean

Some families are slightly easier, but none are well controlled

Directory-level results

Interpretation

Some attack instances are much more disruptive than others

Examples of difficult cases

Examples of milder / more recoverable cases

Takeaway

Robust defense under biased language remains an open challenge

What this shows

Why it matters

Back to BiasBeware