BiasBeware · Subtask A

Causal Attribution

Can a system identify which biased product description is causally connected to a recommendation outcome? Subtask A asks participants to reason about downstream effects and decide whether Description A, Description B, or neither best explains the observed shift.

[Figure] Subtask A evaluates whether a model can recover the manipulated input most responsible for the downstream recommendation effect.
Goal

Identify the manipulated description behind the outcome

Participants are given pairs of attacked product descriptions, denoted Description A and Description B. The task is to determine whether the observed recommendation effect is better attributed to Description A, to Description B, or cannot be confidently attributed to either (Uncertain). Here, the recommendation effect is the change in a product’s ranking position before and after the attack. Each attacked product therefore carries one of three movement labels: Up, Down, or Same, indicating whether the product moved higher, moved lower, or stayed at the same position after the cognitive-bias manipulation.

Prediction space

  • A — description A is the better causal explanation
  • B — description B is the better causal explanation
  • Uncertain — the effect cannot be confidently attributed to one side

Why it is difficult

This is not a bias-detection task. Systems must reason about which manipulated input is most likely connected to the downstream recommendation change, making the problem substantially harder than simply recognizing persuasive language.

Pilot construction

How the pilot data is built

The pilot begins from a clean control ranking and several attack-specific recommendation settings. Using ChatGPT 5.4 as the recommender, attacked products are compared against their positions in the control ranking and assigned one movement label: Up, Down, or Same.
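
Movement labels can be derived mechanically from the two rankings. A minimal sketch, assuming each product's position is known as an integer rank in both the control and the attacked ranking (the function name and signature are illustrative, not the official pipeline):

```python
# Minimal sketch of movement-label assignment. The function name and
# integer-rank representation are assumptions, not the official pipeline.

def movement_label(control_rank: int, attacked_rank: int) -> str:
    """Compare a product's position before and after the attack.

    Smaller rank numbers mean higher positions in the recommendation list.
    """
    if attacked_rank < control_rank:
        return "Up"    # the product moved higher after the manipulation
    if attacked_rank > control_rank:
        return "Down"  # the product moved lower
    return "Same"      # the position is unchanged

print(movement_label(5, 2))  # Up
print(movement_label(3, 3))  # Same
```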

Per attacked product

  • attacked product description
  • cognitive bias type
  • movement label: Up, Down, or Same

Final pilot instance

  • description_A
  • description_B
  • final_label

If the movement labels differ, one side is randomly selected as the gold causal source. If the movement labels are the same, the gold label is Uncertain.
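
The labeling rule above fits in a few lines. A sketch, where the helper name and seeded RNG are assumptions for illustration:

```python
import random

def gold_label(move_a: str, move_b: str, rng: random.Random) -> str:
    """Gold-label rule for a Description A / Description B pair.

    Differing movement labels: one side is drawn at random as the gold
    causal source. Identical movement labels: the pair is Uncertain.
    """
    if move_a == move_b:
        return "Uncertain"
    return rng.choice(["A", "B"])

rng = random.Random(0)  # seeded for reproducibility
print(gold_label("Up", "Up", rng))    # Uncertain
print(gold_label("Up", "Down", rng))  # either A or B
```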

Pilot setup

100 paired examples

For the pilot study, we sampled 100 paired examples.

Model roles

  • ChatGPT 5.4 — recommender
  • Gemini 3 — causal-attribution model

Expected output

Gemini 3 receives a pair of attacked descriptions and must output exactly one label: A, B, or Uncertain.
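
Because the model must emit exactly one of three strings, a small normalization step is useful when scoring its raw responses. A hedged sketch; the accepted spellings and the Uncertain fallback are assumptions, not part of the official protocol:

```python
def parse_prediction(raw: str) -> str:
    """Map a raw model response onto exactly one of the three labels.

    Responses that do not match any accepted spelling fall back to
    Uncertain, a conservative choice made for this sketch.
    """
    text = raw.strip().lower()
    if text in {"a", "description a"}:
        return "A"
    if text in {"b", "description b"}:
        return "B"
    return "Uncertain"

print(parse_prediction(" A "))            # A
print(parse_prediction("Description B"))  # B
print(parse_prediction("hard to say"))    # Uncertain
```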

Evaluation

Multi-class causal classification

Predictions are evaluated against the gold labels using standard multi-class classification metrics.

Metrics

  • Primary metric: Macro-F1
  • Additional metrics: Accuracy, Balanced Accuracy, Per-class Precision / Recall / F1, Confusion Matrix
  • Also reported: one-vs-rest ROC/AUC values for completeness
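
As a concrete illustration of the primary metric, here is a dependency-free Macro-F1 over the three labels (the toy gold/pred lists are invented for demonstration only):

```python
LABELS = ["A", "B", "Uncertain"]

def macro_f1(gold, pred):
    """Unweighted mean of the per-label F1 scores."""
    f1_scores = []
    for label in LABELS:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(LABELS)

# Toy example (invented data, four instances):
gold = ["A", "B", "Uncertain", "A"]
pred = ["A", "B", "B", "B"]
print(round(macro_f1(gold, pred), 3))  # 0.389
```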

Pilot results

Headline metrics

  • Macro-F1: 0.153 (primary metric on the 100-example pilot set)
  • Accuracy: 0.160 (overall exact-match classification accuracy)
  • Balanced Accuracy: 0.203 (accounts for uneven class behavior across labels)

Per-class performance

Class A

  • Precision: 0.167
  • Recall: 0.316
  • F1: 0.218

Class B

  • Precision: 0.167
  • Recall: 0.273
  • F1: 0.207

Class Uncertain

  • Precision: 0.100
  • Recall: 0.021
  • F1: 0.034

Confusion matrix

                 pred_A  pred_B  pred_Uncertain
true_A                6      13               0
true_B               15       9               9
true_Uncertain       15      32               1
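
The per-class numbers above follow directly from this matrix. A short check that recomputes them (the dictionary layout is just one convenient encoding):

```python
# Rows are true labels, columns are predictions, copied from the pilot.
matrix = {
    "A":         {"A": 6,  "B": 13, "Uncertain": 0},
    "B":         {"A": 15, "B": 9,  "Uncertain": 9},
    "Uncertain": {"A": 15, "B": 32, "Uncertain": 1},
}
labels = list(matrix)

for label in labels:
    tp = matrix[label][label]
    fn = sum(matrix[label].values()) - tp
    fp = sum(matrix[other][label] for other in labels) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{label}: P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
# Matches the per-class numbers reported above.
```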

One-vs-rest ROC AUC

  • A: 0.473
  • B: 0.301
  • Uncertain: 0.424
  • Macro OVR AUC: 0.399
  • Weighted OVR AUC: 0.392
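
A hard-label classifier exposes no probabilities, so one plausible reading of these one-vs-rest AUCs is that the predicted label is treated as a 0/1 score per class; under that assumption, recomputing from the confusion matrix above recovers the reported per-class values. A dependency-free sketch (the helper name is illustrative, not the official scoring code):

```python
def ovr_auc(gold, pred, label):
    """One-vs-rest AUC when only hard labels are available.

    The predicted label is treated as a 0/1 score; tied scores
    contribute 0.5, following the rank-sum definition of AUC.
    """
    pos = [float(p == label) for g, p in zip(gold, pred) if g == label]
    neg = [float(p == label) for g, p in zip(gold, pred) if g != label]
    wins = sum(s > t for s in pos for t in neg)
    ties = sum(s == t for s in pos for t in neg)
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Expand the pilot confusion matrix back into label lists.
matrix = {
    "A":         {"A": 6,  "B": 13, "Uncertain": 0},
    "B":         {"A": 15, "B": 9,  "Uncertain": 9},
    "Uncertain": {"A": 15, "B": 32, "Uncertain": 1},
}
gold, pred = [], []
for true, row in matrix.items():
    for predicted, count in row.items():
        gold += [true] * count
        pred += [predicted] * count

for label in ("A", "B", "Uncertain"):
    print(f"{label}: {ovr_auc(gold, pred, label):.3f}")
# A: 0.473, B: 0.301, Uncertain: 0.424, as reported.
```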
Interpretation

Why these results matter

The pilot shows that causal attribution is substantially harder than bias identification. Humans can easily recognize explicit bias cues, but deciding which manipulated product is responsible for a downstream ranking effect is a much more demanding reasoning problem.

Key observations

  • Gemini 3 performs well below any useful reliability threshold, and even below a uniform random baseline (about 0.33 accuracy over three classes)
  • It does somewhat better on A and B than on Uncertain
  • The model tends to over-commit instead of abstaining in ambiguous cases
  • Classwise analysis is essential; overall accuracy alone is misleading

Pilot takeaway

The pilot confirms that Subtask A is non-trivial and meaningful. It is not solved by superficial pattern matching and supports BiasBeware as a benchmark for causal reasoning under biased language conditions.