5.3. Fairness Isn’t Always Fair

In the last section, we explored how explainability failures can mask flawed logic beneath fluent narratives. But even when reasoning is made visible, AI systems can fail in deeper, quieter ways: they can apply that reasoning in ways that amplify bias or erase context. This brings us to fairness, a property as complex as the societies AI serves.

AI systems often claim fairness. They pass audits, meet parity metrics, and publish equality reports. But in deployment, these same systems can still cause harm, especially when fairness is defined statistically, not socially.

In this section, we confront a difficult truth: technical fairness doesn’t always protect real people. And when AI designers forget to ask “fair to whom, in what context, and with what consequences?”, trustworthy outcomes break down.

⚠️ When Fairness Creates New Harm

Case Study 018: 2022 Apple Watch Heart Rate Monitor Bias (Location: Global | Theme: Fairness and Bias in Health AI)

🧾 Overview
In 2022, researchers found that the Apple Watch’s heart rate monitor was less accurate on darker skin tones. The device’s optical sensors, combined with non-diverse training data, led to reduced performance for users with higher melanin levels.

🚧 Challenges
The system failed to account for variations in skin tone during design and testing. No alerts were provided when readings were unreliable.

💥 Impact
Millions of users received less accurate health data, raising concerns about fairness and health equity in wearable technology.

🛠️ Action
The study prompted calls for more inclusive testing and transparency in wearable health technology performance.

🎯 Results
The case highlighted how a lack of diversity in design can embed bias in health AI, and it underscored the need for subgroup-sensitive evaluation.

In 2022, researchers found that the Apple Watch’s heart rate monitor consistently underperformed on darker skin tones1. The failure was not malicious: the sensors were "colorblind," and the algorithm was "calibrated." But the training data lacked diversity in melanin levels. The result? Millions of users experienced silently degraded accuracy, and the system never flagged a problem.
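To make subgroup-sensitive evaluation concrete, here is a minimal sketch assuming a hypothetical dataset of paired device and reference readings. The column names, the Fitzpatrick-style skin-tone groups, and the numbers are invented for illustration; this is not Apple's actual evaluation pipeline.

```python
# Illustrative sketch only -- not any vendor's real pipeline. Column names and
# the Fitzpatrick-style grouping are assumptions made for this example.
import pandas as pd

def subgroup_error_report(df: pd.DataFrame,
                          pred_col: str = "device_bpm",
                          true_col: str = "ecg_bpm",
                          group_col: str = "skin_tone_group") -> pd.DataFrame:
    """Mean absolute error per subgroup, plus the gap to the best-served group."""
    df = df.copy()
    df["abs_error"] = (df[pred_col] - df[true_col]).abs()
    report = df.groupby(group_col)["abs_error"].agg(["mean", "count"])
    report["gap_vs_best"] = report["mean"] - report["mean"].min()
    return report.rename(columns={"mean": "mae_bpm", "count": "n_readings"})

# Hypothetical paired readings: device output vs. an ECG reference.
readings = pd.DataFrame({
    "device_bpm":      [72, 75, 90, 88, 110, 104],
    "ecg_bpm":         [71, 74, 95, 96, 112, 115],
    "skin_tone_group": ["I-II", "I-II", "V-VI", "V-VI", "I-II", "V-VI"],
})
print(subgroup_error_report(readings))
```

Reporting the per-group gap alongside the aggregate score is what turns a single "calibrated" number into a signal someone can act on.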

This is just one in a series of cases where models technically satisfied fairness metrics, but socially failed:

  • Credit scoring models that rate women as less creditworthy, even with identical financial histories
  • Hiring systems that filter out “non-traditional” educational backgrounds
  • Language models that associate ethnic names with crime, poverty, or aggression2

These aren't edge cases. They are structural failures of context, rooted in how training data encodes social bias and in how fairness metrics often mask that reality.

📊 Key Insight:
A model can treat every group the same and still amplify historical injustice.
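A small illustration of that insight, using invented numbers: the two groups below receive identical approval rates, so a demographic-parity audit passes, yet qualified applicants in one group are rejected far more often.

```python
# Invented numbers only: both groups are approved at the same rate, so a
# demographic-parity check passes, yet the errors fall unevenly across groups.
import numpy as np

y_true = {  # 1 = applicant would in fact repay
    "group_a": np.array([1, 1, 1, 1, 0, 0, 0, 0]),
    "group_b": np.array([1, 1, 1, 1, 1, 1, 0, 0]),
}
y_pred = {  # 1 = model approves the application
    "group_a": np.array([1, 1, 1, 1, 0, 0, 0, 0]),
    "group_b": np.array([1, 1, 1, 1, 0, 0, 0, 0]),
}

for g in y_true:
    approval_rate = y_pred[g].mean()          # what the parity audit sees
    qualified = y_true[g] == 1
    fn_rate = (qualified & (y_pred[g] == 0)).sum() / qualified.sum()
    print(f"{g}: approval_rate={approval_rate:.2f}, false_negative_rate={fn_rate:.2f}")

# Both groups show approval_rate=0.50, but group_b's qualified applicants are
# rejected at 0.33 vs. 0.00 -- the parity metric alone never surfaces this.
```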

Under the EU AI Act, developers of high-risk systems are obligated to assess and mitigate “systematic bias” across demographic groups. Similarly, Korea’s AI Basic Act (2024) mandates transparency and non-discrimination when AI is used in healthcare, employment, finance, and legal decision-making.

But compliance doesn’t equal justice. Law sets minimums; fairness demands deeper reflection:

  • Are marginalized voices accounted for?
  • Can users detect and contest unfair treatment?
  • Do performance metrics mask harm to underrepresented groups?

This chapter treats fairness as both a technical property and a social contract, one that must evolve with usage, context, and the communities it affects.

🧭 Where We Go from Here

In the next two sections, we explore how fairness can fail even with good intentions:

  • How systems can include data from underrepresented groups, yet lose the cultural or social context that gives it meaning.
  • How silent calibration errors in risk prediction can lead to harmful bias, especially in critical areas like healthcare and insurance.

Each section includes not just the failures, but the tools to intervene: counterfactual fairness, subgroup calibration, concept-aware audits, and post-deployment feedback loops that don’t just correct harm, but prevent it from being invisible.
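As a preview of one of those tools, here is a hedged sketch of a subgroup calibration check. The bin count, group labels, and simulated risk scores are assumptions made for the example, not data from any real system.

```python
# Sketch of a subgroup calibration check: bin predicted risk, compare it with
# observed outcomes per bin, and report the gap separately for each group.
# Bin count, group labels, and the simulated data are assumptions.
import numpy as np

def expected_calibration_error(probs: np.ndarray, outcomes: np.ndarray,
                               n_bins: int = 5) -> float:
    """Weighted average gap between predicted risk and observed frequency."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if in_bin.any():
            ece += in_bin.mean() * abs(probs[in_bin].mean() - outcomes[in_bin].mean())
    return ece

# Hypothetical scores: the model is well calibrated for group_a but
# systematically overestimates risk for group_b by about 0.15.
rng = np.random.default_rng(0)
for group, overestimate in [("group_a", 0.00), ("group_b", 0.15)]:
    predicted = rng.uniform(0.05, 0.95, size=2000)
    observed = rng.binomial(1, np.clip(predicted - overestimate, 0.0, 1.0))
    print(group, round(expected_calibration_error(predicted, observed), 3))
```

Running the same check per group, rather than once over the whole population, is what keeps a subgroup-level calibration error from hiding inside a healthy-looking aggregate.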

True fairness requires more than symmetry: it requires situational intelligence.

Bibliography


  1. Bent, B., Goldstein, B. A., Kibbe, W. A., & Dunn, J. P. (2020). Investigating sources of inaccuracy in wearable optical heart rate sensors. NPJ Digital Medicine, 3(1), 18.

  2. Binns, R., et al. (2018). 'It's Reducing a Human Being to a Percentage': Perceptions of Justice in Algorithmic Decisions. CHI 2018. https://doi.org/10.1145/3173574.3173951