Wrap Up

Points to remember
  • AI model trustworthiness is built through layered safeguards—design logic, faithful explanation, fairness integrity, and output reliability must work together.

  • Models that cannot reveal their reasoning at the design stage create hidden failure points; explainability must be part of architecture, not just documentation.

  • Over-delegation without contestability erodes human authority; systems like the Apple Card and Uber’s self-driving car showed the cost of oversight gaps.

  • Interpretation tools (e.g., SHAP, LIME, saliency maps) often provide comfort rather than truth, confirming user expectations instead of exposing the model’s causal logic.

  • Plausible explanations can mask serious flaws; faithfulness requires tools like token-level attribution, causal tracing, and circuit diagnostics (a minimal faithfulness probe is sketched at the end of this list).

  • Fairness audits focused on global metrics can miss harm hidden in subgroups; parity metrics alone are not sufficient (see the subgroup audit sketch at the end of this list).

  • Bias often arises not from intent but from feature proxies (e.g., ZIP code, or healthcare cost standing in for health need), untested thresholds, or context-free inclusion.

  • Models can pass fairness audits and still amplify structural inequalities—as seen in risk scoring tools and representational bias cases like Twitter cropping.

  • Output integrity is the last safeguard: confident errors, hallucinations, and prompt injection can mislead users even when models are well designed.

  • Trustworthy models integrate safeguards at every layer:

    • Confidence overlays and uncertainty-aware architectures
    • Causal path diagnostics, token-level attribution, circuit tracing
    • Subgroup calibration curves, bias amplification tests, counterfactual probes (see the counterfactual probe sketch at the end of this list)
    • Hallucination control layers, prompt injection defenses, output traceability

  • Legal and standards anchors point the same way: ISO/IEC 23894 (AI risk management), ISO/IEC TR 24028 (trustworthiness in AI), and ISO/IEC TR 24027 (bias in AI) provide guidance for embedding explainability, fairness, and integrity in model design, and the EU AI Act makes this a binding requirement for high-risk AI.

  • Case studies in this chapter illustrate how model-level gaps translate into real-world harm:

    • Apple Card: Delegation without human contestability
    • Uber self-driving car: Oversight without functional authority
    • Twitter cropping: Inclusion without understanding
    • DoNotPay / Air Canada chatbot: Plausibility without truth
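
A minimal faithfulness probe, referenced from the explanation points above, can look like the sketch below. It assumes scikit-learn and purely synthetic data; permutation importance stands in for SHAP or LIME, and the mean-masking step is one illustrative way to test whether highly attributed features actually move the model's predictions.

```python
# Faithfulness probe: do the top-attributed features actually drive predictions?
# Synthetic data and model; every choice here is illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
y = (X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=1000) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Step 1: attribution (permutation importance as a stand-in for SHAP/LIME).
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranked = np.argsort(result.importances_mean)[::-1]

# Step 2: faithfulness check. Mask each feature (replace it with its mean)
# and measure how far the predicted probabilities actually move.
baseline = model.predict_proba(X)[:, 1]
for feat in ranked:
    X_masked = X.copy()
    X_masked[:, feat] = X[:, feat].mean()
    shift = np.abs(model.predict_proba(X_masked)[:, 1] - baseline).mean()
    print(f"feature {feat}: importance={result.importances_mean[feat]:.3f}, "
          f"mean prediction shift when masked={shift:.3f}")

# A faithful ranking should roughly track the prediction shifts; large
# disagreement signals an explanation that is plausible but not causal.
```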
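
The subgroup-audit point lends itself to a similarly small sketch. The group split, score shift, and decision threshold below are arbitrary synthetic choices, made only to show how a small subgroup's much higher false-positive rate can be diluted into an unremarkable global figure.

```python
# Subgroup audit: a global metric can look acceptable while one subgroup
# bears most of the errors. Entirely synthetic scores and labels.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
group = rng.choice(["A", "B"], size=n, p=[0.9, 0.1])  # B is a small subgroup
y_true = rng.binomial(1, 0.3, size=n)

# Hypothetical model scores, systematically inflated for group B negatives.
scores = rng.normal(loc=1.2 * y_true, scale=1.0)
scores[(group == "B") & (y_true == 0)] += 1.5
y_pred = (scores > 0.6).astype(int)

def false_positive_rate(y_t, y_p):
    negatives = y_t == 0
    return (y_p[negatives] == 1).mean()

print("global FPR:", round(false_positive_rate(y_true, y_pred), 3))
for g in ["A", "B"]:
    mask = group == g
    print(f"group {g} FPR:",
          round(false_positive_rate(y_true[mask], y_pred[mask]), 3))

# Because group B is only 10% of the data, its much higher false-positive
# rate is diluted in the global number; this is exactly the failure mode a
# subgroup-level audit is meant to surface.
```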
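
Finally, a counterfactual probe of the kind listed among the safeguards can be sketched as follows. The binary protected attribute, the income proxy, and the biased synthetic labels are all hypothetical and chosen so that the probe has something to find; a real probe would also have to account for indirect proxy effects, which this sketch does not.

```python
# Counterfactual probe: flip a sensitive attribute, hold everything else
# fixed, and measure how far predictions move. Synthetic, illustrative data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5_000
sensitive = rng.integers(0, 2, size=n)            # hypothetical protected attribute
income = rng.normal(50 + 10 * sensitive, 15, n)   # correlated proxy feature
X = np.column_stack([sensitive, income])

# Synthetic historical labels deliberately encode a direct dependence
# on the attribute, so the trained model inherits that bias.
y = (income + 8 * sensitive + rng.normal(0, 10, n) > 60).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Counterfactual inputs: the same individuals with the attribute flipped.
X_cf = X.copy()
X_cf[:, 0] = 1 - X_cf[:, 0]

delta = model.predict_proba(X_cf)[:, 1] - model.predict_proba(X)[:, 1]
print("mean shift in predicted probability when the attribute is flipped:",
      round(np.abs(delta).mean(), 3))
print("share of individuals whose decision flips:",
      round((model.predict(X_cf) != model.predict(X)).mean(), 3))

# A non-trivial shift means the model uses the protected attribute directly;
# indirect proxy effects (here, income) need separate tests.
```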