Chapter 05: Learning Objectives

  • Identify key model-level design choices that affect explainability, fairness, and output reliability.
  • Analyze how model reasoning, explanation tools, and fairness metrics can create the appearance of trustworthiness while masking underlying failures.
  • Evaluate the strengths and limitations of widely used interpretability methods (e.g., SHAP, LIME, token-level attribution) in revealing true model logic (a minimal attribution sketch follows this list).
  • Apply concepts of subgroup fairness auditing and bias detection to assess model impact across different user groups (see the auditing sketch after this list).
  • Propose strategies for preventing misleading or harmful outputs, including hallucination control, prompt injection defense, and confidence signaling.
  • Relate model-level trust challenges to broader trustworthy AI principles, preparing for advanced topics such as dynamic monitoring and adversarial robustness.
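
To ground the interpretability objective above, here is a minimal sketch of an occlusion-style attribution method, a crude cousin of SHAP and LIME. Everything in it (the toy model, the occlusion_attribution helper, the zero baseline) is an illustrative assumption, not the chapter's reference implementation. The point it demonstrates: a local attribution can assign zero importance to a feature the model genuinely depends on, which is exactly the gap between apparent and true model logic that this chapter examines.

```python
import numpy as np

# Toy "model" standing in for any black box. Illustrative only.
# Feature 0 matters only through its interaction with feature 1.
def model(x: np.ndarray) -> float:
    return float(x[0] * x[1] + 0.5 * x[2])

def occlusion_attribution(f, x: np.ndarray, baseline: float = 0.0) -> np.ndarray:
    """Score each feature by the output drop when it is replaced by a baseline.

    Fast and intuitive, but blind to interactions at this input point --
    the kind of masking the chapter asks you to analyze.
    """
    base_score = f(x)
    scores = np.empty(len(x))
    for i in range(len(x)):
        occluded = x.copy()
        occluded[i] = baseline  # replace one feature with the baseline value
        scores[i] = base_score - f(occluded)
    return scores

x = np.array([1.0, 0.0, 2.0])
# Prints [0. 0. 1.]: feature 0 scores 0.0 at this input even though
# the model uses it globally -- local attribution is not global logic.
print(occlusion_attribution(model, x))
```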
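Similarly, for the subgroup fairness objective, the sketch below computes two common audit quantities on hypothetical data: per-group selection rate (the basis of demographic parity) and per-group true-positive rate (the basis of equalized opportunity). The arrays, group labels, and helper name are made up for illustration; a real audit would run on held-out evaluation data.

```python
import numpy as np

# Hypothetical audit data: binary predictions, true labels, and a
# group identifier per example. Values are illustrative assumptions.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_true = np.array([1, 0, 0, 1, 0, 1, 1, 0])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

def subgroup_rates(y_pred, y_true, group):
    """Per-group selection rate and true-positive rate (recall)."""
    rates = {}
    for g in np.unique(group):
        mask = group == g
        selection_rate = y_pred[mask].mean()
        positives = mask & (y_true == 1)
        tpr = y_pred[positives].mean() if positives.any() else float("nan")
        rates[g] = {"selection_rate": selection_rate, "tpr": tpr}
    return rates

rates = subgroup_rates(y_pred, y_true, group)
# Demographic parity gap: max difference in selection rates across groups.
sel = [r["selection_rate"] for r in rates.values()]
print(rates, "parity gap:", max(sel) - min(sel))
```

A low parity gap alone does not certify fairness: as the second objective warns, an aggregate metric can look healthy while masking large error-rate disparities, which is why the sketch reports per-group TPR alongside selection rates.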