Identify key model-level design choices that affect explainability, fairness, and output reliability.
Analyze how model reasoning, explanation tools, and fairness metrics can create the appearance of trustworthiness while masking underlying failures.
Evaluate the strengths and limitations of widely used interpretability methods (e.g., SHAP, LIME, token-level attribution) in revealing a model's actual decision logic (illustrated by the SHAP sketch after this list).
Apply concepts of subgroup fairness auditing and bias detection to assess a model's impact across different user groups (see the per-group audit sketch below).
Propose strategies for preventing misleading or harmful outputs, including hallucination control, prompt injection defense, and confidence signaling (see the confidence-thresholding sketch below).
Relate model-level trust challenges to broader trustworthy AI principles, preparing for advanced topics such as dynamic monitoring and adversarial robustness.
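To make the interpretability objective concrete, here is a minimal sketch of the kind of attribution analysis SHAP performs, assuming a tabular classifier trained on synthetic data and the open-source `shap` and `scikit-learn` packages (neither library nor dataset is prescribed by the objectives themselves):

```python
# Hedged sketch: SHAP attributions for a small tabular classifier.
# Assumptions: synthetic data; `shap` and `scikit-learn` are installed.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a small classifier on synthetic data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Per-feature contributions to each prediction, relative to a baseline.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:50])

# Caveat: a large attribution shows what the model used, not whether the
# feature is causally meaningful, so explanations can look plausible even
# when the model's reasoning is flawed.
print(shap_values[0].shape if isinstance(shap_values, list) else shap_values.shape)
```

The caveat in the final comment is the point of the "evaluate strengths and limitations" objective: attributions describe the model's function, not the underlying process it is meant to approximate.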
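For the subgroup fairness auditing objective, a minimal sketch of a per-group breakdown follows, assuming hypothetical evaluation arrays (`y_true`, `y_pred`, `group`) and using `pandas` for the grouping; a real audit would substitute the project's chosen fairness metrics and real sensitive attributes:

```python
# Hedged sketch: per-group selection rate and accuracy on an evaluation set.
# Assumptions: `y_true`, `y_pred`, and `group` are illustrative placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
group = rng.choice(["A", "B"], size=1000)    # sensitive attribute per example
y_true = rng.integers(0, 2, size=1000)       # ground-truth labels
y_pred = rng.integers(0, 2, size=1000)       # model predictions

df = pd.DataFrame({"group": group, "y_true": y_true, "y_pred": y_pred})
df["correct"] = (df["y_pred"] == df["y_true"]).astype(float)

# Break quality and selection rates down by subgroup; large gaps are the
# signal a bias audit looks for, even when the aggregate metric looks fine.
audit = df.groupby("group").agg(
    selection_rate=("y_pred", "mean"),
    accuracy=("correct", "mean"),
)
print(audit)
print("demographic parity gap:",
      audit["selection_rate"].max() - audit["selection_rate"].min())
```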
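Finally, as one example of confidence signaling from the output-reliability objective, here is a sketch of thresholded abstention on the predicted-class probability; the 0.8 cutoff, the `predict_or_abstain` helper, and the probability-matrix interface are illustrative assumptions, not a prescribed method:

```python
# Hedged sketch: abstain (defer to a human or a fallback) when the model's
# top-class probability is below a threshold. The 0.8 cutoff is illustrative.
import numpy as np

def predict_or_abstain(proba: np.ndarray, threshold: float = 0.8):
    """Return predicted class indices, or -1 where confidence is too low."""
    top = proba.max(axis=1)        # top-class probability per example
    pred = proba.argmax(axis=1)    # predicted class per example
    return np.where(top >= threshold, pred, -1)

# Example with made-up probabilities for three inputs.
proba = np.array([[0.95, 0.05],
                  [0.55, 0.45],
                  [0.10, 0.90]])
print(predict_or_abstain(proba))   # -> [ 0 -1  1]
```

Raw softmax probabilities are often miscalibrated, so in practice this kind of thresholding is paired with a calibration step before the confidence signal is surfaced to users.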