Practice: Model Explainability Techniques

Purpose and Strategic Importance

AI systems increasingly make or influence decisions that have significant consequences for individuals — credit decisions, medical diagnoses, hiring recommendations, content moderation outcomes. Individuals affected by these decisions have a legitimate interest in understanding why they were made and a right, in many jurisdictions, to receive an explanation. Model explainability techniques provide the tools to generate these explanations at both the global level (how does the model work in general?) and the local level (why did the model produce this specific output?).

Explainability is also an engineering quality tool. When a model behaves unexpectedly — making errors that seem inconsistent with its training task — explainability techniques help diagnose the cause, revealing whether the model is relying on spurious correlations, proxy features, or dataset artefacts rather than genuine signal. Models that cannot be explained cannot be debugged, and models that cannot be debugged cannot be trusted.


Description of the Practice

  • Applies SHAP (SHapley Additive exPlanations) to produce consistent, theoretically grounded explanations of feature contributions at both global and instance level for tabular and structured data models.
  • Uses LIME (Local Interpretable Model-Agnostic Explanations) for local explanations of individual predictions, particularly for models and data types where SHAP is computationally expensive.
  • Implements attention visualisation and saliency mapping for deep learning models on text and image data, providing human-interpretable insights into what model components are driving predictions.
  • Evaluates explanation quality — fidelity, stability, and comprehensibility — verifying that explanations accurately represent model behaviour rather than merely producing them.
  • Integrates explanations into user-facing interfaces where appropriate, providing affected users with meaningful information about why a model produced a particular output.
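
The idea behind SHAP can be illustrated with an exact Shapley value computation: each feature's attribution is its average marginal contribution across all feature subsets. This is tractable only for a handful of features; in practice the `shap` library's approximations are used instead. The toy linear model, instance, and baseline below are illustrative:

```python
from itertools import combinations
from math import factorial

def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one instance. Feasible only for a few
    features (the loop is exponential in feature count)."""
    n = len(x)
    features = list(range(n))

    def value(subset):
        # Features in the subset take their actual values; the rest
        # are held at the baseline (simulating "missing").
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                s = set(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Toy model: a linear score, so each attribution should equal
# weight_i * (x_i - baseline_i).
weights = [2.0, -1.0, 0.5]
predict = lambda z: sum(w * v for w, v in zip(weights, z))

phi = exact_shapley(predict, x=[1.0, 3.0, 2.0], baseline=[0.0, 0.0, 0.0])
# Efficiency property: attributions sum to f(x) - f(baseline).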

How to Practise It (Playbook)

1. Getting Started

  • Select explainability methods appropriate to your model types and use cases — SHAP for structured data models, attention maps for transformers, LIME for quick local explanations — and build them into your evaluation toolkit.
  • Generate global explanations for your most important production models to understand which features drive predictions at a population level, identifying whether the model is relying on expected or unexpected patterns.
  • Review explanations with domain experts to validate that the model's reasoning is sensible — if the model is making predictions for the wrong reasons, this may not be visible in accuracy metrics alone.
  • Define what constitutes a useful explanation for your users and use cases before investing in explanation tooling — the right technique depends on who needs to understand what.
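
As a lightweight, model-agnostic starting point for global explanations, permutation importance measures how much a performance metric degrades when one feature's link to the target is broken. A sketch, with an illustrative toy model and data:

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Global importance: the average drop in the metric when one
    feature's column is shuffled, severing its relationship to y."""
    rng = np.random.default_rng(seed)
    base = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j's signal
            drops.append(base - metric(y, predict(Xp)))
        importances[j] = np.mean(drops)
    return importances

# Toy setup: y depends only on feature 0, so only feature 0 should matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0]
predict = lambda X: 3.0 * X[:, 0]  # a "model" that recovered the rule
r2 = lambda y, p: 1 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2)

imp = permutation_importance(predict, X, y, r2)
```

A review session with domain experts can then focus on whether the high-importance features match their expectations, which is where spurious correlations tend to surface.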

2. Scaling and Maturing

  • Automate explanation generation as part of the model evaluation pipeline, producing explanation reports at every model release that can be compared with previous versions to detect changes in model reasoning.
  • Build explanation consistency checks into model quality gates — flagging when explanations become unstable or when dominant features change in ways not expected from training data changes.
  • Develop user-facing explanation interfaces for high-stakes AI decisions, tested with real users to ensure explanations are genuinely comprehensible rather than technically correct but practically opaque.
  • Invest in faster explanation methods for real-time inference contexts where full SHAP computation is computationally impractical, using approximations calibrated against full methods.
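
A minimal sketch of such a consistency gate, assuming each release's global attributions are available as a feature-to-score mapping. The feature names and threshold below are hypothetical and would need tuning against historically acceptable releases:

```python
def top_k_overlap(attr_old, attr_new, k=5):
    """Jaccard overlap of the top-k features (by absolute attribution)
    between two model releases; a sharp drop signals that the model's
    dominant reasoning has changed."""
    top = lambda attr: {
        f for f, _ in sorted(attr.items(), key=lambda kv: -abs(kv[1]))[:k]
    }
    a, b = top(attr_old), top(attr_new)
    return len(a & b) / len(a | b)

def check_explanation_drift(attr_old, attr_new, k=5, threshold=0.6):
    """Quality-gate check: flag the release if dominant features changed
    more than expected given the training data changes."""
    overlap = top_k_overlap(attr_old, attr_new, k)
    return overlap, overlap >= threshold

# Illustrative attributions for two releases of a credit model.
old = {"income": 0.42, "debt_ratio": -0.31, "age": 0.12,
       "tenure": 0.08, "region": 0.02}
new = {"income": 0.40, "debt_ratio": -0.28, "age": 0.15,
       "tenure": 0.07, "region": 0.03}
overlap, ok = check_explanation_drift(old, new, k=3)
```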

3. Team Behaviours to Encourage

  • Treat inexplicable model behaviour as a quality concern requiring investigation — if you cannot explain why a model is making a particular decision, you cannot validate that it is making it for the right reasons.
  • Engage non-technical stakeholders in reviewing explanations — domain experts, legal counsel, and representatives of affected user groups often catch issues that technical reviewers miss.
  • Be honest about the limitations of explanations — current methods provide approximations rather than ground truth accounts of model behaviour, and these limitations should be communicated clearly.
  • Use explanations actively for debugging, not only as a governance reporting mechanism — when models fail, explanations are one of the most powerful diagnostic tools available.

4. Watch Out For…

  • Treating explanation outputs as authoritative accounts of model behaviour rather than approximations — all current explainability methods have known limitations and failure modes.
  • Generating explanations that are technically correct but incomprehensible to the intended audience — a feature attribution table is not an explanation for a user affected by a credit decision.
  • Explainability techniques that are applied post-hoc to justify deployment decisions rather than used genuinely to investigate model behaviour and inform quality judgements.
  • Overfitting to explainability as a goal — in some contexts, explanations should trigger changes to the model or its deployment constraints, not serve as a substitute for addressing underlying issues.

5. Signals of Success

  • Explanation reports are generated and reviewed at every model release, with review findings documented and actioned where they reveal unexpected model behaviour.
  • Domain experts reviewing model explanations report that the model's reasoning is coherent and consistent with the features they would expect to drive predictions.
  • Users of AI systems that make high-stakes decisions can access explanations for individual decisions in terms they can understand and challenge.
  • Explainability analysis has identified and prompted investigation of at least one spurious correlation or dataset artefact that the model was relying on, improving model quality.
  • Explanations are stable across similar inputs — models that produce wildly different explanations for similar inputs are flagged for investigation.
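
The stability signal above can be probed mechanically: perturb an input slightly and compare the resulting attribution vectors. A sketch, assuming `explain` returns a per-feature attribution list; the linear toy explainer is illustrative, and real attributions would come from SHAP, LIME, or similar:

```python
import math
import random

def attribution_stability(explain, x, perturb_scale=0.01, n_trials=20, seed=0):
    """Minimum cosine similarity between the attribution for x and the
    attributions for slightly perturbed copies of x. Scores well below
    1.0 flag inputs whose explanations are unstable."""
    rng = random.Random(seed)
    base = explain(x)
    sims = []
    for _ in range(n_trials):
        xp = [v + rng.gauss(0, perturb_scale) for v in x]
        attr = explain(xp)
        dot = sum(p * q for p, q in zip(base, attr))
        norm = (math.sqrt(sum(p * p for p in base))
                * math.sqrt(sum(q * q for q in attr)))
        sims.append(dot / norm)
    return min(sims)

# Toy explainer: attributions of a linear score vary smoothly with the
# input, so the stability score should be close to 1.
weights = [2.0, -1.0, 0.5]
explain = lambda x: [w * v for w, v in zip(weights, x)]
score = attribution_stability(explain, [1.0, 3.0, 2.0])
```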

Associated Standards

  • AI systems provide explainable outputs for high-stakes decisions
  • All AI decisions above defined risk thresholds require human review
  • AI users have accessible mechanisms to challenge or correct AI outputs
