Standard: AI Team Psychological Safety Score

Description

AI Team Psychological Safety Score measures the degree to which team members feel safe to voice concerns about AI systems, report failures and mistakes without fear of blame, challenge approaches they believe are flawed, and raise ethical or safety concerns without negative personal consequences. It is assessed through regular pulse surveys using validated psychometric instruments adapted for AI-specific contexts.

Psychological safety is not a soft metric — it is a direct predictor of team performance, innovation, and safety outcomes. In AI contexts, it carries additional urgency: team members who are not psychologically safe are less likely to raise concerns about biased models, report data quality issues, challenge deployment decisions they believe are premature, or escalate governance concerns. The result is AI systems deployed with unresolved risks that team members knew about but did not feel safe raising. This measure ensures that the organisational conditions enabling honest, safety-conscious AI development are actively monitored and maintained.

How to Use

What to Measure

  • Composite psychological safety score from validated survey instrument (Amy Edmondson's 7-item scale or equivalent)
  • AI-specific safety subscores: willingness to raise AI concerns, confidence in escalation pathways, comfort challenging AI deployment decisions
  • Trend over time: is psychological safety improving, declining, or stable across rolling quarters?
  • Score segmentation by seniority, role, and team to identify whether safety is unevenly distributed (see the segmentation sketch after this list)
  • Correlation with incident reporting rates: teams with higher psychological safety typically report more near-misses, which is a positive signal
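
A minimal sketch of the segmentation view, assuming survey results are exported as flat rows; the field names ("team", "seniority", "score") are illustrative assumptions, not a prescribed schema:

```python
from collections import defaultdict
from statistics import mean

# Minimal sketch of score segmentation. Field names are illustrative
# assumptions about the survey export, not a prescribed schema.

def segment_scores(rows: list[dict], key: str) -> dict[str, float]:
    """Mean psychological safety score per value of `key` (e.g. 'team')."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["score"])
    return {group: round(mean(scores), 2) for group, scores in groups.items()}

rows = [
    {"team": "ml-platform", "seniority": "senior", "score": 6.1},
    {"team": "ml-platform", "seniority": "junior", "score": 4.2},
    {"team": "data-governance", "seniority": "senior", "score": 5.8},
    {"team": "data-governance", "seniority": "junior", "score": 5.5},
]

for key in ("team", "seniority"):
    print(key, segment_scores(rows, key))
# A large gap between segments (here, senior vs junior) is itself a warning:
# the people with the least power may also feel the least safe to speak up.
```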

Formula

Psychological Safety Score = Mean response to validated survey items on a 1–7 Likert scale

The survey should include AI-specific items such as:

  • "I feel safe raising concerns about the safety or ethics of AI systems we are building"
  • "If I thought a model was not ready for production, I would feel confident saying so"
  • "Mistakes in our AI work are treated as learning opportunities rather than causes for blame"

Optional (both illustrated in the scoring sketch after this list):

  • Composite index: normalise to 0–100 scale for easier benchmarking
  • Disaggregated view: compute sub-scores for AI safety concerns specifically vs general team dynamics
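
As a concrete illustration, a minimal scoring sketch in Python. The item identifiers, the response structure, and the general/AI split are illustrative assumptions, not a fixed instrument; the 1–7 scale and the 0–100 normalisation follow the formula above.

```python
from statistics import mean

# Minimal sketch of composite and disaggregated scoring. Item identifiers
# and the general/AI split are illustrative assumptions.

GENERAL_ITEMS = ["mistakes_held_against", "safe_to_take_risks", "can_ask_for_help"]
AI_ITEMS = ["safe_to_raise_ai_concerns", "safe_to_challenge_deployment"]

def composite_score(responses: list[dict[str, int]], items: list[str]) -> float:
    """Mean response across the given items and all respondents (1-7 scale)."""
    ratings = [resp[item] for resp in responses for item in items]
    return mean(ratings)

def normalised_index(score: float) -> float:
    """Map a 1-7 mean onto a 0-100 index for easier benchmarking."""
    return (score - 1.0) / 6.0 * 100.0

responses = [
    {"mistakes_held_against": 6, "safe_to_take_risks": 5, "can_ask_for_help": 6,
     "safe_to_raise_ai_concerns": 4, "safe_to_challenge_deployment": 3},
    {"mistakes_held_against": 7, "safe_to_take_risks": 6, "can_ask_for_help": 6,
     "safe_to_raise_ai_concerns": 5, "safe_to_challenge_deployment": 4},
]

overall = composite_score(responses, GENERAL_ITEMS + AI_ITEMS)
ai_only = composite_score(responses, AI_ITEMS)
print(f"Overall: {overall:.2f}/7 ({normalised_index(overall):.0f}/100)")
print(f"AI-specific sub-score: {ai_only:.2f}/7")
```

Note how the AI-specific sub-score (4.0 in this toy data) can sit well below the overall score (5.2): that gap is exactly what the disaggregated view is meant to expose.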

Instrumentation Tips

  • Run surveys quarterly using a consistent instrument so trends are interpretable; changing the questions makes historical comparisons invalid (a simple trend classification is sketched after this list)
  • Ensure surveys are genuinely anonymous — if team members believe their responses can be traced to them, they will not answer honestly
  • Share results with teams within two weeks of survey close so the measurement feels meaningful rather than performative
  • Distinguish between low scores caused by team dynamics (addressable by managers) and low scores caused by organisational structure (requiring leadership action)
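
A minimal sketch of the trend classification, assuming one composite score per rolling quarter; the 0.2-point stability tolerance is an illustrative assumption, not a standard threshold.

```python
from statistics import mean

# Minimal sketch: classify the trend across rolling quarterly scores.
# The 0.2-point tolerance for "stable" is an illustrative assumption.

def classify_trend(quarterly_scores: list[float], tolerance: float = 0.2) -> str:
    """Compare the latest quarter against the mean of the preceding quarters."""
    *history, latest = quarterly_scores
    delta = latest - mean(history)
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "declining"
    return "stable"

print(classify_trend([5.1, 5.3, 5.2, 5.6]))  # improving
print(classify_trend([5.6, 5.4, 5.3, 4.9]))  # declining
```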

Benchmarks

  • Score ≥ 6.0 / 7.0 (≥ 85%): Excellent. The team feels safe to challenge, raise concerns, and report failures.
  • Score 5.0–5.9 / 7.0 (71–84%): Good. Healthy overall, but monitor for pockets of concern and investigate the lowest-scoring items.
  • Score 4.0–4.9 / 7.0 (57–70%): Needs attention. Meaningful safety gaps exist; the team lead and HR should investigate root causes.
  • Score < 4.0 / 7.0 (< 57%): Critical. The team is likely not surfacing real concerns; immediate leadership intervention is required.
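
One way to wire these bands into a dashboard is a simple classifier; a minimal sketch, with thresholds mirroring the bands above (the function and band names are illustrative assumptions):

```python
# Minimal sketch: map a composite 1-7 score onto the benchmark bands above.

def benchmark_band(score: float) -> str:
    if score >= 6.0:
        return "Excellent"
    if score >= 5.0:
        return "Good"
    if score >= 4.0:
        return "Needs attention"
    return "Critical"

assert benchmark_band(6.2) == "Excellent"
assert benchmark_band(4.7) == "Needs attention"
assert benchmark_band(3.8) == "Critical"
```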

Why It Matters

  • Psychological safety is the primary predictor of team performance and learning. Google's Project Aristotle, a multi-year study of team effectiveness, identified psychological safety as the most important factor distinguishing high-performing teams from low-performing ones, ahead of individual talent, team composition, or process rigour.

  • Low psychological safety in AI teams creates hidden safety risks. If engineers do not feel safe raising concerns about biased models, premature deployments, or governance gaps, those concerns go unheard. The result is AI systems deployed with known unresolved risks: a governance failure with potentially serious consequences.

  • Engineers have the right not to deploy AI systems they have safety concerns about. This is a foundational principle of responsible AI development. Organisations that do not cultivate psychological safety make this right meaningless in practice: the formal right exists, but the cultural conditions to exercise it do not.

  • Team learning requires safety to fail and report failures. AI teams that run many experiments will have many failures. The teams that learn fastest from failures are those where members freely share what went wrong, why, and what can be done differently; these are behaviours that require psychological safety.

Best Practices

  • Leaders at all levels should model psychological safety through their own behaviour — acknowledging mistakes publicly, inviting dissenting views, and responding constructively when concerns are raised
  • Create structured, explicit channels for AI safety concerns (e.g., a confidential AI safety concern escalation process) rather than relying on informal routes
  • Treat low psychological safety scores as organisational signals requiring systemic response, not as feedback about individual team members
  • Debrief AI incidents in blameless postmortems that focus on systemic factors rather than individual error
  • Recognise and reward team members who raise difficult concerns or flag risks, making concern-raising visibly valued

Common Pitfalls

  • Running psychological safety surveys without sharing results or taking action, signalling that the measurement is performative rather than genuine
  • Conflating psychological safety with happiness — a team can be unhappy about their circumstances while being psychologically safe, and vice versa
  • Relying on generic team health surveys that miss AI-specific safety dimensions, instead of measuring psychological safety in the context of AI concerns specifically
  • Assuming that high scores mean no action is needed — psychological safety requires ongoing cultivation, not a one-time assessment

Signals of Success

  • Team members openly raise AI safety and ethics concerns in sprint ceremonies and design reviews
  • The team has a documented escalation path for AI safety concerns that has been used at least once in the past year
  • Psychological safety scores are improving or stable over the past four quarters
  • At least one AI deployment has been paused or modified based on concerns raised by a team member without negative consequences for that person

Related Measures

  • [[AI Knowledge Sharing Frequency]]
  • [[AI Technical Debt Ratio]]
  • [[AI Governance Compliance Score]]

Aligned Industry Research

  • Edmondson — Psychological Safety and Learning Behavior in Work Teams (Administrative Science Quarterly, 1999). Amy Edmondson's foundational research establishing psychological safety as a measurable team property and a predictor of learning behaviour and performance. It provides the validated survey instrument most widely adapted for technology team contexts.

  • Rozovsky — The Five Keys to a Successful Google Team (re:Work, 2015). The public findings from Google's Project Aristotle research, reporting that psychological safety, above all other factors including individual skill, team structure, and process, was the strongest predictor of team effectiveness, with direct implications for how AI teams are composed and managed.
