Standard: Experiment-to-Production Cycle Time

Description

Experiment-to-Production Cycle Time measures the total elapsed time from when an AI experiment hypothesis is formally initiated to when the resulting model is receiving production traffic and generating real-world feedback. Unlike Model Deployment Lead Time, which measures only the pipeline phase, this metric captures the full end-to-end journey — including experiment design, data preparation, training runs, evaluation, stakeholder approval, and deployment.

This is the most holistic measure of AI delivery velocity. It answers the question that ultimately matters for business impact: how long does it take to go from an idea about how AI can help, to a validated, deployed solution that real users are experiencing? Long cycle times accumulate opportunity cost, increase the risk of building the wrong thing, and prevent the organisation from learning quickly enough to course-correct.

How to Use

What to Measure

  • Total elapsed time from experiment ticket creation (or sprint start) to first production traffic
  • Breakdown by phase: experiment design, data preparation, training, evaluation, stakeholder review, deployment
  • Percentage of experiments completing within one sprint (two weeks) vs two sprints vs longer
  • Ratio of experiments that reach production vs experiments that are abandoned or deprioritised after initiation
  • Cycle time trend over rolling quarters
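As a sketch of how these measures might be computed from a team's tracking data — assuming a minimal, hypothetical experiment record with `initiated` and `deployed` timestamps (the field names are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

SPRINT = timedelta(weeks=2)

# Hypothetical experiment record pulled from the project tracking system.
@dataclass
class Experiment:
    initiated: datetime
    deployed: Optional[datetime]  # None if abandoned or still in flight

def delivery_stats(experiments: list[Experiment]) -> dict:
    """Summarise sprint-completion counts and the reached-production ratio."""
    completed = [e for e in experiments if e.deployed is not None]
    cycle_times = [e.deployed - e.initiated for e in completed]
    return {
        "within_1_sprint": sum(t <= SPRINT for t in cycle_times),
        "within_2_sprints": sum(SPRINT < t <= 2 * SPRINT for t in cycle_times),
        "longer": sum(t > 2 * SPRINT for t in cycle_times),
        "reached_production_ratio": len(completed) / len(experiments),
    }
```

Counting abandoned experiments in the denominator of the production ratio is deliberate; see the survivorship-bias pitfall below.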

Formula

Experiment-to-Production Cycle Time = Production Deployment Timestamp − Experiment Initiation Timestamp

Optional:

  • Phase contribution: time spent in each phase as a percentage of total cycle time
  • Cycle time efficiency: Active Working Time / Total Elapsed Time — low values indicate queue time and waiting
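A worked example of the formula and the two optional measures, using made-up timestamps and phase durations purely for illustration:

```python
from datetime import datetime, timedelta

# Illustrative timestamps; in practice these come from the tracking system.
initiated = datetime(2024, 3, 4, 9, 0)    # experiment initiation
deployed = datetime(2024, 3, 22, 16, 30)  # first production traffic

# Experiment-to-Production Cycle Time = deployment timestamp - initiation timestamp
cycle_time = deployed - initiated

# Phase contribution: each phase's share of total elapsed time.
phases = {
    "experiment design":  timedelta(days=2),
    "data preparation":   timedelta(days=4),
    "training":           timedelta(days=3),
    "evaluation":         timedelta(days=2),
    "stakeholder review": timedelta(days=5, hours=7, minutes=30),
    "deployment":         timedelta(days=2),
}
contribution = {name: d / cycle_time for name, d in phases.items()}

# Cycle time efficiency: active working time / total elapsed time.
active = timedelta(days=9)  # hands-on time logged against the experiment
efficiency = active / cycle_time  # low values indicate queue time and waiting
```

Dividing one `timedelta` by another yields a plain float, which makes both ratios one-liners.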

Instrumentation Tips

  • Create a standard experiment ticket template in the team's project tracking system with defined start and completion events
  • Use the experiment tracking system (MLflow, Weights & Biases) to automate capture of training and evaluation timestamps
  • Track stakeholder review wait time separately from technical execution time to identify organisational friction
  • Review cycle time distributions — not just averages — to identify whether a small number of long-running experiments are inflating the mean
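To illustrate the last tip — examining distributions rather than averages — a small sketch using only the standard library, with made-up cycle times in days:

```python
import statistics

# Illustrative cycle times (days) for ten completed experiments.
cycle_days = [9, 11, 12, 13, 14, 15, 16, 18, 21, 63]

mean = statistics.mean(cycle_days)      # pulled up by the one outlier
median = statistics.median(cycle_days)  # the typical experiment
p90 = statistics.quantiles(cycle_days, n=10)[8]  # 90th percentile

# A mean well above the median suggests a few long-running experiments
# are inflating the average; investigate the tail, not the mean.
skewed = mean > 1.2 * median
```

Here the mean (19.2 days) sits well above the median (14.5 days), so the team should look at the single 63-day experiment rather than conclude that typical cycles are slow.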

Benchmarks

  • < 2 weeks (1 sprint): Excellent — team is operating with true agility; fast learning cycles
  • 2–4 weeks (1–2 sprints): Good — reasonable velocity for most AI work; watch for creep
  • 4–8 weeks: Needs improvement — experiment scope may be too large or organisational friction is high
  • > 8 weeks: Problematic — cycle time is too long for effective learning; redesign the approach to AI delivery
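The bands can be captured as a small helper for dashboards. This is a sketch; as an assumption, the shared 4-week edge is assigned to the better band:

```python
def benchmark_band(cycle_time_weeks: float) -> str:
    """Map a cycle time in weeks onto the benchmark bands."""
    if cycle_time_weeks < 2:
        return "Excellent"
    if cycle_time_weeks <= 4:
        return "Good"
    if cycle_time_weeks <= 8:
        return "Needs improvement"
    return "Problematic"
```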

Why It Matters

  • Cycle time is the rate-limiting factor on AI learning velocity. An organisation that can complete experiment-to-production cycles in two weeks learns six times faster than one taking twelve weeks. Over a year, this compounds into a decisive competitive advantage.

  • Long cycle times increase the cost of being wrong. An experiment that takes eight weeks to reach production has consumed significant investment before the team knows whether the approach works. Sprint-scale cycles mean wrong directions are discovered and abandoned cheaply.

  • Cycle time reveals where organisational friction lives. Detailed phase breakdowns often reveal that technical execution is fast but stakeholder approval or compliance review takes weeks. This points to process redesign opportunities that are often more valuable than technical optimisations.

  • Short cycles enable user-driven iteration. When each cycle takes two weeks, the team can iterate based on production feedback four times in two months. When cycles take eight weeks, the team is locked into a direction for months before real-world learning can inform a change.

Best Practices

  • Apply timeboxing discipline: define the maximum experiment duration upfront and enforce it rather than allowing experiments to expand indefinitely
  • Use the "walking skeleton" pattern — deploy the simplest possible version of a model to production quickly, then iterate, rather than attempting to build the perfect model before any deployment
  • Run stakeholder alignment in parallel with technical work rather than sequentially to eliminate approval wait time from the critical path
  • Maintain a single, agreed "definition of done" for experiments that is understood by data scientists, product managers, and engineering leads
  • Review cycle time quarterly in the AI community of practice to share learning across teams

Common Pitfalls

  • Measuring only successful experiments — abandoned experiments that consumed significant time should be counted to avoid survivorship bias
  • Not capturing the time spent waiting for data access, labelling, or infrastructure provisioning as part of cycle time, masking systemic blockers
  • Conflating experiment scope with cycle time — a longer experiment is not inherently slower if the scope justifies the investment
  • Optimising for cycle time at the expense of experiment quality, producing fast but unreliable production deployments

Signals of Success

  • The majority of completed experiments reach production within two sprints
  • The team can articulate where cycle time is spent and has a specific plan to reduce the longest phase
  • No experiment has been in-flight for more than eight weeks without a deliberate scope justification documented and reviewed by the team lead
  • Cycle time has decreased meaningfully in the past six months as a result of deliberate process improvement

Related Measures

  • [[Model Deployment Lead Time]]
  • [[ML Pipeline Reliability Score]]
  • [[AI-Attributed Outcome Achievement Rate]]

Aligned Industry Research

  • Ries — The Lean Startup (Crown Business, 2011). The Build-Measure-Learn loop that underpins Lean Startup methodology is directly applicable to AI experimentation. Ries argues that organisations that minimise cycle time outperform those optimising for the quality of individual cycles, as the learning rate more than compensates for the lower quality of any single iteration.

  • Humble & Farley — Continuous Delivery (Addison-Wesley, 2010). The foundational arguments for short feedback cycles in software delivery apply with equal force to AI systems. The authors' demonstration that long cycle times are primarily caused by large batch sizes rather than individual task complexity provides a useful diagnostic framework for AI teams.
