
Engineering Process

The operational reference for day-to-day work. Sprint cadence, refinement, planning, technical design, development workflow, incident management, releases, metrics, onboarding, and more.

1. Organisation Structure

Squads

Product Squads (one per product area)

  • 1 PM, 1 Lead Dev, 1 QA, 3-6 Devs
  • Each owns a distinct product area end-to-end

Platform Squad (one)

  • 1 PM, 1 Lead Dev, 1 QA, 3-4 Devs (the platform squad runs smaller than product squads)
  • Owns: CI/CD, shared libraries, AI tooling, observability, cross-cutting tech debt

Staff Engineer (one per org, no squad)

  • Architecture, code quality standards, AI playbook, cross-squad consistency

The PM role splits into two focus areas:

  • PM Discovery & Stakeholders: Discovery, roadmap, and stakeholder management (pairs with Head of Product)
  • PM Analytics & Experimentation: Analytics, experimentation, and feedback loops

For more detail on roles, squad composition, and reporting lines, see Team Structure.

Leadership

  • Head of Product: Roadmap, prioritisation, stakeholder alignment, final call on what gets built
  • Head of Engineering: Technical strategy, delivery health, hiring, final call on how things get built

2. How Work Enters the System

Source | Entry Point | Prioritised By
Product roadmap | Squad PM | Head of Product
Customer feedback / support | Squad PM | Squad PM
Sales requests | Head of Product | Head of Product
Internal stakeholders | Squad PM (or Head of Product for cross-cutting) | Head of Product arbitrates conflicts
Engineering-initiated (tech debt, reliability, tooling) | Lead Dev | Head of Eng (15-25% standing allocation, no PM approval needed)
Production incidents | On-call engineer | Severity level determines response (see Section 8)

Rules:

  • Only the PM can add items to a squad’s backlog
  • Only the PM can inject work into a sprint mid-flight, after consulting Lead Dev on capacity impact
  • No one goes directly to a developer with requests
  • PMs have visibility of engineering allocation work but do not approve or reject it
  • Head of Product is the escalation point for priority conflicts and executive-driven interruptions

AI usage in intake: AI triages and categorises incoming requests, linking to existing backlog items. AI drafts initial assessments (rough sizing, squad ownership, roadmap overlap). AI synthesises customer feedback from support tickets to surface patterns so PMs work from data, not anecdote. For a full catalogue of AI applications across workflows, see the AI Use-Case Catalogue.


3. Sprint Cadence

Sprint length: 2 weeks

When | What | Who | Duration
Monday (sprint start) | Sprint planning | Whole squad | 30-45 mins
Mid-week (weekly) | Refinement | PM, Lead Dev, QA, 1-2 devs | 45-60 mins
Thursday | Mid-sprint check | Squad | 15 mins
Friday (sprint end) | Demo | Squad (stakeholders invited) | 30 mins
Every 4 weeks | Retrospective | Squad only (no leadership) | 45 mins
Monthly | Architecture review | Lead Devs, Staff Engineer, Head of Eng | 60 mins
Monthly | Product strategy sync | PMs, Head of Product | 60 mins
Quarterly | Lightweight planning | Leadership + squads | Half day

Daily standups are not mandated. Squads use async updates. A squad may choose to hold a daily sync — their call.


4. Refinement

When: Weekly, mid-week. Refine work 1-2 sprints ahead, never the current sprint.

Attendees: PM (mandatory), Lead Dev (mandatory), QA (mandatory), 1-2 devs likely to build it.

PM prepares:

  • Problem statement: what is the user problem or business objective?
  • Evidence: how do we know this is real?
  • Success criteria: measurable outcomes
  • Constraints: budget, timeline, dependencies, compliance

The session produces:

  • Clear problem statement (PM owns)
  • Agreed high-level approach (Lead Dev owns)
  • Known risks and open questions
  • Rough size: small / medium / large
  • Acceptance criteria (PM + QA collaborate)

AI usage: PM arrives with AI-drafted problem statement and acceptance criteria as starting point. Lead Dev has AI-generated preliminary technical assessment. QA arrives with AI-generated test scenarios. Human value is in the debate, edge cases, and trade-off decisions.

Rules:

  • PM presents the problem, not the solution
  • If an item can’t be refined in one session, it’s too big - split or spike
  • No estimation theatre - no planning poker, no story points
  • No current sprint work discussed
  • Only refine what’s likely built in next 1-2 sprints

This refinement process feeds into the broader product process, which describes how ideas move from discovery through to delivery.


5. Sprint Planning

When: First morning of the sprint.

Attendees: Whole squad.

PM prepares:

  • Sprint goal (one sentence)
  • Candidate items in priority order (all previously refined)

The session produces:

  • Sprint goal (one sentence)
  • Committed items (priority ordered)
  • Known risks and dependencies
  • Capacity constraints noted

Process:

  1. PM presents sprint goal and candidate items
  2. Lead Dev facilitates: squad assesses capacity (holidays, on-call, overhead)
  3. Squad pulls items from priority list until realistic
  4. Devs self-assign or Lead Dev assigns
  5. QA confirms they can keep pace - if not, that constrains the sprint

Rules:

  • If an item isn’t understood, it goes back to refinement - don’t refine in planning
  • PM prioritises; engineering assesses capacity; the intersection is the sprint
  • PM does not assign work to individual devs
  • Commitment is a forecast, not a contract

AI usage: AI suggests realistic item count based on recent throughput history. AI flags potential cross-squad code-level conflicts in committed items. PM uses AI to draft sprint goal before the meeting.


6. Technical Design Review

When: After an item is picked up, before coding starts.

Process:

  1. Dev spends 30-60 minutes thinking about approach
  2. Dev writes it down (Slack message, short doc, bullet points): key architectural choices, data flow, state management, error handling, observability approach
  3. Dev shares with Lead Dev
  4. Lead Dev responds: “good, go” or has a 15-30 minute conversation about alternatives and risks
  5. For cross-cutting or infrastructure work, Staff Engineer is the reviewer instead

Output: Agreed approach, captured on the ticket or as an ADR for significant decisions.

Calibration: 2-minute Slack exchange for experienced devs on straightforward items. Longer, more structured conversation for junior devs or complex work. Lead Dev uses judgment.

AI usage: Dev pressure-tests approach with AI before the Lead Dev conversation (“here’s my plan, what are the failure modes I haven’t considered?”). AI raises the floor; Lead Dev adds codebase history, operational context, and team capability judgment that AI can’t provide.


7. Development & Deployment

Developer Workflow

  1. Dev commits to main behind a feature flag
  2. Small, frequent commits - each passes CI
  3. CI runs automatically: tests, linting, static analysis, security scan, AI code review
  4. Green CI -> auto-deploys to staging -> smoke tests -> promotes to production
  5. Target: commit to production in under 30 minutes
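The workflow above hinges on incomplete work shipping dark behind a flag. A minimal sketch of what that looks like in code - the flag store, flag name, and checkout functions are hypothetical, not our real services:

```python
# Sketch of trunk-based development behind an engineering feature flag.
# `FLAGS` stands in for the real flag service (an assumption here).

FLAGS = {"new-checkout-flow": False}  # incomplete work ships dark


def is_enabled(flag: str) -> bool:
    """Look up a flag; unknown flags default to off (fail closed)."""
    return FLAGS.get(flag, False)


def legacy_checkout(cart: list) -> dict:
    return {"path": "legacy", "items": len(cart)}  # current production behaviour


def new_checkout(cart: list) -> dict:
    return {"path": "new", "items": len(cart)}  # in-progress path, hidden from users


def checkout(cart: list) -> dict:
    if is_enabled("new-checkout-flow"):
        return new_checkout(cart)
    return legacy_checkout(cart)
```

Both paths live on main and pass CI; users only see the new path when the flag flips.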

For more on CI/CD pipeline design, container strategy, and deployment principles, see DevOps Principles.

AI usage in development: Code generation, test writing, documentation. AI-assisted code review runs as part of CI, catching style issues, common bugs, security anti-patterns, and deviation from codebase conventions.

Feature Ownership

  • One developer owns a feature end-to-end
  • Pairing for complex/risky work (Lead Dev’s call)
  • Junior devs own their features with closer Lead Dev oversight
  • Features spanning multiple services: one owner coordinates, others contribute

Pull Requests

PRs are not the default. Code goes to main via CI gates.

PR-like review required for:

  • Shared platform code / APIs consumed by other squads
  • Database migrations
  • Security-sensitive changes (auth, payment, PII)
  • New joiners (first few weeks, as coaching)

For these: one reviewer, 4-hour SLA, auto-merge if no response.
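That rule is mechanical enough to sketch as a decision function; the function and field names below are illustrative, not an existing tool:

```python
from datetime import datetime, timedelta

REVIEW_SLA = timedelta(hours=4)  # per the PR rule above


def merge_decision(opened_at: datetime, now: datetime,
                   approved: bool, changes_requested: bool) -> str:
    """Decide what happens to a review-required change."""
    if changes_requested:
        return "blocked"      # reviewer raised concerns
    if approved:
        return "merge"        # explicit approval
    if now - opened_at >= REVIEW_SLA:
        return "auto-merge"   # SLA elapsed with no response
    return "wait"
```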

Post-Merge Review

Lead Devs and Staff Engineer review committed code asynchronously - not a gate, but a learning and consistency mechanism. Concerns become follow-up conversations or new commits.


8. Incident Management & On-Call

Rotation: Weekly, per squad. Platform squad has its own rotation.

Escalation:

  1. On-call engineer triages and fixes/contains
  2. Not resolved in 30 mins -> Lead Dev
  3. Cross-squad -> platform squad on-call
  4. Customer-facing + severe -> Head of Eng

Severity and response:

Level | Definition | Response
SEV1 | Service down or critically degraded | All hands, immediate, stakeholder comms within 30 mins
SEV2 | Significant degradation, workaround exists | On-call responds within 30 mins, fix during working hours
SEV3 | Minor, no user impact | Logged, enters normal backlog

Sprint impact: SEV1 drops everything, sprint scope renegotiated. SEV2 handled from buffer; if >half day, Lead Dev and PM discuss displacement. SEV3 enters backlog normally.

Capacity buffer: 10-15% of sprint capacity assumed lost to unplanned work.

Incident reviews: After every SEV1 and any systemic SEV2, within 3 working days. Maximum 3 owned, dated actions. Output stored alongside ADRs.

Compensation: Time off in lieu.

AI usage: During incidents, AI assists with log analysis, identifying related recent deployments, and suggesting similar past incidents. For incident reviews, AI drafts initial timeline from alerting data and Slack threads. Over time, AI identifies recurring incident themes across squads.


9. Release Process

Feature Flags

Type | Owner | Purpose
Engineering | Dev building the feature | Hide incomplete work during development
Release | PM decides flip; engineering confirms readiness | Control feature visibility to users
Operational | Engineering | Kill switches for production issues (no PM approval needed)

Release Decision

  1. Lead Dev confirms technical readiness
  2. QA confirms quality (exploratory testing complete)
  3. PM confirms business readiness (docs, support, comms)
  4. PM flips the flag (or requests engineering to)

Async check (Slack/ticket checklist), not a meeting.

Rollout

  • Default: Percentage rollout. An illustrative sequence (vary case by case): 5-10% -> monitor 24-48 hrs -> 25% -> 50% -> 100%
  • Targeted: User-segment rollout for beta/specific tiers
  • Trivial changes: Full rollout immediately
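A percentage rollout is typically implemented by hashing each user into a stable bucket, so a user enabled at 10% stays enabled as the percentage grows. A sketch under that assumption (the function name is hypothetical):

```python
import hashlib


def in_rollout(user_id: str, flag: str, percent: int) -> bool:
    """Deterministically place a user in a 0-99 bucket per flag.

    Stable as `percent` grows: anyone enabled at 10% is still
    enabled at 25%, 50%, and 100%.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Hashing on `flag:user_id` rather than `user_id` alone keeps different flags from always selecting the same cohort of users.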

Rollback

  • Feature flag toggle: Immediate (seconds). First line of defence.
  • Code rollback: Revert commit, redeploy (~30 mins). When flag toggle is insufficient.
  • Rollback criteria defined before release: “If error rate exceeds X or latency exceeds Y within 24 hours, roll back.”
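Criteria defined before release lend themselves to a mechanical check. A sketch only - the metric names and threshold values are illustrative, not prescribed:

```python
# Example rollback criteria as agreed before release (values illustrative).
CRITERIA = {"max_error_rate": 0.01, "max_p95_latency_ms": 800}


def should_roll_back(metrics: dict, criteria: dict) -> bool:
    """Compare live metrics against pre-agreed rollback criteria."""
    return (metrics["error_rate"] > criteria["max_error_rate"]
            or metrics["p95_latency_ms"] > criteria["max_p95_latency_ms"])
```

Because the check is pre-agreed and mechanical, it can be run by a human on a dashboard or by automated monitoring with equal authority.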

Database Migrations

  • Must be backward compatible (old code works with new schema and vice versa)
  • Deploy migration separately from code
  • Large migrations in stages using online migration tools
  • Second pair of eyes required (Lead Dev or Staff Engineer)
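Part of the backward-compatibility rule can be linted before human review. A crude sketch that pattern-matches destructive DDL; a real check would parse the script against the live schema rather than match strings:

```python
# DDL that breaks old code if shipped in the same step as a code change,
# and so must be staged (expand, dual-write, backfill, then contract).
FORBIDDEN_IN_ONE_STEP = ("DROP COLUMN", "RENAME COLUMN", "NOT NULL")


def is_backward_compatible(statement: str) -> bool:
    """Crude lint: flag destructive or constraining DDL for staging.

    Note this also rejects adding a NOT NULL column, which would break
    inserts from old code that doesn't populate it.
    """
    upper = statement.upper()
    return not any(token in upper for token in FORBIDDEN_IN_ONE_STEP)
```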

Flag Cleanup

  • Feature fully rolled out or killed -> dev removes flag and conditional/dead code within one sprint
  • Platform squad reports flags older than 30 days monthly
  • Flags older than 60 days without documented reason escalated to Lead Dev
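The 30/60-day policy reduces to a small report generator. A sketch with hypothetical names; a real version would also carry each flag's documented-reason field:

```python
from datetime import date


def stale_flags(flags: dict[str, date], today: date) -> dict[str, list[str]]:
    """Bucket flags by age per the cleanup policy.

    Flags over 30 days old go in the monthly report; flags over 60 days
    old are escalated to the Lead Dev.
    """
    report, escalate = [], []
    for name, created in flags.items():
        age_days = (today - created).days
        if age_days > 60:
            escalate.append(name)
        elif age_days > 30:
            report.append(name)
    return {"report": report, "escalate": escalate}
```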

AI usage in releases: AI monitors rollout metrics in real-time and auto-reverts flags if thresholds breached. AI generates release notes from commits and tickets. AI flags stale feature flags and associated dead code. For database migrations, AI analyses scripts against current schema for locking, data loss, and backward compatibility issues before human review.


10. QA Workflow

Phase | QA Activity
Refinement | Attends session. Asks about testability and edge cases. Collaborates on acceptance criteria.
During development | Writes automated tests in parallel with dev work. Pairs with devs on testability.
At merge | Owns quality gate criteria. Automated suites run in CI. Exploratory testing for complex/risky changes.
At release | Confirms quality as part of release decision.
Post-release | Owns production quality monitoring - error rates, regressions, anomaly detection.

QA does not have a veto on release. QA has a documented voice - if overruled, the risk is acknowledged and logged.

AI usage in QA: AI-assisted test generation - QA reviews and curates rather than writing every test. AI-powered anomaly detection on production metrics. Automated pre/post-release behaviour comparison. AI-generated test scenarios as starting material for refinement.


11. Cross-Squad Dependencies

Platform squad requests:

  • “I need this to exist” -> enters platform squad backlog, platform PM prioritises
  • “I need this unblocked now” -> draws from platform squad’s 15% unplanned buffer

Weekly sync (15 mins): Platform PM + product squad PMs align on priorities. Head of Product arbitrates conflicts.

Squad-to-squad: Requests go through the receiving squad’s PM. Small favours (<half day) handled informally. Large requests become roadmap items.

Staff Engineer is the early warning system - spots cross-squad collisions and raises them proactively.

Shared services: Owned by platform squad with defined API contracts. Platform squad owns backward compatibility and coordinated migration.

AI usage: AI analyses sprint commitments across squads and flags code-level conflicts. Dependency tracking over time surfaces structural problems. Platform squad uses AI to auto-generate API documentation and migration guides when contracts change.


12. Metrics

Collection

All metrics derived from existing tooling. No manual data entry.

Human disciplines required:

  1. Move items through project tracker states accurately and promptly
  2. Tag unplanned items entering a sprint
  3. Tag engineering-initiated items
  4. Tag rollbacks/hotfixes in deployment tooling

Delivery Metrics (Squad Level)

Metric | Source | Target
Throughput | Project tracker (items to "done" per sprint) | Trend over 4-6 sprints
Cycle time | Project tracker ("in progress" to "done") | Most items within one sprint
Deployment frequency | CI/CD pipeline logs | Daily or better
Change failure rate | Deployment tooling (rollbacks, hotfixes, emergency flag toggles) | Trending down
MTTR | Alerting tool to incident resolution | Minutes (via flag toggle)
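The two tracker-derived metrics are simple computations over state-transition timestamps, which is why accurate state changes (discipline 1 above) matter. A sketch - the field names are assumptions about the tracker export, not a real API:

```python
from datetime import datetime


def cycle_time_days(in_progress: datetime, done: datetime) -> float:
    """Cycle time as defined above: 'in progress' to 'done', in days."""
    return (done - in_progress).total_seconds() / 86400


def sprint_throughput(items: list[dict], sprint_start: datetime,
                      sprint_end: datetime) -> int:
    """Count items that reached 'done' inside the sprint window."""
    return sum(
        1 for item in items
        if item["done"] is not None and sprint_start <= item["done"] < sprint_end
    )
```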

Org-Level Metrics

Metric | Signal
Planned vs unplanned ratio | >20% unplanned consistently = systemic problem
Cross-squad dependency wait time | Trending up = platform squad under-resourced
Engineering allocation usage | Consistently surrendered = invisible tech debt accumulation

Product Metrics

Defined per feature at refinement. Measured via analytics tooling. PM owns.

Reporting

  • Head of Eng: delivery health trends with context (weekly dashboard review)
  • Head of Product: adoption, outcomes, roadmap progress
  • Both: quarterly outcome-oriented report to leadership

Do not measure: Individual velocity, story points, hours worked, sprint commitment accuracy.

AI usage in metrics: Automated collection and dashboarding from Git, CI/CD, and deployment tooling. AI-generated weekly squad summaries. Anomaly detection when metrics deviate from trend. Correlation analysis over time.


13. Retrospectives

Frequency: Every 2 sprints (4 weeks). Duration: 45 minutes.

Attendees: Squad only, no leadership. Facilitator: Lead Dev or rotating squad member.

Format:

  • What’s slowing us down?
  • What should we change?
  • What’s working that we should protect?

Output: Maximum 2 actions, each with a named owner and date. Reviewed at the start of the next retro. If the same issue persists across 3 consecutive retros with no change, it escalates to Head of Eng.

Cross-squad retro: Quarterly - Lead Devs, Staff Engineer, Head of Eng on systemic issues.

AI usage: Minimal. One application: pattern detection across retros over time (“this theme has appeared in 3 of the last 4 retros”).


14. Responsibilities Matrix

Decision | Who Decides | Who Has Input | Who Has Visibility
What to build / priority | Head of Product / PM | Head of Eng, Lead Dev | Squads, stakeholders
How to build it | Lead Dev / Staff Engineer | PM (constraints), devs | PM
Engineering allocation (15-25%) | Head of Eng | Lead Devs | PMs (informed, not approval)
Tech debt prioritisation | Lead Dev (within allocation) | Devs | PM
Release timing | PM | Lead Dev (readiness), QA (quality) | Head of Product
Rollback | On-call engineer / Lead Dev | - | PM, Head of Eng
Sprint scope change | PM | Lead Dev (capacity impact) | Squad
Hiring | Head of Eng + Head of Product jointly | Lead Devs | -
Squad composition | Head of Eng + Head of Product jointly | Lead Devs | -
Architecture decisions | Staff Engineer / Lead Devs | Devs | Head of Eng
Security standards | Staff Engineer | Platform squad, Head of Eng | All
Cross-squad priority conflicts | Head of Product | PMs, Head of Eng | Squads

15. Onboarding Timeline

When | Activity | Owner
Pre-arrival | Laptop, accounts, tooling, AI tools provisioned | Platform squad
Day 1 | Environment setup (target: under 2 hours to build/deploy locally) | Platform squad tooling
Days 1-5 | Codebase walkthrough (2-3 hours across the week) | Lead Dev
Days 1-5 | Read AI playbook, "how we work" summary | New joiner
Days 1-5 | Shadow refinement and planning | New joiner
Days 2-3 | First real commit (bug fix, config change). PR-reviewed by Lead Dev. | New joiner + Lead Dev
Weeks 2-4 | Small refined items. Close Lead Dev involvement in design. Pair on at least 1 item. | Lead Dev oversees
Week 1 end | Check-in: what's confusing, blocking, missing? | Lead Dev
Week 2 end | Check-in | Lead Dev
Week 4 end | Check-in | Lead Dev
Month 2 | Medium items independently. On-call shadowing. Post-merge review replaces PR review. | Lead Dev monitors
Month 2 end | Full squad member at appropriate autonomy level | Lead Dev confirms

AI usage in onboarding: New joiners use AI to explore the codebase (“explain what this service does”, “show me authentication flow”). AI assists first contributions. AI generates personalised codebase orientation based on areas the new dev will work in. Supplements Lead Dev walkthrough, doesn’t replace it.


16. Documentation Requirements

Document | Owner | Location | Updated When
Architecture overview (1 page) | Staff Engineer | Wiki | Architecture changes; verified at monthly sync
ADRs | Dev who made the decision | Repo | At decision time
API contracts | Owning squad | Repo (auto-generated from code) | API changes
Runbooks | Owning squad + QA | Repo | Service launch; after every incident revealing a gap
"How we work" summary | Head of Eng | Wiki | Process changes; annually; onboarding feedback
AI playbook | Staff Engineer | Wiki | New patterns or tool changes
Incident reviews | Lead Dev | Repo / wiki | After every SEV1 and systemic SEV2

AI usage in documentation: AI generates first drafts of runbooks, API docs, and architecture descriptions from codebase. AI keeps API docs in sync with code, flagging or auto-generating updates when endpoints change. AI identifies undocumented services by comparing codebase against existing docs. New joiners query AI as a living, queryable documentation layer.


17. Security Requirements

CI Pipeline (Non-Negotiable)

Check | Action on Finding
Dependency scanning | Block on critical/high; medium/low to backlog
SAST | Block on critical
Secrets detection (+ pre-commit hook) | Block immediately
Container/image scanning | Block on critical
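The pre-commit secrets hook can be approximated with pattern matching. An illustrative sketch only - the real hook should use a dedicated scanner with maintained rules, not this short list:

```python
import re

# Illustrative patterns; real scanners maintain far larger rule sets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key shape
    re.compile(r"-----BEGIN (RSA|EC) PRIVATE KEY-----"),  # PEM private keys
    re.compile(r"(?i)(api[_-]?key|secret)\s*[:=]\s*['\"][^'\"]{8,}"),
]


def find_secrets(text: str) -> list[str]:
    """Return matched snippets; any hit blocks the commit immediately."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(text))
    return hits
```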

Access Control

  • Principle of least privilege everywhere
  • Production DB: read-only for debugging; write access requires Lead Dev approval + logging
  • Secrets in dedicated tooling (Vault / AWS Secrets Manager) - never in code, env files, or Slack

Periodic Activities

Activity | Frequency | Owner
Dependency audit | Quarterly | Platform squad
Penetration test | Annual (external firm) | Head of Eng commissions
Security incident review | Per event | Lead Dev runs; Head of Eng attends

AI usage in security: AI code review in CI targets security anti-patterns beyond rule-based SAST. AI monitors dependency advisories and cross-references against codebase. AI helps devs write secure code as part of normal workflow. AI analyses access logs during incident investigation. AI generates security documentation (data flow diagrams showing PII locations and movement).


18. AI Tooling Standards

Provision: Every dev, PM, and QA gets AI tooling (Claude Pro/Team or equivalent) as standard.

AI Playbook: Maintained by Staff Engineer. Documents where AI assists and where it doesn’t. Squad members contribute patterns. See the AI Use-Case Catalogue for a comprehensive breakdown by workflow stage.

Platform squad builds internal AI tooling: custom prompts, RAG over codebase/docs, automated code review.

Activity | AI Application
Refinement | PM: AI-drafted problem statements, acceptance criteria. Lead Dev: preliminary technical assessment. QA: test scenarios.
Technical design | Dev pressure-tests approach with AI before Lead Dev conversation
Development | Code generation, test writing, documentation
CI/CD | Automated code review, security pattern detection
QA | Test generation, anomaly detection, pre/post-release comparison
Incident response | Log analysis, related deployment identification, timeline drafting
Metrics | Automated dashboards, weekly summaries, anomaly detection
Documentation | First drafts of runbooks/API docs, staleness detection, sync with code changes
Releases | Rollout monitoring, auto-revert on threshold breach, release note generation, stale flag detection

19. Head of Product / Head of Eng Operating Model

Weekly 1:1 (30-45 mins): Roadmap alignment, delivery health, cross-squad prioritisation, people issues, upcoming decisions.

Shared dashboards: Both see same delivery and product metrics.

Joint quarterly planning: Head of Product brings demand; Head of Eng brings supply. Reconciled together.

Decision authority:

Domain | Final Call
What to build, priority order | Head of Product
How to build, technical approach | Head of Eng
Engineering allocation (15-25%) | Head of Eng
People (hiring, squad composition) | Joint
Unresolvable disagreements | Escalate to CTO/CEO (should be rare)