Engineering Process
The operational reference for day-to-day work. Sprint cadence, refinement, planning, technical design, development workflow, incident management, releases, metrics, onboarding, and more.
1. Organisation Structure
Squads
Product Squads (one per product area)
- 1 PM, 1 Lead Dev, 1 QA, 3-6 Devs
- Each owns a distinct product area end-to-end
Platform Squad (one)
- 1 PM, 1 Lead Dev, 1 QA, 3-4 Devs (platform squads tend to be smaller)
- Owns: CI/CD, shared libraries, AI tooling, observability, cross-cutting tech debt
Staff Engineer (one per org, no squad)
- Architecture, code quality standards, AI playbook, cross-squad consistency
- PM Discovery & Stakeholders: Discovery, roadmap, and stakeholder management (pairs with Head of Product)
- PM Analytics & Experimentation: Analytics, experimentation, and feedback loops
For more detail on roles, squad composition, and reporting lines, see Team Structure.
Leadership
- Head of Product: Roadmap, prioritisation, stakeholder alignment, final call on what gets built
- Head of Engineering: Technical strategy, delivery health, hiring, final call on how things get built
2. How Work Enters the System
| Source | Entry Point | Prioritised By |
|---|---|---|
| Product roadmap | Squad PM | Head of Product |
| Customer feedback / support | Squad PM | Squad PM |
| Sales requests | Head of Product | Head of Product |
| Internal stakeholders | Squad PM (or Head of Product for cross-cutting) | Head of Product arbitrates conflicts |
| Engineering-initiated (tech debt, reliability, tooling) | Lead Dev | Head of Eng (15-25% standing allocation, no PM approval needed) |
| Production incidents | On-call engineer | Severity level determines response (see Section 8) |
Rules:
- Only the PM can add items to a squad’s backlog
- Only the PM can inject work into a sprint mid-flight, after consulting Lead Dev on capacity impact
- No one goes directly to a developer with requests
- PMs have visibility of engineering allocation work but do not approve or reject it
- Head of Product is the escalation point for priority conflicts and executive-driven interruptions
AI usage in intake: AI triages and categorises incoming requests, linking to existing backlog items. AI drafts initial assessments (rough sizing, squad ownership, roadmap overlap). AI synthesises customer feedback from support tickets to surface patterns so PMs work from data, not anecdote. For a full catalogue of AI applications across workflows, see the AI Use-Case Catalogue.
3. Sprint Cadence
Sprint length: 2 weeks
| When | What | Who | Duration |
|---|---|---|---|
| Monday (sprint start) | Sprint planning | Whole squad | 30-45 mins |
| Mid-week (weekly) | Refinement | PM, Lead Dev, QA, 1-2 devs | 45-60 mins |
| Thursday | Mid-sprint check | Squad | 15 mins |
| Friday (sprint end) | Demo | Squad (stakeholders invited) | 30 mins |
| Every 4 weeks | Retrospective | Squad only (no leadership) | 45 mins |
| Monthly | Architecture review | Lead Devs, Staff Engineer, Head of Eng | 60 mins |
| Monthly | Product strategy sync | PMs, Head of Product | 60 mins |
| Quarterly | Lightweight planning | Leadership + squads | Half day |
Daily standups are not mandated. Squads use async updates. A squad may choose to hold a daily sync — their call.
4. Refinement
When: Weekly, mid-week. Refine work 1-2 sprints ahead, never the current sprint.
Attendees: PM (mandatory), Lead Dev (mandatory), QA (mandatory), 1-2 devs likely to build it.
PM prepares:
- Problem statement: what is the user problem or business objective?
- Evidence: how do we know this is real?
- Success criteria: measurable outcomes
- Constraints: budget, timeline, dependencies, compliance
The session produces:
- Clear problem statement (PM owns)
- Agreed high-level approach (Lead Dev owns)
- Known risks and open questions
- Rough size: small / medium / large
- Acceptance criteria (PM + QA collaborate)
AI usage: PM arrives with AI-drafted problem statement and acceptance criteria as starting point. Lead Dev has AI-generated preliminary technical assessment. QA arrives with AI-generated test scenarios. Human value is in the debate, edge cases, and trade-off decisions.
Rules:
- PM presents the problem, not the solution
- If an item can’t be refined in one session, it’s too big - split or spike
- No estimation theatre - no planning poker, no story points
- No current sprint work discussed
- Only refine what’s likely built in next 1-2 sprints
This refinement process feeds into the broader product process, which describes how ideas move from discovery through to delivery.
5. Sprint Planning
When: First morning of the sprint.
Attendees: Whole squad.
PM prepares:
- Sprint goal (one sentence)
- Candidate items in priority order (all previously refined)
The session produces:
- Sprint goal (one sentence)
- Committed items (priority ordered)
- Known risks and dependencies
- Capacity constraints noted
Process:
- PM presents sprint goal and candidate items
- Lead Dev facilitates: squad assesses capacity (holidays, on-call, overhead)
- Squad pulls items from priority list until realistic
- Devs self-assign or Lead Dev assigns
- QA confirms they can keep pace - if not, that constrains the sprint
Rules:
- If an item isn’t understood, it goes back to refinement - don’t refine in planning
- PM prioritises; engineering assesses capacity; the intersection is the sprint
- PM does not assign work to individual devs
- Commitment is a forecast, not a contract
AI usage: AI suggests realistic item count based on recent throughput history. AI flags potential cross-squad code-level conflicts in committed items. PM uses AI to draft sprint goal before the meeting.
6. Technical Design Review
When: After an item is picked up, before coding starts.
Process:
- Dev spends 30-60 minutes thinking about approach
- Dev writes it down (Slack message, short doc, bullet points): key architectural choices, data flow, state management, error handling, observability approach
- Dev shares with Lead Dev
- Lead Dev responds: “good, go” or has a 15-30 minute conversation about alternatives and risks
- For cross-cutting or infrastructure work, Staff Engineer is the reviewer instead
Output: Agreed approach, captured on the ticket or as an ADR for significant decisions.
Calibration: 2-minute Slack exchange for experienced devs on straightforward items. Longer, more structured conversation for junior devs or complex work. Lead Dev uses judgment.
AI usage: Dev pressure-tests approach with AI before the Lead Dev conversation (“here’s my plan, what are the failure modes I haven’t considered?”). AI raises the floor; Lead Dev adds codebase history, operational context, and team capability judgment that AI can’t provide.
7. Development & Deployment
Developer Workflow
- Dev commits to main behind a feature flag
- Small, frequent commits - each passes CI
- CI runs automatically: tests, linting, static analysis, security scan, AI code review
- Green CI -> auto-deploys to staging -> smoke tests -> promotes to production
- Target: commit to production in under 30 minutes
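The flag-guarded workflow above can be sketched in a few lines. This is a minimal illustration only; the in-memory flag store and the function names are hypothetical, not our actual tooling:

```python
# Illustrative sketch: incomplete work ships to main disabled behind a flag.
# `flags` stands in for a real flag service; flipped at release, not deploy.
flags = {"new-checkout-flow": False}

def legacy_checkout(cart):
    return {"path": "legacy", "total": sum(cart)}

def new_checkout(cart):
    return {"path": "new", "total": sum(cart)}

def checkout(cart):
    # New code path is invisible to users until the flag is flipped,
    # so small commits can land on main and deploy continuously.
    if flags.get("new-checkout-flow", False):
        return new_checkout(cart)
    return legacy_checkout(cart)
```

Because the old path stays intact until the flag flips, rollback is a flag toggle rather than a redeploy.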
For more on CI/CD pipeline design, container strategy, and deployment principles, see DevOps Principles.
AI usage in development: Code generation, test writing, documentation. AI-assisted code review runs as part of CI, catching style issues, common bugs, security anti-patterns, and deviation from codebase conventions.
Feature Ownership
- One developer owns a feature end-to-end
- Pairing for complex/risky work (Lead Dev’s call)
- Junior devs own their features with closer Lead Dev oversight
- Features spanning multiple services: one owner coordinates, others contribute
Pull Requests
PRs are not the default. Code goes to main via CI gates.
PR-like review required for:
- Shared platform code / APIs consumed by other squads
- Database migrations
- Security-sensitive changes (auth, payment, PII)
- New joiners (first few weeks, as coaching)
For these: one reviewer, 4-hour SLA, auto-merge if no response.
Post-Merge Review
Lead Devs and Staff Engineer review committed code asynchronously - not a gate, but a learning and consistency mechanism. Concerns become follow-up conversations or new commits.
8. Incident Management & On-Call
Rotation: Weekly, per squad. Platform squad has its own rotation.
Escalation:
- On-call engineer triages and fixes/contains
- Not resolved in 30 mins -> Lead Dev
- Cross-squad -> platform squad on-call
- Customer-facing + severe -> Head of Eng
Severity and response:
| Level | Definition | Response |
|---|---|---|
| SEV1 | Service down or critically degraded | All hands, immediate, stakeholder comms within 30 mins |
| SEV2 | Significant degradation, workaround exists | On-call responds within 30 mins, fix during working hours |
| SEV3 | Minor, no user impact | Logged, enters normal backlog |
Sprint impact: SEV1 drops everything, sprint scope renegotiated. SEV2 handled from buffer; if >half day, Lead Dev and PM discuss displacement. SEV3 enters backlog normally.
Capacity buffer: 10-15% of sprint capacity assumed lost to unplanned work.
Incident reviews: Held for every SEV1 and for systemic SEV2s, within 3 working days. Maximum 3 actions, each with an owner and a date. Output stored alongside ADRs.
Compensation: Time off in lieu.
AI usage: During incidents, AI assists with log analysis, identifying related recent deployments, and suggesting similar past incidents. For incident reviews, AI drafts initial timeline from alerting data and Slack threads. Over time, AI identifies recurring incident themes across squads.
9. Release Process
Feature Flags
| Type | Owner | Purpose |
|---|---|---|
| Engineering | Dev building the feature | Hide incomplete work during development |
| Release | PM decides flip; engineering confirms readiness | Control feature visibility to users |
| Operational | Engineering | Kill switches for production issues (no PM approval needed) |
Release Decision
- Lead Dev confirms technical readiness
- QA confirms quality (exploratory testing complete)
- PM confirms business readiness (docs, support, comms)
- PM flips the flag (or requests engineering to)
Async check (Slack/ticket checklist), not a meeting.
Rollout
- Default: Percentage rollout. Example (vary case by case): 5-10% -> monitor 24-48 hrs -> 25% -> 50% -> 100%
- Targeted: User-segment rollout for beta/specific tiers
- Trivial changes: Full rollout immediately
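A percentage rollout is typically implemented by bucketing users deterministically, so the same user stays enabled as the percentage widens. A minimal sketch (the function and flag names are illustrative, not our flag service's API):

```python
import hashlib

def in_rollout(user_id: str, flag: str, percent: int) -> bool:
    """Bucket a user into one of 100 stable buckets; enable the first `percent`.

    Hashing flag + user keeps the bucket fixed, so a user enabled at 5%
    remains enabled at 25%, 50%, and 100% as the rollout widens.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

The targeted variant replaces the hash bucket with an explicit segment check (beta cohort, pricing tier).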
Rollback
- Feature flag toggle: Immediate (seconds). First line of defence.
- Code rollback: Revert commit, redeploy (~30 mins). When flag toggle is insufficient.
- Rollback criteria defined before release: “If error rate exceeds X or latency exceeds Y within 24 hours, roll back.”
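Pre-agreed rollback criteria can be expressed as a simple threshold check that monitoring (human or automated) evaluates during the watch window. The metric names and limits below are hypothetical placeholders for whatever the squad agrees before release:

```python
# Hypothetical thresholds agreed before release; names are illustrative.
ROLLBACK_CRITERIA = {"error_rate": 0.02, "p95_latency_ms": 800}

def should_roll_back(observed: dict) -> bool:
    """True if any observed metric breaches its agreed threshold."""
    return any(
        observed.get(name, 0) > limit
        for name, limit in ROLLBACK_CRITERIA.items()
    )
```

Writing the criteria down before release means the auto-revert described in the AI usage note has an unambiguous trigger.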
Database Migrations
- Must be backward compatible (old code works with new schema and vice versa)
- Deploy migration separately from code
- Large migrations in stages using online migration tools
- Second pair of eyes required (Lead Dev or Staff Engineer)
Flag Cleanup
- Feature fully rolled out or killed -> dev removes flag and conditional/dead code within one sprint
- Platform squad reports flags older than 30 days monthly
- Flags older than 60 days without documented reason escalated to Lead Dev
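The monthly stale-flag report reduces to a date comparison over flag creation times. A sketch, assuming the flag store can list flags with their creation dates (the data shape here is invented for illustration):

```python
from datetime import date, timedelta

def stale_flags(flags, today, max_age_days=30):
    """Return names of flags older than `max_age_days`, oldest first.

    `flags` maps flag name -> creation date, as a stand-in for
    whatever the real flag store exposes.
    """
    cutoff = today - timedelta(days=max_age_days)
    old = {name: created for name, created in flags.items() if created < cutoff}
    return sorted(old, key=old.get)
```

Running the same function with `max_age_days=60` yields the escalation list for Lead Devs.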
AI usage in releases: AI monitors rollout metrics in real-time and auto-reverts flags if thresholds breached. AI generates release notes from commits and tickets. AI flags stale feature flags and associated dead code. For database migrations, AI analyses scripts against current schema for locking, data loss, and backward compatibility issues before human review.
10. QA Workflow
| Phase | QA Activity |
|---|---|
| Refinement | Attends session. Asks about testability and edge cases. Collaborates on acceptance criteria. |
| During development | Writes automated tests in parallel with dev work. Pairs with devs on testability. |
| At merge | Owns quality gate criteria. Automated suites run in CI. Exploratory testing for complex/risky changes. |
| At release | Confirms quality as part of release decision. |
| Post-release | Owns production quality monitoring - error rates, regressions, anomaly detection. |
QA does not have a veto on release. QA has a documented voice - if overruled, the risk is acknowledged and logged.
AI usage in QA: AI-assisted test generation - QA reviews and curates rather than writing every test. AI-powered anomaly detection on production metrics. Automated pre/post-release behaviour comparison. AI-generated test scenarios as starting material for refinement.
11. Cross-Squad Dependencies
Platform squad requests:
- “I need this to exist” -> enters platform squad backlog, platform PM prioritises
- “I need this unblocked now” -> draws from platform squad’s 15% unplanned buffer
Weekly sync (15 mins): Platform PM + product squad PMs align on priorities. Head of Product arbitrates conflicts.
Squad-to-squad: Requests go through the receiving squad’s PM. Small favours (<half day) handled informally. Large requests become roadmap items.
Staff Engineer is the early warning system - spots cross-squad collisions and raises them proactively.
Shared services: Owned by platform squad with defined API contracts. Platform squad owns backward compatibility and coordinated migration.
AI usage: AI analyses sprint commitments across squads and flags code-level conflicts. Dependency tracking over time surfaces structural problems. Platform squad uses AI to auto-generate API documentation and migration guides when contracts change.
12. Metrics
Collection
All metrics derived from existing tooling. No manual data entry.
Human disciplines required:
- Move items through project tracker states accurately and promptly
- Tag unplanned items entering a sprint
- Tag engineering-initiated items
- Tag rollbacks/hotfixes in deployment tooling
Delivery Metrics (Squad Level)
| Metric | Source | Target |
|---|---|---|
| Throughput | Project tracker (items to “done” per sprint) | Trend over 4-6 sprints |
| Cycle time | Project tracker (“in progress” to “done”) | Most items within one sprint |
| Deployment frequency | CI/CD pipeline logs | Daily or better |
| Change failure rate | Deployment tooling (rollbacks, hotfixes, emergency flag toggles) | Trending down |
| MTTR | Alerting tool to incident resolution | Minutes (via flag toggle) |
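Cycle time as defined above (“in progress” to “done”) is derivable directly from tracker state-change events, which is why the human discipline of moving tickets promptly matters. A sketch, assuming events arrive as (state, ISO timestamp) pairs; the state names follow the table but the event format is hypothetical:

```python
from datetime import datetime

def cycle_time_days(events):
    """Days from first 'in progress' timestamp to last 'done' timestamp.

    `events` is a list of (state, ISO-8601 timestamp) pairs exported
    from the project tracker. Returns None if the item never started
    or never finished.
    """
    starts = [datetime.fromisoformat(t) for s, t in events if s == "in progress"]
    ends = [datetime.fromisoformat(t) for s, t in events if s == "done"]
    if not starts or not ends:
        return None
    return (max(ends) - min(starts)).total_seconds() / 86400
```

No manual entry is involved: the metric falls out of accurately maintained tracker states.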
Org-Level Metrics
| Metric | Signal |
|---|---|
| Planned vs unplanned ratio | >20% unplanned consistently = systemic problem |
| Cross-squad dependency wait time | Trending up = platform squad under-resourced |
| Engineering allocation usage | Consistently surrendered = invisible tech debt accumulation |
Product Metrics
Defined per feature at refinement. Measured via analytics tooling. PM owns.
Reporting
- Head of Eng: delivery health trends with context (weekly dashboard review)
- Head of Product: adoption, outcomes, roadmap progress
- Both: quarterly outcome-oriented report to leadership
Do not measure: Individual velocity, story points, hours worked, sprint commitment accuracy.
AI usage in metrics: Automated collection and dashboarding from Git, CI/CD, and deployment tooling. AI-generated weekly squad summaries. Anomaly detection when metrics deviate from trend. Correlation analysis over time.
13. Retrospectives
Frequency: Every 2 sprints (4 weeks). Duration: 45 minutes. Attendees: Squad only. No leadership. Facilitator: Lead Dev or rotating squad member.
Format:
- What’s slowing us down?
- What should we change?
- What’s working that we should protect?
Output: Maximum 2 actions with named owner and date. Reviewed at start of next retro. Same issue 3 retros running with no change -> escalates to Head of Eng.
Cross-squad retro: Quarterly - Lead Devs, Staff Engineer, Head of Eng on systemic issues.
AI usage: Minimal. One application: pattern detection across retros over time (“this theme has appeared in 3 of the last 4 retros”).
14. Responsibilities Matrix
| Decision | Who Decides | Who Has Input | Who Has Visibility |
|---|---|---|---|
| What to build / priority | Head of Product / PM | Head of Eng, Lead Dev | Squads, stakeholders |
| How to build it | Lead Dev / Staff Engineer | PM (constraints), devs | PM |
| Engineering allocation (15-25%) | Head of Eng | Lead Devs | PMs (informed, not approval) |
| Tech debt prioritisation | Lead Dev (within allocation) | Devs | PM |
| Release timing | PM | Lead Dev (readiness), QA (quality) | Head of Product |
| Rollback | On-call engineer / Lead Dev | - | PM, Head of Eng |
| Sprint scope change | PM | Lead Dev (capacity impact) | Squad |
| Hiring | Head of Eng + Head of Product jointly | Lead Devs | - |
| Squad composition | Head of Eng + Head of Product jointly | Lead Devs | - |
| Architecture decisions | Staff Engineer / Lead Devs | Devs | Head of Eng |
| Security standards | Staff Engineer | Platform squad, Head of Eng | All |
| Cross-squad priority conflicts | Head of Product | PMs, Head of Eng | Squads |
15. Onboarding Timeline
| When | Activity | Owner |
|---|---|---|
| Pre-arrival | Laptop, accounts, tooling, AI tools provisioned | Platform squad |
| Day 1 | Environment setup (target: under 2 hours to build/deploy locally) | Platform squad tooling |
| Days 1-5 | Codebase walkthrough (2-3 hours across week) | Lead Dev |
| Days 1-5 | Read AI playbook, “how we work” summary | New joiner |
| Days 1-5 | Shadow refinement and planning | New joiner |
| Days 2-3 | First real commit (bug fix, config change). PR-reviewed by Lead Dev. | New joiner + Lead Dev |
| Weeks 2-4 | Small refined items. Close Lead Dev involvement in design. Pair on at least 1 item. | Lead Dev oversees |
| Week 1 end | Check-in: what’s confusing, blocking, missing? | Lead Dev |
| Week 2 end | Check-in | Lead Dev |
| Week 4 end | Check-in | Lead Dev |
| Month 2 | Medium items independently. On-call shadowing. Post-merge review replaces PR review. | Lead Dev monitors |
| Month 2 end | Full squad member at appropriate autonomy level | Lead Dev confirms |
AI usage in onboarding: New joiners use AI to explore the codebase (“explain what this service does”, “show me authentication flow”). AI assists first contributions. AI generates personalised codebase orientation based on areas the new dev will work in. Supplements Lead Dev walkthrough, doesn’t replace it.
16. Documentation Requirements
| Document | Owner | Location | Updated When |
|---|---|---|---|
| Architecture overview (1 page) | Staff Engineer | Wiki | Architecture changes; verified at monthly sync |
| ADRs | Dev who made the decision | Repo | At decision time |
| API contracts | Owning squad | Repo (auto-generated from code) | API changes |
| Runbooks | Owning squad + QA | Repo | Service launch; after every incident revealing a gap |
| “How we work” summary | Head of Eng | Wiki | Process changes; annually; onboarding feedback |
| AI playbook | Staff Engineer | Wiki | New patterns or tool changes |
| Incident reviews | Lead Dev | Repo / wiki | After every SEV1 and systemic SEV2 |
AI usage in documentation: AI generates first drafts of runbooks, API docs, and architecture descriptions from codebase. AI keeps API docs in sync with code, flagging or auto-generating updates when endpoints change. AI identifies undocumented services by comparing codebase against existing docs. New joiners query AI as a living, queryable documentation layer.
17. Security Requirements
CI Pipeline (Non-Negotiable)
| Check | Action on Finding |
|---|---|
| Dependency scanning | Block on critical/high; medium/low to backlog |
| SAST | Block on critical |
| Secrets detection (+ pre-commit hook) | Block immediately |
| Container/image scanning | Block on critical |
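The secrets-detection gate (CI plus pre-commit hook) is essentially pattern matching over staged content. A heavily simplified sketch; real scanners ship far larger, maintained rule sets, and the three patterns below are illustrative only:

```python
import re

# Illustrative patterns only; production scanners use maintained rule sets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key id shape
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)(api[_-]?key|secret)\s*[:=]\s*['\"][^'\"]{16,}['\"]"),
]

def find_secrets(text: str) -> list:
    """Return matched substrings; any hit blocks the commit immediately."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits
```

In the pipeline, a non-empty result fails the check with no severity triage: secrets block immediately, per the table above.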
Access Control
- Principle of least privilege everywhere
- Production DB: read-only for debugging; write access requires Lead Dev approval + logging
- Secrets in dedicated tooling (Vault / AWS Secrets Manager) - never in code, env files, or Slack
Periodic Activities
| Activity | Frequency | Owner |
|---|---|---|
| Dependency audit | Quarterly | Platform squad |
| Penetration test | Annual (external firm) | Head of Eng commissions |
| Security incident review | Per event | Lead Dev runs; Head of Eng attends |
AI usage in security: AI code review in CI targets security anti-patterns beyond rule-based SAST. AI monitors dependency advisories and cross-references against codebase. AI helps devs write secure code as part of normal workflow. AI analyses access logs during incident investigation. AI generates security documentation (data flow diagrams showing PII locations and movement).
18. AI Tooling Standards
Provision: Every dev, PM, and QA gets AI tooling (Claude Pro/Team or equivalent) as standard.
AI Playbook: Maintained by Staff Engineer. Documents where AI assists and where it doesn’t. Squad members contribute patterns. See the AI Use-Case Catalogue for a comprehensive breakdown by workflow stage.
Platform squad builds internal AI tooling: custom prompts, RAG over codebase/docs, automated code review.
| Activity | AI Application |
|---|---|
| Refinement | PM: AI-drafted problem statements, acceptance criteria. Lead Dev: preliminary technical assessment. QA: test scenarios. |
| Technical design | Dev pressure-tests approach with AI before Lead Dev conversation |
| Development | Code generation, test writing, documentation |
| CI/CD | Automated code review, security pattern detection |
| QA | Test generation, anomaly detection, pre/post-release comparison |
| Incident response | Log analysis, related deployment identification, timeline drafting |
| Metrics | Automated dashboards, weekly summaries, anomaly detection |
| Documentation | First drafts of runbooks/API docs, staleness detection, sync with code changes |
| Releases | Rollout monitoring, auto-revert on threshold breach, release note generation, stale flag detection |
19. Head of Product / Head of Eng Operating Model
Weekly 1:1 (30-45 mins): Roadmap alignment, delivery health, cross-squad prioritisation, people issues, upcoming decisions.
Shared dashboards: Both see same delivery and product metrics.
Joint quarterly planning: Head of Product brings demand; Head of Eng brings supply. Reconciled together.
Decision authority:
| Domain | Final Call |
|---|---|
| What to build, priority order | Head of Product |
| How to build, technical approach | Head of Eng |
| Engineering allocation (15-25%) | Head of Eng |
| People (hiring, squad composition) | Joint |
| Unresolvable disagreements | Escalate to CTO/CEO (should be rare) |