Your Backlog Can Be Shared Working Memory For Humans And AI
The most expensive part of a handoff is rebuilding intent.
Not status. Not the ticket number. Not which branch the code is on.
Intent.
What were we trying to learn? Why did the acceptance criteria change? Which assumption was still open? What did the last person discover before they had to stop?
On a real team, that context rarely lives in one place. Some of it is in Jira. Some of it is in Slack. Some of it is in a meeting note. Some of it is in a screenshot attachment. Some of it is in the head of the last developer who touched the story.
Now add AI coding assistants to the workflow.
Claude Code, Codex, Copilot, Cursor, and similar tools can help a lot. But they have the same problem a new teammate has: they only know what they can see. If the real story is scattered across tickets, docs, chat, browser tabs, and memory, the AI gets a thin slice of the work and the team still has to replay the rest.
That is the shared-context problem.
HDD Is Not Just A Planning Template
Hypothesis-Driven Development (HDD) has been around for a while. Barry O'Reilly's Thoughtworks article, How to Implement Hypothesis-Driven Development, frames software work as an experiment: state the hypothesis, define the expected outcome, decide what signal will show whether it worked, run the experiment, and use the learning to decide what happens next.
That is useful because it pushes teams away from order-taking.
Instead of treating a story as "build this thing," the team treats it as "we believe this change will create this outcome, and we will know by seeing this signal."
The part I care about most right now is what happens after the hypothesis is written.
Does the work stay understandable tomorrow?
Can a developer pick it up after lunch without asking someone to replay the last meeting?
Can an AI coding assistant read enough context to help without hallucinating intent?
Can a manager or product lead see the decision path without forcing the team back into status-reporting mode?
That is where HDD becomes more than a planning ritual. It becomes a way to create shared working memory.
The Story Is The Memory
For HDD to work beyond the planning conversation, the story has to carry the pieces people usually have to reconstruct:
- acceptance criteria
- hypothesis
- success metrics
- assumptions
- demo plan
- design notes
- diagrams
- implementation plan
- comments
- attachments
- feedback and retrospection
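
As a sketch, a story that carries all of that might look like the following markdown file. The headings and wording are illustrative, not a schema prescribed by the HDD skill:

```markdown
## Story: Reduce drop-off on the signup form

### Hypothesis
We believe cutting signup to one step will raise completion.

### Acceptance Criteria
- New users can complete signup in a single screen.

### Success Metrics
- Signup completion rate rises measurably within two weeks.

### Assumptions (open)
- Users abandon because of form length, not the email verification step.

### Demo Plan
- Show the one-step form completing end to end in staging.

### Design Notes / Diagrams
- Link or inline sketches here.

### Implementation Plan
- Red/green/refactor steps, smallest slice first.

### Comments, Attachments, Feedback
- Decision changes and retro notes land here, not only in chat.
```

The exact sections matter less than the property they create: everything the next person or AI assistant needs is in one predictable place.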
That list is not valuable because it is documentation.
It is valuable because it lets the next person, or the next AI assistant, recover the state of the work without asking someone to replay it from memory.
The backlog becomes shared working memory for humans and AI.
A Concrete Example
In the Flexion workflow that led me deeper into this, the problem was not "Jira is bad."
The problem was flow recovery.
Teams lost context when work paused, priorities shifted, people rotated, or conversations happened across Slack, Jira, meetings, personal notes, and memory. Handoffs depended too much on whoever touched the story last.
In the demo, an HDD story was given a predictable structure and worked through in phases:
- Define the outcome.
- Prove the outcome.
- Confirm the outcome.
The workflow captured acceptance criteria, success metrics, demo notes, design notes, implementation planning, and open assumptions directly in the story folder.
The important part was not that the team produced more artifacts.
The important part was that the story became resumable.
When the session stopped and started again later, the next step was visible. The assumptions were still there. The plan was still there. The evidence gate was still there. A human could resume. An AI assistant could resume. The team did not need to reconstruct the whole story from scattered context.
That is a different kind of backlog.
Why This Matters More With AI
DORA's 2025 State of AI-assisted Software Development frames AI as an amplifier of an organization's existing strengths and weaknesses. That matches what I see in day-to-day delivery work.
There is a stronger research thread underneath this than "AI needs better prompts." Team cognition research has long studied how teams build the shared mental models they need to anticipate one another and coordinate. In a 2021 meta-analysis, Niler, Mesmer-Magnus, Larson, Plummer, DeChurch, and Contractor synthesized 107 independent studies with 7,778 participants. They found that team cognition has an overall positive relationship with team performance, with stronger relationships in conditions that sound a lot like modern software work: high external interdependence, temporal dispersion, and geographic dispersion.
```mermaid
xychart-beta
    title "Team cognition is associated with team performance"
    x-axis ["Overall", "External", "Time", "Location"]
    y-axis "rho" 0 --> 0.5
    bar [0.35, 0.41, 0.36, 0.35]
```
Source: Conditioning team cognition: A meta-analysis, Organizational Psychology Review, 2021. In the graph, "External" means high external interdependence, "Time" means temporal dispersion, and "Location" means geographic dispersion.
This is not direct proof that HDD improves delivery. I read it as evidence for the underlying bet: making team context explicit and shared is a real lever, not just a documentation preference.
If the team already has clear context, AI can amplify that.
If the context is scattered, stale, or implicit, AI can amplify that too.
This is why I don't think the leadership question is only "Which AI coding tool should we standardize on?"
A better question is:
If a developer or AI assistant picked up this story tomorrow, would they know the outcome, current decision path, and evidence gate without asking someone to replay it?
If the answer is no, the team has a shared-working-memory problem.
What The HDD Skill Adds
The HDD skill is a guide for keeping that memory useful.
It does not try to turn every story into a large process document. It pushes for the smallest slice that can prove the outcome.
It asks for the hypothesis.
It asks what success looks like.
It records assumptions.
It creates a demo plan early, then keeps it current as the work changes.
It creates a design and plan when the story needs one.
It uses red/green/refactor in the implementation plan so the next person can see the test-first intent, the smallest green change, and the cleanup step.
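To make that test-first intent concrete, here is a minimal red/green/refactor sketch in Python. The feature (a slugify helper) is invented for illustration and is not part of the HDD skill; the point is that each phase is visible to whoever resumes the story:

```python
import re

# Red: write the failing test first, so the next person can see
# the intended behavior before any implementation exists.
def test_slugify_replaces_spaces():
    assert slugify("Shared Working Memory") == "shared-working-memory"

# Green: the smallest change that makes the test pass.
def slugify(title):
    return title.lower().replace(" ", "-")

# Refactor: clean up without changing observable behavior, e.g.
# collapse runs of whitespace and trim the edges.
def slugify(title):
    return re.sub(r"\s+", "-", title.strip().lower())
```

Recorded this way, the implementation plan doubles as a trail: the test states the intent, the first version states the minimum, and the refactor states the cleanup.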
And when the work changes, it records why the path changed instead of leaving that only in chat.
That last part matters. A lot of delivery risk is not in the code. It is in the unrecorded reasoning around the code.
What I Want Engineering Leaders To Look For
You don't need this workflow for every tiny task.
Use it where context loss is expensive:
- the story is vague
- the work spans multiple days
- more than one person may touch it
- a handoff is likely
- the team needs an AI assistant to help with more than a narrow code edit
- the decision path matters as much as the final diff
Then inspect one active story and ask:
- Could a teammate pick this up tomorrow?
- Could an AI assistant read the real intent?
- Is the success signal visible?
- Are the open assumptions written down?
- Is the next step clear without asking the author?
If not, the issue is not just story quality. It is shared memory quality.
Try It On One Story
Pick one story that is currently at risk of becoming a handoff problem.
Write the hypothesis.
Write the success signal.
Capture the assumption that would make the story fail.
Add the smallest demo plan.
Then ask whether the story is easier for a human and an AI assistant to resume.
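
Under those steps, the result can be as small as four lines in the story. The content here is invented for illustration:

```markdown
Hypothesis: moving retry logic into the client will cut duplicate webhook deliveries.
Success signal: duplicate delivery rate in the webhook log drops below 1%.
Risky assumption: duplicates come from our retries, not from the provider.
Demo plan: replay yesterday's webhook traffic in staging and show the duplicate count.
```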
If you want to see the workflow, I recorded a short walkthrough of how I use the HDD skill with imdone: https://youtu.be/GE48aDZwfPQ
If you want to try it in a repo, install imdone-cli from npm.
The goal is not more process.
The goal is less replay.
Your backlog should help the team remember what it is trying to learn.