Your Personal AI Operating System: Five Layers That Make AI Actually Work

By Guus Witjes · Step Ahead AI · 22 May 2026 · 14 min read

Last week, while I was at lunch, one of my agents worked through seven of our digital marketing process flows. In the format I use myself. With explicit blocks marking what it did not know. And with three observations I had not asked for. I came back, reviewed, adjusted, shipped. It runs because the structure underneath it is built.

Most people never get there. They ask AI to draft something, and what comes back is technically correct and somehow completely off. The reflex is to blame the prompt. Write a better one. Add more detail. None of it really works, because the prompt was never the problem.

This piece is about the discipline that makes autonomous AI work. There are two names for it in the field right now, and they belong together. Context engineering is the work of getting the right information into your AI. Harness engineering is the work of constraining and validating what your AI does. Together, they form what I call a Personal AI Operating System. Personal, because it is yours. AI, because it runs on language models. Operating system, because it is a stack of layers that work together, not a single tool you bought.

Why does AI not do what you want?

AI fills the gaps in your brief with plausible guesses, the same way any colleague would. The problem is rarely the prompt. The problem is the missing context behind the prompt.

Take the colleague metaphor seriously. You ask someone to take over a piece of work. You think you tell them everything. But there are a dozen steps in your head that feel so obvious to you that you forget to mention them. The audience. The thing you tried last year that did not work. The unspoken rule that this client never gets a cold opener. Your colleague does the work, hits a gap, and fills it. Not because they are lazy, but because they have to.

AI does exactly the same thing. People call it hallucination, but it is not a bug. It is what any system does when context is missing. The more context is missing, the more it has to invent.

There is a second trap worth naming early. Most people, once they hear the gap-closing story, decide the answer is more context. They dump every document they have into the chat and feel they have done their job. They have not. More context is not better context. If your documents contradict each other, if half of them are out of date, the AI now has to choose between them. And again, it guesses.

Context engineering is therefore not about volume. It is about getting the right essentials in, without contradictions, in a form that is easy for the system to use. Less, but sharper. Aligned, not exhaustive.

Key takeaway

AI does not hallucinate because it is broken. It hallucinates because you forgot to tell it half the brief. Closing the gaps is the work, not writing cleverer prompts.

Why do most AI projects stall before they start?

Most AI projects stall because they jump to automation before the information layer underneath is organised. Digital transformation happens in five stages, and stages one and two are not optional.

I first saw this model from Esther van Popta. She did not claim to have invented it, and I have not traced the original source, but the model travels well because it captures something true.

Stage 1, Digitise. Get knowledge out of your head and out of paper. Into the computer in a form another system can read.
Stage 2, Organise. Put it where it can be found. Folders, tags, structure. Shareable, retrievable, current.
Stage 3, Automate. Now that the inputs exist and are tidy, let a system handle the repeat work.
Stage 4, Streamline. Look at the automated flow and trim it. Fewer handoffs, cleaner edges, less friction.
Stage 5, Transform. Rebuild how the place actually works. Steer on output, not on hours.

Figure: the five stages, annotated. Digitise: most are here. Organise: most are here. Automate: skip-ahead trap. Streamline: refine. Transform: everyone wants this. Caption: The gap most organisations live in: they want stage 5, they have not finished stages 1 and 2.

Every organisation I talk to wants to be at stage five. Output-driven, AI-augmented, fluid. Almost none of them have finished stages one and two.

That is the gap. People reach for AI at stage three while their information layer is still a mess. They put a chatbot on top of a SharePoint nobody maintains. They train an agent on a knowledge base that contradicts itself. And then they are disappointed.

The model also has a personal version. Many people cannot fully describe what they do all day. Not because they are slacking, but because nobody ever asked them to write it down. Work is built around presence and habit, not around a clearly described output. Stage one of digital transformation, at the individual level, is "write down what you actually do, why, and what it delivers." Most people have not done this. And without it, AI has nothing to anchor to.

What is context engineering and harness engineering?

Context engineering describes what your AI knows. Harness engineering controls what your AI is allowed to do. Together they form a Personal AI Operating System.

Context engineering is the practice of doing stages one and two of digital transformation for your work and your AI together. You describe what you do, the rules that apply, the knowledge you rely on, and the outputs you owe, in a form your AI can read. Once that exists, your AI can act with the right information, instead of guessing.

Harness engineering is the practice of building the control structure around an agent so it can act autonomously without going off the rails. The term comes from an equation the field has settled on: Agent = Model + Harness. The model is the language model, which you do not control. The harness is everything else, which you do. Instruction files. Permission scoping. Sensors that check the output. Skills that encode workflows. Hooks that enforce rules deterministically.

	Context engineering	Harness engineering
Question	What does the AI know?	What is the AI allowed to do?
Output	Identity, memory, knowledge	Skills, hooks, scoped tools
Failure mode	Hallucinates from gaps	Acts beyond its boundaries
Enough for	Chat work, you review	Autonomy while you sleep
Discipline of	Writing things down	Engineering the guardrails

You need both. Context engineering on its own is enough for chat work, where you sit at the keyboard and review every response. The moment you want agents that act for you while you sleep, draft for you, triage your inbox, file your tasks, post your content, you need a harness too. The harness is what makes autonomy safe.

Three things matter most.

One artefact, two readers. You are not building documentation for humans, and you are not building prompts for AI. You are building one body of work that serves both. The voice guide that onboards a new colleague is the same voice guide that keeps the AI on tone.

Boundaries make autonomy safe. A good system is not one where the AI is more clever. It is one where the AI is more constrained. You decide what it may do, what it must do, and what it must never do. Inside those boundaries, you grant real autonomy. This is the shift most people miss.

Act on context, instead of guessing. Less hallucination, less back-and-forth, less re-explaining tomorrow what you already said today. The system gets quieter, and the output gets sharper.

You will see these ideas framed in different vocabulary elsewhere. People also call it Context, Data, Intelligence, Automation and Build. Others call it Personal Knowledge Management, or building a Second Brain. Some call it AI readiness, AI literacy or AI fluency. The naming does not matter much. The principle does. Do not start with the tool. Start with the structure underneath it.

What are the five layers of a Personal AI Operating System?

The five layers are identity, memory, skills, hooks, and tools. The first two are context engineering. The last three are harness engineering. Each layer solves a specific problem, and skipping any of them weakens the rest.

Figure: the five layers, grouped by discipline. Context engineering: Identity (who I am, what I do, where to find what) and Memory (inbox, stable rules, goals). Harness engineering: Skills (reusable workflows), Hooks (deterministic enforcement), and Tools (scoped connections to the outside world). Caption: Build order: inside out. Context first, harness second, tools last.

What is the identity layer?

The identity layer is the document the AI reads first every session. It describes who you are, what rules apply, and where to find what. Most setups call this file CLAUDE.md or AGENTS.md. The name does not matter, the function does.

A working identity file is short. Peer-reviewed work shows that large language models use long contexts unevenly, although no study isolates instruction-file length as a variable. "Keep your identity file tight" is therefore an applied inference from that body of research rather than a measured result, and it is the same inference Anthropic's own engineering team makes when they recommend compaction and just-in-time retrieval over front-loading everything into the system prompt. In practice this lands somewhere around 200 to 300 lines, with a small set of hard rules, and everything else pushed into referenced files that load on demand.

A weak identity file has only descriptive content. A strong one has five instruction types: descriptive (what is), prescriptive (do this), prohibitive (never do this), explanatory (because), and conditional (when X, read Y).
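As a rough illustration, the length guideline and the five instruction types can be checked mechanically. The sketch below is a crude keyword heuristic, not a validated method: the marker words, the 300-line ceiling, and the name `lint_identity_file` are all assumptions made for this example.

```python
# Hypothetical lint for an identity file. The marker words are illustrative
# guesses; a real file may phrase each instruction type differently.
MARKERS = {
    "descriptive": ("i am", "we are", "my role"),
    "prescriptive": ("always", "must", "use "),
    "prohibitive": ("never", "do not"),
    "explanatory": ("because",),
    "conditional": ("when ", "if "),
}
MAX_LINES = 300  # assumed ceiling, taken from the 200-300 line rule of thumb


def lint_identity_file(text: str) -> list[str]:
    """Return a list of human-readable warnings; empty means no findings."""
    warnings = []
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) > MAX_LINES:
        warnings.append(f"too long: {len(lines)} non-blank lines (max {MAX_LINES})")
    lowered = text.lower()
    for kind, words in MARKERS.items():
        if not any(word in lowered for word in words):
            warnings.append(f"no {kind} instruction found")
    return warnings


sample = """I am a marketing lead at a university.
Always start a brief with the audience.
Never publish without a banlist check, because banned words hurt trust.
When drafting a LinkedIn post, read voice-guide.md first.
"""
findings = lint_identity_file(sample)
```

A check like this only tells you that a type is missing, not whether the instructions are any good. The writing itself stays manual.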

One warning. Do not generate this file with AI. Research suggests AI-written context files reduce task performance by around 3% and raise cost by 20% or more. The model writes for completeness, not signal-to-noise. Write it yourself, even if the first version is ugly.

What is the memory layer?

The memory layer fixes the fact that AI has no memory by default. Every session is session one unless you build the layer that holds raw notes, stable rules, and goals across sessions.

Without this layer, you keep re-teaching the AI the same lessons. With it, every correction sticks. Raw notes go into an inbox. Once a week you review the inbox and promote anything that has come up three or more times into its own file, with a one-line "why" and a one-line "how to apply." Goals sit in their own folder, so the AI weighs every piece of work against what you are actually building.
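The weekly review can be supported by a small script that counts repeats. This is a minimal sketch, assuming each inbox note carries a short topic tag (the text above does not say how repeats are recognised, so the tag is my assumption). The script only flags candidates; whether a candidate becomes a rule remains your decision.

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # "three or more times", per the weekly review rule


def promotion_candidates(inbox: list[tuple[str, str]]) -> list[str]:
    """Return topics seen at least PROMOTION_THRESHOLD times, most frequent first.

    Each inbox entry is (topic, note). The function only flags candidates;
    deciding whether one becomes a rule is left to the human reviewer.
    """
    counts = Counter(topic for topic, _note in inbox)
    return [topic for topic, n in counts.most_common() if n >= PROMOTION_THRESHOLD]


# Invented example notes, for illustration only.
inbox = [
    ("cold-opener", "Client A never gets a cold opener"),
    ("date-format", "Use 22 May 2026, not 05/22/26"),
    ("cold-opener", "Removed cold opener again"),
    ("cold-opener", "Third time: no cold opener for client A"),
    ("date-format", "Wrong date format in report"),
]
candidates = promotion_candidates(inbox)
```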

What is the skills layer?

Skills are the reusable workflows that govern the repeat work in your life: how you draft a LinkedIn post, how you triage your inbox, how you prepare for a coaching session. Each one captured once, then reused.

In harness language, skills are the agent's playbook. The skill removes the surface where the AI has to figure out the steps. It just runs them. Ten minutes of explaining becomes one command.

Do not build a skill for a one-off task. Build it when you notice you have explained the same workflow three times. That is the signal it deserves to be encoded.
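One way to picture a skill is as a named list of steps that gets expanded on demand. The sketch below assumes a simple in-memory registry; the `Skill` class, the `linkedin-post` name, and the steps are invented for illustration and do not reflect any particular platform's skill format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """A named, reusable workflow: captured once, then invoked by name."""
    name: str
    steps: tuple[str, ...]

    def render(self, subject: str) -> str:
        """Expand the skill into numbered instructions for one concrete task."""
        numbered = [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        return "\n".join([f"Skill: {self.name} | Subject: {subject}", *numbered])


# Hypothetical registry; the skill name and steps are invented for illustration.
SKILLS = {
    "linkedin-post": Skill(
        name="linkedin-post",
        steps=(
            "Read the voice guide",
            "Open with the audience's problem",
            "Draft in under 200 words",
            "Run the banlist check",
        ),
    )
}

prompt = SKILLS["linkedin-post"].render("context engineering")
```

The point of the structure is that the steps live in one place. Change them once and every future run picks up the change.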

What is the hooks layer?

Hooks are the rules that must always hold. Not "should" but "must." If a rule lives only in the identity layer, the AI follows it most of the time and misses it sometimes. If it lives in a hook, the system enforces it deterministically.

Banlist words, security checks, file paths that must never be touched. Hooks fire automatically. They do not ask. In harness language, hooks are the sensors and the enforcers: they validate output before it ships and they block what should never reach the world.

Hooks should be few. If a rule is "preferred but not strict," leave it in the identity layer as guidance. Hooks are reserved for the things that must hold under all conditions.
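To make "deterministic" concrete, here is a minimal sketch of a pre-publish hook covering two of the examples above: a banlist and a protected path. The banned words, the protected folder, and the function names are assumptions for this example. How a hook is actually registered depends on your platform and is not shown.

```python
import re
from pathlib import PurePosixPath

BANLIST = {"synergy", "game-changer"}          # hypothetical banned words
PROTECTED = (PurePosixPath("university/hr"),)  # hypothetical protected folder


def check_text(text: str) -> list[str]:
    """Return the banned words found in text (case-insensitive, whole words)."""
    words = set(re.findall(r"[a-z][a-z-]*", text.lower()))
    return sorted(BANLIST & words)


def check_path(path: str) -> bool:
    """Return True if writing to path is allowed, False if it is protected.

    Limitation: paths are compared as written; '..' segments are not resolved.
    """
    target = PurePosixPath(path)
    return not any(target.is_relative_to(root) for root in PROTECTED)


def pre_publish_hook(text: str, path: str) -> None:
    """Raise instead of warn: a hook blocks, it does not ask."""
    hits = check_text(text)
    if hits:
        raise ValueError(f"banned words: {', '.join(hits)}")
    if not check_path(path):
        raise PermissionError(f"protected path: {path}")
```

Because the check is ordinary code, it gives the same answer every time, which is the difference from a rule that only lives in the identity file.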

What is the tools layer?

Tools are the scoped connections to the outside world: your calendar, your task system, your email, your health data. Often connected through MCP, a standard that lets AI talk to your systems. Each tool is a permissioned capability, and each agent only gets the tools it actually needs.

Permission scoping is one of the most important harness moves. It shrinks the surface for mistakes. A daily briefing agent gets read access to calendar, tasks, and recovery data, and cannot write anywhere. A task agent gets write access to one task list at a time, never to several at once.
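Permission scoping can be sketched as a deny-by-default lookup table. The agent names, tool names, and the `call_tool` wrapper below are hypothetical; a real setup would enforce this in the platform's own permission settings or in the MCP server configuration rather than in a Python dictionary.

```python
# Hypothetical agent and tool names; each agent lists the only actions it may take.
PERMISSIONS: dict[str, set[tuple[str, str]]] = {
    "daily-briefing": {("calendar", "read"), ("tasks", "read"), ("recovery", "read")},
    "task-agent": {("tasks", "read"), ("tasks", "write")},
}


def is_allowed(agent: str, tool: str, action: str) -> bool:
    """Deny by default: unknown agents and unlisted (tool, action) pairs fail."""
    return (tool, action) in PERMISSIONS.get(agent, set())


def call_tool(agent: str, tool: str, action: str) -> str:
    """Gate every tool call through the permission table before it runs."""
    if not is_allowed(agent, tool, action):
        raise PermissionError(f"{agent} may not {action} {tool}")
    return f"{agent}: {action} {tool} ok"  # placeholder for the real call
```

The design choice that matters is the default. Anything not listed is refused, so a new tool is invisible to every agent until you grant it on purpose.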

Why does the build order matter?

Build the inside first, then connect to the outside. Context first (identity, memory). Then the harness (skills, hooks). Only then, tools.

The reason is simple. If you connect your AI to your task system before you have written down how you work, the AI reads your tasks and immediately makes up its own approach. If you connect it to your email before you have a voice guide, it replies in a tone that is not yours. Tools amplify whatever is underneath them. Connect tools to chaos, and you have amplified chaos.

People skip this all the time. They install five tool integrations on a setup with an empty identity file, and then wonder why the AI feels generic. The order matters.

Key takeaway

Five layers, inside out. Identity and memory are context. Skills, hooks, and tools are harness. Build them in order. Skipping any layer weakens every layer above it.

How do you start building your own AI OS?

You start by writing yourself down. Open an empty document and answer four questions: what you actually do, which work is repeat and which needs fresh thinking, which outputs matter, and which rules you always apply.

What do I actually do in a typical week? Not the job title. The activities. Example: I review three campaign briefs, I run two one-to-ones, I write one external post, I prep one workshop.
Which of these are repeat work, and which need fresh thinking? Repeat work is where AI helps most. Example: review and one-to-one prep are repeat. The workshop design is fresh thinking.
What are the outputs that matter, and who relies on them? Outputs, not meetings. Example: a weekly content plan that my team relies on, a monthly KPI report that my manager reads.
What rules do I always apply, even when I do not think about them? The unspoken ones. Example: I never publish without a banlist check. I never reply to a client mail before lunch. I always start a brief with the audience.

That document is the seed of your identity layer. The first version is ugly. That is fine. Most of the value is in the act of writing it, because it forces you to articulate things that were vague. After a week of using it, you see what is missing, what is wrong, and what needs to move into deeper layers. You add. You prune. You move on.

From there, the memory layer grows. Every time you correct your AI on something it should have known, that correction is a candidate to write down. Not in a chat log. In a memory file the system reads every session. After two weeks of doing this consistently, you stop having the same conversation twice.

That is your context engineering layer. With just those two layers in place, your chat-based AI already behaves a lot better than what most people experience.

Now the harness comes in. Skills come next, but only for work you do more than once. Hooks come after, and they should be few. Only after this internal stack is in place do you reach for tool connections. By that point you know what you want them for. You are not collecting integrations. You are extending a system that already works.

The honest version of this stage takes weeks, not days. Most people who try it give up early because the gains are not yet obvious. Push through. The compounding starts after a few weeks.

How does this scale from yourself to your team?

Once your own stack works, you carry the same logic into a team. At team level the questions shift from "what do I do" to "what do we share." Voice, conventions, handoffs, quality standards. The rules that hold across people, regardless of who is doing the task today.

The team's identity layer is not the sum of everyone's individual layers. It is the layer that defines what makes the team the team. This is also where most teams discover their first painful gap. They thought they had shared conventions. They had not. They thought everyone knew the brief format. Everyone had a slightly different version. The act of writing the team layer is the act of finding and resolving those small contradictions. It is the meeting nobody wants to have, and the one that matters most.

At team level, skills become assets that are shared. A workflow one person builds becomes a workflow the whole team uses. Onboarding shortens. New hires get the team's skills handed to them on day one, and reach productivity in a fraction of the time.

At organisation level, the same pattern repeats one floor up. Strategy, brand, compliance, the things every team must respect. Most organisations are not ready for this level. Not because they lack the technology, but because they have not done the writing-down work at the two levels below it: the individual and the team. The companies that move fastest with AI are not the ones with the biggest budgets. They are the ones whose people have already done their own digital transformation, individually and in their teams.

What makes the setup compound instead of rot?

A few architectural choices decide whether your stack compounds or quietly rots over a few months. Four matter most: strict context isolation, a human-in-the-loop promotion decision, agents scoped to one job, and mistakes that become permanent rules.

Strict context isolation between worlds. I run three worlds in the same workspace: university, side business, personal. Each has its own identity file, its own voice, its own rules. The root identity file points the AI at the right world depending on what I am working on. When the AI is in side-business mode, it does not see university data. When it is in personal mode, it does not see either of the others. This is the guardrail that stops work data from bleeding into external content.
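The isolation rule can be expressed as a path filter. This is a sketch under an assumed folder layout (one root folder per world, which I have called `worlds/...`); the file names are invented, and real enforcement would sit in whatever decides which files the AI is allowed to read.

```python
from pathlib import PurePosixPath

# Hypothetical folder layout: one root per world inside a shared workspace.
WORLDS = {
    "university": PurePosixPath("worlds/university"),
    "side-business": PurePosixPath("worlds/side-business"),
    "personal": PurePosixPath("worlds/personal"),
}


def visible_files(active_world: str, files: list[str]) -> list[str]:
    """Keep only the files that live under the active world's root."""
    root = WORLDS[active_world]  # KeyError on an unknown world is intentional
    return [f for f in files if PurePosixPath(f).is_relative_to(root)]


workspace = [
    "worlds/university/identity.md",
    "worlds/university/students.csv",
    "worlds/side-business/identity.md",
    "worlds/side-business/voice.md",
    "worlds/personal/training-log.md",
]
in_scope = visible_files("side-business", workspace)
```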

There is research behind keeping each layer tight. Long contexts degrade in three predictable ways, well before you hit the model's window limit.

Lost in the middle. Information buried in long files gets skimmed. The model reads the start and the end carefully. Everything between paragraph twelve and paragraph thirty-six of a long file becomes fuzzy.
Attention decay. The five-hundredth token of a context gets less weight than the fiftieth. The five-thousandth barely registers.
Context poisoning. Irrelevant details crowd out what matters. Put lunch preferences next to work methodology and the model treats them with equal weight.

Practical fix. Put your most critical identity information, your voice, your hard constraints, your three or four non-negotiable rules, in the first part of your identity file. Everything else lives in referenced files that load on demand. Compact, retrieve just in time, keep each layer tight.
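Loading referenced files on demand can be pictured as a small routing table. The sketch below assumes keyword matching on the task description, which is my simplification; in practice the routing is usually written as "when X, read Y" instructions in the identity file and the model does the matching. All file names are hypothetical.

```python
# Hypothetical routing table: trigger keyword -> referenced file to load.
ROUTES = {
    "linkedin": "skills/linkedin-post.md",
    "brief": "knowledge/brief-format.md",
    "report": "knowledge/kpi-report.md",
}
ALWAYS_LOADED = ["identity.md"]  # the short file read first in every session


def files_for_task(task: str) -> list[str]:
    """Return the identity file plus only the references the task triggers."""
    lowered = task.lower()
    extra = [path for keyword, path in ROUTES.items() if keyword in lowered]
    return ALWAYS_LOADED + extra


loaded = files_for_task("Draft a LinkedIn post about the new campaign brief")
```

A task that triggers no route loads only `identity.md`, which is why the critical rules belong there and not in a referenced file.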

The decision to promote stays human. Some setups auto-promote a pattern from the inbox to a stable rule after it has appeared three times. I do not. A script flags clusters, but I decide whether a cluster is a real rule or a coincidence. Auto-promotion gives the AI rules I never wrote, and that is a quiet way to lose control of voice.

Agents sit on top of skills, scoped to one job. On top of around twenty skills sit roughly a dozen small specialist agents. A content agent that knows the side-business voice. A task agent that routes work into the right list. A daily briefing agent that combines calendar, tasks and recovery data into a single morning page. Each agent has its own reading list, not a generic shared one. Each agent gets only the tools it actually needs.

Mistakes become permanent rules. When the AI makes an error or I correct a wrong assumption, that correction does not stay in the chat. It gets written to memory with date, the trigger condition, what happened, and the rule. The system builds immunity to repeating the same mistake.
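A correction record with those four fields could look like the sketch below. The class name, the text layout, and the example values (including the date) are invented for illustration; the only thing taken from the text above is the set of fields.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Correction:
    """One memory entry: date, trigger condition, what happened, the rule."""
    when: date
    trigger: str
    what_happened: str
    rule: str

    def to_entry(self) -> str:
        """Render as plain text that can be appended to a memory file."""
        return (
            f"[{self.when.isoformat()}] {self.rule}\n"
            f"Trigger: {self.trigger}\n"
            f"What happened: {self.what_happened}\n"
        )


# Invented example values.
entry = Correction(
    when=date(2026, 5, 18),
    trigger="Drafting an email to a client",
    what_happened="The draft opened with a cold opener",
    rule="Never use a cold opener for this client",
).to_entry()
```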

Without these choices, even a well-built static stack degrades. Static setups fail in three predictable ways: context bloat (adding without pruning), context staleness (trusting cached context that no longer reflects reality), and context fragmentation (rules that contradict each other across files).
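Two of these three failure modes are easy to monitor with a periodic audit. The sketch below flags stale and bloated files, assuming you track a last-reviewed date and a line count per file; the 90-day and 300-line thresholds are my assumptions, not figures from research. Fragmentation, meaning rules that contradict each other, is not covered here because it needs a human read.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=90)   # assumed review interval
MAX_LINES = 300                # assumed size ceiling per file


def audit(files: dict[str, tuple[date, int]], today: date) -> dict[str, list[str]]:
    """Map each problem file to its findings.

    `files` maps a path to (last reviewed date, line count).
    """
    report: dict[str, list[str]] = {}
    for path, (reviewed, line_count) in files.items():
        findings = []
        if today - reviewed > MAX_AGE:
            findings.append("stale")
        if line_count > MAX_LINES:
            findings.append("bloated")
        if findings:
            report[path] = findings
    return report


# Invented example data.
files = {
    "identity.md": (date(2026, 5, 1), 240),
    "voice.md": (date(2025, 11, 3), 120),
    "inbox.md": (date(2026, 5, 20), 900),
}
report = audit(files, today=date(2026, 5, 22))
```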

What are the nine patterns of a setup that compounds?

The nine patterns are the architectural choices that decide whether your setup gets sharper over time or quietly rots. Use them as a diagnostic: when something feels off, scan the list and find the one you have not done yet.

P1, Progressive disclosure. Your identity file is a routing table, not a content dump. It points to deeper files. The AI loads only what is relevant to the task.
P2, Knowledge files as agent memory. Each agent has its own reading list. The relevant context is pre-loaded as working memory before the agent acts.
P3, Hooks as deterministic guardrails. Instructions are probabilistic, hooks are guaranteed. Anything that must always hold goes in a hook.
P4, Three-layer memory. Raw observations (episodic), stable rules (semantic), goals. Missing any one breaks the loop.
P5, Compound learning loop. Capture during work, scan at the end of the week, promote into stable memory when a pattern repeats and you confirm it is a rule.
P6, Self-correction protocol. Mistakes become permanent rules. Each rule file uses the format: rule, then Why, then How to apply.
P7, Context surfaces. Every place the AI can read from is a context surface. A well-designed surface serves humans and AI both.
P8, Spec-first. Before a multi-session build, write a short plan. Problem, approach, deliverables, acceptance, risks. The agent reads the spec before writing the first line.
P9, Agent tool restriction. Each agent gets only the tools it needs. Less surface, fewer off-task tool calls, cleaner behaviour.

You will not implement all nine on Monday. Use the list as a checklist when your setup feels off.

What changes when your Personal AI OS works?

The aim of a Personal AI OS is to flip the ratio of repeat work to fresh thinking. Without a system, knowledge work is roughly 80% repeat and 20% fresh. With the stack in place, the ratio roughly flips: about 30% repeat and 70% fresh.

Reports that look like last quarter's reports. Emails that read like emails you have sent before. Meeting prep that follows the same pattern every time. The repeat part eats your week. The system absorbs it because it now knows how you do it. Your job shifts from doing the work to directing the work. From player to coach. From writer to editor. From operator to manager of a small team of agents that handle the things you used to do by hand.

This is not hypothetical. I have a social media agent helping draft posts at the university, saving the team around four hours a week, close to two hundred hours a year. I have a triathlon coaching agent handling my training. I have a content engine that runs without me holding it together. I am not exceptional. I just did the work of writing things down, in layers, in build order, and refused to skip stages one and two.

And throughout, the human stays in the lead. Not in the loop. In the lead. The AI is convincing even when it is wrong, so your judgement is still the gate. You set the direction. The system executes. You review and ship.

What can you do this week to start?

You can build a starter Personal AI Operating System in four weeks, in build order, inside out. Roughly one layer per week, starting with writing yourself down; hooks and tools share week four.

Week 1, write yourself down. Open an empty document. Answer the four questions above. Get it ugly and incomplete out of your head into a file your AI can read. Expect: a messy first version. That is the point.
Week 2, start a memory inbox. Every time you correct your AI on something it should have known, write that correction in one file. One line, dated. End of the week, scan it. Anything you have corrected three times becomes its own rule. Expect: you stop having the same conversation twice.
Week 3, encode your first skill. Pick one repeat workflow that costs you the most time. Write down the steps as a short standard operating procedure. Give it a name. Expect: ten minutes of explaining becomes one command.
Week 4, add one hook, then one tool. Pick one rule that must always hold and write it as a deterministic check. Then connect one external tool you have a real workflow for. Not five. One. Expect: the system enforces a rule you used to enforce by hand.

After four weeks you have a working stack, end to end. Identity, memory, one skill, one hook, one tool. From there it grows by accretion. Each new repeat workflow becomes a skill. Each new correction becomes a memory entry. Each new hard rule becomes a hook. The setup compounds.

Do not start with the tool. Do not start with the prompt. Start with the structure underneath.

Key takeaway

Four weeks. Identity, memory, one skill, one hook, one tool. From there it grows by accretion. The setup compounds because every correction sticks and every workflow gets encoded once.

If you want a more practical starting point for the skills layer, the BUILD framework covers how to turn one workflow into your first AI team member. The BUILD Playbook is the step-by-step version with copy-paste prompts.

This piece is my own synthesis, shaped by building this stack in practice across three separate worlds of work. The five-stage digital transformation model was first shown to me by Esther van Popta. The context engineering concept and the five-layer setup draw on work by Iwo Szapar. On context degradation, the Stanford "lost in the middle" paper (Liu et al., 2024), NVIDIA's RULER benchmark (Hsieh et al., 2024), and Chroma's Context Rot research (2025) do the actual work. Anthropic's engineering team reaches the same applied conclusions.

Frequently asked questions

What is a Personal AI Operating System?

A Personal AI Operating System is a five-layer setup that combines context engineering (identity, memory) with harness engineering (skills, hooks, tools). It is the structure underneath your AI tools that decides whether they produce useful output or generic noise. Built inside out, it lets you grant real autonomy to AI agents while staying in the lead.

What is the difference between context engineering and harness engineering?

Context engineering describes what the AI knows: identity, memory, knowledge files. Its failure mode is hallucination from gaps. Harness engineering controls what the AI is allowed to do: skills, hooks, scoped tools. Its failure mode is agents acting beyond their boundaries. Context is enough for chat work where you review every response. Harness is what makes autonomy safe.

How long does it take to build a Personal AI Operating System?

The four-week starter stack covers identity, memory, one skill, one hook, and one tool. Most people see real compounding after six to eight weeks of consistent use. The honest version takes weeks, not days. The gains are not obvious in week one or two, which is why most people give up too early.

Why should you not generate your CLAUDE.md with AI?

Research suggests AI-written context files reduce task performance by around 3% and raise cost by 20% or more. The model writes for completeness, not signal-to-noise. Your identity file is the document the AI reads first every session. Write it yourself, even if the first version is ugly. The act of writing it is most of the value.

What tools do you need to build a Personal AI OS?

You need an AI platform that supports persistent instructions, file uploads, and ideally hooks and skills. Claude Code, Claude Projects, ChatGPT Custom GPTs, and Gemini Gems all support the context layers. Hooks and deterministic enforcement require a developer-oriented setup like Claude Code or a custom wrapper. The tool choice comes last, after you have written yourself down.

How is this different from the BUILD framework?

The BUILD framework is how you create one AI team member for one task. It covers the skills layer of the Personal AI OS in depth. The Personal AI Operating System is the full stack underneath: identity, memory, skills, hooks, and tools, scaled from yourself to your team to your organisation. BUILD is one ingredient. The Personal AI OS is the kitchen.
