greg@gregsplace ~ / writing/index.md

ai / engineering

Shipping apps solo with an AI agent fleet

Agents for throughput, a human as the reliability gate — how one operator ships iOS and Android apps to production.

The setup

Over the last few months I shipped several mobile apps to the App Store and Google Play on my own — NFN Connect, MyPepTracker, myFaxxer, and a couple of internal tools. Not by typing faster. By running a small fleet of AI coding agents and treating myself as the part that doesn't scale: judgment and review.

The fleet is deliberately mixed. Claude (Opus) does the reasoning-heavy work — architecture, gnarly debugging, anything where being wrong is expensive. A second, cheaper model handles the mechanical grunt work: boilerplate, refactors, test scaffolding, the tenth near-identical screen. They run on separate subscriptions on purpose — spend the expensive tokens where judgment matters, the cheap ones where it doesn't.

Division of labor

Reasoning model — plans, designs, reviews, and untangles the hard bugs. The one I trust to be right under ambiguity.
Worker model — runs in parallel across isolated git worktrees, one task per branch, so a dozen independent changes happen at once without stepping on each other.
Me — write the specs, read every diff, run the build, decide what merges. The only thing in the loop that is actually accountable.
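
A minimal sketch of how the one-task-per-branch setup could be scripted. The `agent/` branch prefix, the `../worktrees` directory, and the helper names are assumptions for illustration, not the actual harness. The functions only build the `git worktree add` commands; running them is left to the caller.

```python
def slugify(task: str) -> str:
    """Turn a task title into a branch-safe slug."""
    cleaned = "".join(c.lower() if c.isalnum() else "-" for c in task)
    return "-".join(part for part in cleaned.split("-") if part)


def worktree_commands(tasks: list[str], root: str = "../worktrees", base: str = "main") -> list[list[str]]:
    """Build one `git worktree add` command per task, each on its own new branch."""
    commands = []
    for task in tasks:
        slug = slugify(task)
        branch = f"agent/{slug}"   # hypothetical naming scheme
        path = f"{root}/{slug}"    # one checkout directory per task
        # git worktree add -b <new-branch> <path> <commit-ish>
        commands.append(["git", "worktree", "add", "-b", branch, path, base])
    return commands


if __name__ == "__main__":
    for cmd in worktree_commands(["Fix login crash", "Add settings screen"]):
        print(" ".join(cmd))
```

Each worker then runs inside its own directory, so two agents never share a working tree or an index, and merging stays a separate, human-controlled step.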

The rule that makes it safe: I'm the reviewer

The agents are capable but not reliable the way a senior engineer is reliable, so the workflow is built around that gap. Agents work on branches and never push to main. I read the diff, I run the tests, I verify the claim — "tests pass" gets checked, not trusted. Nothing ships because an agent said it was done. It ships because I confirmed it.

When I catch a mistake worth not repeating, I don't just fix it — I write it down as a durable lesson the worker loads next time. The model stays stateless; the accumulated competence lives in the harness, and the fleet gets quietly better at my particular codebases over time.
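
One way the "durable lesson" idea might look in code, as a sketch under assumptions: lessons live in a JSON Lines file, each carries free-form tags, and the matching ones are prepended to the next task spec. The file layout, tag scheme, and function names are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def record_lesson(path: Path, lesson: str, tags: list[str]) -> None:
    """Append one lesson to the JSON Lines file."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"lesson": lesson, "tags": tags}) + "\n")


def load_lessons(path: Path, tag: str) -> list[str]:
    """Return every recorded lesson that carries the given tag."""
    if not path.exists():
        return []
    lessons = []
    for line in path.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        if tag in entry["tags"]:
            lessons.append(entry["lesson"])
    return lessons


def build_prompt(spec: str, lessons: list[str]) -> str:
    """Prepend the relevant lessons to a task spec; the model itself keeps no state."""
    if not lessons:
        return spec
    bullets = "\n".join(f"- {item}" for item in lessons)
    return f"Lessons from earlier reviews:\n{bullets}\n\nTask:\n{spec}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "lessons.jsonl"
        record_lesson(store, "Run the full test suite before reporting success", ["ios"])
        print(build_prompt("Add a settings screen", load_lessons(store, "ios")))
```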

Catching the hallucinations

For anything high-stakes I run an audit tournament: several models, each in its own sandbox, all reviewing the same code blind, then a final pass that separates real findings from confident nonsense. Models disagree, and the disagreement is the signal: one model's false positive is usually another model's "no, that's fine." Majority votes and adversarial checks kill most of the hallucinated bugs before they waste my time.
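
The majority step can be sketched roughly as follows. This assumes each model's findings have already been normalized to comparable strings (in practice that normalization is the hard part), and the model names and findings are made up for illustration.

```python
from collections import defaultdict


def majority_findings(reports: dict[str, set[str]], threshold: float = 0.5) -> dict[str, list[str]]:
    """Split findings into those reported by more than `threshold` of the reviewers and the rest."""
    votes = defaultdict(set)
    for model, findings in reports.items():
        for finding in findings:
            votes[finding].add(model)
    needed = len(reports) * threshold
    confirmed = sorted(f for f, who in votes.items() if len(who) > needed)
    disputed = sorted(f for f, who in votes.items() if len(who) <= needed)
    return {"confirmed": confirmed, "disputed": disputed}


if __name__ == "__main__":
    reports = {  # hypothetical reviewer output
        "model_a": {"auth.py: token expiry never checked", "db.py: query built by string concat"},
        "model_b": {"auth.py: token expiry never checked"},
        "model_c": {"auth.py: token expiry never checked", "ui.py: possible race on refresh"},
    }
    print(majority_findings(reports))
```

Disputed findings are not discarded here; they are the ones that go to the adversarial pass and, finally, to the human reviewer.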

Where it breaks

Vague specs produce vague code. The agent can't see inside my head; a loose task comes back loose. Writing tight, self-contained specs is most of the actual work.
Confident wrongness. The failure mode isn't "I can't do it," it's "done!" — with a subtle bug and a passing-looking test. This is why the human gate is non-negotiable.
It does not replace knowing the system. I can only review what I understand. The day I stop understanding the code is the day this stops working.

What actually made it work

Not the models — the harness around them. Isolated worktrees so parallel work can't collide. CI that builds and signs on a real Mac runner, so "it compiles" means it compiles. Verification baked into the loop instead of bolted on afterward. The agents provide the throughput; the structure provides the trust.

Agents for volume, structure for safety, a human who reads every diff. Solo doesn't mean unsupervised — it means I'm the only supervisor.

security / governance

Board email the IT team can't read

S/MIME end-to-end with board-owned keys — structural privacy, not policy privacy.

The problem with "just use Google Workspace"

When a company runs on Google Workspace or Microsoft 365, the domain admin can read any email. That's not a bug — it's the product. Admins can delegate mailbox access, run eDiscovery queries, and pull any message in the org. Cloud providers also comply with subpoenas, government requests, and their own internal audit processes.

For most email this is fine. For board communications it's a governance problem. Compensation committee discussions, M&A deliberations, whistleblower concerns, and audit committee findings are precisely the categories of communication that should not be accessible to the employee being evaluated, the executive whose pay is being set, or the CTO who manages the infrastructure those emails live on.

The threat model

Admin-level access. Whoever holds super-admin credentials can read any mailbox. In most orgs that's IT leadership — the same people sometimes under board scrutiny.
Cloud provider legal exposure. A subpoena to Google or Microsoft reaches board communications. A subpoena for mail held on your own EC2 instance has to come to you first.
Data-at-rest breach. Cloud provider compromise exposes plaintext mail. Your isolated instance with encrypted EBS and no internet-facing data path has a much smaller blast radius.
Insider investigation. If the subject of a board investigation controls the email infrastructure, you have a structural conflict. Architecture should remove that conflict, not rely on the subject's good behaviour.

The architecture

Three components. Each one removes a separate failure mode.

Isolated EC2 instance. Private VPC subnet. No public IP on the instance. TLS-only SMTP/IMAP ingress through a dedicated security group. CloudTrail on. The instance serves only board-tier addresses; it is not the company's general mail server. EBS volume encrypted at rest. Access to the instance itself is IAM-controlled and logged, with reports going to the board chair, not IT.
S/MIME enforcement at the MTA. The mail server is configured to reject unencrypted messages between board member addresses. If a message arrives that is not S/MIME-encrypted, it bounces. The server can check only the outer S/MIME structure, because it cannot decrypt the content; the signature inside is verified by the recipient's mail client. There is no opt-out. This is a policy enforced in software, not in a handbook.
Board-owned keys. This is the critical part. Each board member generates their own keypair. The private key never leaves their custody. IT receives only the certificate signing request (CSR) — the public portion. IT signs it, returns the cert, and deletes the CSR. The private key was never present. Communications encrypted to that cert are unreadable by IT, structurally, not by promise.
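
A rough sketch of the kind of check the MTA enforcement implies, using only Python's standard `email` package. It inspects the outer MIME structure and nothing else: the server holds no private keys, so it cannot decrypt or validate anything inside. How the check is wired into a particular mail server (milter, content filter, policy hook) is left open, and the function names are hypothetical.

```python
from email import message_from_string
from email.message import Message

# smime-type values that indicate an encrypted S/MIME body
ENCRYPTED_TYPES = {"enveloped-data", "authenveloped-data"}


def is_smime_encrypted(msg: Message) -> bool:
    """True if the top-level MIME part is an S/MIME encrypted envelope."""
    if msg.get_content_type() not in ("application/pkcs7-mime", "application/x-pkcs7-mime"):
        return False
    smime_type = msg.get_param("smime-type")
    return isinstance(smime_type, str) and smime_type.lower() in ENCRYPTED_TYPES


def mta_decision(raw_message: str) -> str:
    """Accept encrypted mail; reject plaintext and signed-only mail."""
    msg = message_from_string(raw_message)
    return "accept" if is_smime_encrypted(msg) else "reject"


if __name__ == "__main__":
    plaintext = (
        "From: chair@board.example.com\n"
        "To: audit@board.example.com\n"
        "Subject: draft minutes\n"
        "Content-Type: text/plain\n"
        "\n"
        "This would bounce.\n"
    )
    print(mta_decision(plaintext))
```

Note that the check reads only headers and the content type, which is exactly the metadata the server can see anyway.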

The key ceremony

This step is where most implementations cut corners and break the guarantee. Do it right.

Board resolution first. Formally document the key policy: no escrow, no IT custody, key loss means cert reissuance, departing members' certs are revoked.
Generate on the member's device. Each board member generates their keypair locally. openssl req -newkey rsa:4096 -keyout member.key -out member.csr — or, better, on a hardware token (YubiKey 5 series). Hardware-bound keys are non-exportable by design.
IT receives only the CSR. The signing authority (a private CA running on the same isolated EC2, or AWS Private CA) signs the CSR and returns a cert. The CSR is deleted after signing.
Member imports the cert + their private key into their mail client. Apple Mail, Outlook, and Thunderbird all support S/MIME natively. Once configured, encryption and signing are automatic.
IT can verify the cert is valid. IT cannot decrypt messages encrypted to it.
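
The "IT receives only the CSR" rule can be backed by a simple intake check on whatever file a member sends. The sketch below is an assumption about how such a guard might look: it inspects PEM block labels only, does not parse or verify the CSR itself, and the function name is hypothetical.

```python
import re

# Matches the label of every PEM block, e.g. "CERTIFICATE REQUEST"
PEM_LABEL = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def check_csr_submission(pem_text: str) -> None:
    """Raise ValueError unless the submission is exactly one CSR and nothing else."""
    labels = PEM_LABEL.findall(pem_text)
    if any("PRIVATE KEY" in label for label in labels):
        # A key that has reached IT is no longer board-owned: it must be regenerated.
        raise ValueError("submission contains a private key; discard it and regenerate the keypair")
    if labels not in (["CERTIFICATE REQUEST"], ["NEW CERTIFICATE REQUEST"]):
        raise ValueError(f"expected exactly one CSR block, found: {labels}")


if __name__ == "__main__":
    # Placeholder body; a real CSR carries base64-encoded DER here.
    csr = "-----BEGIN CERTIFICATE REQUEST-----\nAAAA\n-----END CERTIFICATE REQUEST-----\n"
    check_csr_submission(csr)
    print("CSR accepted for signing")
```

Rejecting on sight matters more than it looks: once a private key has passed through IT's hands, the structural guarantee is gone for that keypair, whatever happens to the file afterward.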

Where this pattern is already used

This isn't a novel idea. It's standard practice in several sectors, often for legal or regulatory reasons:

BigLaw. Attorney-client privilege requires that privileged communications be demonstrably secured. Large law firms routinely run on-premises or isolated mail infrastructure for client matter communications, separate from their general corporate email.
Financial services. OCC and FRB governance guidance for bank boards creates pressure to segregate board communications. Deal teams at investment banks use isolated environments for M&A communications specifically to limit who has access while a transaction is live.
M&A project rooms. When a public company is in deal negotiations, it's common practice to run board and advisor communications on a separate, access-controlled mail domain for the duration of the deal. Leaks from general corporate email have killed deals and triggered insider trading investigations.
Delaware corporate governance. Audit committee and compensation committee communications carry fiduciary weight. Post-Sarbanes-Oxley, many public company boards formalized communication segregation specifically to reduce the risk of the executive team having visibility into governance discussions.
Defense contractors. Cleared facilities running government contracts often maintain separate cleared networks for board-level communications with cleared directors. S/MIME with CAC or PIV credentials is the federal standard.

What IT can and cannot see

Visible to IT: From, To, Date, Subject, message size. S/MIME encrypts the body and attachments, not the headers.
Not visible to IT: Message body, attachments, any content of the communication.
If subject-line privacy is required: S/MIME header protection (RFC 8551) wraps the whole message, headers included, as an inner message/rfc822 part, so the Subject moves inside the encryption and the outer message carries only what routing needs. Sender, recipients, and date remain visible to the server. Fewer mail clients handle this cleanly, so evaluate the tradeoff.
Infrastructure access is separately logged. CloudTrail and VPC flow logs give the board chair an independent record of who accessed the server infrastructure and when. IT access to the instance is not invisible — it's just not the same as access to message content.

The governance value of structural limits

The implementor can prove they couldn't read the mail. That matters when the board is conducting a sensitive investigation, setting executive compensation, or deliberating on a matter where the IT function has a conflict. Policy says "we won't look." Architecture says "we can't." The second one is the one that holds up.

It also protects IT. If there's ever an allegation that the CTO monitored board deliberations about their own performance, cryptographic architecture is the cleanest possible defense. You can't be accused of reading mail you were structurally unable to decrypt.

Board-owned keys, server-enforced S/MIME, isolated infrastructure, audit logs to the chair — not the admin. That's the complete pattern.

incident management

The 3-pager that fixed on-call

Clarified scope, severity levels, and paging rules.

Problem

Ambiguous scope: no one agreed what on-call "owned."
Severity levels meant different things to different teams.
Excessive noise → false pages → responder fatigue.
Runbooks existed but weren't authoritative or consistently used.

The solution (3 pages, one owner)

Scope. Systems and services explicitly in scope; escalations for everything else.
Severity. SEV-1 (customer/business critical), SEV-2 (degraded but functional), SEV-3 (nuisance/ops toil), each with concrete examples.
Paging rules. Pages only for SEV-1 and high-confidence SEV-2 signals. Everything else → ticket + business hours.
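
The three pages reduce to a small routing function. The sketch below is illustrative only: the `confidence` score, the 0.9 cutoff for "high-confidence", and the `Alert` fields are assumptions, not the values from the actual document.

```python
from dataclasses import dataclass

SEV2_PAGE_CONFIDENCE = 0.9  # assumed cutoff for a "high-confidence" SEV-2 signal


@dataclass(frozen=True)
class Alert:
    service: str
    severity: int          # 1, 2, or 3
    confidence: float      # 0.0 to 1.0, hypothetical signal-quality score
    in_scope: bool = True  # is the service on the on-call scope list?


def route(alert: Alert) -> str:
    """Return 'page', 'ticket', or 'escalate' following the scope and paging rules."""
    if not alert.in_scope:
        return "escalate"  # out of scope: hand off to the owning team
    if alert.severity == 1:
        return "page"
    if alert.severity == 2 and alert.confidence >= SEV2_PAGE_CONFIDENCE:
        return "page"
    return "ticket"        # everything else waits for business hours


if __name__ == "__main__":
    print(route(Alert("checkout", severity=2, confidence=0.6)))
```

Writing the rule down this literally has a side benefit: any alert that pages without matching one of the two `page` branches is, by definition, noise to remove.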

Guardrails we added

Golden signals. Latency, traffic, errors, saturation — per service.
Runbook links inline for each common alert.
Comms script. Who says what, where, and when (Slack / status page / email).
Single owner for updates to prevent drift.

Results

~40% fewer pages (noise removed), better sleep, better focus.
Meaningful pages → faster time-to-mitigation and clearer handoffs.
Post-incident reviews improved because severity was consistent org-wide.

Short, living, and owned. That's what made it work.

leadership

From IC to leader: a lightweight mentoring path

How I help strong ICs become calm, trusted incident leaders.

The path

Shadow. Join incident channels as a quiet observer; review post-mortems together.
Co-pilot. Run a small portion (notes, timeline, or comms) with a senior lead present.
Lead a drill. Tabletop exercises with clear injects and measurable outcomes.
Own an incident. Senior leader backstops; feedback within 24 hours.

What we practice

Clarity over certainty. Call the severity with available info, then adjust.
Small batches. One change at a time, explicit rollback plan.
Comms cadence. External and internal updates on a timer, not a feeling.

Artifacts

Incident commander checklist — roles, comms, handoffs.
Runbook skeleton — preconditions, steps, expected results, rollback.
After-action template — facts → findings → fixes → follow-through (owners + dates).

Mentoring is a system: reps, feedback, and a safe runway to try leading for real.

culture

Blameless ≠ consequence-free: making post-mortems stick

Turn incidents into durable improvements without witch hunts or wheel-spinning.

Principles

Blame the system, not the person. Design makes errors likely or unlikely.
Bias to facts. Timeline first, opinions later.
Right-sized fixes. Priority is preventing recurrence, not boiling the ocean.

The template

Timeline. Facts with timestamps and sources (dashboards, logs, comms).
Customer impact. Who, how long, severity.
Contributing factors. Technical and organizational.
Actions. Fix now (days), fix next (weeks), invest (quarter) — with owners and dates.
Follow-through. Review action status weekly until done.
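
The actions and follow-through rows are the part of the template that most benefits from being machine-checkable. A minimal sketch, assuming actions are tracked as simple records; the field names and the `weekly_review` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Action:
    title: str
    owner: str
    due: date
    horizon: str  # "now" (days), "next" (weeks), or "invest" (quarter)
    done: bool = False


def weekly_review(actions: list[Action], today: date) -> tuple[list[Action], list[Action]]:
    """Return (overdue, on_track) open actions, each sorted by due date."""
    open_items = [a for a in actions if not a.done]
    overdue = sorted((a for a in open_items if a.due < today), key=lambda a: a.due)
    on_track = sorted((a for a in open_items if a.due >= today), key=lambda a: a.due)
    return overdue, on_track


if __name__ == "__main__":
    items = [  # hypothetical action items
        Action("Add alert on queue depth", "sam", date(2024, 3, 1), "now"),
        Action("Rewrite failover runbook", "lee", date(2024, 4, 15), "next"),
    ]
    overdue, on_track = weekly_review(items, today=date(2024, 3, 8))
    for a in overdue:
        print(f"OVERDUE: {a.title} ({a.owner}, due {a.due})")
```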

What changed when we did this

Repeat issues dropped because actions had owners and deadlines.
Engineers participated more; psychological safety increased.
Leaders got better signal on where to invest — people, tooling, or process.

Post-mortems pay off when they drive change. That means owners, dates, and visible follow-up.
