Carrick: Linux ABI Emulation on macOS — 1,473 Commits in 20 Days

Running Linux binaries on macOS as native processes, via Hypervisor.framework and Rust. From a blank repo to 124k lines, Node.js, PID namespaces, image build, and a Docker-compatible API.

Bryan Cantrill—my former colleague at Joyent, now CTO of Oxide Computer Company—loves to tell a story from Tracy Kidder’s The Soul of a New Machine. At Data General in the late 1970s, the senior engineers on the Eagle project handed a new recruit what they considered an impossible task: build a cycle-accurate simulator. It was a snipe hunt. Something to keep the kid busy. But nobody actually told him it was impossible, so he just went and did it. He came back finished and asked what was next.

I think about that story a lot.

Back in my time at Joyent, I spent my days working on Node.js and debugging core dumps from Linux versions of Node.js on SmartOS using mdb. If you’ve ever had to trace memory leaks four bytes at a time inside a core file, you know the exact mix of patience and stubbornness it requires. That was the landscape when we launched LX Branded Zones—a thin veneer, a system call translation layer, that let unmodified Linux binaries run inside native, lightweight OS-level containers with direct, host-native observability. I was surrounded by people who could build things like that: Bryan, Dave Pacheco, Robert Mustacchi, Joshua Clulow, Jerry Jelinek (who actually resurrected the lx brand), Keith Wesolowski, and so many others—most of whom are now building rack-scale computers from scratch at Oxide. I don’t have their depth of systems engineering experience—I can only imagine what Keith would have to say about this project—but I’ve spent a long time watching them work, and you pick things up.

Fast forward to a Sunday evening in May 2026. I’m sitting at my desk with my MacBook Air—a machine that is, frankly, a monster for what it is—and I’m staring at Activity Monitor watching Docker’s Linux VM hold onto gigabytes of memory and disk. This is a fanless laptop, so it’s not even complaining audibly. It’s just quietly allocating resources to a full operating system I don’t actually need, because I want to run a test suite.

And I thought: what if I just… didn’t need the VM?

The idea isn’t new. The temptation to bypass the virtual machine tax is evergreen. In 2004, Sun Microsystems introduced Solaris Zones, proving that operating system containers could offer near-zero overhead. Branded Zones (BrandZ) followed in 2007 with the original lx brand for emulating the Linux ABI. Joyent was running these containers in production in 2006, years before Docker popularized the container image format and developer workflow in 2013. When Oracle eventually dropped support for the original lx brand, Joyent resurrected it on SmartOS in 2014-2015 so developers could run Docker containers directly on bare metal without any guest Linux kernel or virtual machine overhead.

But that was then, and this is macOS on Apple Silicon.

We’ve seen others try the same idea on different platforms since. WSLv1 translated Linux syscalls to the Windows NT kernel. The archived noah project first brought userspace Hypervisor.framework translation to Intel Macs. FreeBSD continues to maintain its own Linux compatibility layer.

And yet, here we are in 2026, and everyone on a Mac still runs Docker inside a full Linux VM—Docker Desktop, Lima, Colima, UTM. To be clear: Docker is a fantastic tool. I know a lot of people there—there’s even some overlap of folks from the Joyent days who are now at Docker—and they dramatically improved the developer workflow around container images. That contribution is exactly why Carrick itself runs OCI images and uses a Docker-compatible set of CLI arguments. But a real Linux kernel in a VM will probably always be more accurate, because the long tail of syscalls, obscure edge cases, and undocumented behaviors is incredibly long. It’s just that accuracy comes with overhead—a full guest OS in memory, a virtual disk on your filesystem, balloon drivers trying to claw back what they can—and no amount of memory management tuning fully changes that tradeoff. Carrick is an attempt to find a different compromise.

So, armed with my trusty intern (by which I mean AI coding agents), I set off to build a modern-day noah. I “didn’t know any better,” and, frankly, why not?

The first git commit was Sun May 17 20:44:58 2026 -0700: “Bootstrap Carrick runtime foundation.” Today is Sat Jun 6 2026. In twenty days and 1,473 commits, Carrick has gone from a blank repository to ~124,000 lines of Rust across 12 crates, with a conformance suite that gates carrick against Docker on 500+ curated LTP syscall tests, the Go standard-library test suite at parity, 498/507 libuv tests passing, Node.js worker threads running, PID namespaces, image build (carrick build), a Docker-compatible API (carrick serve), a daemonless container lifecycle (carrick run -d), and io_uring support. The peak day was May 20th—144 commits—the day threading arrived and the Big Kernel Lock was born. Two days later it was retired.

Is this project a dead end? Highly probable. But it’s an interesting, highly educational dead end that might still offer real value for high-performance, low-overhead workloads with deep host-level observability.

What It Looks Like

Here’s the short version. You type this:

$ carrick run ubuntu:24.04 python3 -c "import platform; print(platform.machine())"
aarch64

That’s an unmodified Linux CPython binary, pulled from an OCI image, running on your Mac. There’s no VM to boot, no kernel to wait for. The process shows up in your host ps. You can kill it, lsof it, or point dtrace at it. The filesystem and networking are direct—sockets are standard Darwin sockets talking to the host TCP/IP stack, and guest files map straight to host paths. No FUSE daemon, no VirtIO mount syncing.

A cold start takes about 90 ms. Subsequent forks clock in around 5.7 ms.

How It Works

The key difference between Carrick and everything that came before is where the translation layer lives. LX Branded Zones required custom illumos kernel modules. WSLv1 was an in-kernel translation subsystem within Windows NT. FreeBSD’s compatibility layer is compiled directly into the FreeBSD kernel.

Carrick doesn’t touch the macOS kernel at all. Instead, it uses Apple’s Hypervisor.framework to trap guest execution entirely in host userspace:

+--------------------------------------+                 +--------------------------------+
|        Linux Guest (ARM64 EL0)       |                 |     Carrick Runtime (Rust)     |
|                                      |  Exception Trap |                                |
|  [ ELF Binary Code ]                 | -------------->|  [ Syscall Dispatcher ]        |
|  [ svc #0 (e.g., epoll_wait) ]       |  (VBAR_EL1 /    |  [ Translates to Darwin kqueue]|
|                                      |   hvc #2 exit)  |                                |
+--------------------------------------+                 +--------------------------------+
                                                                         |
                                                                         | Dispatch
                                                                         v
                                                         +--------------------------------+
                                                         |      macOS Host Kernel         |
                                                         +--------------------------------+

Because we’re running on Apple Silicon, we can configure the CPU to run the guest binary at ARM64 EL0 (unprivileged userspace) while controlling the Exception Vector Table (VBAR_EL1) from our host harness. When the guest executes svc #0 (the instruction ARM64 Linux binaries use to make a syscall), it traps into the EL1 exception vector that the harness installed, which issues an hvc #2 to drop control back into the Carrick userspace runtime. The runtime decodes x8 (syscall number) and x0–x5 (arguments), dispatches to a clean-room Rust handler that maps them to native Darwin calls, writes the result back into x0, and resumes the vCPU. The whole cycle happens in userspace. No kernel module, no guest kernel, no VM lifecycle.

(One fun Apple Silicon detail: the guest EL1 vector handler originally forwarded syscalls via hvc #0, but Apple’s Hypervisor.framework silently consumes certain hvc #0 calls as SMCCC when x0 happens to look like a function ID. The fix was switching to hvc #2. These are the kinds of things you only learn by having your syscalls vanish into thin air.)
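To make the decode-and-dispatch cycle concrete, here is a minimal sketch of what the host side of that loop could look like. It is not Carrick’s actual dispatcher: GuestRegs, dispatch, and the two toy handlers are names invented for illustration, and the real runtime reads and writes registers through Hypervisor.framework rather than a plain struct. The syscall numbers (write = 64, getpid = 172) and the negated-errno return convention are the standard AArch64 Linux ones.

```rust
/// Guest general-purpose registers captured at the trap (x0..x30).
/// Hypothetical stand-in for whatever the real vCPU wrapper exposes.
#[derive(Default)]
pub struct GuestRegs {
    pub x: [u64; 31],
}

// AArch64 Linux syscall numbers.
pub const SYS_WRITE: u64 = 64;
pub const SYS_GETPID: u64 = 172;

// Linux errno values.
pub const EBADF: i64 = 9;
pub const ENOSYS: i64 = 38;

/// Toy handler: pretend the guest is PID 1 in its namespace.
fn sys_getpid() -> Result<u64, i64> {
    Ok(1)
}

/// Toy handler: only stdout and stderr are "open"; report the full length as written.
fn sys_write(fd: u64, _buf: u64, len: u64) -> Result<u64, i64> {
    if fd == 1 || fd == 2 {
        Ok(len)
    } else {
        Err(EBADF)
    }
}

/// Decode x8 (syscall number) and the argument registers, run the handler,
/// and write the result back into x0. Failure is reported to the guest as a
/// negated errno in x0, which is how the Linux syscall ABI signals errors.
pub fn dispatch(regs: &mut GuestRegs) {
    let nr = regs.x[8];
    // x3..x5 also carry arguments, but these two handlers do not need them.
    let (a0, a1, a2) = (regs.x[0], regs.x[1], regs.x[2]);

    let result = match nr {
        SYS_WRITE => sys_write(a0, a1, a2),
        SYS_GETPID => sys_getpid(),
        // Unknown syscalls fall through to ENOSYS.
        _ => Err(ENOSYS),
    };

    regs.x[0] = match result {
        Ok(value) => value,
        Err(errno) => (-errno) as u64,
    };
}
```

Setting x8 to 172 and calling dispatch leaves 1 in x0; an unknown syscall number leaves -38 (ENOSYS) when x0 is read back as a signed value. After that the real runtime would resume the vCPU.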

The Big Kernel Lock—and Killing It

Threading landed on day three (May 20th, 144 commits that day alone). The first implementation was the obvious one: a single global mutex—a Big Kernel Lock—that serialized every syscall across every guest thread. It worked. Everything ran. It was also a complete scalability dead end.

Two days later, the BKL was retired. The replacement is per-subsystem locking: carrick-runtime’s syscall dispatcher wraps its internals—fs::IoState, proc::ProcState, creds::CredState, signal::SignalState, mem::MemState—in separate Mutex and RwLock guards. Syscall handlers accept a SyscallCtx containing narrow borrows of only the state they need. A guest thread reading a socket and another thread writing to the heap execute concurrently with zero lock contention. Fork is a page table copy and an HVF context rebuild (~5.7 ms), about 16× cheaper than a cold boot, with no global serialization.
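Here is a rough sketch of the per-subsystem locking idea, using only the standard library. The struct fields, the Kernel type, and the two handlers are invented for illustration; the real IoState, MemState, and SyscallCtx are certainly richer than this. The point is only that a handler takes the one lock it needs, so a thread in the fd path never waits on a thread in the heap path.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, PoisonError, RwLock};

// Stand-ins for two of the per-subsystem states; the fields are made up.
#[derive(Default)]
pub struct IoState {
    pub open_fds: HashMap<i32, String>,
}

#[derive(Default)]
pub struct MemState {
    pub brk: u64,
}

/// Each subsystem sits behind its own lock instead of one global mutex.
#[derive(Default)]
pub struct Kernel {
    pub io: RwLock<IoState>,
    pub mem: Mutex<MemState>,
}

/// A handler that only reads fd state takes only the io read lock.
/// A poisoned lock is recovered rather than unwrapped, in keeping with a
/// no-unwrap policy.
pub fn sys_fd_path(k: &Kernel, fd: i32) -> Option<String> {
    let io = k.io.read().unwrap_or_else(PoisonError::into_inner);
    io.open_fds.get(&fd).cloned()
}

/// A handler that only touches the heap takes only the mem lock.
/// Returns the current break, growing it if the request is higher.
pub fn sys_brk(k: &Kernel, new_brk: u64) -> u64 {
    let mut mem = k.mem.lock().unwrap_or_else(PoisonError::into_inner);
    if new_brk > mem.brk {
        mem.brk = new_brk;
    }
    mem.brk
}
```

Two guest threads calling sys_fd_path and sys_brk at the same time never touch the same lock, which is the property the BKL could not give.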

The no-panic lint gate ([lints.clippy] in Cargo.toml denying unwrap_used, expect_used, panic, todo, and unimplemented crate-wide) keeps the usual panicking shortcuts out of the supervisor, so guest input can’t crash it through them. Release builds run with overflow-checks = true, so arithmetic on guest-controlled integers traps to a contained abort rather than wrapping silently. That combination (no panicking shortcuts, no silent overflows, per-subsystem locks) is what makes it safe to take the BKL off.
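As an illustration of the style those lints push you toward, here is a small sketch of validating a guest-supplied (address, length) pair without any arithmetic that can overflow or panic. The guest_range function and its region parameter are hypothetical, not Carrick’s API; EFAULT = 14 is the standard Linux value.

```rust
use std::ops::Range;

pub const EFAULT: i32 = 14;

/// Check that [addr, addr + len) lies inside a mapped guest region.
/// Every failure becomes an errno for the guest instead of a host panic.
pub fn guest_range(addr: u64, len: u64, region: &Range<u64>) -> Result<Range<u64>, i32> {
    // checked_add yields None on overflow instead of wrapping or trapping.
    let end = addr.checked_add(len).ok_or(EFAULT)?;
    if addr < region.start || end > region.end {
        return Err(EFAULT);
    }
    Ok(addr..end)
}
```

A guest that passes addr = u64::MAX and len = 2 gets EFAULT back rather than taking the supervisor down with it.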

The Conformance Machine

You can’t know what’s broken if you don’t measure. From the beginning, Carrick’s approach has been to use Docker as an oracle: run the same binary under both Carrick and Docker, diff the outputs, and if they disagree, write a probe.

The conformance suite lives in conformance-probes/—298 standalone Linux ELF binaries cross-compiled to aarch64-unknown-linux-musl. Each probe exercises a specific syscall invariant (e.g., “does FUTEX_WAKE(INT_MAX) return exactly N when N waiters are parked on a MAP_SHARED word?”) and produces machine-diffable output. The test harness runs every probe under both Carrick and Docker, diffs the results, and fails CI if they disagree. Today there are 289 owned invariant probes covering 502 curated LTP tests at 100%—every single curated test has an owning probe.

This is deliberately different from running LTP itself. LTP is the discovery oracle—slow, VM-jitter-flaky, count-based, needs a registry. It tells us where to dig. A probe nails the specific behavior so it can never silently regress. The probe is the deliverable; the LTP match count is just confirmation.
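The comparison step at the heart of that harness can be sketched in a few lines. This is an assumption about its shape, not the real harness: Divergence and first_divergence are names I made up, and the actual tool also has to launch each probe under both runtimes and collect its output. The "woken=3" lines in the note below are likewise invented sample output.

```rust
/// The first line at which two probe outputs disagree.
#[derive(Debug, PartialEq)]
pub struct Divergence {
    /// 1-based line number of the first mismatch.
    pub line: usize,
    /// The line from the reference run (Docker), if it had one.
    pub reference: Option<String>,
    /// The line from the candidate run (Carrick), if it had one.
    pub candidate: Option<String>,
}

/// Compare two outputs line by line; None means they match exactly.
pub fn first_divergence(reference: &str, candidate: &str) -> Option<Divergence> {
    let mut r = reference.lines();
    let mut c = candidate.lines();
    let mut line = 0;
    loop {
        line += 1;
        match (r.next(), c.next()) {
            // Both outputs ended at the same point: no disagreement.
            (None, None) => return None,
            (a, b) if a == b => continue,
            // Either the text differs or one output ended early.
            (a, b) => {
                return Some(Divergence {
                    line,
                    reference: a.map(str::to_owned),
                    candidate: b.map(str::to_owned),
                })
            }
        }
    }
}
```

Feeding it "woken=3" from one run and "woken=1" from the other reports a divergence at line 1, which is the kind of result that would fail CI.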

The headline baselines:

Go runtime: ~876/880 standard-library test binaries pass — at parity with Docker.
LTP syscalls: 568/896 oracle-validated tests match (63%). Strong: scheduler 76%, timers 74%, signals 73%, filesystem 68%. Weaker: memory management 34%, IPC 38%.
CPython modules: 425/492 regrtest modules match (86.4%). test_subprocess and test_multiprocessing now run — the nested-fork bug that blocked them is fixed.
libuv: 498/507 tests pass (98.2%)—including the entire async event loop, pipe, and IPC surface.
Node.js: the full node-core plan passes 5301/5304 (99.9%) on Node 24 and Node 26. Worker threads create, communicate, and tear down correctly.

Node.js: The Real Test

Getting Go and Python working was satisfying. Getting Node.js working was humbling.

Go is compiled and static—it carries its own runtime and makes opinionated syscall choices. CPython is a dynamic interpreter that mostly delegates to libc. But Node.js sits on V8 (aggressive JIT, pointer-compressed heaps, custom memory hinting) and libuv (an async I/O engine that exercises the absolute limits of your poll, threading, and IPC implementation). If there’s a shortcut in your syscall emulation, V8 and libuv will find it.

They found several:

V8’s madvise expectations. V8 issues MADV_DONTFORK and MADV_DOFORK to control page inheritance across forks. Carrick returned ENOSYS. V8 treated that as a fatal environment violation. Fix: map both to no-op success.
dup(2) returning the wrong fd. Carrick’s dup had an internal floor of fd 3, preventing it from returning stdin/stdout/stderr if they were closed. libuv’s child-process pipe setup does close(0); dup(pipe_read) and expects to get 0 back. We returned 4. The child wrapped a dead descriptor and crashed.
writev discarding partial progress. Carrick’s writev looped over iovecs and, if a later one hit EAGAIN, returned the error—discarding the byte count from earlier successful writes. libuv retried from offset 0, creating an infinite write loop. The fix: return the accumulated byte total on a mid-iovec EAGAIN.
Worker thread teardown. After a V8 worker completed its JavaScript payload, the parent process stayed parked in kevent forever. Three interlocking bugs: scheduler queries (sched_getscheduler/sched_getparam) failed to resolve sibling thread IDs from the ThreadRegistry; exit_group from the main thread didn’t terminate sibling host threads; and an epoll in-memory wake drain left a trailing waiter unreachable.

Each of those bugs had a probe written and committed before the fix shipped.
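Two of those fixes are small enough to sketch. The code below is a simplified reconstruction of the corrected behavior, not Carrick’s source: lowest_free_fd and writev_accumulate are hypothetical names, and the write_one closure stands in for the real per-buffer host write. What it encodes is the POSIX contract: dup returns the lowest unused descriptor (including 0, 1, or 2), and a vectored write that has already made progress reports the byte count rather than the error.

```rust
use std::collections::BTreeSet;

pub const EAGAIN: i32 = 11;

/// dup(2) must return the lowest-numbered descriptor that is not open,
/// with no floor at 3.
pub fn lowest_free_fd(open: &BTreeSet<u32>) -> u32 {
    let mut candidate = 0;
    // BTreeSet iterates in ascending order, so the first gap is the answer.
    for &fd in open {
        if fd != candidate {
            break;
        }
        candidate += 1;
    }
    candidate
}

/// Write each buffer in turn through `write_one`, which returns the bytes
/// written or an errno. An error after earlier progress is swallowed so the
/// caller sees how many bytes actually went out.
pub fn writev_accumulate<F>(bufs: &[&[u8]], mut write_one: F) -> Result<usize, i32>
where
    F: FnMut(&[u8]) -> Result<usize, i32>,
{
    let mut total = 0usize;
    for &buf in bufs {
        match write_one(buf) {
            Ok(n) => {
                total += n;
                // Short write: stop here and report the progress so far.
                if n < buf.len() {
                    break;
                }
            }
            // Nothing written yet: the error is the honest answer.
            Err(errno) if total == 0 => return Err(errno),
            // Partial progress wins over the error.
            Err(_) => break,
        }
    }
    Ok(total)
}
```

With descriptors 1, 2, and 3 open, lowest_free_fd returns 0, which is what libuv expects after close(0). If the first buffer writes 3 bytes and the second hits EAGAIN, writev_accumulate returns Ok(3), so the caller resumes from offset 3 instead of retrying from 0.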

What Else Landed

The raw numbers tell part of the story—1,473 commits across 20 days, averaging ~74 commits/day—but the feature surface tells the rest:

PID namespaces. carrick run containers now run in a private PID namespace by default (--pid host|private). Guest processes see PID 1 as their init. Orphans are reparented. kill, getpriority, sched_*, raise, and pthread_kill all translate namespace-local PIDs correctly.
Daemonless container lifecycle. carrick run -d backgrounds a container; carrick ps, carrick stop, carrick kill, carrick rm manage it. No daemon process—state lives in a local registry file.
io_uring support. Raw io_uring_setup → mmap rings → submit (NOP, WRITE, READ, READV) → io_uring_enter → reap CQEs, end-to-end. IORING_SETUP_SQPOLL is correctly rejected when no real SQ polling worker exists.
Functional FIFOs. mknod(S_IFIFO) creates a real host FIFO. Opening it wraps a HostPipe so a writer-less open can’t wedge the dispatcher. Bidirectional O_RDWR FIFOs work.
POSIX timers. timer_create/_settime/_gettime/_getoverrun/_delete with SIGEV_SIGNAL delivery.
Real flock(2). Advisory locking forwarded to the host kernel, so cross-process conflicts are real.
memfd_create, sync_file_range, cachestat, openat2, signalfd4. The long tail of syscalls that frameworks probe for and fall over when they get ENOSYS.
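The PID namespace item above comes down to a two-way mapping between the PIDs the guest sees and the PIDs the host assigned. Here is a minimal sketch of that bookkeeping, assuming a simple counter for local PIDs; PidNamespace and its methods are illustrative names, and the real implementation also has to handle reparenting, exit, and thread IDs. ESRCH = 3 is the standard Linux "no such process" errno.

```rust
use std::collections::HashMap;

pub const ESRCH: i32 = 3;

/// Two-way map between namespace-local PIDs and host PIDs.
#[derive(Default)]
pub struct PidNamespace {
    next_local: u32,
    local_to_host: HashMap<u32, u32>,
    host_to_local: HashMap<u32, u32>,
}

impl PidNamespace {
    /// Local PIDs start at 1, so the first registered process is the init.
    pub fn new() -> Self {
        Self {
            next_local: 1,
            ..Default::default()
        }
    }

    /// Register a host process and hand back its namespace-local PID.
    pub fn register(&mut self, host_pid: u32) -> u32 {
        let local = self.next_local;
        self.next_local += 1;
        self.local_to_host.insert(local, host_pid);
        self.host_to_local.insert(host_pid, local);
        local
    }

    /// Translate the PID a guest passed to kill(2) and friends into a host PID.
    /// A PID outside the namespace looks like a nonexistent process.
    pub fn to_host(&self, local: u32) -> Result<u32, i32> {
        self.local_to_host.get(&local).copied().ok_or(ESRCH)
    }

    /// Translate a host PID back into what the guest should see.
    pub fn to_local(&self, host_pid: u32) -> Option<u32> {
        self.host_to_local.get(&host_pid).copied()
    }
}
```

A guest calling kill on local PID 2 would have that translated to the host PID before the signal is sent, and a PID that was never registered comes back as ESRCH.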

The crate structure is: carrick-cli → carrick-engine → {carrick-image, carrick-runtime} → carrick-spec, plus carrick-hvf (the HVF trap engine, 10.8k lines), carrick-abi, carrick-mem (guest page tables, 4.4k lines), carrick-host, and carrick-guest-mem. carrick-runtime is the monolith at ~65,000 lines—the dispatch tables, filesystem backends, VFS, signal routing, and process lifecycle are too coupled to split cheaply, and it’s where the actual translation happens.

So What Actually Breaks?

Emulating the Linux ABI is a treadmill that doesn’t stop. There will always be another ioctl to implement, another obscure socket option, or a nested multithreaded fork race condition. That last one—a forkserver Heisenbug that blocked CPython’s test_subprocess and test_multiprocessing—was the hardest bug of the project, and it’s now fixed. Memory management (34% LTP pass rate) and IPC (38%) remain the weakest subsystems. ptrace(2) is Phase-1 only—no gdb or delve on guest binaries yet. x86_64 Linux binary support via Rosetta 2 now works for dynamic glibc images (Debian, Ubuntu)—the guest’s x86_64 instructions JIT-translate through Apple Rosetta 2; static-musl binaries (Alpine) still hit Apple’s own static-PIE limitation (the same one Docker Desktop’s Rosetta backend hits).

This isn’t going to replace your production Kubernetes cluster. But for running local CLI tools, instant test suites, or inspecting guest execution with host-native observability, it fills a gap that nothing else on macOS fills today.

Update — late June 2026

Since this was written, a few big things shipped. Carrick is now public and open-source under Apache-2.0 OR MIT. And it stopped being macOS-only.

The runtime got split along two axes behind a hardware-abstraction layer (carrick-hal), so the same clean-room Linux-ABI translation can drive more than just Hypervisor.framework:

Host / VMM backends: macOS/HVF (the mature, shipping path), plus Linux/KVM, FreeBSD/bhyve, and NetBSD/NVMM under active bring-up.
Guest ISAs: AArch64 on the mature macOS path, plus native x86_64 coming up through KVM, bhyve, and NVMM via a shared carrick-x86 long-mode engine. (That’s a different path from the macOS Rosetta 2 translation described above.)

That work — new VMM backend crates, BSD/Linux host-primitive layers, cross-process futex coherence on bhyve, a real Linux USDT probe path for bpftrace — grew the workspace to 25 crates and ~194,000 lines of Rust, with 332 conformance probes still pinning the 502 curated LTP tests at 100%. Full-suite LTP differential conformance climbed to 75% (671/893 oracle-valid, up from 63%). The non-macOS and x86_64 lanes are source-visible and only partially live; they are not equivalent to the macOS release path. The HAL & platform architecture doc tracks current status.

Try It, Break It, Tell Us

If you want to see what works and what doesn’t, check out the compatibility matrix and the docs. Run your own binaries. File issues. The syscall coverage is transparent—we publish the baselines because hiding them would be worse.

Install it with Homebrew (Apple Silicon macOS):

brew tap carrick-sh/carrick
brew install --HEAD carrick

Carrick can also build images (carrick build) and speak a Docker-compatible API (carrick serve)—see using carrick with Docker.

Nobody told the kid at Data General that building the simulator was impossible, so he just did it. Nobody told my AI agents that implementing epoll on top of kqueue was a bad idea, so they just did it. And twenty days and 1,473 commits later, here we are—still debugging, still stubborn, still tracing things four bytes at a time. Keith would probably call this amateur hour. He might not even be wrong. But some habits don’t change.