The Week AI Stopped Assisting and Started Running the System

Four developments in a single week show artificial intelligence crossing a quiet but important line — from a tool people reach for to the system they depend on. Here’s what changed, and why it matters.

Every transformative technology eventually reaches a threshold. For a while it’s an add-on: something you switch on when you need it, bolt onto an existing process, point to in a quarterly update. Then, almost without anyone announcing it, it becomes the floor everything else stands on. Electricity did it. The internet did it.

This past week, artificial intelligence crossed that threshold four separate times — inside a Big Four firm’s finance practice, in the everyday workflow of software engineers, on the balance sheet of a once-celebrated consumer brand, and in the decision-making of an emergency room. None of these stories is about AI getting smarter. They’re about AI moving into the systems of record where real decisions get made. Taken together, they sketch the shape of what comes next.

1. PwC didn’t adopt Claude. It rebuilt a practice around it.

Most enterprises approach AI the same cautious way: they take a process that already works, attach a model to one part of it, and call the result a transformation. PwC did something more deliberate. It stood up an entire business unit — the Office of the CFO — designed around Anthropic’s Claude from the ground up, and put it into live production rather than another sandboxed pilot.

The figure Anthropic chief executive Dario Amodei pointed to at launch captures why that distinction matters: insurance underwriting that once took ten weeks now takes ten days. Nothing about the regulatory environment changed. The compliance obligations are identical, the data is the same, the rules haven’t moved. What changed is the system the work flows through. You don’t compress a ten-week process into ten days by making an old workflow faster — you do it by redesigning the workflow so the AI is the spine of the operation rather than an attachment to it. That single design decision is the whole story.

The partnership now spans finance, healthcare, and life sciences, built on Claude Cowork, Claude Code, and the Claude Developer Platform, and powered by the Opus 4.6 and Sonnet 4.6 models. In finance, the work covers liquidity forecasting, scenario modeling, and capital-markets intelligence. In healthcare, it extends to utilization management, care coordination, and revenue cycle. In life sciences, it reaches target identification, regulatory submissions, and manufacturing quality. On top of all of this, PwC is building its own industry-specific skills, connectors, and plugins — with role-based oversight and a human in the loop from the first day, not bolted on later.

The scale is already concrete. PwC certified 30,000 of its U.S. professionals on Claude in the past month, with hundreds of thousands more expected globally. And it isn’t moving alone: three of the Big Four — PwC, Deloitte, and KPMG — are now running Claude in production, while EY has aligned with Microsoft. When the most risk-averse institutions in professional services have already chosen their platforms, the pressure shifts to everyone else. The open question is how quickly the rest of the market can redesign its own systems before the productivity gap becomes impossible to ignore.

2. The quiet death of the status update

The bottleneck in AI-assisted work has shifted, and most teams haven’t named it yet. The doing is no longer the hard part. The explaining is.

Anyone who has worked alongside an AI agent knows the pattern. Claude Code spends an hour chasing down a bug, reviewing a pull request, or reconstructing an incident — and the moment it finishes, the questions begin. What did it find? What actually changed? What’s still unresolved? Where’s the evidence? The work is complete, but a second round of labor has just started: turning that work into something a team can understand.

Anthropic’s new Artifacts feature is built to remove that second round entirely. Instead of leaving the output trapped inside a terminal session, Claude Code can publish it as a live web page — a debugging report, a pull-request walkthrough, a release checklist, an incident timeline. The page is assembled from material Claude has already worked with: the code, the connected tools, the conversation itself. As the work continues, the page refreshes at the same link, with version history attached.
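The "same link, new version" behavior described above can be pictured with a small data model. This is only an illustrative sketch, not Anthropic's implementation or API: the class names `LiveReport` and `ReportVersion`, the `publish` method, and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class ReportVersion:
    """One published snapshot of a report (hypothetical structure)."""
    number: int
    body: str
    published_at: datetime


@dataclass
class LiveReport:
    """Hypothetical model of a report page: one stable link, many versions."""
    url: str
    versions: List[ReportVersion] = field(default_factory=list)

    def publish(self, body: str) -> ReportVersion:
        # Append a new version; the URL is never changed, so shared links stay valid.
        version = ReportVersion(
            number=len(self.versions) + 1,
            body=body,
            published_at=datetime.now(timezone.utc),
        )
        self.versions.append(version)
        return version

    @property
    def current(self) -> ReportVersion:
        # Anyone opening the link sees the most recently published version.
        return self.versions[-1]


# Placeholder URL for illustration only.
report = LiveReport(url="https://example.invalid/reports/incident-timeline")
report.publish("Initial findings: two suspect commits flagged.")
report.publish("Update: root cause narrowed to one commit.")
```

The design point is that the link identifies the report rather than a snapshot of it: earlier versions stay in the history while readers land on the latest one.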

In Anthropic’s own testing, debugging emerged as one of the strongest use cases. Picture an engineer launching an incident investigation before the morning stand-up. Claude works through the logs, flags the suspect commits, assembles a timeline, and publishes a report. By the time the team gathers, no one is waiting for a verbal recap — everyone is already reading the same source of truth.

The larger shift is worth sitting with. As AI agents start to function less like tools and more like teammates, how clearly they communicate begins to matter as much as how well they execute. Artifacts isn’t really about producing prettier reports. It’s about eliminating the collaboration tax that lands after every successful session — the hours teams quietly lose translating finished work into shared understanding.

3. Allbirds sold the sneakers. Now it sells compute.

While much of the world was watching the World Cup group stage, one of retail’s strangest reinventions was unfolding off-camera. The company that built a brand on wool sneakers now trades as Smartbird, and it has set its sights on becoming an AI infrastructure provider.

The pivot is not a side bet. Smartbird has sold off the Allbirds brand and footwear assets outright, and is rebuilding itself around leasing high-performance, low-latency compute to enterprises — the kind of GPU capacity that, improbably, puts it in the same conversation as specialist players like CoreWeave and Crusoe.

The backdrop makes the move look even bolder. After its 2021 IPO, Allbirds was valued at close to $4 billion. By early 2025, its market value had collapsed below $20 million. Then, in April, came the announcement: stop selling shoes, start selling AI infrastructure. Some investors called the decision bizarre, even alarming. Traders saw it differently: the stock briefly spiked 800% before surrendering most of the gain.

On June 17, Smartbird named Nadia Carlsten as president and chief executive, with Joe Vernachio departing both the company and its board. Carlsten brings a serious infrastructure résumé: she previously ran the GPU-compute firm DCAI, helped launch a sovereign AI supercomputer in partnership with NVIDIA, and worked on the quantum-computing service at Amazon Web Services. She joins the board as well, under new chair Lily Yan Hughes. Her read on the rebrand is blunt: she expects the shoes to be forgotten within months, and notes, with some relish, that she’s a heels person who has never worn sneakers.

The strategy points squarely at the mid-market: pharmaceutical companies, financial-services firms, and nations building sovereign AI capacity. These are customers who want single-tenant GPU clusters they can own and control, and who would rather not construct all that infrastructure themselves. Capital is following the story: Smartbird doubled its convertible financing facility from $50 million to $100 million, and BIRD shares jumped as much as 50% to $5.92 on volume above 41 million shares, far beyond the norm for a stock its size.

Which leaves the live question hanging: is this a genuine infrastructure comeback, or just another AI narrative dressed up as a turnaround trade?

4. Medical AI grows hands

For roughly two years, medical AI has mostly talked. It reads the chart, answers the question, offers a differential diagnosis — and then a human does the consequential work: ordering the labs, booking the surgery, signing the prescription. A new paper in Nature sets out to close that gap between recommendation and action.

The contrast frames the stakes. In April 2025, Google’s AMIE demonstrated that a conversational AI could outperform primary-care physicians on diagnostic dialogue across 159 structured cases. But AMIE only spoke. The newer system acts.

Published this month by a German research team, MIRA — short for Medical Intelligence for Reasoning and Action — is an autonomous agent that operates inside a sandboxed electronic health record compliant with FHIR and six medical coding standards, including ICD, LOINC, and SNOMED-CT. It does not draft a plan for a physician to carry out. It takes the patient history, orders and interprets labs, imaging, and microbiology, builds a differential, and then prescribes medications, schedules procedures, and arranges admissions. Its action space spans more than 85,000 options across 11 tools.

The team evaluated MIRA on 574 real, de-identified cases drawn from the MIMIC-IV dataset, covering eight emergency-department diagnoses, among them appendicitis, pneumonia, pulmonary embolism, and pancreatitis. On a matched subset run under identical conditions, the diagnostic numbers were striking: 87.8% accuracy versus 78.1% for board-certified physicians, and 71.1% for a mixed cohort of residents and attendings. The widest margin appeared in pancreatitis, where MIRA reached 95.2% against 78.6% for the board-certified group.

Diagnosis, though, is the part you would expect a capable model to handle well. The harder test is everything that follows — and that’s where the results become genuinely interesting. MIRA recommended the correct clinical procedures 53.5% of the time, against 38.3% for board-certified physicians. It matched all 124 laparoscopic appendectomies in the data and 90.6% of laparoscopic cholecystectomies. On guideline adherence, it outperformed physicians by an average of 35 percentage points across prescribing categories.
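A quick worked calculation helps keep percentage points and percent apart when reading these comparisons. The sketch below uses only the figures quoted above; the helper names `point_gap` and `relative_gain` are illustrative, not taken from the study.

```python
# Figures as quoted in the article (percent of cases).
diagnostic_accuracy = {
    "MIRA": 87.8,
    "board-certified physicians": 78.1,
    "residents and attendings": 71.1,
}
procedure_match = {"MIRA": 53.5, "board-certified physicians": 38.3}


def point_gap(a: float, b: float) -> float:
    """Absolute gap between two rates, in percentage points."""
    return round(a - b, 1)


def relative_gain(a: float, b: float) -> float:
    """Improvement of rate a over rate b, as a percent of b."""
    return round((a - b) / b * 100, 1)


mira_dx = diagnostic_accuracy["MIRA"]
dx_gap_board = point_gap(mira_dx, diagnostic_accuracy["board-certified physicians"])
dx_gap_mixed = point_gap(mira_dx, diagnostic_accuracy["residents and attendings"])
pancreatitis_gap = point_gap(95.2, 78.6)

proc_gap = point_gap(
    procedure_match["MIRA"], procedure_match["board-certified physicians"]
)
proc_relative = relative_gain(
    procedure_match["MIRA"], procedure_match["board-certified physicians"]
)

print(f"Diagnosis vs board-certified: {dx_gap_board} points")
print(f"Diagnosis vs mixed cohort:    {dx_gap_mixed} points")
print(f"Pancreatitis:                 {pancreatitis_gap} points")
print(f"Procedures: {proc_gap} points, or {proc_relative}% relative")
```

The procedure gap is smaller in absolute points than the pancreatitis gap, but because the physicians' baseline is lower, the relative improvement is considerably larger. Which framing a headline picks changes how dramatic the same result sounds.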

The caveat is the entire game. An accuracy score of 87.8% inside a sandbox is one thing; a live emergency department, with real patients, real consequences, and real prescriptions, is something else entirely. The study is worth reading in full, and it raises a question worth pausing on: what would you need to see before you’d let an agent like this write an order for an actual patient?
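One common answer to that question is an approval gate: the agent may propose an order, but nothing executes until a named clinician signs it, and every step is logged. The sketch below is a minimal illustration of that pattern under stated assumptions. It is not how MIRA works (the paper describes an agent acting autonomously in a sandbox), and `ProposedOrder`, `ApprovalGate`, and the status strings are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProposedOrder:
    """Hypothetical record of an action an agent wants to take."""
    description: str
    status: str = "draft"            # stays "draft" until a clinician signs
    signed_by: Optional[str] = None


class ApprovalGate:
    """Holds agent-proposed orders until a named human approves them."""

    def __init__(self) -> None:
        # Each entry: (event, order description, responsible person or None).
        self.audit_log: List[Tuple[str, str, Optional[str]]] = []

    def propose(self, description: str) -> ProposedOrder:
        order = ProposedOrder(description)
        self.audit_log.append(("proposed", description, None))
        return order

    def sign(self, order: ProposedOrder, clinician: str) -> ProposedOrder:
        if order.status != "draft":
            raise ValueError("only draft orders can be signed")
        order.status = "active"
        order.signed_by = clinician
        self.audit_log.append(("signed", order.description, clinician))
        return order

    def execute(self, order: ProposedOrder) -> str:
        # Refuse anything that lacks a human signature.
        if order.status != "active" or order.signed_by is None:
            raise PermissionError("order has not been signed by a clinician")
        self.audit_log.append(("executed", order.description, order.signed_by))
        return f"executed: {order.description}"
```

In use, `gate.execute(order)` raises `PermissionError` for an unsigned draft and succeeds only after `gate.sign(order, "Dr. Example")`. The audit log then answers the accountability question directly: who approved what, and in which order.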

The real contest from here

Pull these four stories together and a single through-line emerges. In each case, AI stopped behaving like an assistant standing beside a human and became part of the system of record itself — the place where decisions are not just suggested but made and executed.

That shift reframes the questions worth caring about. The interesting contests from here aren’t really about capability; the capability is clearly arriving. They’re about trust, legibility, and accountability: whether people can understand and verify what these systems do, and who answers for it when an autonomous agent acts. PwC’s human-in-the-loop controls, Anthropic’s push to make an agent’s finished work immediately legible to the team, and the open question hanging over any live deployment of MIRA are all versions of the same problem. The technology has moved into the system. Now the harder work begins: building the trust the system requires.