Imagine your phone, if it were powered by ants trapped beneath the Gorilla Glass, millions of them, scurrying around from pixel to pixel, ferrying data to one another, living cogs in a silicon system, mindless to their environment, nudged by pheromones and olfactory glands, casting a brilliant light that fascinates you.
An ant-based Internet is a key image in my recent short story, Below the API, which explores the intersection of AI agents and the creator economy.1 Ants are a classic metaphor for biological automata, an inhuman horror. Fortunately, the ants at play in the story are not of the biological variety; “ants” is a slur for AI agents.
I’ll leave it to the reader to decide the story’s merits, but writing it—and discussing it with the rest of the Summer of Protocols research group—prompted me to think through some ideas on what the future of AI—and in particular, AI assistants—might look like.
In this post, I want to sketch some of those ideas out independently of the story. They include:
The PEER protocol: requirements for safe and autonomous AI assistants
PEERNet: a walled garden for agent-to-agent interaction
Ants: the perfect smear for AI agents
Ant farms: the future of Internet surveillance
Agentic theory of mind: why agents must model well to serve well
Visual-last interfaces: what we lose when we say goodbye to screens
I’ll aim to be concise below, but let me know if any of the topics resonate, and I might revisit them independently in the future.
The PEER Protocol
A helpful autonomous agent needs three things: your permission, access, and credit card. With that, it can act in the real world—until its credit limit is reached. I won’t enumerate the challenges, except to say that I expect privacy and accountability to drive regulatory action (either via governmental regulation or marketplace mechanisms, like Apple’s app store).
If the authority is a government, we can expect it to have a cringy acronym—something like “the PEER protocol.” The protocol outlines requirements to ensure autonomous agents are private, exclusive, embedded, and registered.
Privacy: If an autonomous agent has your social security number, how can you ensure it will not leak it as part of a conversation? If it leaks to another agent, will that next agent leak it again?
One way to address this is for agents to act as a personal data firewall against other entities. They may retain your personal information, such as your address, but they can release it only in approved contexts and for approved purposes, and they cannot pass other people’s information back to their own owners.
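One way to picture the firewall idea is as a release policy keyed by field, context, and purpose. The Python sketch below is only an illustration under that assumption: the class name `DataFirewall`, the field names, and the context and purpose labels are all hypothetical and not part of any real protocol.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataFirewall:
    """Hypothetical sketch: holds a client's data and releases it only under approved rules."""

    # Personal data the agent retains on behalf of its client.
    vault: dict
    # Approved (field, context, purpose) triples; anything else is denied.
    approved: set = field(default_factory=set)

    def approve(self, name: str, context: str, purpose: str) -> None:
        """Record that a field may be released in one context for one purpose."""
        self.approved.add((name, context, purpose))

    def release(self, name: str, context: str, purpose: str) -> Optional[str]:
        """Return the value only if this exact release was approved, else None."""
        if (name, context, purpose) in self.approved:
            return self.vault.get(name)
        return None


firewall = DataFirewall(vault={"address": "12 Example St", "ssn": "000-00-0000"})
firewall.approve("address", context="courier", purpose="delivery")

print(firewall.release("address", "courier", "delivery"))   # approved release
print(firewall.release("address", "courier", "marketing"))  # wrong purpose: denied
print(firewall.release("ssn", "courier", "delivery"))       # never approved: denied
```

The default here is denial: a field that was never explicitly approved for a given context and purpose simply does not leave the device.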
Exclusivity is like explainability with teeth. To trust my assistant, I need assurances that other agents, corporate entities, or foreign actors are not (overwhelmingly) manipulating my agent. I might want to try that Pumpkin Spice Latte since it’s October, and I like it, but I want to know whether my agent is trying to satisfy me rather than simply shilling for Starbucks. To do that, it must be able to explicitly model its influences to some degree and map them to its client’s needs and preferences. (See the section on agentic theory of mind below for more.)
Embedded models, per PEER protocol, are not stored as backups on a remote data center. They are stored on-device and only on-device, like a Super Nintendo cartridge. If you delete them, they’re gone.
This was an interesting principle to consider, since cloud computing works the other way: data lives on a server and is available across all your devices. However, the reasoning is more like physical crypto wallets crossed with Asimovian robots. Can you trust that an executive agent is yours if it exists on an invisible server? Perhaps; it hasn’t stopped us recently.
But I like the idea that agents are provably terminable. If they only exist on your device, then they share some of the same human limits: mortality. If their hardware goes, they go with it. That can be a curse, but it also gives humans some control.
Registration is the legal means for accountability and seems to be table stakes for allowing executive agents to run amok. Autonomous agents that do bad things can’t be thrown in jail; people can. Having a licensed individual on the hook for their agent’s actions (or fleet of agents) creates a scaffold for the different arrangements that will no doubt emerge: agent rentals, leases, insurance, etc.
The PEERNet
The closest example of an executive assistant today is not ChatGPT or Alexa but mobile operating systems like iOS and Android. These systems are always on, fielding notifications from various tools, prioritizing and summarizing messages, and integrating actions with personal preferences.
I expect AI agents will be an evolution of the mobile operating system, and companies will build tailored interfaces for these agents, just as they’ve built dedicated apps for mobile.
Perhaps these interfaces will look like REST APIs (no need for fancy frontend work for agents), but they’ll need to be tailored for content density and pluggable personalization: “My client has this shape. What do this-shaped people prefer from your website?”
In Below the API, PEER to PEER communication is facilitated via a restricted PEERNet network. Functionally, this is a walled garden for assistants to communicate so that they can not only exploit existing knowledge of what their clients might want to do (e.g., “My client gets Starbucks on Tuesday”) but explore new potential activities that may surprise and delight their clients (e.g., “My client loves to try new coffee shops on Tuesday, and I know where one is”).
Ants
Most people will call their agents by a given name or title. (In a world where agents can die off, names are also likely to be unique.) However, every new trend needs derisive slang for people who think they’re above the wave. After some false starts, I settled on “ants.”
Apart from the orthographic similarity of “ant” and “assistant” (and its resemblance to a shortened “agent”), the term “ants” captures the alien nature of these agents. They never sleep; they endlessly toil. They are not individuals, even if each is unique. Instead, they represent a larger colony. They may move with purpose towards an end, but the motivating force differs from that of a human mind.
I’m reminded of the “sugar water” scene in Men In Black: I think executive assistants will behave like the Bug in an Edgar suit. It may look human, but it is decidedly not (which is not to say it’s a murderer). Ants like sugar, and when they find it, they tell their friends. A massive clump of ants crowds around the sugar, breaking it down to take it back to their colony, where it’s processed invisibly to feed their colony.
Ant Farms
The colony will be a juicy economic target. Much like Google plays a personal role as the front door of the Internet and is simultaneously an advertising giant, we can expect assistants to be leveraged systematically to power the economy. Even if the agents act exclusively on behalf of their clients, it stands to reason that the person who owns them (or, say, the Internet provider) will be able to access some information about their behavior and operations.
What does advertising to ants look like?
In an LLM world, perhaps there is a “token tax,” in which each context window contains some sponsored content, probabilistically influencing the expected output. If there are ten pillows to buy and the client has no strong preference, perhaps the bot will buy the sponsored one because it’s presented first or last, or because it gets a 1,000-character description while the others get 500. Aggregate data may flow out of the PEERNet as well, allowing traders and information brokers to spot PEERNet trends they could capitalize on.
The chief difference between an always-on agent-based network and a traditional social media company is the speed at which ideas can move around. Even Reddit and Twitter, which move as fast as any social network can, are constrained by time zones and human attention spans. A PEERNet hype cycle could potentially go from start to finish within minutes, with each ant contributing. Ant farms would capitalize on these behaviors, finding ways to monetize agent behavior in the same way that click farms, content farms, and consent farms operate in today’s economy.
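To make the pillow example concrete, here is a toy Python sketch of how presentation alone could tip an indifferent agent toward a sponsored listing. Everything in it is an assumption made for illustration: the `Listing` fields, the size of each bonus, and the idea that an agent's choice reduces to a single additive score. A real LLM's sensitivity to position and description length would be far messier.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    name: str
    preference: float       # how well the item fits the client (0 to 1), assumed known
    description_chars: int  # length of the description placed in the context window
    position: int           # index of the item in the list shown to the agent


def presentation_bias(item: Listing, total: int) -> float:
    """Toy bias: first/last slots and longer descriptions get a small boost."""
    edge_bonus = 0.05 if item.position in (0, total - 1) else 0.0
    length_bonus = 0.05 * min(item.description_chars / 1000, 1.0)
    return edge_bonus + length_bonus


def choose(listings: list) -> Listing:
    """Pick the listing with the highest preference plus presentation bias."""
    total = len(listings)
    return max(listings, key=lambda item: item.preference + presentation_bias(item, total))


# Ten equally suitable pillows; the sponsor pays for the first slot and a longer blurb.
pillows = [Listing(f"pillow-{i}", 0.5, 500, i) for i in range(10)]
pillows[0] = Listing("sponsored-pillow", 0.5, 1000, 0)
print(choose(pillows).name)  # indifferent client: the sponsored listing wins

# A client with a strong preference is not swayed by the same bias.
pillows[4] = Listing("favorite-pillow", 0.9, 500, 4)
print(choose(pillows).name)  # the preferred pillow wins
```

The point of the sketch is that the tax only bites at the margin: it decides ties and near-ties, which is exactly where the client is least likely to notice.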
Agentic Theory of Mind
The critical challenge for executive assistants is that they must intervene in the real world.
This is the same problem current devices have with “push notifications,” but on steroids. Notifications are costly to users because they tax your attention, and if they’re overused, users will turn them off entirely. But a silenced assistant is an ineffective one. Instead, the target is as quiet as possible but no quieter.
To do this, agents must explicitly model their clients’ mental state, an extension of how an operating system handles user preferences. An agent needs to predict when a person is most likely to want a notification and of which type, when to say no and automatically archive spam, what their objectives are for their next date, and what they want their relationship to technology to look like.
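A crude way to express "as quiet as possible but no quieter" is as a threshold rule: interrupt only when the estimated value of a notification exceeds the estimated cost of the client's attention at that moment. The Python sketch below assumes the agent can already estimate urgency, relevance, and client state; the class names, fields, and weights are hypothetical placeholders, not a proposal for how such a model should be built.

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    """Hypothetical snapshot of what the agent believes about its client."""
    in_meeting: bool
    notifications_last_hour: int


@dataclass
class Notification:
    summary: str
    urgency: float    # 0 to 1, the agent's estimate of time sensitivity
    relevance: float  # 0 to 1, the agent's estimate of fit with client goals


def attention_cost(state: ClientState) -> float:
    """Toy cost: interruptions cost more in meetings and after recent pings."""
    cost = 0.2 + 0.1 * state.notifications_last_hour
    if state.in_meeting:
        cost += 0.4
    return cost


def should_notify(note: Notification, state: ClientState) -> bool:
    """Interrupt only when the expected value beats the cost of attention."""
    return note.urgency * note.relevance > attention_cost(state)


busy = ClientState(in_meeting=True, notifications_last_hour=1)
print(should_notify(Notification("Flight gate changed", 0.9, 0.9), busy))    # True
print(should_notify(Notification("Weekly deals digest", 0.3, 0.2), busy))    # False
```

The hard part, of course, is everything the sketch takes for granted: where the urgency, relevance, and state estimates come from, and how they are corrected when the agent guesses wrong.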
These are all things that a human executive assistant does naturally, outside the job description. Curating an agenda is not just a mechanical alignment of schedule blocks but a filtering and optimization problem over their clients’ work life and preferences.
This can get touchy—we want our needs anticipated, but we hate to be stereotyped. There are all sorts of things we wouldn’t explicitly admit to computers, even if it meant better personalization. The Internet giants have ways of modeling for us implicitly, whether we like it or not, and agents will need that, too. (OpenAI is building out “memories” for this purpose.)
However, the job will be more complex for agents than it is for SaaS companies because they must be much, much better. If Amazon’s algorithm is lame, it may show you disgusting foot cream. If your assistant’s algorithm is lame, that foot cream will show up at your door on a random Thursday.
Visual-Last Interfaces
In the same way we’ve been trained to expect a particular style of user experience from websites, we will come to expect a particular style of interaction from assistants. That interface will drive trust.
The big hurdle will be the move away from screens. The key benefit of a screen-free interface is its immediacy. A notification buzz gets your attention right away, and replying via voice handles flows as fast as humanly possible. When done right, a brand’s sonic logo can create a lasting impression.
Yet, screens are the most information-dense interface available for human animals. Moving to auditory-first interfaces (or some secret third thing) necessarily reduces information throughput. There must be trust to move away from screens, but when an assistant’s value is defined primarily by how much it takes off your plate, what exactly are you trusting?
The agent’s client has to trust that (1) information conveyed is accurate and complete, (2) information not conveyed is not important, and (3) motivations for sharing the information are aligned between agent and client. This last point is more subtle, but amounts to: both the client and agent have the same objective in mind.
In Below the API, the core twist comes from the confluence of three factors: agent decisions on how to deliver notifications, the absence of screens for summary reporting, and restrictions on sharing personal information. These three tenets create a thick buffer between what the users will become accustomed to knowing about in the digital world and what is occurring below the API.
It might sound strange that we could become so abstracted from our personal economic situation that we wouldn’t know where money is going or coming from. But we’re already there: if you live with roommates, I bet packages randomly show up at your door. Sometimes, they’re probably things you ordered and forgot about.
Taking away screens may allow us to focus more on what’s around us. That doesn’t mean we’ll understand it any better.
To conclude, let me share an excerpt from Below the API.
[Marcie] picked up one of the unfinished [pots] from the drying rack. Pots — they were amazing, Marcie thought. Five thousand years ago, humans found that earth could be more than earth: it could be fired and formed into a vessel. They could carry water, store wheat, and serve food. Then, they learned that pots could carry messages, too. They could hold both the material and immaterial. They could contain — and convey. Like artificial ants, millions of little ants carrying goods ten times their weight, infinite times their weight. From place to place. From person to person. From era to era. Little ants carrying more than their size. From place to place. From time to time. Building intricate, invisible nests within human society.
Whether it’s the AI or pottery, it’s ants all the way down.
1. Children of Time does ant-based computing quite well, actually.