I’ve been using the internet for more than two decades. Websites have evolved from static HTML and CSS to rich, interactive experiences. But the core navigation and usage of the web have stayed the same - we all mostly use Google to discover and browse information.
I think that the next version of the web will be based around large language models (LLMs). I browse the internet to perform tasks - examples are learning a new concept, or researching the best daycare for my kids. With the current web, a single task may require me to spend hours visiting hundreds of webpages.
LLMs improve this experience. With ChatGPT, you type in a question and get an instant answer, no navigation needed. But I find myself using LLMs as a supplement to browsing the web, not a replacement. Chat isn’t as efficient as a dedicated interface, and there isn’t an open way to integrate information sources and APIs into LLMs.
The solution is something I call the AI web. The goal of the AI web is to give you a local LLM agent that can help you complete tasks. This agent will be analogous to a browser today. It will automatically create interfaces for your tasks, and pull in information as needed.
In this post, I’ll outline how the AI web could work, and how it will be more efficient than the current web. I’ll also outline risks, limitations, and potential future directions.
Let’s break down a single task you might use the internet for - learning how to make homemade pizza:
This task, even though it’s simple information retrieval, takes several steps, each with its own scrolling and information overload. If you want details, you have to search multiple times in different browser tabs. It’s no wonder ChatGPT has become popular quickly.
We can use ChatGPT to perform the same task:
With ChatGPT, we can do this in 2 steps, with some caveats:
- I have limited ability to dig deeper. For example, I can’t click on each recipe step to get more information.
- Everything is in a long chat message. I can’t check off recipe steps as I go, or view only the ingredients I need for each step.
It’s more efficient than using a search engine, and less efficient than a dedicated interface, like a recipe app. But the dedicated interface is single-purpose, not customizable, and requires installation and registration.
With the AI web, we can get the benefits of a dedicated interface with the flexibility of an LLM. An LLM agent running on your computer, as part of your browser, will create a custom interface, and pull in the right data from APIs.
Let’s take a look at what a task will look like using the AI web. Here’s a rough mockup I made by prompting ChatGPT then adapting the code:
- Bread flour (3 cups)
- Dry yeast (1 package)
- White sugar (1 teaspoon)
- Salt (2 teaspoons)
- Olive oil (2 tablespoons)
- Warm water (1 cup)
- Cornmeal (3 tablespoons)
- Tomato sauce (1/2 cup)
- Mozzarella cheese (1 cup)
- Toppings (as desired)
- Preheat oven to 450 degrees F (230 degrees C).
The preview above isn’t polished, but shows the potential for AI to create custom interfaces. I created 4 Svelte components, then used the prompts below to lay those components out into an interface:
You're a UX designer figuring out how to design an interface for a task. The interface will be designed using Svelte. The components you have available are: [list of components]

Each component has a key that helps the system parse the interface. Here is an example interface layout:

    <Container>
        <Button key="next-button">Next</Button>
        <Textarea key="content"></Textarea>
    </Container>

Design an interface that will help someone cook a homemade pizza. It should enable further investigation by the user if they have questions about a step in the recipe. It should clearly show the ingredients and each step, and enable moving between steps.
These prompts could be customized to give you more control over the layout. The components and styles could also be themed for more personalization.
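The prompt above says each component has a key that helps the system parse the interface. As a minimal sketch of what that parsing step might look like, the snippet below pulls the keyed elements out of a generated layout. The function name `keyed_elements` and the regex-based approach are assumptions made for illustration; a real agent would more likely use a proper Svelte or HTML parser.

```python
import re

# Matches an opening tag that carries a key attribute,
# e.g. <Button key="next-button">. Closing tags and tags without a key are skipped.
KEYED_TAG = re.compile(
    r'<(?P<component>[A-Za-z][A-Za-z0-9]*)\b[^>]*\bkey="(?P<key>[^"]+)"'
)


def keyed_elements(layout: str) -> dict[str, str]:
    """Map each key in a generated layout to the component that carries it."""
    elements: dict[str, str] = {}
    for match in KEYED_TAG.finditer(layout):
        key = match.group("key")
        if key in elements:
            # Keys identify elements on screen, so a repeated key is ambiguous.
            raise ValueError(f"duplicate key in layout: {key!r}")
        elements[key] = match.group("component")
    return elements


# The example layout from the prompt.
example_layout = """
<Container>
  <Button key="next-button">Next</Button>
  <Textarea key="content"></Textarea>
</Container>
"""

print(keyed_elements(example_layout))
```

With a mapping like this, the agent knows which on-screen elements it can address by key when it later updates the interface.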
In a live interface, you’ll be able to click on the buttons. Each button click will be handled by an LLM. We pass in data about the task, and the information on the screen. The LLM then re-renders the information.
Here’s a prompt showing how that could be done:
You're an assistant helping someone complete a task on their computer. They're currently making a homemade pizza. Here are the elements on their screen:
- Ingredients: [list ingredients here]
- Current step: Preheat oven to 450 degrees F (230 degrees C).
They have just clicked the next button, which will advance the recipe to the next step. Generate the data that should be stored in the ingredients and current step sections.
Instead of hard-coding button actions, we use an LLM to respond in real-time based on the current user session. In this case, the LLM can create the next step without calling an external API.
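Here is a rough sketch of how a click handler along these lines could be wired up. It assumes the LLM is asked to reply with a JSON object containing only the screen elements that should change. That reply format, the function names, and the `fake_llm` stub (which stands in for a real model call) are all assumptions made for this example.

```python
import json
from typing import Callable


def build_click_prompt(task: str, screen: dict, clicked_key: str) -> str:
    """Describe the task, the current screen, and the click in one prompt."""
    lines = [
        "You're an assistant helping someone complete a task on their computer.",
        f"Task: {task}",
        "Here are the elements on their screen, as JSON:",
        json.dumps(screen, indent=2),
        f"They have just clicked the element with key {clicked_key!r}.",
        "Reply with a JSON object containing only the screen keys whose data should change.",
    ]
    return "\n".join(lines)


def handle_click(
    task: str, screen: dict, clicked_key: str, llm: Callable[[str], str]
) -> dict:
    """Ask the LLM what should change after a click and return the new screen data."""
    reply = json.loads(llm(build_click_prompt(task, screen, clicked_key)))
    if not isinstance(reply, dict):
        raise ValueError("expected a JSON object from the LLM")
    unknown = set(reply) - set(screen)
    if unknown:
        # Refuse updates to elements that are not part of the interface.
        raise ValueError(f"LLM returned keys that are not on screen: {sorted(unknown)}")
    return {**screen, **reply}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns a placeholder next step."""
    return json.dumps({"current-step": "Placeholder text for the next step."})


screen = {
    "ingredients": ["Bread flour (3 cups)", "Dry yeast (1 package)"],
    "current-step": "Preheat oven to 450 degrees F (230 degrees C).",
}
new_screen = handle_click("Make a homemade pizza", screen, "next-button", fake_llm)
print(new_screen["current-step"])
```

Validating the reply against the keys already on screen is one simple guard against the model inventing interface elements that were never rendered.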
I hard-coded a list of fake daycares, but in a live scenario, you’d want to be able to look them up. This is where the browser would need to connect to external APIs. I see this happening through manifest servers, which will index APIs on the internet. They’ll be analogous to search engines - they’ll help you find the right APIs to complete a task.
You will be able to specify which manifest servers you want your browser to query. The LLM then queries your specified manifest servers for APIs that answer a specific question:
For this query, we might get the APIs for Yelp and Google Local Search. This same process can be applied to access APIs that take actions, like bank APIs.
Users will also be able to directly specify certain APIs in cases where authentication is needed, or specific services need to be accessed.
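To make the lookup flow more concrete, here is a small sketch of an agent querying several manifest servers and merging the results with user-specified APIs. Manifest servers don't exist yet, so everything here is hypothetical: the `ManifestServer` class is an in-memory stand-in, the keyword matching is deliberately naive, and the API names and `example.invalid` URLs are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ApiEntry:
    name: str
    description: str
    base_url: str


class ManifestServer:
    """In-memory stand-in for a manifest server that indexes APIs (hypothetical)."""

    def __init__(self, entries: Iterable[ApiEntry]):
        self._entries = list(entries)

    def search(self, question: str) -> list[ApiEntry]:
        # Naive matching: any word shared between the question and the description.
        wanted = set(question.lower().split())
        return [e for e in self._entries if wanted & set(e.description.lower().split())]


def find_apis(question: str, servers: Iterable[ManifestServer], pinned=()) -> list[ApiEntry]:
    """Return user-pinned APIs first, then server results, de-duplicated by URL."""
    candidates = list(pinned)
    for server in servers:
        candidates.extend(server.search(question))
    results, seen = [], set()
    for entry in candidates:
        if entry.base_url not in seen:
            seen.add(entry.base_url)
            results.append(entry)
    return results


listings = ApiEntry("local-listings", "local business listings including daycares",
                    "https://listings.example.invalid")
reviews = ApiEntry("local-reviews", "reviews of daycares and restaurants",
                   "https://reviews.example.invalid")
bank = ApiEntry("bank", "account balances and transfers",
                "https://bank.example.invalid")

server_a = ManifestServer([listings, bank])
server_b = ManifestServer([reviews, listings])

matches = find_apis("daycares near me", [server_a, server_b])
print([m.name for m in matches])
```

Putting pinned APIs ahead of server results reflects the idea that directly specified services should take priority over anything discovered automatically.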
In the current web, most state is stored remotely, and then either synced to a browser cache, or rendered into HTML. With the AI web, most state can be stored locally. For example, if we want to make a list of our favorite daycares, we can store that state locally.
Since the LLM agent is in control of the interface, it will understand what each piece of state means. We can then use the same state in other tasks down the road.
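One way to picture this is a local store where every piece of state is saved together with a plain-language description, so the agent can later be told what state is available. The sketch below is only an illustration under that assumption: the `LocalStateStore` class, the JSON file layout, and the daycare name are all made up for the example.

```python
import json
import tempfile
from pathlib import Path


class LocalStateStore:
    """Keeps task state in a local JSON file, with a description for each item."""

    def __init__(self, path: Path):
        self._path = path
        self._items = json.loads(path.read_text()) if path.exists() else {}

    def put(self, key: str, description: str, value) -> None:
        self._items[key] = {"description": description, "value": value}
        self._path.write_text(json.dumps(self._items, indent=2))

    def get(self, key: str):
        return self._items[key]["value"]

    def describe(self) -> str:
        """Summarize the stored state in a form that could be placed in an LLM prompt."""
        return "\n".join(
            f"- {key}: {item['description']}" for key, item in sorted(self._items.items())
        )


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "state.json"
    LocalStateStore(path).put(
        "favorite-daycares",
        "Daycares the user marked as favorites",
        ["Example Daycare A"],
    )
    # A later task opens the same file and can reuse the state.
    reopened = LocalStateStore(path)
    favorites = reopened.get("favorite-daycares")
    summary = reopened.describe()

print(summary)
```

Storing the description next to the value is what would let a different task, with a different interface, make sense of the same data.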
This will prevent user data from being locked into single services. Some centralized data storage will still be needed, such as to store social media likes (although moving to a peer-to-peer model could remove this need, see below).
Let’s talk about how this model changes how we build websites. In the current internet, websites have 3 layers:
- Presentation - the interface that the user sees
- Information - information that is stored and/or pulled from a database to render the interface
- Action - actions that impact the outside world, like transferring money from your bank account
An LLM agent can replace most of the presentation layer and some of the information layer:
- Presentation - most interfaces can be generated by an LLM, although some complex interfaces (like JupyterLab) will still need to be server-rendered
- Information - LLMs can directly generate information, removing the need for some of this layer. LLMs can store some information locally (like storing our daycare favorites), and then retrieve it later.
Most websites will become APIs that interface with LLMs to drive real-world actions (like a banking API).
Having a local LLM allows for everything to be user-driven:
- A user specifies a task
- An agent identifies the correct information sources, and the right interface for the task
- The interface is rendered locally
- The agent connects to any APIs that are needed for information and actions
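The steps above can be sketched as a small pipeline. This is a simplified illustration, not a design: the `run_task` function and `TaskSession` type are assumptions, and the three callables passed in stand in for LLM and API calls that a real agent would make. The `List` component in the layout is also hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskSession:
    task: str
    sources: list[str] = field(default_factory=list)
    layout: str = ""
    data: dict[str, object] = field(default_factory=dict)


def run_task(
    task: str,
    pick_sources: Callable[[str], list[str]],
    design_layout: Callable[[str, list[str]], str],
    fetch: Callable[[str, str], object],
) -> TaskSession:
    """Run one user-specified task through the agent's steps, in order."""
    session = TaskSession(task=task)
    # The agent identifies information sources and the right interface.
    session.sources = pick_sources(task)
    session.layout = design_layout(task, session.sources)
    # The agent connects to each API it needs and collects the results locally.
    for source in session.sources:
        session.data[source] = fetch(source, task)
    return session


# Stubs standing in for LLM and API calls.
session = run_task(
    "Find a daycare",
    pick_sources=lambda task: ["local-listings"],
    design_layout=lambda task, sources: '<Container><List key="results"></List></Container>',
    fetch=lambda source, task: ["Example Daycare A"],
)
print(session.layout)
print(session.data)
```

Everything in the session lives on the user's machine; only the `fetch` step would reach out to remote services.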
It could be interesting to add peer-to-peer connections, to enable information exchange directly between LLM agents. For example:
- Contact an LLM agent directly to book a meeting on someone’s calendar
- Look up information about a person by querying their agent
- Look up how someone else has interacted with certain data (for example, if they bookmarked a web page)
This could lead to a web where most information is queried peer to peer instead of from centralized servers.
The AI web breaks some of the nice parts of the web, like discoverability and hyperlinks. Since backend services are APIs, there’s nothing connecting one API to another. This could be resolved by:
- Using some kind of deep linking schema - this would enable linking to API content directly
- Having the agent be both a traditional web browser, and an LLM agent. That would enable it to open regular links as well as have a task-centric AI web view. I think this will be necessary as the AI web and the current web coexist.
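As one illustration of the first option, a deep link could be an ordinary URL with its own scheme that names an API host, a resource, and optional parameters. The `aiweb://` scheme, the `DeepLink` type, and the example host below are invented for this sketch; no such schema exists today.

```python
from dataclasses import dataclass
from urllib.parse import parse_qsl, quote, unquote, urlencode, urlsplit

SCHEME = "aiweb"  # hypothetical scheme, chosen only for this example


@dataclass(frozen=True)
class DeepLink:
    api_host: str
    resource: str  # path of the content within the API, e.g. "recipes/homemade-pizza"
    params: tuple[tuple[str, str], ...] = ()

    def to_url(self) -> str:
        url = f"{SCHEME}://{self.api_host}/{quote(self.resource)}"
        if self.params:
            url += "?" + urlencode(self.params)
        return url


def parse_deep_link(url: str) -> DeepLink:
    """Turn a deep link URL back into its host, resource, and parameters."""
    parts = urlsplit(url)
    if parts.scheme != SCHEME or not parts.netloc:
        raise ValueError(f"not a {SCHEME} deep link: {url!r}")
    return DeepLink(
        api_host=parts.netloc,
        resource=unquote(parts.path.lstrip("/")),
        params=tuple(parse_qsl(parts.query)),
    )


link = DeepLink("recipes.example.invalid", "recipes/homemade-pizza", (("step", "2"),))
url = link.to_url()
print(url)
print(parse_deep_link(url) == link)
```

Because the link is just a URL, it could be shared, bookmarked, or embedded in regular web pages, which would restore some of the connectivity that hyperlinks provide today.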
This proposal increases the risks associated with superintelligent AI - the information you access will be directly filtered by an AI, and the AI will be able to easily communicate with APIs and other AIs.
Given how much of the backend of websites is already, or soon will be, LLM-driven, I’m not sure how much of a risk this is. The information will be touched by LLMs at some point either way. I think having local agents versus one centralized AI actually lessens the risk. Each individual AI will interact with fewer people and have fewer parameters than a large centralized AI.
An LLM agent also means less transparency around where information is coming from, and how different APIs are being used. Implementing an audit trail could help with this, so it’s easy to inspect what the AI has done, and where each piece of information in the interface came from.
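A minimal sketch of such an audit trail might record, for every action the agent takes, what it did, where the data came from, and which interface element the data ended up in. The `AuditTrail` class, its field names, and the example sources below are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str    # when the agent acted (UTC, ISO 8601)
    action: str       # e.g. "api_call" or "llm_generation"
    source: str       # which API or model produced the data
    element_key: str  # interface element the data was rendered into


class AuditTrail:
    """Append-only record of what the agent did and where on-screen data came from."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, action: str, source: str, element_key: str) -> AuditEvent:
        event = AuditEvent(
            timestamp=datetime.now(timezone.utc).isoformat(),
            action=action,
            source=source,
            element_key=element_key,
        )
        self._events.append(event)
        return event

    def provenance(self, element_key: str) -> list[AuditEvent]:
        """List every recorded event that contributed to one interface element."""
        return [e for e in self._events if e.element_key == element_key]


trail = AuditTrail()
trail.record("api_call", "https://listings.example.invalid", "results")
trail.record("llm_generation", "local-model", "current-step")

for event in trail.provenance("results"):
    print(event.timestamp, event.action, event.source)
```

Indexing events by interface element is what would let a user click on any piece of information and ask where it came from.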
There are ambitious ideas here that will take a long time to fully realize. But many of the ideas can work with the current web as is. If we built a task-centric LLM agent browser, it could use existing catalogs of APIs (like RapidAPI) to identify the right information sources. Gorilla is an early example along these lines.
I think that this vision for the AI web has the power to make the web much more user-friendly. I’m excited to see how we progress and develop these ideas. If you’re working on anything along these lines, please reach out - I’d love to hear from you.