In conventional application architecture, client-side components interact with backend services to deliver functionality and value to end users. To make this interaction work, application developers must understand the service’s application programming interface (API) in detail. Moreover, the application must adapt to any modifications introduced in the API, or the API must maintain backward compatibility to accommodate the whole spectrum of legacy application versions – the longtail app problem.
Theoretical solution: API-less client
The idea is a new type of architecture that requires no detailed API knowledge or API code on the client side; instead, LLM agents on the server side translate vague client requests into code that calls the API.
To make this work, the app still needs to communicate with the server-side LLMs, but for this it implements only a single API (such as the de facto standard OpenAI API).
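As a rough illustration, the client in this architecture could be as thin as the sketch below. It assumes a hypothetical backend that exposes an OpenAI-compatible endpoint; the URL, model name, and credential are all illustrative.

```python
# Minimal client-side sketch: the app knows nothing about the backend's
# real API surface and only speaks the OpenAI-compatible protocol.
from openai import OpenAI

client = OpenAI(
    base_url="https://backend.example.com/v1",  # hypothetical service endpoint
    api_key="app-token",                        # hypothetical credential
)

# A vague, natural-language request instead of a hand-coded API call.
response = client.chat.completions.create(
    model="backend-agent",  # illustrative agent/model name
    messages=[{"role": "user", "content": "Show me my three most recent orders."}],
)
print(response.choices[0].message.content)
```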
The client sends vague, natural-language requests, which are processed by a group of agents (orchestrated by a framework such as AutoGen).
The agents combine several techniques (a sketch of the skills piece follows the list):
- RAG (Retrieval-Augmented Generation) to find similar questions and cached answers
- Skills to access the API
- Fine-tuning so the model uses the API effectively when answering the client’s request
- RAG to provide context about the API and the stored data
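To make the skills idea concrete, here is a minimal, framework-agnostic sketch of the server side, standing in for what a multi-agent framework like AutoGen would orchestrate. The `get_orders` function, tool schema, and model name are hypothetical; the point is that the model, not the app, decides which API call satisfies the vague request.

```python
import json
from openai import OpenAI

llm = OpenAI()  # server-side LLM client; credentials/config omitted

def get_orders(customer_id: str, limit: int = 3) -> list:
    """Hypothetical internal backend function exposed to the agent as a skill."""
    return [{"order_id": f"o-{i}", "customer": customer_id} for i in range(limit)]

# The skill is described to the model as a tool; its description (plus any
# RAG-retrieved API docs) provides the API context mentioned in the list above.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_orders",
        "description": "Fetch a customer's most recent orders.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "limit": {"type": "integer"},
            },
            "required": ["customer_id"],
        },
    },
}]

def answer(vague_request: str, customer_id: str) -> str:
    messages = [
        {"role": "system",
         "content": f"Answer requests for customer {customer_id} using the available tools."},
        {"role": "user", "content": vague_request},
    ]
    reply = llm.chat.completions.create(
        model="gpt-4o-mini", messages=messages, tools=TOOLS)  # illustrative model
    msg = reply.choices[0].message
    if msg.tool_calls:  # the model translated the vague request into an API call
        call = msg.tool_calls[0]
        result = get_orders(**json.loads(call.function.arguments))
        messages.append(msg)
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": json.dumps(result)})
        msg = llm.chat.completions.create(
            model="gpt-4o-mini", messages=messages).choices[0].message
    return msg.content
```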
Pros and cons
Pros
- Lowers the bar for integrating with a backend service
- Eliminates the longtail app problem
- The client requires no or minimal updates to benefit from improvements in the service
Cons
- Cost
Inference currently has a relatively high cost (it will get cheaper over time). Caching answers to similar questions, as sketched after this list, can cut the cost of repeated requests.
- Predictability
Because the agent workflow generates the desired code on the fly, the outcome might vary. This can be mitigated with very direct prompt instructions and well-defined skills.
- Latency
The process introduces multiple compute-intensive steps that increase latency; the same caching can help here as well.
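The cached-answers idea from the techniques list directly targets the cost and latency points above. Below is a rough sketch, assuming OpenAI embeddings, an in-memory store, and an illustrative 0.9 similarity threshold; a real system would use a vector database.

```python
import math
from openai import OpenAI

llm = OpenAI()
cache: list = []  # (embedding, cached answer) pairs; a vector DB in practice

def embed(text: str) -> list:
    return llm.embeddings.create(
        model="text-embedding-3-small", input=text).data[0].embedding

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def run_agent_workflow(request: str) -> str:
    return "..."  # placeholder for the tool-calling agent loop sketched earlier

def answer_with_cache(request: str) -> str:
    vec = embed(request)
    for cached_vec, cached_answer in cache:
        if cosine(vec, cached_vec) > 0.9:  # a similar question was already served
            return cached_answer           # skip the expensive agent workflow
    result = run_agent_workflow(request)   # the costly, high-latency path
    cache.append((vec, result))
    return result
```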
Summary
It’s clear that the current state of models and compute costs do not yet support this theoretical architecture, but how far away are we?
Search experiences are being replaced by custom-generated web pages (the Arc browser).
Video games use AI to generate frames, which is cheaper than rendering those frames conventionally on the same device (NVIDIA’s DLSS).
Advancements in modeling, like teaching small models how to reason or GQA (Grouped-Query Attention), improve quality and lower inference cost. Computing techniques like quantization and improvements in hardware also take us closer. Over time, the cons will fade away.
And when we get there, will we even need apps, when you’ll have a set of experts always available to answer all of your questions, in the format you prefer, whenever you need them?