AiThority Interview with Jon Bratseth, Vespa.ai

By Rishika Patel

Jon, as the founder of Vespa.ai, can you share the inspiration and vision that led to the creation of Vespa?

Vespa grew out of my team's work on web search - we were competing with Google and the others before we were acquired by Yahoo twenty years ago. Through this work we realized two things. Firstly, when you want to scale to large amounts of data, the traditional approach of pulling data out of a database to do something with it just takes too long and completely overwhelms your switches. You need to send what you want to do to the data instead. And secondly, to achieve good quality you need to apply a lot of intelligence to that data.

Putting that together led us to realize we needed to build a platform that allowed us to both distribute and index large amounts of data and perform distributed computations over that data in real time. And once we had built and generalized that platform - which took us a decade - we could apply it to lots of other problems besides search that would benefit from these capabilities, such as recommendation, personalization, ad serving and RAG.

There are some requirements that are a must for these organizations: different data and workloads must run on separate hardware that is never mixed, and these separate data planes must be under the customer's control, in their own accounts and VPCs, etc. The challenge comes from combining that with providing a managed system that makes it easy for these companies to run their workloads reliably at scale in production, change them, and deploy new ones. Vespa achieves this through what we call Vespa Cloud Enclave, where we combine a shared control plane for management with private, customer-controlled data planes.

In addition there are, of course, the general basics of security - encrypted data and communications, minimal privileges, fleet endpoint monitoring, software supply chain security, continuous upgrades and OS patches, red teaming, a bounty program and so on - which we also manage on the systems that we run.

To be honest it hasn't changed all that much, as we, being developers of a deep platform, rely on anticipating developments far in advance from first principles. For example, we started developing support for tensors more than a decade ago, before TensorFlow, from seeing that computing over large structured spaces of numbers would become increasingly valuable and economical.

One thing we did not anticipate was the rapid rise of LLMs, so over the last couple of years we have spent a lot of effort seeing how these can be integrated productively into serious enterprise systems and building out support for those use cases.

You need two things to succeed: an LLM that is sufficiently intelligent to solve the use cases you have in mind, and a way to provide it with the information it needs to do so (RAG). For many (not all) business use cases, applying sufficient intelligence requires using very large LLMs, and those are run most economically by companies that specialize in doing that at scale, so you should use those.

The second part, providing the LLM with the information it needs, is where most organizations succeed or fail. When we provide search for a human employee, the relevance of the returned information is important, but not critical. Humans constantly absorb information by going to meetings, reading their mail and so on, and if they can't find the information they need to solve a problem they will usually be able to use their already absorbed information, or at least know that there are things they don't know. LLMs are not like that. After their training is done they absorb nothing, and so they rely completely on the information we are able to surface at the time when they are solving a problem. In other words, relevance becomes even more important. Search relevance is a field with established best practices, all of which apply to providing LLMs rather than humans with information, and organizations that are successful adapt a variety of these to their needs, guided by evaluations.
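To make the point about evaluation-guided relevance work concrete, here is a minimal sketch of one common retrieval metric, recall@k, computed over a small labeled query set. The retriever interface and the labeled data are hypothetical placeholders for illustration, not part of any specific Vespa API.

    def recall_at_k(retriever, labeled_queries, k=10):
        """Average fraction of relevant documents found in the top-k results.

        labeled_queries: list of (query, set_of_relevant_doc_ids) pairs.
        retriever: callable returning the top-k document ids for a query.
        """
        total = 0.0
        for query, relevant_ids in labeled_queries:
            retrieved = set(retriever(query, k))  # ids of the top-k results
            total += len(retrieved & relevant_ids) / len(relevant_ids)
        return total / len(labeled_queries)

Tracking a metric like this over a fixed query set is one simple way to tell whether a change to ranking actually improves the information surfaced to the LLM.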

We see organizations working on this across many areas, including e-commerce, finance, health care and others. What all of the first wave of successful applications seem to me to have in common is that there is a human in the loop. These systems aren't yet at a stage where we can trust them to make high-stakes decisions on their own without a competent human effectively checking their work, and this is most effective when conceptualized as collaborative problem solving between the human and the system.

1. In the short term, the knowledge that when dealing with text, just using simple text search (with BM25) gives better results than simple vector embeddings, and that combining both vectors and text is superior to either (see the sketch after this list).

2. As people move from proofs of concept to serious enterprise applications, scalability, reliability and security will come to the forefront, which requires more comprehensive platforms than the initial experiments.

3. Retrieval methods such as ColBERT, which apply tensor computations in more advanced ways to achieve superior results, will continue to proliferate as the work we and others are doing to make these economical at scale becomes better known.

4. Visual retrieval such as ColPali will continue to increase in popularity and gradually take over document search.

5. As the world transitions to long reasoning, as pioneered by OpenAI, low query latency and high request rates will become important also in internal RAG applications using private data, as all the reasoning steps taken by these models will translate into thousands of queries for each problem to be solved.
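As referenced in point 1 above, here is a minimal sketch of hybrid retrieval that blends BM25 text scores with embedding similarity. The library choices (rank_bm25, sentence-transformers), the model name and the blending weight are illustrative assumptions, not Vespa's implementation.

    # Hybrid retrieval sketch: combine normalized BM25 scores with
    # cosine similarity over dense embeddings.
    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer
    import numpy as np

    documents = [
        "Vespa distributes data and computation across nodes",
        "BM25 is a classic text ranking function",
        "Dense embeddings capture semantic similarity",
    ]

    # Text side: BM25 over tokenized documents.
    bm25 = BM25Okapi([d.lower().split() for d in documents])

    # Vector side: dense embeddings for documents (normalized for cosine similarity).
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    doc_vecs = model.encode(documents, normalize_embeddings=True)

    def hybrid_search(query, alpha=0.5):
        """Blend BM25 and embedding scores; alpha weights the text side."""
        text_scores = np.array(bm25.get_scores(query.lower().split()))
        if text_scores.max() > 0:
            text_scores = text_scores / text_scores.max()  # scale to [0, 1]
        q_vec = model.encode([query], normalize_embeddings=True)[0]
        vec_scores = doc_vecs @ q_vec  # cosine similarity (vectors are normalized)
        combined = alpha * text_scores + (1 - alpha) * vec_scores
        return sorted(zip(documents, combined), key=lambda x: -x[1])

    for doc, score in hybrid_search("how does semantic ranking work"):
        print(f"{score:.3f}  {doc}")

In a production system the same idea is usually expressed inside the engine's ranking configuration rather than in application code, so that both signals are computed where the data lives.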
