Databricks as a unified platform for Generative AI Development

Databricks has been a pioneer in AI innovation for over a decade, and with the rise of Generative AI it is taking the next step. Through the variety of applications and tools inside the Data Intelligence Platform, Databricks empowers you to develop and deploy AI models with speed, reliability and full governance, all available through one platform. In this blog post, we cover the key tools Databricks offers to deploy your generative AI solution, with a deeper focus on DBRX, its latest announcement. We also cover how to access and manage different LLMs and embedding models through Model Serving, the different ways to use, test and deploy the Foundation Models available on Databricks, how to customize your generative AI application using Vector Search, and how to evaluate your application.

Model serving

You can access and manage the Foundation Model of your choice through Model Serving. Databricks Model Serving provides a unified interface to deploy, govern and query AI models. Its dynamic scalability saves infrastructure costs while optimizing latency performance by automatically adjusting to demand fluctuations.

These models can all be securely customized with your private data. Because Model Serving is built on top of the Data Intelligence Platform, integrating features and embeddings is straightforward. Governance and monitoring of these models are centrally managed through the Serving UI, where you can manage permissions, track and set usage limits, and monitor model quality. Databricks has implemented multiple optimizations to ensure the best throughput and latency. Model Serving is designed for high-availability, low-latency production use and can support over 25,000 queries per second with less than 50 ms of overhead latency. Workloads are protected by multiple layers of security to ensure a secure and reliable environment.

The platform supports a range of models:

  • Models available through the Foundation Model APIs: curated open foundation model architectures that support optimized inference.
    • Pay-per-token: base models available for immediate use, such as Databricks-DBRX-Instruct, Llama-2-70B-Chat, BGE-Large and Mistral-7B (see the query sketch after this list)
    • Provisioned throughput: for workloads that require performance guarantees and for fine-tuned model variants
  • External models: models hosted outside Databricks, e.g. GPT-4, Anthropic Claude and Amazon Bedrock models. Endpoints that serve external models can be centrally governed, with rate limits and access control.
  • Custom models: Python models packaged in MLflow format, e.g. scikit-learn, PyTorch and Hugging Face Transformers models. These can be registered in Unity Catalog or in the workspace model registry.
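
As an illustration of the pay-per-token option: Foundation Model API endpoints expose an OpenAI-compatible REST interface. A minimal sketch of querying DBRX from Python, where the workspace URL and the token environment variable are placeholders for your own environment:

```python
import os

from openai import OpenAI

# Workspace host and token are placeholders; adjust to your environment.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",  # a pay-per-token endpoint
    messages=[{"role": "user", "content": "What is Generative AI?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```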

Using the Foundation Model APIs, you can control the inputs and outputs of the model so that they do not contain hateful or harmful content. Guardrails prevent the model from interacting with unsafe content when it is detected in one of the following categories:

  • Violence and hate
  • Sexual content
  • Criminal planning
  • Guns and illegal weapons
  • Regulated or controlled substances
  • Suicide & self-harm

On top of this, you can define custom functions using Databricks Feature Serving for additional processing, such as filtering sensitive data specific to your company's needs, as sketched below.
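
As a minimal, generic sketch of such a post-processing step (the class name and the masking rule are purely illustrative, not a Feature Serving API), filtering logic can be packaged as an MLflow pyfunc model and served like any other custom model:

```python
import re

import mlflow.pyfunc


class EmailMaskingFilter(mlflow.pyfunc.PythonModel):
    """Illustrative guardrail: masks e-mail addresses in generated text."""

    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    def predict(self, context, model_input):
        # model_input is assumed to be an iterable of generated strings.
        return [self.EMAIL.sub("[MASKED]", text) for text in model_input]
```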

DBRX

Recently, Databricks released its own LLM named DBRX, available through the Foundation Model APIs. The model uses a fine-grained mixture-of-experts (MoE) architecture with 132B parameters in total, of which 36B are active on any input. DBRX excels at summarization, question answering and coding, and is particularly strong in RAG setups where the accuracy of the retrieved documents matters. Another important feature is its large context window of up to 32 thousand tokens. DBRX is competitive with, and on some measurements surpasses, leading LLMs such as GPT-3.5, Gemini 1.0 Pro and Llama2-70B, both in inference speed and in accuracy on tasks such as programming language understanding and math. Its question-answering performance with RAG is particularly high (shown in Table 1).

Model             | DBRX Instruct | Mixtral Instruct | LLaMa2-70B Chat | GPT 3.5 Turbo (API) | GPT 4 Turbo (API)
Natural Questions | 60.0%         | 59.1%            | 56.5%           | 57.7%               | 63.9%
HotPotQA          | 55.0%         | 54.2%            | 54.7%           | 53.0%               | 62.9%
Table 1: Accuracy of different LLMs on Natural Questions and HotPotQA when the model is provided with the top 10 passages from a Wikipedia corpus

Usage of Foundation Models

Databricks makes it simple to access and build on different LLMs. It includes libraries like Hugging Face Transformers and LangChain, enabling the integration of existing pre-trained models, as well as other open-source libraries, into your workflow. The Databricks platform can then be used to fine-tune LLMs with your own data.

1. Hugging Face

Hugging Face Transformers on Databricks makes it easy to scale your NLP batch applications and fine-tune models for LLM applications. Hugging Face Transformers is an open-source framework for deep learning that provides APIs and tools to download state-of-the-art pre-trained models and tune them further to maximize performance for your needs. It offers default models for common tasks, which makes it easy to get started.

Hugging Face provides a model hub containing many pre-trained models, the transformers library that supports downloading and using these models, and transformers pipelines that offer a simple interface for most NLP tasks.
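
For example, a transformers pipeline can be wrapped in a pandas UDF to run batch inference over a Spark DataFrame. A sketch, assuming a Spark DataFrame df with a text column named body (both are assumptions for illustration):

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf
from transformers import pipeline

# The task name selects a default pre-trained model; pin a specific
# model (and revision) for reproducible production runs.
summarizer = pipeline("summarization")


@pandas_udf("string")
def summarize(texts: pd.Series) -> pd.Series:
    results = summarizer(texts.to_list(), truncation=True)
    return pd.Series([r["summary_text"] for r in results])


# df is assumed to be a Spark DataFrame with a "body" text column.
summaries = df.withColumn("summary", summarize("body"))
```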

2. LangChain

LangChain is a software framework designed to facilitate the creation of applications that use LLMs and combine them with external data, bringing more context to your LLM. The LangChain integration with Databricks helps to:

  • Load and query data from a PySpark DataFrame with the PySpark DataFrame loader
  • Interactively query data using natural language with the Spark DataFrame Agent or Databricks SQL Agent
  • Wrap a Databricks-served model as an LLM in LangChain, which for example allows you to use DBRX via the API

The Spark DataFrame Agent lets you analyze data in a new way: you can query and perform operations on DataFrames using only natural language. Note, however, that LangChain executes the Python code generated by the LLM, which can cause problems when a question is harmful or malicious.

In this example, we ask a simple question: count the events in the DataFrame that took place in January.

Figure 1: Spark DataFrame Agent
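
A sketch of what this looks like in code. The import paths vary across LangChain versions (these match the langchain-community and langchain-experimental packages at the time of writing), and the endpoint name and DataFrame are placeholders:

```python
from langchain_community.chat_models import ChatDatabricks
from langchain_experimental.agents import create_spark_dataframe_agent

# Wrap a Databricks-served model (here the DBRX pay-per-token endpoint)
# as a LangChain chat model.
llm = ChatDatabricks(endpoint="databricks-dbrx-instruct")

# df is assumed to be a Spark DataFrame of events with a date column.
agent = create_spark_dataframe_agent(llm=llm, df=df, verbose=True)
agent.run("How many events took place in January?")
```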

A more powerful tool is the Databricks SQL Agent, which lets users interact with a specified schema in Unity Catalog and generate insights on the data using natural language. The Databricks SQL Agent can only query tables and provide information; it does not create tables.

We will use the Databricks SQL Agent to get information about certain tables and run queries posed in natural language. There are multiple tables in the schema, all connected to each other through foreign keys.
In the first example, we use the agent to describe a table and get more information about its columns, so that we do not need to define the whole schema ourselves.

Figure 2: SQL Database Agent description

In the second example, we ask a more difficult question, one where multiple tables (the events, tickets and categories tables) must be joined to get the right answer. The connections between the tables are not defined beforehand, so the agent has to figure them out itself.

Figure 3: SQL Database Agent query
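
Roughly, such an agent can be wired up as follows; the catalog and schema names are placeholders, and the import paths again depend on your LangChain version:

```python
from langchain_community.agent_toolkits import SQLDatabaseToolkit, create_sql_agent
from langchain_community.chat_models import ChatDatabricks
from langchain_community.utilities import SQLDatabase

llm = ChatDatabricks(endpoint="databricks-dbrx-instruct")

# Point the agent at one Unity Catalog schema (placeholder names).
db = SQLDatabase.from_databricks(catalog="demo", schema="sales")
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)

# The agent inspects the tables, generates SQL and joins tables itself.
agent.run("Which event category sold the most tickets?")
```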

3. AI Functions

AI Functions are built-in SQL functions that allow SQL users to:

  • Use Databricks Foundation Model APIs to complete various tasks on company data
  • Access external models and experiment with them
  • Query models hosted by Databricks model serving endpoints from SQL queries

Databricks already provides several functions for common natural language tasks, such as sentiment analysis (ai_analyze_sentiment()), classification (ai_classify()) and masking (ai_mask()). These general functions can be used in multiple languages; however, they are tuned for English and give the best results on English-only tasks. Currently, the underlying LLM for these AI Functions is Mixtral-8x7B Instruct. If you want to define your own generative AI SQL function or choose another LLM for the task, you can do so with the ai_query() function.
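
From a notebook, these functions can be called straight from SQL. A sketch, where the table name is a placeholder and ai_query picks the serving endpoint explicitly:

```python
# AI Functions run directly in SQL; ai_query lets you choose the endpoint.
df = spark.sql("""
    SELECT
        review,
        ai_analyze_sentiment(review)              AS sentiment,
        ai_mask(review, array('person', 'email')) AS masked_review,
        ai_query('databricks-dbrx-instruct',
                 CONCAT('Summarize in one line: ', review)) AS summary
    FROM main.demo.customer_reviews
    LIMIT 10
""")
display(df)  # Databricks notebook helper
```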

4. AI Playground

You can easily test and compare different LLMs using the AI Playground available in Databricks. Here you can run multiple LLMs side by side and compare them on output quality and tone, number of output tokens, speed and latency. This way, you can choose the right LLM for your specific use case interactively. In this example, we compare DBRX-Instruct, Llama2 and GPT-3.5-Turbo on the same question, "What is Generative AI?". We can already see that DBRX-Instruct generates tokens much faster and has low latency. We can also compare the different outputs and choose the one we are most satisfied with for a given situation.

Figure 4: AI playground

Vector search

One of the most important competitive advantages in generative AI is customizing your applications with your own data. For this, Databricks provides Vector Search, a serverless vector database integrated in the Data Intelligence Platform that leverages its governance and productivity tools to power your chatbot and RAG applications. Vector Search automatically synchronizes your data from source to index, eliminating costly and complex pipeline maintenance. Being serverless, it scales easily to support billions of embeddings and thousands of real-time queries per second.

The following key components play an important role in Vector Search:

  • Embeddings represent data and queries in a multi-dimensional numeric space. They translate text and images into numerical vectors in such a way that related items lie close together in the vector space. You can choose an embedding model available in Model Serving to match your use case.
  • A vector search index is created automatically from a Delta table, including the embedded data with its metadata.
  • To retrieve relevant documents for a query, a similarity search is conducted between the query vector (the question asked) and the document vectors (the data that is available). By default, Vector Search uses the Hierarchical Navigable Small World (HNSW) algorithm for the similarity search and the L2 distance metric to measure vector similarity. You can also opt for cosine similarity, but then you need to normalize your embeddings before feeding them to Vector Search.

One of the main advantages of Vector Search is its automated data ingestion. Instead of building and maintaining a pipeline in which raw data is cleaned, processed and embedded with a specific embedding model before the vectors are stored in the database, Databricks Vector Search is fully integrated in the Data Intelligence Platform and can pull in and embed data automatically, with no new pipelines to build and maintain. Vector Search runs on Delta tables: you create an index directly on a Delta table inside Unity Catalog, so it is plugged into your existing Databricks ecosystem. With the Delta Sync API, source data is automatically synchronized with the vector indexes: as data is added, updated or deleted, the vector index is updated as well. Vector Search manages failures, handles retries and optimizes batch sizes to provide the best performance and throughput.
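
A sketch of creating such a Delta Sync index with the Vector Search Python SDK, where the endpoint, table, index and embedding model names are all placeholders:

```python
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# Databricks computes the embeddings from the "text" column and keeps
# the index in sync with the source Delta table.
index = client.create_delta_sync_index(
    endpoint_name="vs_endpoint",
    index_name="main.demo.docs_index",
    source_table_name="main.demo.docs",
    pipeline_type="TRIGGERED",
    primary_key="id",
    embedding_source_column="text",
    embedding_model_endpoint_name="databricks-bge-large-en",
)
```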

You can provide the vector embeddings in one of the following ways:

  • A source Delta table containing your data in text format. Databricks calculates the embeddings using a specified model. The text data to embed must be stored in a single column; if you want to embed multiple columns, concatenate them into a single column first. As the Delta table is updated, the index stays in sync with it.
  • A source Delta table containing pre-calculated embeddings. As the Delta table is updated, the index stays in sync with it. In this scenario, you choose the embedding model yourself and define its characteristics.
  • A source Delta table containing pre-calculated embeddings, without automated syncing: you must update the index manually.
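
Whichever option you choose, querying the index is then a single call; a sketch continuing from the index created above:

```python
# Retrieve the three most similar documents for a natural language query.
results = index.similarity_search(
    query_text="How do I keep my index in sync with a Delta table?",
    columns=["id", "text"],
    num_results=3,
)
```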

Databricks Vector Search leverages the same security controls and data governance that already protect the Data Intelligence Platform, enabled by the integration with Unity Catalog. Vector indexes are stored as entities within Unity Catalog and use the same unified interface to define policies on data. This is in contrast to most current vector databases, which either lack robust security and access controls or require an organization to build and maintain a set of security policies separate from their data platform. Every request to Vector Search is logically isolated, authenticated and authorized, and data is encrypted at rest and in transit. You can authenticate with a personal access token or a service principal.

Databricks Vector Search is performant out of the box: queries return relevant results quickly, with minimal latency and without work to tune or scale the database. Query performance is up to 5 times better than some of the leading vector databases. Whereas many vector databases perform well with small amounts of data but fall short in performance or scalability in production environments, this is not a problem for Databricks Vector Search.

The following requirements apply when implementing Vector Search:

  • Unity Catalog must be enabled
  • Serverless compute must be enabled
  • The source table must have Change Data Feed enabled (see the sketch after this list)
  • CREATE TABLE privileges on the catalog schema(s) in which the index is created
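
The Change Data Feed requirement, for instance, is a single table property; a sketch with a placeholder table name:

```python
# Enable Change Data Feed on the source table so Vector Search can
# pick up inserts, updates and deletes.
spark.sql("""
    ALTER TABLE main.demo.docs
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")
```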

Evaluation

Evaluating a generative AI model differs from evaluating traditional ML models, as there is usually no strict ground truth. MLflow provides an API to help with the evaluation of generative AI models, which can be conducted in one of the following ways:

  • Use the default MLflow evaluation metrics, such as exact match and toxicity.
  • LLM-judged correctness, where an LLM compares the ground truth with the output and gives a numerical score and a justification for that score (see the sketch after this list).
  • A custom LLM-judged metric, where you write a prompt to define the evaluation metric. You have to provide a definition, grading criteria, examples, and input and output data.
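
A minimal sketch of the first two options with mlflow.evaluate, assuming MLflow 2.x and a Databricks-served judge model (the judge endpoint and the tiny dataset are placeholders):

```python
import mlflow
import pandas as pd
from mlflow.metrics.genai import answer_correctness

# Toy evaluation set; in practice this comes from your test data.
eval_df = pd.DataFrame({
    "inputs": ["What is Databricks Vector Search?"],
    "ground_truth": ["A serverless vector database integrated in the platform."],
    "predictions": ["Vector Search is Databricks' serverless vector database."],
})

results = mlflow.evaluate(
    data=eval_df,
    targets="ground_truth",
    predictions="predictions",
    model_type="question-answering",  # adds defaults such as exact match and toxicity
    extra_metrics=[
        # LLM-judged correctness; the judge endpoint name is a placeholder.
        answer_correctness(model="endpoints:/databricks-dbrx-instruct"),
    ],
)
print(results.metrics)
```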

Besides the MLflow evaluation metrics, Databricks also provides Lakehouse Monitoring, built on Unity Catalog. This unified data and AI monitoring service tracks the quality of your data and AI assets and scans RAG applications' outputs for toxic or unsafe content. Lakehouse Monitoring maintains profile and drift metrics, lets you configure alerts, generates quality dashboards and provides a lineage graph for root-cause analysis. It helps to quickly diagnose errors, such as stale data pipelines or unexpected model behaviour, and fully manages the monitoring pipelines.

Conclusion

Databricks offers a wide range of tools to develop, deploy and evaluate your generative AI applications. Thanks to the integration of Vector Search in the Lakehouse, you can customize your generative AI solution on big data automatically, without having to worry about synchronizing your data or maintaining complex pipelines. Through Model Serving, you can access a wide range of LLM and embedding models, which can easily be compared in the AI Playground so you can choose the models that meet the needs of your application. The addition of DBRX, the LLM built by Databricks, offers another powerful model that can be used in many natural language use cases, is particularly strong in combination with RAG, and performs very well on both speed and latency. On top of that, Databricks provides AI Functions, so analysts can easily use generative AI in SQL commands. This way, all the necessary tools to deploy generative AI applications are integrated in one unified platform.

Pieter Verfaillie

consultant @ Aivix