Azure AI Studio Architecture: Leveraging RAG and LLM for Chat with SaaS Systems

In this article, we will explore the architecture of an Azure AI Studio-built, LLM-enhanced chat experience with any SaaS system.

What is Azure AI Studio? Azure AI Studio is a comprehensive platform hosted by Microsoft Azure for developing, deploying, and managing AI applications. It offers a suite of tools and services to help data scientists, machine learning engineers, and developers create AI solutions efficiently.

We use a RAG-based architecture, where RAG stands for Retrieval-Augmented Generation. RAG is a technique that combines retrieval-based and generation-based approaches to improve the performance of language models, particularly in generating more accurate and contextually relevant responses.

The diagram below (“RAG Flow”) shows how RAG generates a response to user queries against various data sources. This flow is orchestrated by an Azure AI Studio prompt flow that interacts with any of the 53 LLM models that support conversational or chat-completion inference tasks. We used GPT-4 with success. Here is a list of such models available from Azure AI Studio at the time of this writing:

LLM Models available from Azure AI Studio that support Chat Completion and Conversational Inference Functions
RAG Flow
  1. After a user enters a query through an application front end, the front end calls the prompt flow deployed from Azure AI Studio. The prompt flow then invokes a node that leverages the chosen LLM to extract parameters from the user input. These are the parameters the content system requires in order to return the content superset that will be used to answer the user’s query. This varies across content systems, and developers must provide instructions to help the LLM extract the correct parameters.
  2. A content system is any system that contains content, including structured data in SaaS systems and databases as well as document repositories.
  3. The results from the content system query are then fed into the LLM as its grounding data context.
  4. The original query is then processed against the grounding data, and the response is returned to the user. A minimal sketch of this flow appears after this list.
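As a rough illustration of the flow above, here is a minimal Python sketch using the Azure OpenAI SDK. The endpoint, deployment name, prompts, and the `query_content_system` helper are assumptions for illustration only; in the real architecture, the prompt flow nodes carry the system-specific extraction instructions described in step 1.

```python
import json
from openai import AzureOpenAI  # pip install openai

# Assumed endpoint/deployment names -- replace with your own.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)
DEPLOYMENT = "gpt-4"  # the deployed chat model

def extract_parameters(user_query: str) -> dict:
    """Step 1: ask the LLM to pull content-system query parameters out of the user input."""
    response = client.chat.completions.create(
        model=DEPLOYMENT,
        messages=[
            {"role": "system", "content": "Extract search parameters (topic, date_range, author) "
                                          "from the user's question and return them as a JSON object."},
            {"role": "user", "content": user_query},
        ],
    )
    return json.loads(response.choices[0].message.content)

def answer_with_grounding(user_query: str, grounding_items: list[dict]) -> str:
    """Steps 3-4: answer the original query using the retrieved content as grounding data."""
    response = client.chat.completions.create(
        model=DEPLOYMENT,
        messages=[
            {"role": "system", "content": "Answer the user's question using only the provided context.\n"
                                          f"Context: {json.dumps(grounding_items)}"},
            {"role": "user", "content": user_query},
        ],
    )
    return response.choices[0].message.content

# Step 2 is represented by a hypothetical query_content_system() helper -- the SaaS,
# database, or document repository query built from the extracted parameters:
#   params = extract_parameters("What incidents were opened about VPN last week?")
#   items = query_content_system(params)
#   print(answer_with_grounding("What incidents were opened about VPN last week?", items))
```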

In one case, we leveraged the above architecture to create a chat experience with ServiceNow. ServiceNow is an IT service management system that supports the full life cycle of IT tickets and is provisioned with an IT knowledge base. Our aim was to build an interface that allows users of the platform to ask questions against its knowledge base and its database of incidents/tickets.

Here is a deep dive into the architecture:

RAG Architecture
  1. SharePoint Authentication & JWT Retrieval: We built the user interface as an SPFx web part hosted within SharePoint Online. For us, this is an advantage because we can use Entra ID-generated tokens to automatically retrieve JSON Web Tokens (JWTs) from an Entra ID app. These JWTs are then used to authenticate with the API Management Service. We created the Entra ID app specifically to issue these JWTs.
  2. API Management Service: We created the API Management Service as a public endpoint, called by the SPFx web part, that accepts the JWTs along with the user query. This service serves as a proxy for the deployed Prompt Flow REST API. The API Management Service then connects to the deployed prompt flow using key-based authentication, with the key stored in an Azure Key Vault (a sketch of this backend call appears after this list).
  3. Prompt Flow: The prompt flow accepts the key-based authentication along with the user query from the API Management Service. The Prompt Flow starts with an LLM node that interacts with a deployed LLM through the Azure OpenAI service. In our case, we chose GPT-4 as our LLM. The LLM, following our node’s instructions, parses the user input and creates a series of variables that will be used to query ServiceNow for knowledge articles and incident information.

    The next node makes a REST call to ServiceNow, using a service account, with the variables extracted from the user prompt. Once the Prompt Flow receives content items back from its ServiceNow REST call, the next LLM node processes the user’s original query against the ServiceNow items. These items serve as grounding data for the LLM node to respond to the user query. The response is then returned to the SPFx web part.

    NOTE: With this architecture, results from ServiceNow are not security-trimmed to the user’s access rights. We will publish a separate blog post on how to interact with the Okta SSO service to pass end-user credentials to ServiceNow so that results are security-trimmed to the end user’s access.
  4. OpenAI Service and the Deployed LLM: The Azure OpenAI service is a Microsoft-hosted API that allows Prompt Flows to interact with LLMs in a model-agnostic manner. We tried a few different LLMs and settled on GPT-4 as providing the best answers. We then deployed our own GPT-4 instance.
  5. ServiceNow: ServiceNow exposes two main REST endpoints that we use to query knowledge articles and incidents. Each endpoint is passed variables extracted from the user prompt, such as date ranges, topics, and authors. Results are returned as JSON to the calling Prompt Flow node (a sketch of this query appears after this list).
  6. Common Azure Services: We also use several common Azure services, including monitoring and a key vault. We use Azure Key Vault to store our ServiceNow credentials as well as our Prompt Flow key. We use the monitoring service to capture queries and results for quality assurance and for tuning the prompt flow.
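To make step 2 concrete, here is a minimal Python sketch of the backend call that API Management makes to the deployed prompt flow endpoint. The scoring URL, the input field name, and the Key Vault secret name are assumptions for illustration; the key itself lives in Azure Key Vault as described above.

```python
import requests
from azure.identity import DefaultAzureCredential    # pip install azure-identity
from azure.keyvault.secrets import SecretClient      # pip install azure-keyvault-secrets

# Assumed names -- replace with your Key Vault and prompt flow endpoint details.
KEY_VAULT_URL = "https://<your-key-vault>.vault.azure.net"
PROMPT_FLOW_URL = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"

def get_prompt_flow_key() -> str:
    """Fetch the prompt flow endpoint key from Azure Key Vault (secret name is hypothetical)."""
    secrets = SecretClient(vault_url=KEY_VAULT_URL, credential=DefaultAzureCredential())
    return secrets.get_secret("prompt-flow-endpoint-key").value

def call_prompt_flow(user_query: str) -> dict:
    """Invoke the deployed prompt flow with key-based auth, as the APIM backend does."""
    headers = {
        "Authorization": f"Bearer {get_prompt_flow_key()}",
        "Content-Type": "application/json",
    }
    # The input field name ("query" here) must match the prompt flow's declared inputs.
    payload = {"query": user_query}
    response = requests.post(PROMPT_FLOW_URL, headers=headers, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()
```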
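Similarly, here is a hedged sketch of the ServiceNow queries in step 5, using the standard Table API with a service account. The kb_knowledge and incident tables are standard ServiceNow tables, but the instance name, credentials, and specific query fields below are placeholders for illustration only.

```python
import requests

# Assumed instance and service-account credentials (stored in Azure Key Vault in our architecture).
INSTANCE = "https://<your-instance>.service-now.com"
AUTH = ("<service-account>", "<password>")

def query_knowledge_articles(topic: str, limit: int = 10) -> list[dict]:
    """Query ServiceNow knowledge articles via the Table API (kb_knowledge table)."""
    response = requests.get(
        f"{INSTANCE}/api/now/table/kb_knowledge",
        auth=AUTH,
        headers={"Accept": "application/json"},
        params={
            # Encoded query built from variables the LLM extracted from the user prompt.
            "sysparm_query": f"short_descriptionLIKE{topic}",
            "sysparm_limit": limit,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["result"]

def query_incidents(topic: str, opened_after: str, limit: int = 10) -> list[dict]:
    """Query ServiceNow incidents opened after a given date (incident table)."""
    response = requests.get(
        f"{INSTANCE}/api/now/table/incident",
        auth=AUTH,
        headers={"Accept": "application/json"},
        params={
            "sysparm_query": f"short_descriptionLIKE{topic}^opened_at>={opened_after}",
            "sysparm_limit": limit,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["result"]
```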
