The landscape of GenAI application development is rapidly evolving. We are witnessing a significant shift from simple prompt engineering to the creation of compound AI systems: complex flows that combine retrieval, pre-processing, post-processing, guardrails, code interpretation, LLM routing, model ensembling, and memory management in hybrid (static and dynamic) orchestrations. Patterns such as Retrieval-Augmented Generation (RAG) and AI agents are becoming increasingly prevalent.
This complexity, while empowering, also brings challenges. Developers need to understand each step in the flow during execution to gain insights, debug, evaluate, and improve their GenAI applications. As more GenAI applications transition from proof-of-concept to production, the need for post-deployment observability becomes paramount. This allows developers to monitor the behavior and performance of GenAI applications, respond to user feedback, and debug edge cases.
As highlighted in this blog, the lifecycle for GenAI application development remains consistent, yet it has become increasingly challenging due to the complexity of the systems involved. The accompanying diagram serves as a visual guide to navigating these complexities, emphasizing the systematic approach required for successful GenAI application development.
With the prompt flow SDK, you can easily track and monitor the execution of your GenAI application from input to output. You gain visibility into intermediate results, can measure execution times, and can access detailed logs for each function call within your GenAI workflow. You can also inspect the parameters, metrics, and outputs of each AI model used in your application. This helps you debug and optimize your GenAI application, as well as understand how the AI models behave and what they produce.
The prompt flow SDK supports tracing to various endpoints including local environments, Azure AI Studio, and other OpenTelemetry collectors such as Azure Application Insights. This flexibility ensures that you can integrate tracing with any Python-based code, facilitating testing, evaluation, and deployment across different orchestrations and existing GenAI frameworks with ease.
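To make concrete what such instrumentation captures, here is a minimal, SDK-free sketch of the tracing pattern. The `traced` decorator and `SPANS` list below are illustrative stand-ins for the SDK's trace decorator and OpenTelemetry export, not the prompt flow API itself:

```python
import functools
import time

SPANS = []  # illustrative in-memory store; the real SDK exports spans to an OpenTelemetry collector

def traced(fn):
    """Record the name, inputs, output, and duration of each call (stand-in for a trace decorator)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        SPANS.append({
            "name": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "duration_s": time.perf_counter() - start,
        })
        return result
    return wrapper

@traced
def build_prompt(question: str) -> str:
    return f"Answer concisely: {question}"

@traced
def answer(question: str) -> str:
    prompt = build_prompt(question)  # nested call produces its own span
    return prompt.upper()  # placeholder for an LLM call

answer("What is tracing?")
```

Because the nested `build_prompt` call finishes first, its span is recorded before the outer `answer` span, mirroring how a trace view lets you inspect each intermediate step and its timing.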
In addition to local tracing, we also offer more robust cloud-based tracing in Azure AI Studio, a unified platform for building GenAI applications. This significantly enhances collaboration, persistence, and management of test histories.
With cloud tracing, you can gain several key advantages:
In situations where your application encounters an error, the trace functionality becomes extremely useful. It allows you to drill into the function causing the error, assess the frequency of exceptions, and troubleshoot using the provided exception message and stack trace. To get started with tracing LLM application scenarios, refer to the example: Tracing with LLM application.
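As a rough sketch of what a trace records when a call fails, the stand-in decorator below captures the failing function's name, the exception message, and the full stack trace (the `traced` decorator and `ERROR_SPANS` store are hypothetical, for illustration only):

```python
import functools
import traceback

ERROR_SPANS = []  # illustrative store for failed-call records

def traced(fn):
    """Capture exception details on failure, then re-raise (stand-in for trace error capture)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            ERROR_SPANS.append({
                "function": fn.__name__,
                "exception": type(exc).__name__,
                "message": str(exc),
                "stack_trace": traceback.format_exc(),
            })
            raise
    return wrapper

@traced
def call_model(prompt: str) -> str:
    # Simulate a failing model call for demonstration.
    raise TimeoutError("model endpoint did not respond")

try:
    call_model("hello")
except TimeoutError:
    pass  # the error span was recorded before the exception propagated
```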
For RAG applications, such as a Q&A chatbot built on expert enterprise knowledge, it's often challenging to debug unexpected results and determine whether potential improvements lie in the retrieval process or in the LLM's prompt generation.
However, with the newly introduced tracing function, you can effortlessly observe and analyze the retrieval and generation processes for each test case. For instance, you can observe the context that has been retrieved based on the test question and identify the parameters that require fine-tuning for optimal retrieval.
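The following toy pipeline sketches what a trace surfaces per test case: the documents retrieved (with scores) and the final prompt sent to the model. All names here (`retrieve`, `generate`, `RAG_TRACE`, the naive overlap scorer) are hypothetical stand-ins for a real retriever and the SDK's trace view:

```python
RAG_TRACE = []  # illustrative per-request trace records

DOCS = {
    "doc1": "Prompt flow supports tracing to OpenTelemetry collectors.",
    "doc2": "Azure AI Studio stores and shares test histories.",
}

def retrieve(question: str, top_k: int = 1) -> list[tuple[str, float]]:
    # Naive word-overlap score stands in for a vector search.
    q_words = set(question.lower().split())
    scored = [(doc_id, len(q_words & set(text.lower().split())) / len(q_words))
              for doc_id, text in DOCS.items()]
    hits = sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]
    # A trace view would show exactly which context each test question pulled in.
    RAG_TRACE.append({"step": "retrieval", "question": question, "hits": hits})
    return hits

def generate(question: str, hits: list[tuple[str, float]]) -> str:
    context = " ".join(DOCS[doc_id] for doc_id, _ in hits)
    prompt = f"Context: {context}\nQuestion: {question}"
    RAG_TRACE.append({"step": "generation", "prompt": prompt})
    return prompt  # placeholder for the LLM response

question = "What does tracing support?"
generate(question, retrieve(question))
```

Inspecting `RAG_TRACE` for a bad answer tells you immediately whether the wrong context was retrieved (tune `top_k` or the retriever) or the right context was retrieved but the prompt misused it.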
Multi-agent scenarios are frequently used in the context of LLM applications. One example is a framework that offers conversable agents powered by LLMs, tools, and humans, which can perform tasks collectively via automated chat. Such a framework allows tool use and human participation through multi-agent conversation.
To get started with tracing in multi-agent scenarios, refer to the example: Tracing with AutoGen. In such scenarios, the trace view becomes an invaluable tool: it allows you to monitor the flow of the conversation and the intermediate automated calls between the agents.
Flex flow is a new feature in the prompt flow SDK that increases adaptability and control over your GenAI application. It empowers you to incorporate your own application into prompt flow for comprehensive batch testing and evaluation. Getting started with flex flow!
With the introduction of flex flow and enhanced tracing capabilities, you can now execute local evaluation runs of your application—now adapted to flex flow—and log the results and metrics directly to the cloud. This ensures that your data is easily accessible for viewing, sharing, and long-term storage. Get started with local run!
Additionally, you have the option to submit the evaluation run to a cloud-based compute session, which records the run and its results in the cloud for efficient tracking. Get started with cloud run!
With the tracing feature enabled, debugging a failed case within an evaluation run becomes easier. You can delve into its trace view for a detailed examination.
Additionally, flex flow facilitates the integration of any Python code into your prompt flow, allowing you to capitalize on the robustness of the Python ecosystem.
By default, all prompt flow cloud authoring and testing activates the advanced trace capability, offering developers superior observability and debuggability. In addition, all historical test records are stored in a list, which aids in tracking and debugging with traces.
Having created and rigorously tested your GenAI application, with its quality and performance assured, you can smoothly deploy it to Azure AI Studio, our cloud-based platform for GenAI development. Azure AI Studio provides a secure and scalable environment to run your GenAI application, offers various features to enhance your deployment experience, and gives you the flexibility to integrate with Azure Application Insights for comprehensive post-deployment monitoring.
In the post-deployment phase, developers often aim to delve deeper into their applications' performance to optimize it further. For instance, you might want to monitor your GenAI application's performance, usage, and costs. In this scenario, the trace data for each request, the aggregated metrics, and user feedback become vital.
This in-depth analysis can be facilitated by trace monitoring, which automatically triggers the collection of trace data for each request. This provides a more detailed level of monitoring and analytical information.
The prompt flow SDK provides a new `/feedback` API to help customers collect feedback during the online serving stage. With Application Insights enabled, the feedback data is saved to the trace exporter target the customer has configured.
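The value of a feedback endpoint comes from correlating each feedback payload with the trace of the request it refers to. The sketch below illustrates that correlation in-process, keyed by a shared request id; the function names, stores, and payload shape are hypothetical and do not reflect the actual `/feedback` contract:

```python
TRACES = {}    # request_id -> trace record written at serving time (illustrative)
FEEDBACK = {}  # request_id -> list of feedback payloads (illustrative)

def record_trace(request_id: str, question: str, answer: str) -> None:
    """Store the trace record for a served request."""
    TRACES[request_id] = {"question": question, "answer": answer}

def post_feedback(request_id: str, rating: int, comment: str = "") -> dict:
    """Attach user feedback to the trace of the request it describes."""
    if request_id not in TRACES:
        raise KeyError(f"unknown request id: {request_id}")
    FEEDBACK.setdefault(request_id, []).append({"rating": rating, "comment": comment})
    # A real exporter would forward this joined record to the configured
    # target, e.g. Azure Application Insights.
    return {"request_id": request_id,
            "trace": TRACES[request_id],
            "feedback": FEEDBACK[request_id]}

record_trace("req-1", "What is flex flow?", "A feature of the prompt flow SDK.")
joined = post_feedback("req-1", rating=1, comment="helpful")
```

Joining feedback to traces this way lets you pull up the exact retrieval results and prompts behind any request a user flagged.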
Last but not least, some other features in Azure AI Studio prompt flow to accelerate GenAI application development include:
The new compute session allows you to set up cloud compute resources in seconds for quick authoring and testing.
Two enhancements have been implemented to improve the scalability of our batch runs. First, we have eliminated the 1,000-record limit, allowing batch runs to process larger datasets within a 10-hour duration. Second, we have introduced the ability to resume from where a previous run was interrupted, using the `pf run create --resume-from` command.
To facilitate online serving, you can also use FastAPI, a modern, fast web framework, to serve your GenAI application with high performance and reliability.