Secure, scalable and fast execution of AI generated code with Azure Container Apps Dynamic Sessions
TL;DR: Azure has a new service that helps you solve complex problems by dynamically generating and executing AI-generated code. Here is a walkthrough on how to integrate it with Streamlit, LangChain, Azure OpenAI and Azure Container Apps.
If you have not been living under a rock, you have probably built a couple of intelligent applications leveraging generative AI and learned that you can achieve amazing results by feeding a well-defined prompt to a large language model and asking it to solve a complex problem based not only on its embedded knowledge but also grounded in your own data.
Naturally you have also been using coding assistants like GitHub Copilot to support you in the implementation of these apps. But what if you could go a step further and have the model dynamically generate the code needed to solve an abstract problem and execute it right away on top of your custom data, returning only the result instead of just handing you the puzzle pieces?
Technically that is of course already possible, but it leaves you with an operational challenge: executing the generated code securely and preventing it from breaking the app itself, leaking data between sessions or causing noisy-neighbour problems once you hand the prompt over to an end user. With the launch of the Dynamic Sessions preview there is now an interesting service that helps you overcome that challenge, and I want to show you how it can be used to build your own AI-based code execution agent.
For the scope of this project I want to show you how to build an LLM-enabled application that takes files and abstract tasks in human language, lets the model generate a plan to solve them, and then generates, executes and iterates on custom Python code until the objective is achieved. At the end the model sends the result back to you, either as a file or as a text response.
So what is the new service about? In short, Azure Container Apps dynamic sessions provide fast access to managed, secure, sandboxed environments that are ideal for quickly spinning up a runtime for code or applications that require strong isolation from other workloads.
The following architecture pattern might be relevant if any of these requirements apply to your scenario:
- You need to execute AI generated or untrusted code on the fly
- You need to isolate tool execution for different user sessions
- You need to provide dedicated compute/file resources for a session
- You need to scale out compute sessions in less than a second
- You need to prevent generated code from connecting to the network
By the end of this blog post you should have something that can solve all of this. If you are like me and want to see the code first, look at my GitHub repo.
The whole solution is made up of multiple components, and unless you are already familiar with all of them I want to introduce a couple of basic concepts from Streamlit, LangChain, Azure OpenAI and Azure Container Apps Dynamic Sessions that are relevant for making the pieces work together.
Let's start with the frontend. I have decided to use Streamlit because it helps you create a simple web application quickly and, especially relevant for generative AI integration, it provides easy hooks for connecting user interface elements, user session state and code execution.
In particular we need to generate a unique session id that stays connected with the conversation flow, the state store that persists the uploaded files, and the session used for executing code. Ideally you would have a user id from an authenticated session that solves this for you, but here we keep things simple and generate one ourselves. The session id for Azure Container Apps dynamic sessions needs to follow a specific format, which is why I am using the code below to create it and keep it in the Streamlit session state:
import random
import streamlit as st

def get_session_id() -> str:
    # Dynamic sessions expect a GUID-like identifier, so we pad a random number
    id = random.randint(0, 100000000000)
    return "00000000-0000-0000-0000-" + str(id).zfill(12)

if "session_id" not in st.session_state:
    st.session_state["session_id"] = get_session_id()
    print("started new session: " + st.session_state["session_id"])

st.write("You are running in session: " + st.session_state["session_id"])
This session id is passed on to the dynamic sessions management API, which also needs the management endpoint of the dedicated Azure resource, so that it can be made available in the form of a LangChain tool:
import os
from langchain_azure_dynamic_sessions import SessionsPythonREPLTool

pool_management_endpoint = os.getenv("POOL_MANAGEMENT_ENDPOINT")
repl = SessionsPythonREPLTool(
    pool_management_endpoint=pool_management_endpoint,
    session_id=st.session_state["session_id"])
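If you want to quickly verify that the endpoint is reachable and the session gets created, you can invoke the tool directly (this mirrors the usage example from the tool's own docstring shown further below):

# Optional sanity check: run a trivial expression in a fresh dynamic session.
# The tool returns a JSON object with the result, stdout and stderr.
print(repl.invoke("6 * 7"))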
The next problem we need to solve is how to authenticate from inside the app towards the Azure Container Apps Dynamic Sessions control plane API (to create new sessions and download/upload files from/to a session) and the Azure OpenAI account (to create completion requests with the managed OpenAI model). The managed identity assigned to the Azure Container App running the Streamlit app gets the two role assignments required to make this work:
// Cognitive Services OpenAI User
var openAiUserRole = subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '5e0bd9bd-7b93-4f28-af87-19fc36ad61bd')
resource openaiuser 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: openai // Use when specifying a scope that is different than the deployment scope
  name: guid(subscription().id, resourceGroup().id, principalId, openAiUserRole)
  properties: {
    roleDefinitionId: openAiUserRole
    principalType: 'ServicePrincipal'
    principalId: principalId
  }
}

// Azure ContainerApps Session Executor
var sessionExecutor = subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '0fb8eba5-a2bb-4abe-b1c1-49dfad359bb0')
resource sessionPermissions 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: dynamicSessions // Use when specifying a scope that is different than the deployment scope
  name: guid(subscription().id, resourceGroup().id, principalId, sessionExecutor)
  properties: {
    roleDefinitionId: sessionExecutor
    principalType: 'ServicePrincipal'
    principalId: principalId
  }
}
Of course we want to avoid shared keys and secrets when operating the app in the cloud, but we still need a solution for local testing. To make this work, azd (the Azure Developer CLI) retrieves the OpenAI API key and makes it available in the local .env file, while the Bicep templates deliberately do not configure the same environment variables in the cloud. You also need to assign the same permissions to your own Entra ID account to support local execution.
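For reference, the local .env file for this setup might look roughly like this (the variable names are the ones read by the code below, the values are placeholders):

AZURE_OPENAI_ENDPOINT="https://<your-openai-account>.openai.azure.com/"
AZURE_OPENAI_API_KEY="<only set locally, never in the cloud>"
AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME="<your completion deployment name>"
AZURE_OPENAI_VERSION="<your Azure OpenAI API version>"
POOL_MANAGEMENT_ENDPOINT="<pool management endpoint of the session pool>"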
In the code we implement a switch for authenticating LangChain's AzureChatOpenAI object against the hosted Azure OpenAI endpoint:
import os
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain_openai import AzureChatOpenAI

llm: AzureChatOpenAI = None
if "AZURE_OPENAI_API_KEY" in os.environ:
    # Local development: authenticate with the API key from the .env file
    llm = AzureChatOpenAI(
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        azure_deployment=os.getenv("AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME"),
        openai_api_version=os.getenv("AZURE_OPENAI_VERSION"),
        temperature=0,
        streaming=True
    )
else:
    # In the cloud: authenticate with the managed identity via Entra ID
    token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")
    llm = AzureChatOpenAI(
        azure_ad_token_provider=token_provider,
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        azure_deployment=os.getenv("AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME"),
        openai_api_version=os.getenv("AZURE_OPENAI_VERSION"),
        temperature=0,
        openai_api_type="azure_ad",
        streaming=True
    )
Also notice that we pass streaming=True to enable streaming of LLM responses to the user interface as soon as the first tokens arrive. That has a very positive effect on the user experience and is natively supported by Streamlit. To ensure (relatively) predictable behaviour of the model we also set the temperature to 0, reducing the randomness of the output and producing largely deterministic results.
The core part of the scenario is the usage of tools, which extend the capabilities of the model to interact with existing local code, remote APIs and databases. This works by defining the name, parameters and expected output for each tool in the expected format, along with a detailed description of what the tool can do and when it is supposed to be used. All of that is passed along with the first user prompt and enables the model to decide if and which tools should be leveraged to solve a problem.
There are several popular AI orchestrator frameworks out there, and I selected LangChain for this project because it has a rich community providing various components and building blocks. Among them is a package for connecting to the Azure Container Apps dynamic sessions service. This enables the model to prepare the parameters for a tool call, have your app execute the tool, and afterwards evaluate the output in a subsequent model call to answer the original user question with the new input from the tool and the original prompt.
In the same way the Dynamic Sessions service is announced in the prompt as a tool that the LLM can use to execute generated code. This is how the SessionsPythonREPLTool describes itself to the LLM:
class SessionsPythonREPLTool(BaseTool):
    """A tool for running Python code in an Azure Container Apps dynamic sessions
    code interpreter.

    Example:
        .. code-block:: python

            from langchain_azure_dynamic_sessions import SessionsPythonREPLTool

            tool = SessionsPythonREPLTool(pool_management_endpoint="...")
            result = tool.invoke("6 * 7")
    """

    name: str = "Python_REPL"
    description: str = (
        "A Python shell. Use this to execute python commands "
        "when you need to perform calculations or computations. "
        "Input should be a valid python command. "
        "Returns a JSON object with the result, stdout, and stderr. "
    )
We need to announce one additional tool so the LLM can map the user's intent to download a file from the dynamic session to a command that we can execute against the Azure service. We do this by annotating our Python function with the @tool decorator and providing an understandable description of the tool in the docstring, like this:
import os
from langchain_core.tools import tool

@tool
def download_file(filepath: str) -> str:
    "Download a file from the given path to the user and return the file path"
    filename = os.path.basename(filepath)
    print("Downloading file:-", filename + "-")
    f = repl.download_file(remote_file_path=filename)
    st.download_button("Download file", f, file_name=filename)
    return "Sending file: " + filename
In the function we trigger the SessionsPythonREPLTool to download the remote file into a BufferedReader and forward it to the Streamlit download button.
Finally we aggregate all the tools in the required syntax and combine them with a system prompt that feeds the parameters, prompt template and tool guidance into the LLM, with the user question as input:
from langchain import agents
from langchain.agents import create_react_agent
from langchain_core.prompts import PromptTemplate

pool_management_endpoint = os.getenv("POOL_MANAGEMENT_ENDPOINT")
repl = SessionsPythonREPLTool(pool_management_endpoint=pool_management_endpoint, session_id=st.session_state["session_id"])

@tool
def download_file(filepath: str) -> str:
    "Download a file from the given path to the user and return the file path"
    filename = os.path.basename(filepath)
    print("Downloading file:-", filename + "-")
    f = repl.download_file(remote_file_path=filename)
    st.download_button("Download file", f, file_name=filename)
    return "Sending file: " + filename

tools = [repl, download_file]

promptString = """Answer the following questions as best you can.
You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
If there is a reference to a file use the following file in that location: {file_path}
Your work directory for all file operations is /mnt/data/
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}
"""

prompt = PromptTemplate.from_template(promptString)
agent = create_react_agent(llm, tools, prompt)
agent_executor = agents.AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)
As a convenience function we also remember the last uploaded file in a dedicated session state variable that we can feed into the prompt to make the reference clear to the LLM when the user is talking about a file.
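The upload itself is not shown above; a minimal sketch of how it could look, assuming the standard Streamlit file uploader and the upload_file method of the SessionsPythonREPLTool (check the package documentation for the exact signature), is:

# Sketch of the upload handling: push the uploaded file into the session
# and remember its remote path for the prompt template.
uploaded = st.file_uploader("Upload a file for this session")
if uploaded is not None:
    repl.upload_file(data=uploaded, remote_file_path=uploaded.name)
    # /mnt/data/ is the work directory of the dynamic session (see the prompt above)
    st.session_state["file_path"] = "/mnt/data/" + uploaded.name

With the file path stored, we tie the Streamlit chat input to the invocation of the agent_executor, passing the user prompt, the file_path variable and a callback handler that streams the intermediate steps back into the UI: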
from langchain_community.callbacks.streamlit import StreamlitCallbackHandler

if prompt := st.chat_input():
    st.chat_message("user").write(prompt)
    with st.chat_message("assistant"):
        st_callback = StreamlitCallbackHandler(st.container())
        response = agent_executor.invoke(
            {"input": prompt, "file_path": st.session_state["file_path"]},
            {"callbacks": [st_callback]}
        )
        st.write(response["output"])
Finally, let's take a look at how all this works together when we upload a JSON file and combine it with the following user prompt: “Open the file and count the objects in it. Tell me what properties the object has”.
As you can see, the application is able to understand the problem, generate the right code and respond to the question correctly, and the generated code does what was asked. Unfortunately it also assumed that it should offer the file as a download and triggered the download_file tool, so some more prompt engineering might be required to tune when that tool gets selected.
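One low-effort way to nudge the agent without touching any code paths is to make the tool's docstring more restrictive, for example (the wording here is just a suggestion):

@tool
def download_file(filepath: str) -> str:
    """Offer the file at the given path as a download to the user.
    Only use this tool when the user explicitly asks to download or receive a file."""
    filename = os.path.basename(filepath)
    f = repl.download_file(remote_file_path=filename)
    st.download_button("Download file", f, file_name=filename)
    return "Sending file: " + filename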
So does this mean that I can get the app to do anything, like deleting all the files or exfiltrating all data to a random internet service? Well, of course the LLM will happily generate whatever code you ask it for.
That is of course not good, but since the generated code is executed in a sandbox it will only break that one user session, not the app itself or the data of other users. To see which packages are installed in the default container image you can check the list in my repo. If what you need is not available you can build your own base image.
What about network operations? By default the session pool is configured to disable egress and only keep a session alive for a configurable period (that you are paying for).
resource dynamicSessions 'Microsoft.App/sessionPools@2024-02-02-preview' = {
  name: name
  location: location
  tags: tags
  properties: {
    poolManagementType: 'Dynamic'
    containerType: 'PythonLTS'
    maxConcurrentSessions: 0
    scaleConfiguration: {
      maxConcurrentSessions: 100
    }
    dynamicPoolConfiguration: {
      executionType: 'Timed'
      cooldownPeriodInSeconds: 300
    }
    sessionNetworkConfiguration: {
      status: 'EgressDisabled'
    }
  }
}
With network egress disabled on the service, all generated network operations will fail with an error message that the model does not handle well: it will keep iterating over different ways to implement the network call until it eventually gives up. Some more prompt engineering might be required to avoid that loop entirely.
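A simple mitigation is to state the restriction explicitly in the prompt so the model does not attempt network access in the first place, for example by extending the file-handling section of the prompt template shown earlier (the last two lines are the suggested addition):

If there is a reference to a file use the following file in that location: {file_path}
Your work directory for all file operations is /mnt/data/
You have no network access. Never generate code that performs network operations;
instead tell the user that this is not supported.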
Overall the new service offers some exciting capabilities and solves a very relevant problem when leveraging AI-generated code. You can use the code in my repo to get started with your own evaluation. Feel free to send feedback and PRs.