AI Agents For Cloud & DevOps Engineers: RAG Operations

There’s a constant narrative, primarily from sales and marketing people, that “AI is taking jobs” and “software development jobs are going away”, none of which holds up. In fact, AI can create a ton of new jobs in engineering disciplines, including cloud and DevOps.
Fun fact: it already is creating new jobs. In my fractional consulting business, I’ve already had three requests for building out AI services like Azure AI Foundry, and it’s only February at the time of writing this.
In this blog post, you’ll learn about one of those paths: AI Agents.
What Is An AI Agent
Think of AI Agents as specialized workers, each handling one slice of the tasks that are spread out across various roles. For example, you can have an Agent that writes Python code for creating a particular data set, and you could have another AI Agent that writes Terraform code for you. The important thing to remember is that the goal of an Agent is to have it do a particular thing and do that particular “thing” very well. Because of that, you don’t want an Agent that does multiple things.
We unfortunately see a lot of tech go down the direction of “this product can do 10 different things, so let’s have it do those 10 different things” even though it may only be really good at 2 of those things. If you try the same with Agents, you’ll have unfortunate outcomes.
An Agent always starts with a base Model like GPT-4o or Llama. With that base Model, you can either use RAG or fine-tune the Model so the Agent becomes as close to an expert as possible on the specific task you want it to do.
It’s why Agents are becoming so good at being virtual assistants and even your pair programming colleague.
What’s RAG
Retrieval-Augmented Generation (RAG) enhances a Large Language Model (LLM); that is, it increases the Model’s knowledge beyond what it’s currently trained to know by giving it access to external data sources in real-time.
For example, let’s say you want to use an LLM and realize that its training isn’t up to speed on which GPUs are available in particular regions. You can find a doc/URL that does have the information and feed it to the LLM through RAG.
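Conceptually, the flow is simple. Here’s a minimal sketch, where retrieve and llm are hypothetical stand-ins for a document fetcher and a model call:
# Hypothetical sketch of the RAG flow: fetch fresh text, then put it in the
# prompt so the Model answers with information it was never trained on.
def answer_with_rag(question, retrieve, llm):
    context = retrieve(question)  # e.g., pull the GPU-availability doc/URL
    prompt = f"Use this context to answer:\n{context}\n\nQuestion: {question}"
    return llm(prompt)  # the Model responds grounded in the retrieved context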
RAG vs Fine-Tuning
The biggest difference with fine-tuning is that you’re adjusting the model, or in other words, re-training it on a particular data set to perform a particular task.
With RAG, you feed slivers of information to a Model at query time. With fine-tuning, you re-train it to understand particular pieces of knowledge at a deeper level.
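To make the contrast concrete, here’s a hypothetical fine-tuning sample (the exact field names vary by provider): unlike RAG, which passes context in at query time, fine-tuning bakes examples like this into the Model through re-training.
# Hypothetical fine-tuning record; the schema depends on the provider you use.
training_example = {
    "prompt": "Which Azure regions currently offer GPU-enabled VM sizes?",
    "completion": "...",  # the answer you want the Model to internalize
}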
Weights
A weight is a number, learned during training, that controls how much influence one piece of the input has on the output. When you ask your favorite chatbot a question, the Model looks at all of the words and uses its weights to decide what words should come next. That’s a big part of why prompt engineering is so important: the words you provide are exactly what those weights act on.
The more “weight” behind a particular word or idea, the more influence it has on the words that show up in the output.
So, an LLM weight is just a tiny number that helps the Model decide how important different words and ideas are when forming a response.
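Here’s a toy sketch with made-up numbers (real Models have billions of weights) showing how weights turn input signals into a preference for one next word over another:
import math

# Made-up signals for three candidate next words, and made-up learned weights.
signals = {"cloud": 0.9, "banana": 0.1, "cluster": 0.7}
weights = {"cloud": 1.2, "banana": 0.2, "cluster": 1.0}

# Bigger weight times bigger signal means more influence on the output.
scores = {word: signals[word] * weights[word] for word in signals}
total = sum(math.exp(s) for s in scores.values())
probs = {word: math.exp(s) / total for word, s in scores.items()}  # softmax

print(max(probs, key=probs.get))  # "cloud" wins as the most likely next word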
Using A Base Model
Now that you know a bit about GenAI, RAG, and Agents, let’s learn how to put it all together with a framework called CrewAI, which is essentially a wrapper around various APIs that makes it way easier to create and implement AI Agents.
First, to use CrewAI, you’ll need the crewai and crewai-tools packages on your local machine, installed either globally or (preferably) in a Python virtual environment. You’ll also use the built-in os module.
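A minimal setup might look like this (package names as published on PyPI):
python -m venv .venv
source .venv/bin/activate
pip install crewai crewai-tools
With the packages installed, bring in the imports: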
from crewai import Agent, Task, Crew, Process, LLM
from crewai_tools import SerperDevTool, WebsiteSearchTool
import os
Next, you’ll need an OpenAI API key and a Serper API key.
Serper is a Google search API and offers a free tier.
os.environ.get('OPENAI_API_KEY')
os.environ.get('SERPER_API_KEY')
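As written, os.environ.get only reads these values; CrewAI and the Serper tool pick the keys up from the environment themselves. Make sure both are exported before running, for example (placeholder values):
export OPENAI_API_KEY="sk-..."   # placeholder
export SERPER_API_KEY="..."      # placeholder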
Create two variables - one for Serper and another for the website search tool.
serper = SerperDevTool()
web = WebsiteSearchTool()
The above config for the Website Search Tool is for RAG operations, which we’ll talk about later.
Now that the libraries and variables are set up, you can move on to the specific CrewAI actions. The first is to set up the Agent itself. The Agent allows you to specify the role, how the data is searched, and the Model that’s used for GenAI.
modelType = LLM(model='gpt-4o', temperature=0.2)
search = Agent(
    role="Performance Optimizer",
    goal="Optimize the performance of containerized workloads based on the region they are deployed to.",
    backstory="Based on performance and cost, deploying containerized solutions in a specific region may be necessary.",
    tools=[serper, web],
    # if you don't specify a Model, CrewAI falls back to its default OpenAI model
    llm=modelType,
)
The last configuration is the Task, which is the job that’s conducted. In this case, the Task asks for the best region to deploy to in Azure from a cost and performance perspective.
job = Task(
    description="Tell me the best region to deploy Kubernetes workloads to based on performance and cost.",
    expected_output="At the time of writing this, as in, right this second, what is the best region in Microsoft Azure "
                    "to deploy applications on Azure Kubernetes Service (AKS) based on performance and cost? "
                    "The performance is incredibly important, so do not choose a region only based on cost "
                    "if the performance is bad. Please provide the region you recommend for this use case.",
    agent=search,
)
You can now call upon the Crew class, which allows you to tie the Agent and the jobs/tasks together to be called upon.
crew = Crew(
    agents=[search],
    tasks=[job],
    verbose=True,
    process=Process.sequential
)
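Process.sequential simply runs the tasks in the order you list them. CrewAI also supports Process.hierarchical, where a manager Model delegates work between Agents, but sequential is the right fit for a single Agent and Task.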
The kickoff() method starts/runs the Agents.
crew.kickoff()
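kickoff() also returns the final result, so you can capture it and print it:
result = crew.kickoff()
print(result)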
Altogether, you can wrap the logic into a function and run it like the example below.
from crewai import Agent, Task, Crew, Process, LLM
from crewai_tools import SerperDevTool, WebsiteSearchTool
import os
def main():
    # These keys are read from your shell environment; export both before running.
    os.environ.get('OPENAI_API_KEY')
    os.environ.get('SERPER_API_KEY')

    serper = SerperDevTool()
    web = WebsiteSearchTool()

    modelType = LLM(model='gpt-4o', temperature=0.2)

    # Website-scoped search tools used for RAG operations (covered below)
    aksDocs = WebsiteSearchTool(website='https://docs.microsoft.com/en-us/azure/aks/')
    aksMultiRegion = WebsiteSearchTool(website='https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks-multi-region/aks-multi-cluster')
    dr = WebsiteSearchTool(website='https://learn.microsoft.com/en-us/azure/aks/ha-dr-overview')

    search = Agent(
        role="Performance Optimizer",
        goal="Optimize the performance of containerized workloads based on the region they are deployed to.",
        backstory="Based on performance and cost, deploying containerized solutions in a specific region may be necessary.",
        tools=[serper, aksDocs, aksMultiRegion, dr, web],
        # if you don't specify a Model, CrewAI falls back to its default OpenAI model
        llm=modelType,
    )

    job = Task(
        description="Tell me the best region to deploy Kubernetes workloads to based on performance and cost.",
        expected_output="At the time of writing this, as in, right this second, what is the best region in Microsoft Azure "
                        "to deploy applications on Azure Kubernetes Service (AKS) based on performance and cost? "
                        "The performance is incredibly important, so do not choose a region only based on cost "
                        "if the performance is bad. Please provide the region you recommend for this use case.",
        agent=search,
    )

    crew = Crew(
        agents=[search],
        tasks=[job],
        verbose=True,
        process=Process.sequential
    )

    crew.kickoff()

if __name__ == '__main__':
    main()
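To run it, save the script under any name (main.py is assumed here) and execute it from the environment where the packages are installed:
python main.py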
Using A Local Model
In the previous section, you learned how to set up an Agent using what some may call a “SaaS-based LLM”: it’s called upon via an API and managed for you, but you can’t actually download it.
In this section, you’ll learn how to do the same thing, but with a local Model.
First, download Ollama on your machine. You can find the download on the Ollama website (https://ollama.com/download).
Running a Model locally can take a fair amount of resources (CPU/memory), so you’ll want to do it on a machine with enough resources available.
Next, use the langchain_openai library’s ChatOpenAI class, which will allow you to call upon the model locally.
from langchain_openai import ChatOpenAI
Pull the Model locally (note that the ollama/ prefix belongs only in the Python model string later on, not in the CLI command):
ollama pull phi4:14b
Create a variable that will consist of:
- The model name (Phi-4, 14B parameters, in this case)
- The temperature
- The base URL, which is your local machine (Ollama listens on port 11434 by default)
modelType = ChatOpenAI(model="ollama/phi4:14b", temperature=0.1, base_url="http://localhost:11434")
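As a side note, CrewAI’s own LLM class (already imported above) can also point at a local Ollama model directly, which skips the langchain_openai dependency. A minimal sketch, assuming the same model and the default Ollama port:
modelType = LLM(model="ollama/phi4:14b", temperature=0.1, base_url="http://localhost:11434")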
Altogether, the code within a main() function will look like the one below.
from crewai import Agent, Task, Crew, Process, LLM
from crewai_tools import SerperDevTool, WebsiteSearchTool
import os
from langchain_openai import ChatOpenAI
def main():
    # These keys are read from your shell environment; export both before running.
    os.environ.get('OPENAI_API_KEY')
    os.environ.get('SERPER_API_KEY')

    modelType = ChatOpenAI(model="ollama/phi4:14b", temperature=0.1, base_url="http://localhost:11434")

    serper = SerperDevTool()
    web = WebsiteSearchTool()

    search = Agent(
        role="Performance Optimizer",
        goal="Optimize the performance of containerized workloads based on the region they are deployed to.",
        backstory="Based on performance and cost, deploying containerized solutions in a specific region may be necessary.",
        tools=[serper, web],
        llm=modelType,
    )

    job = Task(
        description="Tell me the best region to deploy Kubernetes workloads to based on performance and cost.",
        expected_output="At the time of writing this, as in, right this second, what is the best region in Microsoft Azure "
                        "to deploy applications on Azure Kubernetes Service (AKS) based on performance and cost? "
                        "The performance is incredibly important, so do not choose a region only based on cost "
                        "if the performance is bad. Please provide the region you recommend for this use case.",
        agent=search,
    )

    crew = Crew(
        agents=[search],
        tasks=[job],
        verbose=True,
        process=Process.sequential
    )

    crew.kickoff()

if __name__ == '__main__':
    main()
Creating A RAG
In the previous two sections, you learned how to generate output via a Model using an AI Agent with a general RAG: the default website search tool, which can search the entire internet.
In the three variables below, you’ll see the website search tool used for RAG operations, but notice how specific websites are passed in. This is how, in this instance, the RAG pulls unstructured data from targeted sources.
aksDocs = WebsiteSearchTool(website='https://docs.microsoft.com/en-us/azure/aks/')
aksMultiRegion=WebsiteSearchTool(website='https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks-multi-region/aks-multi-cluster')
dr = WebsiteSearchTool(website='https://learn.microsoft.com/en-us/azure/aks/ha-dr-overview')
Within the tools list in the Agent() class, you can specify each of those variables so the Agent can use them for RAG operations.
tools=[serper, web, aksDocs, aksMultiRegion, dr],
The final script can look like the following:
from crewai import Agent, Task, Crew, Process, LLM
from crewai_tools import SerperDevTool, WebsiteSearchTool
import os
from langchain_openai import ChatOpenAI
def main():
    # These keys are read from your shell environment; export both before running.
    os.environ.get('OPENAI_API_KEY')
    os.environ.get('SERPER_API_KEY')

    modelType = ChatOpenAI(model="ollama/phi4:14b", temperature=0.1, base_url="http://localhost:11434")

    serper = SerperDevTool()
    web = WebsiteSearchTool()

    # Website-scoped search tools for RAG operations
    aksDocs = WebsiteSearchTool(website='https://docs.microsoft.com/en-us/azure/aks/')
    aksMultiRegion = WebsiteSearchTool(website='https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks-multi-region/aks-multi-cluster')
    dr = WebsiteSearchTool(website='https://learn.microsoft.com/en-us/azure/aks/ha-dr-overview')

    search = Agent(
        role="Performance Optimizer",
        goal="Optimize the performance of containerized workloads based on the region they are deployed to.",
        backstory="Based on performance and cost, deploying containerized solutions in a specific region may be necessary.",
        tools=[serper, web, aksDocs, aksMultiRegion, dr],
        # if you don't specify a Model, CrewAI falls back to its default OpenAI model
        llm=modelType,
    )

    job = Task(
        description="Tell me the best region to deploy Kubernetes workloads to based on performance and cost.",
        expected_output="At the time of writing this, as in, right this second, what is the best region in Microsoft Azure "
                        "to deploy applications on Azure Kubernetes Service (AKS) based on performance and cost? "
                        "The performance is incredibly important, so do not choose a region only based on cost "
                        "if the performance is bad. Please provide the region you recommend for this use case.",
        agent=search,
    )

    crew = Crew(
        agents=[search],
        tasks=[job],
        verbose=True,
        process=Process.sequential
    )

    crew.kickoff()

if __name__ == '__main__':
    main()
Generating Code
Last but not least is code generation. The key difference is the allow_code_execution
block within the Agent. This allows the Agent to write and execute code.
codeDeploy = Agent(
    role="AKS Creator",
    goal="Create an AKS Cluster",
    backstory="You are a DevOps Engineer that knows how to write the code for an AKS cluster and how to deploy it",
    allow_code_execution=True,
    llm=modelType
)
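A heads-up worth verifying against the CrewAI docs for your version: allow_code_execution defaults to running the generated code in a Docker sandbox, so Docker needs to be installed and running on the machine.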
One thing I set out to do was have an Agent deploy real code for me on Azure. Unfortunately, that’s out of scope for what an Agent can do here. You can, however, still have the code generated for you to deploy.
It’ll also be very important to ensure that you perform proper prompt engineering in the Task’s description and expected_output fields.
job = Task(
    description="Write a config with Terraform to deploy an AKS cluster and then deploy the AKS cluster to East US",
    agent=codeDeploy,
    expected_output="Write a config with Terraform to deploy an AKS cluster and then deploy the AKS cluster to East US"
)
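Notice that expected_output above simply repeats the description; in practice, expected_output tends to work better when it describes the deliverable itself. A hedged alternative:
job = Task(
    description="Write a config with Terraform to deploy an AKS cluster to East US.",
    agent=codeDeploy,
    expected_output="A complete, valid Terraform configuration (provider and "
                    "resource blocks) that deploys an AKS cluster in East US.",
)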
The final script looks like the below:
from crewai import Agent, Task, Crew, LLM
from crewai_tools import GithubSearchTool  # optional: search GitHub repos for code examples (unused below)
import os
from langchain_openai import ChatOpenAI
def main():
    # These keys are read from your shell environment; export both before running.
    os.environ.get('OPENAI_API_KEY')
    os.environ.get('SERPER_API_KEY')

    modelType = ChatOpenAI(model="ollama/phi4:14b", temperature=0.0, base_url="http://localhost:11434")

    codeDeploy = Agent(
        role="AKS Creator",
        goal="Create an AKS Cluster",
        backstory="You are a DevOps Engineer that knows how to write the code for an AKS cluster and how to deploy it",
        allow_code_execution=True,
        llm=modelType
    )

    job = Task(
        description="Write a config with Terraform to deploy an AKS cluster and then deploy the AKS cluster to East US",
        agent=codeDeploy,
        expected_output="Write a config with Terraform to deploy an AKS cluster and then deploy the AKS cluster to East US"
    )

    crew = Crew(
        agents=[codeDeploy],
        tasks=[job],
        verbose=True,
    )

    crew.kickoff()

if __name__ == '__main__':
    main()
Closing Thoughts
If you’re in the DevOps/Cloud/Platform Engineering space and you’re thinking to yourself, “How can I learn this whole AI thing?” beyond just asking chatbots questions, and you want to perform some true engineering tasks, a great way to learn is by creating and deploying AI Agents.