Building a Streaming AI CLI Tool with Microsoft Agents Framework (C#)
Introduction
The Microsoft Agents Framework provides a powerful abstraction layer for building AI-powered applications in .NET. In this post, we’ll explore how to create a command-line interface (CLI) tool that leverages this framework to stream AI responses in real-time, providing users with immediate feedback as the AI generates content.
Why Streaming Matters
When working with large language models (LLMs), a complete response can take many seconds to generate. Streaming displays tokens as they're produced rather than waiting for the full response, so the application feels responsive from the very first token.
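The consuming side of this idea is just C#'s `await foreach` over an async stream. As a minimal self-contained sketch (the `SimulateTokensAsync` iterator below is a hypothetical stand-in for a model's token stream, not part of any library):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a model's token stream: yields words with a small
// delay, exposing the same shape (IAsyncEnumerable<string>) a streaming client would.
static async IAsyncEnumerable<string> SimulateTokensAsync(string text)
{
    foreach (var word in text.Split(' '))
    {
        await Task.Delay(50); // simulate per-token generation latency
        yield return word + " ";
    }
}

// Consume the stream: each token is rendered as soon as it's available,
// instead of blocking until the whole response exists.
await foreach (var token in SimulateTokensAsync("Streaming makes CLIs feel responsive"))
{
    Console.Write(token);
}
Console.WriteLine();
```

The same loop shape is what we'll use against the real agent later in the post.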
Prerequisites
To follow along with this tutorial, you’ll need:
- .NET 9.0 SDK or later
- Ollama installed and running locally
- Basic knowledge of C# and async programming
Setting Up the Project
First, create a new console application and add the required NuGet packages:
dotnet new console -n StreamingAICLI
cd StreamingAICLI
dotnet add package Microsoft.Agents.AI --version 1.0.0-preview.251028.1
dotnet add package OllamaSharp --version 5.4.8
dotnet add package Spectre.Console --version 0.49.1
The packages we’re using:
- Microsoft.Agents.AI: The core framework for building AI agents
- OllamaSharp: Client library for interacting with Ollama
- Spectre.Console: Rich console UI library for enhanced terminal output
Core Implementation
Connecting to Ollama
Start by establishing a connection to your Ollama server and retrieving available models:
using Microsoft.Extensions.AI;
using Microsoft.Agents.AI;
using OllamaSharp;
using Spectre.Console;
using System.Text;
// Configure the Ollama server connection
var serverUrl = "http://localhost:11434";
var ollama = new OllamaApiClient(serverUrl);
// Retrieve available models
var models = await ollama.ListLocalModelsAsync();
var modelNames = models.Select(m => m.Name).ToList();
if (!modelNames.Any())
{
    AnsiConsole.MarkupLine("[red]No local models found. Please install a model first.[/]");
    return;
}
Creating the Chat Agent
The ChatClientAgent class from the Microsoft Agents Framework wraps an IChatClient implementation (here, the Ollama client) and adds agent behavior, including streaming:
// Select a model
var selectedModel = "llama3.2:latest";
// Create the chat client
var chatClient = new OllamaApiClient(serverUrl, selectedModel);
// Configure the agent with instructions
var agent = new ChatClientAgent(
    chatClient,
    new ChatClientAgentOptions
    {
        Name = "AI Assistant",
        Instructions = "You are a helpful assistant that provides clear and concise answers."
    });
Streaming Responses
The key to streaming is the RunStreamingAsync method, which returns an IAsyncEnumerable of response updates that you consume with await foreach; writing each update to the console prints the text it carries:
var prompt = "Explain what streaming responses are and why they're useful.";
// Stream the response as it is generated; writing each update prints its text
await foreach (var update in agent.RunStreamingAsync(prompt))
{
    Console.Write(update);
}
Console.WriteLine();
This simple loop writes each token to the console as it arrives, creating a typewriter effect that shows the response being generated in real-time.
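Long generations are also worth making cancellable. Because the stream is an IAsyncEnumerable, the standard WithCancellation pattern applies regardless of provider; this self-contained sketch wires Ctrl+C to a CancellationTokenSource, with a hypothetical TokensAsync iterator standing in for the agent's stream (the auto-cancel timeout is only there so the demo terminates on its own):

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical endless token stream standing in for agent.RunStreamingAsync(...)
static async IAsyncEnumerable<string> TokensAsync(
    [EnumeratorCancellation] CancellationToken ct = default)
{
    var i = 0;
    while (true)
    {
        await Task.Delay(50, ct); // throws when cancelled, ending the stream
        yield return $"token{i++} ";
    }
}

using var cts = new CancellationTokenSource();
// Ctrl+C cancels the stream instead of killing the process outright.
Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };
cts.CancelAfter(TimeSpan.FromSeconds(2)); // demo safeguard so the sketch terminates

try
{
    await foreach (var token in TokensAsync().WithCancellation(cts.Token))
    {
        Console.Write(token);
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("\n[cancelled]");
}
```

The [EnumeratorCancellation] attribute is what lets WithCancellation flow the token into the iterator's own awaits.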
Enhanced User Experience with Spectre.Console
For a more polished experience, we can use Spectre.Console to add visual feedback:
var responseBuilder = new StringBuilder();
await AnsiConsole.Status()
    .Spinner(Spinner.Known.Dots)
    .StartAsync("[green]Generating response...[/]", async ctx =>
    {
        await foreach (var update in agent.RunStreamingAsync(prompt))
        {
            Console.Write(update);
            responseBuilder.Append(update.ToString());
        }
    });
Console.WriteLine();
This displays a spinner while the response is being generated, streaming the text as it arrives and capturing it in a StringBuilder for later use.
Building a Conversation Loop
A complete CLI tool should support multiple queries in a single session:
while (true)
{
    var userPrompt = AnsiConsole.Ask<string>("[green]Your question:[/]");
    if (userPrompt.Equals("exit", StringComparison.OrdinalIgnoreCase))
    {
        break;
    }

    var response = new StringBuilder();
    await AnsiConsole.Status()
        .StartAsync("[green]Thinking...[/]", async ctx =>
        {
            await foreach (var update in agent.RunStreamingAsync(userPrompt))
            {
                Console.Write(update);
                response.Append(update.ToString());
            }
        });
    Console.WriteLine("\n");
}
Advanced Features
Saving Conversation History
You can capture the streamed responses and save them for later reference:
var timestamp = DateTime.Now.ToString("yyyy-MM-dd_HH-mm-ss");
var fileName = $"conversation_{timestamp}.md";
var markdown = $"""
# Conversation - {DateTime.Now:yyyy-MM-dd HH:mm:ss}
## Model
{selectedModel}
## Prompt
{userPrompt}
## Response
{response}
---
*Generated on {DateTime.Now:yyyy-MM-dd HH:mm:ss}*
""";
await File.WriteAllTextAsync(fileName, markdown);
Model Switching
Allow users to switch between different AI models during a session:
if (userPrompt.Equals("switch model", StringComparison.OrdinalIgnoreCase))
{
    selectedModel = AnsiConsole.Prompt(
        new SelectionPrompt<string>()
            .Title("[green]Select a model:[/]")
            .AddChoices(modelNames));

    // Recreate the agent with the new model
    chatClient = new OllamaApiClient(serverUrl, selectedModel);
    agent = new ChatClientAgent(chatClient, new ChatClientAgentOptions
    {
        Name = "AI Assistant",
        Instructions = "You are a helpful assistant."
    });
    continue;
}
Custom Instructions
Allow users to configure the agent’s behavior:
var instructions = AnsiConsole.Ask<string>(
    "[green]Enter instructions for the AI:[/]",
    "You are a helpful assistant that provides clear answers.");

var agent = new ChatClientAgent(chatClient, new ChatClientAgentOptions
{
    Name = "AI Assistant",
    Instructions = instructions
});
Complete Example
Here’s a minimal but complete working example:
using Microsoft.Extensions.AI;
using Microsoft.Agents.AI;
using OllamaSharp;
using Spectre.Console;
AnsiConsole.MarkupLine("[bold cyan]AI Chat Assistant[/]");
var serverUrl = "http://localhost:11434";
var ollama = new OllamaApiClient(serverUrl);
var models = await ollama.ListLocalModelsAsync();
if (!models.Any())
{
    AnsiConsole.MarkupLine("[red]No models found![/]");
    return;
}
var model = models.First().Name;
var chatClient = new OllamaApiClient(serverUrl, model);
var agent = new ChatClientAgent(chatClient, new ChatClientAgentOptions
{
    Name = "Assistant",
    Instructions = "You are a helpful AI assistant."
});
AnsiConsole.MarkupLine($"[cyan]Using model:[/] {model}\n");
while (true)
{
    var prompt = AnsiConsole.Ask<string>("[green]You:[/]");
    if (prompt.Equals("exit", StringComparison.OrdinalIgnoreCase))
        break;

    Console.Write("\n[AI]: ");
    await foreach (var update in agent.RunStreamingAsync(prompt))
    {
        Console.Write(update);
    }
    Console.WriteLine("\n");
}
AnsiConsole.MarkupLine("[yellow]Goodbye![/]");
Key Takeaways
- Microsoft Agents Framework provides a clean abstraction for working with AI models in .NET
- Streaming responses significantly improve user experience by providing immediate feedback
- IAsyncEnumerable is the key pattern for consuming streamed content in C#
- Spectre.Console enhances CLI applications with rich, interactive UI elements
- The framework is model-agnostic, working with Ollama, OpenAI, Azure OpenAI, and other providers
Performance Considerations
Streaming reduces perceived latency, not total generation time: the model takes just as long to produce the full response, but users see progress immediately instead of staring at a blank prompt. This matters most for:
- Long-form content generation
- Complex reasoning tasks
- Multi-step processing
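One way to quantify this is to measure time to first token separately from total time. The sketch below wraps any IAsyncEnumerable&lt;string&gt; with a Stopwatch; the FakeStream iterator is a hypothetical stand-in with artificial delays (against the real agent, you would project its updates to text first):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Measure time-to-first-token vs. total time for any async token stream.
static async Task<(TimeSpan FirstToken, TimeSpan Total)> MeasureAsync(
    IAsyncEnumerable<string> stream)
{
    var sw = Stopwatch.StartNew();
    TimeSpan? first = null;
    await foreach (var token in stream)
    {
        first ??= sw.Elapsed; // record latency of the very first token
        Console.Write(token);
    }
    return (first ?? sw.Elapsed, sw.Elapsed);
}

// Hypothetical stand-in stream with an artificial per-token delay.
static async IAsyncEnumerable<string> FakeStream()
{
    for (var i = 0; i < 5; i++)
    {
        await Task.Delay(100);
        yield return $"token{i} ";
    }
}

var (firstToken, total) = await MeasureAsync(FakeStream());
Console.WriteLine($"\nFirst token after {firstToken.TotalMilliseconds:F0} ms; total {total.TotalMilliseconds:F0} ms");
```

With streaming, the first-token time is what the user perceives; a non-streaming call would make them wait the full total before seeing anything at all.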
Conclusion
The Microsoft Agents Framework makes it straightforward to build sophisticated AI-powered CLI tools with streaming capabilities. By leveraging async enumerables and the framework’s abstractions, you can create responsive, professional command-line applications that provide real-time feedback to users.
The combination of the Agents Framework with libraries like Spectre.Console enables you to build CLI tools that rival graphical applications in terms of user experience while maintaining the efficiency and composability that command-line tools are known for.
Further Reading
- Microsoft Agents Framework Documentation
- Ollama Documentation
- Spectre.Console Documentation
- Async Streams in C#
Source Code
A complete working example demonstrating these concepts is available with proper error handling, configuration options, and additional features for production use.