Today, I’m excited to share a practical project: a console app that generates captions for images using LLaVA, a vision-language model powered by Ollama.
We’ll use dependency injection with Microsoft.Extensions.AI
to keep things modern and clean.
This app will preload images from a project folder, let users pick one, and spit out a caption. All in a few lines of code.
Whether you’re exploring local AI tools or just want a fun C# challenge, let’s dive in and build it together!
Setting Up Ollama
First, we need Ollama to run LLaVA locally.
Ollama is a lightweight tool that makes hosting models like this a breeze. We’ll start with llava:7b, a smaller version that’s friendly to most machines, and I’ll point out how to level up later.
First things first: install Ollama
1. Download Ollama
Visit ollama.com and grab the installer for your OS: Windows, Mac, or Linux.
For Linux folks, you can use this terminal command:
curl -fsSL https://ollama.com/install.sh | sh
2. Verify It’s Working
Open a terminal (Command Prompt, PowerShell, or your shell of choice) and run:
ollama --version
You should see a version number. If not, double-check the installation.
3. Pull LLaVA 7B
In your terminal, type:
ollama pull llava:7b
This downloads the 7-billion-parameter version of LLaVA. At around 4GB, it’s lighter than the full model; depending on your internet connection, the download can take a few minutes.
4. Launch LLaVA
Start it with:
ollama run llava:7b
This fires up a local server at http://localhost:11434. Keep the terminal running while we work.
To make sure all is set, go to http://localhost:11434 in your browser. You should see the message: Ollama is running
Pro Tip: The llava:7b model works well with modest hardware (8GB RAM minimum, GPU optional). For better accuracy and detail, you can install the full llava model with ollama pull llava. It’s heftier (7GB+), so you’ll want 16GB RAM and ideally a GPU with 8GB+ VRAM.
Creating the C# Project
You’re all seasoned C# pros, so I won’t micromanage your setup. Whether you’re rocking Visual Studio, Rider, or VSCode, here’s the gist of what you need to do.
Project Setup
- New Console App
Create a new console app in your IDE of choice. I’m targeting .NET 8 for this, but .NET 9, .NET Core, or .NET Framework will work too. I’ll name this project ImageAICaptioner.
- Add Dependencies
We’ll need three NuGet packages (make sure you have the "Prerelease" flag on):
Microsoft.Extensions.AI (preview version, e.g., 9.3.0-preview.1.25161.3)
Microsoft.Extensions.AI.Ollama (same version)
Microsoft.Extensions.Hosting (to configure the chat client)
Install these via your preferred method: NuGet Package Manager, the CLI (dotnet add package), or however you roll.
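If you’d rather wire the references up by hand, they end up looking something like this in your .csproj. The AI package versions are the previews mentioned above; the Hosting version shown here is just an illustrative recent release, so adjust to whatever is current:

```xml
<ItemGroup>
  <PackageReference Include="Microsoft.Extensions.AI" Version="9.3.0-preview.1.25161.3" />
  <PackageReference Include="Microsoft.Extensions.AI.Ollama" Version="9.3.0-preview.1.25161.3" />
  <!-- Hosting version is illustrative; any recent stable release should work -->
  <PackageReference Include="Microsoft.Extensions.Hosting" Version="9.0.0" />
</ItemGroup>
```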
- Set Up an Images Folder
Add a folder named images to your project root.
Toss in a few test images (e.g., dog.jpg, cat.png, car.png).
Make sure they copy to the output directory: set their properties to “Copy if newer” in Visual Studio, or tweak your .csproj if you’re on another IDE.
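If you’re not in Visual Studio, a minimal .csproj tweak to copy the folder on build might look like this (assuming the folder is named images, as above):

```xml
<ItemGroup>
  <!-- Copy everything in the images folder to the output directory on build -->
  <None Update="images\**\*" CopyToOutputDirectory="PreserveNewest" />
</ItemGroup>
```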
Writing the Code
Let’s build the app step by step.
We’ll set up dependency injection, preload images from the images folder, let the user choose one, and generate a caption with LLaVA.
I’ll break it down so you can follow along easily.
Step 1: Set Up Dependency Injection
We’ll use Microsoft.Extensions.Hosting to configure a chat client for LLaVA. Add this at the top of Program.cs:
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Hosting;
var host = Host.CreateDefaultBuilder(args)
    .ConfigureServices(services =>
    {
        services.AddChatClient(
            new OllamaChatClient(new Uri("http://localhost:11434"), "llava:7b"));
    })
    .Build();
var chatClient = host.Services.GetRequiredService<IChatClient>();
This sets up a host, registers an IChatClient for llava:7b, and pulls it from the DI container. Clean and reusable!
Step 2: Preload and List Images
Next, we’ll scan the images folder and show the user their options:
var imagesFolder = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "images");
if (!Directory.Exists(imagesFolder))
{
    Console.WriteLine("No 'images' folder found! Add one with some images.");
    return;
}

var jpgFiles = Directory.GetFiles(imagesFolder, "*.jpg", SearchOption.TopDirectoryOnly);
var pngFiles = Directory.GetFiles(imagesFolder, "*.png", SearchOption.TopDirectoryOnly);
var imageFiles = jpgFiles.Concat(pngFiles).ToArray();

if (imageFiles.Length == 0)
{
    Console.WriteLine("No images in the 'images' folder. Add some and try again!");
    return;
}

Console.WriteLine("Welcome to the Image AI Caption Generator!");
Console.WriteLine("\nChoose an image to caption:");
for (var i = 0; i < imageFiles.Length; i++)
{
    Console.WriteLine($"{i + 1}. {Path.GetFileName(imageFiles[i])}");
}
This checks for the folder, grabs all .jpg and .png files using System.IO, and lists them with numbers (e.g., 1. dog.jpg).
Step 3: Handle User Input
Let’s get the user’s choice and validate it:
Console.Write("\nEnter the number of your choice: ");
if (!int.TryParse(Console.ReadLine(), out var choice)
    || choice < 1 || choice > imageFiles.Length)
{
    Console.WriteLine("Invalid choice. Exiting!");
    return;
}

var selectedImage = imageFiles[choice - 1];
Simple error checking ensures the input is a valid number within range, then picks the corresponding file.
Step 4: Generate the Caption
Now, we’ll send the image as a byte array along with its media type, and ask LLaVA for a caption:
try
{
    var prompt = new ChatMessage(ChatRole.User, "Describe this image in one sentence.");
    prompt.Contents.Add(
        new DataContent(
            File.ReadAllBytes(selectedImage),
            Path.GetExtension(selectedImage).ToLower() == ".png" ? "image/png" : "image/jpeg"));

    var response = await chatClient.GetResponseAsync(prompt);
    Console.WriteLine($"\nCaption: {response.Messages[0].Text}");
}
catch (Exception ex)
{
    Console.WriteLine($"Oops, something failed: {ex.Message}");
}
Here's where the magic happens 🪄
We're sending the image to LLaVA and asking it to describe what it sees.
First, we create a message with the instruction "Describe this image in one sentence.", posing as a user talking to the AI (role ChatRole.User).
Then, we grab the raw bytes of the selected image file and attach them to the message, letting LLaVA know whether it’s a PNG or JPEG based on the file extension.
After that, we send the whole package off to the chat client, which talks to LLaVA and waits for a response.
Once we get it back, we pull out the first message's text (our caption) and print it to the console.
If anything goes wrong (like a network hiccup or LLaVA choking on the image), we catch the error and let the user know something failed.
Step 5: Wrap It Up
Finish with a prompt to exit:
Console.WriteLine("Press any key to exit...");
Console.ReadKey();
Full Code
Rather than pasting it all here again, I’ve uploaded the complete project to a GitHub repo.
Running the App
Let’s test it out!
Steps to Run
- Start Ollama
Ensure llava:7b is running with ollama run llava:7b in a terminal. Leave it open.
- Launch the App
Build and run your project.
The images folder should copy to your output directory.
- Try It
You’ll see something like:
Welcome to the Image AI Caption Generator!
Choose an image to caption:
1. dog.jpg
2. rabbit.jpg
3. car.jpg
4. llama.jpg
5. cat.png
Enter the number of your choice:
Enter 4, and you might get:
Caption: A large llama stands alone in a green field, looking directly at the camera on a clear day with a blue sky.
You can play with it; LLaVA may produce a slightly different caption each time.
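If you’d like the captions to be more repeatable between runs, you can experiment with passing a ChatOptions when requesting the caption. This is a sketch, assuming the same chatClient and prompt from Step 4 are in scope and Ollama is still running; it is not part of the main walkthrough:

```csharp
// Lower temperature means less random sampling, so repeated runs on the
// same image tend to produce more similar captions.
var options = new ChatOptions { Temperature = 0.1f };
var response = await chatClient.GetResponseAsync(prompt, options);
Console.WriteLine($"\nCaption: {response.Messages[0].Text}");
```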
Troubleshooting
- “No images”: Verify your images folder exists and its files are set to copy to the output directory. You can extend the app, but this sample reads only .png and .jpg images.
- Connection issues: Confirm ollama run llava:7b is active and http://localhost:11434 responds (test it in a browser).
- Performance woes: If it’s sluggish, your hardware might need the lighter llava:7b; or upgrade to llava if you’ve got the specs.
- Inaccurate captions: Remember that this is a small model running locally with a limited token budget. Depending on the picture’s clarity or size, results will vary. If you have the hardware, try upgrading to llava.
A Note on the Preview Package
Before we wrap up, a quick heads-up: the Microsoft.Extensions.AI and Microsoft.Extensions.AI.Ollama packages we're using are currently in preview, specifically version 9.3.0-preview.1.25161.3 as of this post.
That means the code I've shared here works with that version, but it's provided as-is. Since it's a preview, the API might shift in future releases: methods like GetResponseAsync or the way we handle image data with DataContent could change as the package matures.
If you're reading this later and things don't quite line up, check the latest docs or NuGet updates for adjustments.
For now, this setup gets us up and running with LLaVA, and I’m excited to see where this library takes us as it evolves!
Why This Matters
Using Microsoft.Extensions.AI with dependency injection isn’t just for show. It makes swapping tools or scaling up dead simple.
Preloading images keeps it user-friendly without extra hassle.
Let me know how this works for you or if you’ve got ideas to tweak it. There’s a lot of room to grow!
Happy Coding! ⚡