
Embeddings, Vector Search & BM25

A computer cannot understand text, nor the semantic relationship or meaning between words. It can only understand numbers. We solve this by using embeddings.

An embedding is a representation of text (as numbers) in a vector space. This lets AI models compare and operate on the meaning of words.

flowchart TD
    A["perro"] --> B
    B --> C["[-0.003, 0.043, ..., -0.01]"]
    
    N1["(text we want to convert)"]:::note --> A
    N2["(vectors carrying semantic meaning)"]:::note --> C
    
    classDef note fill:none,stroke:none,color:#777;    

The vectors for each word or document capture the semantic meaning of the text.

  • dog will be close to pet
  • contract will be far from beach
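This "closeness" is usually measured with cosine similarity between the vectors. A minimal sketch in Python, with tiny made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# made-up toy vectors; in reality an embedding model produces these
dog = [0.9, 0.8, 0.1]
pet = [0.8, 0.9, 0.2]
beach = [-0.7, 0.1, 0.9]

print(cosine_similarity(dog, pet))    # close to 1.0 -> similar meaning
print(cosine_similarity(dog, beach))  # much lower -> unrelated meaning
```

The exact numbers are irrelevant; what matters is that semantically related texts score closer to 1 than unrelated ones.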

Vector vs SQL databases

The problem with typical databases is that they only find exact matches. If I search for car, I only get the entries that contain car.

Vector databases, on the other hand, can interpret the semantics of words through their vectors, so if I search for car they can return values like sedan, SUV, Land Rover, etc.

Vector databases excel when we need to find items that are similar by proximity to one another. One use case is finding similar movies (Netflix); another is recommending similar items in online stores (Amazon).

How to run a search (query) with vectors

(You can see the code here)

We need:

  • A vector database (CosmosDB)
  • A model to generate the embeddings (text-embedding-3-large)

The full flow is as follows:

  1. Use an embedding model to get the vectors for the content we want to index
  2. Insert the original text and the content's vectors into the vector database
  3. When we want to run a query, use the same embedding model on the query text. With the resulting embedding we search for similar vectors in the database and retrieve the original text from original_text

    Inserting vectors into CosmosDB

    Before we can search, we need to fill the database with content. We keep it simple and insert

    • a hand-assigned ID
    • the original text
    • the vectors produced by embedding the original text

The pseudocode looks like this and runs one item at a time

text = "A shiba walks alone in the park"
# this sends the text to the model text-embedding-3-large 
vectors = createEmbeddingsForText(text)
item = {
	"id": "1",
	"original_text": text,
	"vectors": vectors
}
uploadToCosmosDB(item)

An example of the data I store:

{
	"id": "1",
	"original_text": "A shiba walks alone in the park",
	"vectors": [-0.003, 0.043, ..., -0.001]
}
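The query side of the flow (step 3) mirrors the indexing side. Here is a minimal runnable sketch in Python where the embedding model and CosmosDB's vector search are replaced by hypothetical stubs (fake_embed, fake_db) so the shape of the flow stays visible:

```python
import math

# hypothetical stand-in for text-embedding-3-large: maps text to a vector
def fake_embed(text):
    words = {"shiba": [0.9, 0.1], "park": [0.5, 0.5], "cat": [-0.8, 0.3]}
    vectors = [words[w] for w in text.split() if w in words]
    # average the word vectors component-wise
    return [sum(v) / len(vectors) for v in zip(*vectors)]

# items as they were stored at indexing time (id, original_text, vectors)
fake_db = [
    {"id": "1", "original_text": "A shiba walks alone in the park",
     "vectors": fake_embed("shiba park")},
    {"id": "2", "original_text": "A cat sleeps on the sofa",
     "vectors": fake_embed("cat")},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# step 3: embed the query with the same model, search by similarity,
# and return the original text of the best match
query_vectors = fake_embed("shiba")
best = max(fake_db, key=lambda item: cosine(query_vectors, item["vectors"]))
print(best["original_text"])  # "A shiba walks alone in the park"
```

In the real flow, fake_embed is the call to the embedding model and the max-by-similarity step is the vector search that the database executes for us.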


.NET AI integration

Today’s AI landscape moves so fast and providers differ so much that vendor lock-in can become expensive. You need a clean, testable way to add AI without tying your architecture to one SDK.

The answer is a model-agnostic approach.

NuGet packages to use (you need to enable prerelease versions in the package manager):

  • Microsoft.Extensions.AI - This package provides the IChatClient interface, an abstraction for using several LLM providers, from ChatGPT to Ollama.
  • Microsoft.Extensions.AI.OpenAI
  • OllamaSharp (previously Microsoft.Extensions.AI.Ollama)

You’ll need to go to the OpenAI platform to set up a project and billing, and to get an OpenAI API key.

This repository is a test implementation which connects to OpenAI’s ChatGPT and is able to send prompts.

Best Practices

  • Keep inputs short and specific
  • Validate outputs with a regex or JSON schema; reject or re-ask when invalid
  • Log prompts, token counts, latency, and provider responses
  • Control costs: cache results, batch requests, and prefer smaller models by default
  • Don’t commit or send secrets or personal information
  • Plan for failover: implement timeouts, retries, and fallback models
  • LLMs are stateless; maintaining and reconstructing conversational context is the developer’s responsibility (chat history or memory abstractions)
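The "validate and re-ask" practice is language-agnostic; here is a small sketch of the idea in Python, where ask_llm is a hypothetical function standing in for a provider call (stubbed here so the example is self-contained):

```python
import json

def ask_llm(prompt):
    # hypothetical provider call; stubbed so the sketch runs standalone
    return '{"name": "Ada", "age": 36}'

def ask_for_json(prompt, retries=2):
    """Ask the model for JSON, validate it, and re-ask on invalid output."""
    for attempt in range(retries + 1):
        raw = ask_llm(prompt)
        try:
            data = json.loads(raw)  # a full JSON-schema check would go here too
            if "name" in data and "age" in data:
                return data
        except json.JSONDecodeError:
            pass
        # re-ask with a stricter instruction appended
        prompt = prompt + "\nReturn ONLY valid JSON with keys name and age."
    raise ValueError("model never returned valid JSON")

result = ask_for_json("Extract name and age from: Ada, 36 years old")
print(result)  # {'name': 'Ada', 'age': 36}
```

The same pattern maps directly onto IChatClient in .NET: parse, validate, and retry with a corrective prompt before giving up.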

Security

  • Prompt injection: beware of malicious prompts crafted to subvert model guardrails, steal data, or trigger unintended actions
  • Data leakage: LLMs may leak private or internal data via crafted prompts
  • Training data poisoning: malicious actors may inject poisoned data into training sets
  • DoS and rate limiting: prevent overuse and abuse

Reference(s)

https://roxeem.com/2025/09/04/the-practical-net-guide-to-ai-llm-introduction/
https://roxeem.com/2025/09/08/how-to-correctly-build-ai-features-in-dotnet/

EF Core multithreading

I’ve had issues with EF Core when working with multiple threads and multiple simultaneous calls.

The most important things to check are:

  1. The DbContext must not be shared between calls or threads
  2. All classes that have the context injected must be registered as scoped (not singleton)
  3. If working with async methods, you need to await the calls

I have the following service

public class PersonService(AppDbContext _context) : IPersonService
{
	public async Task<Person> GetPerson(string id)
	{
		// use the injected context; FindAsync is the async counterpart of Find
		return await _context.Persons.FindAsync(id);
	}
}

which I may configure as follows

// if I inject it as singleton, this would cause exceptions on concurrent calls
services.AddSingleton<IPersonService, PersonService>();

// we have to inject it as scoped so a new context is created for each request
services.AddScoped<IPersonService, PersonService>();

Caching in .NET (IMemoryCache)

.NET offers several cache types. Here I’m going to explore IMemoryCache, which stores data in the web server’s memory. It’s simple but not suitable for distributed scenarios.

First of all we need to register the service

builder.Services.AddMemoryCache();

GetOrCreateAsync

Here’s how you can inject and use it without manipulating cache entries manually

public class PersonService(IMemoryCache _cache)
{
	private const string CACHE_PERSON_KEY = "PersonService:GetPerson:";

	public async Task<Person> GetPerson(string id)
	{
		return await _cache.GetOrCreateAsync(CACHE_PERSON_KEY + id, async entry =>
		{
			entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
			return await GetPersonNoCache(id);
		});
	}

	public async Task<Person> GetPersonNoCache(string id)
	{
		// do operations to get a person here
	}
}


Three-point estimation

Split each task into its smallest sub-tasks and estimate each of them with Optimistic (O), Most Likely (M), and Pessimistic (P) values.

With those values we compute the PERT estimate for each sub-task and then add the results:

(O + (4 x M) + P) / 6

Example

Task: migrate x database

Minimum tasks:

  • migrate service 1 to y database
  • migrate service 2 to y database
  • migrate connector to use y database
  • test changes in test env

Then we estimate those tasks

task                       Optimistic  Most likely  Pessimistic  PERT  Comments
migrate service 1          10h         25h          55h          28h   (I round hours up)
migrate service 2          4h          14h          22h          14h   take x into account
migrate connector          20h         40h          80h          44h
test changes               2h          7h           14h          8h
total estimation for task                                        94h
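The PERT column above can be reproduced directly from the formula, rounding hours up as the table's comment says. A short Python check:

```python
import math

def pert(o, m, p):
    # (O + 4*M + P) / 6, rounded up to whole hours
    return math.ceil((o + 4 * m + p) / 6)

# (Optimistic, Most likely, Pessimistic) per sub-task, from the table
tasks = {
    "migrate service 1": (10, 25, 55),
    "migrate service 2": (4, 14, 22),
    "migrate connector": (20, 40, 80),
    "test changes": (2, 7, 14),
}

estimates = {name: pert(o, m, p) for name, (o, m, p) in tasks.items()}
total = sum(estimates.values())
print(estimates)  # {'migrate service 1': 28, 'migrate service 2': 14, 'migrate connector': 44, 'test changes': 8}
print(total)      # 94
```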

Reference(s)

https://www.knowledgehut.com/blog/project-management/three-point-estimating

C# Async await with parallelism

The following is an example where we need to call and await an external API multiple times inside an iteration.

I’m using myFakeAPI from Postman for this example, and one of their Car responses looks like this

public class CarResponse
{
	public CarDto Car { get; set; }
}

public class CarDto
{
	public int Id { get; set; }
	public string Car { get; set; }
	public string Car_Model { get; set; }
	public string Car_Color { get; set; }
	public int Car_Model_Year { get; set; }
	public string Car_Vin { get; set; }
	public string Price { get; set; }
	public bool Availability { get; set; }
}

Then this is the method which does call and mapping

private async Task<CarResponse> ExecuteCall(string id)
{
	string combinedUrl = URL + id;

	using var response = await _httpClient.GetAsync(combinedUrl);
	response.EnsureSuccessStatusCode();

	string json = await response.Content.ReadAsStringAsync();
	return JsonConvert.DeserializeObject<CarResponse>(json);
}

Control

This is the control version where we launch and await the tasks one at a time

// DON'T DO THIS
private async Task<List<CarResponse>> Control()
{
	List<CarResponse> carList = [];
	foreach (string id in _idsList)
	{
		CarResponse singleCar = await ExecuteCall(id);
		carList.Add(singleCar);
	}
	return carList;
}

Task.WhenAll()

It’s the simplest option: all tasks are launched at the same time. It’s only suitable when there are no rate limits, because we have no control over the number of simultaneous calls.

// simple but what if we'd have +100 calls?
private async Task<List<CarResponse>> TaskWhenAll()
{
	var getCarsTask = _idsList.Select(ExecuteCall);
	var cars = await Task.WhenAll(getCarsTask);
	return cars.ToList();
}

Parallel.ForEachAsync()

This gives us the most control over number of parallel calls. It’s more complex.

private async Task<List<CarResponse>> ParallelForEachAsync()
{
	// a thread-safe collection for writes from multiple threads
	var carsBag = new ConcurrentBag<CarResponse>();
	var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };

	await Parallel.ForEachAsync(_idsList, options, async (id, ct) =>
	{
		CarResponse car = await ExecuteCall(id);
		carsBag.Add(car);
	});
	return carsBag.ToList();
}

C# generics

An example of how to use generics in C#

public class AnimalService(IConnectorService _service)
{
	public async Task<List<T>> GetAnimals<T> (List<string> ids, string query)
	{
		List<T> results = [];
		var request = new ConnectorRequest
		{
			query = query,
			ids = ids
		};
		var response = await _service.Execute(request);
		if((response?.result?.Count ?? 0) > 0)
		{
			results = JsonConvert.DeserializeObject<List<T>>(response.result);
		}
		return results;
	}
}

C# JSON tags Newtonsoft

JsonConvert.SerializeObject

I use this to serialize full objects to log them with all their properties

InputModel x = // ...
log.LogInfo($"doing x. input: {JsonConvert.SerializeObject(x)}");

JsonProperty and NullValueHandling

This is useful for cases where we need to modify the given properties of a class we serialize and give back, but for any reason we don’t want to change the internal structure or naming.

With NullValueHandling we can omit a property from the JSON when it’s null.

public class House
{
	public List<Window> windows { get; set; }

	[JsonProperty("builtInGarage", NullValueHandling = NullValueHandling.Ignore)]
	public Garage garage { get; set; }
}

C# How to get headers

This is how to retrieve headers from any call.

// how to retrieve a mandatory header
if(Request.Headers.TryGetValue("mandatory-header", out var mandatoryHeader))
{
	// this one may be either filled or empty
	string optionalHeader = Request.Headers["optional-header"];
	var result = await _service.DoWork(mandatoryHeader, optionalHeader);
}
else 
{
	// log error as mandatory-header isn't included in the call
}

C# Task async programming (TAP) and parallel code

The core of asynchronous programming is the Task and Task<T> objects. Both are compatible with the async and await keywords.

First of all we need to identify whether the code is I/O-bound or CPU-bound.

  • I/O-bound: the code is limited by external operations and spends a lot of time waiting for something. Examples are database calls or a server’s response. In this case we use async/await to free the thread while we wait
  • CPU-bound: the code performs a CPU-intensive operation. Then we move the work to another thread using Task.Run() so we don’t block the main thread

async code vs parallel code

(!) Asynchronous code is not the same as parallel code (!)

  • In async code you are trying to make your threads do as little work as possible. This keeps your app responsive, capable of serving many requests at once, and able to scale well.
  • In parallel code you do the opposite: you take and hold a thread to do CPU-intensive calculations

async code

The important thing about async programming is that you choose when to await a task. This way, you can start other tasks concurrently.

In async code, one single thread can start the next task concurrently before the previous one completes.
(!) async code doesn’t cause additional threads to be created because an async method doesn’t run on its own thread. (!) It runs on the current synchronization context and uses time on the thread only when the method is active.

parallel code

For parallelism you need multiple threads where each thread executes a task, and all of those tasks are executed at the same time
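The async side of this distinction is not .NET-specific; as a quick illustration, here is a minimal Python asyncio sketch where a single thread starts two I/O-style waits concurrently instead of blocking on each in turn:

```python
import asyncio
import time

async def fake_io(name, delay):
    # simulates an I/O-bound wait (a database call, an HTTP response, ...)
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.perf_counter()
    # both waits are started concurrently on a single thread
    results = await asyncio.gather(fake_io("a", 0.1), fake_io("b", 0.1))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)            # ['a', 'b']
print(elapsed < 0.2)      # True: the two 0.1s waits overlapped
```

Awaiting the two calls one after another would take roughly the sum of the delays; starting them together takes roughly the longest one, with no extra threads involved.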


Mock multiple calls with same params

This is an example of how to mock a call that is made multiple times, with the same parameter type every time.

Setup

I have the following class…

public class ConnectorRequest
{
	public string Query { get; set; }
}

… which will be consumed by the following service

public interface IConnectorService
{
	Task<string> Execute(ConnectorRequest request);
}

Then I have a class which calls IConnectorService multiple times

public class ConnectorConsumerService
{
	private IConnectorService _service;
	
	// ...
	
	public async Task<string> Process() 
	{
		// ... does whatever
		var response1 = await _service.Execute(request1);
		// ... does whatever with that information
		var response2 = await _service.Execute(request2);
		// ... does whatever with that information
		var response3 = await _service.Execute(request3);
		// ... does whatever with that information
		// ... does whatever else
	}
	
	// ...
	
}

Test

Test which mocks multiple calls

public class ConnectorConsumerServiceTest
{
	// all mocks and stubs
	private Mock<IConnectorService> _dependencyMock;

	// service under test
	private ConnectorConsumerService _service;

	public ConnectorConsumerServiceTest()
	{
		_dependencyMock = new Mock<IConnectorService>();
		_service = new ConnectorConsumerService(_dependencyMock.Object);
	}

	[Fact]
	public async Task ProcessXXX_CaseXXX_ShouldReturnOkay()
	{
		// ARRANGE
		// example starts here! -> 
		// Execute returns Task<string>, so each mocked response is a string
		var responseToExecution1 = "some response";
		var responseToExecution2 = "another response";
		var responseToExecution3 = "oh no! a response";
		
		_dependencyMock.SetupSequence(mock => mock.Execute(It.IsAny<ConnectorRequest>()))
			.ReturnsAsync(responseToExecution1)
			.ReturnsAsync(responseToExecution2)
			.ReturnsAsync(responseToExecution3);
		// <- example ends here

		// ACT
		var result = await _service.Process();
	
		// ASSERT
		result.Should().NotBeNull();
		// ... assert whatever
	}
}