• Featured post

Embeddings, Vector Search & BM25

Un ordenador no puede entender texto ni relación semántica o significado entre palabras. Solo puede entender números. Esto lo resolvemos mediante el uso de embeddings.

Un embedding es la representación de texto (en forma de números) en un espacio vectorial. Esto permite a los modelos de IA comparar y operar sobre el significado de las palabras.

flowchart TD
    A["perro"] --> B
    B --> C["[-0.003, 0.043, ..., -0.01]"]
    
    N1["(texto que queremos convertir)"]:::note --> A
    N2["(vectores con contenido semántico)"]:::note --> C
    
    classDef note fill:none,stroke:none,color:#777;    

Los vectores de cada palabra o documento capturan el significado semántico del texto.

  • perro estará cerca de mascota
  • contrato estará lejos de playa

Vector vs SQL databases

El problema con las BBDD típicas es que solo buscan matches exactos. Si yo busco por coche solo me sacará las entradas que contengan coche.

En cambio, como las BBDD vectoriales pueden interpretar la semántica de las palabras mediante los vectores, si busco por coche puede sacarme valores como sedán, SUV, Land Rover, etc.

Las BBDD vectoriales son muy buenas cuando necesitamos buscar items similares por proximidad uno respecto al otro. Un ejemplo de uso es buscar películas parecidas (Netflix). Otro ejemplo son los recomendadores de items parecidos en tiendas online (Amazon).

Como ejecutar una búsqueda (query) mediante vectores

(You can see the code here)

Necesitamos:

  • Una BBDD Vectorial (CosmosDB)
  • Un modelo para transformar los embeddings (text-embedding-3-large)

El flujo completo es el siguiente:

  1. Usar un embedding model para obtener los vectores del contenido que queremos indexar
  2. Insertar el texto original y los vectores del contenido en una BBDD vectorial
  3. Cuando queramos ejecutar una query usar el mismo embedding model de antes con la query a buscar. Con el embedding resultante buscamos vectores similares en la BBDD y sacamos el texto original de original_text

    Introducir vectores en CosmosDB

    Para poder buscar necesitamos rellenar antes la BBDD con contenido. Lo mantenemos simple. Metemos

    • un ID a mano
    • el texto original
    • los vectores resultado de hacer el embedding sobre el texto original

El pseudocódigo se ve así y se ejecuta de uno en uno

text = "A shiba walks alone in the park"
# this sends the text to the model text-embedding-3-large 
vectors = createEmbeddingsForText(text)
item = {
	"id": "1",
	"original_text": text,
	"vectors": vectors
}
uploadToCosmosDB(item)

ejemplos de los datos que guardo

{
	"id": "1",
	"original_text": "A shiba walks alone in the park",
	"vectors": [-0.003, 0.043, ..., -0.001]
}

Read More

C# Expression bodied members

Expression body definitions let you provide a member’s implementation in a concise, readable form.

Methods

Expression-bodied method consists of a single expression that returns:

  • a value whose type matches the return value
  • or performs an operation for void methods
public class Person(string firstName, string lastName)
{
	public override string ToString() => $"{firstName} {lastName}".Trim();
	public void DisplayName() => Console.WriteLine(ToString());
}

Read-only properties

You can use them to implement read-only property.

public class Location(string locationName)
{
	public string Name => locationName;
}

Reference(s)

https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/statements-expressions-operators/expression-bodied-members

C# Collections' index and range operators

Index operator ^1

The index operator is used to reach the last positions of an array or list.

before

List<int> arr = [0, 1, 2, 3, 4, 5];
if(arr[arr.Count-1] == 5)
	Console.WriteLine("match!"); // prints

same but using index operator

List<int> arr = [0, 1, 2, 3, 4, 5];
if(arr[^1] == 5)
	Console.WriteLine("match!"); // prints

we can use it to reach any position starting from the back

List<int> arr = [0, 1, 2, 3, 4, 5];
if(arr[^2] == 4)
	Console.WriteLine("match!"); // prints

Read More

Deconstruction in ASP.NET Core

Desconstructors allows us to extract in a single expression, properties of an object or elements of a tuple, and assign them to distinct variables.

record and record struct automatically provide a Deconstruct method.

Classes

before deconstruction we had to manually extract each variable

var pat = new Person() { Name = "Patrick", BirdDate = new DateOnly(1999, 1, 18)};
var name = pat.Name;
var birthDate = pat.BirthDate;
Console.WriteLine($"Name: {name}, birthdate: {birthDate}");

class Person

public class Person
{
	public string Name { get; set; }
	public string Location { get; set; }
}

with deconstruction

var pat = new Person() { Name = "Patrick", BirdDate = new DateOnly(1999, 1, 18)};
var (name, birthDate) = pat;
Console.WriteLine($"Name: {name}, birthdate: {birthDate}");

class Person with Deconstructor

public class Person
{
	public string Name { get; set; }
	public string Location { get; set; }
	
	public void Deconstruct(out string name, out string location) 
	{
		name = Name;
		location = Location;
	}
}

if you don’t want all properties, just discard some using the discard character _

var pat = new Person() { Name = "Patrick", BirdDate = new DateOnly(1999, 1, 18)};
var (name, _) = pat;
Console.WriteLine($"Name: {name}");

Tuples

The same applies to tuples

// tuple creation
(string name, string location) person = ("mario", "spain");
// deconstructor
var (name, location) = person;

Reference(s)

https://blog.ndepend.com/deconstruction-in-c/

Dependency injection in .NET Core

Dependency injection is native-available in .NET Core

DI implementation

interface we want to inject

public interface IDateTime
{
	DateTime Now { get; }
}

interface’s implementation

public class SystemDateTime : IDateTime
{
	public DateTime Now
	{
		get { return DateTime.Now; }
	}
}

service we inject into

public class HomeController : Controller
{
	private readonly IDateTime _dateTime;

	public HomeController(IDateTime dateTime)
	{
		_dateTime = dateTime;
	}

	public IActionResult Index()
	{
		var serverTime = _dateTime.Now;
		return Ok(serverTime);
	}
}

Read More

C# Anonymous types objects

Anonymous types provide an easy way to encapsulate different properties in a single object. Unlike properties in a class, the properties of anonymous type objects are read-only.

Declare and use anon type objects

declaration

var loginCredentials = new { Username = "suresh", Password = "******" };
Console.WriteLine($"Username {loginCredentials.Username.ToString()}");
Console.WriteLine($"Password {loginCredentials.Password.ToString()}");

After assigning values to the Username and Password properties, they cannot be altered *(they’re readonly).

Anon objects in LINQ

Usually, anon data types are used in the select clause to return a specific subset of properties for each object in the collection.

we have the employee class

public class Employee
{
	public int ID { get; set; }
	public string Name { get; set; }
	public int Age { get; set; }
	public string Address { get; set; }
}
List<Employee> employees = // whatever ...
// we convert an Employee into an anon object type with a subset of properties
var employeeDetails = from emp in employees
	select new { Id = emp.ID, Name = emp.Name };

Reference(s)

https://www.syncfusion.com/blogs/post/understanding-csharp-anonymous-types

C# Collections

List vs ArrayList

Don’t use ArrayList. Use List<T> instead. ArrayList comes from times when C# didn’t have generics and it’s not the same ArrayList we have in Java.

DON’T

ArrayList array = new ArrayList();
array.Add(1);
array.Add("Pony"); // Later error at runtime if you try to operate with numbers.

do

List<int> list = [];
list.Add(1);
list.Add("Pony"); // Compilation error. 

retrieve data

List<string> list = [ "yes", "no"];
var firstItem = list[0];

Tuples

Tuples help us manipulate data sets easily without having to define a new class with custom properties.

One of its benefits is that they can be used as method params and return types.

// simple tuple
(int, string) employee = (23, "Yohannes");
Console.WriteLine($"{employee.Item2} is {employee.Item1} years old");

// tuple with named parameters
(int Age, string Name) employee2 = (29, "Ramon");
Console.WriteLine($"{employee2.Name} is {employee2.Age} years old");

// nested tuples
var employees = ((23, "Yohannes"), (29, "Ramon"));
Console.WriteLine(employees.Item1); // (23, "Yohannes")
Console.WriteLine(employees.Item1.Item1); // 23

Read More

C# 12 primary constructors

A concise syntax to declare constructors whose params are available anywhere in the body.

Primary constructors is an easier way to create a constructor for your class or struct, by eliminating the need for explicit declarations of private fields and bodies that only assign param values.

They are great when you just need to do simple initialization of fields and for dependency injection.

Code example using primary constructors

public class Book(int id, string title, IEnumerable<decimal> ratings)
{
	public int Id => id;
	public string Title => title.Trim();
	public int Pages { get; set; }
	public decimal AverageRating => ratings.Any() ? ratings.Average() : 0m;
}

another example

public class Person(string firstName, string lastName)
{
	public string FirstName { get; } = firstName;
	public string LastName { get; } = lastName;
}

Read More

C# 12 collection expressions

Collection expressions try to unify all syntaxes for initializing different collections. With them you can use the same syntax to express collections in a consistent way.

They cannot be used with var. You must declare the type.

// create an array
int[] intArray = [1, 2, 3, 4, 5];
// create empty array
int[] emptyArray = [];
// create a list
List<char> charList = ['D', 'a', 'v', 'i', 'd'];
// create a 2D list
List<int> int2dList = [[1, 2, 3], [4, 5, 6,], [7, 8, 9]];

how to flatten collections

int[] row0 = [1, 2, 3];
int[] row1 = [4, 5, 6];
int[] flattenCollection = [.. row0, 100, .. row1];
foreach(element in flattenCollection)
{
	Console.Write($"{element}, ");
}
// 1, 2, 3, 100, 4, 5, 6,

how to check for list emptiness.

if (item is [])
	Console.WriteLine("list is empty");

if (item is not [])
	Console.WriteLine("list is not empty");

Reference(s)

https://devblogs.microsoft.com/dotnet/refactor-your-code-with-collection-expressions/
https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-12#collection-expressions

C# class vs struct vs record (reference types vs value types)

Reference types vs value types

Reference types are allocated on the heap and garbage-collected, whereas value types are allocated on the stack so their (de)allocation are generally cheaper.

Arrays of reference types contain just references to instances of the reference type residing on the heap. Value type arrays hold the actual instance of the value type. Value type arrays are much cheaper than reference type arrays.

Value types get boxed when cast to a reference type or one of the interfaces they implement. They get unboxed when cast back to the value type. Boxes are objects that are allocated on the heap and are garbage-collected. Too much boxing and unboxing can have a negative impact.
In contrast, no such boxing occurs for reference types.

Reference type assignments copy the reference, whereas value type assignments copy the entire value. Assignments of large reference types are cheaper than assignments of large value types.

Changes to an instance of a reference type affect all references pointing to the instance. Value type instances are copied when they’re passed. When an instance of a value type is changed, it doesn’t affect any of its copies.

You can’t have a null value type. “null” only means a memory address reference is missing.

Class

Class is a reference type. When you compare reference types, this comparison checks if the compared objects are the same object (heap location).

You can specify access modifier if required (private, public, protected, internal).

class declaration

class Person
{
	public string Name {get; set;}
	public int Age {get; set;}
}

create an object

Person person = new Person();
person.Age = 30;
person.Name = "Mario";
Console.WriteLine($"{person.Name} {person.Age}"); // Mario 30

create another object using the first object we created

Person person2 = person; 
person2.Age = 29;
Console.WriteLine($"{person.Name} {person.Age}"); // Mario 29
Console.WriteLine($"{person2.Name} {person2.Age}"); // Mario 29

Read More

Interact with your pods

This details various ways that you can interact with the code and data running inside your pod.

Access a Pod via Network

If you want access to a specific pod, even if it’s not serving traffic on the internet. To achieve this, Kubernetes has support built into it.

kubectl port-forward <pod-name> <localport>:<podport>
kubectl port-forward istio 5015:80

A secure tunnel is created from your local machine, though the K8s master, to the instance of the Pod running on one of the worker nodes.

Getting More Info with Logs

When you need to debug.

# downloads the current logs from the running instance
kubectl logs kuard

# continuously streams logs
kubectl logs -f kuard

# get logs from a previous instance of the container. Useful if your containers are continuously restarting at startup. 
kubectl logs kuard --previous
  • -c if you have multiple containers in your Pod, you can choose the one to view

Read More