#blog

3 paragraphs

Saturday, June 20, 2026

Today, I worked on creating a script for the #cosine-similarity blog. I used BGE-M3 to create the initial dense embeddings. I retrieve the top 50 results and re-rank them with BGE-reranker-v2 to get the top 5. I have set up search and retrieval pipelines for GitHub commits and YouTube videos. I can fetch commits by repo name and date range. For YouTube videos, I use a dataset from Kaggle. Tomorrow, I plan to finish the presentation and add a #blog entry.
- #blog
- #cosine-similarity
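
A minimal sketch of the retrieve-then-rerank flow described above, not the actual script. It assumes the dense embeddings are already computed (BGE-M3 in my case) and takes the reranker as a plain callable, so `rerank_score`, `first_stage_k`, and `final_k` are hypothetical names for illustration. In the real pipeline the callable would be the BGE reranker scoring each (query, document) pair.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_then_rerank(
    query_vec: Vector,
    doc_vecs: Sequence[Vector],
    rerank_score: Callable[[int], float],  # stand-in for a cross-encoder reranker
    first_stage_k: int = 50,
    final_k: int = 5,
) -> List[int]:
    """Return document indices: dense retrieval first, then reranking."""
    # Stage 1: order all documents by embedding similarity, keep the top candidates.
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:first_stage_k]
    # Stage 2: reorder only those candidates with the (more expensive) reranker.
    reranked = sorted(candidates, key=rerank_score, reverse=True)
    return reranked[:final_k]
```

The reranker only sees what the first stage lets through, so a document that misses the top 50 cannot be rescued later. That is the main reason to keep the first-stage cutoff generous.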

Today, I learnt about Elo rankings. My aim today was to create a World Cup 2026 prediction game, but with a twist. Instead of picking winners for each individual game, users pick winners in semi-random matchups, and those picks build the Elo rankings. I also fit a Bradley-Terry model on recent match results to create a starting ranking, but at a lower weight. I realized that looking at recent games alone was not reliable, since teams like Senegal were getting ranked higher than France. I chose to use FIFA’s ranking as a stronger signal. Anyway, I should write a #blog in detail about what I learned, but one surprising fact was that Elo and Bradley-Terry are the same model: Elo is the online version that updates after every game, while Bradley-Terry is fit on batches of data. Also, it is preferable to use a decaying K for Elo rankings. #projects-showcase #wc2026
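
To make the Elo and Bradley-Terry link concrete, here is a small sketch. The expected-score formula is the standard Elo one. The decay schedule in `decayed_k` (exponential, with `k_start`, `k_floor`, and `half_life`) is only one possible choice that I am assuming for illustration, not something the project necessarily uses.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo win probability for A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def bradley_terry_prob(r_a: float, r_b: float) -> float:
    """Bradley-Terry win probability with strength p = 10^(rating / 400)."""
    p_a = 10 ** (r_a / 400.0)
    p_b = 10 ** (r_b / 400.0)
    return p_a / (p_a + p_b)


def decayed_k(games_played: int, k_start: float = 40.0,
              k_floor: float = 10.0, half_life: float = 20.0) -> float:
    """Hypothetical schedule: K starts high and decays toward a floor."""
    return k_floor + (k_start - k_floor) * 0.5 ** (games_played / half_life)


def elo_update(r_a: float, r_b: float, score_a: float, k: float):
    """One online update; score_a is 1.0 for a win, 0.5 draw, 0.0 loss."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta
```

`expected_score` and `bradley_terry_prob` return the same probability for any pair of ratings, which is the sense in which the two models match. The difference is only in how the ratings get estimated: one game at a time with step size K, or all games at once.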

I started a #blog for #cosine-similarity. This blog starts with a visual of the definition of the cos function. The blog then talks about vectors and how the value of cos theta is 1 when the vectors point in the same direction and 0 when the vectors are orthogonal. This is essentially the dot product of the vectors after normalizing them to unit length. I then created a script that uses word embeddings from the gensim package to illustrate how learned word embeddings live in a latent space that encodes the semantic meaning of words. For example, we can do things like Queen = King - Male + Female in the latent space. However, the result is not exact, so we use nearest neighbor search to find the closest vector. The next step is to play with sentence embeddings and create the GitHub commit understanding examples.
- #blog
- #cosine-similarity
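
A toy sketch of the two ideas above: cosine similarity as a normalized dot product, and the analogy as vector arithmetic followed by a nearest neighbor lookup. The 2D vectors in `TOY` are hand-picked numbers for illustration, not learned embeddings, and `analogy` is a hypothetical helper. With real embeddings, gensim's `KeyedVectors.most_similar(positive=[...], negative=[...])` plays the same role.

```python
import math


def cosine(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-picked toy vectors (assumption): axis 0 ~ "royalty", axis 1 ~ "gender".
TOY = {
    "king": [0.9, 0.8],
    "queen": [0.95, -0.75],
    "male": [0.1, 0.9],
    "female": [0.05, -0.85],
    "apple": [-0.8, 0.1],
}


def analogy(a, b, c, vectors):
    """Word closest to vectors[a] - vectors[b] + vectors[c], excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(cosine([1.0, 0.0], [2.0, 0.0]))  # same direction
print(cosine([1.0, 0.0], [0.0, 3.0]))  # orthogonal
print(analogy("king", "male", "female", TOY))
```

The arithmetic lands near the queen vector but not exactly on it, which is why the last step is a nearest neighbor search rather than an equality check.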