So following up on 12 Jan 2025: I ain’t afraid of nothin’, and I was ready to take the plunge into a vector database and all the complexity that entails (using a CDC pattern to sync data between my Postgres and my new vector database, etc.).
However, after I spoke to Ammar, he gave me two suggestions:
The TLDR is that this worked like a charm. While reading up on this (for both the theoretical basis + the accepted best practice), I could not find very much except for:
This OpenAI blog post, which says
[O]n the MTEB benchmark, a text-embedding-3-large embedding can be shortened to a size of 256 while still outperforming an unshortened text-embedding-ada-002 embedding with a size of 1536.
A friend pointed out that vectors trained as Matryoshka embeddings (??) can have their dimensions reduced this way. In the same OpenAI blog post, the footnote links to this paper, which contains the following:
To be honest, this wasn’t very much to go on, and I also did not fully understand the theory behind shortening vectors. I was sceptical: even if this successfully sped up the search, would it actually scale? What if the database grows even larger?
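That said, the mechanics of shortening are simple to state: either ask the API for fewer dimensions up front, or truncate the full vector and re-normalise it. Here is a minimal sketch, assuming the official OpenAI Python SDK (the helper names are mine, and the manual truncate-and-renormalise path is only there to make the operation concrete):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed_shortened(text: str, dims: int = 256) -> np.ndarray:
    """Ask the API for an already-shortened text-embedding-3-large vector."""
    resp = client.embeddings.create(
        model="text-embedding-3-large",
        input=text,
        dimensions=dims,  # the parameter the OpenAI blog post refers to
    )
    return np.array(resp.data[0].embedding)

def shorten(full_vector: np.ndarray, dims: int = 256) -> np.ndarray:
    """Shorten an existing full-size embedding by hand: truncate, then
    re-normalise to unit length so downstream dot-product / cosine code
    keeps working (the full-size embeddings come back unit-normalised)."""
    v = full_vector[:dims]
    return v / np.linalg.norm(v)
```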
But I knew the only way to know for sure was to try, so I tested the same queries across:
At least for my use case, this was unreasonably effective:
While I am not afraid of toil and hard work, this was truly an example of the 80/20 rule. Setting up a new database with a shortened vector column was pretty easy, and the results were really good. In fact, it took much longer to sync the embeddings over (and doing so exposed a few flaws in my syncing process). While I was waiting for the sync to complete, I decided to try the in-memory vector index approach.
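For the record, the “new database with a shortened vector column” is not exotic. Here is a minimal sketch of that setup, assuming pgvector and a 256-dimension column (the schema, names, and connection string are illustrative rather than my exact ones); the nice part is that everything stays inside Postgres, with no CDC pipeline to a separate vector store:

```python
import psycopg2

conn = psycopg2.connect("dbname=notes")  # illustrative connection string
cur = conn.cursor()

# One-off setup: enable pgvector and create a table whose embedding column
# holds the shortened 256-dim vectors instead of the full-size ones.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents_short (
        id        bigint PRIMARY KEY,
        content   text,
        embedding vector(256)
    );
""")
conn.commit()

def search(query_embedding: list[float], k: int = 10):
    """Top-k nearest neighbours by cosine distance (pgvector's <=> operator)."""
    vec_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    cur.execute(
        "SELECT id, content FROM documents_short "
        "ORDER BY embedding <=> %s::vector LIMIT %s;",
        (vec_literal, k),
    )
    return cur.fetchall()
```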
The in-memory idea is as enticing as it sounds crazy. But the napkin math works out: