Finney Koshy
Product
We just finished building a recommendation system using RAG for a client who needed to surface relevant content from thousands of blog posts based on a user's preferences. The constraint: it needed to work on demand, without requiring the user to log in. It works great, but we learned some expensive lessons along the way.
Here's what we wish we'd known before we started.
Planning and experimenting with the vector retrieval strategy was one of the most important steps of the project. The recommendation system would match user preferences against thousands of blogs, and figuring out what to actually embed was key to both accuracy and speed. It made a huge impact. Here's how:
Think of it like Tinder. Imagine there were no defined profile fields on potential matches - no height, no job, no location - and each match could put whatever they wanted. Match #1 writes about his favorite foods. Match #2 only has a letter of recommendation from his job on his profile. How would you match with people effectively? You can't.
This is the problem we encountered with our recommendation system. We initially tried embedding entire blog posts, thinking more data would mean better matches. Wrong. The system was drowning in noise.
The breakthrough came when we realized we needed to think backwards from the user interface. Users weren't typing free-form queries - they were filling out tight, controlled forms. They'd select activities like "outdoor," "cuisine," or "fitness," expense levels using dollar signs ($, $$, $$$), exertion levels, time commitments, and other structured preferences.
So instead of trying to match messy blog content against messy user input, we used AI to generate those exact same structured fields for every blog. If a user selected "outdoor + $$ + moderate exertion," we could do a direct one-to-one match against blogs that we'd already categorized with those exact same parameters.
It was like creating a universal translation layer. Users spoke in structured preferences, blogs got translated into the same structured language, and matching became clean and precise instead of hoping vector similarity would figure it out.
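To make that concrete, here's a minimal sketch of the tagging step - the field names, model choice, and prompt are illustrative assumptions, not our exact production setup:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The same controlled vocabulary the user-facing form exposes (values are illustrative).
FIELDS_PROMPT = (
    "Categorize this blog post using only these fields:\n"
    "- activities: subset of [outdoor, cuisine, fitness, arts, nightlife]\n"
    "- expense: one of [$, $$, $$$]\n"
    "- exertion: one of [low, moderate, high]\n"
    "- time_commitment: one of [under 2 hours, half day, full day]\n"
    "Respond with JSON only."
)

def tag_blog_post(blog_text: str) -> dict:
    """Translate a free-form blog post into the same structured fields the form collects."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable model works; this choice is an assumption
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": FIELDS_PROMPT},
            {"role": "user", "content": blog_text[:8000]},  # truncate very long posts
        ],
    )
    return json.loads(response.choices[0].message.content)
```

Because both sides now speak the same vocabulary, a selection like "outdoor + $$ + moderate exertion" can be compared field by field against the stored tags before any vector math happens.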
Quick note: Why not just use SQL filtering?
You might be wondering - if everything is so structured, why not just use basic filtering or SQL queries? Why do we even need RAG at all?
The beauty of RAG was getting approximations and ranges, even when the initial query wasn't perfectly met. Think about Tinder again - you might filter for anyone 6 feet and over (common and disappointing, we know). A strict filter silently drops a 5'11" match, but with RAG, nearby values still surface: 5'11" is pretty close to what you asked for.
The Tinder example is simple because heights are just numbers, and nearby numbers are easy to compare. But with RAG, you can match similar concepts that SQL can't handle. "Outdoor" could match with "beach" or "hiking." "Low-key" might match with "relaxed" or "casual." These words have semantic similarity that RAG picks up automatically, but that regular SQL index filtering would miss completely.
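Here's a minimal sketch of that semantic nearness, assuming an off-the-shelf embedding model (the model name is an assumption; any text-embedding model behaves similarly):

```python
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> list[list[float]]:
    """Embed short preference terms with a generic embedding model."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

query_vec, *tag_vecs = embed(["outdoor", "hiking", "beach", "tax accounting"])
for tag, vec in zip(["hiking", "beach", "tax accounting"], tag_vecs):
    # "hiking" and "beach" score close to "outdoor"; "tax accounting" does not -
    # exactly the fuzziness a strict SQL equality filter can't give you.
    print(tag, round(cosine(query_vec, vec), 3))
```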
So we got the precision of structured data with the flexibility of semantic matching. Users got exactly what they asked for, plus relevant alternatives they might not have considered. It was the best of both worlds - structured enough to be accurate, smart enough to be useful when perfect matches didn't exist.
As we built this RAG-powered recommendation system, one thing we explored was rerankers. We'd heard a lot about them, from blog posts to AI tech conferences, and plenty of AI experts recommend them.
So naturally, we had to try it.
And honestly? The rerankers worked. Our accuracy jumped from 85% to about 90% relevant results. Pretty impressive improvement.
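For context, wiring a reranker into the pipeline is only a few lines. This is a sketch using Cohere's Python SDK; the function and variable names are ours, not theirs:

```python
import os
import cohere

co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])

def rerank_candidates(query: str, candidate_texts: list[str], top_n: int = 10):
    """Re-score vector-search candidates against the user's structured query string."""
    result = co.rerank(
        model="rerank-v3.5",     # Rerank 3.5, as priced below
        query=query,
        documents=candidate_texts,
        top_n=top_n,
    )
    # Each result carries the original document index plus a relevance score.
    return [(r.index, r.relevance_score) for r in result.results]
```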
But then we did the math on what it would actually cost to rank 2-3k blogs per query.
The reality is, it's not worth it. The costs were way too high.
Cohere's Rerank 3.5 charges $2 per 1,000 searches, where each search handles up to 100 documents. With 2,000-3,000 blogs to rank, we needed 20-30 search units per recommendation request. That's roughly $0.04-$0.06 per query.
When you scale that up, it gets ugly fast:
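A back-of-the-envelope sketch makes it obvious (the daily query volumes here are illustrative assumptions):

```python
# Cohere Rerank 3.5 pricing: $2 per 1,000 search units, up to 100 documents per unit.
PRICE_PER_UNIT = 2.0 / 1000            # $0.002 per search unit
DOCS_PER_UNIT = 100
CORPUS_SIZE = 2500                     # ~2,000-3,000 blogs

units_per_query = -(-CORPUS_SIZE // DOCS_PER_UNIT)    # ceiling division -> 25 units
cost_per_query = units_per_query * PRICE_PER_UNIT     # ~$0.05

for queries_per_day in (1_000, 5_000, 10_000):        # assumed traffic levels
    print(f"{queries_per_day:>6} queries/day -> ${cost_per_query * queries_per_day:,.0f}/day")
# 1,000 -> $50/day, 5,000 -> $250/day, 10,000 -> $500/day
```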
The reranker did help, but not enough to justify spending hundreds per day for that marginal boost.
Our structured metadata approach was already performing well. For high-volume applications, sometimes good enough actually is good enough - especially when the alternative costs a fortune.
For the vector database we chose Supabase. It's widely used, easy to set up, and has a generous free tier. You get a database, authentication, and APIs - everything you need to get an app running fast - all in one place. Perfect for prototyping and building apps without overhead.
When we saw they had vector search built in, it felt like the obvious choice. Keep everything in one place, keep it simple, keep it cheap.
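The basic wiring was genuinely simple. Here's a rough sketch with the supabase-py client - the table schema and the `match_blogs` RPC are hypothetical stand-ins for the pgvector setup described in Supabase's vector-search docs:

```python
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

# Placeholder vectors - in practice these come from your embedding model.
blog_embedding = [0.0] * 1536
user_embedding = [0.0] * 1536

# Store a blog post with its structured tags and embedding
# (assumes a jsonb `tags` column and a pgvector `embedding` column).
supabase.table("blogs").insert({
    "title": "Budget-friendly hikes near the city",
    "tags": {"activities": ["outdoor"], "expense": "$", "exertion": "moderate"},
    "embedding": blog_embedding,
}).execute()

# Similarity search via a Postgres function you define with pgvector's distance operator;
# calling it through `rpc` is the usual Supabase pattern.
matches = supabase.rpc("match_blogs", {
    "query_embedding": user_embedding,
    "match_count": 20,
}).execute()
```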
For our initial testing with a few hundred blogs, it worked perfectly. But then we scaled up to our full dataset.
At 1,000+ blogs, Supabase started timing out. At 2,000-3,000 blogs, it consistently failed.
It was clear that we were hitting the limits of Supabase's free, simple setup, and that we'd need finer-tuned controls and more powerful instances from their paid tiers.
At that point, if you're going to pay for better performance and deal with infrastructure complexity anyway, why not just use a dedicated vector database like Pinecone from the start? In our case, we ended up investing time into researching indexes in Supabase and upgrading our tier to allow a longer query timeout, and that fixed the issue.
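For the curious, the fix boiled down to two database-level changes. This is a sketch assuming a pgvector `embedding` column on a `blogs` table and direct Postgres access; the index parameters and timeout value are assumptions to tune, not prescriptions:

```python
import os
import psycopg  # direct connection to the underlying Supabase Postgres database

# 1) An approximate-nearest-neighbor index so similarity search stops scanning every row.
# 2) A longer statement timeout so heavier queries aren't cancelled mid-flight.
FIX_SQL = [
    "CREATE INDEX IF NOT EXISTS blogs_embedding_idx "
    "ON blogs USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);",
    "ALTER ROLE authenticated SET statement_timeout = '15s';",
]

with psycopg.connect(os.environ["SUPABASE_DB_URL"]) as conn:
    with conn.cursor() as cur:
        for statement in FIX_SQL:
            cur.execute(statement)
```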
The lesson here: if you know you'll need to scale beyond a few thousand documents, start with a dedicated vector database from the beginning. The learning curve is worth avoiding the migration headache later.
Building a RAG recommendation system taught us that the fundamentals matter more than the fancy features. Getting your data structure right from the start will save you more headaches than any expensive reranking tool. Understanding your scale requirements early prevents painful migrations later.
The system works great now - fast, accurate, and cost-effective. These were valuable lessons learned through actually building and testing in production.
If you're building something similar, start with structured metadata, skip the rerankers unless you have unlimited budget, and choose your vector database based on your actual scale needs, not convenience.