The premise of this post is a weekend, not a quarter. If you already have a Rails app with Postgres and a table full of text (articles, support tickets, product descriptions, whatever), you can add semantic search without standing up a vector database, signing up for a SaaS, or rewriting your stack. The friction that stops most people is the assumption that “vector search” means a new piece of infrastructure. It doesn’t. pgvector is a Postgres extension, your embeddings are just another column, and the query is an ORDER BY. This post is the rubric I’d hand a past version of myself: what semantic search actually is, the decisions that matter (model dimensions and chunking), the pitfalls that eat your Saturday, and a concrete shape for the implementation. No client story, because the pattern doesn’t need one.
What “semantic search” means here, and what it doesn’t
Semantic search ranks rows by meaning instead of literal token overlap. You convert each piece of text into a vector (an array of floats produced by an embedding model), and at query time you embed the search string the same way and find the nearest stored vectors. “Nearest” usually means smallest cosine distance. Text about “canceling a subscription” surfaces for a query of “how do I stop being billed” even though they share no keywords.
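To make “nearest” concrete: cosine distance is 1 minus the cosine of the angle between two vectors, which is the quantity pgvector’s <=> operator returns. A minimal plain-Ruby sketch; the 3-dimensional vectors are made-up toys (real embeddings have hundreds or thousands of dimensions):

```ruby
# Cosine distance = 1 - (a . b) / (|a| * |b|).
# 0 means same direction, 1 means orthogonal, 2 means opposite.
def cosine_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

query    = [0.9, 0.1, 0.0] # toy vector for "how do I stop being billed"
cancel   = [0.8, 0.2, 0.1] # toy vector for "canceling a subscription"
shipping = [0.0, 0.3, 0.9] # toy vector for "shipping times"

cosine_distance(query, cancel)   # small: similar meaning
cosine_distance(query, shipping) # close to 1: unrelated
```

Note that magnitude drops out entirely; only direction matters, which is why the table below pairs cosine with normalized vectors.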
That’s the win. The scope, for one weekend, is narrow on purpose: you are adding a second ranking signal to an app that already works. You are not replacing your existing keyword search, and you are not building RAG yet.
The honest framing: pgvector gives you “good enough” retrieval that runs inside the database you already pay for. For most B2B apps with tens of thousands of rows, that is the whole game. I’ve written more on the storage side in Postgres RAG with pgvector — skip Pinecone, ship on one box; this post is the search half.
The core mechanic: embed, store, query, index
Four moving parts. Get these right and the rest is plumbing.
| Part | The decision | Sane default |
|---|---|---|
| Model | Which embedding model produces the vectors | A small hosted model (OpenAI text-embedding-3-small, or a Workers AI embedding model) — 768–1536 dims |
| Column | Where the vector lives | vector(N) column on the searchable table, N = model’s output dimension |
| Distance | How “nearest” is measured | Cosine (<=> operator), normalized vectors |
| Index | How you avoid a full scan | HNSW for read-heavy tables; skip the index entirely under ~10k rows |
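The migration is unremarkable. It assumes the neighbor gem, which adds the vector column type and HNSW index options to Rails migrations, plus the nearest_neighbors query used in the model: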
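```ruby
# The :vector column type and HNSW index options come from the neighbor gem.
class AddEmbeddingToArticles < ActiveRecord::Migration[8.0]
  def change
    enable_extension "vector"
    add_column :articles, :embedding, :vector, limit: 1536 # limit = dimensions
    add_index :articles, :embedding, using: :hnsw, opclass: :vector_cosine_ops
  end
end
```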
The model side is one method and one scope. You embed on write, store the result, and never recompute unless the text changes:
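```ruby
class Article < ApplicationRecord
  # neighbor gem: registers :embedding for nearest_neighbors queries.
  has_neighbors :embedding

  # Re-embed and write straight to the column (skips callbacks and validations).
  def refresh_embedding!
    update_column(:embedding, EmbeddingService.embed(body))
  end

  # Embed the query string, then order rows by cosine distance to it.
  scope :semantic, ->(query, limit: 20) {
    vector = EmbeddingService.embed(query)
    nearest_neighbors(:embedding, vector, distance: "cosine").limit(limit)
  }
end
```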
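EmbeddingService itself isn’t shown. Here is a minimal sketch of one, assuming OpenAI’s /v1/embeddings HTTP endpoint and an OPENAI_API_KEY environment variable, using only the Ruby standard library. The module name comes from the post; everything inside it is illustrative. The endpoint accepts either one string or an array of strings as input, and each result carries an index back to its input position:

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical thin wrapper over a single embeddings API.
module EmbeddingService
  ENDPOINT = URI("https://api.openai.com/v1/embeddings")
  MODEL = "text-embedding-3-small" # pinned: changing it means a full re-embed

  module_function

  # JSON request body; input may be a single string or an array of strings.
  def request_body(input)
    JSON.generate(model: MODEL, input: input)
  end

  # Extract vectors from the response, ordered to match the inputs.
  def parse_vectors(response_json)
    JSON.parse(response_json)
        .fetch("data")
        .sort_by { |item| item.fetch("index") }
        .map { |item| item.fetch("embedding") }
  end

  def embed(text)
    embed_many([text]).first
  end

  def embed_many(texts)
    request = Net::HTTP::Post.new(ENDPOINT)
    request["Authorization"] = "Bearer #{ENV.fetch('OPENAI_API_KEY')}"
    request["Content-Type"] = "application/json"
    request.body = request_body(texts)

    response = Net::HTTP.start(ENDPOINT.host, ENDPOINT.port, use_ssl: true) do |http|
      http.request(request)
    end
    raise "embedding request failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    parse_vectors(response.body)
  end
end
```

Keeping the HTTP call behind embed and embed_many means the rest of the app never learns which provider is in use; retries and fallbacks can be added in one place.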
The query itself is the part that surprises people: it’s just ORDER BY embedding <=> $1 LIMIT 20. There’s no external search service to consult, no second datastore to keep in sync, no eventual-consistency window. The vector is a column; the search is SQL.
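For reference, this is roughly what that query looks like outside ActiveRecord. The sketch assumes a pg-gem-style connection (anything that responds to exec_params, such as PG::Connection) and relies on pgvector accepting a vector parameter as text in the form [0.1,0.2,...]. The helper names are hypothetical:

```ruby
# Format an array of numbers as pgvector's text input, e.g. "[0.1,0.2]".
def pgvector_literal(floats)
  "[#{floats.map { |f| Float(f) }.join(',')}]"
end

SEMANTIC_SQL = <<~SQL
  SELECT id, title
  FROM articles
  ORDER BY embedding <=> $1::vector
  LIMIT $2
SQL

# conn: anything with #exec_params(sql, params), e.g. a PG::Connection.
def semantic_search(conn, query_vector, limit: 20)
  conn.exec_params(SEMANTIC_SQL, [pgvector_literal(query_vector), limit])
end
```

Passing the vector as a bound parameter rather than interpolating it keeps the query plan cacheable and avoids building SQL strings by hand.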
The expensive operation is embedding, not searching. Embedding your entire backlog is an API spend and a one-time backfill job. Embedding a single query at search time costs one API call per search — cache it if your queries repeat.
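A minimal sketch of that cache, assuming an in-process Hash so the example stands alone; in a Rails app you would more likely back it with Rails.cache and an expiry. QueryEmbeddingCache is a hypothetical name, and lowercasing queries before embedding is a judgment call, not a rule:

```ruby
require "digest"

# Hypothetical cache for query embeddings keyed by a hash of the normalized query.
class QueryEmbeddingCache
  def initialize(embedder, store: {})
    @embedder = embedder # anything with #call(text) returning an Array of Floats
    @store = store
  end

  def embed(query)
    normalized = normalize(query)
    key = Digest::SHA256.hexdigest(normalized)
    @store[key] ||= @embedder.call(normalized)
  end

  private

  # Collapse case and whitespace so near-identical queries share one API call.
  # Whether lowercasing is safe depends on your model and your data.
  def normalize(query)
    query.strip.downcase.gsub(/\s+/, " ")
  end
end
```

Usage might look like QueryEmbeddingCache.new(EmbeddingService.method(:embed)). A plain Hash grows without bound and outlives a model swap, so clear it whenever the pinned model changes; cached vectors belong to the old model.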
Pitfalls that will eat your Saturday
The code above fits in an afternoon. The rest of the weekend disappears in four predictable places.
Dimension mismatch. Your vector(N) column hardcodes N. Switch embedding models and the dimensions change (1536 to 768, say), and every stored vector is now garbage you can’t compare against. Even two models that happen to share a dimension produce vectors in unrelated spaces, so the comparison is meaningless either way. Pick the model first, pin it, and treat a model change as a full re-embed migration.
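One way to pin it is a single config module that everything else reads from. This is a hypothetical sketch; EmbeddingConfig and its names are illustrative. Postgres will already reject a 768-dimension vector written to a vector(1536) column, so the guard mostly buys an earlier, clearer error:

```ruby
# Hypothetical single source of truth for the embedding model and dimension.
module EmbeddingConfig
  MODEL = "text-embedding-3-small"
  DIMENSIONS = 1536

  class DimensionMismatch < StandardError; end

  # Return the vector unchanged if its length matches the pinned dimension.
  def self.check!(vector)
    return vector if vector.length == DIMENSIONS

    raise DimensionMismatch,
          "expected #{DIMENSIONS} dims from #{MODEL}, got #{vector.length}"
  end
end
```

A length check catches a 1536-to-768 swap but not a swap between two same-sized models, which is why the model name is pinned next to the number.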
Backfilling synchronously. Embedding 50,000 rows in a Rake task that calls the API in a loop will rate-limit you and block for an hour. Batch it, push it to background jobs, and respect the provider’s limits. (On the queue choice, see Sidekiq vs Solid Queue in 2026.)
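The batching half can be kept independent of any queue library. A minimal sketch, assuming one embedding API request per batch and a requests-per-minute limit; real providers often cap tokens per minute too, and schedule_backfill is a hypothetical name:

```ruby
# Split ids into batches and hand each one to an enqueue callable together with
# a start delay in seconds, so batches are spread out instead of fired at once.
def schedule_backfill(ids, batch_size:, requests_per_minute:, enqueue:)
  seconds_between_batches = 60.0 / requests_per_minute
  ids.each_slice(batch_size).with_index.map do |batch, index|
    enqueue.call(batch, index * seconds_between_batches)
  end
end
```

In Rails the enqueue callable might be ->(batch, delay) { EmbedArticlesJob.set(wait: delay).perform_later(batch) }, where EmbedArticlesJob is a hypothetical job that embeds its batch. Selecting only rows whose embedding is still NULL keeps re-runs idempotent.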
Chunking too coarse or not at all. Embedding a 5,000-word document as one vector averages its meaning into mush. A query about one paragraph matches weakly against the whole-document average. Chunk long text into passages — paragraphs or ~500-token windows — and store one vector per chunk.
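A minimal paragraph-packing chunker, assuming whitespace-separated words as a stand-in for tokens. That is an approximation; a real tokenizer counts differently. The default of 375 words is about 500 tokens under the common rule of thumb of roughly 0.75 English words per token:

```ruby
# Pack whole paragraphs into chunks of at most max_words words; any paragraph
# longer than that is cut into fixed-size word windows on its own.
def chunk_text(text, max_words: 375)
  chunks = []
  current = []

  text.split(/\n\s*\n/).each do |para|
    words = para.split
    next if words.empty?

    if words.size > max_words
      # Oversized paragraph: flush what we have, then window it.
      chunks << current.join(" ") unless current.empty?
      current = []
      words.each_slice(max_words) { |window| chunks << window.join(" ") }
      next
    end

    # Start a new chunk if this paragraph would overflow the current one.
    if current.size + words.size > max_words
      chunks << current.join(" ")
      current = []
    end
    current.concat(words)
  end

  chunks << current.join(" ") unless current.empty?
  chunks
end
```

Respecting paragraph boundaries where possible keeps each chunk about one thing, which is the whole point of chunking.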
Indexing too early. HNSW indexes have build cost and tuning knobs (m, ef_construction). Under ~10k rows, a sequential scan over cosine distance is fast enough that the index adds complexity for no measurable win. Add it when a query is actually slow, not preemptively.
Premature indexing is the same trap as premature optimization, wearing a Postgres hat. (Self note, after one too many ef_construction rabbit holes.)
The meta-pitfall: treating semantic search as a search replacement. Keyword search still wins for exact matches — product SKUs, names, error codes. The mature move is hybrid: run both, blend the scores. But that’s a second weekend, not this one.
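For when that second weekend comes: one common way to blend the two rankings is reciprocal rank fusion, which combines rank positions rather than raw scores, so you never have to reconcile full-text rank values with cosine distances. This is a swapped-in technique, not something this post prescribes; k = 60 is the constant usually quoted for it. A minimal sketch over lists of article ids:

```ruby
# Reciprocal rank fusion: each list adds 1 / (k + rank) for every id it ranks,
# with rank starting at 1. Ids ranked well by both lists float to the top.
def reciprocal_rank_fusion(*ranked_id_lists, k: 60)
  scores = Hash.new(0.0)
  ranked_id_lists.each do |ids|
    ids.each_with_index { |id, index| scores[id] += 1.0 / (k + index + 1) }
  end
  scores.sort_by { |_id, score| -score }.map(&:first)
end
```

Usage would be something like reciprocal_rank_fusion(keyword_ids, semantic_ids), where both inputs are hypothetical id lists from the two searches.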
What the pattern looks like end to end
Picture a help-center app: an articles table with title and body, already searchable by Postgres full-text. You want “fuzzy” search that understands intent.
Weekend shape:
- Saturday morning: add the extension and the vector column. Pick the embedding model and pin its dimension. Write EmbeddingService.embed(text) as a thin wrapper over one API. (For the wrapper patterns and fallback handling, dropping AI into an existing Rails app with ruby_llm is the checklist I’d follow.)
- Saturday afternoon: chunk and backfill. Split each body into passages, embed each in batched background jobs, and store one vector per chunk in an article_chunks table that belongs to articles. Watch the rate limits.
- Sunday morning: wire the query. A semantic scope embeds the search string and returns the nearest chunks, then loads the parent articles. Dedupe by article, since multiple chunks from one document can all rank.
- Sunday afternoon: evaluate. Hand-write 15–20 real queries with the answer you expect on top. Run them. Eyeball whether the right article ranks in the top three. This is your regression suite: crude, but it catches a bad model swap instantly.
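The dedupe step is easy to get subtly wrong, because the database returns chunks, not articles. A minimal plain-Ruby sketch, assuming the chunk query hands back (article_id, distance) pairs; ChunkHit and dedupe_by_article are hypothetical names:

```ruby
# One ranked chunk: the article it belongs to and its cosine distance.
ChunkHit = Struct.new(:article_id, :distance)

# Keep each article's closest chunk, nearest first, capped at `limit` articles.
def dedupe_by_article(hits, limit: 10)
  hits.sort_by(&:distance)
      .uniq(&:article_id)
      .first(limit)
      .map(&:article_id)
end
```

Fetch several times more chunks than the number of articles you want, since one long article can claim many of the top slots. In Rails, Article.where(id: ids).index_by(&:id).values_at(*ids) loads the parents while preserving the ranked order.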
The thing I’d flag to a founder: the embedding spend is small and predictable at this scale, but it’s recurring — every new article and every uncached query costs a fraction of a cent. Budget it as a line item, not a one-time cost. If your query volume is high, the embedding calls — not the Postgres queries — are what you’ll watch.
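The line item itself is simple arithmetic. A hypothetical back-of-envelope helper; every input, including the per-million-token price, is yours to fill in from your own traffic and your provider’s current price list:

```ruby
# Estimated monthly embedding spend: tokens embedded for new documents plus
# tokens embedded for uncached queries, priced per million tokens.
def monthly_embedding_cost(new_docs:, tokens_per_doc:, uncached_queries:,
                           tokens_per_query:, price_per_million_tokens:)
  tokens = new_docs * tokens_per_doc + uncached_queries * tokens_per_query
  tokens / 1_000_000.0 * price_per_million_tokens
end
```

Queries are short but numerous, so at high volume the uncached_queries term tends to dominate, which is exactly where the cache pays off.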
What you deliberately leave out this weekend: hybrid ranking, re-ranking models, query expansion. They’re real improvements. They’re also the difference between “shipped” and “still tuning in three weeks.”
What done looks like
Done is not “the search feels smart.” Done is falsifiable:
- Your 15–20 hand-labeled queries return the expected result in the top three, and you’ve written them down so a model swap can be re-checked in minutes.
- Embeddings refresh automatically when source text changes (an after_update callback or a job), so you never serve a vector that describes old text.
- The backfill is idempotent. Re-running it doesn’t double-embed or corrupt rows.
- A single search adds one embedding API call plus one indexed Postgres query, and you can point to both in the logs.
- You have a documented, pinned model and dimension. The number 1536 lives in exactly one place.
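The hand-labeled queries are worth encoding rather than eyeballing each time. A minimal sketch, assuming search is any callable that returns ranked article ids; EvalCase and failing_cases are hypothetical names:

```ruby
# A real query paired with the article id you expect to see for it.
EvalCase = Struct.new(:query, :expected_id)

# Return the cases whose expected article is missing from the top_k results.
# search: any callable query -> ranked ids, e.g. ->(q) { Article.semantic(q).map(&:id) }
def failing_cases(cases, search, top_k: 3)
  cases.reject { |c| search.call(c.query).first(top_k).include?(c.expected_id) }
end
```

Run it after any model swap or chunking change; an empty result means every labeled query still finds its answer in the top three.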
If you can demo a query that returns the right answer with zero shared keywords, and explain why each of the above holds, you’re done. The temptation is to keep tuning until it’s perfect. Resist. Good retrieval that ships beats perfect retrieval that doesn’t.
When this approach does not apply
Skip pgvector when your corpus is genuinely huge — hundreds of millions of vectors with low-latency demands — where a purpose-built vector store earns its operational cost. Skip it when your “search” is really structured filtering (price, date, status) that SQL WHERE already nails; embeddings add nothing there.
And skip it when keyword search is already good enough and nobody’s complaining. Semantic search is a real feature with a real recurring cost. Adding it because it’s fashionable is how a weekend becomes a maintenance burden.
The falsifiable bit
Here’s the claim I’ll stand behind: for a Rails app with up to a few hundred thousand documents, adding pgvector semantic search requires no new infrastructure beyond a Postgres extension and one embedding API, and the search query is a single indexed ORDER BY. If you find yourself provisioning a separate vector database to get acceptable results at that scale, the bottleneck is almost certainly your chunking or your model choice, not Postgres.