The premise of this post is a weekend, not a quarter. If you already have a Rails app with Postgres and a table full of text (articles, support tickets, product descriptions, whatever), you can add semantic search without standing up a vector database, signing up for a SaaS, or rewriting your stack. The friction that stops most people is the assumption that “vector search” means a new piece of infrastructure. It doesn’t. pgvector is a Postgres extension, your embeddings are just another column, and the query is an ORDER BY. This post is the rubric I’d hand a past version of myself: what semantic search actually is, the decisions that matter (chunking and dimensions), the pitfalls that eat your Saturday, and a concrete shape for the implementation. No client story, because the pattern doesn’t need one.

What “semantic search” means here, and what it doesn’t

Semantic search ranks rows by meaning instead of literal token overlap. You convert each piece of text into a vector — an array of floats produced by an embedding model — and at query time you embed the search string the same way, then find the nearest vectors. “Nearest” is cosine distance, usually. Text about “canceling a subscription” surfaces for a query of “how do I stop being billed” even though they share no keywords.
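To make “nearest” concrete, here is a minimal sketch of cosine distance in plain Ruby. The three-dimensional vectors and document titles are invented for illustration; real embeddings come from a model and have hundreds of dimensions.

```ruby
# Toy vectors only: real embeddings come from a model, not from hand.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def norm(v)
  Math.sqrt(dot(v, v))
end

# Cosine distance = 1 - cosine similarity; 0.0 means "same direction".
def cosine_distance(a, b)
  1.0 - dot(a, b) / (norm(a) * norm(b))
end

# Hypothetical 3-d "embeddings", invented for this example.
DOCS = {
  "Canceling a subscription" => [0.9, 0.1, 0.0],
  "Resetting your password"  => [0.1, 0.9, 0.1],
  "Exporting invoices"       => [0.3, 0.2, 0.9]
}.freeze

# Returns the titles of the `limit` documents closest to the query vector.
def nearest(query_vector, docs, limit: 2)
  docs.min_by(limit) { |_title, vector| cosine_distance(query_vector, vector) }
      .map(&:first)
end

query = [0.8, 0.2, 0.1] # pretend this is "how do I stop being billed"
puts nearest(query, DOCS).inspect
# prints ["Canceling a subscription", "Exporting invoices"]
```

This is the same comparison pgvector runs in the database; the only difference is that Postgres does it over a column instead of a Hash.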

That’s the win. The scope, for one weekend, is narrow on purpose: you are adding a second ranking signal to an app that already works. You are not replacing your existing keyword search, and you are not building RAG yet.

The honest framing: pgvector gives you “good enough” retrieval that runs inside the database you already pay for. For most B2B apps with tens of thousands of rows, that is the whole game. I’ve written more on the storage side in Postgres RAG with pgvector — skip Pinecone, ship on one box; this post is the search half.

The core mechanic: embed, store, query, index

Four moving parts. Get these right and the rest is plumbing.

| Part | The decision | Sane default |
| --- | --- | --- |
| Model | Which embedding model produces the vectors | A small hosted model (OpenAI text-embedding-3-small, or a Workers AI embedding model), 768–1536 dims |
| Column | Where the vector lives | vector(N) column on the searchable table, N = model’s output dimension |
| Distance | How “nearest” is measured | Cosine (<=> operator), normalized vectors |
| Index | How you avoid a full scan | HNSW for read-heavy tables; skip the index entirely under ~10k rows |

The migration is unremarkable. The index is optional at first; under ~10k rows you can leave the add_index call out:

enable_extension "vector"

add_column :articles, :embedding, :vector, limit: 1536

add_index :articles, :embedding,
  using: :hnsw,
  opclass: :vector_cosine_ops

The model side is one method. You embed on write, store the result, and never recompute unless the text changes:

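class Article < ApplicationRecord
  # Provided by the neighbor gem; defines nearest_neighbors for this column.
  has_neighbors :embedding

  # Embed on write; call again only when body changes.
  def refresh_embedding!
    update_column(:embedding, EmbeddingService.embed(body))
  end

  # Embeds the query string, then orders rows by cosine distance to it.
  scope :semantic, ->(query, limit: 20) {
    vector = EmbeddingService.embed(query)
    nearest_neighbors(:embedding, vector, distance: "cosine").limit(limit)
  }
end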
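The model code calls EmbeddingService.embed without showing it. One possible shape is sketched below, assuming OpenAI's embeddings endpoint and the text-embedding-3-small model from the table. The class layout, the injectable `transport` argument, and the error class are assumptions made for this sketch, not the post's actual implementation.

```ruby
require "json"
require "net/http"
require "uri"

# A thin wrapper over one embeddings API. Endpoint and payload follow
# OpenAI's embeddings API; everything else here is an assumed shape.
class EmbeddingService
  ENDPOINT = URI("https://api.openai.com/v1/embeddings")
  MODEL    = "text-embedding-3-small"

  class Error < StandardError; end

  # `transport` is injectable so tests can avoid the network (assumption).
  def initialize(api_key: ENV["OPENAI_API_KEY"], transport: nil)
    @api_key = api_key
    @transport = transport || method(:post_json)
  end

  # Returns the embedding for one string as an Array of Floats.
  def embed(text)
    body = @transport.call({ model: MODEL, input: text })
    vector = body.dig("data", 0, "embedding")
    raise Error, "unexpected response keys: #{body.keys.inspect}" unless vector.is_a?(Array)

    vector
  end

  # Convenience so call sites can keep writing EmbeddingService.embed(text).
  def self.embed(text)
    new.embed(text)
  end

  private

  # Performs the real HTTP call and returns the parsed JSON body.
  def post_json(payload)
    request = Net::HTTP::Post.new(ENDPOINT)
    request["Authorization"] = "Bearer #{@api_key}"
    request["Content-Type"] = "application/json"
    request.body = JSON.generate(payload)

    response = Net::HTTP.start(ENDPOINT.host, ENDPOINT.port, use_ssl: true) do |http|
      http.request(request)
    end
    raise Error, "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    JSON.parse(response.body)
  end
end
```

Keeping the HTTP call behind a single private method means a provider swap touches one place, and the rest of the app only ever sees an array of floats.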

The query itself is the part that surprises people: it’s just ORDER BY embedding <=> $1 LIMIT 20. There’s no separate search service to consult, no second datastore to keep in sync, no eventual-consistency window. The vector is a column; the search is SQL.
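Spelled out, the query looks roughly like the sketch below. The table and column names follow the article's example; the NULL filter, the helper name, and the use of the pg gem's exec_params in the comment are assumptions, and the SQL the scope actually generates may differ in detail.

```ruby
# pgvector's text input format is a bracketed, comma-separated list,
# for example [0.25,-1.0,0.03].
def vector_literal(vector)
  "[#{vector.map { |x| Float(x) }.join(',')}]"
end

# $1 is the query embedding, $2 the row limit.
# The IS NOT NULL filter (an addition in this sketch) skips rows that
# have not been embedded yet.
NEAREST_ARTICLES_SQL = <<~SQL
  SELECT id, title, embedding <=> $1::vector AS distance
  FROM articles
  WHERE embedding IS NOT NULL
  ORDER BY embedding <=> $1::vector
  LIMIT $2
SQL

# With the pg gem this would run as (not executed here):
#   conn.exec_params(NEAREST_ARTICLES_SQL, [vector_literal(query_vector), 20])
puts vector_literal([0.25, -1, 3e-2])
# prints [0.25,-1.0,0.03]
```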

The expensive operation is embedding, not searching. Embedding your entire backlog is an API spend and a one-time backfill job. Embedding a single query at search time costs one API call per search — cache it if your queries repeat.
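Caching repeated queries can be sketched as a small memoizing wrapper. The class name and the Hash store are assumptions chosen to keep the example dependency-free; in a Rails app the store would more likely be Rails.cache.

```ruby
require "digest"

# Memoizes embeddings by normalized query text.
class CachedEmbedder
  attr_reader :misses

  # `embedder` is anything that responds to call(text) and returns a vector.
  def initialize(embedder, store: {})
    @embedder = embedder
    @store = store
    @misses = 0
  end

  # Returns the cached vector, calling the embedder only on a miss.
  def embed(query)
    key = cache_key(query)
    @store.fetch(key) do
      @misses += 1
      @store[key] = @embedder.call(query)
    end
  end

  private

  # Collapse case and whitespace so trivially different queries share an
  # entry. If the model ever changes, the model name belongs in the key too.
  def cache_key(query)
    normalized = query.to_s.downcase.strip.gsub(/\s+/, " ")
    Digest::SHA256.hexdigest(normalized)
  end
end
```

Usage would look like `CachedEmbedder.new(->(text) { EmbeddingService.embed(text) }).embed(params[:q])`, with the `misses` counter giving a rough read on how much the cache is saving.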

Pitfalls that will eat your Saturday

The model fits in an afternoon. The day disappears in four predictable places.

Dimension mismatch. Your vector(N) column hardcodes N. Switch embedding models and the dimension usually changes (1536 to 768, say). The new vectors no longer fit the column, and even when the dimension happens to match, vectors from different models live in different spaces and cannot be compared. Pick the model first, pin it, and treat a model change as a full re-embed migration.
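One way to make the pin explicit is to keep the model name and its dimension in a single place and check every vector before it is written. The sketch below is an assumption about how that might look; `EmbeddingConfig`, `check_dimensions!`, and the `embedding_model` bookkeeping field are hypothetical names.

```ruby
# Pin the model and its dimension together so they can only change together.
EmbeddingConfig = Struct.new(:model, :dimensions, keyword_init: true)
PINNED = EmbeddingConfig.new(model: "text-embedding-3-small", dimensions: 1536).freeze

class DimensionMismatch < StandardError; end

# Call this right before writing a vector to the column.
def check_dimensions!(vector, config: PINNED)
  return vector if vector.size == config.dimensions

  raise DimensionMismatch,
        "#{config.model} should produce #{config.dimensions} dims, got #{vector.size}"
end

# Hypothetical bookkeeping: if each row records the model that produced its
# vector, the rows needing a re-embed after a model change are easy to find.
def stale_rows(rows, config: PINNED)
  rows.reject { |row| row[:embedding_model] == config.model }
end
```

Failing loudly in application code gives a clearer error than whatever the database reports when a 768-float array hits a vector(1536) column.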

Backfilling synchronously. Embedding 50,000 rows in a Rake task that calls the API in a loop will rate-limit you and block for an hour. Batch it, push it to background jobs, and respect the provider’s limits. (On the queue choice, see Sidekiq vs Solid Queue in 2026.)
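The batching and backoff part of that advice can be sketched in plain Ruby. The `client` interface (`embed_batch`) and the `RateLimited` error are hypothetical stand-ins for whatever your provider's client exposes; in a real app each batch would be enqueued as its own background job instead of running in one loop.

```ruby
class RateLimited < StandardError; end

# Embeds `texts` in batches, retrying a batch with exponential backoff when
# the client raises RateLimited. `client` is any object responding to
# embed_batch(array_of_strings) and returning one vector per string.
# `sleeper` is injectable so tests do not actually sleep.
def backfill(texts, client:, batch_size: 100, max_retries: 5,
             base_delay: 1.0, sleeper: Kernel.method(:sleep))
  texts.each_slice(batch_size).flat_map do |batch|
    attempts = 0
    begin
      client.embed_batch(batch)
    rescue RateLimited
      attempts += 1
      raise if attempts > max_retries

      # Waits 1s, 2s, 4s, ... between attempts on the same batch.
      sleeper.call(base_delay * (2**(attempts - 1)))
      retry
    end
  end
end
```

Most job backends already offer retry with backoff, so in practice the explicit loop often collapses into "one job per batch, let the queue retry".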

Chunking too coarse or not at all. Embedding a 5,000-word document as one vector averages its meaning into mush. A query about one paragraph matches weakly against the whole-document average. Chunk long text into passages — paragraphs or ~500-token windows — and store one vector per chunk.
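A minimal chunker along those lines might look like the following. It is a sketch under one loud assumption: token counts are approximated by whitespace-separated words, which is cruder than a real tokenizer but close enough to size windows.

```ruby
# Splits text into paragraph-aligned chunks of at most `max_tokens`.
# Token counts are approximated by word counts (an assumption of this sketch).
def chunk_text(text, max_tokens: 500)
  chunks = []
  current = []
  count = 0

  paragraphs = text.split(/\n{2,}/).map(&:strip).reject(&:empty?)
  paragraphs.each do |paragraph|
    words = paragraph.split
    # A paragraph longer than the window is cut into fixed-size pieces.
    pieces = if words.size > max_tokens
               words.each_slice(max_tokens).map { |slice| slice.join(" ") }
             else
               [paragraph]
             end

    pieces.each do |piece|
      size = piece.split.size
      # Flush the current chunk when adding this piece would overflow it.
      if count + size > max_tokens && !current.empty?
        chunks << current.join("\n\n")
        current = []
        count = 0
      end
      current << piece
      count += size
    end
  end

  chunks << current.join("\n\n") unless current.empty?
  chunks
end
```

Each returned string becomes one row and one vector. Overlapping windows are a common refinement, left out here to keep the sketch short.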

Indexing too early. HNSW indexes have build cost and tuning knobs (m, ef_construction). Under ~10k rows, a sequential scan over cosine distance is fast enough that the index adds complexity for no measurable win. Add it when a query is actually slow, not preemptively.

Premature indexing is the same trap as premature optimization, wearing a Postgres hat. — Self note, after one too many ef_construction rabbit holes

The meta-pitfall: treating semantic search as a search replacement. Keyword search still wins for exact matches — product SKUs, names, error codes. The mature move is hybrid: run both, blend the scores. But that’s a second weekend, not this one.

What the pattern looks like end to end

Picture a help-center app: an articles table with title and body, already searchable by Postgres full-text. You want “fuzzy” search that understands intent.

Weekend shape:

  1. Saturday morning — add the extension and the vector column. Pick the embedding model and pin its dimension. Write EmbeddingService.embed(text) as a thin wrapper over one API. (For the wrapper patterns and fallback handling, dropping AI into an existing Rails app with ruby_llm is the checklist I’d follow.)

  2. Saturday afternoon — chunk and backfill. Split each body into passages, embed each in batched background jobs, store one vector per chunk in an article_chunks table that belongs to articles. Watch the rate limits.

  3. Sunday morning — wire the query. A semantic scope embeds the search string and returns nearest chunks, then loads the parent articles. Dedupe by article, since multiple chunks from one document can all rank.

  4. Sunday afternoon — evaluate. Hand-write 15–20 real queries with the answer you expect on top. Run them. Eyeball whether the right article ranks in the top three. This is your regression suite — crude, but it catches a bad model swap instantly.
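Steps 3 and 4 can be sketched together: dedupe ranked chunks down to articles, then score a hand-written query set by whether the expected article lands in the top three. The `ChunkHit` struct and the `search` callable are hypothetical shapes standing in for whatever your semantic scope returns.

```ruby
# A chunk hit as a semantic scope might return it (hypothetical shape).
ChunkHit = Struct.new(:article_id, :distance, keyword_init: true)

# Keep each article's best-ranked chunk, preserving rank order.
def dedupe_by_article(hits)
  hits.sort_by(&:distance).uniq(&:article_id)
end

# cases:  { "query text" => expected_article_id }
# search: any callable taking a query and returning ChunkHits.
# Returns the fraction of queries whose expected article is in the top k.
def top_k_hit_rate(cases, search:, k: 3)
  passed = cases.count do |query, expected_id|
    ranked_ids = dedupe_by_article(search.call(query)).first(k).map(&:article_id)
    ranked_ids.include?(expected_id)
  end
  passed.fdiv(cases.size)
end
```

Run it before and after any change to the model, the chunk size, or the index settings. A drop in the hit rate is the signal to stop and look, even if the results still "feel" fine.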

The thing I’d flag to a founder: the embedding spend is small and predictable at this scale, but it’s recurring — every new article and every uncached query costs a fraction of a cent. Budget it as a line item, not a one-time cost. If your query volume is high, the embedding calls — not the Postgres queries — are what you’ll watch.

What you deliberately leave out this weekend: hybrid ranking, re-ranking models, query expansion. They’re real improvements. They’re also the difference between “shipped” and “still tuning in three weeks.”

What done looks like

Done is not “the search feels smart.” Done is falsifiable:

If you can demo a query that returns the right answer with zero shared keywords, and explain why each of the above holds, you’re done. The temptation is to keep tuning until it’s perfect. Resist. Good retrieval that ships beats perfect retrieval that doesn’t.

When this approach does not apply

Skip pgvector when your corpus is genuinely huge — hundreds of millions of vectors with low-latency demands — where a purpose-built vector store earns its operational cost. Skip it when your “search” is really structured filtering (price, date, status) that SQL WHERE already nails; embeddings add nothing there.

And skip it when keyword search is already good enough and nobody’s complaining. Semantic search is a real feature with a real recurring cost. Adding it because it’s fashionable is how a weekend becomes a maintenance burden.

The falsifiable bit

Here’s the claim I’ll stand behind: for a Rails app under roughly a few hundred thousand documents, adding pgvector semantic search does not require any new infrastructure beyond a Postgres extension and one embedding API — and the search query is a single indexed ORDER BY. If you find yourself provisioning a separate vector database to get acceptable results at that scale, the bottleneck is almost certainly your chunking or your model choice, not Postgres.