Filtered ANN Is a Retrieval Problem
Filtered ANN gets awkward in job search because the query usually mixes semantic intent with hard constraints. Take a search for React jobs in Columbus, Ohio as an example. A global ANN search can find semantically similar jobs, but many of the strongest matches may come from larger software markets. If location is applied afterward, a large share of the candidate set disappears before the system finds enough valid rows. The obvious alternative has the opposite failure mode: filter to Columbus first, then rank by vector distance. That keeps the geography right, but the local pool is still semantically noisy. It includes retail, warehouse, nursing, operations, support, and other roles that are irrelevant to the actual query. Geography narrows the corpus, but not along the dimension that matters most.
On our corpus, that scale difference is large enough to matter. A geography-only Ohio slice still leaves roughly 36k active recent jobs, and Columbus still leaves roughly 4.7k. But the software-shaped slice inside those pools is only in the hundreds, and the narrower frontend or full-stack slice is only in the tens. Looking from the other direction, a global semantic ANN search is competing against a software corpus of roughly 70k jobs, while the Ohio or Columbus portion is only a small fraction. Without another layer, location-first search starts too broad, and semantic-first search starts in the wrong region.
That is the retrieval problem. In filtered ANN, neither post-filtering nor geography-only partitioning is a satisfying coarse retrieval strategy. Post-filtering applies structure too late. Geography-only partitioning keeps the system simple, but the partitions are still semantically messy.
The Middle Path - Semantic Gating as a Partitioning Scheme
What we want is a coarse retrieval step that respects meaning earlier, without requiring a fully global ANN pass before constraints are applied. That is where semantic gating fits.
Our approach is to insert a semantic partitioning layer between those extremes. Instead of treating the corpus as one large embedding space, or as a set of location shards, we build a category layer in embedding space and use it as a coarse gate. At query time, we first match the query embedding against a known set of category centroids, use those matches to narrow the eligible pool, and only then run final vector ranking inside that reduced set.
Building the Category Layer
The category layer starts with a requirement-centric representation of jobs. Rather than embedding the raw posting indiscriminately, the indexing path emphasizes the parts of the document that best describe the role. That keeps the embedding focused on the actual job instead of generic page text or boilerplate, which makes it more useful for both retrieval and category assignment.
From there, we form semantic groups offline and compute a centroid for each group. Those centroids become reusable category representations stored in the database. The implementation has evolved over time, but the basic pattern is stable: form a semantic layer once, store an average vector for each category, and reuse that layer at query time instead of rediscovering corpus structure from scratch on every search. Each job is then matched to one or more categories at write time, and those matches are persisted. That is what makes the design fit naturally into Postgres. Semantic gating becomes a lookup against known categories plus ordinary relational joins, not a custom online clustering step.
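The offline step can be sketched in a few lines. This is a toy illustration with two-dimensional embeddings and made-up names (`centroid`, `assign_categories`, the threshold value); the production pipeline differs, but the shape is the same: average each group into a centroid, then match new jobs against the stored centroids at write time and persist every category above a similarity cutoff.

```python
from math import sqrt

def centroid(vectors):
    """Average a group's embeddings into one category vector."""
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

# Offline: one centroid per semantic group (toy 2-d embeddings).
groups = {
    "frontend": [[0.9, 0.1], [0.8, 0.2]],
    "nursing":  [[0.1, 0.9], [0.2, 0.8]],
}
centroids = {name: centroid(vs) for name, vs in groups.items()}

# Write time: match a new job against the stored centroids and
# persist every category above a similarity threshold (value illustrative).
def assign_categories(job_embedding, centroids, threshold=0.7):
    return [name for name, c in centroids.items()
            if cosine_sim(job_embedding, c) >= threshold]

print(assign_categories([0.85, 0.15], centroids))  # → ['frontend']
```

Because the assignments are persisted, nothing at query time has to rediscover this structure; gating reduces to a lookup plus joins.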
Query-Time Flow
At query time, the system embeds the user query and finds the nearest categories. Those categories define a coarse semantic slice of the corpus. The search then builds an eligible set by intersecting category membership with the usual relational constraints such as freshness, activity, and location. Only after that does it run vector ordering over the reduced pool.
The key shift is that semantic gating is part of candidate generation, not something bolted on after retrieval.
Why This Works Well in Postgres
Once the semantic layer is explicit, Postgres can query the nearest categories, join through persisted category membership, apply the usual filters, and only then run final vector ranking on the reduced set.
Conceptually, the query looks like this:
WITH nearest_categories AS (
    SELECT id
    FROM jobcategory
    ORDER BY centroid_embedding <=> $query_embedding
    LIMIT 5
),
eligible_jobs AS (
    SELECT DISTINCT j.id, j.embedding_512
    FROM jobs j
    JOIN jobcategorymatch m ON m.job_id = j.id
    WHERE m.category_id IN (SELECT id FROM nearest_categories)
      AND j.is_active = true
      AND j.city = 'Columbus'
      AND j.state = 'OH'
)
SELECT id
FROM eligible_jobs
ORDER BY embedding_512 <=> $query_embedding
LIMIT 25;
Tradeoffs
This does not eliminate the usual tradeoffs. The category layer has to stay clean enough to be useful, and any partitioning scheme can become too coarse or too narrow. Cross-category roles still need care, especially when the same posting legitimately belongs to more than one semantic neighborhood. That is why semantic gating works better as a soft partitioning layer than as a rigid taxonomy. It should narrow the search space aggressively enough to help retrieval, but not so aggressively that it cuts off legitimate edge cases.
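The soft-partitioning point is easy to see with threshold-based assignment rather than a single best-match category. In this toy sketch (illustrative vectors and threshold), a full-stack posting sits between two neighborhoods and is persisted under both, so a query gated through either category can still reach it.

```python
from math import sqrt

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

centroids = {"frontend": [1.0, 0.0], "backend": [0.0, 1.0]}

# Soft assignment: keep every category above the threshold, not just the argmax.
def soft_assign(emb, threshold=0.6):
    return {name for name, c in centroids.items()
            if cosine_sim(emb, c) >= threshold}

full_stack = [0.7, 0.7]  # sits between the two semantic neighborhoods
print(soft_assign(full_stack))  # both 'frontend' and 'backend'
```

An argmax assignment would have forced this posting into one bucket and hidden it from half the queries it legitimately matches; the threshold makes the partition soft at the cost of some extra membership rows.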
There is also an indexing tradeoff. Another path would be to use HNSW with filtered retrieval strategies, but HNSW is too memory-intensive for our use case. For now, semantic gating has been the more useful lever.
Closing
Semantic gating is not new at the algorithm level. Coarse-to-fine retrieval already exists in established vector search systems. The difference here is that the partitioning layer is explicit and persistent in Postgres rather than hidden inside a vector index. That matters when semantic narrowing has to interleave with relational filters such as freshness, activity, and location before final ranking runs.