RAG on Snowflake: Cortex Search, Vector Types, and the DIY Menu

The question I get from teams already running their warehouse on Snowflake is never "should we do RAG" — it's "do we really need to stand up a separate vector database next to the platform we already trust with governance and access control." Most of the time, no. Snowflake's retrieval story has matured into a genuine menu — a fully managed option, a DIY option built on native primitives, and the unstructured and structured retrieval pieces that feed both — and picking the wrong one usually means either overpaying for managed convenience you didn't need, or hand-building infrastructure Snowflake already ships.

This is that menu, in the same spirit as RAG on GCP and RAG on AWS: what Cortex Search actually does under the hood, how to build retrieval yourself on the native VECTOR type, where Document AI and Cortex Analyst fit as the unstructured and structured retrieval halves, and a decision table for which path actually fits a given question. If you haven't read RAG from the ground up, that's the platform-agnostic foundation this builds on.

What does Cortex Search actually do, and why isn't it "just vector search"?

Cortex Search is Snowflake's fully managed retrieval service, and the detail worth knowing before reaching for it is that it isn't a pure vector-similarity engine. It's a hybrid retrieval pipeline that combines vector search for semantically similar documents, keyword search for lexically similar ones, and a final semantic-reranking pass that reorders the combined candidates by relevance to the query. That hybrid design exists because pure vector similarity misses exact-term matches that keyword search catches trivially (a product SKU, an error code, a proper noun that embeddings don't represent distinctly), while pure keyword search misses the paraphrases and semantic equivalents that vector search is built for. Combining both and reranking the union gives a meaningfully better retrieval signal than either alone, which is exactly why "vector search" and "Cortex Search" aren't synonyms, even though marketing sometimes blurs them.
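To make the "combine two rankings" idea concrete, here is a minimal Python sketch of rank fusion. It is not Cortex Search's actual algorithm: reciprocal rank fusion (RRF) is a common, generic fusion technique used here as a stand-in, the document ids and both hit lists are hypothetical, and the semantic-reranking stage is left out entirely.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first.

    Each document scores 1 / (k + rank) in every list it appears in,
    so a document found by both retrievers outranks one found by only one.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for the query "error E1234 on checkout".
# The vector retriever ranks the exact-match runbook last;
# the keyword retriever finds it immediately via the error code.
vector_hits = ["doc_payments_overview", "doc_checkout_faq", "doc_e1234_runbook"]
keyword_hits = ["doc_e1234_runbook", "doc_release_notes"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

The runbook rises to the top of the fused list because both retrievers returned it, which is the intuition behind hybrid retrieval: agreement between a semantic signal and a lexical signal is stronger evidence than either signal alone.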

Operationally, Cortex Search is close to zero-maintenance: you define a search service over a table or query, name the text column to search and any attribute columns to filter on, and it manages embedding generation, index refresh, and the hybrid ranking pipeline without you writing a retrieval pipeline by hand. Files sitting in a stage need to be parsed into rows first, which is where the document-processing step covered later in this article comes in. That's the trade the RAG series keeps coming back to on every platform: managed retrieval buys you good default relevance and low operational burden at the cost of less control over the exact scoring and chunking behavior than a DIY build gives you.

How do you build retrieval yourself on Snowflake's native VECTOR type?

Snowflake supports VECTOR as a first-class column type: VECTOR(FLOAT, 768) declares a 768-dimension floating-point embedding, and the type allows up to 4096 dimensions. Embeddings can therefore live in an ordinary table alongside the rest of your governed data, with no separate vector store to provision, secure, or keep in sync. Four similarity functions ship natively: VECTOR_COSINE_SIMILARITY, VECTOR_L2_DISTANCE, VECTOR_L1_DISTANCE, and VECTOR_INNER_PRODUCT. For the cosine and inner-product functions a higher value means a closer match, while for the two distance functions a lower value does, so the sort direction depends on which one you pick. Either way, a similarity search is an ordinary SQL query, not a call into a separate system with its own API and its own access-control model to reconcile against Snowflake's.

-- Embeddings live in a normal table — no separate vector store
CREATE TABLE document_chunks (
    chunk_id STRING,
    document_id STRING,
    chunk_text STRING,
    embedding VECTOR(FLOAT, 768)
);

-- Similarity search is ordinary SQL: embed the query, then rank by similarity
-- (the query must be embedded with the same model as the stored chunks)
WITH query_vec AS (
    SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_768('e5-base-v2', 'refund policy for damaged goods') AS v
)
SELECT c.chunk_text,
       VECTOR_COSINE_SIMILARITY(c.embedding, q.v) AS score
FROM document_chunks AS c
CROSS JOIN query_vec AS q   -- query_vec has exactly one row
ORDER BY score DESC         -- higher cosine similarity = closer match
LIMIT 5;
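For readers who want to see what the four similarity functions named above compute, here is a small Python sketch using the textbook definitions (cosine similarity, Euclidean distance, Manhattan distance, dot product). It is an illustration of the math only, not Snowflake's implementation, and the two-dimensional vectors are toy values chosen so the arithmetic is easy to follow.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a, b):
    """Manhattan distance; lower means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def inner_product(a, b):
    """Dot product; higher means more similar (and is length-sensitive)."""
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0]
chunks = {"same_direction": [2.0, 0.0], "orthogonal": [0.0, 1.0]}

# Similarities sort descending, distances sort ascending.
by_cosine = sorted(chunks, key=lambda c: cosine_similarity(query, chunks[c]), reverse=True)
by_l2 = sorted(chunks, key=lambda c: l2_distance(query, chunks[c]))
print(by_cosine, by_l2)
```

Note that cosine similarity treats `same_direction` as a perfect match even though it is twice as long as the query, while the distance functions still see a gap of 1.0. That difference is why the choice of function matters when embeddings are not normalized.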

The DIY path is the right call when you need retrieval logic Cortex Search's managed pipeline doesn't expose — a custom reranking model, a domain-specific chunking strategy, or a retrieval step that has to join against other governed tables in the same query rather than calling out to a separate search service — and it's the wrong call when the team just wants working retrieval without owning the embedding-freshness and index-maintenance problem, which is precisely what Cortex Search exists to absorb.
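Chunking is one of the pieces you own on the DIY path. As a rough illustration, the sketch below splits text into fixed-size word windows with overlap. The function name `chunk_text` and its `max_words` and `overlap` parameters are hypothetical, and a real domain-specific strategy would more likely split on sections, clauses, or headings than on raw word counts.

```python
def chunk_text(text, max_words=200, overlap=40):
    """Split text into windows of up to max_words words.

    Consecutive windows share `overlap` words so a sentence that straddles
    a boundary still appears whole in at least one chunk.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once a window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


# Tiny example: ten words, windows of four, one word shared between windows.
sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_text(sample, max_words=4, overlap=1)
print(pieces)
```

Each returned string would become one row in a table like `document_chunks`, with its embedding computed at load time. Keeping those rows and embeddings current as source documents change is the freshness problem the managed service otherwise absorbs.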

How do Document AI and Cortex Analyst feed the two different halves of retrieval?

Most real RAG systems need to retrieve from two structurally different sources, and Snowflake ships dedicated tooling for each. Document AI handles the unstructured half. It takes PDFs, scanned documents, and other unstructured formats and converts them into structured, searchable content: the AI_PARSE_DOCUMENT function extracts text and layout, and separate extraction and classification steps on top of that pull specific entities, tables, or fields and sort documents by type. This is the stage that turns a folder of contracts or clinical PDFs into something either Cortex Search or a DIY vector table can actually index. Cortex Analyst handles the structured half: natural-language questions are translated into SQL against your existing tables using a semantic model (the same semantic-layer discipline covered in text-to-SQL and the semantic layer) rather than free-form schema guessing.

graph TD
    subgraph UNSTRUCTURED["Unstructured sources"]
        PDF["PDFs, scanned docs"]
        DOCAI["Document AI<br/>(AI_PARSE_DOCUMENT)"]
        PDF --> DOCAI
    end
    subgraph STRUCTURED["Structured sources"]
        TBL["Tables, marts"]
        ANALYST["Cortex Analyst<br/>(semantic model -> SQL)"]
        TBL --> ANALYST
    end
    DOCAI --> SEARCH["Cortex Search<br/>or DIY VECTOR table"]
    SEARCH --> AGENT["Cortex Agents<br/>(orchestrates both)"]
    ANALYST --> AGENT
    AGENT --> ANSWER["Answer grounded in<br/>both structured + unstructured data"]

Retrieval on Snowflake usually needs both halves at once — Document AI turns unstructured files into something retrievable, Cortex Analyst turns a natural-language question into governed SQL over structured tables, and Cortex Agents is the layer that decides which tool a given question actually needs, sometimes both.

This is exactly the split that building a full AI assistant on Cortex walks through end to end for a single application. This article is the broader retrieval-menu view across the whole platform, not a specific build.

How do you decide which retrieval path actually fits?

Approach	Best for	Cost of ownership
Cortex Search	Standard document/knowledge-base retrieval, teams that want managed hybrid search without building a pipeline	Low — Snowflake manages embeddings and indexing
DIY on VECTOR type	Custom scoring/reranking, retrieval that must join against other governed tables in one query	Higher — you own chunking, embedding refresh, and index strategy
Document AI	Turning unstructured PDFs/scans into retrievable, structured content	Low — a managed SQL function, not a pipeline you build
Cortex Analyst	Natural-language questions that are really SQL questions over structured data	Moderate — requires building and maintaining the semantic model

The pattern that resolves most "which one" debates: start with Cortex Search for anything that's genuinely a document/knowledge-base retrieval problem, because the managed path is faster to ship and Snowflake's hybrid ranking is already tuned reasonably well out of the box. Reach for the DIY VECTOR path only when a specific requirement (custom reranking logic, a single-query join against other governed data) can't be expressed through Cortex Search's configuration surface. And treat Document AI and Cortex Analyst as complements to the other two, not competitors: Document AI prepares the content they index, and Cortex Analyst answers the structured questions they were never meant to handle. The real decision is rarely "Cortex Search or Cortex Analyst"; it's "does this question need unstructured retrieval, structured retrieval, or both," with Cortex Agents as the orchestration layer that routes between them once both are wired up.

The mistake I've seen cost the most rework is building a DIY vector-search pipeline for what turns out to be an entirely standard document-retrieval problem, because the team assumed "vector search" meant "custom build" by default. Cortex Search's hybrid ranking (vector plus keyword plus semantic rerank) genuinely outperforms a naive single-similarity-function DIY query for typical document retrieval, and re-deriving that ranking logic by hand is real, ongoing engineering work that buys nothing over the managed option for a use case that never needed custom scoring in the first place. Default to Cortex Search, and only justify a DIY build against a specific, named requirement it can't meet — not against a general preference for control.
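The "default to managed, justify DIY by name" rule can be written down as a few lines of Python. This is only a sketch of the decision logic described in this article: the `RetrievalNeeds` class, its field names, and the `choose_document_retrieval_path` function are all hypothetical, and the three flags are simply the DIY reasons the article itself lists.

```python
from dataclasses import dataclass


@dataclass
class RetrievalNeeds:
    """Specific requirements that the article names as reasons to go DIY."""
    custom_reranking: bool = False
    custom_chunking: bool = False
    joins_governed_tables_in_query: bool = False


def choose_document_retrieval_path(needs):
    """Return (path, reasons). DIY is chosen only for a named requirement."""
    reasons = [name for name, required in vars(needs).items() if required]
    if reasons:
        return "DIY on VECTOR type", reasons
    # No named requirement: a general preference for control is not enough.
    return "Cortex Search", []


print(choose_document_retrieval_path(RetrievalNeeds()))
print(choose_document_retrieval_path(RetrievalNeeds(custom_reranking=True)))
```

The useful part is the returned list of reasons. If a team cannot fill it with at least one concrete requirement, the function falls back to the managed path, which mirrors the review question worth asking before any DIY build starts.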

What to carry away

RAG on Snowflake isn't one thing. It's a managed hybrid retrieval service (Cortex Search: vector plus keyword plus semantic rerank), a DIY path on the native VECTOR type and its four similarity functions for when you need control the managed path doesn't expose, and two companion services: Document AI, which turns unstructured files into content either retrieval path can index, and Cortex Analyst, which turns natural-language questions into governed SQL over structured data.

Default to Cortex Search for standard document retrieval; it's faster to ship and its hybrid ranking beats a naive DIY implementation for the common case. Reach for the native VECTOR type only against a specific requirement the managed service can't meet. And remember that most real questions aren't purely structured or purely unstructured — Cortex Agents exists specifically to route between Cortex Analyst and Cortex Search rather than forcing you to pick one retrieval strategy for an entire application.