Snowflake UDFs and UDTFs: Scalar, Table Functions, Snowpark, and Performance

Every Snowflake account I've ever audited has the same artifact buried in it: a 60-line CASE expression, copy-pasted into thirty queries, each copy subtly different because someone fixed a bug in one and not the others. It's the business's definition of "is this claim billable" or "normalize this address," and it lives nowhere and everywhere at once. A user-defined function is the fix — encapsulate that logic once, name it, version it, and call it everywhere — and Snowflake's UDF system is far richer than most people use. You can write functions in SQL, Python, Java, or Scala; you can return a single value or an entire table; you can batch rows through pandas for throughput; and you can get all of it dangerously wrong from a performance standpoint if you don't understand what runs where. This is the deep version.

The first split to internalize, because everything else hangs off it: a scalar UDF takes a row's inputs and returns one value; a table function (UDTF) takes inputs and returns a set of rows. "One in, one out" versus "one in, many out." Get that distinction and the whole feature surface organizes itself.

Scalar UDFs: one row in, one value out

A scalar UDF behaves like a built-in function — you call it in a SELECT list, a WHERE clause, anywhere an expression is valid, and it returns a single value per row. The simplest and, crucially, the best-performing flavor is the SQL UDF, because Snowflake can inline its body into the calling query and optimize the whole thing as one plan. No separate runtime, no data movement — it's essentially a named, reusable expression.

-- SQL scalar UDF: encapsulate "net billable amount" once, use it everywhere
CREATE OR REPLACE FUNCTION net_billable(gross NUMBER, refunded NUMBER, currency STRING)
RETURNS NUMBER
AS
$$
  CASE WHEN currency = 'USD' THEN gross - refunded
       ELSE (gross - refunded) * fx_rate_to_usd(currency)
  END
$$;

SELECT order_id, net_billable(gross, refunded, currency) AS usd_net
FROM   orders;

When the logic outgrows SQL — you need a library, complex string processing, an ML model — you reach for a Python, Java, or Scala scalar UDF. These run in a secure sandbox (the Snowpark runtime), and you can package dependencies. Here's the same idea in Python, the language most teams now reach for:

CREATE OR REPLACE FUNCTION normalize_phone(raw STRING)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
HANDLER = 'norm'
AS
$$
import re
def norm(raw):
    if raw is None:
        return None
    digits = re.sub(r'\D', '', raw)
    return f"+1{digits}" if len(digits) == 10 else digits
$$;

The key architectural fact: a SQL UDF is inlined and free; a Python/Java UDF runs in a sandboxed process that Snowflake has to feed rows into and read results out of. That boundary is cheap per row but not zero, and it's the root of every UDF performance conversation below.

Table functions (UDTFs): one row in, many rows out

A UDTF returns a table. You call it in the FROM clause, usually joined laterally against a source table so it runs once per input row and emits zero-or-more output rows. The classic uses are explosion and per-group computation: flatten a JSON array into rows, split a delimited string, generate a sequence, or run a custom calculation over a partition that's awkward in pure SQL.

graph TD
    subgraph SCALAR["Scalar UDF — one in, one out"]
        R1["row: (gross, refunded, ccy)"] --> S1["net_billable(...)"] --> V1["one value: 84.20"]
    end
    subgraph UDTF["Table function (UDTF) — one in, many out"]
        R2["row: order with 3 line-item JSON array"] --> T1["explode_items(json)"] --> M1["row: item 1"]
        T1 --> M2["row: item 2"]
        T1 --> M3["row: item 3"]
    end
          

The defining difference. A scalar UDF maps each input row to exactly one value, so it slots into a SELECT or WHERE like any built-in. A UDTF maps each input row to a set of rows, so it lives in the FROM clause (joined laterally) — perfect for exploding nested structures, generating sequences, or emitting several rows of derived output per input. Choosing the wrong shape is the most common UDF design mistake.

A Python UDTF is a class with a defined lifecycle, and understanding that lifecycle is what separates a correct UDTF from a slow or wrong one. Snowflake calls process() once per input row (you yield output rows), and — if you partition the input — calls end_partition() once per partition after all its rows have been processed, which is where per-group results belong.

CREATE OR REPLACE FUNCTION top_n_per_group(score FLOAT, label STRING, n INT)
RETURNS TABLE (label STRING, score FLOAT)
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
HANDLER = 'TopN'
AS
$$
class TopN:
    def __init__(self):
        self._rows = []          # state accumulates across the partition
    def process(self, score, label, n):
        self._rows.append((score, label))
        self._n = n
        # yield nothing per-row; we emit at end_partition
        return
    def end_partition(self):
        for score, label in sorted(self._rows, reverse=True)[: self._n]:
            yield (label, score)
$$;

-- called per partition: top 3 scores within each category
SELECT t.label, t.score
FROM   candidates,
       TABLE(top_n_per_group(candidates.score, candidates.label, 3)
             OVER (PARTITION BY candidates.category));

The UDTF lifecycle is the whole game: per-row work goes in process(), per-group work goes in end_partition(), and state lives on the instance between them. A fresh instance is created per partition, so you can safely accumulate in self during process() and compute the group result in end_partition() — that's how you express "rank within each category" or "sessionize these events" that plain SQL window functions can't quite reach. Always pass OVER (PARTITION BY ...) when your logic is per-group; without it, Snowflake may hand the whole input to one instance and you lose parallelism (and possibly correctness if your code assumed grouping). The partition clause is not optional decoration — it's how you tell Snowflake both how to parallelize and what a "group" means to your function.

The language choice, and what it costs

LanguageRuns whereBest forPerformance note
SQLInlined into the query planReusable expressions, business logic, simple transformsFastest — no sandbox, fully optimizable
PythonSnowpark sandboxLibraries, ML inference, complex parsing, the default for data teamsPer-row sandbox overhead; use vectorized for throughput
Java / ScalaSnowpark sandbox (JVM)Existing JVM libraries, high-performance compiled logicFast in-sandbox; JAR packaging overhead
JavaScriptEmbedded engineLegacy / procedural scalar logicGenerally slower than the above; rarely the right choice now

Vectorized Python UDFs: pay the boundary once per batch

Here's the single most important performance lever for Python UDFs. A normal Python scalar UDF is invoked once per row — the sandbox boundary is crossed for every row, and for a billion rows that overhead dominates. A vectorized (batch) Python UDF instead receives a pandas DataFrame of many rows at once and returns a Series, so the boundary is crossed once per batch and your code operates on whole columns at NumPy/pandas speed. For anything CPU-bound over many rows, this is routinely several times faster.

CREATE OR REPLACE FUNCTION score_risk(amount FLOAT, velocity FLOAT)
RETURNS FLOAT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
HANDLER = 'score'
PACKAGES = ('pandas','numpy')
AS
$$
import pandas as pd
from _snowflake import vectorized

@vectorized(input=pd.DataFrame)        # receive a batch, not a row
def score(df):
    return (df[0].clip(0, 1e6) / 1e6 * 0.7 + df[1].rank(pct=True) * 0.3)
$$;

The performance traps that make a UDF slower than SQL

A scalar UDF in a WHERE clause or a join key can quietly destroy a query, because it runs per row and blocks pruning. Two specific traps bite hardest. First: WHERE my_python_udf(col) = 'x' forces Snowflake to evaluate the UDF for every row before filtering, and — worse — it can defeat partition pruning and predicate pushdown, so you scan data you could have skipped. Push the cheap, native predicates first and apply the UDF to the survivors, or precompute the UDF result into a column. Second: a non-SQL scalar UDF on a join key means the sandbox runs for every probe — turn it into a materialized column instead. The rule of thumb: SQL UDFs are free to use anywhere; Python/Java scalar UDFs belong in the SELECT list over an already-filtered set, not in the predicates that decide what to scan. When a query with a UDF is mysteriously slow, the UDF's position in the query is the first thing I check, before the UDF's code.

Beyond placement, a few more performance realities worth holding:

  • Prefer SQL UDFs whenever the logic fits. They inline and optimize as part of the plan; a Python UDF that only does arithmetic is paying sandbox cost for nothing a SQL UDF couldn't do free.
  • Vectorize Python UDFs over large data. The per-row-vs-per-batch difference is the biggest single Python-UDF speedup available.
  • Mind cold starts and dependencies. A Python UDF with heavy PACKAGES (a large ML library) pays import cost; keep imports lean, and remember the first call to a function in a warehouse can be slower as the sandbox warms.
  • Watch UDTF state size. A UDTF that accumulates an entire huge partition in self before end_partition() can blow memory — design partitions to be bounded.

Security: owner's rights, caller's rights, and external access

UDFs are also a governance surface, and the model matters in regulated environments. By default a UDF runs with owner's rights — it executes with the privileges of whoever created it, which lets you expose a function that reads or transforms data the caller can't see directly (a controlled way to apply masking or lookups). Caller's rights functions run as the invoker instead, for when the function should only ever touch what the caller already can. Choosing deliberately between them is how a UDF becomes a safe abstraction over sensitive data rather than a privilege-escalation hole.

And when a Python UDF needs to reach outside Snowflake — call an API, hit a tokenization service — that's gated by an external access integration plus a network rule and secret, so egress is explicit and auditable rather than ambient. In a healthcare or finance context, that explicit, granted egress is exactly what an auditor wants to see.

Stored procedures vs UDFs: don't confuse them

A recurring confusion worth settling: a UDF computes and returns a value or table and is meant to be used inside queries; a stored procedure does things — it runs procedural logic, executes DDL/DML, orchestrates a workflow — and is called on its own, not embedded in a SELECT. If you need "compute a value I can use in a query," that's a UDF. If you need "run these ten steps, create tables, log results," that's a stored procedure (often written in Snowpark Python). Reaching for a stored procedure when you wanted a function — or vice versa — leads to awkward, slow designs.

When not to write a UDF at all

The honest counterweight: a UDF is the right tool less often than enthusiasts think. If a plain SQL expression or a view expresses the logic, use that — it's more transparent to the optimizer and to the next engineer. If the transformation is a pipeline stage, a dbt model is usually the better home than a UDF, because it's testable, documented, and materialized rather than recomputed on every query. Reserve UDFs for genuinely reusable, query-embeddable logic that SQL can't cleanly express — the billing rule, the address normalizer, the JSON exploder — and resist the urge to push entire transformations into Python functions just because you can. A warehouse full of opaque Python UDFs is its own maintenance burden.

What to carry away

Snowflake UDFs let you encapsulate logic once and call it everywhere, and the surface is bigger than most teams use: scalar functions (one value out) and table functions or UDTFs (a set of rows out), written in SQL, Python, Java, or Scala. SQL UDFs are inlined and effectively free; non-SQL UDFs run in a sandbox whose per-row boundary cost is the root of every performance question. The UDTF lifecycle — process() per row, end_partition() per group, state on the instance, OVER (PARTITION BY) to define and parallelize groups — is what lets you express per-group logic that window functions can't.

Make three habits automatic. Prefer SQL UDFs when the logic fits, and vectorize Python UDFs (batch pandas) when it doesn't and the data is large. Keep non-SQL scalar UDFs out of WHERE clauses and join keys, where they run per row and defeat pruning — apply them in the SELECT over an already-filtered set. And choose owner's vs caller's rights deliberately, gating any external calls through an external access integration. Used with that discipline, UDFs turn scattered, copy-pasted business logic into a governed, reusable, performant part of the platform — which is exactly what that 60-line CASE expression was always supposed to be. For how Snowflake executes all of this, see Snowflake internals.