Index Types Beyond B-Trees: Hash, Bitmap, and Inverted Indexes

Ask most engineers to index a column and the reflex is automatic: CREATE INDEX, B-tree, done. It's the right reflex most of the time — B-trees are the generalist tool for a reason, and B-trees vs LSM-trees covers why they dominate range-friendly, mixed read/write workloads. But I've watched teams reach for a B-tree on a boolean flag column with two possible values, and on a full-text description field, and on a wide analytical fact table filtered by five low-cardinality dimensions at once — and get mediocre results in all three cases, not because B-trees are bad, but because none of those are the query shape a B-tree is built for. There are three other structures worth knowing by name, each purpose-built for an access pattern the generalist gets wrong.

What is a hash index, and why is it a narrower tool than it looks?

A hash index maps a key to a bucket via a hash function, giving O(1) equality lookups — genuinely faster than a B-tree's O(log n) for the one thing it does, "is this exact value present, and where." The trade is total: a hash index has no concept of order, so it cannot serve a range query, a sort, or even a prefix match — WHERE status = 'active' is exactly what it's built for, WHERE created_at > '2022-01-01' is a query type it cannot answer at all. Postgres supports hash indexes directly, and the honest reason they're used far less often than B-trees isn't that they're broken — it's that most real-world query patterns eventually need a range predicate somewhere, even on a column that's mostly filtered by equality, and a B-tree already covers the equality case adequately while also covering the range case a hash index can't. The narrow lane where a hash index genuinely wins is a column queried exclusively by exact-match equality, at high volume, where the marginal speed gain over a B-tree's equality performance is worth giving up range-query capability entirely.

What is a bitmap index, and why does it fit analytical queries so well?

A bitmap index represents each distinct value in a column as a bitmap — one bit per row, set to 1 if that row has the value, 0 otherwise. For a low-cardinality column (a status enum, a region, a boolean), this is remarkably compact, and it turns multi-predicate filtering into bitwise operations: WHERE region = 'west' AND status = 'active' becomes an AND between two bitmaps, computed at hardware speed, with the result bitmap directly identifying every matching row without touching the underlying data until the final scan. This is exactly the shape of query analytical workloads generate constantly — several low-cardinality filters combined — which is why bitmap indexes are an Oracle staple for data warehousing, and why the same underlying idea shows up, in a different guise, inside modern OLAP engines: Druid and Pinot's segment-level filtering on low-cardinality dimensions is conceptually a bitmap-index operation even where the implementation detail differs from Oracle's classic bitmap index object.

The reason bitmap indexes fall apart on high-cardinality columns is the same reason they excel on low-cardinality ones: a bitmap per distinct value means a column with millions of distinct values needs millions of bitmaps, and both the storage cost and the maintenance cost (updating potentially many bitmaps per write) explode. Bitmap indexes also tend to be a poor fit for heavy, concurrent OLTP write workloads for the same reason — a single row update can require touching multiple bitmaps — which is part of why they're an analytical-workload tool specifically, not a general-purpose replacement for a B-tree.

Index type	Query shape it serves	Cardinality it wants	Weakness
B-tree	Equality, range, sort, prefix	Any	Generalist — not the fastest at any single pattern
Hash	Equality only	Any	Zero range-query capability
Bitmap	Multi-predicate AND/OR filtering	Low (few distinct values)	Explodes in storage/maintenance at high cardinality; poor OLTP write fit
Inverted	Full-text search, "contains this term"	N/A (indexes terms within text)	Not a general column index — built specifically for text

What is an inverted index, and why is it the structure behind every full-text search engine?

An inverted index flips the natural document-to-content relationship: instead of storing, for each document, the text it contains, it stores, for each term, the list of documents (or rows) that contain it — a term-to-document-ID mapping, the "inverted" direction relative to how the data was originally organized. This is exactly the structure that makes "find every document containing the word 'timeout'" fast at scale: instead of scanning every document's text, look up the single postings list for "timeout" and you have your answer directly, with no scan at all. Elasticsearch's entire search capability is built on Lucene's inverted index implementation — that's covered in depth there, so this article won't re-derive Lucene's segment architecture, just place the inverted index alongside hash and bitmap indexes as the third named structure in this survey, purpose-built for the one query shape neither of the other two can serve at all: "which rows contain this term," as opposed to "which rows equal this value."

graph TD
    subgraph HASH["Hash index"]
        HK["key -> bucket"] --> HV["O(1) equality only"]
    end
    subgraph BM["Bitmap index"]
        BV1["value A: 1 0 1 1 0"]
        BV2["value B: 0 1 0 0 1"]
        BV1 -->|"bitwise AND/OR"| BR["Fast multi-predicate filter"]
        BV2 -->|"bitwise AND/OR"| BR
    end
    subgraph INV["Inverted index"]
        TERM["term: 'timeout'"] --> POST["postings list:
doc 4, doc 19, doc 203..."]
    end

Three structures optimized for three different questions a B-tree either can't answer at all (inverted) or answers less efficiently (hash for pure equality, bitmap for combined low-cardinality predicates). None of the three replaces a B-tree generally — each earns its place only when the query shape genuinely matches what it was built for.

How do you actually decide which index type a column needs?

Start from the query, not the column. If the query is WHERE column = value exclusively, with genuinely no range or sort requirement anywhere in the workload, a hash index is worth the narrow bet — but confirm "exclusively" carefully, because it's a common mistake to assume a column is equality-only and discover a range query six months later that a hash index simply cannot serve. If the query combines several low-cardinality filters — the classic analytical WHERE region = X AND tier = Y AND status = Z — a bitmap index (or the equivalent segment-filtering mechanism inside an OLAP engine built for that pattern) turns what would be a multi-condition scan into cheap bitwise operations. If the query needs "contains this word or phrase" rather than "equals this exact value," nothing but an inverted index actually answers that question — a B-tree, hash, or bitmap index on a text column can support exact-match or prefix queries at best, never genuine full-text search. And for everything else — the default, still-correct-most-of-the-time case of equality, range, sort, and prefix matching on any cardinality — a B-tree remains the right generalist choice, which is exactly why it's the default in nearly every database rather than an oversight.

I've seen a team add a bitmap-style low-cardinality filter to a column that looked low-cardinality in a sample and turned out to have thousands of distinct values in production — the index went from a performance win to a storage and write-latency liability almost overnight. Cardinality isn't a one-time property you check at index-creation time and forget; a status enum that starts with three values can grow to fifteen as the product evolves, and a region column can quietly become a per-store-location column as the business expands. Before committing to a cardinality-sensitive structure like a bitmap index, check actual production cardinality, not a development sample, and revisit that assumption whenever the underlying business dimension it represents changes shape.

What to carry away

A B-tree is the right default because it's a genuine generalist — equality, range, sort, and prefix matching all work adequately on it, which covers most real query patterns without needing a specialized structure. The three alternatives here each exist because a specific, common query shape gets served meaningfully better by giving up generality: a hash index trades away range-query capability entirely for faster pure-equality lookups; a bitmap index trades away high-cardinality efficiency for near-free multi-predicate filtering on the low-cardinality dimensions that dominate analytical workloads; an inverted index answers a question — "which rows contain this term" — that none of the row-oriented, value-comparison structures can answer at all.

The decision framework is simple to state and easy to skip in practice: match the index type to the actual query shape and actual column cardinality, verified against production data rather than assumed from a sample, and default back to a B-tree whenever the workload doesn't clearly and durably match one of the three specialized patterns above.