Debunking the "Filter Early, JOIN Later" SQL Performance Myth



I came across a Medium article with 700+ claps promoting this "SQL performance trick":

"Filter Early, JOIN Later"

A SQL performance tip that doesn't actually work. (Source: "SQL Tricks that Cut My Query Time by 80%")

The claim goes like this: instead of joining tables first and then filtering, you should filter in a subquery first, then join.

The supposed benefit?

The database filtered the smaller table first, then did the JOIN — saving time and memory.

Here is the thing: this advice is completely wrong for modern databases.

Let me show you why with actual data.

The Supposed "Optimization"

The article shows two queries. Here is the "bad" version:

SELECT *
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.total > 500;

And the "optimized" version:

SELECT *
FROM (
  SELECT * FROM orders WHERE total > 500
) o
JOIN users u ON u.id = o.user_id;

The claim is that the second query is faster because it "filters first, then joins."

Sounds logical, right? Wrong.

Testing with Real Data

I tested both queries on a PostgreSQL database with:

  • 10,000 users
  • 5,000,000 orders (500 per user)
  • Filtering for orders > $500
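The article does not include a setup script, so here is a sketch of how such a dataset could be generated in PostgreSQL. The schema and column types are assumptions; only the row counts and the $500 filter come from the test description above:

```sql
-- Hypothetical schema (assumed; not shown in the original test)
CREATE TABLE users (
  id   int PRIMARY KEY,
  name text
);

CREATE TABLE orders (
  id      bigint PRIMARY KEY,
  user_id int REFERENCES users (id),
  total   numeric
);

-- 10,000 users
INSERT INTO users (id, name)
SELECT g, 'user_' || g
FROM generate_series(1, 10000) AS g;

-- 5,000,000 orders (500 per user), totals uniform in [0, 1000),
-- so roughly half the rows match total > 500
INSERT INTO orders (id, user_id, total)
SELECT g, (g % 10000) + 1, random() * 1000
FROM generate_series(1, 5000000) AS g;

-- Refresh planner statistics before measuring
ANALYZE users;
ANALYZE orders;
```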

Let me run EXPLAIN ANALYZE on both queries to see what actually happens.
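In PostgreSQL, you get the real plan with timings by prefixing the query with `EXPLAIN ANALYZE`, which executes the query and reports what the planner actually did:

```sql
EXPLAIN ANALYZE
SELECT *
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.total > 500;
```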

The Results

Here are the execution plans for both queries:

"Bad" Query Execution Plan:

Hash Join  (cost=280.00..96321.92 rows=2480444 width=27) (actual time=1.014..641.202 rows=2499245 loops=1)
  Hash Cond: (o.user_id = u.id)
  ->  Seq Scan on orders o  (cost=0.00..89528.00 rows=2480444 width=14) (actual time=0.006..368.857 rows=2499245 loops=1)
        Filter: (total > '500'::numeric)
        Rows Removed by Filter: 2500755
  ->  Hash  (cost=155.00..155.00 rows=10000 width=13) (actual time=0.998..0.999 rows=10000 loops=1)
        Buckets: 16384  Batches: 1  Memory Usage: 577kB
        ->  Seq Scan on users u  (cost=0.00..155.00 rows=10000 width=13) (actual time=0.002..0.341 rows=10000 loops=1)
Planning Time: 0.121 ms
Execution Time: 685.818 ms

"Optimized" Query Execution Plan:

Hash Join  (cost=280.00..96321.92 rows=2480444 width=27) (actual time=1.019..640.613 rows=2499245 loops=1)
  Hash Cond: (orders.user_id = u.id)
  ->  Seq Scan on orders  (cost=0.00..89528.00 rows=2480444 width=14) (actual time=0.005..368.260 rows=2499245 loops=1)
        Filter: (total > '500'::numeric)
        Rows Removed by Filter: 2500755
  ->  Hash  (cost=155.00..155.00 rows=10000 width=13) (actual time=1.004..1.005 rows=10000 loops=1)
        Buckets: 16384  Batches: 1  Memory Usage: 577kB
        ->  Seq Scan on users u  (cost=0.00..155.00 rows=10000 width=13) (actual time=0.003..0.348 rows=10000 loops=1)
Planning Time: 0.118 ms
Execution Time: 685.275 ms

The execution plans are identical.

Both queries took ~685ms. The "optimization" did absolutely nothing.

Here's the simplified execution plan, where I removed some details:

Hash Join
  Hash Cond: (o.user_id = u.id)
  ->  Seq Scan on orders o
        Filter: (total > '500'::numeric)
        Rows Removed by Filter
  ->  Hash
        ->  Seq Scan on users u

The core operations are:

  1. Sequential Scan on orders table with filter applied
  2. Sequential Scan on users table
  3. Hash operation to build hash table from users
  4. Hash Join using the hash condition on user_id

Query Optimizers Are Smarter Than You

Modern databases use cost-based optimizers. Here is what happens when you run a query:

  1. Parser turns your SQL into an abstract syntax tree
  2. Optimizer rewrites your query and picks what it estimates to be the cheapest execution plan
  3. Executor runs the optimized plan

The optimizer looks at your query and says: "I don't care how you wrote this. I will figure out the best way to execute it."

Both of our queries get rewritten to the same optimal plan:

  • Filter the orders table first (because that eliminates rows early)
  • Build a hash table from users (the smaller table)
  • Hash join the filtered orders with users

The optimizer already does the "optimization" automatically.

Your manual subquery does not make it faster - it just makes your SQL harder to read.

How Cost-Based Optimization Works

The query optimizer has statistics about your tables:

  • Row counts
  • Data distribution
  • Index availability
  • Column selectivity

It uses these stats to estimate the cost of different execution strategies:

  • Which table to scan first
  • Which join algorithm to use (hash, nested loop, merge)
  • When to apply filters
  • Which indexes to use

Then it picks the cheapest plan. Your well-intentioned manual "optimization" gets ignored because the optimizer knows better.
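You can inspect these statistics yourself. In PostgreSQL, the estimated row counts live in the `pg_class` catalog and the per-column distributions in the `pg_stats` view (a quick look, assuming the `users` and `orders` tables from the test above):

```sql
-- Row count estimates the planner works from
SELECT relname, reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname IN ('users', 'orders');

-- Per-column distribution statistics (null fraction, distinct values)
SELECT attname, null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'orders';
```

These numbers are kept fresh by autovacuum, or manually via `ANALYZE`; stale statistics, not query shape, are a far more common cause of bad plans.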

Summary

The "Filter Early, JOIN Later" advice is a relic from ancient database systems that did not have sophisticated optimizers.

Modern databases like PostgreSQL, MySQL, and SQL Server already do predicate pushdown and join reordering automatically. Your manual "optimizations" are pointless and make code harder to maintain.

Write clear, readable SQL. Let the optimizer do its job.

The real lesson? Stop believing every performance tip you read online. Use EXPLAIN ANALYZE to understand what your database is actually doing.
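If this particular query were genuinely too slow, the fix would come from physical design rather than query shape. A sketch: an index on `orders (total)` gives the planner the option of skipping non-matching rows. Whether it actually helps depends on selectivity; with roughly half the table matching, the planner may still correctly prefer the sequential scan:

```sql
-- A real, measurable change: give the planner an access path for the filter
CREATE INDEX idx_orders_total ON orders (total);

-- Then re-check what the planner chooses
EXPLAIN ANALYZE
SELECT *
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.total > 500;
```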

