
Why N matters


Recently, I became an unplanned repository maintainer. One of my open-source projects gained traction, and with that came a wave of pull requests: people adding features, fixing bugs, and “optimizing” the code.

I wrote a lightweight application in Rust called niri-sidebar to manage a sidebar of floating windows for the niri window manager.

I received a PR claiming to optimize the performance of the application. I store the sidebar’s windows in a Vec (a simple contiguous list). This PR claimed that lookups in this Vec were slow because they are \( \mathcal{O}(n) \) and could be optimized to \( \mathcal{O}(1) \) using a HashMap.

If you just thought, “Sure, that seems like a reasonable improvement,” you should think twice. Hopefully, by the end of this post, you’ll see why the “naive” solution was actually the high-performance one.
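To make the claim concrete, here is a minimal sketch of the two lookup styles. The Window struct and its id field are hypothetical stand-ins for illustration, not the actual niri-sidebar types or the code from the PR.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a sidebar window; not the real niri-sidebar type.
struct Window {
    id: u64,
}

// The Vec approach: an O(n) linear scan over contiguous memory.
fn find_in_vec(windows: &[Window], id: u64) -> Option<&Window> {
    windows.iter().find(|w| w.id == id)
}

// The HashMap approach suggested by the PR, in spirit: an O(1) lookup by key.
fn find_in_map(windows: &HashMap<u64, Window>, id: u64) -> Option<&Window> {
    windows.get(&id)
}

fn main() {
    let windows: Vec<Window> = (0..8).map(|id| Window { id }).collect();
    assert!(find_in_vec(&windows, 3).is_some());

    let by_id: HashMap<u64, Window> =
        windows.into_iter().map(|w| (w.id, w)).collect();
    assert!(find_in_map(&by_id, 3).is_some());
}
```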

The Hidden Factors #

The most important thing to remember is that Big O notation is asymptotic. It compares how algorithms scale as inputs grow infinitely large. If a sidebar were holding hundreds of thousands of windows, a HashMap would absolutely be the right choice.

But asymptotic analysis ignores the constant factors, the fixed costs of running an algorithm that Big O notation conveniently hides. When \(N\) is small, these constant factors often matter more than the scaling function itself.

Vec vs. HashMap: The Hardware Reality #

In this project, a user will realistically have 4 to 8 windows in their sidebar. Even a power user won’t exceed 20. At such small scales, a Vec will almost always win, simply because the simpler structure has far smaller constant factors.

A Vec is a contiguous block of memory. This makes it incredibly efficient for the CPU cache. CPUs don’t fetch single bytes; they fetch cache lines (typically 64 bytes). When the CPU grabs the first window in my vector, it likely already has the next five in its L1 cache.

In contrast, a HashMap involves hashing a key and jumping to a memory address that is likely not in the cache. This causes a cache miss, forcing the CPU to wait hundreds of cycles to fetch data from RAM.

Furthermore, HashMaps introduce:

  1. Hashing Overhead: Even fast hash functions (like Rust’s default DoS-resistant SipHash) have a computational cost significantly higher than comparing two integers.
  2. Allocations: HashMaps often require multiple allocations for buckets and metadata.
  3. Branch Misprediction: A linear scan through a Vec is easy for a CPU to predict. A HashMap lookup involves complex branching (Is the bucket empty? Does the key match? Is there a collision?), which can stall the CPU pipeline.
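Constant factors are hard to reason about in the abstract, so here is a rough micro-benchmark sketch. It is not the actual niri-sidebar code, and a serious measurement would use a harness like criterion, but it shows the shape of the comparison at sidebar-sized \(N\).

```rust
use std::collections::HashMap;
use std::hint::black_box;
use std::time::Instant;

// Hypothetical stand-in for a sidebar window; the title field just gives
// each element a realistic size. Not the real niri-sidebar type.
#[derive(Clone)]
struct Window {
    id: u64,
    title: String,
}

fn main() {
    const N: u64 = 8; // a realistic sidebar size
    const ITERS: u64 = 1_000_000;

    let windows: Vec<Window> = (0..N)
        .map(|id| Window { id, title: format!("window {id}") })
        .collect();
    let by_id: HashMap<u64, Window> =
        windows.iter().map(|w| (w.id, w.clone())).collect();

    // Time the O(n) linear scan over the Vec.
    let start = Instant::now();
    for i in 0..ITERS {
        let id = black_box(i % N);
        black_box(windows.iter().find(|w| w.id == id));
    }
    println!("Vec linear scan: {:?}", start.elapsed());

    // Time the O(1) HashMap lookup (hash the key, jump to the bucket).
    let start = Instant::now();
    for i in 0..ITERS {
        let id = black_box(i % N);
        black_box(by_id.get(&id));
    }
    println!("HashMap::get:    {:?}", start.elapsed());
}
```

The absolute numbers depend entirely on the machine and the optimizer; the point is that at this scale the constant factors, not the asymptotics, decide the winner, so measure rather than assume.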

A Classic Example: Sorting #

I want to cement this concept with a classic example from computer science. You have likely heard of Quicksort, the gold standard sorting algorithm that runs in \( \mathcal{O}(n \log n) \). You might also know Insertion Sort, a “slow” algorithm that runs in \( \mathcal{O}(n^2) \).

Yet, almost every production sorting library (including Rust’s slice::sort) uses a hybrid approach: they switch to Insertion Sort when the input size is small.

Why? Because for small \(N\), once you account for the constant factors, the \(n^2\) algorithm actually performs fewer operations than the \(n \log n\) one. Let’s prove it mathematically.

Note: Quicksort’s worst case is \( \mathcal{O}(n^2) \), but this only happens with unlucky pivot choices. In this analysis, we are comparing the average case for both algorithms.

1. Insertion Sort Analysis #

Suppose we have already sorted the first \(i\) elements. We now look at the \((i+1)\)-th element and want to insert it into the sorted sequence.

In the average case, assuming all permutations are equally likely, the element will settle in the middle of the sorted partition, requiring \( \frac{i+1}{2} \) comparisons.

To sort an array of size \(n\), we perform this step for every element from \(i=1\) to \(n-1\). The total average comparisons \(C(n)\) is:

\[ C(n) = \sum_{i=1}^{n-1} \frac{i+1}{2} = \frac{1}{2} \left( \sum_{i=1}^{n-1} i + \sum_{i=1}^{n-1} 1 \right) \]

Using the arithmetic series formula: \[ C(n) = \frac{1}{2} \left( \frac{(n-1)n}{2} + (n-1) \right) \]

Simplifying the algebra: \[ C(n) = \frac{n^2 + n - 2}{4} \sim \mathbf{0.25 n^2} \] Here we use the so-called tilde notation, which, unlike Big O, keeps the constant factor of the leading term.
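As a quick sanity check of the tilde approximation, plug in \(n = 10\): \[ C(10) = \frac{10^2 + 10 - 2}{4} = 27 \approx 0.25 \cdot 10^2 = 25 \] The exact count and the leading term are already close, and the relative gap shrinks as \(n\) grows.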

2. Quicksort Analysis #

For Quicksort, the average case depends on the pivot. In a random array, the pivot is equally likely to be the 1st, 2nd, … or \(n\)-th smallest element.

  • We pick a pivot (let’s say it’s the \(k\)-th smallest).
  • We partition the array (cost: \(n-1\) comparisons).
  • We recursively sort the left (\(k-1\) elements) and right (\(n-k\) elements) partitions.

The recurrence relation is: \[ C(n) = (n-1) + \frac{1}{n} \sum_{k=1}^{n} ( C(k-1) + C(n-k) ) \]

Because \(\sum C(k-1)\) and \(\sum C(n-k)\) cover the same elements (just mirrored), we can simplify: \[ C(n) = (n-1) + \frac{2}{n} \sum_{k=0}^{n-1} C(k) \]

Multiplying by \(n\): \[ nC(n) = n(n-1) + 2 \sum_{k=0}^{n-1} C(k) \]

Now, we write the same equation for \(n-1\): \[ (n-1)C(n-1) = (n-1)(n-2) + 2 \sum_{k=0}^{n-2} C(k) \]

Subtracting the second equation from the first eliminates the summation: \[ nC(n) - (n-1)C(n-1) = 2(n-1) + 2C(n-1) \] \[ nC(n) = (n+1)C(n-1) + 2n - 2 \]

Divide by \(n(n+1)\) to create a telescoping series: \[ \frac{C(n)}{n+1} = \frac{C(n-1)}{n} + \frac{2n-2}{n(n+1)} \]

For large \(n\), the term \(\frac{2n-2}{n(n+1)}\) approximates to \(\frac{2}{n}\). Expanding this recurrence down to the base case gives us a summation: \[ \frac{C(n)}{n+1} \approx 2 \sum_{i=1}^{n} \frac{1}{i} \]

The term \(\sum_{i=1}^{n} \frac{1}{i}\) is the Harmonic Number \(H_n\), which is approximately \(\ln n + \gamma\) (where \(\gamma\) is the Euler-Mascheroni constant). Therefore: \[ \frac{C(n)}{n+1} \approx 2 \ln n \] \[ C(n) \approx 2(n+1) \ln n \approx 2n \ln n \]

Converting to base-2 logarithms (to match standard Big O usage): \[ C(n) \approx 2 \ln(2) \cdot n \log_2 n \sim\mathbf{1.39 n \log_2 n} \]
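As another quick check, for \(n = 10\) the natural-log and base-2 forms agree: \[ C(10) \approx 2 \cdot 10 \cdot \ln 10 \approx 46 \approx 1.39 \cdot 10 \cdot \log_2 10 \] which matches the roughly 46 operations in the Quicksort column of the table below.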

3. The Crossover Point #

Now we have our two competitors. Let’s plug in small values for \(N\) and see who wins.

| \(N\) (Items) | Insertion Sort (\(\approx 0.25 n^2\)) | Quicksort (\(\approx 1.39 n \log_2 n\)) | Winner |
|---|---|---|---|
| 5 | ~6.25 ops | ~16.1 ops | Insertion |
| 10 | ~25 ops | ~46.1 ops | Insertion |
| 20 | ~100 ops | ~120.1 ops | Insertion |
| 50 | ~625 ops | ~392.2 ops | Quicksort |
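These values come straight from the two leading-term formulas; if you want to reproduce them (up to rounding) or try other values of \(N\), a few lines suffice:

```rust
fn main() {
    // Leading-term estimates from the analysis above:
    // insertion sort ~ 0.25 n^2, quicksort ~ 1.39 n log2(n).
    for n in [5.0_f64, 10.0, 20.0, 50.0] {
        let insertion = 0.25 * n * n;
        let quicksort = 1.39 * n * n.log2();
        println!("n = {n:>2}: insertion ~ {insertion:6.2} ops, quicksort ~ {quicksort:6.2} ops");
    }
}
```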

For \(N=20\) (my sidebar scenario), Insertion Sort performs fewer operations than Quicksort.

And remember the hardware? Insertion sort is a tight loop of swaps, extremely friendly to the CPU’s branch predictor. Quicksort relies on partitioning around a pivot, which is statistically a 50/50 branch: a nightmare for the CPU pipeline that leads to frequent flushes and wasted cycles.

Conclusion #

Theory tells you how an algorithm scales as \(N \to \infty\), not how it performs on a sidebar with 12 windows. “Faster” data structures like HashMaps come with heavy overheads (hashing, allocations, and cache misses) that only pay off when \(N\) is large.

Context is everything. If \(N\) is bounded by a small number, the “inefficient” algorithm is often the most efficient.

The fastest algorithm is the one that matches your constraints, not the one with the best asymptotic proof.

Author
Vigintillion
I’m Yarne; online I go by the pseudonyms Vig and Vigintillion. I’m a bachelor’s student at the Catholic University of Leuven and am passionate about building systems software and exploring language design & compiler architecture. I actively contribute to open-source projects and am proficient in numerous languages. Outside of coding, I enjoy gaming, listening to music and playing table tennis and padel.