
The Weight of Decisions: Solving Weighted Random Sorting at Scale
Several years ago at Microsoft, I worked on a back-end service responsible for routing high volumes of traffic. The service functioned as a traffic dispatcher: we received an incoming request, determined the most appropriate downstream destination, and forwarded the payload.
Our routing decisions relied on a pool of destination services. To ensure reliability, we tracked ongoing statistics regarding the success rates of these partners. When a destination consistently processed requests successfully, we increased its traffic share. Conversely, if we observed frequent errors, we reduced its allocation.
The Requirement for Ordered Fail-over
Routing in a distributed environment requires more than just selecting a single destination. Downstream partners occasionally fail to process specific items due to transient internal errors or malformed data. Because of this, we needed a "fail-over" list for every request. If the primary choice failed, the system needed an immediate second and third option ready to go.
This meant our routing logic had to produce a fully sorted list of all available destination services. We would attempt to send the request to the first service in the list; if that service returned an error, we moved to the second, and continued until the request succeeded or we exhausted the list.
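To ground that, here is a minimal sketch of the fail-over walk. It is not the original dispatcher code, and trySendAsync is a hypothetical stand-in for whatever actually forwards the payload downstream:

public static async Task<bool> DispatchWithFailoverAsync<T>(
    IReadOnlyList<T> sortedDestinations,
    Func<T, Task<bool>> trySendAsync)
{
    foreach (var destination in sortedDestinations)
    {
        // Forward the payload; a false result stands in for any downstream error.
        if (await trySendAsync(destination))
        {
            return true; // delivered successfully
        }
        // Otherwise fall through to the next destination in the sorted list.
    }

    return false; // the entire pool was exhausted without a successful delivery
}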
Balancing Traffic via Weighted Randomization
A simple deterministic sort based on quality scores would create a "starvation" problem. If Service A had the highest success rate, a standard sort would consistently place it at the top. This would result in Service A receiving 100% of the initial traffic, leaving services B and C with no data points. Without ongoing "exploration" traffic to lower-ranked services, we could never observe if their performance had recovered or improved.
To maintain fresh statistics across the entire pool, we needed a weighted random sort, effectively a weighted permutation. Consider a scenario with three services and the following target distributions:
- Service A: 50%
- Service B: 30%
- Service C: 20%
The sorted list needed to respect these probabilities at every rank. For the first position, the chances are exactly as defined (50%, 30%, 20%). However, if Service A is selected for the first slot, the second slot must be filled by B or C based on their remaining relative weights. In this case, Service B should appear in the second position with a probability of 0.3 / (0.3 + 0.2) = 60%, while Service C takes the spot 40% of the time.
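To make the target concrete, here is a short illustrative sketch (not part of the routing service) that enumerates every possible ordering of a weighted pool and computes its probability under the "pick proportionally to weight, then remove" model described above:

public static IEnumerable<(string Order, double Probability)> EnumerateOrderings(
    IReadOnlyList<(string Name, double Weight)> pool)
{
    return Walk(pool.ToList(), new List<string>(), 1.0);

    static IEnumerable<(string Order, double Probability)> Walk(
        List<(string Name, double Weight)> remaining, List<string> prefix, double probability)
    {
        if (remaining.Count == 0)
        {
            yield return (string.Join(" > ", prefix), probability);
            yield break;
        }

        double total = remaining.Sum(x => x.Weight);
        foreach (var candidate in remaining)
        {
            // The chance of taking this candidate next is its share of the remaining weight.
            var rest = remaining.Where(x => x.Name != candidate.Name).ToList();
            foreach (var ordering in Walk(rest, prefix.Append(candidate.Name).ToList(),
                                          probability * (candidate.Weight / total)))
            {
                yield return ordering;
            }
        }
    }
}

For the 50/30/20 pool, the ordering A > B > C comes out at 0.5 × 0.6 = 30%, and the orderings that start with B together account for exactly B's 30% first-slot share.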
Implementation Logic
Both the initial and the improved versions of our code use the following data structure to represent a destination:
public class Item
{
    public string Name { get; set; }
    public double Weight { get; set; }
}

The Intuition and the Initial O(N² log N) Approach
I suspected a mathematical shortcut existed to calculate a single value for each item that, when sorted, would result in this exact distribution. This thought surfaced repeatedly over the years and never truly gave me peace. I felt certain a single-pass solution was possible, yet the specific formula remained out of reach. At the time, we settled on an iterative approach that picked items one by one.
In C#, the implementation relied on a "weighted pick and remove" loop. The computational cost was actually O(N² log N) in the version below, because a full sort is performed inside a loop to pick just one item. While N = 10 renders this inefficiency invisible in production, the logic scales poorly for larger pools or high-throughput middleware.
public static List<T> InefficientWeightedRandomize<T>(IEnumerable<T> source)
    where T : Item
{
    var remainingItems = source.ToList();
    var result = new List<T>();

    while (remainingItems.Any())
    {
        // Re-sort the remaining items by a random factor scaled by weight
        // and take the item with the largest key for the next slot.
        var nextItem = remainingItems
            .Select(item => new
            {
                Item = item,
                SortKey = Random.Shared.NextDouble() * item.Weight
            })
            .OrderByDescending(x => x.SortKey)
            .First();

        result.Add(nextItem.Item);
        remainingItems.Remove(nextItem.Item);
    }

    return result;
}

The Efraimidis-Spirakis Algorithm
Recently, I came across the Efraimidis-Spirakis Weighted Random Sampling paper, which provides the mathematical proof for the "one-pass sort" I had originally envisioned. The paper demonstrates that computing a key kᵢ = uᵢ^(1/wᵢ) for each item (where uᵢ is a uniform random value in (0, 1) and wᵢ is the item's weight) and sorting by that key in descending order produces exactly the weighted random permutation described above.
The same key construction underlies weighted reservoir sampling, and it lets us produce the full ordering in O(N log N) time with a much cleaner implementation:
public static List<T> WeightedRandomize<T>(IEnumerable<T> source)
    where T : Item
{
    return source
        .Select(item => new
        {
            Item = item,
            // Efraimidis-Spirakis key: u^(1/w), with u drawn uniformly from [0, 1)
            SortKey = Math.Pow(
                Random.Shared.NextDouble(),
                1.0 / item.Weight
            )
        })
        .OrderByDescending(x => x.SortKey)
        .Select(x => x.Item)
        .ToList();
}

Applications for Streaming Data
The Efraimidis-Spirakis algorithm offers an interesting property: it works on data streams without requiring the full list to be present at the start. Since each item's SortKey is calculated independently, we can assign keys as items arrive. For a "Top K" selection, maintaining a min-heap of size K while the stream flows keeps the K items with the largest keys, so the result is available the moment the stream ends, without ever materializing or sorting the full list.
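Here is a sketch of that streaming "Top K" idea, assuming .NET 6's PriorityQueue<TElement, TPriority> and the Item type from above; it is illustrative rather than production code:

public static List<T> WeightedSampleFromStream<T>(IEnumerable<T> stream, int k)
    where T : Item
{
    // PriorityQueue dequeues the smallest priority first, so it serves as a
    // min-heap holding the k largest keys observed so far.
    var heap = new PriorityQueue<T, double>();

    foreach (var item in stream)
    {
        // Same key as the batch version: u^(1/w).
        double key = Math.Pow(Random.Shared.NextDouble(), 1.0 / item.Weight);

        if (heap.Count < k)
        {
            heap.Enqueue(item, key);
        }
        else if (heap.TryPeek(out _, out double smallestKey) && key > smallestKey)
        {
            heap.Dequeue();          // drop the current weakest key
            heap.Enqueue(item, key); // keep the stronger newcomer
        }
    }

    // Items dequeue in ascending key order; reverse so the strongest key comes
    // first, matching the descending order of the batch version.
    var result = new List<T>(heap.Count);
    while (heap.TryDequeue(out var item, out _))
    {
        result.Add(item);
    }
    result.Reverse();

    return result;
}

The heap never holds more than k entries, so memory stays bounded regardless of how long the stream runs.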
For the team back at Microsoft still managing these services, this is a small but elegant improvement. While N = 10 doesn't demand high-performance algorithms, replacing the O(N²) loop with a proper weighted sort aligns the implementation with the intended mathematical model. I suspect there might even be an old research ticket regarding this specific efficiency improvement sitting in the backlog, likely created by me before I moved on. It might be time to finally close it.
