Here is a concise learning recap summarizing your journey from “Memory Crashes” to a “Production-Ready Clustering Pipeline.”

From OOM to Optimized: Scaling Unstructured Clustering

Goal: Run Leiden community detection on 30k–100k text embeddings whose pairwise-similarity graph is dense.
Challenge: The process was crashing (Out of Memory) even on moderate datasets.

  1. The “Graph Explosion” Problem
  We discovered that a standard Threshold Graph (e.g., “connect everyone > 70% similar”) scales quadratically and becomes untenable for large datasets.
  • The Math: 100k nodes at 50% density means ~5 billion stored edges (100,000² × 0.5, counting both directions). At roughly 12 bytes per edge, that is ~60 GB of RAM (see the back-of-envelope sketch after this list).
  • The Fix: Switch to k-Nearest Neighbors (k-NN).
    • Instead of “Connect to everyone,” we say “Connect to the top 30.”
    • This caps memory usage linearly (N × 30 edges) instead of quadratically (N² edges).
    • Result: Graph size dropped from ~60 GB to ~50 MB.
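A quick back-of-envelope check makes the gap concrete. This is a rough sketch assuming ~12 bytes per stored edge (two int32 indices plus one float32 weight); exact costs vary by graph library.

```python
# Order-of-magnitude memory comparison for a 100k-node graph.
# Assumption: ~12 bytes per stored edge (two int32 indices + one float32 weight).
n = 100_000
bytes_per_edge = 12

dense_edges = int(n * n * 0.5)   # 50% density, both directions ≈ 5 billion edges
knn_edges = n * 30               # top-30 neighbors per node = 3 million edges

print(f"Threshold graph: {dense_edges * bytes_per_edge / 1e9:.0f} GB")  # ~60 GB
print(f"k-NN graph:      {knn_edges * bytes_per_edge / 1e6:.0f} MB")    # ~36 MB raw; ~50 MB with overhead
```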
  2. The “Hybrid” Approach
  Pure k-NN was too aggressive: it bridged distinct clusters together, dropping the community count from ~100 to ~20.
  • The Fix: Hybrid k-NN + Thresholding.
    • Step 1: Use PyNNDescent to find the top 30 candidates (Fast).
    • Step 2: Apply a strict mask (Similarity > 0.7) to prune weak links (Precise).
  • Outcome: We got the speed and memory safety of k-NN with the high-quality separation of a threshold (radius) graph; a minimal sketch follows below.
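Here is a minimal sketch of the hybrid build, assuming `embeddings` is an (N, dim) float32 array; with PyNNDescent's cosine metric, distance = 1 − similarity, which is how the mask below translates the 0.7 threshold. Variable names are illustrative.

```python
import numpy as np
from pynndescent import NNDescent

K, SIM_THRESHOLD = 30, 0.7

# Step 1: approximate top-K neighbors (fast).
index = NNDescent(embeddings, metric="cosine", n_neighbors=K)
neighbors, distances = index.neighbor_graph          # both arrays are (N, K)

# Step 2: strict similarity mask to prune weak links (precise).
similarities = 1.0 - distances                       # cosine distance -> similarity
rows = np.repeat(np.arange(embeddings.shape[0]), K)
cols = neighbors.ravel()
sims = similarities.ravel()

# Keep only strong links; `rows != cols` also drops each point's self-match.
mask = (sims > SIM_THRESHOLD) & (rows != cols)
rows, cols, weights = rows[mask], cols[mask], sims[mask]
```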
  3. The “Embedding Spike” Problem
  Even before clustering, generating embeddings for 30k texts caused crashes.
  • The Cause:
    • Tensor Expansion: Sending all 30k texts to the model at once created massive temporary activation matrices (~20 GB of RAM).
    • The “VStack” Trap: Accumulating results in a Python list and then calling np.vstack(list) momentarily doubles memory usage, because the list of batch arrays and the newly stacked array coexist.
  • The Fix:
    • Batching: Process 256 texts at a time to keep peak “expansion” memory low (~200 MB).
    • Pre-allocation: Create an empty np.zeros((N, dim)) array first, then fill it slice-by-slice. Memory usage stays flat, with no spikes (see the sketch after this list).
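A sketch of the flat-memory loop. It assumes a sentence-transformers-style model whose `.encode()` returns a (batch, dim) NumPy array; the model name, `texts`, and the 384-dimension constant are illustrative stand-ins, not names from the original script.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding backend

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model (dim = 384)
texts = [...]                                    # the ~30k input strings

BATCH, EMB_DIM = 256, 384
embeddings = np.zeros((len(texts), EMB_DIM), dtype=np.float32)  # pre-allocate once

for start in range(0, len(texts), BATCH):
    end = min(start + BATCH, len(texts))
    # Fill the slice in place: no growing list, no final np.vstack spike.
    embeddings[start:end] = model.encode(texts[start:end])
```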
  4. Zero-Copy Engineering
  We learned that Python objects are heavy: millions of small tuples cost far more memory than the raw numbers they hold.
  • Bad: edges = list(zip(rows, cols)) creates millions of tuple objects (Slow, Heavy).
  • Good: edges = np.column_stack((rows, cols)) keeps data in pure C-contiguous memory (Fast, Tiny).
  • Result: We can pass millions of edges to igraph in milliseconds using almost no extra RAM.

Final Verdict
By combining Batched Inference, Pre-allocated Memory, and Hybrid k-NN, we turned a script that crashed on 30k rows into a pipeline that scales comfortably to 100k+ rows on a standard laptop, while maintaining the same clustering quality as the brute-force threshold method. The closing sketch below ties the pieces together.
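To tie it together, a closing sketch of handing the pruned edge list to igraph and running Leiden. It reuses `rows`, `cols`, and `weights` from the hybrid sketch above; `n_nodes` (the number of embedded texts) is assumed, as is a recent python-igraph (>= 0.10) that accepts a NumPy edge array directly.

```python
import numpy as np
import igraph as ig
import leidenalg as la

edges = np.column_stack((rows, cols))   # one C-contiguous array, zero tuple objects

g = ig.Graph(n=n_nodes, edges=edges)    # explicit n keeps isolated nodes alive
g.es["weight"] = weights
g.simplify(combine_edges="max")         # merge i->j / j->i duplicates from the k-NN pass

partition = la.find_partition(g, la.ModularityVertexPartition, weights="weight")
print(f"Found {len(partition)} communities")
```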