01 Dec
Recently, I was working on a multi-threaded implementation of a function to calculate the Poisson distribution (amath_pdist). The goal was to divide the workload across multiple threads to improve performance, especially for large arrays. However, instead of achieving the expected speedup, I noticed a significant slowdown as the size of the array increased. After some investigation, I discovered the culprit: false sharing.

In this post, I’ll explain what false sharing is, show the original code causing the problem, and share the fixes that led to a substantial performance improvement.

The Problem: False Sharing in Multi-threaded Code

False sharing happens when…