(For the hardware setup I'm using, see this post. For the baseline performance measurements for this benchmark, see this post.) In my previous post in the series, I described the benchmark I'm optimizing – populating a 1-TB clustered index as fast as possible using default values. I proved to you that I had an IO bottleneck [...]