1 year ago
#365688
ancientjpeg
What happens when multiple GPU threads in a single warp/wave attempt to write to the same shared memory location?
I've been learning a lot about parallel/GPU programming recently, and I've run into a situation that has stumped me. What happens when two threads in a warp/wave attempt to write to the exact same location in shared memory? Specifically, I'm confused about how this can even occur when, to my understanding, the threads of a warp all execute the exact same instruction at the same time.
For instance, say you dispatch a shader that runs 32 threads, the size of a typical non-AMD warp. Assuming no dynamic branching (as I understand it, divergent code will normally call up a second warp to execute the branched path, though I could be very wrong about that), what happens if every single thread tries to write to the same location in shared memory?
Though I believe my question applies to any kind of GPU code, here's a simple example in HLSL:
groupshared uint test_target;
#pragma kernel WarpWriteTest
[numthreads(32, 1, 1)]
void WarpWriteTest (uint thread_id : SV_GroupIndex) {
    // Every thread in the group writes its own index to the same groupshared variable.
    test_target = thread_id;
}
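For comparison, here is a minimal sketch of a variant whose final value does not depend on which thread's write happens to land. It is not a description of what the hardware does with the plain store; it only shows the usual way to make a same-address write well-defined, using the HLSL atomic `InterlockedMax` on a `groupshared uint`. It assumes a Unity-style compute shader (matching the `#pragma kernel` in the question) in its own file; the kernel name `WarpWriteMaxTest` and the output buffer `result` are hypothetical.

```hlsl
#pragma kernel WarpWriteMaxTest

groupshared uint test_target;
// Hypothetical output buffer so the host can read the result back.
RWStructuredBuffer<uint> result;

[numthreads(32, 1, 1)]
void WarpWriteMaxTest (uint thread_id : SV_GroupIndex)
{
    // groupshared memory starts uninitialized, so one thread clears it first.
    if (thread_id == 0)
        test_target = 0;
    // Make the cleared value visible to every thread in the group.
    GroupMemoryBarrierWithGroupSync();

    // Atomic read-modify-write: every thread's value is folded in,
    // so the outcome does not depend on the order the writes land in.
    InterlockedMax(test_target, thread_id);
    GroupMemoryBarrierWithGroupSync();

    // With 32 threads (indices 0..31), test_target is now 31.
    if (thread_id == 0)
        result[0] = test_target;
}
```

The trade-off is that the atomics are serialized by the memory hardware, so this is typically slower than a plain store when many lanes hit one address.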
I understand this is almost certainly implementation-specific, but I'm curious what would generally happen in a situation like this. Obviously, you'd end up with an unpredictable value stored in `test_target`, but what I'm really curious about is what happens at the hardware level. Does the entire warp have to wait until every write is complete, at which point it continues executing code in lockstep (and would this result in noticeable latency)? Or is there some other mechanism in GPU shared memory/cache that I'm missing?
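Another hedged sketch, relevant to the hardware question: with Shader Model 6.0 wave intrinsics, the lanes of a wave can combine their values through a cross-lane operation (`WaveActiveMax`) and then elect a single lane (`WaveIsFirstLane()`) to touch shared memory, so at most one write per wave targets `test_target`. This assumes a compiler targeting SM 6.0 or later (for example DXC); in Unity this may need extra compile pragmas, which are not shown. The kernel name `WaveElectTest` and the `result` buffer are hypothetical. No particular wave size is assumed: if the 32-thread group is split across several smaller waves, each wave's elected lane still writes through an atomic, so the result stays well-defined.

```hlsl
#pragma kernel WaveElectTest

groupshared uint test_target;
// Hypothetical output buffer so the host can read the result back.
RWStructuredBuffer<uint> result;

[numthreads(32, 1, 1)]
void WaveElectTest (uint thread_id : SV_GroupIndex)
{
    // Clear the shared value before anyone combines into it.
    if (thread_id == 0)
        test_target = 0;
    GroupMemoryBarrierWithGroupSync();

    // Cross-lane reduction inside the wave; this does not use shared memory.
    // Called by all lanes, outside any branch, so every lane contributes.
    uint wave_max = WaveActiveMax(thread_id);

    // Only one lane per wave writes to shared memory. The atomic covers
    // the case where the group spans more than one wave.
    if (WaveIsFirstLane())
        InterlockedMax(test_target, wave_max);
    GroupMemoryBarrierWithGroupSync();

    if (thread_id == 0)
        result[0] = test_target;
}
```

This pattern sidesteps the question of how the hardware resolves many lanes storing to one address by making sure that, within a wave, it never happens.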
To clarify: I'm not asking what happens when multiple threads try to access a value in global memory/DRAM. I'd be curious to know, but my question specifically concerns the shared memory of a thread group. I also apologize if this information is readily available somewhere else; as anyone reading might know, GPU terminology in general can be very nebulous and non-standardized, so I've had difficulty even knowing what to look for.
Thank you so much!
parallel-processing
gpu
hardware
hlsl
0 Answers