1 year ago
#356530
huseyin tugrul buyukisik
Do CPUs with AVX2 or newer instruction sets support any form of caching on register renaming?
For example, consider this very simple pseudocode, where the input contains many duplicate values:
Data:
    1 5 1 5 1 2 2 3 8 3 4 5 6 7 7 7
For all data elements:
    get particle id from data array
    idx = id/7
    index = (idx << 8) | id
    aabb = lookup[index]
    test collision of aabb with a ray
Here the CPU will very probably recompute the same result for a repeated id such as 1: the same division followed by the same bitwise operations, with no loop-carried dependency.
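For concreteness, the pseudocode could be written in C++ roughly as follows. This is only a sketch: `lookupIndex` and `gatherByIndex` are hypothetical names, the table element type is left generic, and the ray test itself is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Index computation from the pseudocode: idx = id / 7, index = (idx << 8) | id.
// For ids below 256 the two bit fields do not overlap.
inline std::uint32_t lookupIndex(std::uint32_t id) {
    const std::uint32_t idx = id / 7;
    return (idx << 8) | id;
}

// Hypothetical driver: fetches one table entry per particle id.
// Each iteration depends only on its own id, so there is no
// loop-carried dependency between iterations.
template <typename T>
std::vector<T> gatherByIndex(const std::vector<std::uint32_t>& ids,
                             const std::vector<T>& lookup) {
    std::vector<T> out;
    out.reserve(ids.size());
    for (std::uint32_t id : ids) {
        const std::uint32_t index = lookupIndex(id);
        assert(index < lookup.size());  // the table must cover every index produced
        out.push_back(lookup[index]);
    }
    return out;
}
```

Every occurrence of the same id repeats the same division, shift, OR and load, which is the duplicated work in question.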
Can newer CPUs (those supporting AVX2 or AVX-512) remember the pattern (same data + same code path), directly rename a register that already holds the old result, and return the output quickly? That would be like branch prediction, except that it predicts which renamed register holds a temporary value.
I'm currently developing a collision detection algorithm on an old CPU (Bulldozer v1), and online C++ compilers don't give predictable performance numbers because their CPUs are shared by all visitors.
Removing duplicates with an unordered map takes about 15-30 nanoseconds per insert, and a vectorized scan of a plain array takes about 3-5 nanoseconds per insert. Both are too slow to filter out the unnecessary duplicates effectively. Even a direct-mapped cache (just a modulo operation and a few assignments) fails: because of cache misses, it performs even worse than the unordered map.
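To make the direct-mapped variant concrete, here is a minimal sketch of such a filter. The names `DirectMappedFilter`, `countProcessed` and `kEmptySlot` are hypothetical, and it assumes the value 0xFFFFFFFF is never a valid id; the version I actually benchmarked may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed sentinel: marks a slot that has not been written yet.
constexpr std::uint32_t kEmptySlot = 0xFFFFFFFFu;

// Direct-mapped duplicate filter: one slot per (id % slotCount).
class DirectMappedFilter {
public:
    explicit DirectMappedFilter(std::size_t slotCount)
        : slots_(slotCount, kEmptySlot) {
        assert(slotCount > 0);
    }

    // Returns true if the id should be processed (its slot held something else),
    // false if the slot already holds this id (duplicate, can be skipped).
    bool insert(std::uint32_t id) {
        assert(id != kEmptySlot);
        std::uint32_t& slot = slots_[id % slots_.size()];
        if (slot == id) return false;
        slot = id;  // evicts whatever id was stored here before
        return true;
    }

private:
    std::vector<std::uint32_t> slots_;
};

// Counts how many ids pass the filter, i.e. how many would still be processed.
std::size_t countProcessed(const std::vector<std::uint32_t>& ids,
                           std::size_t slotCount) {
    DirectMappedFilter filter(slotCount);
    std::size_t processed = 0;
    for (std::uint32_t id : ids) {
        if (filter.insert(id)) ++processed;
    }
    return processed;
}
```

On the 16 sample values above, 16 slots let 8 ids through (every duplicate is caught), while 4 slots let 12 through, because ids such as 1 and 5 keep evicting each other from the same slot. An eviction only costs a redundant computation; it never produces a wrong result.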
I'm not expecting a CPU with only a few hundred physical registers to actually cache many things, but it could help a lot in computing duplicate values quickly if it just remembered the "same value + same code path" combination from the last iteration of a loop. At least some physics simulations with collision checking could get a decent boost.
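A software analogue of "remember only the last iteration" might look like the sketch below. It illustrates the idea and is not a claim about what any CPU does in hardware; `ReuseStats` and `indicesWithLastValueReuse` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counters showing how often the previous result could be reused.
struct ReuseStats {
    std::size_t computed = 0;
    std::size_t reused = 0;
};

// Same index computation as in the pseudocode.
inline std::uint32_t lookupIndex(std::uint32_t id) {
    const std::uint32_t idx = id / 7;
    return (idx << 8) | id;
}

// One-entry memo: recompute only when the id differs from the previous one.
std::vector<std::uint32_t> indicesWithLastValueReuse(
        const std::vector<std::uint32_t>& ids, ReuseStats* stats = nullptr) {
    std::vector<std::uint32_t> out;
    out.reserve(ids.size());
    bool havePrev = false;
    std::uint32_t prevId = 0;
    std::uint32_t prevIndex = 0;
    for (std::uint32_t id : ids) {
        if (havePrev && id == prevId) {
            if (stats) ++stats->reused;      // same input as last iteration
        } else {
            prevIndex = lookupIndex(id);     // new input: do the real work
            prevId = id;
            havePrev = true;
            if (stats) ++stats->computed;
        }
        out.push_back(prevIndex);
    }
    assert(out.size() == ids.size());
    return out;
}
```

On the unsorted sample data only 3 of the 16 elements repeat the previous id, so 13 are still computed. If the same values are sorted first, 8 are computed and 8 are reused. Note that the comparison adds a branch and a dependency on the previous iteration, which the original loop does not have.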
Processing a sorted array is faster, but is that only true for branching code? What about branchless code on the newest CPUs?
Is there any way to harness register renaming (zero latency?) as a simple cache for duplicated work?
caching
rename
cpu-registers
avx2
avx512
0 Answers