1 year ago

#356530

huseyin tugrul buyukisik

Do CPUs with AVX2 or newer instruction sets support any form of caching on register renaming?

For example, consider this very simple pseudocode, whose input data contains many duplicate values:

Data:
   1 5 1 5 1 2 2 3 8 3 4 5 6 7 7 7

For all data elements:
    get particle id from data array
    idx = id/7
    index = (idx << 8) | id
    aabb = lookup[index]
    test collision of aabb with a ray

Here the CPU will very probably recompute the same result (for example, 1 for id = 1) with the same division followed by the same bitwise operations, with no loop-carried dependency.
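To make the duplicated work concrete, here is a minimal C++ sketch of the index computation from the pseudocode. The names `lookupIndex` and `computeIndices` are hypothetical, and the AABB lookup and ray test are left out because the question does not show them.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Index computation from the pseudocode: idx = id / 7, index = (idx << 8) | id.
inline std::uint32_t lookupIndex(std::uint32_t id) {
    const std::uint32_t idx = id / 7;  // integer division, as in the pseudocode
    return (idx << 8) | id;
}

// Hypothetical helper: computes the lookup index for every particle id.
// Each iteration depends only on its own id, so there is no
// loop-carried dependency between iterations.
inline std::vector<std::uint32_t> computeIndices(const std::vector<std::uint32_t>& ids) {
    std::vector<std::uint32_t> out;
    out.reserve(ids.size());
    for (std::uint32_t id : ids) {
        out.push_back(lookupIndex(id));
    }
    return out;
}
```

With the sample data, every occurrence of id 1 repeats the same division, shift, and OR to arrive at index 1 again.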

Can newer CPUs (those with AVX2 or AVX-512) remember the pattern (same data + same code path), directly rename an old input register, and return the output quickly? That would be like branch prediction, but predicting the renamed register for a temporary value instead of a branch.

I'm currently developing a collision detection algorithm on an old CPU (Bulldozer ver.1), and online C++ compilers do not give predictable performance because their CPU is shared by all visitors.

Removing duplicates with an unordered map takes about 15-30 nanoseconds per insert, and a vectorized scan of a plain array takes about 3-5 nanoseconds per insert. Both are too slow to filter out unnecessary duplicates effectively. Even a direct-mapped cache (just a modulo operation and some assignments) fails: because of cache misses, it performs even worse than the unordered map.
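For readers unfamiliar with the term, a direct-mapped duplicate filter of the kind described above might look like the sketch below. This is not the code from my project, only an illustration: the class name `DirectMappedFilter`, the power-of-two slot count, and the sentinel value are all assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical direct-mapped filter: one slot per (id mod Slots).
// Assumes 0xFFFFFFFF is never used as a real particle id.
template <std::size_t Slots>
class DirectMappedFilter {
    static_assert(Slots > 0 && (Slots & (Slots - 1)) == 0,
                  "Slots must be a power of two so modulo becomes a mask");
public:
    DirectMappedFilter() { tags_.fill(0xFFFFFFFFu); }  // mark all slots empty

    // Returns true if id is already stored in its slot (duplicate).
    // Otherwise stores id, evicting any previous occupant, and returns false.
    bool seenOrInsert(std::uint32_t id) {
        std::uint32_t& slot = tags_[id & (Slots - 1)];  // same as id % Slots
        if (slot == id) {
            return true;
        }
        slot = id;
        return false;
    }

private:
    std::array<std::uint32_t, Slots> tags_;
};
```

Two ids that map to the same slot keep evicting each other, so such a filter can report a previously seen id as new. It only avoids some of the duplicated work.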

I'm not expecting a CPU with only hundreds of physical registers to actually cache much, but it could help a lot in computing duplicate values quickly just by remembering the "same value + same code path" combination from the last iteration of a loop. At least some physics simulations with collision checking could get a decent boost.
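In software, the "remember only the last iteration" idea can be emulated with a one-entry memo, sketched below. This is an illustration of the behavior I am asking about, not a claim that any CPU does this in hardware. `LastValueMemo` and `countRecomputes` are hypothetical names, and the sentinel assumes 0xFFFFFFFF is never a real id.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical one-entry memo: reuses the previous result when the
// current id equals the id from the previous call.
struct LastValueMemo {
    std::uint32_t lastId = 0xFFFFFFFFu;  // assumed never a real id
    std::uint32_t lastIndex = 0;
    std::uint32_t recomputed = 0;        // how many times the work was redone

    std::uint32_t get(std::uint32_t id) {
        if (id != lastId) {
            lastId = id;
            lastIndex = ((id / 7) << 8) | id;  // same math as the pseudocode
            ++recomputed;
        }
        return lastIndex;
    }
};

// Counts how often the index has to be recomputed for a given id sequence.
inline std::uint32_t countRecomputes(const std::vector<std::uint32_t>& ids) {
    LastValueMemo memo;
    for (std::uint32_t id : ids) {
        memo.get(id);
    }
    return memo.recomputed;
}
```

On the sample data in its original order this recomputes 13 of the 16 values, because only the adjacent repeats (2 2 and 7 7 7) hit. On the same data sorted it recomputes 8, one per distinct id. The comparison itself adds a data-dependent branch or select, so whether this wins over simply redoing the division is something that would have to be measured.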

Processing a sorted array is faster, but is that only true for branching code? What about branchless code on the newest CPUs?

Is there any way to harness register renaming (zero latency?) as a simple cache for duplicated work?

caching

rename

cpu-registers

avx2

avx512

0 Answers