Auto-vectorization for hand-unrolled initialized tiled-computation versus simple loop with no initialization
While optimizing the innermost 4-versus-4 comparison of an AABB collision detection algorithm, I am stuck trying to simplify the code while gaining (or at least retaining) performance.
Here is the v...
huseyin tugrul buyukisik
Votes: 0
Answers: 0
Why does the pseudocode of _mm_insert_ps calculate %8?
In the Intel Intrinsics Guide, the pseudocode for the operation of _mm_insert_ps includes the following:
FOR j := 0 to 3
    i := j*32
    IF imm8[j%8]
        dst[i+31:i] := 0
    ELSE
...

Brotcrunsher
Votes: 0
Answers: 1
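To make the indexing in the quoted pseudocode concrete, here is a minimal scalar sketch in C of just the zero-mask test from the excerpt. The function name `zeroed_lanes` is hypothetical and not part of any Intel API, and the elided ELSE branch is deliberately not modeled. Because j only runs from 0 to 3, `j % 8` always equals `j`, so the modulo does not change which bit of imm8 is read.

```c
#include <stdint.h>

/* Hypothetical helper (not an Intel intrinsic): returns a 4-bit value whose
   bit j is set when lane j would be zeroed, i.e. when imm8[j%8] is 1 in the
   pseudocode excerpt. Only the IF test is modeled; the ELSE branch is not. */
unsigned zeroed_lanes(uint8_t imm8)
{
    unsigned lanes = 0;
    for (int j = 0; j < 4; ++j) {
        /* j takes only the values 0..3, so j % 8 == j on every iteration. */
        if ((imm8 >> (j % 8)) & 1u)
            lanes |= 1u << j;
    }
    return lanes;
}
```

For example, `zeroed_lanes(0x05)` reports lanes 0 and 2, and the upper four bits of imm8 never influence the result of this loop.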
Is there a way to cast integers to bytes using SSE, knowing these ints are in the range of bytes?
In an xmm register I have 3 integers with values less than 256. I want to cast these to bytes, and save them to memory. I don't know how to approach it.
I was thinking about getting those numbers from...
thomas113412
Votes: 0
Answers: 1
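One possible approach to the narrowing described in the question, sketched under assumptions: the values sit in the low three 32-bit lanes, each is in 0..255, and SSE2 is enough. The helper name `store3_as_bytes` is hypothetical, and the sketch loads from an array only so that it is self-contained; with the data already in an xmm register the load would be skipped. It packs 32-bit lanes to 16-bit with `_mm_packs_epi32`, then to 8-bit with `_mm_packus_epi16`, and copies out only three bytes. A scalar fallback is included for builds without SSE2.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: narrows the first three 32-bit lanes of `in`
   (each assumed to be in 0..255) to bytes and stores them in `out`. */
void store3_as_bytes(const int32_t in[4], uint8_t out[3])
{
#if defined(__SSE2__)
    __m128i v   = _mm_loadu_si128((const __m128i *)in);
    __m128i w16 = _mm_packs_epi32(v, v);      /* 32-bit -> 16-bit, signed saturation */
    __m128i w8  = _mm_packus_epi16(w16, w16); /* 16-bit -> 8-bit, unsigned saturation */
    int32_t low = _mm_cvtsi128_si32(w8);      /* low 4 bytes now hold lanes 0..3 */
    memcpy(out, &low, 3);                     /* write only the 3 wanted bytes */
#else
    for (int k = 0; k < 3; ++k)               /* portable fallback without SSE2 */
        out[k] = (uint8_t)in[k];
#endif
}
```

Copying through a 32-bit temporary avoids writing a fourth byte past the destination, which matters when exactly three bytes of memory are available.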