vectorizing branched table lookup in SSE fast approximate cosine
I'm making a small game engine for personal use. The target architecture is x86_64 preferably with SSE2.
The sine/cosine function is one of the core parts, and it's implemented as a precomputed table ...
xiver77
Votes: 0
Answers: 1
error: invalid static_cast from type ‘__m256i’ {aka ‘__vector(4) long long int’} to type ‘void*’
I'm trying to compile a piece of code that uses static_cast to do something like the following:
__m256i values;
int64_t i = 1;
static_cast<void*>(values + i);
but this results i...
David
Votes: 0
Answers: 0
how to use _mm_mask_add_ps instruction correctly?
I wrote the test code below. If I set the mask to 0b1111 or 0b0000, it works fine. If I use a mask that mixes 0s and 1s, such as 0b1101 or 0b1001, the program crashes. A SIGILL signal which means illegal instructio...
HLI
Votes: 0
Answers: 0
Accessing the fields of a __m128i variable in a portable way
I am trying to use SIMD instructions to speed up the sum of elements in an array of uint8_t (i.e., sum reduction). For that purpose, I am replicating the most voted answer in this question:
Sum reduct...
AAA
Votes: 0
Answers: 1