Why is PyTorch 1.7 with CUDA 10.1 not compatible with the NVIDIA A100 (Ampere architecture), given the PTX compatibility principle?
According to NVIDIA's official documentation, if a CUDA application is built to include PTX, it is forward-compatible because PTX is forward-compatible, meaning the PTX is supported to run on any GPU with compute capability highe...

Seven link bob
Votes: 0
Answers: 1
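A minimal Python sketch of the two compatibility rules the question relies on, as stated in NVIDIA's CUDA documentation: SASS (cubin) built for compute capability X.y runs only on X.z devices with z >= y, while PTX built for X.y can be JIT-compiled by the driver for any device of compute capability X.y or higher. The function name `can_run` and the architecture lists are hypothetical illustrations, not the actual contents of any PyTorch 1.7 build, and the sketch ignores driver-side limits such as the newest PTX ISA version the installed driver can JIT-compile.

```python
from typing import Optional, Sequence, Tuple

CC = Tuple[int, int]  # (major, minor) compute capability, e.g. (8, 0) for A100


def can_run(device_cc: CC, sass_ccs: Sequence[CC], ptx_ccs: Sequence[CC]) -> Optional[str]:
    """Hypothetical helper: report how a fat binary could run on a device.

    Returns "sass" if a matching cubin exists, "ptx-jit" if the driver could
    JIT-compile embedded PTX, or None if neither rule applies.
    """
    major, minor = device_cc
    # SASS for X.y runs only on devices with the same major version X and minor >= y.
    for m, n in sass_ccs:
        if m == major and n <= minor:
            return "sass"
    # PTX for X.y can be JIT-compiled for any device whose compute capability is >= X.y.
    # Tuples compare lexicographically, so (7, 5) <= (8, 0) is True.
    for cc in ptx_ccs:
        if cc <= device_cc:
            return "ptx-jit"
    return None


a100 = (8, 0)
# Hypothetical build targeting up to sm_75, with and without embedded PTX.
print(can_run(a100, [(7, 0), (7, 5)], [(7, 5)]))  # ptx-jit
print(can_run(a100, [(7, 0), (7, 5)], []))        # None
```

Under these rules, a binary whose newest target is sm_75 can only run on a compute capability 8.0 device if it also embeds PTX; with SASS alone nothing matches, because SASS compatibility never crosses a major version.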
Get the PTX dump when running TensorRT
I am running an ONNX model through TensorRT.
I can verify that inference is running on the GPU from the results and the nsys profile logs.
However, I would like to see the corresponding PTX binary tha...
mikepapadim
Votes: 0
Answers: 0
Why does NVCC not optimize away ceilf() for literals?
(Follow-up to the question "Compile-time ceiling function, for literals, in C?")
Considering the following CUDA function:
__device__ int foo_f() { return ceilf(1007.1111); }
It should be easy to optimize t...
ein supports Moderator Strike
Votes: 0
Answers: 1
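To make the expected result concrete, the Python sketch below mimics the constant folding the question expects, assuming the call resolves to the single-precision `ceilf(float)`: the double literal `1007.1111` is first narrowed to float, then rounded up, then converted to the function's `int` return type. The helper names are hypothetical, and the sketch only models the arithmetic; it says nothing about why NVCC does or does not fold the call.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def foo_f_folded() -> int:
    """Hypothetical model of foo_f(): return ceilf(1007.1111) as an int."""
    # The literal 1007.1111 is a double; passing it to ceilf(float) narrows it to float.
    narrowed = to_float32(1007.1111)
    # ceilf() rounds up to the next integral value; the return converts it to int.
    return math.ceil(narrowed)


print(foo_f_folded())  # 1008
```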