Only Coders - Where knowledge meets opportunity

Questions - ptx

Can nvcc generate an older PTX ISA version?

When I compile (nvcc -ptx axpy) a short example kernel with nvcc in CUDA toolkit 11.4: __global__ void axpy(float a, float* x, float* y) { y[threadIdx.x] = a * x[threadIdx.x]; } I get this ptx: // ...

Steve Cox

c++

cuda

ptx

Votes: 0

Answers: 0

Why is PyTorch 1.7 with CUDA 10.1 not compatible with the Nvidia A100 Ampere architecture (according to the PTX compatibility principle)?

According to Nvidia's official documentation, if a CUDA application is built to include PTX, then because PTX is forward-compatible, it is supported to run on any GPU with compute capability highe...

Seven link bob

pytorch

cuda

gpu

ptx

Votes: 0

Answers: 1

Latest Answer

Following @talonmies' reminder, I also posted the same question on discuss.pytorch.org. The answer is that PyTorch 1.7 uses cuDNN 7, which is not compatible with the A100. cuDNN 7.6.5 is not supported by ...

Seven link bob

Get the PTX dump when running TensorRT

I am running an ONNX model through TensorRT. I can verify that inference is running on the GPU through the results and nvsys profile logs. However, I would like to see the corresponding PTX binary tha...

mikepapadim

gpu

nvidia

tensorrt

ptx

Votes: 0

Answers: 0

Why does NVCC not optimize away ceilf() for literals?

(Followup question for Compile-time ceiling function, for literals, in C?) Considering the following CUDA function: __device__ int foo_f() { return ceilf(1007.1111); } It should be easy to optimize t...

ein supports Moderator Strike

floating-point

cuda

compiler-optimization

ceil

ptx

Votes: 0

Answers: 1

Latest Answer

As you are no doubt fully aware, PTX is a virtual assembly language and isn't run by the GPU. If we compile your code to machine code, we see this: $ cat bogogogo.cu __device__ int foo_f() { return ce...

talonmies
