CUDA Python Low-level Bindings

Overview

CUDA-Python

Building

Requirements

Dependencies of the CUDA-Python bindings and some versions that are known to work are as follows:

  • Driver: Linux (450.80.02 or later) Windows(456.38 or later)
  • CUDA Toolkit 11.0 to 11.4 - e.g. 11.4.48
  • Cython - e.g. 0.29.21
  • Versioneer - e.g. 0.20

Compilation

To compile the extension in-place, run:

python setup.py build_ext --inplace

To compile for debugging the extension modules with gdb, pass the --debug argument to setup.py.

Develop installation

You can use

python setup.py develop

to use the module in-place in your current Python environment (e.g. for testing of porting other libraries to use the binding).

Build the Docs

conda env create -f docs/environment-docs.yml
conda activate cuda-python-docs

Then compile and install cuda-python following the steps above.

cd docs
make html
open build/html/index.html

Build the Docs

conda env create -f docs_src/environment-docs.yml
conda activate cuda-python-docs

Then compile and install cuda-python following the steps above.

cd docs_src
make html
open build/html/index.html

Publish the Docs

git checkout gh-pages
cd docs_src
make html
cp -a build/html/. ../docs/

Testing

Requirements

Dependencies of the test execution and some versions that are known to work are as follows:

  • numpy-1.19.5
  • numba-0.53.1
  • matplotlib-3.3.4
  • scipy-1.6.3
  • pytest-benchmark-3.4.1

Unit-tests

You can run the included tests with:

pytest

Samples

You can run the included tests with:

pytest examples

Benchmark

You can run benchmark only tests with:

pytest --benchmark-only

Examples

The included examples are:

  • examples/extra/jit_program.py: Demonstrates the use of the API to compile and launch a kernel on the device. Includes device memory allocation / deallocation, transfers between host and device, creation and usage of streams, and context management.
  • examples/extra/numba_emm_plugin.py: Implements a Numba External Memory Management plugin, showing that this CUDA Python Driver API can coexist with other wrappers of the driver API.
Comments
  • Fails to build on AmazonLinux

    Fails to build on AmazonLinux

    Hi,

    I checked out the package and tried to build it on AmazonLinux but it fails to compile. Please see the build output below. I also tried all other commands there were mentioned in installation guide, but all failed with the same issue.

    Cuda : 11.2 GCC: 9.3

    $ python setup.py build
    Compiling cuda/_cuda/ccuda.pyx because it changed.
    Compiling cuda/_cuda/cnvrtc.pyx because it changed.
    [1/2] Cythonizing cuda/_cuda/ccuda.pyx
    [2/2] Cythonizing cuda/_cuda/cnvrtc.pyx
    Compiling cuda/_lib/utils.pyx because it changed.
    [1/1] Cythonizing cuda/_lib/utils.pyx
    Compiling cuda/_lib/ccudart/ccudart.pyx because it changed.
    Compiling cuda/_lib/ccudart/utils.pyx because it changed.
    [1/2] Cythonizing cuda/_lib/ccudart/ccudart.pyx
    [2/2] Cythonizing cuda/_lib/ccudart/utils.pyx
    Compiling cuda/ccuda.pyx because it changed.
    Compiling cuda/ccudart.pyx because it changed.
    Compiling cuda/cnvrtc.pyx because it changed.
    Compiling cuda/cuda.pyx because it changed.
    Compiling cuda/cudart.pyx because it changed.
    Compiling cuda/nvrtc.pyx because it changed.
    [1/6] Cythonizing cuda/ccuda.pyx
    [2/6] Cythonizing cuda/ccudart.pyx
    [3/6] Cythonizing cuda/cnvrtc.pyx
    [4/6] Cythonizing cuda/cuda.pyx
    [5/6] Cythonizing cuda/cudart.pyx
    [6/6] Cythonizing cuda/nvrtc.pyx
    Compiling cuda/tests/test_ccuda.pyx because it changed.
    Compiling cuda/tests/test_ccudart.pyx because it changed.
    Compiling cuda/tests/test_interoperability_cython.pyx because it changed.
    [1/3] Cythonizing cuda/tests/test_ccuda.pyx
    [2/3] Cythonizing cuda/tests/test_ccudart.pyx
    [3/3] Cythonizing cuda/tests/test_interoperability_cython.pyx
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/cuda
    copying cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/_version.py -> build/lib.linux-x86_64-3.8/cuda
    creating build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_cuda
    creating build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib
    creating build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/__init__.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/kernels.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/perf_test_utils.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_cupy.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_launch_latency.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_numba.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_pointer_attributes.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    creating build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/__init__.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cuda.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cudart.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cython.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_kernelParams.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_nvrtc.py -> build/lib.linux-x86_64-3.8/cuda/tests
    creating build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/__init__.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/_cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.h -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_lib/dlfcn.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.h -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/tests/test_ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability_cython.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability_cython.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/_lib/ccudart/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    UPDATING build/lib.linux-x86_64-3.8/cuda/_version.py
    set build/lib.linux-x86_64-3.8/cuda/_version.py to '11.7.1'
    running build_ext
    building 'cuda._cuda.ccuda' extension
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/cuda
    creating build/temp.linux-x86_64-3.8/cuda/_cuda
    /home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -fPIC -I./cuda -I./cuda/_cuda -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include -I/usr/local/cuda-11.2/include -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include/python3.8 -c cuda/_cuda/ccuda.cpp -o build/temp.linux-x86_64-3.8/cuda/_cuda/ccuda.o -std=c++14 -fpermissive -Wno-deprecated-declarations -D _GLIBCXX_ASSERTIONS -fno-var-tracking-assignments -O3
    cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
    cuda/_cuda/ccuda.cpp: In function 'int __pyx_f_4cuda_5_cuda_5ccuda_cuPythonInit()':
    cuda/_cuda/ccuda.cpp:4202:138: error: 'CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM' was not declared in this scope
     4202 |         __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0x1B58, CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 836, __pyx_L4_error)
          |                                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:4924:137: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
     4924 |         __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0xFA0, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 917, __pyx_L4_error)
          |                                                                                                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:5637:152: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
     5637 |       __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuGetErrorString"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuGetErrorString), 0x1770, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 997, __pyx_L4_error)
          |                                                                                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:15609:73: error: 'CUflushGPUDirectRDMAWritesTarget' was not declared in this scope
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:15609:122: error: 'CUflushGPUDirectRDMAWritesScope' was not declared in this scope
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:15609:167: warning: expression list treated as compound expression in initializer [-fpermissive]
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                                                                                                                       ^
    cuda/_cuda/ccuda.cpp:16977:94: error: 'CUexecAffinityType' has not been declared
    16977 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int *__pyx_v_pi, CUexecAffinityType __pyx_v_typename, CUdevice __pyx_v_dev) {
          |                                                                                              ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int*, int, CUdevice)':
    cuda/_cuda/ccuda.cpp:17082:30: error: expected primary-expression before '(' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                              ^
    cuda/_cuda/ccuda.cpp:17082:32: error: expected primary-expression before ')' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                ^
    cuda/_cuda/ccuda.cpp:17082:34: error: expected primary-expression before 'int'
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                  ^~~
    cuda/_cuda/ccuda.cpp:17082:41: error: 'CUexecAffinityType' was not declared in this scope
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                         ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:17082:69: error: expected primary-expression before ')' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                                                     ^
    cuda/_cuda/ccuda.cpp:17082:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport'
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                   ~                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                       )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:17319:86: error: 'CUexecAffinityParam' has not been declared
    17319 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUcontext *__pyx_v_pctx, CUexecAffinityParam *__pyx_v_paramsArray, int __pyx_v_numParams, unsigned int __pyx_v_flags, CUdevice __pyx_v_dev) {
          |                                                                                      ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUctx_st**, int*, int, unsigned int, CUdevice)':
    cuda/_cuda/ccuda.cpp:17424:30: error: expected primary-expression before '(' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                              ^
    cuda/_cuda/ccuda.cpp:17424:32: error: expected primary-expression before ')' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                ^
    cuda/_cuda/ccuda.cpp:17424:44: error: expected primary-expression before '*' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                            ^
    cuda/_cuda/ccuda.cpp:17424:45: error: expected primary-expression before ',' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                             ^
    cuda/_cuda/ccuda.cpp:17424:47: error: 'CUexecAffinityParam' was not declared in this scope
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                               ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:17424:68: error: expected primary-expression before ',' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                    ^
    cuda/_cuda/ccuda.cpp:17424:70: error: expected primary-expression before 'int'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                      ^~~
    cuda/_cuda/ccuda.cpp:17424:75: error: expected primary-expression before 'unsigned'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                           ^~~~~~~~
    cuda/_cuda/ccuda.cpp:17424:97: error: expected primary-expression before ')' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                                                 ^
    cuda/_cuda/ccuda.cpp:17424:99: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                   ~                                                                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                                   )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:20397:67: error: 'CUexecAffinityParam' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                   ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:88: error: '__pyx_v_pExecAffinity' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                        ^~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:111: error: 'CUexecAffinityType' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                                               ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:146: warning: expression list treated as compound expression in initializer [-fpermissive]
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                                                                                  ^
    cuda/_cuda/ccuda.cpp:33564:75: error: 'CUDA_ARRAY_MEMORY_REQUIREMENTS' was not declared in this scope
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:107: error: '__pyx_v_memoryRequirements' was not declared in this scope
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:143: error: expected primary-expression before '__pyx_v_array'
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                                                                                               ^~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:167: error: expected primary-expression before '__pyx_v_device'
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
    ---
    truncated due to git issue limit
    ---
    cuda/_cuda/ccuda.cpp:58806:44: error: 'CUgraphMem_attribute' was not declared in this scope
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                                            ^~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:58806:66: error: expected primary-expression before 'void'
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                                                                  ^~~~
    cuda/_cuda/ccuda.cpp:58806:74: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute'
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                   ~                                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                          )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:64515:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                 ^~~~~~~~~~~~
          |                                                                 CUsurfObject
    cuda/_cuda/ccuda.cpp:64515:79: error: '__pyx_v_object_out' was not declared in this scope
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                               ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:99: error: expected primary-expression before 'void'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                   ^~~~
    cuda/_cuda/ccuda.cpp:64515:127: error: expected primary-expression before '__pyx_v_destroy'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                               ^~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:144: error: expected primary-expression before 'unsigned'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:182: error: expected primary-expression before 'unsigned'
    64515 | atic CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                                                    ^~~~~~~~
    
    cuda/_cuda/ccuda.cpp:64515:208: warning: expression list treated as compound expression in initializer [-fpermissive]
    64515 | a_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                                                    ^
    
    cuda/_cuda/ccuda.cpp:64686:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                 ^~~~~~~~~~~~
          |                                                                 CUsurfObject
    cuda/_cuda/ccuda.cpp:64686:94: error: expected primary-expression before 'unsigned'
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                              ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64686:120: warning: expression list treated as compound expression in initializer [-fpermissive]
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                                                        ^
    cuda/_cuda/ccuda.cpp:64857:66: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                  ^~~~~~~~~~~~
          |                                                                  CUsurfObject
    cuda/_cuda/ccuda.cpp:64857:95: error: expected primary-expression before 'unsigned'
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                               ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64857:121: warning: expression list treated as compound expression in initializer [-fpermissive]
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                                                         ^
    cuda/_cuda/ccuda.cpp:65028:93: error: 'CUuserObject' has not been declared
    65028 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count, unsigned int __pyx_v_flags) {
          |                                                                                             ^~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph, int, unsigned int, unsigned int)':
    cuda/_cuda/ccuda.cpp:65133:30: error: expected primary-expression before '(' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                              ^
    cuda/_cuda/ccuda.cpp:65133:32: error: expected primary-expression before ')' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                ^
    cuda/_cuda/ccuda.cpp:65133:41: error: expected primary-expression before ',' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                         ^
    cuda/_cuda/ccuda.cpp:65133:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                           ^~~~~~~~~~~~
          |                                           CUsurfObject
    cuda/_cuda/ccuda.cpp:65133:57: error: expected primary-expression before 'unsigned'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                                         ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65133:71: error: expected primary-expression before 'unsigned'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                                                       ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65133:85: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                   ~                                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                     )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:65199:94: error: 'CUuserObject' has not been declared
    65199 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                              ^~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph, int, unsigned int)':
    cuda/_cuda/ccuda.cpp:65304:30: error: expected primary-expression before '(' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                              ^
    cuda/_cuda/ccuda.cpp:65304:32: error: expected primary-expression before ')' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                ^
    cuda/_cuda/ccuda.cpp:65304:41: error: expected primary-expression before ',' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                         ^
    cuda/_cuda/ccuda.cpp:65304:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                           ^~~~~~~~~~~~
          |                                           CUsurfObject
    cuda/_cuda/ccuda.cpp:65304:57: error: expected primary-expression before 'unsigned'
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                                         ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65304:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject'
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                   ~                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                       )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:74604:69: error: 'CUmoduleLoadingMode' was not declared in this scope
    74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
          |                                                                     ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:74604:90: error: '__pyx_v_mode' was not declared in this scope; did you mean '__pyx_k_name'?
    74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
          |                                                                                          ^~~~~~~~~~~~
          |                                                                                          __pyx_k_name
    cuda/_cuda/ccuda.cpp:74775:145: error: 'CUmemRangeHandleType' has not been declared
    74775 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void *__pyx_v_handle, CUdeviceptr __pyx_v_dptr, size_t __pyx_v_size, CUmemRangeHandleType __pyx_v_handleType, unsigned PY_LONG_LONG __pyx_v_flags) {
          |                                                                                                                                                 ^~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void*, CUdeviceptr, size_t, int, long long unsigned int)':
    cuda/_cuda/ccuda.cpp:74880:30: error: expected primary-expression before '(' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                              ^
    cuda/_cuda/ccuda.cpp:74880:32: error: expected primary-expression before ')' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                ^
    cuda/_cuda/ccuda.cpp:74880:34: error: expected primary-expression before 'void'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                  ^~~~
    cuda/_cuda/ccuda.cpp:74880:53: error: expected primary-expression before ',' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                     ^
    cuda/_cuda/ccuda.cpp:74880:61: error: expected primary-expression before ',' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                             ^
    cuda/_cuda/ccuda.cpp:74880:63: error: 'CUmemRangeHandleType' was not declared in this scope; did you mean 'CUmemHandleType'?
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                               ^~~~~~~~~~~~~~~~~~~~
          |                                                               CUmemHandleType
    cuda/_cuda/ccuda.cpp:74880:85: error: expected primary-expression before 'unsigned'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                                                     ^~~~~~~~
    cuda/_cuda/ccuda.cpp:74880:108: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                   ~                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                                            )
    error: command '/home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc' failed with exit status 1
    
    opened by pranavladkat 9
  • nvrtc.nvrtcCompileProgram is changing the preferred encoding from UTF-8 to ANSI_X3.4-1968

    nvrtc.nvrtcCompileProgram is changing the preferred encoding from UTF-8 to ANSI_X3.4-1968

    Dear developers,

    I found out that calling the NVRTC for compilation is changing the preferred encoding for the current Python instance.

    For more details and to reproduce the issue, please refer to this StackOverflow question.

    Do you have an idea on why this happens, and how it is possible to revert the preferred encoding to its original setting?

    Thank you in advance

    opened by redsnic 5
  • Failed to dlopen libcuda.so in WSL environment

    Failed to dlopen libcuda.so in WSL environment

    from cuda import cuda
    cuda.cuInit(0)
    
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    Input In [2], in <module>
    ----> 1 cuda.cuInit(0)
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/cuda.pyx:8876, in cuda.cuda.cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/ccuda.pyx:17, in cuda.ccuda.cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:3553, in cuda._cuda.ccuda._cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:424, in cuda._cuda.ccuda.cuPythonInit()
    
    RuntimeError: Failed to dlopen libcuda.so
    

    This is because in a WSL environment libcuda.so lives in /usr/lib/wsl/lib which is not in the default search path of dlopen. For libraries that link against libcuda this isn't a problem because there's a file at /etc/ld.so.conf.d/ld.wsl.conf which instructs the linker as to where it can find the libraries, but unfortunately dlopen doesn't use this.

    As a workaround, adding /usr/lib/wsl/lib to the LD_LIBRARY_PATH environment variable resolves the problem.

    opened by kkraus14 4
  • No module named 'examples'

    No module named 'examples'

    I change directories to try to run some examples.

    (cython) [email protected]:~/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction$ python clock_nvrtc_test.py
    Traceback (most recent call last):
      File "/home/nyck33/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction/clock_nvrtc_test.py", line 10, in <module>
        from examples.common import common
    ModuleNotFoundError: No module named 'examples'
    

    What am I doing wrong? I am looking at pypi package called absolufy-imports to try to get this going.

    opened by nyck33 3
  • Adopting a set of

    Adopting a set of "supported" python versions

    Right now the project doesn't have any set of explicitly supported python versions. NEP 29 provides an example of how this can be done:

    All minor versions of Python released 42 months prior to the project, and at minimum the two latest minor versions.

    Minimum Python ... version support should be adjusted upward on [a] major and minor release, but never on a patch release.

    This language also allows forecasting of python versions and forecasting (of some degree) of the resources required to maintain the project due to PEP 602 which normalizes the release schedule of python versions.

    There are at least two areas this practically impacts:

    • Support for version specific issues. Having a specified set of support versions allows some version specific issues to be termed in or out of scope, and be prioritized appropriately.
    • Binary distributions are currently made available on pypi and the nvidia channel of conda-forge, this bounds for which versions of python the binaries are targeted.
    opened by m3vaz 3
  • cudart.cudaSetDevice allocates memory on GPU other than target

    cudart.cudaSetDevice allocates memory on GPU other than target

    cuda-python 11.6.1 cuda toolkit 11.2 Ubuntu Linux

    If you run something like the following on a multi-GPU machine

    device_num = 5
    err, = cuda.cuInit(0)
    err, device = cuda.cuDeviceGet(device_num)
    err, cuda_context = cuda.cuCtxCreate(0, device)
    err, = cudart.cudaSetDevice(device)
    

    The call to cudart.cudaSetDevice will properly set your device to '5', but it will also allocate ~305 MB of memory on device 0 (or whichever is the 0th device in the device list provided by CUDA_VISIBLE_DEVICES). I think this issue (possibly in the C-CUDA runtime underneath?) may possibly be the root of many downstream issues in libraries like Tensorflow and Pytorch who have similar issues where a user selects a device but still gets tons of allocations on other devices. This 305 MB may not sound like a lot, but I'm running a program on an Nvidia-DGX with 16 GPUs and I have 64 worker processes, causing 64*305 = 19GB of unusable space to be allocated on GPU 0, which crashes the program. I cannot simply set CUDA_VISIBLE_DEVICES to correct this problem because the workers are communicating via shared GPU memory (via cuIPCMemHandle) with their parent process, and the parent process needs access to all GPUs. Additionally, the worker processes are performing data augmentation on one GPU, while writing output to another GPU with a different device ID.

    I am trying to investigate a workaround to not call 'cudart.cudaSetDevice' at all, but when it is not called I cannot properly use the pointer given by cuda.cuMemAlloc to create a PyTorch tensor. When I call cudart.cudaSetDevice, I am able to use the pointer properly.

    opened by QuiteAFoxtrot 3
  • Use python exceptions instead of `err, ... =`

    Use python exceptions instead of `err, ... =`

    Congratulations on the GA release! 🥳

    I've been looking forward to the cuda bindings for a while, and was just looking through the docs.

    The overview notes an implementation of ASSERT_DRV, which already contains the caveat:

    In a future release, this may automatically raise exceptions using a Python object model.

    I'm not sure if that means that the errors are going to be subclasses of something like a CUDAError, or if that is to be interpreted some other way, but in any case, I was quite surprised about this choice of exception API

    Why not make the functions raise err by default? Right now, IIUC, every invocation would need to accept an extra err-return (and handle it with something like ASSERT_DRV). This seems like a really onerous task to achieve the default behaviour of "fail in case of something unexpected" (and actively choosing where to introduce try... except: handling to continue even if things fail).

    It seems like a bad trade-off for me (high verbosity, and easy to forget adding an ASSERT_DRV), but maybe I'm overlooking something?

    The reasons I'm raising this right now, is that this would be a pretty fundamental API change, and if there's any chance at all (assuming it's not already "zero" after GA), it would be ASAP.

    opened by h-vetinari 3
  • ERROR  Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)

    ERROR Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)

    venv "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\Python.exe" Python 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)] Commit hash: Installing torch and torchvision Traceback (most recent call last): File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 227, in prepare_enviroment() File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 150, in prepare_enviroment run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch") File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 33, in run raise RuntimeError(message) RuntimeError: Couldn't install torch. Command: "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\python.exe" -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113 Error code: 1 stdout: Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113

    stderr: ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none) ERROR: No matching distribution found for torch==1.12.1+cu113

    Press any key to continue . . .

    opened by GreatHK 2
  • First base of 'CUkernelNodeAttrValue_v1' is not an extension type

    First base of 'CUkernelNodeAttrValue_v1' is not an extension type

    I'm trying to compile cuda-python in a fairly minimal conda environment (nothing installed but the requirements), with cuda-11.6 installed, and seeing several instances of the following sort of error:

          Error compiling Cython file:
          ------------------------------------------------------------
          ...
                  Get memory address of class instance
    
              """
              pass
    
          cdef class CUkernelNodeAttrValue_v1(CUlaunchAttributeValue_union):
                                             ^
          ------------------------------------------------------------
    
          cuda/cuda.pxd:2637:36: First base of 'CUkernelNodeAttrValue_v1' is not an extension type
    

    I do have other cuda versions installed alongside 11.6 but judging from the output of Parsing headers in "/usr/local/cuda-11.6/include" it seems like it's probably finding the right version? Any advice on how to get past this, or debug it? Thanks!

    opened by bertmaher 2
  • Remove duplicate code in vectorAddMMAP example

    Remove duplicate code in vectorAddMMAP example

    The code to determine granularity is duplicated, immediately after perforing that check, the same code exists. The second entry is being eliminated by this change.

    opened by pentschev 2
  • _ZSt28__throw_bad_array_new_lengthv

    _ZSt28__throw_bad_array_new_lengthv

    ~/cuda-python$ pip install -e .
    Obtaining file:///home/vinuj/cuda-python
    Requirement already satisfied: cython in /home/vinuj/anaconda3/lib/python3.9/site-packages (from cuda-python==11.7.1) (0.29.28)
    Installing collected packages: cuda-python
      Attempting uninstall: cuda-python
        Found existing installation: cuda-python 11.7.1
        Uninstalling cuda-python-11.7.1:
          Successfully uninstalled cuda-python-11.7.1
      Running setup.py develop for cuda-python
    
        from cuda import cuda, cudart
    ImportError: /home/vinuj/cuda-python/cuda/cuda.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZSt28__throw_bad_array_new_lengthv
    
    
    opened by vinutah 2
  • Dropping Python 3.7

    Dropping Python 3.7

    We're considering dropping support for Python 3.7 for the next release. Per NEP 29, Python 3.7 drop schedule was almost a year ago and many associated libraries have already dropped it.

    Let us know if there's concerns in having Python 3.7 dropped next release. Thanks!

    opened by vzhurba01 0
  • No module named 'cuda._lib'; 'cuda' is not a package

    No module named 'cuda._lib'; 'cuda' is not a package

    After following the steps on cuda-python to install cuda-python with conda instruction, I try to

    from cuda import cuda, nvrtc
    

    as in the example in the pycharm python console, but it raises an error:

    Traceback (most recent call last):
      File "D:\Anaconda\envs\hierot\lib\code.py", line 90, in runcode
        exec(code, self.locals)
      File "<input>", line 1, in <module>
      File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
        module = self._system_import(name, *args, **kwargs)
      File "cuda\cuda.pyx", line 1, in init cuda.cuda
        # Copyright 2021-2022 NVIDIA Corporation.  All rights reserved.
      File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
        module = self._system_import(name, *args, **kwargs)
    ModuleNotFoundError: No module named 'cuda._lib'; 'cuda' is not a package
    

    But the code above can be successfully run in the terminal

    (hierot) D:\Projects\SimPlatform>python
    Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from cuda import cuda, nvrtc
    >>>
    

    Please help me with the problem, thanks in advance. Further information provided on request.

    I searched with

    ModuleNotFoundError: No module named 'xxx'
    

    Solutions suggest configure correct python interpreter, but I believe my interpreter is already properly configured.

    And search with

    No module named 'xxx'; 'yyy' is not a package
    

    Some says the cause is the name cuda is shadowed by the package name cuda, I think it might be the problem. Please check this.

    opened by HIEROT 0
  • Windows: ModuleNotFoundError: No module named 'win32api'

    Windows: ModuleNotFoundError: No module named 'win32api'

    Installing on Windows:

    python -m pip install cuda-python

    Then from python:

    from cuda import cuda

    Fails with

        File "cuda\cuda.pyx", line 1, in init cuda.cuda
    
        File "cuda\ccuda.pyx", line 1, in init cuda.ccuda
    
        File "cuda\_cuda\ccuda.pyx", line 8, in init cuda._cuda.ccuda
    
      ModuleNotFoundError: No module named 'win32api'
    

    I can fix this by installing pypiwin32 manually. But I think it should be listed in requirements.txt if platform_system is Windows.

    Thanks

    opened by ilyasher 0
  • more inference time in cuda env compared to cpu (occured only for a layer)

    more inference time in cuda env compared to cpu (occured only for a layer)

    Dear sir/madam: When I inference on a deep learning model (slowfast model), I'm facing a problem that my python program seems to take more inference time in cuda env compared to cpu. It's not the whole model but one specific layer takes more time on cuda env than cpu. I'm so confused that hope someone can help me with it. Here is the details. the specific layer is "slowway-conv1" layer as showned in the pic below representing the model structure of slowfast. image And my confusing result is as follows. the first for cuda and the second for cpu. image image In cuda env, I found the processing time of "conv1" (0.97s) accounts for a great proportion of the processing time of the whole model (1.04s), while in cpu env, the processing time of "conv1" (0.07s) only accounts for a very small proportion of the processing time of the whole model (4.43s). And I reckon that the proportion in cpu env is reasonable considering the calculation budget. Is my method of time measurement mistaken? I used the following code to measure time cost. image image If it's my fault that causing the confusing result, please kindly point out, or please give me some ideas to help me solve this problem. Thank you very much! Yours, Koala

    opened by koalaaaaaaaaa 0
  • cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime

    cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime

    The current implementation of cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime for its version. This results in incorrect runtime versions if the runtime version is different from the version of cuda-python.

    https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/_lib/ccudart/ccudart.pyx#L79-L82

    https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/_lib/ccudart/utils.pyx#L37

    Additional context

    A workaround used in https://github.com/rapidsai/rmm/pull/946 is to use numba's API for this instead:

    import numba.cuda
    
    def cudaRuntimeGetVersion():
        major, minor = numba.cuda.runtime.get_version()
        return major * 1000 + minor * 10
    
    opened by bdice 6
Releases(v12.0.0)
MultiMix: Sparingly Supervised, Extreme Multitask Learning From Medical Images (ISBI 2021, MELBA 2021)

MultiMix This repository contains the implementation of MultiMix. Our publications for this project are listed below: "MultiMix: Sparingly Supervised,

Ayaan Haque 27 Dec 22, 2022
ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers

ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers Official implementation of ViewFormer. ViewFormer is a NeRF-free neural rend

Jonáš Kulhánek 169 Dec 30, 2022
This repository contains the code used for Predicting Patient Outcomes with Graph Representation Learning (https://arxiv.org/abs/2101.03940).

Predicting Patient Outcomes with Graph Representation Learning This repository contains the code used for Predicting Patient Outcomes with Graph Repre

Emma Rocheteau 76 Dec 22, 2022
OCRA (Object-Centric Recurrent Attention) source code

OCRA (Object-Centric Recurrent Attention) source code Hossein Adeli and Seoyoung Ahn Please cite this article if you find this repository useful: For

Hossein Adeli 2 Jun 18, 2022
An addernet CUDA version

Training addernet accelerated by CUDA Usage cd adder_cuda python setup.py install cd .. python main.py Environment pytorch 1.10.0 CUDA 11.3 benchmark

LingXY 4 Jun 20, 2022
FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks

FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks

HKBU High Performance Machine Learning Lab 6 Nov 18, 2022
Mmdet benchmark with python

mmdet_benchmark 本项目是为了研究 mmdet 推断性能瓶颈,并且对其进行优化。 配置与环境 机器配置 CPU:Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz GPU:NVIDIA GeForce RTX 3080 10GB 内存:64G 硬盘:1T

杨培文 (Yang Peiwen) 24 May 21, 2022
My take on a practical implementation of Linformer for Pytorch.

Linformer Pytorch Implementation A practical implementation of the Linformer paper. This is attention with only linear complexity in n, allowing for v

Peter 349 Dec 25, 2022
Code to reproduce the results for Compositional Attention

Compositional-Attention This repository contains the official implementation for the paper Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal 58 Nov 30, 2022
Automatic Differentiation Multipole Moment Molecular Forcefield

Automatic Differentiation Multipole Moment Molecular Forcefield Performance notes On a single gpu, using waterbox_31ang.pdb example from MPIDplugin wh

4 Jan 07, 2022
FairFuzz: AFL extension targeting rare branches

FairFuzz An AFL extension to increase code coverage by targeting rare branches. FairFuzz has a particular advantage on programs with highly nested str

Caroline Lemieux 222 Nov 16, 2022
Official code for paper "ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection"

Official code for paper "ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection". LRPDenseNet.py

Pedro Ricardo Ariel Salvador Bassi 2 Sep 21, 2022
Implementation of the paper "Generating Symbolic Reasoning Problems with Transformer GANs"

Generating Symbolic Reasoning Problems with Transformer GANs This is the implementation of the paper Generating Symbolic Reasoning Problems with Trans

Reactive Systems Group 1 Apr 18, 2022
PyJokes - Joking around with Python library pyjokes

Hi, it's Muhaimin again 👋 This is something unorthodox but cool. Don't forget t

Muhaimin A. Salay Kanton 1 Feb 02, 2022
[CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning [CVPR'21, Oral] By Zhicheng Huang*, Zhaoyang Zeng*, Yupan H

Multimedia Research 196 Dec 13, 2022
Kindle is an easy model build package for PyTorch.

Kindle is an easy model build package for PyTorch. Building a deep learning model became so simple that almost all model can be made by copy and paste from other existing model codes. So why code? wh

Jongkuk Lim 77 Nov 11, 2022
Embeds a story into a music playlist by sorting the playlist so that the order of the music follows a narrative arc.

playlist-story-builder This project attempts to embed a story into a music playlist by sorting the playlist so that the order of the music follows a n

Dylan R. Ashley 0 Oct 28, 2021
AWS documentation corpus for zero-shot open-book question answering.

aws-documentation We present the AWS documentation corpus, an open-book QA dataset, which contains 25,175 documents along with 100 matched questions a

Sia Gholami 2 Jul 07, 2022
[Open Source]. The improved version of AnimeGAN. Landscape photos/videos to anime

[Open Source]. The improved version of AnimeGAN. Landscape photos/videos to anime

CC 4.4k Dec 27, 2022
E2e music remastering system - End-to-end Music Remastering System Using Self-supervised and Adversarial Training

End-to-end Music Remastering System This repository includes source code and pre

Junghyun (Tony) Koo 37 Dec 15, 2022