Memory efficient transducer loss computation

Overview

Introduction

This project implements the optimization techniques proposed in Improving RNN Transducer Modeling for End-to-End Speech Recognition to reduce memory consumption when computing the transducer loss.

How does it differ from the RNN-T loss from torchaudio

It produces the same output as torchaudio for the same input, so optimized_transducer should be equivalent to torchaudio.functional.rnnt_loss().

This project is more memory efficient and potentially faster. (TODO: This needs some benchmarks.)

Also, torchaudio accepts only the output of nn.Linear, whereas we also support the output of log-softmax (set the option from_log_softmax to True in this case).

How does it differ from warp-transducer

It borrows the methods of computing alpha and beta from warp-transducer. Therefore, optimized_transducer produces the same alpha and beta as warp-transducer for the same input.
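For readers new to these quantities, here is a minimal pure-Python sketch of the standard alpha (forward) and beta (backward) recursions for a single utterance. It is not the warp-transducer or optimized_transducer code. It assumes unbatched logits given as nested lists of shape (T, len(targets) + 1, V) and applies log-softmax itself, like from_log_softmax=False. The function names are hypothetical. Both recursions should yield the same loss, which makes a convenient sanity check.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(row: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over the vocabulary dimension.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def log_add(a: float, b: float) -> float:
    # log(exp(a) + exp(b)), tolerating -inf inputs.
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_alpha_beta(logits, targets: Sequence[int], blank: int = 0) -> Tuple[list, list, float, float]:
    """Hypothetical reference: logits has shape (T, len(targets) + 1, V) and comes from nn.Linear."""
    T, U = len(logits), len(targets) + 1
    lp = [[log_softmax(logits[t][u]) for u in range(U)] for t in range(T)]

    # alpha[t][u]: log-prob of reaching node (t, u) after emitting targets[:u].
    alpha = [[-math.inf] * U for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U):
            if t == 0 and u == 0:
                continue
            a = -math.inf
            if t > 0:  # blank emitted at (t - 1, u)
                a = alpha[t - 1][u] + lp[t - 1][u][blank]
            if u > 0:  # targets[u - 1] emitted at (t, u - 1)
                a = log_add(a, alpha[t][u - 1] + lp[t][u - 1][targets[u - 1]])
            alpha[t][u] = a

    # beta[t][u]: log-prob of finishing the sequence starting from node (t, u).
    beta = [[-math.inf] * U for _ in range(T)]
    beta[T - 1][U - 1] = lp[T - 1][U - 1][blank]
    for t in reversed(range(T)):
        for u in reversed(range(U)):
            if t == T - 1 and u == U - 1:
                continue
            b = -math.inf
            if t < T - 1:
                b = beta[t + 1][u] + lp[t][u][blank]
            if u < U - 1:
                b = log_add(b, beta[t][u + 1] + lp[t][u][targets[u]])
            beta[t][u] = b

    loss_from_alpha = -(alpha[T - 1][U - 1] + lp[T - 1][U - 1][blank])
    loss_from_beta = -beta[0][0]
    return alpha, beta, loss_from_alpha, loss_from_beta
```

With all-zero logits, V = 2, two frames and one target token, there are two alignments, each with probability 1/8. The loss is therefore log 4 from both recursions.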

However, warp-transducer produces different gradients for CPU and CUDA when using the same input. See https://github.com/HawkAaron/warp-transducer/issues/93

This project produces consistent gradients on CPU and CUDA for the same input, just like torchaudio does. (We borrow the gradient computation formula from torchaudio.)

optimized_transducer uses less memory than warp-transducer and is potentially faster. (TODO: This needs some benchmarks.)

Installation

You can install it via pip:

pip install optimized_transducer

To check that optimized_transducer was installed successfully, please run

python3 -c "import optimized_transducer; print(optimized_transducer.__version__)"

which should print the version of the installed optimized_transducer, e.g., 1.2.
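If the import above fails, it can help to distinguish "not installed" from "installed, but the compiled extension cannot be loaded". The minimal sketch below uses the standard library's importlib.metadata (Python 3.8+) to read the installed version without importing the package. installed_version is a hypothetical helper. The distribution name "optimized_transducer" is assumed to match the name used with pip install.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "optimized_transducer") -> Optional[str]:
    """Return the installed version of a distribution, or None if pip does not know about it."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version is not None else "optimized_transducer is not installed")
```

If this prints a version but `import optimized_transducer` still fails, the package metadata is present and the problem most likely lies with the compiled extension.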

Installation FAQ

What operating systems are supported?

It has been tested on Ubuntu 18.04. It should also work on macOS and other Unix-like systems. It may work on Windows, though this has not been tested.

How do I display the installation log?

Use

pip install --verbose optimized_transducer

How do I reduce installation time?

Use

export OT_MAKE_ARGS="-j"
pip install --verbose optimized_transducer

It will pass -j to make.

Which versions of PyTorch are supported?

It has been tested on PyTorch >= 1.5.0. It may work on PyTorch < 1.5.0.
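If you want to check this requirement at runtime, compare versions numerically rather than as strings. As strings, "1.10.0" sorts before "1.5.0". The hypothetical helper below is a simplified sketch that keeps only the leading numeric components, so "1.10.0+cu111" becomes (1, 10, 0). packaging.version.parse from the packaging library is a more complete alternative.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """'1.10.0+cu111' -> (1, 10, 0). Pre-release and local suffixes are ignored (a simplification)."""
    core = version.split("+")[0]
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def torch_version_ok(version: str, minimum: Tuple[int, ...] = (1, 5, 0)) -> bool:
    # Tuples compare element by element, so (1, 10, 0) >= (1, 5, 0) as intended.
    return parse_version(version) >= minimum
```

In practice you would call it as `torch_version_ok(torch.__version__)` after `import torch`.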

How do I install a CPU-only version of optimized_transducer?

Use

export OT_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF"
export OT_MAKE_ARGS="-j"
pip install --verbose optimized_transducer

It will pass -DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF to cmake.

Which Python versions are supported?

Python >= 3.6 is known to work. It may work with Python 2.7, though this has not been tested.

Where can I get help if I have problems with the installation?

Please file an issue at https://github.com/csukuangfj/optimized_transducer/issues and describe your problem there.

Usage

optimized_transducer expects the output of the joint network to have shape (sum_all_TU, V) rather than (N, T, U, V). This tensor is the concatenation of the 2-D tensors (T_1 * U_1, V), (T_2 * U_2, V), ..., (T_N * U_N, V), where T_i = logit_lengths[i], U_i = target_lengths[i] + 1, and sum_all_TU = T_1 * U_1 + T_2 * U_2 + ... + T_N * U_N. Note: (T_i * U_i, V) is just a reshape of the 3-D tensor (T_i, U_i, V).
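To get a feel for the savings, the following pure-Python sketch counts the logit entries in both layouts. It assumes the padded layout uses T = max(logit_lengths) and U = max(target_lengths) + 1. It ignores the memory taken by intermediate activations. logit_counts is a hypothetical helper, not part of optimized_transducer.

```python
from typing import Sequence, Tuple


def logit_counts(
    logit_lengths: Sequence[int], target_lengths: Sequence[int], vocab_size: int
) -> Tuple[int, int]:
    """Return (#entries in padded (N, T, U, V), #entries in packed (sum_all_TU, V))."""
    n = len(logit_lengths)
    max_t = max(logit_lengths)
    max_u = max(target_lengths) + 1
    padded = n * max_t * max_u * vocab_size
    # U_i = target_lengths[i] + 1, matching decoder_out[i, :target_lengths[i] + 1, :]
    sum_all_tu = sum(t * (u + 1) for t, u in zip(logit_lengths, target_lengths))
    packed = sum_all_tu * vocab_size
    return padded, packed
```

Take logit_lengths = [100, 50], target_lengths = [20, 5] and V = 500. The padded layout holds 2,100,000 entries and the packed layout holds 1,200,000. The gap grows with the amount of padding in the batch.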

Suppose your original joint network looks somewhat like the following:

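import torch
import torchaudio

# N, T, U, D, linear, targets, logit_lengths, target_lengths and blank_id are assumed to be defined elsewhere.
# Here U is the maximum target length + 1, since torchaudio expects logits of shape (N, T, U, V).
encoder_out = torch.rand(N, T, D) # from the encoder
decoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network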

encoder_out = encoder_out.unsqueeze(2) # Now encoder out is (N, T, 1, D)
decoder_out = decoder_out.unsqueeze(1) # Now decoder out is (N, 1, U, D)

x = encoder_out + decoder_out # x is of shape (N, T, U, D)
activation = torch.tanh(x)

logits = linear(activation) # linear is an instance of `nn.Linear`.

loss = torchaudio.functional.rnnt_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=blank_id,
    reduction="mean",
)

You need to change it to the following:

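import torch
import optimized_transducer

# N, T, U, D, linear, targets, logit_lengths, target_lengths and blank_id are assumed to be defined elsewhere.
encoder_out = torch.rand(N, T, D) # from the encoder
decoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network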

encoder_out_list = [encoder_out[i, :logit_lengths[i], :] for i in range(N)]
decoder_out_list = [decoder_out[i, :target_lengths[i]+1, :] for i in range(N)]

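# Each element of x has shape (T_i, U_i, D), where T_i = logit_lengths[i] and U_i = target_lengths[i] + 1
x = [e.unsqueeze(1) + d.unsqueeze(0) for e, d in zip(encoder_out_list, decoder_out_list)]
x = [p.reshape(-1, D) for p in x] # each element is now (T_i * U_i, D)
x = torch.cat(x) # (sum_all_TU, D)

activation = torch.tanh(x)
logits = linear(activation) # linear is an instance of `nn.Linear`; logits is (sum_all_TU, V)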

loss = optimized_transducer.transducer_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=blank_id,
    reduction="mean",
    from_log_softmax=False,
)
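Each (T_i, U_i, D) tensor is reshaped in row-major order before concatenation. The row for frame t and target position u of utterance i therefore sits at offset_i + t * U_i + u. The pure-Python sketch below computes these offsets, which can help when inspecting the packed logits while debugging. row_offsets and packed_row are hypothetical helpers for illustration, not part of the optimized_transducer API. itertools.accumulate(..., initial=0) requires Python 3.8+.

```python
from itertools import accumulate
from typing import List, Sequence


def row_offsets(logit_lengths: Sequence[int], target_lengths: Sequence[int]) -> List[int]:
    """Start row of each utterance inside the packed (sum_all_TU, V) tensor."""
    sizes = [t * (u + 1) for t, u in zip(logit_lengths, target_lengths)]
    # Exclusive prefix sum: [0, size_0, size_0 + size_1, ...] without the final total.
    return list(accumulate(sizes, initial=0))[:-1]


def packed_row(offsets: Sequence[int], target_lengths: Sequence[int], b: int, t: int, u: int) -> int:
    """Row in the packed tensor that holds logits for utterance b, frame t, target position u."""
    u_b = target_lengths[b] + 1  # U_b
    return offsets[b] + t * u_b + u
```

For logit_lengths = [3, 2] and target_lengths = [1, 2], both utterances contribute 6 rows, so the offsets are [0, 6]. The last entry (b=1, t=1, u=2) lands on row 6 + 1 * 3 + 2 = 11.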

Caution: We used from_log_softmax=False in the above example since logits is the output of nn.Linear.

Hint: If logits is the output of log-softmax, you should use from_log_softmax=True.

In most cases, you should pass the output of nn.Linear to compute the loss, i.e., use from_log_softmax=False, to save memory.

If you want to apply some operations to the output of log-softmax before feeding it to optimized_transducer.transducer_loss(), use from_log_softmax=True. Be aware, however, that this increases memory usage.

For more usages, please refer to

For developers

As a developer, you don't need to use pip install optimized_transducer. To make development easier, you can use

git clone https://github.com/csukuangfj/optimized_transducer.git
cd optimized_transducer
mkdir build
cd build
cmake -DOT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
make -j
export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH

I usually create a file path.sh inside the build directory, containing

export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH

so what you need to do is

cd optimized_transducer/build
source path.sh

# Then you are ready to run Python tests
python3 ../optimized_transducer/python/tests/test_compute_transducer_loss.py

# You can also use "import optimized_transducer" in your Python projects

To run all Python tests, use

cd optimized_transducer/build
ctest --output-on-failure
Comments
  • Issue with optimized-transducer installation


    I started installing k2, lhotse and icefall. So far I have been able to test k2, and it works perfectly. lhotse also works, but when I tried to install icefall I got a strange error from optimized-transducer. The log is below.

    Collecting kaldilm
      Using cached kaldilm-1.11-cp38-cp38-linux_x86_64.whl
    Collecting kaldialign
      Using cached kaldialign-0.2-cp38-cp38-linux_x86_64.whl
    Requirement already satisfied: sentencepiece>=0.1.96 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 3)) (0.1.96)
    Requirement already satisfied: tensorboard in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 4)) (2.7.0)
    Requirement already satisfied: typeguard in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 5)) (2.13.3)
    Collecting optimized_transducer
      Using cached optimized_transducer-1.3.tar.gz (47 kB)
    Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.8.1)
    Requirement already satisfied: werkzeug>=0.11.15 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.0.2)
    Requirement already satisfied: numpy>=1.12.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.21.2)
    Requirement already satisfied: protobuf>=3.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (3.19.3)
    Requirement already satisfied: wheel>=0.26 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.37.1)
    Requirement already satisfied: setuptools>=41.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (58.0.4)
    Requirement already satisfied: grpcio>=1.24.3 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.43.0)
    Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.4.6)
    Requirement already satisfied: absl-py>=0.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.0.0)
    Requirement already satisfied: google-auth<3,>=1.6.3 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.3.3)
    Requirement already satisfied: requests<3,>=2.21.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.27.1)
    Requirement already satisfied: markdown>=2.6.8 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (3.3.6)
    Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.6.1)
    Requirement already satisfied: six in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from absl-py>=0.4->tensorboard->-r requirements.txt (line 4)) (1.16.0)
    Requirement already satisfied: cachetools<5.0,>=2.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (4.2.4)
    Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (0.2.8)
    Requirement already satisfied: rsa<5,>=3.1.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (4.8)
    Requirement already satisfied: requests-oauthlib>=0.7.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard->-r requirements.txt (line 4)) (1.3.0)
    Requirement already satisfied: importlib-metadata>=4.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from markdown>=2.6.8->tensorboard->-r requirements.txt (line 4)) (4.10.1)
    Requirement already satisfied: zipp>=0.5 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard->-r requirements.txt (line 4)) (3.7.0)
    Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (0.4.8)
    Requirement already satisfied: charset-normalizer~=2.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (2.0.10)
    Requirement already satisfied: certifi>=2017.4.17 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (2021.10.8)
    Requirement already satisfied: idna<4,>=2.5 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (3.3)
    Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (1.26.8)
    Requirement already satisfied: oauthlib>=3.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard->-r requirements.txt (line 4)) (3.1.1)
    Building wheels for collected packages: optimized-transducer
      Building wheel for optimized-transducer (setup.py): started
      Building wheel for optimized-transducer (setup.py): finished with status 'error'
      ERROR: Command errored out with exit status 1:
       command: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; __file__='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-qa004082
           cwd: /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/
      Complete output (153 lines):
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.8
      creating build/lib.linux-x86_64-3.8/optimized_transducer
      copying optimized_transducer/python/optimized_transducer/__init__.py -> build/lib.linux-x86_64-3.8/optimized_transducer
      copying optimized_transducer/python/optimized_transducer/transducer_loss.py -> build/lib.linux-x86_64-3.8/optimized_transducer
      running build_ext
      For fast compilation, run: export OT_MAKE_ARGS="-j"; python setup.py install
      Setting PYTHON_EXECUTABLE to /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
      build command is:

              cd build/temp.linux-x86_64-3.8
    
              cmake -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=/home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173
    
              make  _optimized_transducer
    

    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 6.5.0
    -- The CUDA compiler identification is NVIDIA 11.1.74
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75;80;86
    -- Adding arch 35
    -- Adding arch 50
    -- Adding arch 60
    -- Adding arch 61
    -- Adding arch 70
    -- Adding arch 75
    -- Adding arch 80
    -- Adding arch 86
    -- OT_COMPUTE_ARCHS: 35;50;60;61;70;75;80;86
    -- Downloading pybind11
    -- pybind11 is downloaded to /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python (found version "3.8.12")
    -- Found PythonLibs: /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    CMake Warning (dev) at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:29 (find_package):
      Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
      Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
      command to set the policy and suppress this warning.

    Environment variable CUDA_ROOT is set to:
    
      /cm/shared/apps/cuda11.1/toolkit/11.1.0
    
    For compatibility, CMake is ignoring the variable.
    

    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    This warning is for project developers.  Use -Wno-dev to suppress it.

    -- Found CUDA: /cm/shared/apps/cuda11.1/toolkit/11.1.0 (found version "11.1")
    -- Caffe2: CUDA detected: 11.1
    -- Caffe2: CUDA nvcc is: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /cm/shared/apps/cuda11.1/toolkit/11.1.0
    -- Caffe2: Header version is: 11.1
    -- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH)
    CMake Warning at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:111 (message):
      Caffe2: Cannot find cuDNN library.  Turning the option off
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)

    -- /cm/shared/apps/cuda11.1/toolkit/11.1.0/lib64/libnvrtc.so shorthash is 1f6b333a
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    CMake Error at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
      Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
      libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)

    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeOutput.log".
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeError.log".
    make: *** No rule to make target `_optimized_transducer'.  Stop.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 101, in <module>
        setuptools.setup(
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 299, in run
        self.run_command('build')
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 60, in build_extension
        raise Exception(
    Exception:
    Build optimized_transducer failed. Please check the error message.
    You can ask for help by creating an issue on GitHub.

    Click: https://github.com/csukuangfj/optimized_transducer/issues/new


    ERROR: Failed building wheel for optimized-transducer
    Running setup.py clean for optimized-transducer
    Failed to build optimized-transducer
    Installing collected packages: optimized-transducer, kaldilm, kaldialign
      Running setup.py install for optimized-transducer: started
      Running setup.py install for optimized-transducer: finished with status 'error'
      ERROR: Command errored out with exit status 1:
       command: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; __file__='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-mcbah0p8/install-record.txt --single-version-externally-managed --compile --install-headers /home/local/QCRI/ahussein/anaconda3/envs/k2/include/python3.8/optimized-transducer
           cwd: /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/
      Complete output (155 lines):
      running install
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.8
      creating build/lib.linux-x86_64-3.8/optimized_transducer
      copying optimized_transducer/python/optimized_transducer/__init__.py -> build/lib.linux-x86_64-3.8/optimized_transducer
      copying optimized_transducer/python/optimized_transducer/transducer_loss.py -> build/lib.linux-x86_64-3.8/optimized_transducer
      running build_ext
      For fast compilation, run: export OT_MAKE_ARGS="-j"; python setup.py install
      Setting PYTHON_EXECUTABLE to /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
      build command is:

                cd build/temp.linux-x86_64-3.8
    
                cmake -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=/home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173
    
                make  _optimized_transducer
    
    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 6.5.0
    -- The CUDA compiler identification is NVIDIA 11.1.74
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75;80;86
    -- Adding arch 35
    -- Adding arch 50
    -- Adding arch 60
    -- Adding arch 61
    -- Adding arch 70
    -- Adding arch 75
    -- Adding arch 80
    -- Adding arch 86
    -- OT_COMPUTE_ARCHS: 35;50;60;61;70;75;80;86
    -- Downloading pybind11
    -- pybind11 is downloaded to /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python (found version "3.8.12")
    -- Found PythonLibs: /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    CMake Warning (dev) at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:29 (find_package):
      Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
      Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
      command to set the policy and suppress this warning.
    
      Environment variable CUDA_ROOT is set to:
    
        /cm/shared/apps/cuda11.1/toolkit/11.1.0
    
      For compatibility, CMake is ignoring the variable.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    This warning is for project developers.  Use -Wno-dev to suppress it.
    
    -- Found CUDA: /cm/shared/apps/cuda11.1/toolkit/11.1.0 (found version "11.1")
    -- Caffe2: CUDA detected: 11.1
    -- Caffe2: CUDA nvcc is: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /cm/shared/apps/cuda11.1/toolkit/11.1.0
    -- Caffe2: Header version is: 11.1
    -- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH)
    CMake Warning at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:111 (message):
      Caffe2: Cannot find cuDNN library.  Turning the option off
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    
    
    -- /cm/shared/apps/cuda11.1/toolkit/11.1.0/lib64/libnvrtc.so shorthash is 1f6b333a
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    CMake Error at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
      Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
      libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    
    
    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeOutput.log".
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeError.log".
    make: *** No rule to make target `_optimized_transducer'.  Stop.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 101, in <module>
        setuptools.setup(
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/install.py", line 545, in run
        self.run_command('build')
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 60, in build_extension
        raise Exception(
    Exception:
    Build optimized_transducer failed. Please check the error message.
    You can ask for help by creating an issue on GitHub.
    
    Click:
    	https://github.com/csukuangfj/optimized_transducer/issues/new
    
    ----------------------------------------
    

    ERROR: Command errored out with exit status 1: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; file='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-mcbah0p8/install-record.txt --single-version-externally-managed --compile --install-headers /home/local/QCRI/ahussein/anaconda3/envs/k2/include/python3.8/optimized-transducer Check the logs for full command output.

    opened by AmirHussein96 8
  • Warprnnt gradient for CPU

    @csukuangfj Just wanted to note that the gradient is not incorrect for CPU vs. GPU: the instructions clearly state that on CPU you need to provide log_softmax(joint-logits), whereas on GPU you should provide only the joint logits, since the CUDA kernel efficiently computes the log_softmax internally.

    Anyway, yours is also an efficient implementation written in C++. Could you benchmark the solutions if you have time? Even a naive benchmark would give some hint of their relative speed. Your memory-efficient implementation is also very interesting: it reduces speed but saves a lot of memory.
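
    The two input conventions can be told apart numerically. The NumPy sketch below is an illustration only. It is not code from warp-transducer or optimized_transducer, and the toy logits array is made up. It shows two facts. First, normalized log-probabilities satisfy sum(exp(x)) == 1 over the vocabulary axis, while raw joint logits generally do not. Second, log_softmax is idempotent, so a kernel that applies log_softmax internally sees the same values whether it receives logits or log_softmax(logits).

```python
import numpy as np


def log_softmax(x, axis=-1):
    # Numerically stable log-softmax over the vocabulary axis
    x_max = x.max(axis=axis, keepdims=True)
    shifted = x - x_max
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


# Toy joint-network output with shape (T=1, U=2, V=3); values are arbitrary
logits = np.array([[[2.0, 1.0, 0.0], [0.5, 0.5, 3.0]]])
log_probs = log_softmax(logits)

# What a CPU path expecting log_softmax(joint-logits) assumes about its input
print(np.allclose(np.exp(log_probs).sum(axis=-1), 1.0))  # True
# Raw logits are unnormalized, so they violate that assumption
print(np.allclose(np.exp(logits).sum(axis=-1), 1.0))  # False

# Normalizing twice changes nothing, so an internal log_softmax sees the same values
print(np.allclose(log_softmax(log_probs), log_probs))  # True
```

    Equal loss values do not imply equal gradients. The gradient is taken with respect to whichever tensor was passed in: log-probabilities on one path and logits on the other.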

    opened by titu1994 2
  • "ModuleNotFoundError: No module named '_optimized_transducer'" when testing.

    I installed optimized_transducer as follows:

    git clone https://github.com/csukuangfj/optimized_transducer.git
    cd optimized_transducer
    mkdir build
    cd build
    cmake -DOT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
    export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH
    

    The cmake log is as follows:

    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 7.5.0
    -- The CUDA compiler identification is NVIDIA 10.1.243
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Autodetected CUDA architecture(s):  7.0
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_70,code=sm_70
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75
    -- Skipping arch 35
    -- Skipping arch 50
    -- Skipping arch 60
    -- Skipping arch 61
    -- Adding arch 70
    -- Skipping arch 75
    -- OT_COMPUTE_ARCHS: 70
    -- Downloading pybind11
    -- pybind11 is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/bin/python (found version "3.8.11")
    -- Found PythonLibs: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    -- Found CUDA: /usr/local/cuda (found version "10.1")
    -- Caffe2: CUDA detected: 10.1
    -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /usr/local/cuda
    -- Caffe2: Header version is: 10.1
    -- Found CUDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
    -- Found cuDNN: v7.6.2  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
    -- Autodetected CUDA architecture(s):  7.0
    -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70
    -- Found Torch: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torch/lib/libtorch.so
    -- PyTorch version: 1.7.0+cu101
    -- PyTorch cuda version: 10.1
    -- Use FetchContent provided by k2
    -- Downloading googletest
    
    -- googletest is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/googletest-src
    -- googletest's binary dir is /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/googletest-build
    -- The C compiler identification is GNU 7.5.0
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Downloading moderngpu
    -- moderngpu is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/moderngpu-src
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /ceph-meixu/luomingshuang/optimized_transducer/build
    

    But when I run python optimized_transducer/python/tests/test_compute_transducer_loss.py to test it, I get the following error:

    /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
      warnings.warn(
    Traceback (most recent call last):
      File "optimized_transducer/python/tests/test_compute_transducer_loss.py", line 8, in <module>
        import optimized_transducer
      File "/ceph-meixu/luomingshuang/optimized_transducer/optimized_transducer/python/optimized_transducer/__init__.py", line 1, in <module>
        from .transducer_loss import TransducerLoss, transducer_loss  # noqa
      File "/ceph-meixu/luomingshuang/optimized_transducer/optimized_transducer/python/optimized_transducer/transducer_loss.py", line 3, in <module>
        import _optimized_transducer
    ModuleNotFoundError: No module named '_optimized_transducer'
    

    How can I solve this? Thanks!
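
    Below is a minimal diagnostic sketch that uses only the standard library and is not part of optimized_transducer. It checks whether the compiled extension named in the traceback, _optimized_transducer, is visible on sys.path. The command list above stops after the cmake configure step and shows no make step. One possibility worth ruling out is therefore that the extension has not been built yet. The helper name find_module_origin is hypothetical.

```python
import importlib.util
import sys


def find_module_origin(name):
    """Return the file a top-level module would be loaded from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Name taken from the traceback: transducer_loss.py does `import _optimized_transducer`
    origin = find_module_origin("_optimized_transducer")
    if origin is None:
        print("_optimized_transducer is not importable; sys.path entries searched:")
        for entry in sys.path:
            print("   ", entry)
    else:
        print("_optimized_transducer would be loaded from", origin)
```

    Running it with the same PYTHONPATH as the failing test shows either the file that would be loaded or the directories that were searched.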

    opened by luomingshuang 2
  • Update transducer-loss.h

    I found that https://github.com/csukuangfj/optimized_transducer/blob/0c75a5712f709024165fe62360dd25905cca8c68/optimized_transducer/csrc/transducer-loss.h#L17 and https://github.com/csukuangfj/optimized_transducer/blob/0c75a5712f709024165fe62360dd25905cca8c68/optimized_transducer/python/tests/test_compute_transducer_loss.py#L61 are inconsistent. I think the former is incorrect, so I fixed it here. @csukuangfj, what do you think?

    opened by shanguanma 2
  • fix for CMakeLists.txt

    When I run make -j in the build dir, the following error occurs: error: #error C++14 or later compatible compiler is required to use ATen. So I added the following two commands to CMakeLists.txt, after which make -j runs successfully.

    set(CMAKE_CXX_STANDARD 14)
    set(CMAKE_CXX_STANDARD_REQUIRED ON)
    

    I'm not sure whether the above two commands are necessary for the CMakeLists.txt in all environments.

    opened by luomingshuang 1
  • Fix installation on macOS.

    To fix the following error when running

    python3 -c "import optimized_transducer; print(optimized_transducer.__version__)"
    

    on macOS:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/Users/fangjun/py38/lib/python3.8/site-packages/optimized_transducer/__init__.py", line 1, in <module>
        from .transducer_loss import TransducerLoss, transducer_loss  # noqa
      File "/Users/fangjun/py38/lib/python3.8/site-packages/optimized_transducer/transducer_loss.py", line 3, in <module>
        import _optimized_transducer
    ImportError: dlopen(/Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so, 2): Symbol not found: _THPVariableClass
      Referenced from: /Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so
      Expected in: flat namespace
     in /Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so
    
    opened by csukuangfj 0
  • Disable warp level parallel reduction

    For some reason, using warps produces incorrect alpha and beta for large values of sum_all_TU.

    We disable warp-level parallel reduction for now and use the method from https://github.com/HawkAaron/warp-transducer to compute alpha and beta.

    We will revisit the issues with warps after gaining more experience with CUDA programming.
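
    For reference, here is a sequential NumPy sketch of the alpha and beta recursions over the (T, U) lattice. It is an illustration, not the CUDA code of either project. It assumes the following:
    - log_probs holds normalized log-probabilities of shape (T, U, V), with U = len(labels) + 1.
    - labels is a list of token ids and blank is the blank id.
    - The toy sizes and random inputs are made up.

    Each alpha node (t, u) depends only on (t - 1, u) and (t, u - 1). Each beta node depends only on (t + 1, u) and (t, u + 1). All nodes on one anti-diagonal t + u are therefore independent and can be computed in parallel.

```python
import numpy as np


def compute_alphas(log_probs, labels, blank):
    """Forward variables: alphas[t, u] = log-prob of reaching node (t, u)."""
    T, U, _ = log_probs.shape
    assert U == len(labels) + 1
    alphas = np.zeros((T, U))
    for t in range(1, T):  # first column: blanks only
        alphas[t, 0] = alphas[t - 1, 0] + log_probs[t - 1, 0, blank]
    for u in range(1, U):  # first row: labels only
        alphas[0, u] = alphas[0, u - 1] + log_probs[0, u - 1, labels[u - 1]]
    for t in range(1, T):
        for u in range(1, U):
            no_emit = alphas[t - 1, u] + log_probs[t - 1, u, blank]
            emit = alphas[t, u - 1] + log_probs[t, u - 1, labels[u - 1]]
            alphas[t, u] = np.logaddexp(no_emit, emit)
    return alphas


def compute_betas(log_probs, labels, blank):
    """Backward variables: betas[t, u] = log-prob of finishing from node (t, u)."""
    T, U, _ = log_probs.shape
    assert U == len(labels) + 1
    betas = np.zeros((T, U))
    betas[T - 1, U - 1] = log_probs[T - 1, U - 1, blank]  # final blank
    for t in range(T - 2, -1, -1):  # last column: blanks only
        betas[t, U - 1] = betas[t + 1, U - 1] + log_probs[t, U - 1, blank]
    for u in range(U - 2, -1, -1):  # last row: labels only
        betas[T - 1, u] = betas[T - 1, u + 1] + log_probs[T - 1, u, labels[u]]
    for t in range(T - 2, -1, -1):
        for u in range(U - 2, -1, -1):
            no_emit = betas[t + 1, u] + log_probs[t, u, blank]
            emit = betas[t, u + 1] + log_probs[t, u, labels[u]]
            betas[t, u] = np.logaddexp(no_emit, emit)
    return betas


# Both directions must agree on the total log-likelihood
rng = np.random.default_rng(0)
T, labels, V, blank = 4, [1, 2], 3, 0
logits = rng.standard_normal((T, len(labels) + 1, V))
log_probs = logits - np.logaddexp.reduce(logits, axis=-1, keepdims=True)
alphas = compute_alphas(log_probs, labels, blank)
betas = compute_betas(log_probs, labels, blank)
log_like = alphas[T - 1, -1] + betas[T - 1, -1]
print(np.isclose(log_like, betas[0, 0]))  # True
```

    The same identity, log_like = betas[0, 0] = alphas[T - 1, U - 1] + betas[T - 1, U - 1], appears as a comment in the gradient snippet quoted in the "transducer grad compute formula" issue.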

    opened by csukuangfj 0
  • transducer grad compute formula

    The gradient formula in warprnnt_numba and in the warp_transducer CPU implementation is as follows:

    import numpy as np

    def compute_grads(log_probs, alphas, betas, labels, blank):
        # Gradient of -log P(labels) w.r.t. log_probs, shape (T, U, V)
        T, U, _ = log_probs.shape
        grads = np.full(log_probs.shape, -np.inf)
        log_like = betas[0, 0]  # == alphas[T - 1, U - 1] + betas[T - 1, U - 1]

        # grad to last blank transition
        grads[T - 1, U - 1, blank] = alphas[T - 1, U - 1]
        grads[: T - 1, :, blank] = alphas[: T - 1, :] + betas[1:, :]

        # grad to label transition
        for u, l in enumerate(labels):
            grads[:, u, l] = alphas[:, u] + betas[:, u + 1]

        # Entries never reached stay at -inf and become 0 after exp
        return -np.exp(grads + log_probs - log_like)
    

    This is not the same as the formula in torchaudio, optimized_transducer, and the warp_transducer GPU implementation. But you said that the warp_transducer CPU gradient is the same as that of optimized_transducer and torchaudio. How is that achieved?
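
    One way to relate the two conventions is sketched below, under the assumption that log_probs = log_softmax(logits) over the last axis. The snippet above gives the gradient with respect to log_probs. An implementation that takes raw joint logits gives the gradient with respect to logits. The chain rule through log_softmax connects them as dL/dz_j = g_j - p_j * sum_k g_k, where g = dL/dlog_probs and p = softmax(z). The function name grad_wrt_logits and the toy loss used for checking are hypothetical.

```python
import numpy as np


def log_softmax(z, axis=-1):
    # Numerically stable log-softmax over the vocabulary axis
    z_max = z.max(axis=axis, keepdims=True)
    shifted = z - z_max
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def grad_wrt_logits(grad_log_probs, logits):
    """Map dL/dlog_probs to dL/dlogits when log_probs = log_softmax(logits).

    grad_log_probs and logits share a shape such as (T, U, V).
    """
    probs = np.exp(log_softmax(logits))
    return grad_log_probs - probs * grad_log_probs.sum(axis=-1, keepdims=True)


# Finite-difference check on a toy loss L = sum(w * log_softmax(z)),
# for which dL/dlog_probs is simply w
rng = np.random.default_rng(0)
z = rng.standard_normal((2, 3, 4))
w = rng.standard_normal((2, 3, 4))
analytic = grad_wrt_logits(w, z)

eps = 1e-6
numeric = np.zeros_like(z)
for idx in np.ndindex(z.shape):
    step = np.zeros_like(z)
    step[idx] = eps
    loss_plus = np.sum(w * log_softmax(z + step))
    loss_minus = np.sum(w * log_softmax(z - step))
    numeric[idx] = (loss_plus - loss_minus) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

    You can apply it to the (T, U, V) array returned by the snippet above, together with the logits that produced log_probs. The result is a gradient in logits space, and each (t, u) slice of it sums to zero over the vocabulary.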

    opened by zh794390558 9
  • install error

    1. "CUDA_cublas_LIBRARY not found" error when compiling (my CUDA version is 10.2)
    2. /usr/include/c++/7/bits/basic_string.tcc(1067): error: expression must have pointer type detected during: instantiation of "std::basic_string<_CharT, _Traits, _Alloc>::_Rep *std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc &) [with _CharT=char16_t, _Traits=std::char_traits<char16_t>, _Alloc=std::allocator<char16_t>]"

    To fix the above two problems, I had to use root to modify some settings of the Linux system. Is there a better solution?

    opened by zmqwer 0
  • loss value and decode library?

    Thanks very much for your great project! I have two questions:

    1. How large is the transducer loss for a well-performing model, or once the model has converged?
    2. Is there a fast decoding solution? I found that the decoding modules implementing beam search in many projects are extremely slow.

    opened by xiongjun19 10
Releases: v1.4
Owner: Fangjun Kuang