Memory efficient transducer loss computation

Overview

Introduction

This project implements the optimization techniques proposed in the paper "Improving RNN Transducer Modeling for End-to-End Speech Recognition" to reduce the memory consumption of computing the transducer loss.
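As a rough illustration of why dropping padding saves memory, the sketch below counts the elements of the joint-network output for a hypothetical batch. It compares the padded (N, T, U, V) layout with the packed (sum_all_TU, V) layout described under Usage. The lengths and vocabulary size are made up; this is plain arithmetic, not a benchmark of optimized_transducer.

```python
def padded_elements(T_list, U_list, V):
    """Elements in a padded (N, T_max, U_max, V) joint-network output."""
    return len(T_list) * max(T_list) * max(U_list) * V


def packed_elements(T_list, U_list, V):
    """Elements in a packed (sum_i T_i * U_i, V) joint-network output."""
    return sum(T * U for T, U in zip(T_list, U_list)) * V


if __name__ == "__main__":
    T_list = [100, 60, 30]  # hypothetical encoder output lengths T_i
    U_list = [21, 11, 6]    # hypothetical U_i = target length + 1
    V = 500                 # hypothetical vocabulary size
    padded = padded_elements(T_list, U_list, V)
    packed = packed_elements(T_list, U_list, V)
    print(padded, packed, f"{packed / padded:.1%}")  # 3150000 1470000 46.7%
```

The more the utterance lengths in a batch differ, the larger the share of the padded tensor that is pure padding.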

How does it differ from the RNN-T loss from torchaudio

It produces the same output as torchaudio for the same input, so optimized_transducer should be equivalent to torchaudio.functional.rnnt_loss().

This project is more memory efficient and potentially faster (TODO: This needs some benchmarks).

Also, torchaudio accepts only the output of nn.Linear, but we also support the output of log-softmax (set the option from_log_softmax to True in this case).
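Conceptually, from_log_softmax only decides whether a log-softmax over the vocabulary dimension is applied before the loss is computed. The plain-Python sketch below models that with a hypothetical helper, to_log_probs, which is not part of the optimized_transducer API; Python lists stand in for tensor rows.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of V scores."""
    m = max(row)
    log_norm = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_norm for x in row]


def to_log_probs(row, from_log_softmax):
    """Hypothetical helper (not the library API): the log-probabilities a
    transducer loss works with for one (t, u) position.

    from_log_softmax=False: `row` is raw nn.Linear output, so normalize it here.
    from_log_softmax=True: `row` is assumed to be log-softmax output already.
    """
    return list(row) if from_log_softmax else log_softmax(row)


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5, 3.0]  # hypothetical nn.Linear output for one (t, u)
    a = to_log_probs(logits, from_log_softmax=False)
    b = to_log_probs(log_softmax(logits), from_log_softmax=True)
    print(a == b)  # True: same log-probabilities, normalized in different places
```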

How does it differ from warp-transducer

It borrows the methods of computing alpha and beta from warp-transducer. Therefore, optimized_transducer produces the same alpha and beta as warp-transducer for the same input.
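For reference, the standard RNN-T forward (alpha) and backward (beta) recursions can be sketched in plain Python as below, for a single utterance whose log-probabilities have shape (T, U + 1, V). This illustrates the textbook recursions only; it is not the warp-transducer or optimized_transducer kernel code, and the toy sizes are hypothetical.

```python
import math


def log_add(a, b):
    """Stable log(exp(a) + exp(b)); -inf stands for probability zero."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def alpha_beta(log_probs, targets, blank):
    """Forward (alpha) and backward (beta) log-variables for one utterance.

    log_probs[t][u][k] is the log-probability of emitting symbol k at frame t
    after u target symbols have been emitted; its shape is (T, U + 1, V).
    """
    T, U = len(log_probs), len(targets)
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    beta = [[-math.inf] * (U + 1) for _ in range(T)]

    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t > 0:  # arrive by emitting blank at (t - 1, u)
                alpha[t][u] = log_add(alpha[t][u], alpha[t - 1][u] + log_probs[t - 1][u][blank])
            if u > 0:  # arrive by emitting targets[u - 1] at (t, u - 1)
                alpha[t][u] = log_add(alpha[t][u], alpha[t][u - 1] + log_probs[t][u - 1][targets[u - 1]])

    beta[T - 1][U] = log_probs[T - 1][U][blank]  # the final blank ends every path
    for t in range(T - 1, -1, -1):
        for u in range(U, -1, -1):
            if t < T - 1:  # leave by emitting blank
                beta[t][u] = log_add(beta[t][u], beta[t + 1][u] + log_probs[t][u][blank])
            if u < U:  # leave by emitting targets[u]
                beta[t][u] = log_add(beta[t][u], beta[t][u + 1] + log_probs[t][u][targets[u]])
    return alpha, beta


if __name__ == "__main__":
    T, targets, V, blank = 3, [1, 2], 3, 0  # hypothetical toy sizes
    U = len(targets)
    uniform = math.log(1.0 / V)
    log_probs = [[[uniform] * V for _ in range(U + 1)] for _ in range(T)]
    alpha, beta = alpha_beta(log_probs, targets, blank)
    forward_total = alpha[T - 1][U] + log_probs[T - 1][U][blank]
    # Both directions give log P(y | x)
    print(forward_total, beta[0][0], math.log(6 / 3**5))
```

With uniform probabilities 1/3, T = 3 and two targets, there are 6 alignments, each with probability 3**-5, so all three printed values agree.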

However, warp-transducer produces different gradients for CPU and CUDA when using the same input. See https://github.com/HawkAaron/warp-transducer/issues/93

This project produces consistent gradients on CPU and CUDA for the same input, just like torchaudio does (we borrow the gradient computation formula from torchaudio).
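The gradient of the loss -log P(y|x) with respect to the raw logits (the from_log_softmax=False case) can be written in closed form using alpha and beta. The sketch below is a plain-Python illustration of a common closed form, checked against finite differences. It is not the project's or torchaudio's source code, and the toy sizes are hypothetical.

```python
import copy
import math
import random


def log_add(a, b):
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def log_softmax(row):
    m = max(row)
    s = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - s for x in row]


def alpha_beta(lp, targets, blank):
    """Standard RNN-T forward/backward recursions on log-probs of shape (T, U + 1, V)."""
    T, U = len(lp), len(targets)
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    beta = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t > 0:
                alpha[t][u] = log_add(alpha[t][u], alpha[t - 1][u] + lp[t - 1][u][blank])
            if u > 0:
                alpha[t][u] = log_add(alpha[t][u], alpha[t][u - 1] + lp[t][u - 1][targets[u - 1]])
    beta[T - 1][U] = lp[T - 1][U][blank]
    for t in range(T - 1, -1, -1):
        for u in range(U, -1, -1):
            if t < T - 1:
                beta[t][u] = log_add(beta[t][u], beta[t + 1][u] + lp[t][u][blank])
            if u < U:
                beta[t][u] = log_add(beta[t][u], beta[t][u + 1] + lp[t][u][targets[u]])
    return alpha, beta


def loss_and_grad(logits, targets, blank):
    """Return -log P(y|x) and d(loss)/d(logits) for raw logits of shape (T, U + 1, V)."""
    lp = [[log_softmax(row) for row in frame] for frame in logits]
    alpha, beta = alpha_beta(lp, targets, blank)
    T, U = len(lp), len(targets)
    log_like = beta[0][0]
    grad = [[None] * (U + 1) for _ in range(T)]
    for t in range(T):
        for u in range(U + 1):
            # p(k | t, u) times the posterior probability of visiting node (t, u)
            node = alpha[t][u] + beta[t][u] - log_like
            g = [math.exp(node + lp_k) for lp_k in lp[t][u]]
            # minus the posterior probability of each outgoing edge
            if t < T - 1:
                g[blank] -= math.exp(alpha[t][u] + lp[t][u][blank] + beta[t + 1][u] - log_like)
            elif u == U:  # the final blank that terminates every path
                g[blank] -= math.exp(alpha[t][u] + lp[t][u][blank] - log_like)
            if u < U:
                k = targets[u]
                g[k] -= math.exp(alpha[t][u] + lp[t][u][k] + beta[t][u + 1] - log_like)
            grad[t][u] = g
    return -log_like, grad


def finite_difference(logits, targets, blank, t, u, k, eps=1e-6):
    """Central-difference estimate of d(loss)/d(logits[t][u][k])."""
    plus, minus = copy.deepcopy(logits), copy.deepcopy(logits)
    plus[t][u][k] += eps
    minus[t][u][k] -= eps
    return (loss_and_grad(plus, targets, blank)[0] - loss_and_grad(minus, targets, blank)[0]) / (2 * eps)


if __name__ == "__main__":
    rng = random.Random(0)
    T, targets, V, blank = 4, [2, 1], 3, 0  # hypothetical toy sizes
    logits = [[[rng.gauss(0.0, 1.0) for _ in range(V)] for _ in range(len(targets) + 1)] for _ in range(T)]
    loss, grad = loss_and_grad(logits, targets, blank)
    worst = max(
        abs(grad[t][u][k] - finite_difference(logits, targets, blank, t, u, k))
        for t in range(T) for u in range(len(targets) + 1) for k in range(V)
    )
    print(f"loss={loss:.6f}, max |analytic - numeric| = {worst:.2e}")
```

For every (t, u), the gradient row sums to zero, because the node occupancy exp(alpha + beta - log P) equals the total occupancy of the edges leaving that node.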

optimized_transducer uses less memory than warp-transducer and is potentially faster (TODO: This needs some benchmarks).

Installation

You can install it via pip:

pip install optimized_transducer

To check that optimized_transducer was installed successfully, please run

python3 -c "import optimized_transducer; print(optimized_transducer.__version__)"

which should print the version of the installed optimized_transducer, e.g., 1.2.

Installation FAQ

What operating systems are supported ?

It has been tested on Ubuntu 18.04. It should also work on macOS and other Unix-like systems. It may work on Windows, though it is not tested.

How to display installation log ?

Use

pip install --verbose optimized_transducer

How to reduce installation time ?

Use

export OT_MAKE_ARGS="-j"
pip install --verbose optimized_transducer

It will pass -j to make.

Which version of PyTorch is supported ?

It has been tested on PyTorch >= 1.5.0. It may work on PyTorch < 1.5.0.
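If you want to check the installed PyTorch against the tested range programmatically, compare versions numerically rather than as strings ("1.10.0" < "1.5.0" is True for plain strings). The sketch below uses a hypothetical helper that is not part of optimized_transducer; it ignores local suffixes such as "+cu111" and pre-release tags.

```python
import re

MIN_TESTED = (1, 5, 0)  # from this FAQ: tested on PyTorch >= 1.5.0


def parse_version(version):
    """Turn e.g. '1.10.0+cu111' into (1, 10, 0), ignoring local/pre-release suffixes."""
    release = version.split("+", 1)[0]
    parts = []
    for piece in release.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_tested_torch(version):
    """True if `version` falls in the tested range (>= 1.5.0)."""
    return parse_version(version) >= MIN_TESTED


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        status = "tested" if is_tested_torch(torch.__version__) else "untested"
        print(f"PyTorch {torch.__version__}: {status} range")
```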

How to install a CPU version of optimized_transducer ?

Use

export OT_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF"
export OT_MAKE_ARGS="-j"
pip install --verbose optimized_transducer

It will pass -DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF to cmake.

What Python versions are supported ?

Python >= 3.6 is known to work. It may work for Python 2.7, though it is not tested.

Where to get help if I have problems with the installation ?

Please file an issue at https://github.com/csukuangfj/optimized_transducer/issues and describe your problem there.

Usage

optimized_transducer expects that the output shape of the joint network is NOT (N, T, U, V), but is (sum_all_TU, V), which is a concatenation of 2-D tensors: (T_1 * U_1, V), (T_2 * U_2, V), ..., (T_N * U_N, V), where T_i = logit_lengths[i] and U_i = target_lengths[i] + 1. Note: (T_1 * U_1, V) is just the reshape of a 3-D tensor (T_1, U_1, V).
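To make the packed layout concrete: the rows of utterance i occupy a contiguous block of T_i * U_i rows, and within that block the row for frame t and label position u is t * U_i + u, because reshape flattens (T_i, U_i, V) in row-major order. The helpers below are hypothetical (not part of optimized_transducer) and only compute these indices.

```python
def packed_offsets(logit_lengths, target_lengths):
    """Return the first packed row of each utterance and the total row count."""
    offsets, total = [], 0
    for T_i, L_i in zip(logit_lengths, target_lengths):
        offsets.append(total)
        total += T_i * (L_i + 1)  # U_i = target_lengths[i] + 1
    return offsets, total


def packed_row(n, t, u, logit_lengths, target_lengths):
    """Row of the packed (sum_all_TU, V) tensor for utterance n, frame t, label position u."""
    U_n = target_lengths[n] + 1
    if not (0 <= t < logit_lengths[n] and 0 <= u < U_n):
        raise IndexError("(t, u) is outside utterance n")
    offsets, _ = packed_offsets(logit_lengths, target_lengths)
    return offsets[n] + t * U_n + u  # row-major flattening of (T_n, U_n, V)


if __name__ == "__main__":
    logit_lengths = [4, 3]   # hypothetical T_1, T_2
    target_lengths = [2, 1]  # so U_1 = 3, U_2 = 2
    print(packed_offsets(logit_lengths, target_lengths))  # ([0, 12], 18)
    print(packed_row(1, 2, 1, logit_lengths, target_lengths))  # 17, the last row
```

The total (18 here) is the number of rows the packed logits should have, i.e., sum_all_TU.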

Suppose your original joint network looks somewhat like the following:

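import torch
import torchaudio

# Illustrative sizes: batch N, frames T, decoder steps U (= max target length + 1),
# joint dimension D, vocabulary size V
N, T, U, D, V = 2, 10, 6, 32, 50
blank_id = 0
linear = torch.nn.Linear(D, V)

targets = torch.randint(1, V, (N, U - 1), dtype=torch.int32)
logit_lengths = torch.tensor([T, T - 2], dtype=torch.int32)
target_lengths = torch.tensor([U - 1, U - 3], dtype=torch.int32)

encoder_out = torch.rand(N, T, D) # from the encoder
decoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network

encoder_out = encoder_out.unsqueeze(2) # Now encoder_out is (N, T, 1, D)
decoder_out = decoder_out.unsqueeze(1) # Now decoder_out is (N, 1, U, D)

x = encoder_out + decoder_out # x is of shape (N, T, U, D)
activation = torch.tanh(x)

logits = linear(activation) # linear is an instance of `nn.Linear`; logits is (N, T, U, V)

loss = torchaudio.functional.rnnt_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=blank_id,
    reduction="mean",
)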

You need to change it to the following:

import torch
import optimized_transducer

# N, T, U, D, linear, targets, logit_lengths, target_lengths and blank_id
# are the same as in the snippet above.

encoder_out = torch.rand(N, T, D) # from the encoder
decoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network

# Keep only the valid frames and decoder steps of each utterance
encoder_out_list = [encoder_out[i, :logit_lengths[i], :] for i in range(N)]
decoder_out_list = [decoder_out[i, :target_lengths[i]+1, :] for i in range(N)]

# (T_i, 1, D) + (1, U_i, D) -> (T_i, U_i, D), then flatten to (T_i * U_i, D)
x = [e.unsqueeze(1) + d.unsqueeze(0) for e, d in zip(encoder_out_list, decoder_out_list)]
x = [p.reshape(-1, D) for p in x]
x = torch.cat(x) # x is of shape (sum_all_TU, D)

activation = torch.tanh(x)
logits = linear(activation) # linear is an instance of `nn.Linear`; logits is (sum_all_TU, V)

loss = optimized_transducer.transducer_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=blank_id,
    reduction="mean",
    from_log_softmax=False,
)

Caution: We used from_log_softmax=False in the above example since logits is the output of nn.Linear.

Hint: If logits is the output of log-softmax, you should use from_log_softmax=True.

In most cases, you should pass the output of nn.Linear to compute the loss, i.e., use from_log_softmax=False, to save memory.

If you want to apply some operations to the output of log-softmax before feeding it to optimized_transducer.transducer_loss(), use from_log_softmax=True. But be aware that this will increase memory usage.

For more usages, please refer to

For developers

As a developer, you don't need to use pip install optimized_transducer. To make development easier, you can use

git clone https://github.com/csukuangfj/optimized_transducer.git
cd optimized_transducer
mkdir build
cd build
cmake -DOT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
make -j
export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH

I usually create a file path.sh inside the build directory, containing

export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH

so what you need to do is

cd optimized_transducer/build
source path.sh

# Then you are ready to run Python tests
python3 ../optimized_transducer/python/tests/test_compute_transducer_loss.py

# You can also use "import optimized_transducer" in your Python projects

To run all Python tests, use

cd optimized_transducer/build
ctest --output-on-failure
Comments
  • Issue with optimized-transducer installation

    I started by installing k2, lhotse and icefall. So far I was able to test k2 and it works perfectly, and lhotse also works, but when I tried to install icefall I got a weird issue with optimized-transducer. The log is below.

    Collecting kaldilm
      Using cached kaldilm-1.11-cp38-cp38-linux_x86_64.whl
    Collecting kaldialign
      Using cached kaldialign-0.2-cp38-cp38-linux_x86_64.whl
    Requirement already satisfied: sentencepiece>=0.1.96 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 3)) (0.1.96)
    Requirement already satisfied: tensorboard in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 4)) (2.7.0)
    Requirement already satisfied: typeguard in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 5)) (2.13.3)
    Collecting optimized_transducer
      Using cached optimized_transducer-1.3.tar.gz (47 kB)
    Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.8.1)
    Requirement already satisfied: werkzeug>=0.11.15 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.0.2)
    Requirement already satisfied: numpy>=1.12.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.21.2)
    Requirement already satisfied: protobuf>=3.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (3.19.3)
    Requirement already satisfied: wheel>=0.26 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.37.1)
    Requirement already satisfied: setuptools>=41.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (58.0.4)
    Requirement already satisfied: grpcio>=1.24.3 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.43.0)
    Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.4.6)
    Requirement already satisfied: absl-py>=0.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.0.0)
    Requirement already satisfied: google-auth<3,>=1.6.3 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.3.3)
    Requirement already satisfied: requests<3,>=2.21.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.27.1)
    Requirement already satisfied: markdown>=2.6.8 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (3.3.6)
    Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.6.1)
    Requirement already satisfied: six in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from absl-py>=0.4->tensorboard->-r requirements.txt (line 4)) (1.16.0)
    Requirement already satisfied: cachetools<5.0,>=2.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (4.2.4)
    Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (0.2.8)
    Requirement already satisfied: rsa<5,>=3.1.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (4.8)
    Requirement already satisfied: requests-oauthlib>=0.7.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard->-r requirements.txt (line 4)) (1.3.0)
    Requirement already satisfied: importlib-metadata>=4.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from markdown>=2.6.8->tensorboard->-r requirements.txt (line 4)) (4.10.1)
    Requirement already satisfied: zipp>=0.5 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard->-r requirements.txt (line 4)) (3.7.0)
    Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (0.4.8)
    Requirement already satisfied: charset-normalizer~=2.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (2.0.10)
    Requirement already satisfied: certifi>=2017.4.17 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (2021.10.8)
    Requirement already satisfied: idna<4,>=2.5 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (3.3)
    Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (1.26.8)
    Requirement already satisfied: oauthlib>=3.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard->-r requirements.txt (line 4)) (3.1.1)
    Building wheels for collected packages: optimized-transducer
      Building wheel for optimized-transducer (setup.py): started
      Building wheel for optimized-transducer (setup.py): finished with status 'error'
      ERROR: Command errored out with exit status 1:
       command: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; __file__='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-qa004082
           cwd: /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/
      Complete output (153 lines):
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.8
      creating build/lib.linux-x86_64-3.8/optimized_transducer
      copying optimized_transducer/python/optimized_transducer/__init__.py -> build/lib.linux-x86_64-3.8/optimized_transducer
      copying optimized_transducer/python/optimized_transducer/transducer_loss.py -> build/lib.linux-x86_64-3.8/optimized_transducer
      running build_ext
      For fast compilation, run:
      export OT_MAKE_ARGS="-j"; python setup.py install
      Setting PYTHON_EXECUTABLE to /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
      build command is:

              cd build/temp.linux-x86_64-3.8
    
              cmake -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=/home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173
    
              make  _optimized_transducer
    

    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 6.5.0
    -- The CUDA compiler identification is NVIDIA 11.1.74
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75;80;86
    -- Adding arch 35
    -- Adding arch 50
    -- Adding arch 60
    -- Adding arch 61
    -- Adding arch 70
    -- Adding arch 75
    -- Adding arch 80
    -- Adding arch 86
    -- OT_COMPUTE_ARCHS: 35;50;60;61;70;75;80;86
    -- Downloading pybind11
    -- pybind11 is downloaded to /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python (found version "3.8.12")
    -- Found PythonLibs: /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    CMake Warning (dev) at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:29 (find_package):
      Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
      Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
      command to set the policy and suppress this warning.

      Environment variable CUDA_ROOT is set to:

        /cm/shared/apps/cuda11.1/toolkit/11.1.0

      For compatibility, CMake is ignoring the variable.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    This warning is for project developers.  Use -Wno-dev to suppress it.

    -- Found CUDA: /cm/shared/apps/cuda11.1/toolkit/11.1.0 (found version "11.1")
    -- Caffe2: CUDA detected: 11.1
    -- Caffe2: CUDA nvcc is: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /cm/shared/apps/cuda11.1/toolkit/11.1.0
    -- Caffe2: Header version is: 11.1
    -- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH)
    CMake Warning at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:111 (message):
      Caffe2: Cannot find cuDNN library.  Turning the option off
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)

    -- /cm/shared/apps/cuda11.1/toolkit/11.1.0/lib64/libnvrtc.so shorthash is 1f6b333a
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    CMake Error at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
      Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
      libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)

    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeOutput.log".
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeError.log".
    make: *** No rule to make target `_optimized_transducer'.  Stop.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 101, in <module>
        setuptools.setup(
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 299, in run
        self.run_command('build')
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 60, in build_extension
        raise Exception(
    Exception:
    Build optimized_transducer failed. Please check the error message.
    You can ask for help by creating an issue on GitHub.

    Click:
        https://github.com/csukuangfj/optimized_transducer/issues/new


    ERROR: Failed building wheel for optimized-transducer
    Running setup.py clean for optimized-transducer
    Failed to build optimized-transducer
    Installing collected packages: optimized-transducer, kaldilm, kaldialign
        Running setup.py install for optimized-transducer: started
        Running setup.py install for optimized-transducer: finished with status 'error'
        ERROR: Command errored out with exit status 1:
         command: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; __file__='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-mcbah0p8/install-record.txt --single-version-externally-managed --compile --install-headers /home/local/QCRI/ahussein/anaconda3/envs/k2/include/python3.8/optimized-transducer
             cwd: /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/
        Complete output (155 lines):
        running install
        running build
        running build_py
        creating build
        creating build/lib.linux-x86_64-3.8
        creating build/lib.linux-x86_64-3.8/optimized_transducer
        copying optimized_transducer/python/optimized_transducer/__init__.py -> build/lib.linux-x86_64-3.8/optimized_transducer
        copying optimized_transducer/python/optimized_transducer/transducer_loss.py -> build/lib.linux-x86_64-3.8/optimized_transducer
        running build_ext
        For fast compilation, run:
        export OT_MAKE_ARGS="-j"; python setup.py install
        Setting PYTHON_EXECUTABLE to /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
        build command is:

                cd build/temp.linux-x86_64-3.8
    
                cmake -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=/home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173
    
                make  _optimized_transducer
    
    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 6.5.0
    -- The CUDA compiler identification is NVIDIA 11.1.74
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75;80;86
    -- Adding arch 35
    -- Adding arch 50
    -- Adding arch 60
    -- Adding arch 61
    -- Adding arch 70
    -- Adding arch 75
    -- Adding arch 80
    -- Adding arch 86
    -- OT_COMPUTE_ARCHS: 35;50;60;61;70;75;80;86
    -- Downloading pybind11
    -- pybind11 is downloaded to /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python (found version "3.8.12")
    -- Found PythonLibs: /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    CMake Warning (dev) at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:29 (find_package):
      Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
      Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
      command to set the policy and suppress this warning.
    
      Environment variable CUDA_ROOT is set to:
    
        /cm/shared/apps/cuda11.1/toolkit/11.1.0
    
      For compatibility, CMake is ignoring the variable.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    This warning is for project developers.  Use -Wno-dev to suppress it.
    
    -- Found CUDA: /cm/shared/apps/cuda11.1/toolkit/11.1.0 (found version "11.1")
    -- Caffe2: CUDA detected: 11.1
    -- Caffe2: CUDA nvcc is: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /cm/shared/apps/cuda11.1/toolkit/11.1.0
    -- Caffe2: Header version is: 11.1
    -- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH)
    CMake Warning at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:111 (message):
      Caffe2: Cannot find cuDNN library.  Turning the option off
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    
    
    -- /cm/shared/apps/cuda11.1/toolkit/11.1.0/lib64/libnvrtc.so shorthash is 1f6b333a
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    CMake Error at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
      Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
      libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    
    
    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeOutput.log".
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeError.log".
    make: *** No rule to make target `_optimized_transducer'.  Stop.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 101, in <module>
        setuptools.setup(
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/install.py", line 545, in run
        self.run_command('build')
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 60, in build_extension
        raise Exception(
    Exception:
    Build optimized_transducer failed. Please check the error message.
    You can ask for help by creating an issue on GitHub.
    
    Click:
    	https://github.com/csukuangfj/optimized_transducer/issues/new
    
    ----------------------------------------
    

    ERROR: Command errored out with exit status 1: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; file='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-mcbah0p8/install-record.txt --single-version-externally-managed --compile --install-headers /home/local/QCRI/ahussein/anaconda3/envs/k2/include/python3.8/optimized-transducer Check the logs for full command output.

    opened by AmirHussein96 8
  • Warprnnt gradient for CPU

    @csukuangfj Just wanted to note that the gradient is not incorrect for CPU vs. GPU: the instructions clearly state that on CPU you need to provide log_softmax(joint-logits), whereas on GPU you should provide only joint-logits, since the CUDA kernel computes the log_softmax efficiently internally.
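To make this distinction concrete: a loss fed y = log_softmax(z) returns g = dL/dy, whereas a loss fed raw logits z returns dL/dz. By the chain rule these differ by the log_softmax Jacobian: dL/dz = g - softmax(z) * sum(g). The NumPy sketch below, assuming a single distribution over the vocabulary (function names are illustrative, not part of warp-transducer or optimized_transducer), applies that conversion and checks it against finite differences.

```python
import numpy as np


def log_softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def grad_logits_from_grad_log_probs(z: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Backpropagate g = dL/dy through y = log_softmax(z) to get dL/dz.

    The Jacobian of log_softmax is dy_j/dz_k = [j == k] - softmax(z)_k, so
    dL/dz_k = g_k - softmax(z)_k * sum_j g_j.
    """
    p = np.exp(log_softmax(z))
    return g - p * g.sum(axis=-1, keepdims=True)


# Check against central finite differences for the linear loss L = w . log_softmax(z).
rng = np.random.default_rng(0)
z = rng.normal(size=5)
w = rng.normal(size=5)  # dL/dy is simply w for this loss
analytic = grad_logits_from_grad_log_probs(z, w)

eps = 1e-6
numeric = np.array([
    (w @ log_softmax(z + eps * e) - w @ log_softmax(z - eps * e)) / (2 * eps)
    for e in np.eye(5)
])
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Under this reading, the CPU and GPU paths can both be correct while returning different numbers, because their gradients are taken with respect to different tensors.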

    Anyway, yours is also an efficient implementation, also written in C++. Could you benchmark the solutions if you have time? Even a naive benchmark would give some hint about relative speed. Your memory-efficient implementation is very interesting too; it reduces speed but saves a lot of memory.

    opened by titu1994 2
  • "ModuleNotFoundError: No module named '_optimized_transducer'" when testing.

    I installed optimized_transducer as follows:

    git clone https://github.com/csukuangfj/optimized_transducer.git
    cd optimized_transducer
    mkdir build
    cd build
    cmake -DOT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
    export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH
    

    The cmake log is as follows:

    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 7.5.0
    -- The CUDA compiler identification is NVIDIA 10.1.243
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Autodetected CUDA architecture(s):  7.0
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_70,code=sm_70
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75
    -- Skipping arch 35
    -- Skipping arch 50
    -- Skipping arch 60
    -- Skipping arch 61
    -- Adding arch 70
    -- Skipping arch 75
    -- OT_COMPUTE_ARCHS: 70
    -- Downloading pybind11
    -- pybind11 is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/bin/python (found version "3.8.11")
    -- Found PythonLibs: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    -- Found CUDA: /usr/local/cuda (found version "10.1")
    -- Caffe2: CUDA detected: 10.1
    -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /usr/local/cuda
    -- Caffe2: Header version is: 10.1
    -- Found CUDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
    -- Found cuDNN: v7.6.2  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
    -- Autodetected CUDA architecture(s):  7.0
    -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70
    -- Found Torch: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torch/lib/libtorch.so
    -- PyTorch version: 1.7.0+cu101
    -- PyTorch cuda version: 10.1
    -- Use FetchContent provided by k2
    -- Downloading googletest
    
    -- googletest is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/googletest-src
    -- googletest's binary dir is /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/googletest-build
    -- The C compiler identification is GNU 7.5.0
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Downloading moderngpu
    -- moderngpu is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/moderngpu-src
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /ceph-meixu/luomingshuang/optimized_transducer/build
    

    But when I run python optimized_transducer/python/tests/test_compute_transducer_loss.py to test it, I get the following error:

    /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
      warnings.warn(
    Traceback (most recent call last):
      File "optimized_transducer/python/tests/test_compute_transducer_loss.py", line 8, in <module>
        import optimized_transducer
      File "/ceph-meixu/luomingshuang/optimized_transducer/optimized_transducer/python/optimized_transducer/__init__.py", line 1, in <module>
        from .transducer_loss import TransducerLoss, transducer_loss  # noqa
      File "/ceph-meixu/luomingshuang/optimized_transducer/optimized_transducer/python/optimized_transducer/transducer_loss.py", line 3, in <module>
        import _optimized_transducer
    ModuleNotFoundError: No module named '_optimized_transducer'
    

    How can I solve it? Thanks!
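The commands quoted above stop after cmake, which only configures the project and writes the build files; it does not compile the extension. So, presumably, nothing named _optimized_transducer exists yet in $PWD/lib, and running make in the build directory is the likely missing step. As a hedged diagnostic, the Python sketch below checks whether the extension is importable and, if not, whether a file that looks like it sits somewhere on sys.path. The helper names are hypothetical, and the filename pattern is an assumption based on how CPython names compiled extensions.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Iterable, List


def find_extension_candidates(dirs: Iterable[str], name: str) -> List[Path]:
    """Return files in `dirs` whose names look like a compiled extension `name`.

    Assumption: the extension file is called e.g.
    `_optimized_transducer.cpython-38-x86_64-linux-gnu.so` (`.pyd` on Windows).
    """
    hits: List[Path] = []
    for d in dirs:
        p = Path(d)
        if p.is_dir():
            for pattern in (name + "*.so", name + "*.pyd"):
                hits.extend(sorted(p.glob(pattern)))
    return hits


def diagnose(name: str = "_optimized_transducer") -> str:
    """Explain whether `name` can be imported with the current sys.path."""
    if importlib.util.find_spec(name) is not None:
        return f"{name} is importable"
    hits = find_extension_candidates(sys.path, name)
    if hits:
        # A file exists but was not picked up, e.g. built for another Python version.
        return f"{name} is not importable, but found: {[str(h) for h in hits]}"
    return f"{name} not found on sys.path; has `make` been run in the build directory?"


if __name__ == "__main__":
    print(diagnose())
```

Run it with the same PYTHONPATH as the failing test; a "found but not importable" result would point to an ABI mismatch rather than a missing build step.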

    opened by luomingshuang 2
  • Update transducer-loss.h

    I found that https://github.com/csukuangfj/optimized_transducer/blob/0c75a5712f709024165fe62360dd25905cca8c68/optimized_transducer/csrc/transducer-loss.h#L17 and https://github.com/csukuangfj/optimized_transducer/blob/0c75a5712f709024165fe62360dd25905cca8c68/optimized_transducer/python/tests/test_compute_transducer_loss.py#L61 were inconsistent. I think the former was incorrect, so I fixed it here. @csukuangfj, what do you think?

    opened by shanguanma 2
  • fix for CMakeLists.txt

    When I ran make -j in the build directory, the following error occurred: "error: #error C++14 or later compatible compiler is required to use ATen." So I added the following two commands to CMakeLists.txt, after which make -j ran successfully.

    set(CMAKE_CXX_STANDARD 14)
    set(CMAKE_CXX_STANDARD_REQUIRED ON)
    

    I'm not sure whether these two commands are necessary for CMakeLists.txt in all environments.

    opened by luomingshuang 1
  • Fix installation on macOS.

    To fix the following error when running

    python3 -c "import optimized_transducer; print(optimized_transducer.__version__)"
    

    on macOS:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/Users/fangjun/py38/lib/python3.8/site-packages/optimized_transducer/__init__.py", line 1, in <module>
        from .transducer_loss import TransducerLoss, transducer_loss  # noqa
      File "/Users/fangjun/py38/lib/python3.8/site-packages/optimized_transducer/transducer_loss.py", line 3, in <module>
        import _optimized_transducer
    ImportError: dlopen(/Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so, 2): Symbol not found: _THPVariableClass
      Referenced from: /Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so
      Expected in: flat namespace
     in /Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so
    
    opened by csukuangfj 0
  • Disable warp level parallel reduction

    For some reason, warp-level parallel reduction produces incorrect alpha and beta when sum_all_TU is large.

    We disable warp level parallel reduction for now and use the method from https://github.com/HawkAaron/warp-transducer to compute alpha and beta.

    We will revisit the warp-related issues after gaining more experience with CUDA programming.
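For reference, the alpha (forward) and beta (backward) recursions mentioned above can be sketched in the log domain as follows. This is a plain NumPy illustration of the standard RNN-T recursions for a single utterance, assuming log_probs of shape (T, U + 1, V) already normalized by log_softmax; it is not the CUDA code from warp-transducer or optimized_transducer, and the function name is hypothetical. Both recursions must yield the same total log-likelihood, which is a useful sanity check when debugging a parallel implementation such as the warp-level reduction described above.

```python
import numpy as np


def transducer_alpha_beta(log_probs, labels, blank=0):
    """Return (alpha, beta, log_like) for one utterance.

    log_probs: array of shape (T, U + 1, V), log-softmax outputs of the joiner.
    labels: sequence of U label ids (no blanks).
    """
    T, U1, _ = log_probs.shape
    assert U1 == len(labels) + 1
    alpha = np.zeros((T, U1))
    beta = np.zeros((T, U1))

    # Forward: alpha[t, u] = log-probability of reaching node (t, u),
    # i.e. having emitted labels[:u] while consuming frames before t.
    for t in range(T):
        for u in range(U1):
            if t == 0 and u == 0:
                continue  # alpha[0, 0] = log(1) = 0
            terms = []
            if t > 0:  # arrive by a blank from (t - 1, u)
                terms.append(alpha[t - 1, u] + log_probs[t - 1, u, blank])
            if u > 0:  # arrive by emitting labels[u - 1] at (t, u - 1)
                terms.append(alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]])
            alpha[t, u] = np.logaddexp.reduce(terms)

    # Backward: beta[t, u] = log-probability of finishing the sequence from (t, u).
    beta[T - 1, U1 - 1] = log_probs[T - 1, U1 - 1, blank]
    for t in range(T - 1, -1, -1):
        for u in range(U1 - 1, -1, -1):
            if t == T - 1 and u == U1 - 1:
                continue
            terms = []
            if t < T - 1:  # leave by a blank to (t + 1, u)
                terms.append(beta[t + 1, u] + log_probs[t, u, blank])
            if u < U1 - 1:  # leave by emitting labels[u] to (t, u + 1)
                terms.append(beta[t, u + 1] + log_probs[t, u, labels[u]])
            beta[t, u] = np.logaddexp.reduce(terms)

    log_like = alpha[T - 1, U1 - 1] + log_probs[T - 1, U1 - 1, blank]
    return alpha, beta, log_like


rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3, 5))  # T=4, U=2 labels, V=5
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
alpha, beta, ll = transducer_alpha_beta(log_probs, labels=[1, 3])
print(np.isclose(ll, beta[0, 0]))  # True
```

A GPU implementation parallelizes these loops (for example along anti-diagonals of the T x U lattice), but it should reproduce the same alpha and beta values as this sequential reference.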

    opened by csukuangfj 0
  • transducer grad compute formula

    The gradient formula used in warprnnt_numba and in the CPU version of warp_transducer is as follows:

    import numpy as np

    def compute_grads(log_probs, alphas, betas, labels, blank):
        # log_probs: (T, U, V) log-softmax outputs; alphas, betas: (T, U).
        # Returns the gradient of -log P(y|x) with respect to log_probs.
        T, U, _ = log_probs.shape
        grads = np.full(log_probs.shape, -float("inf"))
        log_like = betas[0, 0]  # == alphas[T - 1, U - 1] + betas[T - 1, U - 1]

        # grad to last blank transition
        grads[T - 1, U - 1, blank] = alphas[T - 1, U - 1]
        grads[: T - 1, :, blank] = alphas[: T - 1, :] + betas[1:, :]

        # grad to label transitions
        for u, l in enumerate(labels):
            grads[:, u, l] = alphas[:, u] + betas[:, u + 1]

        return -np.exp(grads + log_probs - log_like)
    

    This is not the same as in torchaudio, optimized_transducer, and the GPU version of warp_transducer. But you said that the CPU gradient of warp_transducer is the same as that of optimized_transducer and torchaudio. How is that achieved?
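A likely explanation, sketched under assumptions: the formula above gives g, the gradient with respect to log_probs (the log-softmax outputs). When a loss is fed logits z instead (as torchaudio expects, per the README), its gradient is taken with respect to z. Backpropagating g through log_softmax gives dL/dz = g - p * sum_k g, with p = softmax(z), and the beta recursion implies sum_k g[t, u, k] = -exp(alpha[t, u] + beta[t, u] - log_like), so dL/dz = p * exp(alpha + beta - log_like) + g. The NumPy sketch below (single utterance, blank id 0, all function names illustrative) checks this identity against finite differences of the loss with respect to the logits. It does not reproduce any library's actual code.

```python
import numpy as np


def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def alpha_beta(lp, labels, blank):
    """Log-domain forward/backward variables for one utterance; lp: (T, U+1, V)."""
    T, U1, _ = lp.shape
    a = np.full((T, U1), -np.inf)
    b = np.full((T, U1), -np.inf)
    a[0, 0] = 0.0
    for t in range(T):
        for u in range(U1):
            if t > 0:
                a[t, u] = np.logaddexp(a[t, u], a[t - 1, u] + lp[t - 1, u, blank])
            if u > 0:
                a[t, u] = np.logaddexp(a[t, u], a[t, u - 1] + lp[t, u - 1, labels[u - 1]])
    b[T - 1, U1 - 1] = lp[T - 1, U1 - 1, blank]
    for t in range(T - 1, -1, -1):
        for u in range(U1 - 1, -1, -1):
            if t < T - 1:
                b[t, u] = np.logaddexp(b[t, u], b[t + 1, u] + lp[t, u, blank])
            if u < U1 - 1:
                b[t, u] = np.logaddexp(b[t, u], b[t, u + 1] + lp[t, u, labels[u]])
    return a, b


def loss(z, labels, blank=0):
    """Negative log-likelihood -log P(y|x) computed from logits z."""
    _, b = alpha_beta(log_softmax(z), labels, blank)
    return -b[0, 0]


def grad_wrt_logits(z, labels, blank=0):
    lp = log_softmax(z)
    T, U1, _ = lp.shape
    a, b = alpha_beta(lp, labels, blank)
    ll = b[0, 0]
    # g = dL/dlog_probs, same formula as the warprnnt_numba snippet above.
    G = np.full(lp.shape, -np.inf)
    G[T - 1, U1 - 1, blank] = a[T - 1, U1 - 1]
    G[: T - 1, :, blank] = a[: T - 1, :] + b[1:, :]
    for u, l in enumerate(labels):
        G[:, u, l] = a[:, u] + b[:, u + 1]
    g = -np.exp(G + lp - ll)
    # Chain through log_softmax, simplified with the beta recursion.
    return np.exp(lp) * np.exp(a + b - ll)[:, :, None] + g


rng = np.random.default_rng(0)
z = rng.normal(size=(3, 3, 4))  # T=3, U=2 labels, V=4
labels = [2, 1]
analytic = grad_wrt_logits(z, labels)

eps = 1e-6
numeric = np.zeros_like(z)
for idx in np.ndindex(z.shape):
    dz = np.zeros_like(z)
    dz[idx] = eps
    numeric[idx] = (loss(z + dz, labels) - loss(z - dz, labels)) / (2 * eps)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

If this reading is right, the CPU formula and the logits-based formula agree once autograd propagates the CPU gradient back through the user's own log_softmax.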

    opened by zh794390558 9
  • install error

    1. A "CUDA_cublas_LIBRARY not found" error when compiling; my CUDA version is 10.2.
    2. /usr/include/c++/7/bits/basic_string.tcc(1067): error: expression must have pointer type detected during: instantiation of "std::basic_string<_CharT, _Traits, _Alloc>::_Rep *std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc &) [with _CharT=char16_t, _Traits=std::char_traits<char16_t>, _Alloc=std::allocator<char16_t>]"

    To fix the above two problems, I had to use root to modify some settings of the Linux system. Is there a better solution?

    opened by zmqwer 0
  • loss value and decode library?

    Thanks very much for your great project! I have two questions: 1. How large is the transducer loss for a well-performing model, or once the model has converged? 2. Is there any fast decoding solution? I found that the decoding modules in many projects implementing beam search are extremely slow.

    opened by xiongjun19 10
Releases: v1.4
Owner: Fangjun Kuang