PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Overview

Poincaré Embeddings for Learning Hierarchical Representations

PyTorch implementation of Poincaré Embeddings for Learning Hierarchical Representations

wn-nouns.jpg

Installation

Simply clone this repository via

git clone https://github.com/facebookresearch/poincare-embeddings.git
cd poincare-embeddings
conda env create -f environment.yml
source activate poincare
python setup.py build_ext --inplace

Example: Embedding WordNet Mammals

To embed the transitive closure of the WordNet mammals subtree, first generate the data via

cd wordnet
python transitive_closure.py

This will generate the transitive closure of the full noun hierarchy as well as of the mammals subtree of WordNet.

To embed the mammals subtree in the reconstruction setting (i.e., without missing data), go to the root directory of the project and run

./train-mammals.sh

This shell script includes the appropriate parameter settings for the mammals subtree and saves the trained model as mammals.pth.

An identical script to learn embeddings of the entire noun hierarchy is located at train-nouns.sh. This script contains the hyperparameter setting to reproduce the results for 10-dimensional embeddings of (Nickel & Kiela, 2017). The hyperparameter setting to reproduce the MAP results are provided as comments in the script.

The embeddings are trained via multithreaded async SGD. In the example above, the number of threads is set to a conservative setting (NHTREADS=2) which should run well even on smaller machines. On machines with many cores, increase NTHREADS for faster convergence.

Dependencies

  • Python 3 with NumPy
  • PyTorch
  • Scikit-Learn
  • NLTK (to generate the WordNet data)

References

If you find this code useful for your research, please cite the following paper in your publication:

@incollection{nickel2017poincare,
  title = {Poincar\'{e} Embeddings for Learning Hierarchical Representations},
  author = {Nickel, Maximilian and Kiela, Douwe},
  booktitle = {Advances in Neural Information Processing Systems 30},
  editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
  pages = {6341--6350},
  year = {2017},
  publisher = {Curran Associates, Inc.},
  url = {http://papers.nips.cc/paper/7213-poincare-embeddings-for-learning-hierarchical-representations.pdf}
}

License

This code is licensed under CC-BY-NC 4.0.

https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg

Comments
  • Question about evaluation: mean rank and mAP

    Question about evaluation: mean rank and mAP

    Hi, I am new to this task and want to know the relation between mean rank and mAP. I am reproducing the result of link prediction task and trained TransE model as well as Poincare model. When I evaluation those two models, I found that TransE may get higher mean rank and higher MAP ,but Poincare may get lower mean rank and lower MAP. Should it always be lower mean rank and higher MAP? Are there some differences between those two ways of evaluation? Or maybe there is something wrong with my evaluation code :(

    opened by xxkkrr 10
  • KeyError: 'Traceback

    KeyError: 'Traceback

    [email protected]:~/ub16_prj/poincare-embeddings$ NTHREADS=2 ./train-nouns.sh Using 2 threads slurp: objects=82115, edges=743086 Indexing data json_conf: {"distfn": "poincare", "dim": 10, "lr": 1, "batchsize": 50, "negs": 50} Burnin: lr=0.01 'Traceback (most recent call last):\n File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop\n samples = collate_fn([dataset[i] for i in batch_indices])\n File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in \n samples = collate_fn([dataset[i] for i in batch_indices])\n File "/home/mldl/ub16_prj/poincare-embeddings/model.py", line 185, in getitem\n if n not in self._weights[t]:\nKeyError: tensor(23511)\n' Traceback (most recent call last): File "/home/mldl/ub16_prj/poincare-embeddings/train.py", line 19, in train_mp train(model, data, optimizer, opt, log, rank, queue) File "/home/mldl/ub16_prj/poincare-embeddings/train.py", line 46, in train for inputs, targets in loader: File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 286, in next return self._process_next_batch(batch) File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch raise batch.exc_type(batch.exc_msg) KeyError: 'Traceback (most recent call last):\n File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop\n samples = collate_fn([dataset[i] for i in batch_indices])\n File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in \n samples = collate_fn([dataset[i] for i in batch_indices])\n File "/home/mldl/ub16_prj/poincare-embeddings/model.py", line 185, in getitem\n if n not in self._weights[t]:\nKeyError: tensor(23511)\n' 'Traceback (most recent call last):\n File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop\n samples = collate_fn([dataset[i] for i in batch_indices])\n File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in \n samples = collate_fn([dataset[i] for i in batch_indices])\n File "/home/mldl/ub16_prj/poincare-embeddings/model.py", line 185, in getitem\n if n not in self._weights[t]:\nKeyError: tensor(23511)\n' Traceback (most recent call last): File "/home/mldl/ub16_prj/poincare-embeddings/train.py", line 19, in train_mp train(model, data, optimizer, opt, log, rank, queue) File "/home/mldl/ub16_prj/poincare-embeddings/train.py", line 46, in train for inputs, targets in loader: File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 286, in next return self._process_next_batch(batch) File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch raise batch.exc_type(batch.exc_msg) KeyError: 'Traceback (most recent call last):\n File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop\n samples = collate_fn([dataset[i] for i in batch_indices])\n File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in \n samples = collate_fn([dataset[i] for i in batch_indices])\n File "/home/mldl/ub16_prj/poincare-embeddings/model.py", line 185, in getitem\n if n not in self._weights[t]:\nKeyError: tensor(23511)\n'

    opened by loveJasmine 8
  • “train-nouns.py” stops

    “train-nouns.py” stops

    Hi, I am wondering is it normal for “train-nouns.py” to stop for a relatively long time after many epochs. I have run it for nearly 1 day and it does not have any outputs now.

    image

    opened by ShaoTengLiu 6
  • MAP around 0.4 after 300 epochs, far less than results in your paper

    MAP around 0.4 after 300 epochs, far less than results in your paper

    Hi, I just train your code on pytorch 0.4.0, on the mammals dataset, with your default hyperparameters, I can just get an embedding of MAP around 0.4 after 300 epochs, which is far more less than your reported results 0.927 in your paper. BTW, I just add 6 lines of code in the form of t= int(t) because the version of pytorch, other codes remain the same. I am wondering how could this happen?

    opened by ydtydr 6
  • Question on using these embeddings in practice

    Question on using these embeddings in practice

    I wrote this question on stack overflow here... this is not so much an issue with the repo, as it is a question of how to implement these embeddings in practice.

    My assumption for how to implement these is to take a sentence, like """The US and UK could agree a “phenomenal' trade deal after Britain leaves the EU.""", tokenize into a list of synsets [[], Synset('united_states.n.01'), [], Synset('united_kingdom.n.01'), ... ]... but in order to do that, one needs each unique representative synset node of the wordnet, based on the context in which that word lives.

    This seems like a pretty difficult aspect of using these embeddings, and I'm wondering what strategies there are to solve this, what literature is available, whether this is totally not how their implemented in practice, or if there are open source projects which can take both the word and the sentence context into account to map a sentence to a list of "optimal" synsets. Seems like this is a critical aspect of using these (and wordnets in general), but not frequently discussed.

    BTW many thanks for open-sourcing this, I find this technology really fascinating.

    opened by erjenkins29 5
  • ValueError: Buffer dtype mismatch, expected 'long_t' but got 'long'

    ValueError: Buffer dtype mismatch, expected 'long_t' but got 'long'

    Thank you for sharing this great code. However, I encountered one valueError when I am trying to reproduce the train-mammals.sh. The error information can be as follows:

    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>sh train-mammals.sh
    Specified hogwild training with GPU, defaulting to CPU...
    Using edge list dataloader
    Traceback (most recent call last):
      File "embed.py", line 246, in <module>
        main()
      File "embed.py", line 147, in main
        manifold, opt, idx, objects, weights, sparse=opt.sparse
      File "C:\Users\DELL\Github\poincare-embeddings-master\hype\sn.py", line 64, in initialize
        opt.ndproc, opt.burnin > 0, opt.dampening)
      File "hype\graph_dataset.pyx", line 75, in hype.graph_dataset.BatchedDataset.__cinit__
        self._mk_weights(idx, weights)
      File "hype\graph_dataset.pyx", line 81, in hype.graph_dataset.BatchedDataset._mk_weights
        def _mk_weights(self, npc.ndarray[npc.long_t, ndim=2] idx, npc.ndarray[npc.double_t, ndim=1] weights):
    ValueError: Buffer dtype mismatch, expected 'long_t' but got 'long'
    

    I am using pytorch 1.0 with cuda 10.0 on windows:

    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>python -c "import torch; print(torch.version.cuda)"
    10.0
    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
    Cuda compilation tools, release 10.0, V10.0.130
    

    I found that the problem comes from earlier steps python setup.py build_ext --inplace. When I run python setup.py build_ext --inplace, it shows the following information:

    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>python setup.py build_ext --inplace
    Compiling hype/graph_dataset.pyx because it depends on C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Includes\numpy\__init__.pxd.
    Compiling hype/adjacency_matrix_dataset.pyx because it depends on C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Includes\numpy\__init__.pxd.
    [1/2] Cythonizing hype/adjacency_matrix_dataset.pyx
    C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Compiler\Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\DELL\Github\poincare-embeddings-master\hype\adjacency_matrix_dataset.pyx
      tree = Parsing.p_module(s, pxd, full_module_name)
    [2/2] Cythonizing hype/graph_dataset.pyx
    C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Compiler\Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\DELL\Github\poincare-embeddings-master\hype\graph_dataset.pyx
      tree = Parsing.p_module(s, pxd, full_module_name)
    running build_ext
    building 'hype.graph_dataset' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Anaconda3\envs\poincare\lib\site-packages\numpy\core\include -IC:\Anaconda3\envs\poincare\include -IC:\Anaconda3\envs\poincare\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tphype/graph_dataset.cpp /Fobuild\temp.win-amd64-3.6\Release\hype/graph_dataset.obj -std=c++11
    cl: 命令行 warning D9002 :忽略未知选项“-std=c++11”
    graph_dataset.cpp
    c:\anaconda3\envs\poincare\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
    hype/graph_dataset.cpp(3033): warning C4244: “=”: 从“Py_ssize_t”转换到“int”,可能丢失数据
    hype/graph_dataset.cpp(3418): warning C4244: “=”: 从“__pyx_t_5numpy_long_t”转换到“long”,可能丢失数据
    hype/graph_dataset.cpp(3429): warning C4244: “=”: 从“__pyx_t_5numpy_long_t”转换到“long”,可能丢失数据
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Anaconda3\envs\poincare\libs /LIBPATH:C:\Anaconda3\envs\poincare\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" /EXPORT:PyInit_graph_dataset build\temp.win-amd64-3.6\Release\hype/graph_dataset.obj /OUT:C:\Users\DELL\Github\poincare-embeddings-master\hype\graph_dataset.cp36-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.6\Release\hype\graph_dataset.cp36-win_amd64.lib
      正在创建库 build\temp.win-amd64-3.6\Release\hype\graph_dataset.cp36-win_amd64.lib 和对象 build\temp.win-amd64-3.6\Release\hype\graph_dataset.cp36-win_amd64.exp
    正在生成代码
    已完成代码的生成
    building 'hype.adjacency_matrix_dataset' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Anaconda3\envs\poincare\lib\site-packages\numpy\core\include -IC:\Anaconda3\envs\poincare\include -IC:\Anaconda3\envs\poincare\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tphype/adjacency_matrix_dataset.cpp /Fobuild\temp.win-amd64-3.6\Release\hype/adjacency_matrix_dataset.obj -std=c++11
    cl: 命令行 warning D9002 :忽略未知选项“-std=c++11”
    adjacency_matrix_dataset.cpp
    c:\anaconda3\envs\poincare\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
    hype/adjacency_matrix_dataset.cpp(3144): warning C4244: “=”: 从“Py_ssize_t”转换到“int”,可能丢失数据
    hype/adjacency_matrix_dataset.cpp(5484): warning C4244: “=”: 从“Py_ssize_t”转换到“long”,可能丢失数据
    hype/adjacency_matrix_dataset.cpp(5595): warning C4244: “=”: 从“Py_ssize_t”转换到“long”,可能丢失数据
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Anaconda3\envs\poincare\libs /LIBPATH:C:\Anaconda3\envs\poincare\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" /EXPORT:PyInit_adjacency_matrix_dataset build\temp.win-amd64-3.6\Release\hype/adjacency_matrix_dataset.obj /OUT:C:\Users\DELL\Github\poincare-embeddings-master\hype\adjacency_matrix_dataset.cp36-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.6\Release\hype\adjacency_matrix_dataset.cp36-win_amd64.lib
      正在创建库 build\temp.win-amd64-3.6\Release\hype\adjacency_matrix_dataset.cp36-win_amd64.lib 和对象 build\temp.win-amd64-3.6\Release\hype\adjacency_matrix_dataset.cp36-win_amd64.exp
    正在生成代码
    已完成代码的生成
    

    Your suggestion are welcome. Thanks.

    opened by chengjun 5
  • [Question] Predicting the parent of an unseen word in an existing hierarchy

    [Question] Predicting the parent of an unseen word in an existing hierarchy

    Hello there,

    I saw this question and I am looking for something similar.

    But, in my case, let's say I have a list like:

    parent | child -- | -- animal | carnivore carnivore | dog dog | rottweiler dog | huskey dog | pug

    From this list I can train a Poincare model and get poincare embeddings for every word/node in this hierarchy. Then, let's say I receive a new word: "dogo argentino". What I want at this point is a way to infer automatically that "dogo argentino" is the child of "dog". Can poincare embeddings be of help here? If ignoring poincare embeddings at all, a thought would be to get fastText embeddings for all the words, and find the word that is the closest (maybe in terms of cosine similarity) to the fastText embedding for "dogo argentino". Then assign the parent of that closest word, to the "dogo argentino" word. Any thoughts would be greatly appreciated. Thank you!

    opened by stelmath 4
  • modifies shell script to use correct python version

    modifies shell script to use correct python version

    Going through the code instructions on Mac OSX gave the following error:

    ModuleNotFoundError: No module named 'torch'

    Which was resolved by modifying python3 to python.

    CLA Signed 
    opened by denmonz 4
  • Embedding of Tree Structures: root no in the center of the disk

    Embedding of Tree Structures: root no in the center of the disk

    Hi, I'm dealing with embedding of hierarchical structures (a tree) and I used your algorithm. It works, but when I visualise the 2D or 3D embedding the root of the tree is not in the center of the disk. Is there some command or flag to add to the code? Or is the root supposed to be in the center on his own?

    opened by federicodassereto 4
  • Lack of hypernymysuite

    Lack of hypernymysuite

    Hi, I try to use this tool follow the readme file. But the latest version does not include hypernymysuite file. My system return me

    "from hypernymysuite.base import HypernymySuiteModel ModuleNotFoundError: No module named 'hypernymysuite'"

    opened by QingkaiZeng 3
  • Problem of Poincare Distance and Norm

    Problem of Poincare Distance and Norm

    Hi, I trained a 100 dimension model with train-nouns.sh in poincare manifold. I tested the model by calculating the tensor.norm of some nodes and poincare.distance between some nodes.

    However, for Gauss, Brain, Blue and many other nodes near to the edge, the norm of them become 1.0000, which means infinity in poincare ball. In addition, if I calculate the poincare distance between these nodes near to the edge and other nodes, the distance will be always 12.2061, which is a little confusing.

    Thanks for your attention!

    opened by ShaoTengLiu 3
  • Large dataset that needs continuous training

    Large dataset that needs continuous training

    Hi there, I have a quite large dataset and GPU usage that does not exceed 24 hours. In this case, I need to restore from checkpoint and continue to train the old models. The current situation is that, each time I try to do so, it is shown that the dimension of parameters is not correct--originally I train a 2-dimension embedding but now the dim of parameter is 21. Any thoughts on this? Any advice?

    opened by lkcao 0
  • how was mammal_closure.csv created

    how was mammal_closure.csv created

    Hello, apologies in advance if this is a silly question. I was just looking at mammal_closure.csv because i want to do something similar with my own data and run like train-mammals.sh. I was wondering, what do the numbers in the file indicate? Such as vixen.n.02, mastiff.n.01 - what are the n.01 and n.02? Thanks!

    opened by jayachaturvedi 1
  • What should we do after training?

    What should we do after training?

    Hello, I am Tianyu. I just tried to use my data to train a new model using embed.py. However, after I get the dl model, what is the next step for me to get the embeddings? I don't find clear steps for future work. Thanks a lot.

    opened by HelloWorldLTY 4
  • In the Euclidian space, the distance is incorrectly squared

    In the Euclidian space, the distance is incorrectly squared

    In the formula angle_at_u(...), dist is used squared in the numerator, and unsquared in the denominator. Thus, this does not expect distance(...) to return a squared norm but a true norm.

    image image

    You can notice the inconsistency if you put norm(...) and distance(...) next to each other:

    image

    CLA Signed 
    opened by FremyCompany 4
  • Entailment cones compute the wrong angle?

    Entailment cones compute the wrong angle?

    Hi,

    I could be wrong, but I have the impression the entailment cones energy function is wrong, and optimizes the inverted order relationship between concepts.

    If I understood correctly...:

    • In our data files, id2 should be the generic concept, and id1 the more specific one.
    • After training norm(embedding(id2)) should usually be lower than norm(embedding(id1)) as cones have more space on the edge, so generic concepts should be close to the origin and specific concepts should be far from it.

    However, after training, I observe the reverse to be true:

    • Generic concepts are as far as possible from the center.
    • In their cones, one can find even-more-generic concepts, and no more specific concepts.

    I'm either doing something very wrong, or have incorrect assumptions, or the energy function is reversed compared to what it should be according to the entailment cones article, no?

    opened by FremyCompany 1
Releases(1.0)
Owner
Facebook Research
Facebook Research
Chinese named entity recognization (bert/roberta/macbert/bert_wwm with Keras)

Chinese named entity recognization (bert/roberta/macbert/bert_wwm with Keras)

2 Jul 05, 2022
Easy to start. Use deep nerual network to predict the sentiment of movie review.

Easy to start. Use deep nerual network to predict the sentiment of movie review. Various methods, word2vec, tf-idf and df to generate text vectors. Various models including lstm and cov1d. Achieve f1

1 Nov 19, 2021
Include MelGAN, HifiGAN and Multiband-HifiGAN, maybe NHV in the future.

Fast (GAN Based Neural) Vocoder Chinese README Todo Submit demo Support NHV Discription Include MelGAN, HifiGAN and Multiband-HifiGAN, maybe include N

Zhengxi Liu (刘正曦) 134 Dec 16, 2022
texlive expressions for documents

tex2nix Generate Texlive environment containing all dependencies for your document rather than downloading gigabytes of texlive packages. Installation

Jörg Thalheim 70 Dec 26, 2022
A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

wav2vec-toolkit A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models This repository accompanies the

Anton Lozhkov 29 Oct 23, 2022
State of the art faster Natural Language Processing in Tensorflow 2.0 .

tf-transformers: faster and easier state-of-the-art NLP in TensorFlow 2.0 ****************************************************************************

74 Dec 05, 2022
Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense.

PythonTextObfuscator Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense. Requi

2 Aug 29, 2022
Search for documents in a domain through Google. The objective is to extract metadata

MetaFinder - Metadata search through Google _____ __ ___________ .__ .___ / \

Josué Encinar 85 Dec 16, 2022
TextFlint is a multilingual robustness evaluation platform for natural language processing tasks,

TextFlint is a multilingual robustness evaluation platform for natural language processing tasks, which unifies general text transformation, task-specific transformation, adversarial attack, sub-popu

TextFlint 587 Dec 20, 2022
Turn clang-tidy warnings and fixes to comments in your pull request

clang-tidy pull request comments A GitHub Action to post clang-tidy warnings and suggestions as review comments on your pull request. What platisd/cla

Dimitris Platis 30 Dec 13, 2022
All the code I wrote for Overwatch-related projects that I still own the rights to.

overwatch_shit.zip This is (eventually) going to contain all the software I wrote during my five-year imprisonment stay playing Overwatch. I'll be add

zkxjzmswkwl 2 Dec 31, 2021
(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

Towards Abstractive Grounded Summarization of Podcast Transcripts We provide the source code for the paper "Towards Abstractive Grounded Summarization

10 Jul 01, 2022
Baseline code for Korean open domain question answering(ODQA)

Open-Domain Question Answering(ODQA)는 다양한 주제에 대한 문서 집합으로부터 자연어 질의에 대한 답변을 찾아오는 task입니다. 이때 사용자 질의에 답변하기 위해 주어지는 지문이 따로 존재하지 않습니다. 따라서 사전에 구축되어있는 Knowl

VUMBLEB 69 Nov 04, 2022
Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention

Sinkhorn Transformer This is a reproduction of the work outlined in Sparse Sinkhorn Attention, with additional enhancements. It includes a parameteriz

Phil Wang 217 Nov 25, 2022
iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform

iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform This repo try to implement iSTFTNet : Fast

Rishikesh (ऋषिकेश) 126 Jan 02, 2023
Blackstone is a spaCy model and library for processing long-form, unstructured legal text

Blackstone Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project f

ICLR&D 579 Jan 08, 2023
A benchmark for evaluation and comparison of various NLP tasks in Persian language.

Persian NLP Benchmark The repository aims to track existing natural language processing models and evaluate their performance on well-known datasets.

Mofid AI 68 Dec 19, 2022
The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models

Graformer The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models Graformer (also named BridgeTransformer in t

22 Dec 14, 2022
To classify the News into Real/Fake using Features from the Text Content of the article

Hoax-Detector Authenticity of news has now become a major problem. The Idea is to classify the News into Real/Fake using Features from the Text Conten

Aravindhan 1 Feb 09, 2022
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Introduction Funnel-Transformer is a new self-attention model that gradually compresses the sequence of hidden states to a shorter one and hence reduc

GUOKUN LAI 197 Dec 11, 2022