Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Overview

Pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. Retrieval using sparse representations is provided via integration with our group's Anserini IR toolkit, which is built on Lucene. Retrieval using dense representations is provided via integration with Facebook's Faiss library.

Pyserini is primarily designed to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections.

With Pyserini, it's easy to reproduce runs on a number of standard IR test collections! A low-effort way to try things out is to look at our online notebooks, which will allow you to get started with just a few clicks.

Package Installation

Install via PyPI (requires Python 3.6+):

pip install pyserini

Sparse retrieval depends on Anserini, which is itself built on Lucene and therefore requires Java 11.
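
Since a mismatched Java version is an easy thing to overlook, here is a minimal sketch for checking which major version is on your PATH. The helper names parse_java_major and installed_java_major are hypothetical and not part of Pyserini; the sketch only assumes that `java -version` prints a line containing `version "..."`.

```python
import re
import subprocess

def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Releases before Java 9 report themselves as 1.x (e.g., 1.8 means Java 8).
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major

def installed_java_major():
    """Run `java -version` and return the major version, or None if java is unavailable."""
    try:
        result = subprocess.run(['java', '-version'],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,  # the version banner goes to stderr
                                universal_newlines=True)
    except OSError:
        return None
    return parse_java_major(result.stdout)

print(installed_java_major())
```

If this prints something other than 11, adjusting JAVA_HOME or your PATH before using sparse retrieval is probably the first thing to try.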

Dense retrieval depends on neural networks and requires a more complex set of dependencies. A pip installation will automatically pull in the 🤗 Transformers library to satisfy the package requirements. Pyserini also depends on PyTorch and Faiss, but since these packages may require platform-specific custom configuration, they are not explicitly listed in the package requirements. We leave the installation of these packages to you.
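
Because PyTorch and Faiss are not pulled in automatically, a quick check before running dense retrieval can save some confusion. The following is a small sketch using only the standard library; missing_modules is a hypothetical helper, and 'torch' and 'faiss' are the usual import names for these two packages.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# 'torch' and 'faiss' are the import names for PyTorch and Faiss.
missing = missing_modules(['torch', 'faiss'])
if missing:
    print('Install before using dense retrieval:', ', '.join(missing))
else:
    print('PyTorch and Faiss are importable.')
```

This only tells you whether the modules can be found, not whether the installed versions are compatible with each other.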

The software ecosystem is rapidly evolving and a potential source of frustration is incompatibility among different versions of underlying dependencies. We provide additional detailed installation instructions here.

Development Installation

If you're planning on just using Pyserini, then the pip instructions above are fine. However, if you're planning on contributing to the codebase or want to work with the latest not-yet-released features, you'll need a development installation. For this, clone our repo with the --recurse-submodules option to make sure the tools/ submodule also gets cloned.

The tools/ directory, which contains evaluation tools and scripts, is actually this repo, integrated as a Git submodule (so that it can be shared across related projects). Build as follows (you might get warnings, but okay to ignore):

cd tools/eval && tar xvfz trec_eval.9.0.4.tar.gz && cd trec_eval.9.0.4 && make && cd ../../..
cd tools/eval/ndeval && make && cd ../../..

Next, you'll need to clone and build Anserini. It makes sense to put both pyserini/ and anserini/ in a common folder. After you've successfully built Anserini, copy the fatjar, which will be target/anserini-X.Y.Z-SNAPSHOT-fatjar.jar, into pyserini/resources/jars/. As with the pip installation, a potential source of frustration is incompatibility among different versions of underlying dependencies. For these and other issues, we provide additional detailed installation instructions here.
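
If you'd rather script the fatjar copy than do it by hand, something along these lines should work. This is a sketch, not part of the official build: copy_fatjar is a hypothetical helper, and it assumes the jar name matches the target/anserini-*-fatjar.jar pattern described above.

```python
import glob
import os
import shutil

def copy_fatjar(anserini_dir, pyserini_dir):
    """Copy the built Anserini fatjar into <pyserini_dir>/resources/jars/."""
    pattern = os.path.join(anserini_dir, 'target', 'anserini-*-fatjar.jar')
    jars = sorted(glob.glob(pattern))
    if not jars:
        raise FileNotFoundError(f'No fatjar matches {pattern}; build Anserini first.')
    dest_dir = os.path.join(pyserini_dir, 'resources', 'jars')
    os.makedirs(dest_dir, exist_ok=True)
    # If several builds are present, take the last one in sorted order.
    return shutil.copy(jars[-1], dest_dir)
```

With pyserini/ and anserini/ side by side, calling copy_fatjar('../anserini', '.') from inside pyserini/ would do the copy. You may also want to clear out older jars in resources/jars/ first.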

You can confirm everything is working by running the unit tests:

python -m unittest

Assuming all tests pass, you should be ready to go!

Quick Links

How do I search?

Pyserini supports sparse retrieval (e.g., BM25 ranking using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches.

Sparse Retrieval

The SimpleSearcher class provides the entry point for sparse retrieval using bag-of-words representations. Anserini supports a number of pre-built indexes for common collections that it'll automatically download for you and store in ~/.cache/pyserini/indexes/. Here's how to use a pre-built index for the MS MARCO passage ranking task and issue a query interactively:

from pyserini.search import SimpleSearcher

searcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
hits = searcher.search('what is a lobster roll?')

for i in range(0, 10):
    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')

The results should be as follows:

 1 7157707 11.00830
 2 6034357 10.94310
 3 5837606 10.81740
 4 7157715 10.59820
 5 6034350 10.48360
 6 2900045 10.31190
 7 7157713 10.12300
 8 1584344 10.05290
 9 533614  9.96350
10 6234461 9.92200
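
Note that the loop above indexes hits[0] through hits[9], so it assumes the query returns at least ten hits. Here is a sketch of a more defensive variant that prints however many hits are available. The format_hits helper is hypothetical, and it assumes hits is a list-like sequence of objects with docid and score attributes; the namedtuple below is only a stand-in so the snippet runs without an index.

```python
from collections import namedtuple

def format_hits(hits, k=10):
    """Format up to k hits as 'rank docid score' lines."""
    lines = []
    for rank, hit in enumerate(hits[:k], start=1):
        lines.append(f'{rank:2} {hit.docid:7} {hit.score:.5f}')
    return lines

# Stand-in for search results; real hits come from searcher.search(...).
Hit = namedtuple('Hit', ['docid', 'score'])
demo_hits = [Hit('7157707', 11.0083), Hit('6034357', 10.9431)]
print('\n'.join(format_hits(demo_hits)))
```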

To further examine the results:

# Grab the raw text:
hits[0].raw

# Grab the raw Lucene Document:
hits[0].lucene_document

Pre-built indexes are hosted on University of Waterloo servers. The following method will list available pre-built indexes:

SimpleSearcher.list_prebuilt_indexes()

A description of what's available can be found here. Alternatively, see this answer for how to download an index manually.

Dense Retrieval

The SimpleDenseSearcher class provides the entry point for dense retrieval, and its usage is quite similar to SimpleSearcher. The only additional thing we need to specify for dense retrieval is the query encoder.

from pyserini.dsearch import SimpleDenseSearcher, TctColBertQueryEncoder

encoder = TctColBertQueryEncoder('castorini/tct_colbert-msmarco')
searcher = SimpleDenseSearcher.from_prebuilt_index(
    'msmarco-passage-tct_colbert-hnsw',
    encoder
)
hits = searcher.search('what is a lobster roll')

for i in range(0, 10):
    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')

If you encounter an error (on macOS), you'll need the following:

import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'

The results should be as follows:

 1 7157710 70.53742
 2 7157715 70.50040
 3 7157707 70.13804
 4 6034350 69.93666
 5 6321969 69.62683
 6 4112862 69.34587
 7 5515474 69.21354
 8 7157708 69.08416
 9 6321974 69.06841
10 2920399 69.01737

Hybrid Sparse-Dense Retrieval

The HybridSearcher class provides the entry point to perform hybrid sparse-dense retrieval:

from pyserini.search import SimpleSearcher
from pyserini.dsearch import SimpleDenseSearcher, TctColBertQueryEncoder
from pyserini.hsearch import HybridSearcher

ssearcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
encoder = TctColBertQueryEncoder('castorini/tct_colbert-msmarco')
dsearcher = SimpleDenseSearcher.from_prebuilt_index(
    'msmarco-passage-tct_colbert-hnsw',
    encoder
)
hsearcher = HybridSearcher(dsearcher, ssearcher)
hits = hsearcher.search('what is a lobster roll')

for i in range(0, 10):
    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')

The results should be as follows:

 1 7157715 71.56022
 2 7157710 71.52962
 3 7157707 71.23887
 4 6034350 70.98502
 5 6321969 70.61903
 6 4112862 70.33807
 7 5515474 70.20574
 8 6034357 70.11168
 9 5837606 70.09911
10 7157708 70.07636

In general, hybrid retrieval will be more effective than dense retrieval, which will be more effective than sparse retrieval.
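
To get a feel for how the two result lists combine, note that the three listings above are numerically consistent with a simple linear interpolation: dense score plus 0.1 times the sparse score, where a document missing from one list is given that list's lowest score. The sketch below illustrates this; it is an inference from the printed numbers, not a description of HybridSearcher's internals, and the interpolate function and the alpha value of 0.1 are assumptions.

```python
def interpolate(dense, sparse, alpha=0.1):
    """Fuse two {docid: score} maps as dense + alpha * sparse.

    A docid missing from one map is backfilled with that map's minimum score.
    """
    min_dense = min(dense.values())
    min_sparse = min(sparse.values())
    fused = {}
    for docid in dense.keys() | sparse.keys():
        d = dense.get(docid, min_dense)
        s = sparse.get(docid, min_sparse)
        fused[docid] = d + alpha * s
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Top-10 scores copied from the sparse and dense listings above.
sparse = {'7157707': 11.00830, '6034357': 10.94310, '5837606': 10.81740,
          '7157715': 10.59820, '6034350': 10.48360, '2900045': 10.31190,
          '7157713': 10.12300, '1584344': 10.05290, '533614': 9.96350,
          '6234461': 9.92200}
dense = {'7157710': 70.53742, '7157715': 70.50040, '7157707': 70.13804,
         '6034350': 69.93666, '6321969': 69.62683, '4112862': 69.34587,
         '5515474': 69.21354, '7157708': 69.08416, '6321974': 69.06841,
         '2920399': 69.01737}

ranked = interpolate(dense, sparse)
for rank, (docid, score) in enumerate(ranked[:10], start=1):
    print(f'{rank:2} {docid:7} {score:.5f}')
```

For example, docid 7157715 gets 70.50040 + 0.1 × 10.59820 = 71.56022, which matches the top hybrid hit shown above.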

How do I fetch a document?

Another commonly used feature in Pyserini is to fetch a document (i.e., its text) given its docid. This is easy to do:

from pyserini.search import SimpleSearcher

searcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
doc = searcher.doc('7157715')

From doc, you can access its contents as well as its raw representation. The contents hold the representation of what's actually indexed; the raw representation is usually the original "raw document". A simple example can illustrate this distinction: for an article from CORD-19, raw holds the complete JSON of the article, which obviously includes the article contents, but has metadata and other information as well. The contents contain extracts from the article that are actually indexed (for example, the title and abstract). In most cases, contents can be deterministically reconstructed from raw. When building the index, we specify flags to store contents and/or raw; it is rarely the case that we store both, since that would be a waste of space. In the case of the pre-built msmarco-passage index, we only store raw. Thus:

# Document contents: what's actually indexed.
# Note, this is not stored in the pre-built msmarco-passage index.
doc.contents()

# Raw document
doc.raw()

As you'd expect, doc.id() returns the docid, which is 7157715 in this case. Finally, doc.lucene_document() returns the underlying Lucene Document (i.e., a Java object). With that, you get direct access to the complete Lucene API for manipulating documents.

Since each text in the MS MARCO passage corpus is a JSON object, we can read the document into Python and manipulate:

import json
json_doc = json.loads(doc.raw())

json_doc['contents']
# 'contents' of the document:
# A Lobster Roll is a bread roll filled with bite-sized chunks of lobster meat...

Every document has a docid, of type string, assigned by the collection it is part of. In addition, Lucene assigns each document a unique internal id (confusingly, Lucene also calls this the docid), which is an integer numbered sequentially starting from zero to one less than the number of documents in the index. This can be a source of confusion but the meaning is usually clear from context. Where there may be ambiguity, we refer to the external collection docid and Lucene's internal docid to be explicit. Programmatically, the two are distinguished by type: the first is a string and the second is an integer.

As an important side note, Lucene's internal docids are not stable across different index instances. That is, in two different index instances of the same collection, Lucene is likely to have assigned different internal docids for the same document. This is because the internal docids are assigned based on document ingestion order; this will vary due to thread interleaving during indexing (which is usually performed on multiple threads).

The doc method in searcher takes either a string (interpreted as an external collection docid) or an integer (interpreted as Lucene's internal docid) and returns the corresponding document. Thus, a simple way to iterate through all documents in the collection (and for example, print out its external collection docid) is as follows:

for i in range(searcher.num_docs):
    print(searcher.doc(i).docid())
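
To make the external/internal distinction concrete, here is a toy illustration in plain Python. ToyIndex is a hypothetical in-memory stand-in, not a Pyserini class: it dispatches on the argument type the way the doc method is described above, and it assigns internal ids by ingestion order.

```python
class ToyIndex:
    """In-memory stand-in that mimics lookup by external or internal docid."""

    def __init__(self, docs):
        # docs: iterable of (external_docid, text) pairs in ingestion order.
        self._docs = list(docs)
        self._by_external = {docid: i for i, (docid, _) in enumerate(self._docs)}

    @property
    def num_docs(self):
        return len(self._docs)

    def doc(self, docid):
        """Return (external_docid, text), dispatching on the type of docid."""
        if isinstance(docid, str):
            return self._docs[self._by_external[docid]]
        if isinstance(docid, int):
            return self._docs[docid]
        raise TypeError('docid must be a str (external) or an int (internal)')

    def internal_id(self, external_docid):
        return self._by_external[external_docid]

# The same collection ingested in two different orders.
first = ToyIndex([('doc1', 'a'), ('doc2', 'b'), ('doc3', 'c')])
second = ToyIndex([('doc3', 'c'), ('doc1', 'a'), ('doc2', 'b')])

print(first.doc('doc2') == second.doc('doc2'))                # True: same document
print(first.internal_id('doc2'), second.internal_id('doc2'))  # 1 2: different internal ids
```

The takeaway is that anything you persist (run files, caches, annotations) should be keyed on the external collection docid, never on the internal integer.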

How do I index and search my own documents?

To build sparse indexes (i.e., Lucene inverted indexes) on your own document collections, follow the instructions below. To build dense indexes (e.g., the output of transformer encoders) on your own document collections, see instructions here. The following covers English documents; if you want to index and search multilingual documents, check out this answer.

Pyserini (via Anserini) provides ingestors for document collections in many different formats. The simplest, however, is the following JSON format:

{
  "id": "doc1",
  "contents": "this is the contents."
}

A document is simply comprised of two fields, a docid and contents. Pyserini accepts collections comprised of these documents organized in three different ways:

  • Folder with each JSON in its own file, like this.
  • Folder with files, each of which contains an array of JSON documents, like this.
  • Folder with files, each of which contains a JSON on an individual line, like this (often called JSONL format).
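
Producing the JSONL layout from the last bullet takes only a few lines of Python. The following is a minimal sketch: write_jsonl and the default file name docs.jsonl are hypothetical, and the only thing taken from the format above is the pair of field names, "id" and "contents".

```python
import json
import os

def write_jsonl(docs, out_dir, filename='docs.jsonl'):
    """Write (docid, text) pairs to out_dir/filename, one JSON object per line."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename)
    with open(path, 'w', encoding='utf-8') as f:
        for docid, text in docs:
            # Field names follow the simple JSON format: "id" and "contents".
            record = {'id': str(docid), 'contents': text}
            f.write(json.dumps(record, ensure_ascii=False) + '\n')
    return path
```

For instance, write_jsonl([('doc1', 'this is the contents.')], 'my_collection') creates a folder that can be passed to the indexer as its input directory.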

So, the quickest way to get started is to write a script that converts your documents into the above format. Then, you can invoke the indexer (here, we're indexing JSONL, but any of the other formats work as well):

python -m pyserini.index -collection JsonCollection \
                         -generator DefaultLuceneDocumentGenerator \
                         -threads 1 \
                         -input integrations/resources/sample_collection_jsonl \
                         -index indexes/sample_collection_jsonl \
                         -storePositions -storeDocvectors -storeRaw

Three options control the type of index that is built:

  • -storePositions: builds a standard positional index
  • -storeDocvectors: stores doc vectors (required for relevance feedback)
  • -storeRaw: stores raw documents

If you don't specify any of the three options above, Pyserini builds an index that only stores term frequencies. This is sufficient for simple "bag of words" querying (and yields the smallest index size).
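
If you drive indexing from a Python script, one option is to assemble the same command line programmatically and hand it to subprocess. The sketch below does that; build_index_command is a hypothetical helper, and it uses only the flags that appear in the command above.

```python
import sys

def build_index_command(input_dir, index_dir, threads=1,
                        store_positions=False, store_docvectors=False,
                        store_raw=False):
    """Build the argument list for `python -m pyserini.index` on a JSON collection."""
    cmd = [sys.executable, '-m', 'pyserini.index',
           '-collection', 'JsonCollection',
           '-generator', 'DefaultLuceneDocumentGenerator',
           '-threads', str(threads),
           '-input', input_dir,
           '-index', index_dir]
    # Each optional flag enlarges the index; leave all off for a term-frequency-only index.
    if store_positions:
        cmd.append('-storePositions')
    if store_docvectors:
        cmd.append('-storeDocvectors')
    if store_raw:
        cmd.append('-storeRaw')
    return cmd

cmd = build_index_command('integrations/resources/sample_collection_jsonl',
                          'indexes/sample_collection_jsonl',
                          store_raw=True)
print(' '.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would launch the indexer.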

Once indexing is done, you can use SimpleSearcher to search the index:

from pyserini.search import SimpleSearcher

searcher = SimpleSearcher('indexes/sample_collection_jsonl')
hits = searcher.search('document')

for i in range(len(hits)):
    print(f'{i+1:2} {hits[i].docid:4} {hits[i].score:.5f}')

You should get something like the following:

 1 doc2 0.25620
 2 doc3 0.23140

If you want to perform a batch retrieval run (e.g., directly from the command line), organize all your queries in a tsv file, like here. The format is simple: the first field is a query id, and the second field is the query itself. Note that the file extension must end in .tsv so that Pyserini knows what format the queries are in.
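
As a rough sketch, a query file in this two-column layout can be generated as follows. The write_queries_tsv helper is hypothetical; it enforces the .tsv extension mentioned above and rejects queries that contain tabs or newlines, since those would break the two-field layout.

```python
def write_queries_tsv(queries, path):
    """Write (query_id, query_text) pairs as tab-separated lines."""
    if not path.endswith('.tsv'):
        raise ValueError('query file name must end in .tsv')
    with open(path, 'w', encoding='utf-8') as f:
        for qid, text in queries:
            if '\t' in text or '\n' in text:
                raise ValueError(f'query {qid} contains a tab or newline')
            f.write(f'{qid}\t{text}\n')
```

For example, write_queries_tsv([(1, 'document'), (2, 'contents')], 'my_queries.tsv') writes two lines, each with an id, a tab, and the query text.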

Then, you can run:

$ python -m pyserini.search --topics integrations/resources/sample_queries.tsv \
                            --index indexes/sample_collection_jsonl \
                            --output run.sample.txt \
                            --bm25

$ cat run.sample.txt 
1 Q0 doc2 1 0.256200 Anserini
1 Q0 doc3 2 0.231400 Anserini
2 Q0 doc1 1 0.534600 Anserini
3 Q0 doc1 1 0.256200 Anserini
3 Q0 doc2 2 0.256199 Anserini
4 Q0 doc3 1 0.483000 Anserini

Note that the output run file is in standard TREC format.
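
Each line of a TREC run file has six whitespace-separated columns: query id, the literal Q0, docid, rank, score, and a run tag. If you want to post-process a run in Python, a small parser along these lines is usually enough (RunEntry and parse_run_line are hypothetical names, not Pyserini API):

```python
from collections import namedtuple

RunEntry = namedtuple('RunEntry', ['qid', 'docid', 'rank', 'score', 'tag'])

def parse_run_line(line):
    """Parse one run line of the form: qid Q0 docid rank score tag."""
    qid, _q0, docid, rank, score, tag = line.split()
    return RunEntry(qid, docid, int(rank), float(score), tag)

entry = parse_run_line('1 Q0 doc2 1 0.256200 Anserini')
print(entry.qid, entry.docid, entry.rank, entry.score, entry.tag)
```

The query id and docid are kept as strings on purpose, since not every collection uses numeric ids.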

You can also add extra fields in your documents when needed, e.g. text features. For example, the SpaCy Named Entity Recognition (NER) result of contents could be stored as an additional field NER.

{
  "id": "doc1",
  "contents": "The Manhattan Project and its atomic bomb helped bring an end to World War II. Its legacy of peaceful uses of atomic energy continues to have an impact on history and science.",
  "NER": {
            "ORG": ["The Manhattan Project"],
            "MONEY": ["World War II"]
         }
}

Reproduction Guides

With Pyserini, it's easy to reproduce runs on a number of standard IR test collections!

Sparse Retrieval

Dense Retrieval

Baselines

Pyserini provides baselines for a number of datasets.

Additional Documentation

Known Issues

Anserini is designed to work with JDK 11. There was a JRE path change above JDK 9 that breaks pyjnius 1.2.0, as documented in this issue, also reported in Anserini here and here. This issue was fixed with pyjnius 1.2.1 (released December 2019). The previous error was documented in this notebook and this notebook documents the fix.

Release History

With v0.11.0.0 and before, Pyserini versions adopted the convention of X.Y.Z.W, where X.Y.Z tracks the version of Anserini, and W is used to distinguish different releases on the Python end. Starting with Anserini v0.12.0, Anserini and Pyserini versions have become decoupled.

Comments
  • Dense search replication, starting from hgf model

    Here's what I think our end target is: start with the hgf model from the model hub and assume that's fixed.

    1. Be able to encode corpus and queries - scripts for doing so should be in https://github.com/castorini/pyserini/tree/master/scripts
    2. Scripts for building hnsw index, also in scripts/
    3. (1) and (2) are what we store as "pre-built".

    This will allow replication and bring every part of the pipeline in sync - other than training the encoder model.

    @MXueguang @justram @jacklin64 thoughts?

    opened by lintool 18
  • Multiple language support?

    Hi,

    Does pyserini currently support languages other than English? Specifically, I am asking about using features such as creating an index by python -m pyserini.index -collection JsonCollection -generator DefaultLuceneDocumentGenerator ... and using searcher.search. If yes, how do I integrate it in a Python script?

    Thank you!

    opened by velocityCavalry 16
  • SimpleSearcher.search memory leak

    When calling search method of SimpleSearcher I noticed RAM usage increase with every new iteration. Could you tell me please how to decrease memory leak?

    opened by dmitrijeuseew 16
  • Fold qrels into pyserini directly

    Follow up to #310 - there, we folded the eval scripts directly into pyserini. Now let's do the same with the qrels.

    In actuality, the qrels are already in the anserini jar, since this entire directory is included in the fatjar: https://github.com/castorini/anserini/tree/master/src/main/resources/topics-and-qrels

    Trick is how to get the qrels out...

    This is, in fact, how we can access the topics in anserini: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/topicreader/Topics.java#L22 https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/topicreader/TopicReader.java#L143

    And pyserini just wraps the Java methods above.


    With that background, I propose to apply the same treatment to qrels.

    1. Add a method in Anserini (on the Java end) to read qrels from resources/topics-and-qrels/ into a String. We can use the same "ids" as the topics. Build around here: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/util/Qrels.java
    2. On the Python end, we call the Java method, which reads the qrels as a string. Then we write back the string into ~/.cache/pyserini.
    3. Our eval scripts can then reference ~/.cache/pyserini.

    And at the end of the day, we'll be able to do this directly:

    $ python -m pyserini.search --topics robust04 --index robust04 --output run.robust04.txt --bm25
    $ python -m pyserini.eval.trec_eval --qrels robust04 -m map -m P.30 run.robust04.txt
    

    (With no need to download any intermediate data... everything is self contained!)

    @MXueguang thoughts? Do you like it? Any better way?

    opened by lintool 16
  • Add automate downloading of indexes

    Currently, this change supports 'ms-marco-passage', 'ms-marco-doc' and 'TREC Disks 4 & 5'.

    • If the index exists, skip the download and use the index under '(pyserini)/indexes'.
    • If not, download the index to cache(~/.cache/pyserini/indexes) and extract the index to (pyserini)/indexes. Finally, delete the gz file in cache. Should we keep the gz file in cache?
    opened by qguo96 16
  • Resolve tiny differences between Anserini and Pyserini on MS MARCO: query iteration order

    If we look at the Python replications: https://github.com/castorini/pyserini/blob/master/docs/pypi-replication.md Compared against Anserini replications: e.g., https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-doc-leaderboard.md

    We'll note tiny differences - e.g., for MS MARCO doc, baselines - pyserini:

    #####################
    MRR @100: 0.2770296928568709
    QueriesRanked: 5193
    #####################
    

    Compared to anserini:

    #####################
    MRR @100: 0.2770296928568702
    QueriesRanked: 5193
    #####################
    

    Previously, we tracked it down in issue #257.

    I'd like to fix it so we get identical results moving forward - my proposed fix is a bit janky, but it'll work: Let's just store, in Python code, an array of integers corresponding to ids of the queries in the original queries file. When we're iterating over the dataset in pyserini.search, we just follow the order of the integers.

    Slightly better, we introduce a new query iterator abstraction and hide this implementation detail in there. So the query iterator would take in the current dictionary, and an optional array holding the iteration order.
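
    The query iterator idea above could be sketched roughly as follows. This is an illustration of the proposal, not the code that was eventually merged; the QueryIterator name and the fallback to the dictionary's own order when no order is supplied are assumptions.

```python
class QueryIterator:
    """Iterate over a {query_id: query} dict, optionally in a fixed order."""

    def __init__(self, topics, order=None):
        self.topics = topics
        # Assumption: without an explicit order, fall back to the dict's own order.
        self.order = list(order) if order is not None else list(topics)

    def __iter__(self):
        for qid in self.order:
            yield qid, self.topics[qid]

    def __len__(self):
        return len(self.order)

topics = {3: 'third query', 1: 'first query', 2: 'second query'}
for qid, query in QueryIterator(topics, order=[1, 2, 3]):
    print(qid, query)
```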

    Thoughts @MXueguang? I was thinking you could work on this?

    opened by lintool 15
  • DPR replication docs

    Hi @MXueguang - when everything is implemented DPR should probably get its own separate replication page, like for MS MARCO: https://github.com/castorini/pyserini/blob/master/docs/experiments-msmarco-passage.md

    Containing sparse, hybrid, and dense retrieval.

    Then we can add a replication log also - starting point for people interested in working more on it.

    opened by lintool 14
  • Incorrect encoding on Windows

    When using pyserini under Windows, it seems that the encoding of strings is breaking when passed to the JNI via the pyjnius package.

    It happens when a string is encoded as UTF-8 like this JString(my_str.encode('utf-8')) (e.g., https://github.com/castorini/pyserini/blob/master/pyserini/search/_searcher.py#L114). It only occurs under Windows as it must collide with the default Windows encoding CP-1252.

    I discussed this issue with the maintainers of pyjnius and it seems that to make it work independently from the platform, the .encode('utf-8') could simply be dropped.
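
    As a general illustration of this kind of failure (not a reproduction of what happens inside pyjnius), here is what UTF-8 bytes look like when they are decoded as CP-1252. The example string is arbitrary.

```python
text = 'caf\u00e9'  # 'café', written with an escape to keep this source ASCII-only
utf8_bytes = text.encode('utf-8')

# Decoding UTF-8 bytes with the wrong codec garbles non-ASCII characters:
# the two bytes of the accented letter become two separate characters.
garbled = utf8_bytes.decode('cp1252')
print(ascii(garbled))

# Decoding with the codec that produced the bytes round-trips cleanly.
roundtrip = utf8_bytes.decode('utf-8')
print(roundtrip == text)
```

    Pure ASCII queries are unaffected, which is why a problem like this tends to surface only with accented or non-Latin text.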

    Was there a reason why this manual encoding was used in pyserini?

    I created a branch with the changes, I could do a PR if you wish.

    opened by stekiri 13
  • Dense retrieval draft

    An example of usage: since the dense index doesn't contain raw data, I loaded the corpus separately.

    import numpy as np
    from pyserini.search import SimpleDenseSearcher
    
    searcher = SimpleDenseSearcher.from_prebuilt_index('msmarco_passage_0', 'collection.tsv')
    
    query_emb = np.random.random(768).astype('float32')
    result = searcher.search(query_emb)
    
    result[0].raw
    >> 'Lander, WY Sales Tax Rate. The current total local sales tax rate in Lander, WY is 5.000%. The December 2015 total local sales tax rate was also 5.000%. Lander, WY is in Fremont County. Lander is in the following zip codes: 82520.'
    
    result[0].docid
    >> '350921'
    
    result[0].score
    >> 0.42547345
    
    searcher.doc('123')
    >> Document(docid='123', raw='With a number of condo developments springing up in the city, it can be difficult to narrow down your choices for the perfect Montreal condo for sale. Our skilled agents organize your steps towards meeting your goals with our condo projects located in popular and trendy neighbourhoods.')
    
    opened by MXueguang 13
  • IndexOutOfBoundsException calling get_term_counts

    This is code to print the top tf.idf-weighted terms from documents in a run:

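    import heapq
    from pyserini.index import IndexReader

    # `run` is assumed to map each topic id to a list of docids.
    reader = IndexReader.from_prebuilt_index('robust04')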
    for topic, docs in run.items():
        print('---', topic)
        for doc in docs:
            print('---', doc)
            vec = reader.get_document_vector(doc)
            weighted = []
            for term, tf in vec.items():
                print('---', term, tf)
                df, cf = reader.get_term_counts(term)
                tfidf = tf / df
                heapq.heappush(weighted, (tfidf, term))
            for weight, term in heapq.nlargest(10, weighted):
                print(topic, doc, term, weight)
    

    The run I am iterating is a BM25 retrieval run on robust04 from Pyserini. On topic 301, document FBIS4-40260, term 'it' (tf=2), I get the following error:

    Traceback (most recent call last):
      File "/Users/soboroff/pyserini-fire/./top-terms.py", line 33, in <module>
        df, cf = reader.get_term_counts(term)
      File "/Users/soboroff/pyserini-fire/venv/lib/python3.10/site-packages/pyserini/index/_base.py", line 259, in get_term_counts
        term_map = self.object.getTermCountsWithAnalyzer(self.reader, JString(term.encode('utf-8')), analyzer)
      File "jnius/jnius_export_class.pxi", line 884, in jnius.JavaMethod.__call__
      File "jnius/jnius_export_class.pxi", line 1056, in jnius.JavaMethod.call_staticmethod
      File "jnius/jnius_utils.pxi", line 91, in jnius.check_exception
    jnius.JavaException: JVM exception occurred: Index 0 out of bounds for length 0 java.lang.IndexOutOfBoundsException
    
    opened by isoboroff 12
  • Unable to do Dense search against own index

    My environment:

    • OS - Ubuntu 18.04
    • Java 11.0.11
    • Python 3.8.8
    • Python Package versions:
      • torch 1.8.1
      • faiss-cpu 1.7.0
      • pyserini 0.12.0

    Problem 1

    I followed instructions to create my own minimal index and was able to run the Sparse Retrieval example successfully. However, when I tried to run the Dense retrieval example using the TctColBertQueryEncoder, I encountered the following issues that seem to be caused by me having a newer version of the transformers library, where the requires_faiss and requires_pytorch methods have been replaced with a more general requires_backends method in transformers.file_utils. The following files were affected.

    pyserini/dsearch/_dsearcher.py
    pyserini/dsearch/_model.py
    

    Problem 2

    Replacing them in place in the Pyserini code in my site-packages allowed me to move forward, but now I get the error message:

    RuntimeError: Error in faiss::FileIOReader::FileIOReader(const char*) at /__w/faiss-wheels/faiss-wheels/faiss/faiss/impl/io.cpp:81: Error: 'f' failed: could not open /path/to/lucene_index/index for reading: No such file or directory
    

    The /path/to/lucene_index above is a folder where my lucene index was built using pyserini.index. I am guessing that an additional ANN index might be required to be built from the data to allow Dense searching to happen? I looked in the help for pyserini.index but there did not seem to be anything that indicated creation of ANN index.

    I can live with the first problem (since I have a local solution) but obviously some fix to that would be nice. For the second problem, some documentation or help with building a local index for dense searching will be very much appreciated.

    Thanks!

    opened by sujitpal 12
  • Broken links in prebuilt READMEs

    From here: https://github.com/castorini/pyserini/blob/master/docs/prebuilt-indexes.md

    Link to robust04 README is broken. Might want to go through and make sure they all work...

    opened by lintool 0
  • Fill in missing conditions in MS MARCO V1 repro matrix

    Here: https://castorini.github.io/pyserini/2cr/msmarco-v1-passage.html

    We're missing a bunch of conditions that we should add.

    @MXueguang this is probably pretty easy to do right?

    opened by lintool 0
  • Refactor Dependencies

    Initial PR Based on https://github.com/castorini/pyserini/issues/1375

    Modularize imports so that LuceneSearcher does not rely on Faiss, torch, and transformers

    opened by ToluClassics 1
  • Importing LuceneSearcher relies on FAISS and Torch

    Currently, importing LuceneSearcher fails if faiss and torch aren't installed. (They aren't installed by design because they're platform-specific, see: https://github.com/castorini/pyserini#installation)

    This is likely caused by the imports in the following __init__ file: https://github.com/castorini/pyserini/blob/master/pyserini/search/__init__.py#L23-L26

    A fix would need to modularize those imports.

    If no one gets to it before me, I will attempt to send a PR to fix this.

    opened by cakiki 1
Releases(pyserini-0.19.2)
Owner
Castorini
Deep learning for natural language processing and information retrieval at the University of Waterloo