Automatically create Faiss knn indices with optimal similarity search parameters.

Last update: Jan 01, 2023

Overview

AutoFaiss

It selects the indexing parameters that achieve the highest recall given memory and query speed constraints.

How to use autofaiss?

To install, run: pip install autofaiss

It's probably best to create a virtual env:

python -m venv .venv/autofaiss_env
source .venv/autofaiss_env/bin/activate
pip install -U pip
pip install autofaiss

Create embeddings

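import os
import numpy as np

# Faiss works on float32 vectors
embeddings = np.float32(np.random.rand(1000, 100))
# exist_ok=True lets the snippet be re-run without failing on existing folders
os.makedirs("embeddings", exist_ok=True)
np.save("embeddings/part1.npy", embeddings)
os.makedirs("my_index_folder", exist_ok=True)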

Generate a Knn index

autofaiss quantize --embeddings_path="embeddings" --output_path="my_index_folder" --metric_type="ip"

Try the index

import faiss
import glob
import numpy as np

my_index = faiss.read_index(glob.glob("my_index_folder/*.index")[0])

query_vector = np.float32(np.random.rand(1, 100))
k = 5
distances, indices = my_index.search(query_vector, k)

print(list(zip(distances[0], indices[0])))
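
To sanity-check the neighbors an index returns, the same query can be answered exactly with a brute-force inner-product search in NumPy. This is a minimal sketch, not part of autofaiss; `exact_top_k` is a hypothetical helper name, and it assumes the "ip" metric used above.

```python
import numpy as np


def exact_top_k(embeddings: np.ndarray, queries: np.ndarray, k: int):
    """Return (scores, indices) of the k largest inner products per query."""
    scores = queries @ embeddings.T                      # shape: (n_queries, n_embeddings)
    # argpartition finds the k best columns without sorting the full row
    top = np.argpartition(-scores, kth=k - 1, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top, axis=1)
    order = np.argsort(-top_scores, axis=1)              # sort those k by descending score
    indices = np.take_along_axis(top, order, axis=1)
    return np.take_along_axis(scores, indices, axis=1), indices


rng = np.random.default_rng(0)
embeddings = rng.random((1000, 100), dtype=np.float32)
query_vector = rng.random((1, 100), dtype=np.float32)

exact_scores, exact_indices = exact_top_k(embeddings, query_vector, k=5)
print(list(zip(exact_scores[0], exact_indices[0])))
```

If the index was built from the same embeddings, most of the ids it returns for this query should also appear in `exact_indices`.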

autofaiss quantize

embeddings_path -> path on the hdfs of your embeddings in .parquet format.
output_path -> destination path on the hdfs for the created index.
metric_type -> similarity distance used for the queries.

index_key -> (optional) describes the index to build.
index_param -> (optional) describes the hyperparameters of the index.
memory_available -> (optional) describes the amount of memory available on the machine.
use_gpu -> (optional) whether to use the GPU or not (not tested).

Install from source

First, create a virtual env and install dependencies:

python -m venv .venv/autofaiss_env
source .venv/autofaiss_env/bin/activate
make install

Comments

replace embedding iterator by embedding reader package

I extracted and improved the embedding iterator into a new package https://github.com/rom1504/embedding-reader

embedding reader is much faster than the previous embedding iterator thanks to reading pieces of files in parallel

it can also be reused for other embedding reading use cases

opened by rom1504 15

Use central logger to enable verbosity changing

I created a central logger object in autofaiss/__init__.py that can be imported and used. By default it is set to print to stdout. All print statements were replaced with logger.info calls, or logger.error where it makes sense. Existing debug logger calls were adapted to use the new logger.
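
The pattern described above can be sketched with Python's standard logging module. This is an illustration only, not the code from the PR; the logger name "autofaiss" and the helper `setup_logging` are assumptions.

```python
import logging
import sys

# One module-level logger shared by the whole package (the name is an assumption).
logger = logging.getLogger("autofaiss")


def setup_logging(level: int = logging.INFO) -> None:
    """Send log records to stdout at the given level (hypothetical helper)."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s]: %(message)s"))
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(level)


setup_logging()
logger.info("Launching the whole pipeline")      # replaces a former print()

# A caller embedding the library can lower the verbosity:
logger.setLevel(logging.WARNING)
logger.info("this message is now suppressed")
```

Because every module logs through the same named logger, a single setLevel call is enough to change the verbosity everywhere.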

opened by dobraczka 7

module 'faiss' has no attribute 'swigfaiss'

python 3.8.12
autofaiss                 2.13.2                   pypi_0    pypi
faiss-cpu                 1.7.2                    pypi_0    pypi
libfaiss                  1.7.2            h2bc3f7f_0_cpu    pytorch

First of all, thank you for the great project! I get the error: module 'faiss' has no attribute 'swigfaiss' when running the following command:

import autofaiss

autofaiss.build_index(
    "embeddings.npy",
    "autofaiss.index",
    "autofaiss.json",
    metric_type="ip",
    should_be_memory_mappable=True,
    make_direct_map=True)

The error appears when running it for make_direct_map=True.

Tested using conda 4.11.0 or mamba 0.15.3 using pytorch or conda-forge channel.

opened by njanakiev 6

Is my low recall reasonable?

Hi! Thank you for the great library, it helped me a lot. I am so ignorant but I just wanted to pick your brain and see if my recall is reasonable. I have a training set of ~1M embeddings and I set the max query time limit to 10ms (cuz I would need to query it 200k times during my model training). I also set RAM to 20GB, tho I have slightly more memory available (but no more than 100GB). The recall I'm seeing now is incredibly low, only ~0.1! Did I do anything wrong?

My code for testing is:

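from autofaiss import build_index
from sklearn.preprocessing import normalize  # assumed source of normalize(); not imported in the original post
import numpy as np
import os
import shutil
import timeit
import faiss

data_path = "./"  # placeholder; data_path is not defined in the original post

max_index_query_time_ms = 10 #@param {type: "number"}
max_index_memory_usage = "20GB" #@param
metric_type = "ip" #@param ['ip', 'l2']
D=480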

# Create embeddings
embeddings = normalize(np.float32(np.random.rand(100000, D)))

# Create a new folder
embeddings_dir = data_path + "/embeddings_folder"
if os.path.exists(embeddings_dir):
    shutil.rmtree(embeddings_dir)
os.makedirs(embeddings_dir)

# Save your embeddings
# You can split your embeddings into several parts if they are too big
# The data will be read in the lexicographical order of the filenames
np.save(f"{embeddings_dir}/corpus_embeddings.npy", embeddings) 

os.makedirs(data_path+"my_index_folder", exist_ok=True)

build_index(embeddings=embeddings_dir, index_path=data_path+"knn.index", 
            index_infos_path=data_path+"infos.json", 
            metric_type=metric_type, 
            max_index_query_time_ms=max_index_query_time_ms,
            max_index_memory_usage=max_index_memory_usage, 
            make_direct_map=False, use_gpu=True)

temp1 = np.random.randn(1024, D).astype(np.float32)
temp2 = embeddings

index = faiss.read_index(str(data_path+"knn.index"), faiss.IO_FLAG_MMAP | faiss.IO_FLAG_READ_ONLY)
# index.nprobe=64
start = timeit.default_timer()
values, neighbors_q = index.search(normalize(temp1), 20)
end = timeit.default_timer()
print(end - start)
print(sorted(neighbors_q[0]))

temp = normalize(temp1, axis=1) @ normalize(embeddings, axis=1).T
topk_indices_normalize = np.argpartition(temp, kth=temp.shape[1]-20, axis=1)[:, -20:]
print(sorted(topk_indices_normalize[0]))
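
One way to put a number on the comparison above is to compute recall@k directly: for each query, the fraction of exact top-k neighbors that also appear in the approximate top-k. The sketch below is NumPy-only and uses made-up neighbor arrays; `recall_at_k` is a hypothetical helper, not an autofaiss function.

```python
import numpy as np


def recall_at_k(approx: np.ndarray, exact: np.ndarray) -> float:
    """Mean fraction of exact neighbors that the approximate search also returned.

    Both arrays have shape (n_queries, k) and hold neighbor ids.
    """
    hits = [len(set(a) & set(e)) for a, e in zip(approx.tolist(), exact.tolist())]
    return float(np.mean(hits)) / exact.shape[1]


# Toy data: 2 queries, k = 4
exact = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
approx = np.array([[4, 3, 9, 1], [0, 6, 11, 12]])

print(recall_at_k(approx, exact))  # (3 + 1) / (2 * 4) = 0.5
```

With the variables from the post, `approx` would be neighbors_q and `exact` would be topk_indices_normalize.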

opened by jasperhyp 4

Fix potential out of disk problem when producing N indices

When we produce N indices (with nb_indices_to_keep larger than 1), the function optimize_and_measure_indices downloads all N indices from remote storage in one shot. If the machine running autofaiss has limited disk space, this fails with a "No space left" error.
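
A way to bound disk usage is to fetch, process, and delete one index at a time rather than downloading all N up front. The sketch below illustrates that idea with the standard library only; `process_one_at_a_time`, `fetch`, and `measure` are hypothetical names, and this is not necessarily how the fix was implemented.

```python
import os
import shutil
import tempfile
from typing import Callable, Dict, Iterable


def process_one_at_a_time(
    remote_paths: Iterable[str],
    fetch: Callable[[str, str], object],
    measure: Callable[[str], int],
) -> Dict[str, int]:
    """Download each file to a scratch dir, measure it, then delete it.

    At most one downloaded file exists on local disk at any moment.
    """
    results = {}
    for remote in remote_paths:
        with tempfile.TemporaryDirectory() as scratch:
            local = os.path.join(scratch, os.path.basename(remote))
            fetch(remote, local)          # e.g. a copy from remote storage
            results[remote] = measure(local)
        # the scratch dir (and the file in it) has been removed at this point
    return results


# Demo with local files standing in for remote indices.
src = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(src, f"index_{i}.bin")
    with open(p, "wb") as f:
        f.write(b"\0" * ((i + 1) * 10))
    paths.append(p)

sizes = process_one_at_a_time(paths, shutil.copyfile, os.path.getsize)
print(sorted(sizes.values()))  # [10, 20, 30]
shutil.rmtree(src)
```

The trade-off is that the downloads are no longer batched, so total wall-clock time may increase while peak disk usage drops to the size of the largest single index.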

opened by hitchhicker 3

Fix "ValueError: substring not found" on line 201

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-6-2a3c85a29b16> in <module>()
     13     knn_extra_neighbors = 10,                     # num extra neighbors to fetch
     14     max_index_memory_usage = '1m',
---> 15     current_memory_available = '1G'
     16 )

7 frames

/usr/local/lib/python3.7/dist-packages/retro_pytorch/training.py in __init__(self, retro, chunk_size, documents_path, knn, glob, chunks_memmap_path, seqs_memmap_path, doc_ids_memmap_path, max_chunks, max_seqs, max_docs, knn_extra_neighbors, **index_kwargs)
    160             num_nearest_neighbors = knn,
    161             num_extra_neighbors = knn_extra_neighbors,
--> 162             **index_kwargs
    163         )
    164 

/usr/local/lib/python3.7/dist-packages/retro_pytorch/retrieval.py in chunks_to_precalculated_knn_(num_nearest_neighbors, num_chunks, chunk_size, chunk_memmap_path, doc_ids_memmap_path, use_cls_repr, max_rows_per_file, chunks_to_embeddings_batch_size, embed_dim, num_extra_neighbors, **index_kwargs)
    346         chunk_size = chunk_size,
    347         chunk_memmap_path = chunk_memmap_path,
--> 348         **index_kwargs
    349     )
    350 

/usr/local/lib/python3.7/dist-packages/retro_pytorch/retrieval.py in chunks_to_index_and_embed(num_chunks, chunk_size, chunk_memmap_path, use_cls_repr, max_rows_per_file, chunks_to_embeddings_batch_size, embed_dim, **index_kwargs)
    321     index = index_embeddings(
    322         embeddings_folder = EMBEDDING_TMP_SUBFOLDER,
--> 323         **index_kwargs
    324     )
    325 

/usr/local/lib/python3.7/dist-packages/retro_pytorch/retrieval.py in index_embeddings(embeddings_folder, index_file, index_infos_file, max_index_memory_usage, current_memory_available)
    281         max_index_memory_usage = max_index_memory_usage,
    282         current_memory_available = current_memory_available,
--> 283         should_be_memory_mappable = True
    284     )
    285 

/usr/local/lib/python3.7/dist-packages/autofaiss/external/quantize.py in build_index(embeddings, index_path, index_infos_path, ids_path, save_on_disk, file_format, embedding_column_name, id_columns, index_key, index_param, max_index_query_time_ms, max_index_memory_usage, current_memory_available, use_gpu, metric_type, nb_cores, make_direct_map, should_be_memory_mappable, distributed, temporary_indices_folder)
    142         with Timeit("Reading total number of vectors and dimension"):
    143             nb_vectors, vec_dim = read_total_nb_vectors_and_dim(
--> 144                 embeddings_path, file_format=file_format, embedding_column_name=embedding_column_name
    145             )
    146             print(f"There are {nb_vectors} embeddings of dim {vec_dim}")

/usr/local/lib/python3.7/dist-packages/autofaiss/readers/embeddings_iterators.py in read_total_nb_vectors_and_dim(embeddings_path, file_format, embedding_column_name)
    244             dim: embedding dimension
    245         """
--> 246     fs, file_paths = get_file_list(embeddings_path, file_format)
    247 
    248     _, dim = get_file_shape(file_paths[0], file_format=file_format, embedding_column_name=embedding_column_name, fs=fs)

/usr/local/lib/python3.7/dist-packages/autofaiss/readers/embeddings_iterators.py in get_file_list(path, file_format)
    178     """
    179     if isinstance(path, str):
--> 180         return _get_file_list(path, file_format)
    181     all_file_paths = []
    182     fs = None

/usr/local/lib/python3.7/dist-packages/autofaiss/readers/embeddings_iterators.py in _get_file_list(path, file_format, sort_result)
    199     """Get the file system and all the file paths that matches `file_format` given a single path."""
    200     fs, path_in_fs = fsspec.core.url_to_fs(path)
--> 201     prefix = path[: path.index(path_in_fs)]
    202     glob_pattern = path.rstrip("/") + f"**/*.{file_format}"
    203     file_paths = fs.glob(glob_pattern)

ValueError: substring not found
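
The failing line assumes that the path returned by fsspec is a substring of the path the user passed. That assumption breaks when the filesystem layer rewrites the path, for example by turning a relative local path into an absolute one, which is a plausible trigger here. The sketch below reproduces the failure mode with the standard library only (using os.path.abspath as a stand-in for the fsspec call) and shows a hypothetical guard, `split_prefix`; it is not the patch applied in autofaiss.

```python
import os


def split_prefix(path: str, path_in_fs: str) -> str:
    """Return the protocol prefix of `path`, or "" if `path_in_fs` is not inside it."""
    pos = path.find(path_in_fs)   # find() returns -1 instead of raising ValueError
    return path[:pos] if pos >= 0 else ""


# Case 1: a URL-style path, where the filesystem path is a suffix of the input.
print(split_prefix("hdfs://root/user/foo", "/user/foo"))   # hdfs://root

# Case 2: a relative path that gets resolved to an absolute one.
relative = "embeddings_folder"
resolved = os.path.abspath(relative)      # stand-in for what a local filesystem may return
try:
    relative.index(resolved)              # same pattern as the failing line
except ValueError as err:
    print("ValueError:", err)             # substring not found

print(repr(split_prefix(relative, resolved)))              # ''
```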

opened by josephcappadona 3

add_with_ids is not implemented for Flat indexes

Hello, I'm encountering an issue using autofaiss with flat indexes. build_index raises an error (in my case, when embeddings are ndarray, I did not test with parquet embeddings) in distributed mode, for flat indexes. This error could be related to https://github.com/facebookresearch/faiss/issues/1212 (method index.add_with_ids is not implemented for flat indexes).

import numpy as np
from autofaiss import build_index

build_index(
    embeddings=np.ones((100, 512)),
    distributed="pyspark",
    should_be_memory_mappable=True,
    index_path="hdfs://root/user/foo/knn.index",
    index_key="Flat",
    nb_cores=20,
    max_index_memory_usage="32G",
    current_memory_available="48G",
    ids_path="hdfs://root/user/foo/test_indexing_out/ids",
    temporary_indices_folder="hdfs://root/user/foo/indices/tmp/",
    nb_indices_to_keep=5,
    index_infos_path="hdfs://root/user/r.laby/test_indexing_out/index_infos.json",
)

raises

RuntimeError: Error in virtual void faiss::Index::add_with_ids(faiss::Index::idx_t, const float*, const idx_t*) at /project/faiss/faiss/Index.cpp:39: add_with_ids not implemented for this type of index

Is this expected, or could it be fixed? Thanks!

opened by RomainLaby 2

Distributed less indices

this is faster and now easy to do thanks to embedding reader integration

based on https://github.com/criteo/autofaiss/pull/92

only the second commit is part of this PR

opened by rom1504 2

fix training memory estimation

This fixes the training memory estimation by slightly overestimating the training memory (x1.5 the size of the training vectors). That prevents OOM but is not optimal.

A proper fix can be tracked at https://github.com/criteo/autofaiss/issues/85
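
As a worked example of the over-estimate described above, the arithmetic can be written out as follows. The sketch assumes float32 vectors (4 bytes per component) and applies the x1.5 factor to the raw size of the training vectors; `estimate_training_memory` is a hypothetical helper, not the function in autofaiss.

```python
def estimate_training_memory(nb_train_vectors: int, dim: int,
                             bytes_per_component: int = 4,
                             safety_factor: float = 1.5) -> int:
    """Upper-bound estimate, in bytes, of the memory needed to hold the training set."""
    raw_bytes = nb_train_vectors * dim * bytes_per_component
    return int(raw_bytes * safety_factor)


# 1,000,000 training vectors of dimension 512 stored as float32:
# raw size = 1e6 * 512 * 4 = 2.048e9 bytes; with the 1.5 factor = 3.072e9 bytes
estimate = estimate_training_memory(1_000_000, 512)
print(f"{estimate / 1e9:.3f} GB")  # 3.072 GB
```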

opened by rom1504 2

Control verbosity of messages

Hi, thanks for this library, it really helps when working with faiss! One minor problem I have is that I would like to control the verbosity of the messages, since I use autofaiss in my own library. The simplest way to do that would probably be through Python's logging module.

Is there anything planned in that regard?

opened by dobraczka 2

add option to build the index with a direct map to enable fast reconstruction

simply call faiss.extract_index_ivf(index).set_direct_map_type(faiss.DirectMap.Array) under an option before starting the .add there https://github.com/criteo/autofaiss/blob/master/autofaiss/external/build.py#L171

opened by rom1504 2

`build_index` fails with "ValueError: No embeddings found in folder"

I'm trying to run autofaiss build_index as follows:

[nix-shell:~]$ autofaiss build_index ./deleteme/ --file_format=parquet
2022-12-15 23:23:02,143 [INFO]: Using 32 omp threads (processes), consider increasing --nb_cores if you have more
2022-12-15 23:23:02,144 [INFO]: Launching the whole pipeline 12/15/2022, 23:23:02
2022-12-15 23:23:02,144 [INFO]: Reading total number of vectors and dimension 12/15/2022, 23:23:02
2022-12-15 23:23:02,146 [INFO]: >>> Finished "Reading total number of vectors and dimension" in 0.0028 secs
2022-12-15 23:23:02,147 [INFO]: >>> Finished "Launching the whole pipeline" in 0.0030 secs
Traceback (most recent call last):
  File "/nix/store/0cnbvzcbn02najv78fsqvvjivgy4dpkk-python3.10-autofaiss-2.15.3/bin/.autofaiss-wrapped", line 9, in <module>
    sys.exit(main())
  File "/nix/store/0cnbvzcbn02najv78fsqvvjivgy4dpkk-python3.10-autofaiss-2.15.3/lib/python3.10/site-packages/autofaiss/external/quantize.py", line 596, in main
    fire.Fire(
  File "/nix/store/801g89pidv78hqddvp29r08h1ji62bqk-python3.10-fire-0.4.0/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/nix/store/801g89pidv78hqddvp29r08h1ji62bqk-python3.10-fire-0.4.0/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/nix/store/801g89pidv78hqddvp29r08h1ji62bqk-python3.10-fire-0.4.0/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/nix/store/0cnbvzcbn02najv78fsqvvjivgy4dpkk-python3.10-autofaiss-2.15.3/lib/python3.10/site-packages/autofaiss/external/quantize.py", line 205, in build_index
    embedding_reader = EmbeddingReader(
  File "/nix/store/5gjs659k7bjza921kajn3vikgghkz5dk-python3.10-embedding-reader-1.5.0/lib/python3.10/site-packages/embedding_reader/embedding_reader.py", line 22, in __init__
    self.reader = ParquetReader(
  File "/nix/store/5gjs659k7bjza921kajn3vikgghkz5dk-python3.10-embedding-reader-1.5.0/lib/python3.10/site-packages/embedding_reader/parquet_reader.py", line 68, in __init__
    raise ValueError(f"No embeddings found in folder {embeddings_folder}")
ValueError: No embeddings found in folder ./deleteme/

but I do have embeddings in ./deleteme/!

[nix-shell:~]$ ls ./deleteme/
deleteme.parquet-00000-of-00001

Furthermore, this parquet file parses just fine and matches the column names expected by autofaiss:

In [14]: pq.read_table("./deleteme/deleteme.parquet-00000-of-00001")
Out[14]: 
pyarrow.Table
vin: string
timestamp: int64
camera: string
bbox: fixed_size_list<item: uint16>[4]
  child 0, item: uint16
id: int64
embedding: fixed_size_list<item: float>[768]
  child 0, item: float
----
vin: [["XX4L4100140","XX4L4100140","XX4L4100140","XX4L4100140","XX9L4100103",...,"XXXL4100076","XXXL4100076","XXXL4100076","XXXL4100076","XXXL4100076"]]
timestamp: [[1641009004,1641009004,1641009004,1641009004,1640995845,...,1641002256,1641002256,1641002256,1641002256,1641002256]]
camera: [["camera_back_left","camera_back_left","camera_back_left","camera_back_left","camera_rear_medium",...,"camera_front_left_80","camera_front_left_80","camera_front_left_80","camera_front_left_80","camera_front_left_80"]]
bbox: [[[1476,405,1824,839],[269,444,632,637],...,[826,377,981,492],[1194,404,1480,587]]]
id: [[-8209940914704430861,-8874558295300428965,6706661532224839957,-8984308169583777616,1311470225947591668,...,-8769893754771418171,-8253568985418968059,-6239971725986942111,7715533091743341224,2502116624477591343]]
embedding: [[[-0.015306762,0.054586615,0.022397395,0.008673363,-0.0064821607,...,-0.023860542,0.032048535,-0.029431753,0.012359367,-0.022298913],[-0.006019405,0.04093461,0.010485844,0.00063089275,0.023878522,...,0.018967431,0.006789252,-0.01607387,-0.0037895043,0.009490352],...,[0.009580072,0.06454213,-0.0065298285,0.017814448,0.026221843,...,0.032834977,0.0094326865,-0.007913973,-0.009541624,-0.0115858],[0.009568084,0.057270113,-0.0055452115,0.008511255,0.019073263,...,0.0302203,0.009586956,0.0019548207,0.00042776446,0.0094863055]]]

What's going wrong here? How can I create an index out of a parquet dataset?
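
One thing worth checking is the file name: deleteme.parquet-00000-of-00001 does not end in .parquet. If the reader selects files with a pattern such as *.parquet (the older autofaiss iterator shown in a traceback above built a glob of the form **/*.{file_format}), this file would be skipped. The sketch below only illustrates that pattern-matching behavior with the standard library; whether embedding-reader uses exactly this pattern is an assumption, and the second file name is a hypothetical rename.

```python
from fnmatch import fnmatch

file_format = "parquet"
pattern = f"*.{file_format}"

for name in ["deleteme.parquet-00000-of-00001", "deleteme-00000-of-00001.parquet"]:
    print(name, "->", fnmatch(name, pattern))
# deleteme.parquet-00000-of-00001 -> False
# deleteme-00000-of-00001.parquet -> True
```

Renaming the file so that it ends in .parquet would be a quick way to test this hypothesis.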

opened by samuela 0

How to specify IDs when using `npy` format?

Reading these docs, it appears as though one can set the entry IDs when using parquet by setting --id_columns. How does one set entry IDs when using npy format?

opened by samuela 1

autofaiss installation error - Failed building wheel for faiss-cpu!

I was able to install autofaiss until last week on an AWS SageMaker instance (GPU instances - Amazon Linux AMI release 2018.03) using the following command:

pip3 install autofaiss==1.3.0

But today I suddenly see this error while installing. Has anyone seen this issue? Any ideas what is causing this?

Building wheels for collected packages: autofaiss, faiss-cpu, fire, termcolor
  Building wheel for autofaiss (setup.py) ... done
  Created wheel for autofaiss: filename=autofaiss-1.3.0-py3-none-any.whl size=48764 sha256=42d6ce69ff186041b585bd3317cf2e00ce3a4ede3034f58eca1575e50e6c5f91
  Stored in directory: /home/ec2-user/.cache/pip/wheels/cf/43/6d/4fc7683a2417491d8fab927f449753834890d49bc686fef63f
  Building wheel for faiss-cpu (pyproject.toml) ... error
  ERROR: Command errored out with exit status 1:
   command: /home/ec2-user/anaconda3/envs/python3/bin/python /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpvvweru4f
       cwd: /tmp/pip-install-hxmdeqf3/faiss-cpu_ad7ee699550b4e308c2025580dec850a
  Complete output (10 lines):
  running bdist_wheel
  running build
  running build_py
  running build_ext
  building 'faiss._swigfaiss' extension
  swigging faiss/faiss/python/swigfaiss.i to faiss/faiss/python/swigfaiss_wrap.cpp
  swig -python -c++ -Doverride= -I/usr/local/include -Ifaiss -doxygen -DSWIGWORDSIZE64 -module swigfaiss -o faiss/faiss/python/swigfaiss_wrap.cpp faiss/faiss/python/swigfaiss.i
  swig error : Unrecognized option -doxygen
  Use 'swig -help' for available options.
  error: command 'swig' failed with exit status 1

  ERROR: Failed building wheel for faiss-cpu
  Building wheel for fire (setup.py) ... done
  Created wheel for fire: filename=fire-0.4.0-py2.py3-none-any.whl size=115928 sha256=f50e61a858631fddf400046f1f27ab54aacf6014b781a1ad2a4bd207089050e9
  Stored in directory: /home/ec2-user/.cache/pip/wheels/a6/12/74/ce0728e3990845862240349a12d7179a262e388ec73938024b
  Building wheel for termcolor (setup.py) ... done
  Created wheel for termcolor: filename=termcolor-1.1.0-py3-none-any.whl size=4829 sha256=3987cbb706d21a57b91471ebedcb34b316294332881c99820ddfa4d5f279860f
  Stored in directory: /home/ec2-user/.cache/pip/wheels/93/2a/eb/e58dbcbc963549ee4f065ff80a59f274cc7210b6eab962acdc
Successfully built autofaiss fire termcolor
Failed to build faiss-cpu
ERROR: Could not build wheels for faiss-cpu, which is required to install pyproject.toml-based projects

opened by kjahan 1

fix(index_utils): #143 Windows 10 compatibility

Fix for issue #143

NamedTemporaryFile should not delete the file as described in this StackOverflow: https://stackoverflow.com/questions/23212435/permission-denied-to-write-to-my-temporary-file

This fix solves a Windows 10 compatibility issue.

opened by ezalos 1

Bug [Windows10]: misuse of NamedTemporaryFile in get_index_size()

Description of the bug:

On Windows 10, creating an index with the autofaiss Python API fails with a Permission denied error during a call to open:

Traceback (most recent call last):
  File "C:\Users\Adrien\anaconda3\envs\ICONO-prod\lib\site-packages\autofaiss\external\quantize.py", line 286, in build_index
    index, metric_infos = create_index(
  File "C:\Users\Adrien\anaconda3\envs\ICONO-prod\lib\site-packages\autofaiss\external\build.py", line 162, in create_index
    index, metrics = add_embeddings_to_index(
  File "C:\Users\Adrien\anaconda3\envs\ICONO-prod\lib\site-packages\autofaiss\external\build.py", line 114, in add_embeddings_to_index
    return add_embeddings_to_index_local(
  File "C:\Users\Adrien\anaconda3\envs\ICONO-prod\lib\site-packages\autofaiss\indices\build.py", line 105, in add_embeddings_to_index_local
    metric_infos = index_optimizer(trained_index, "") if index_optimizer else None  # type: ignore
  File "C:\Users\Adrien\anaconda3\envs\ICONO-prod\lib\site-packages\autofaiss\indices\build.py", line 59, in _optimize_index_fn
    metric_infos = optimize_and_measure_index(
  File "C:\Users\Adrien\anaconda3\envs\ICONO-prod\lib\site-packages\autofaiss\external\optimize.py", line 568, in optimize_and_measure_index
    metric_infos.update(compute_fast_metrics(embedding_reader, index))
  File "C:\Users\Adrien\anaconda3\envs\ICONO-prod\lib\site-packages\autofaiss\external\scores.py", line 27, in compute_fast_metrics
    size_bytes = get_index_size(index)
  File "C:\Users\Adrien\anaconda3\envs\ICONO-prod\lib\site-packages\autofaiss\indices\index_utils.py", line 29, in get_index_size
    faiss.write_index(index, tmp_file.name)
  File "C:\Users\Adrien\anaconda3\envs\ICONO-prod\lib\site-packages\faiss\swigfaiss.py", line 9645, in write_index
    return _swigfaiss.write_index(*args)
RuntimeError: Error in __cdecl faiss::FileIOWriter::FileIOWriter(const char *) at D:\a\faiss-wheels\faiss-wheels\faiss\faiss\impl\io.cpp:98: Error: 'f' failed: could not open C:\Users\Adrien\AppData\Local\Temp\tmphnewcax0 for writing: Permission denied

Steps to reproduce

On a fresh virtualenv with autofaiss version 2.15.3

from autofaiss import build_index
import numpy as np
import os

os.makedirs("embeddings", exist_ok=True)
os.makedirs("my_index_folder", exist_ok=True)

embeddings = np.float32(np.random.rand(100, 512))
np.save("embeddings/0.npy", embeddings)
ret = build_index(
   embeddings="embeddings",
   index_path="my_index_folder/knn.index",
   index_infos_path="my_index_folder/index_infos.json",
)
print(f"{ret = }")

Solution

In autofaiss\indices\index_utils.py the call to NamedTemporaryFile should not delete the file as described in this StackOverflow

As such, a fix could be:

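import os
from pathlib import Path
from tempfile import NamedTemporaryFile

import faiss


def get_index_size(index: faiss.Index) -> int:
    """Returns the size in RAM of a given index"""

    # On Windows, a temporary file created with delete=True cannot be opened
    # a second time while it is still open, so automatic deletion is disabled there.
    delete = os.name != "nt"

    with NamedTemporaryFile(delete=delete) as tmp_file:
        faiss.write_index(index, tmp_file.name)
        size_in_bytes = Path(tmp_file.name).stat().st_size

    return size_in_bytes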
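
An alternative that avoids the platform check is to create the file inside a temporary directory, so that no open handle conflicts with the writer and cleanup still happens automatically. The sketch below is an illustration under stated assumptions, not the merged fix: `get_serialized_size` is a hypothetical name, and a generic `write_fn(path)` callable stands in for faiss.write_index(index, path).

```python
import os
from tempfile import TemporaryDirectory
from typing import Callable


def get_serialized_size(write_fn: Callable[[str], None]) -> int:
    """Write an object to a scratch file via `write_fn` and return the file size in bytes."""
    with TemporaryDirectory() as tmp_dir:
        tmp_path = os.path.join(tmp_dir, "serialized.bin")
        write_fn(tmp_path)                  # the writer opens and closes the file itself
        return os.path.getsize(tmp_path)    # directory and file are removed on exit


def write_dummy(path: str) -> None:
    # Stand-in for: faiss.write_index(index, path)
    with open(path, "wb") as f:
        f.write(b"x" * 1234)


print(get_serialized_size(write_dummy))  # 1234
```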

opened by ezalos 0

Vector normalization while building index

Hi! According to the docs, faiss doesn't natively support cosine similarity as a distance metric. The closest one is inner product, which additionally requires the embedding vectors to be normalized beforehand. In the FAQ, the authors propose doing this manually with their function faiss.normalize_L2. I have exactly this case and would be glad if autofaiss had an optional flag to normalize the vectors before building the index. It seems to me that it's not so difficult: one would add faiss.normalize_L2 at each place that iterates over embedding_reader. If so, I can make a PR.
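
The equivalence this request relies on can be checked with NumPy alone: after L2-normalizing both sides, the inner product equals the cosine similarity. The sketch uses a hand-written `l2_normalize` helper (hypothetical) in place of faiss.normalize_L2 so that it runs without faiss.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Return a copy of x with every row scaled to unit L2 norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)    # guard against division by zero


rng = np.random.default_rng(0)
a = rng.standard_normal((3, 8)).astype(np.float32)
b = rng.standard_normal((4, 8)).astype(np.float32)

# Cosine similarity computed from its definition
cosine = (a @ b.T) / (np.linalg.norm(a, axis=1)[:, None] * np.linalg.norm(b, axis=1)[None, :])

# Inner product of pre-normalized vectors
inner = l2_normalize(a) @ l2_normalize(b).T

print(np.allclose(cosine, inner, atol=1e-5))  # True
```

Unlike this helper, faiss.normalize_L2 modifies a float32 array in place, so a copy should be made first if the original vectors are still needed.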

opened by blatr 7

Releases(2.15.4)

2.15.4 (Dec 23, 2022)

2.15.3 (Sep 7, 2022)

2.15.2 (Sep 1, 2022)

2.15.1 (Aug 10, 2022)

2.15.0 (Aug 1, 2022)

2.14.3 (May 9, 2022)

2.14.2 (May 6, 2022)

2.14.1 (May 1, 2022)

2.14.0 (Mar 31, 2022)

Added

Add the possibility to tune the index to return at least k nearest neighbors

2.13.2 (Mar 15, 2022)

2.13.1 (Mar 10, 2022)

2.13.0 (Mar 9, 2022)

2.12.1 (Mar 8, 2022)

2.12.0 (Mar 8, 2022)

2.11.1 (Mar 6, 2022)

2.11.0 (Mar 6, 2022)

2.10.3 (Mar 4, 2022)

2.10.2 (Feb 26, 2022)

2.10.1 (Feb 25, 2022)

2.10.0 (Feb 25, 2022)

2.9.9 (Feb 23, 2022)

2.9.8 (Feb 22, 2022)

2.9.7 (Feb 21, 2022)

2.9.6 (Feb 21, 2022)

2.9.5 (Feb 21, 2022)

2.9.4 (Feb 21, 2022)

Fixed

better dependencies ranges

2.9.3 (Feb 18, 2022)

Fixed

Fix/Complete some documents

Disable IVF, Flat index_key for large numbers of vectors on CPU

2.9.2 (Feb 17, 2022)

Fixed

Fix "Filter empty files"

2.9.1 (Feb 17, 2022)

Fixed

Empty ids path and temporary small indices folder at the beginning

2.9.0 (Feb 16, 2022)

Added

Use a central logger instead of print functions

Add a verbosity flag to control the log level

Owner

Criteo

GitHub Repository https://criteo.github.io/autofaiss/

