🤗 Push your spaCy pipelines to the Hugging Face Hub

Overview

spacy-huggingface-hub: Push your spaCy pipelines to the Hugging Face Hub

This package provides a CLI command for uploading any trained spaCy pipeline packaged with spacy package to the Hugging Face Hub. It auto-generates all meta information for you, uploads a pretty README (requires spaCy v3.1+) and handles version control under the hood.

PyPi GitHub

🤗 About the Hugging Face Hub

The Hugging Face Hub hosts Git-based repositories which are storage spaces that can contain all your files. These repositories have multiple advantages: versioning (commit history and diffs), branches, useful metadata about their tasks, languages, metrics and more, browser-based visualizers to explore the models interactively in your browser, as well as an API to use the models in production.

🚀 Quickstart

You can install spacy-huggingface-hub from pip:

pip install spacy-huggingface-hub

To check if the command has been registered successfully:

python -m spacy huggingface-hub --help

Hugging Face uses Git Large File Storage (LFS) to handle files larger than 10mb. You can find instructions on how to download it here.

You can then upload any pipeline packaged with spacy package. Make sure to set --build wheel to output a binary .whl file. The uploader will read all metadata from the pipeline package, including the auto-generated pretty README.md and the model details available in the meta.json.

huggingface-cli login
python -m spacy package ./en_ner_fashion ./output --build wheel
cd ./output/en_ner_fashion-0.0.0/dist
python -m spacy huggingface-hub push en_ner_fashion-0.0.0-py3-none-any.whl

The command will output two things:

pip install https://huggingface.co/spacy/en_core_web_sm/resolve/main/en_core_web_sm-any-py3-none-any.whl

Now you can share your pipelines very quickly with others. Additionally, you can also test your pipeline directly in the browser!

Image of browser widget

⚙️ Usage and API

If spaCy is already installed in the same environment, this package automatically adds the spacy huggingface-hub commands to the CLI. If you don't have spaCy installed, you can also execute the CLI directly via the package.

push

python -m spacy huggingface-hub push [whl_path] [--org] [--msg] [--local-repo] [--verbose]
python -m spacy_huggingface_hub push [whl_path] [--org] [--msg] [--local-repo] [--verbose]
Argument Type Description
whl_path str / Path The path to the .whl file packaged with spacy package.
--org, -o str Optional name of organization to which the pipeline should be uploaded.
--msg, -m str Commit message to use for update. Defaults to "Update spaCy pipeline".
--local-repo, -l str / Path Local path to the model repository (will be created if it doesn't exist). Defaults to hub in the current working directory.
--verbose, -V bool Output additional info for debugging, e.g. the full generated hub metadata.

Usage from Python

Instead of using the CLI, you can also call the push function from Python. It returns a dictionary containing the "url" of the published model and the "whl_url" of the wheel file, which you can install with pip install

from spacy_huggingface_hub import push

result = push("./en_ner_fashion-0.0.0-py3-none-any.whl")
print(result["url"])
Comments
  • HTTP Error 400 when pushing model to HuggingFace hub

    HTTP Error 400 when pushing model to HuggingFace hub

    Hello,

    I'm not quite sure if this issue is related to #5.

    When I'm trying to push a model on Hugging Face Hub organisation with spaCy CLI:

    python -m spacy huggingface-hub push fr_core_ner4archives_v3_default-0.0.0-py3-none-any.whl -o ner4archives -V
    

    This raises an HTTP 400 error:

    Pushing repository to the hub...
    Traceback (most recent call last):
      File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/spacy/__main__.py", line 4, in <module>
        setup_cli()
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/spacy/cli/_util.py", line 71, in setup_cli
        command(prog_name=COMMAND)
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
        return callback(**use_params)  # type: ignore
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/spacy_huggingface_hub/push.py", line 53, in huggingface_hub_push_cli
        push(whl_path, organization, commit_msg, verbose=verbose)
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/spacy_huggingface_hub/push.py", line 130, in push
        url = upload_folder(
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 2109, in upload_folder
        pr_url = self.create_commit(
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1844, in create_commit
        _raise_for_status(commit_resp)
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 84, in _raise_for_status
        _raise_with_request_id(request)
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 95, in _raise_with_request_id
        raise e
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 90, in _raise_with_request_id
        request.raise_for_status()
      File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
        raise HTTPError(http_error_msg, response=self)
    requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/models/lterriel/fr_core_ner4archives_v3_default/commit/main (Request ID: QTOTcNAiXjAsn45fwTKLO)
    
    

    However, the new model repository is well created in Hugging Face Hub organisation but without the model (files) and its card.

    packages installed:

    • spaCy==3.3.1
    • spacy-huggingface-hub==0.0.7
    • huggingface-hub==0.8.1
    opened by Lucaterre 24
  • Error when pushing to huggingface hub

    Error when pushing to huggingface hub

    Hello,

    I have a problem using this library. When I try to run the command to upload the whl file:

    python -m spacy huggingface-hub push en_acnl_electra_pipeline-0.0.1-py3-none-any.whl
    

    The error occurs:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/huggingface_hub/repository.py", line 365, in git_pull
        cwd=self.local_dir,
      File "/usr/lib/python3.7/subprocess.py", line 512, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['git', 'pull', '--rebase']' returned non-zero exit status 128.
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/usr/local/lib/python3.7/dist-packages/spacy/__main__.py", line 4, in <module>
        setup_cli()
      File "/usr/local/lib/python3.7/dist-packages/spacy/cli/_util.py", line 69, in setup_cli
        command(prog_name=COMMAND)
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/typer/main.py", line 497, in wrapper
        return callback(**use_params)  # type: ignore
      File "/usr/local/lib/python3.7/dist-packages/spacy_huggingface_hub/push.py", line 53, in huggingface_hub_push_cli
        push(whl_path, organization, commit_msg, local_repo_path, verbose=verbose)
      File "/usr/local/lib/python3.7/dist-packages/spacy_huggingface_hub/push.py", line 91, in push
        repo.git_pull(rebase=True)
      File "/usr/local/lib/python3.7/dist-packages/huggingface_hub/repository.py", line 368, in git_pull
        raise EnvironmentError(exc.stderr)
    OSError: error: cannot pull with rebase: Your index contains uncommitted changes.
    error: please commit or stash them.
    

    I'm using:

    ============================== Info about spaCy ==============================

    spaCy version 3.1.3
    Location /usr/local/lib/python3.7/dist-packages/spacy Platform Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic Python version 3.7.12
    Pipelines

    opened by danielvasic 2
  • Use new non-git based API

    Use new non-git based API

    With this approach:

    • Users don't need to configure git or use git lfs
    • A local copy of the repo is not needed anymore, so no local artifacts and issues related to git conflicts anymore
    • Due to above, --local-repo flag is removed.
    • Some light cleanup of codebase, in particular there was an edge scenario when the version is not specified in the whl name (which is the case of the whl on the HF Hub), and it was not being handled appropriately. Now we retrieve the version from the metadata file if the version is not specified

    Result: https://huggingface.co/osanseviero/en_core_web_sm

    opened by osanseviero 1
  • License problem when pushing model

    License problem when pushing model

    Hi, I am trying to push my spaCy model to the Huggingface Hub and I get the following error:

    Traceback (most recent call last):
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/huggingface_hub/repository.py", line 412, in git_push
        result = subprocess.run(
      File "/usr/lib/python3.8/subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['git', 'push']' returned non-zero exit status 1.
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/spacy/__main__.py", line 4, in <module>
        setup_cli()
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/spacy/cli/_util.py", line 69, in setup_cli
        command(prog_name=COMMAND)
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
        return callback(**use_params)  # type: ignore
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/spacy_huggingface_hub/push.py", line 53, in huggingface_hub_push_cli
        push(whl_path, organization, commit_msg, local_repo_path, verbose=verbose)
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/spacy_huggingface_hub/push.py", line 131, in push
        url = repo.push_to_hub(commit_message=commit_msg)
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/huggingface_hub/repository.py", line 434, in push_to_hub
        return self.git_push()
      File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/huggingface_hub/repository.py", line 422, in git_push
        raise EnvironmentError(exc.stderr)
    OSError: remote: ----------------------------------------------------------        
    remote: Sorry, your push was rejected:        
    remote: - error : yaml metadata schema issue on key "license"        
    remote: help: key: /license must be equal to one of the allowed values (error rule: enum), please find the license identifier that fits you at https://huggingface.co/docs/hub/model-repos#list-of-license-identifiers        
    remote: ----------------------------------------------------------        
    remote: Please find the documentation at:        
    remote: https://huggingface.co/docs/hub/model-repos        
    remote: ----------------------------------------------------------        
    To https://huggingface.co/nucci/sv_pipeline
     ! [remote rejected] main -> main (pre-receive hook declined)
    error: failed to push some refs to 'https://huggingface.co/nucci/sv_pipeline'
    

    It seems to be a license problem, so I put the identifier corresponding to the license that I want to give my model in meta.json, but it doesn't seem to solve the problem. Any thoughts?

    opened by Nuccy90 1
  • Fix license format

    Fix license format

    Hugging Face started to validate the license from the metadata by ensuring it's in a list of allowed ones. The expectation is that the license is in lower case, but spaCy uses MIT by default (and others are also in uppercase).

    opened by osanseviero 1
  • Fix model-index structure and misc minor fixes

    Fix model-index structure and misc minor fixes

    This PR has multiple changes, including:

    • Allowing to push an unversioned whl name (recall that repos in HF have whl files with any version in the name, and the extraction happens differently for those).
    • Add the task to the metric name
    • Change the model-index in the metadata to match latest specification.

    This will enable showing a "Evaluation Results" section in the model card Screen Shot 2021-08-10 at 10 58 28 PM

    FYI @julien-c

    opened by osanseviero 1
  • bugfix/fix path for windows + add model to lfs

    bugfix/fix path for windows + add model to lfs

    • zip_ref.namelist() produces a different path format than Path(), that's why for windows the base_layer and file_name couldn't be compared. Because of that, the if file_name.startswith(str(base_name)): statement was never true. The fix was to format file_name to the same format as base_name when using .startswith().

    • model was added to repo.lfs_track() because it's size exceeded 10 Mb, at least for windows

    opened by thomashacker 0
  • argument for adding a model card

    argument for adding a model card

    It would be great to be able to pre-generate, (edit), and attach a model card.

    I have created a model, pushed it, edited the card, then had to push it again, and quite logically it just over-wrote the edited model card. Even without full pull functionality, it would be great to be able to attach edited cards to avoid the need to re-do this job every time manually.

    opened by DSLituiev 1
Releases(v0.0.8)
Owner
Explosion
A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing
Explosion
RoadMap and preparation material for Machine Learning and Data Science - From beginner to expert.

ML-and-DataScience-preparation This repository has the goal to create a learning and preparation roadMap for Machine Learning Engineers and Data Scien

33 Dec 29, 2022
An implementation of the 1. Parallel, 2. Streaming, 3. Randomized SVD using MPI4Py

PYPARSVD This implementation allows for a singular value decomposition which is: Distributed using MPI4Py Streaming - data can be shown in batches to

Romit Maulik 44 Dec 31, 2022
Does Oversizing Improve Prosumer Profitability in a Flexibility Market? - A Sensitivity Analysis using PV-battery System

Does Oversizing Improve Prosumer Profitability in a Flexibility Market? - A Sensitivity Analysis using PV-battery System The possibilities to involve

Babu Kumaran Nalini 0 Nov 19, 2021
Chatbot in 200 lines of code using TensorLayer

Seq2Seq Chatbot This is a 200 lines implementation of Twitter/Cornell-Movie Chatbot, please read the following references before you read the code: Pr

TensorLayer Community 820 Dec 17, 2022
Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks

SSTNet Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks(ICCV2021) by Zhihao Liang, Zhihao Li, Songcen Xu, Mingkui Tan, Kui J

83 Nov 29, 2022
This repository contains the official code of the paper Equivariant Subgraph Aggregation Networks (ICLR 2022)

Equivariant Subgraph Aggregation Networks (ESAN) This repository contains the official code of the paper Equivariant Subgraph Aggregation Networks (IC

Beatrice Bevilacqua 59 Dec 13, 2022
The 7th edition of NTIRE: New Trends in Image Restoration and Enhancement workshop will be held on June 2022 in conjunction with CVPR 2022.

NTIRE 2022 - Image Inpainting Challenge Important dates 2022.02.01: Release of train data (input and output images) and validation data (only input) 2

Andrés Romero 37 Nov 27, 2022
Real-time ground filtering algorithm of cloud points acquired using Terrestrial Laser Scanner (TLS)

This repository contains tools to simulate the ground filtering process of a registered point cloud. The repository contains two filtering methods. The first method uses a normal vector, and fit to p

5 Aug 25, 2022
Code for "Steerable Pyramid Transform Enables Robust Left Ventricle Quantification"

Code for "Steerable Pyramid Transform Enables Robust Left Ventricle Quantification" This is an end-to-end framework for accurate and robust left ventr

2 Jul 09, 2022
Code for the paper "On the Power of Edge Independent Graph Models"

Edge Independent Graph Models Code for the paper: "On the Power of Edge Independent Graph Models" Sudhanshu Chanpuriya, Cameron Musco, Konstantinos So

Konstantinos Sotiropoulos 0 Oct 26, 2021
Machine learning and Deep learning models, deploy on telegram (the best social media)

Semi Intelligent BOT The project involves : Classifying fake news Classifying objects such as aeroplane, automobile, bird, cat, deer, dog, frog, horse

MohammadReza Norouzi 5 Mar 06, 2022
A pytorch implementation of Paper "Improved Training of Wasserstein GANs"

WGAN-GP An pytorch implementation of Paper "Improved Training of Wasserstein GANs". Prerequisites Python, NumPy, SciPy, Matplotlib A recent NVIDIA GPU

Marvin Cao 1.4k Dec 14, 2022
Danfeng Hong, Lianru Gao, Jing Yao, Bing Zhang, Antonio Plaza, Jocelyn Chanussot. Graph Convolutional Networks for Hyperspectral Image Classification, IEEE TGRS, 2021.

Graph Convolutional Networks for Hyperspectral Image Classification Danfeng Hong, Lianru Gao, Jing Yao, Bing Zhang, Antonio Plaza, Jocelyn Chanussot T

Danfeng Hong 154 Dec 13, 2022
Cowsay - A rewrite of cowsay in python

Python Cowsay A rewrite of cowsay in python. Allows for parsing of existing .cow

James Ansley 3 Jun 27, 2022
DeepFaceEditing: Deep Face Generation and Editing with Disentangled Geometry and Appearance Control

DeepFaceEditing: Deep Face Generation and Editing with Disentangled Geometry and Appearance Control One version of our system is implemented using the

260 Nov 28, 2022
Extremely easy multi instancing software for minecraft speedrunning.

Easy Multi Extremely easy multi/single instancing software for minecraft speedrunning. A couple of goals of this project: Setup multi in minutes No fi

Duncan 8 Jul 16, 2022
Code accompanying our NeurIPS 2021 traffic4cast challenge

Traffic forecasting on traffic movie snippets This repo contains all code to reproduce our approach to the IARAI Traffic4cast 2021 challenge. In the c

Nina Wiedemann 2 Aug 09, 2022
Project repo for Learning Category-Specific Mesh Reconstruction from Image Collections

Learning Category-Specific Mesh Reconstruction from Image Collections Angjoo Kanazawa*, Shubham Tulsiani*, Alexei A. Efros, Jitendra Malik University

438 Dec 22, 2022
This repo implements several applications of the proposed generalized Bures-Wasserstein (GBW) geometry on symmetric positive definite matrices.

GBW This repo implements several applications of the proposed generalized Bures-Wasserstein (GBW) geometry on symmetric positive definite matrices. Ap

Andi Han 0 Oct 22, 2021
Codebase for Time-series Generative Adversarial Networks (TimeGAN)

Codebase for Time-series Generative Adversarial Networks (TimeGAN)

Jinsung Yoon 532 Dec 31, 2022