An easy to use Natural Language Processing library and framework for predicting, training, fine-tuning, and serving up state-of-the-art NLP models.

Overview

Welcome to AdaptNLP

A high level framework and library for running, training, and deploying state-of-the-art Natural Language Processing (NLP) models for end to end tasks.

What is AdaptNLP?

AdaptNLP is a Python package that allows users ranging from beginner Python coders to experienced Machine Learning Engineers to leverage state-of-the-art Natural Language Processing (NLP) models and training techniques in one easy-to-use package.

Utilizing fastai with HuggingFace's Transformers library and Humboldt University of Berlin's Flair library, AdaptNLP provides Machine Learning Researchers and Scientists with a modular and adaptive approach to a variety of NLP tasks, simplifying what it takes to train, perform inference with, and deploy NLP-based models and microservices.

What is the Benefit of AdaptNLP Rather Than Just Using Transformers?

Despite quick inference functionality such as the pipeline API in transformers, it is still not as flexible or as fast as it could be. AdaptNLP's Easy* inference modules tend to be slightly faster than the pipeline interface (at a bare minimum, the same speed), while also providing the user with simple, intuitive return values that leave out any unneeded output.

Along with this, the integration of the fastai library gives the code needed to train or run inference on your models a completely modular API through the fastai Callback system. Rather than writing your entire torch loop, if a model needs anything special, a Callback can be written in fewer than 10 lines of code to achieve that specific functionality.
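To make the callback idea concrete, here is a minimal, library-free sketch. It is not fastai's Callback API: the Callback base class, the hook names, the fit function, and the LossRecorder example are all simplified stand-ins invented for illustration. The point it shows is that the loop stays fixed while a short class adds custom behavior.

```python
class Callback:
    """Hypothetical base class: every hook is a no-op unless overridden."""
    def before_fit(self, state): pass
    def after_batch(self, state): pass
    def after_fit(self, state): pass


class LossRecorder(Callback):
    """Custom behavior in a few lines: remember the loss of every batch."""
    def before_fit(self, state):
        state["losses"] = []

    def after_batch(self, state):
        state["losses"].append(state["loss"])


def fit(batches, step_fn, callbacks=()):
    """Generic loop: the loop never changes, callbacks add the behavior."""
    state = {}
    for cb in callbacks:
        cb.before_fit(state)
    for batch in batches:
        state["loss"] = step_fn(batch)
        for cb in callbacks:
            cb.after_batch(state)
    for cb in callbacks:
        cb.after_fit(state)
    return state


# Toy "training step": the loss is just the mean of the batch values.
state = fit(
    [[1.0, 3.0], [2.0, 2.0], [0.0, 1.0]],
    step_fn=lambda batch: sum(batch) / len(batch),
    callbacks=[LossRecorder()],
)
print(state["losses"])
```

In fastai itself the hooks and the loop are provided by the library, so only the small subclass would need to be written.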

Finally, when training your model, fastai is at the forefront of constantly bringing in best practices for achieving state-of-the-art training, with new research methodologies heavily tested before integration. As such, AdaptNLP fully supports training with the One-Cycle policy and using new optimizer combinations such as the Ranger optimizer with Cosine Annealing through simple one-line fitting functions (fit_one_cycle and fit_flat_cos).
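The shape of the two learning-rate schedules named above can be sketched in plain Python. This is an illustration of the idea, not fastai's code: the function names are made up here, and the default values (pct_start, div, div_final) are assumptions chosen to resemble commonly cited defaults.

```python
import math


def cos_interp(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2


def one_cycle_lr(pct, lr_max, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pct in [0, 1] under a one-cycle policy."""
    if pct < pct_start:
        # Warm-up phase: rise from lr_max / div up to lr_max.
        return cos_interp(lr_max / div, lr_max, pct / pct_start)
    # Annealing phase: fall from lr_max down to lr_max / div_final.
    return cos_interp(lr_max, lr_max / div_final,
                      (pct - pct_start) / (1 - pct_start))


def flat_cos_lr(pct, lr, pct_start=0.75, div_final=1e5):
    """Flat learning rate, then cosine annealing for the last part of training."""
    if pct < pct_start:
        return lr
    return cos_interp(lr, lr / div_final, (pct - pct_start) / (1 - pct_start))


# Print both schedules at a few points of training progress.
for pct in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(pct, one_cycle_lr(pct, 1e-3), flat_cos_lr(pct, 1e-3))
```

One-cycle warms up and then decays, while flat-cos holds the rate constant before annealing; the latter is the pairing usually recommended with Ranger.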

Installation Directions

PyPI

To install from PyPI, use:

pip install adaptnlp

Or if you have pip3:

pip3 install adaptnlp

Conda (Coming Soon)

Developmental Builds

To install a developmental build, follow the directions below to install directly from git:

Stable Master Branch

The master branch is generally not updated much except for hotfixes and new releases. To install it, use:

pip install git+https://github.com/Novetta/adaptnlp

Developmental Branch

Note: Generally this branch can become unstable, and it is only recommended for contributors or those who really want to test out new technology. Please check that the latest tests are passing (a green checkmark on the commit message) before trying this branch out.

You can install the developmental builds with:

pip install git+https://github.com/Novetta/adaptnlp@dev

Docker Images

There are actively updated Docker images hosted on Novetta's DockerHub

The guide to each tag is as follows:

  • latest: This is the latest pypi release and installs a complete package that is CUDA capable
  • dev: These are occasionally built developmental builds at certain stages. They are built by the dev branch and are generally stable
  • *api: The API builds are for the REST-API

To pull and run any AdaptNLP image immediately, you can run:

docker run -itp 8888:8888 novetta/adaptnlp:TAG

Replace TAG with any of the tags listed above.

Afterwards, check localhost:8888 or localhost:8888/lab to access the notebook containers.

Navigating the Documentation

The AdaptNLP library is built with nbdev, so any documentation page you find (including this one!) can be directly run as a Jupyter Notebook. Each page at the top includes an "Open in Colab" button as well that will open the notebook in Google Colaboratory to allow for immediate access to the code.

The documentation is split into six sections, each with a specific purpose:

Getting Started

This group contains quick access to the homepage, an explanation of what the AdaptNLP Cookbooks are, and how to contribute.

Models and Model Hubs

These contain any relevant documentation for the AdaptiveModel class, the HuggingFace Hub model search integration, and the Result class that various inference APIs return.

Class API

This section contains the module documentation for the inference framework, the tuning framework, as well as the utilities and foundations for the AdaptNLP library.

Inference and Training Cookbooks

These two sections provide quick access to single-use recipes for starting any AdaptNLP project for a particular task, with easy-to-use code designed for that specific use case. There are currently over 13 different tutorials available, with more coming soon.

NLP Services with FastAPI

This section provides directions on how to use the AdaptNLP REST API for deploying your models quickly with FastAPI.

Contributing

There is a contribution guide available here

Testing

AdaptNLP is built on the nbdev framework. To run all tests, do the following:

  1. pip install nbverbose
  2. git clone https://github.com/Novetta/adaptnlp
  3. cd adaptnlp
  4. pip install -e .
  5. nbdev_test_nbs

This will run every notebook and ensure that all tests have passed. Please see the nbdev documentation for more information about it.

Contact

Please contact Zachary Mueller at [email protected] with questions or comments regarding AdaptNLP.

Follow us on Twitter at @TheZachMueller and @AdaptNLP for updates and NLP dialogue.

License

This project is licensed under the terms of the Apache 2.0 license.

Comments
  • multi-label classification / paperswithcode dataset

    Hi guys,

    Hope you are all well !

    I was wondering if adaptnlp can handle multi-label classification with 1560 labels.

    More precisely, I would like to apply it to paperswithcode dataset where labels are called tasks.

    Thanks for any insights or inputs on that.

    Cheers, X

    opened by ghost 7
  • cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler'

    Describe the bug: Your demo Colab notebook "Custom Fine-Tuning and Training with Transformer Models" doesn't work and raises the import error quoted in the title.

    bug 
    opened by lematmat 5
  • Significant slowdown in EasyTokenTagger release 0.2.0

    I'm experiencing a slowdown in NER performance using EasyTokenTagger and 'ner-ontonotes' after updating to release 0.2.0. Have there been any underlying changes to how the tagger object works?

    Specifically, I am dealing with a very large chunk of text. Prior to this release, the NER tagging took around 15 seconds for this particular text. Now, it's taking 15+ minutes the first time but subsequent calls on that text are very quick. Is there some sort of caching or indexing that's being done now? I'd imagine this could create a lot of overhead for large chunks of text.

    opened by mkongsiri-Novetta 5
  • Can't load big dataset

    Describe the bug: It happens when I call learning_rate = finetuner.find_learning_rate(**learning_rate_finder_configs) in the tutorial. I have a big dataset with 200k rows, each containing a text of around 200 words.

    In your code, when you instantiate the TextDataset, the line tokenized_text = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text)) takes an eternity for a text of 20 million words. Do you think this could be done in a better or faster way, for example by keeping the rows as they are?

    For the record:

    Time for 100 characters: 0.0003399848937988281s
    Time for 1000 characters: 0.00124359130859375s
    Time for 10 000 characters: 0.012135982513427734s
    Time for 100 000 characters: 0.2131056785583496s
    Time for 1 000 000 characters: 8.782422542572021s
    Time for 10 000 000 characters: 734.5397665500641s

    Can't reach the end of the full TextDataset (109 610 928 characters).
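The "keep the rows as they are" idea from this report can be sketched as follows. This is only an illustration under stated assumptions: toy_tokenize is a hypothetical stand-in for the real tokenizer.tokenize and convert_tokens_to_ids pair, and nothing in the report confirms that TextDataset can consume per-row ids.

```python
def toy_tokenize(text):
    """Hypothetical stand-in tokenizer: lowercase whitespace split."""
    return text.lower().split()


def build_vocab(rows):
    """Assign an integer id to every distinct token, in order of first use."""
    vocab = {}
    for row in rows:
        for token in toy_tokenize(row):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_rows(rows, vocab):
    """Yield one list of ids per row, so no call ever sees the whole corpus."""
    for row in rows:
        yield [vocab[token] for token in toy_tokenize(row)]


rows = ["The cat sat", "the dog sat down"]
vocab = build_vocab(rows)

# Row-by-row encoding: cost grows with the number of rows, not with one huge string.
per_row = list(encode_rows(rows, vocab))

# Encoding the concatenated text gives the same ids for this toy tokenizer.
whole = [vocab[token] for token in toy_tokenize(" ".join(rows))]
flat = [token_id for ids in per_row for token_id in ids]
print(per_row, flat == whole)
```

With a real subword tokenizer the per-row and whole-text results may differ at row boundaries, so the equality shown here holds only for the toy tokenizer.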

    To Reproduce: Run the tutorial with a big dataset.

    opened by NicoLivesey 5
  • AttributeError: 'CamembertForMaskedLM' object has no attribute 'cls'

    Describe the bug: Trying to freeze an LMFineTuner based on Camembert weights, I get:


    AttributeError                            Traceback (most recent call last)
    in
          6 }
          7 finetuner = LMFineTuner(**ft_configs)
    ----> 8 finetuner.freeze()

    ~/anaconda3/envs/pe_adaptnlp/lib/python3.8/site-packages/adaptnlp/transformers/finetuning.py in freeze(self)
       1630         """Freeze last classification layer group only
       1631         """
    -> 1632         layers_len = len(list(self.model.cls.parameters()))
       1633         self.freeze_to(-layers_len)
       1634

    ~/anaconda3/envs/pe_adaptnlp/lib/python3.8/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
        573             if name in modules:
        574                 return modules[name]
    --> 575         raise AttributeError("'{}' object has no attribute '{}'".format(
        576             type(self).__name__, name))
        577

    AttributeError: 'CamembertForMaskedLM' object has no attribute 'cls'

    To Reproduce

    from adaptnlp import LMFineTuner
    train_file = "path/to/train" 
    valid_file = "path/to/valid"
    ft_configs = {
                  "train_data_file": train_file,
                  "eval_data_file": valid_file,
                  "model_type": "camembert",
                  "model_name_or_path": "camembert-base",
                 }
    finetuner = LMFineTuner(**ft_configs)
    finetuner.freeze()
    

    Expected behavior No error

    Desktop (please complete the following information):

    • OS: Amazon Linux
    • Browser Chrome
    opened by NicoLivesey 4
  • AttributeError: 'EasyDocumentEmbeddings' object has no attribute 'rnn_embeddings'

    Cannot use pool option to generate embeddings (instead of the default rnn).

    A snippet for the problem:

    from adaptnlp import EasyDocumentEmbeddings

    embedding_type = 'albert-xxlarge-v2'
    embedding_methods = ["pool"]
    doc_embeddings = EasyDocumentEmbeddings(embedding_type, methods=embedding_methods)
    
    

    This is the error I get:

      File "env/lib/python3.7/site-packages/adaptnlp/training.py", line 91, in __init__
       self._initial_setup(self.label_dict, **kwargs)
     File "env/lib/python3.7/site-packages/adaptnlp/training.py", line 97, in _initial_setup
       document_embeddings: DocumentRNNEmbeddings = self.encoder.rnn_embeddings
    AttributeError: 'EasyDocumentEmbeddings' object has no attribute 'rnn_embeddings'
    

    Expected behavior would be to successfully obtain an easy document embeddings object with no errors

    Running on debian buster, python3.7

    If someone could give me a fix or a workaround or if I'm using this incorrectly, then please let me know

    opened by blerstpub 3
  • EasySequenceClassifier tag_text function returns None for FlairSequenceClassifier model

    Hi! I tried to follow the tutorial for training a custom sequence classifier: https://novetta.github.io/adaptnlp/tutorial/training-sequence-classification.html

    The last step returns None instead of the expected labeled sentences:

    sentences = classifier.tag_text(example_text, model_name_or_path=OUTPUT_DIR)

    To Reproduce the behavior:

    from adaptnlp import EasySequenceClassifier
    from flair.data import Sentence
    
    OUTPUT_DIR = "…/best-model.pt"    # my custom model
    classifier = EasySequenceClassifier()
    
    ex_text = "This is a good text example"
    example_text=[Sentence(ex_text)]
    
    sentences = classifier.tag_text(text=example_text, model_name_or_path=OUTPUT_DIR, mini_batch_size=1)
    print("Label output:\n")
    print(sentences)
    

    Returns

    2020-12-28 17:44:31,111 loading file .../best-model.pt
    Label output:
    
    None
    

    Surprisingly, labels got added to example_text:

    print(example_text)

    Returns

    [Sentence: " This is a good text example " [− Tokens: 17 − Sentence-Labels: {'label': [0 (0.8812)]}]]

    Proposed explanation / contribution: I think I know the reason for the unexpected behavior and will be happy to help. classifier.tag_text creates a FlairSequenceClassifier. FlairSequenceClassifier instantiates a flair.models.TextClassifier and calls its predict method within its own predict method. But the flair.models.TextClassifier predict method returns None, because the labels are added directly to the sentences. I can rewrite the FlairSequenceClassifier predict method to return the Sentences with labels instead of None.
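The proposed fix can be sketched without Flair installed. Every class here is a hypothetical stand-in: ToyInPlaceClassifier only mimics the behavior described above (labels attached in place, None returned), and ReturningClassifier shows the shape of the wrapper. It is not the actual AdaptNLP or Flair code.

```python
class ToySentence:
    """Hypothetical stand-in for a sentence object that carries labels."""
    def __init__(self, text):
        self.text = text
        self.labels = []


class ToyInPlaceClassifier:
    """Mimics the described behavior: predict() mutates its input, returns None."""
    def predict(self, sentences):
        for sentence in sentences:
            label = "positive" if "good" in sentence.text else "negative"
            sentence.labels.append(label)
        # No return statement, so the caller receives None.


class ReturningClassifier:
    """Wrapper that hands the labeled sentences back to the caller."""
    def __init__(self, inner):
        self.inner = inner

    def predict(self, sentences):
        self.inner.predict(sentences)  # labels are attached in place
        return sentences               # return the same, now labeled, objects


sentences = [ToySentence("This is a good text example")]
raw = ToyInPlaceClassifier().predict([ToySentence("another example")])
labeled = ReturningClassifier(ToyInPlaceClassifier()).predict(sentences)
print(raw, labeled[0].labels)
```

Returning the same list keeps the in-place behavior intact for callers that already rely on it.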

    opened by DinaraN 3
  • Sequence classification using REST API fails with models except en-sentiment

    Sequence classification over REST API using any model except for en-sentiment fails with:

    File "/usr/local/lib/python3.6/dist-packages/starlette/routing.py", line 41, in app
      response = await func(request)
    File "/usr/local/lib/python3.6/dist-packages/fastapi/routing.py", line 197, in app
      dependant=dependant, values=values, is_coroutine=is_coroutine
    File "/usr/local/lib/python3.6/dist-packages/fastapi/routing.py", line 147, in run_endpoint_function
      return await dependant.call(**values)
    File "./app/main.py", line 87, in sequence_classifier
      text=text, mini_batch_size=1, model_name_or_path=_SEQUENCE_CLASSIFICATION_MODEL
    File "/adaptnlp/adaptnlp/sequence_classification.py", line 285, in tag_text
      return classifier.predict(text=text, mini_batch_size=mini_batch_size, **kwargs,)
    File "/adaptnlp/adaptnlp/sequence_classification.py", line 140, in predict
      text_sent.add_label(label)
    TypeError: add_label() missing 1 required positional argument: 'value'

    Reproducible with:

    docker run -itp 5000:5000 -e TOKEN_TAGGING_MODE='ner' \
      -e TOKEN_TAGGING_MODEL='ner-ontonotes-fast' \
      -e SEQUENCE_CLASSIFICATION_MODEL='nlptown/bert-base-multilingual-uncased-sentiment' \
      achangnovetta/adaptnlp-rest:latest \
      bash

    opened by VogtAI 3
  • AdaptNLP v0.2.x Additional Features Discussion

    There are a lot of ideas that may be floating for feature implementations, so this thread just provides a mini roadmap and environment to think about adaptnlp's progression.

    Ideas can be stated freely in this thread and do not replace feature-request issue posts.

    • [x] Tokenizer: Start integrating tokenizers all across adaptnlp for speed and performance enhancements for training and inference.
    • [x] Summarization: Add the NLP task of summarization using a document-level encoder based on transformer language models.
    • [x] GPU: Multi-GPU and mixed-precision is prevalent in AdaptNLP, but its implementation can be improved and debugged.
    • ~~FastAPI Batch-Serving: Improve on the concurrent calls with batch processing from the NLP models (maybe try to make it CPU and GPU agnostic for ease-of-use)~~
    • ~~Model Downloading: Start structuring a way to download and potentially upload pre-trained NLP-task models~~
    enhancement 
    opened by aychang95 3
  • Data API

    We probably should have a data API of some form, that ties into https://github.com/Novetta/adaptnlp/issues/128

    Ideally it should simply prep a dataset for tokenization of a model, or tokenize the data itself.

    For now we cover two inputs:

    1. Individual texts
    2. CSV

    We should support something akin to fastai's get_y, but with decent defaults so that customization is available, but not needed.

    Ideally something like:

    dset = TaskDataset.from_df(
      df,  # Can be fname or dataframe
      get_x = ColReader('text'),
      get_y = ColReader('label'),
      splitter = RandomSplitter(),
      model = 'bert-base-uncased', # The name/type of downstream model
      task = "ner" # Or use a `Task.NER` namespace class
    )
    

    And further:

    dset.dataloaders(bs=8, collate_fn=data_collator)
    

    It reads very similarly to the fastai API, but we do not use the fastai API itself, because for text this approach is a bit easier.

    The highest level API would look like so:

    dls = TaskDataLoaders.from_df(df, 'text', 'label', model='bert-base-uncased')
    

    We should note the model used, and when integrating it with the tuning API if something is off with the model entered, we make note of that

    enhancement 
    opened by muellerzr 2
  • ImportError: cannot import name 'EasyTokenTagger'

    Describe the bug: I tried to run the code in the tutorial:

    from adaptnlp import EasyTokenTagger
    
    
    ## Example Text
    example_text = "Novetta's headquarters is located in Mclean, Virginia."
    
    ## Load the token tagger module and tag text with the NER model 
    tagger = EasyTokenTagger()
    sentences = tagger.tag_text(text=example_text, model_name_or_path="ner")
    
    ## Output tagged token span results in Flair's Sentence object model
    for sentence in sentences:
        for entity in sentence.get_spans("ner"):
            print(entity)
    

    and it gave me the error:

    ...
      File "/home/rajiv/Documents/dev/python/nltk-trial/adaptnlp.py", line 2, in <module>
        from adaptnlp import EasyTokenTagger
    ImportError: cannot import name 'EasyTokenTagger'
    

    Desktop (please complete the following information):

    • OS: Ubuntu
    • Version: 20.04
    • Python: 3.6.9
    opened by RAbraham 2
  • classifier.tag_text on GPU!

    Hi, I want to classify texts:

    classifier = EasySequenceClassifier()
    hub = HFModelHub()
    hub.search_model_by_task('text-classification')
    model = hub.search_model_by_name('nlptown/bert-base', user_uploaded=True)[0]
    sentence = classifier.tag_text(text=inputs, model_name_or_path=model, mini_batch_size=1)

    Q1: How can I force it to run on the CPU?

    Q2: I now have a GPU, but I can't get it to run. My errors:

    ...
    FileNotFoundError: [Errno 2] No such file or directory: 'nlptown/bert-base-multilingual-uncased-sentiment'
    During handling of the above exception, another exception occurred:
    ...
    RuntimeError: CUDA error: out of memory
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    opened by topliftarm 0
  • Unified Training API

    Training API will use fastai under the hood, and we'll make a few functions to build general datasets.

    Tasks and sample datasets to use:

    Other Information

    Task APIs should have a simple user interface, i.e. the high level can only input specific options, while the mid level has access to the full fastai Learner params.

    Example mid-level API I'm thinking about:

    dls = some_build_data_thing()
    tuner = QAFineTuner(dls, 'bert-base-cased')
    tuner.tune(
      scheduler = 'fit_flat_cos',
      n_epochs = 3,
      lr = None,
      suggest_method = 'valley', # Triggers if lr is None
      additional_callbacks = []
    )
    

    And its high-level:

    tuner = QAFineTuner.from_csv(
      question_column_name = "question",
      answer_column_name = "answer",
      model = "bert-base-cased"
    )
    tuner.tune(...)
    

    We should automatically pull in proper metrics for each task, but users have the option to bring in their own as well and pass it to QAFineTuner (good defaults)

    Tuners should also have a func like QAFineTuner.from_csv() to build the dataset in-house

    enhancement 
    opened by muellerzr 2
  • Save context in QuestionAnswering and re-use it

    I noticed that when we run any code snippet, it converts the text to vectors or something similar. For example, in this code snippet:

    from adaptnlp import EasyQuestionAnswering 
    from pprint import pprint
    
    ## Example Query and Context 
    query = "What is the meaning of life?"
    context = "Machine Learning is the meaning of life."
    top_n = 5
    
    ## Load the QA module and run inference on results 
    qa = EasyQuestionAnswering()
    best_answer, best_n_answers = qa.predict_qa(query=query, context=context, n_best_size=top_n, mini_batch_size=1, model_name_or_path="distilbert-base-uncased-distilled-squad")
    
    ## Output top answer as well as top 5 answers
    print(best_answer)
    pprint(best_n_answers)
    

    It converts both the query and the context to vectors first. If we have a very long context and a lot of queries, the context is converted to vectors every time. I think there should be a way to save the context vectors and re-use them instead of creating them again and again.
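The caching idea in this request can be sketched with the standard library. This is a toy under stated assumptions: encode_context and answer are hypothetical names, the "encoding" is a plain token split, and the "QA" step is a placeholder. Many extractive QA models encode the question and context jointly, so whether the context can be encoded once and reused with the models AdaptNLP wraps is not established here.

```python
from functools import lru_cache

encode_calls = 0  # counts how often the expensive step actually runs


@lru_cache(maxsize=32)
def encode_context(context):
    """Hypothetical expensive step: build a reusable context representation."""
    global encode_calls
    encode_calls += 1
    return tuple(context.lower().rstrip(".").split())


def answer(query, context):
    """Toy 'QA': return the context tokens that do not appear in the query."""
    tokens = encode_context(context)  # served from the cache after the first call
    query_tokens = set(query.lower().rstrip("?").split())
    return [token for token in tokens if token not in query_tokens]


context = "Machine Learning is the meaning of life."
first = answer("What is the meaning of life?", context)
second = answer("What is machine learning?", context)
print(first, second, encode_calls)
```

Because lru_cache keys on the context string, a second query against the same context skips the encoding step entirely.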

    enhancement 
    opened by talhaanwarch 1
  • Stretch Goals

    • [x] HuggingFace raw embeddings over Flair
    • [x] Try and integrate Callbacks for text generation and other classes that aren't using it

      Note: Didn't do this for text generation, more complex than it's worth

    • [x] Use fastrelease (with conda)
    • [x] Improve test coverage
    • [x] GH CI for testing Mac, Windows, and Linux, similar to how fastai has it setup
    • [x] nbdev?
    • [x] Windows support
    • [x] Use Pipeline for inference

      Note: Pipeline is slower on many tasks that AdaptNLP covers, tests are in place to ensure that this is always true

    • [ ] 1.0.0: Unified training framework for at least 4 NLP tasks
    enhancement 
    opened by muellerzr 0
Releases(v0.3.7)
  • v0.3.7(Nov 10, 2021)

  • v0.3.6(Nov 9, 2021)

  • v0.3.3(Sep 3, 2021)

    Bugs Squashed

    • Embeddings were conjoined rather than separated out by word
    • Question Answering Results would only return the first instance, rather than top n instances
    • AdaptiveTuner can accept a label_names parameter for where the labels in a batch are present
  • v0.3.0(Aug 11, 2021)

    New Features

    • A new Data API that integrates with HuggingFace's Dataset class

    • A new Tuner API for training and fine-tuning Transformer models

    • Full integration of the latest fastai library for full access to state-of-the-art practices when training and fine-tuning a model. As improvements are made to the library, AdaptNLP will update to accommodate them

    • A new Result API that most inference modules return. This is a filterable result ensuring that you only get the most relevant information when returning a prediction from the Easy* modules

    Breaking Changes

    • The train and eval capabilities in the Easy* modules no longer exist, and all training related functionalities have migrated to the Tuner API
    • LanguageModelFineTuner no longer exists, and the same tuning functionality is in LanguageModelTuner

    Bugs Squashed

    • max_len Attribute Error (127)
    • Integrate a complete Data API (milestone) (129)
    • Use the latest fastcore (132)
    • Fix unused kwarg arguments in text generation (134)
    • Fix name 'df' is not defined (135)
  • 0.2.3(May 5, 2021)

    Breaking Changes:

    • New versions of AdaptNLP will require a minimum torch version of 1.7, and flair of 0.9 (currently we install via git until 0.9/0.81 is released)

    New Features

    Bugs Squashed

    • Fix accessing bart-large-cnn (110)
    • Fix SAVE_STATE_WARNING (114)
  • v0.2.2(Jan 11, 2021)

    Official AdaptNLP Docker Images updated

    • Using NVIDIA NGC Container Registry Cuda base images #101
    • All images should be deployable via Kubeflow Jupyter Servers
    • Cleaner python virtualenv setup #101
    • Official readme can be found at https://github.com/Novetta/adaptnlp/blob/master/docker/README.md

    Minor Bug Fixes

    • Fix token tagging REST application type check #92
    • Semantic fixes in readme #94
    • Standalone microservice REST application images #93
    • Python 3.7+ is now an official requirement #97
  • v0.2.1(Sep 17, 2020)

    Updated to nlp 0.4 -> datasets 1.0+ and multi-label training for sequence classification fixes.

    EasySequenceClassifier.train() Updates

    • Integrates datasets.Dataset now
    • Swapped order of formatting and label column renaming due to labels not showing up from torch data batches #87

    Tutorials and Documentation

    • Documentation and sequence classification tutorials have been updated to address nlp->datasets name change
    • Broken links also updated

    ODSC Europe Workshop 2020: Notebooks and Colab

    • ODSC Europe 2020 workshop materials now available in repository "/tutorials/Workshop"
    • Easy to run notebooks and colab links aligned with the tutorials are available
  • v0.2.0(Sep 1, 2020)

    Updated to transformers 3+, nlp 0.4+, flair 0.6+, pandas 1+

    New Features!

    New and "easier" training framework with easy modules: EasySequenceClassifier.train() and EasySequenceClassifier.evaluate()

    • Integrates nlp.Dataset and transformers.Trainer for a streamlined training workflow
    • Tutorials, notebooks, and colab links available
    • Sequence Classification task has been implemented, other NLP tasks are in the works
    • SequenceClassifierTrainer is still available, but will be transitioned into the EasySequenceClassifier and deprecated

    New and "easier" LMFineTuner

    • Integrates transformers.Trainer for a streamlined training workflow
    • Older LMFineTuner is still available as LMFineTunerManual, but will be deprecated in later releases
    • Tutorials, notebooks, and colab links available

    EasyTextGenerator

    • New module for text generation. GPT models are currently supported, other models may work but still experimental
    • Tutorials, notebooks, and colab links available

    Tutorials and Documentation

    • Documentation has been edited and updated to include additional features like the change in training frameworks and fine-tuning
    • The sequence classification tutorial is a good indicator of the direction we are going with the training and fine-tuning framework

    Notebooks and Colab

    • Easy to run notebooks and colab links aligned with the tutorials are available

    Bug fixes

    • Minor bug and implementation error fixes from flair upgrades
  • v0.1.6(May 1, 2020)

  • v0.1.5(Apr 17, 2020)

    Updated to Transformers 2.8.0 which now includes the ELECTRA language model

    EasySummarizer and EasyTranslator Bug Fix #63

    • Address mini batch output format issue for language model heads for the summarization and translation task

    Tutorials and Workshop #64

    • Add the ODSC Timeline Generator notebooks along with colab links
    • Small touch ups in tutorial notebooks

    Documentation

    • Address missing model_name_or_path param in some easy modules
  • v0.1.4(Apr 2, 2020)

    Updated to Transformers 2.7.0 which includes the Bart and T5 Language Models!

    EasySummarizer #47

    • New module for summarizing documents. These support both the T5 and Bart pre-trained models provided by Hugging Face.
    • Helper objects for the easy module that can be run as standalone instances TransformersSummarizer

    EasyTranslator #49

    • New module for translating documents with T5 pre-trained models provided by Hugging Face.
    • Helper objects for the easy module that can be run as standalone instances TransformersTranslator

    Documentation and Tutorials #52

    • New Class API documentation for EasySummarizer and EasyTranslator
    • New tutorial guides, initial notebooks, and links to colab for the above as well
    • Readme provides quickstart samples that show examples from the notebooks #53

    Other

    • Dockerhub repo for adaptnlp-rest added here https://hub.docker.com/r/achangnovetta/adaptnlp-rest
    • Upgraded CircleCI allowing us to run #40
    • Added Nightly build #39
  • v0.1.3(Mar 6, 2020)

    Sequence Classification and Question Answering updates to integrate Hugging Face's public models.

    EasySequenceClassifier

    • Can now take Flair and Transformers pre-trained sequence classification models as input in the model_name_or_path param
    • Helper objects for the easy module that can be run as standalone instances TransformersSequenceClassifier FlairSequenceClassifier

    EasyQuestionAnswering

    • Can now take Transformers pre-trained question answering models as input in the model_name_or_path param
    • Helper objects for the easy module that can be run as standalone instances TransformersQuestionAnswering

    Documentation and Tutorials

    Documentation has been updated with the above implementations

    • Tutorials updated with better examples to convey changes
    • Class API docs updated
    • Tutorial notebooks updated
    • Colab notebooks better displayed on readme

    FastAPI Rest

    FastAPI updated to latest (0.52.0). FastAPI endpoints can now be stood up and deployed with any huggingface sequence classification or question answering model specified as an env var arg.

    Dependencies

    Transformers pinned for stable updates

  • v0.1.2(Feb 19, 2020)

    AdaptNLP's first published release on github.

    Easy API:

    • EasyTokenTagger
    • EasySequenceClassifier
    • EasyWordEmbeddings
    • EasyStackedEmbeddings
    • EasyDocumentEmbeddings

    Training and Fine-tuning Interface

    • SequenceClassifierTrainer
    • LMFineTuner

    FastAPI AdaptNLP App for Streamlined Rapid NLP-Model Deployment

    • adaptnlp/rest
    • configured to run any pretrained and custom trained flair/adaptnlp models
    • compatible with nvidia-docker for GPU use
    • AdaptNLP integration but loosely coupled

    Documentation

    • Documentation release with walk-through guides, tutorials, and Class API docs of the above
    • Built with mkdocs, material for mkdocs, and mkautodoc

    Tutorials

    • IPython/Colab Notebooks provided and updated to showcase AdaptNLP Modules

    Continuous Integration

    • CircleCI build and tests running successfully and minimally
    • Github workflow for pypi publishing added

    Formatting

    • Flake8 and Black adherence
CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

CPT This repository contains code and checkpoints for CPT. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Gener

fastNLP 342 Jan 05, 2023
In this repository we have tested 3 VQA models on the ImageCLEF-2019 dataset.

Med-VQA In this repository we have tested 3 VQA models on the ImageCLEF-2019 dataset. Two of these are made on top of Facebook AI Reasearch's Multi-Mo

Kshitij Ambilduke 8 Apr 14, 2022
Chinese named entity recognition (bert/roberta/macbert/bert_wwm with Keras)

Chinese named entity recognition (bert/roberta/macbert/bert_wwm with Keras)

2 Jul 05, 2022
[ICLR 2021 Spotlight] Pytorch implementation for "Long-tailed Recognition by Routing Diverse Distribution-Aware Experts."

RIDE: Long-tailed Recognition by Routing Diverse Distribution-Aware Experts. by Xudong Wang, Long Lian, Zhongqi Miao, Ziwei Liu and Stella X. Yu at UC

Xudong (Frank) Wang 205 Dec 16, 2022
SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering.

SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering. Contents Inst

0 Oct 21, 2021
Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

gpt-2-simple A simple Python package that wraps existing model fine-tuning and generation scripts for OpenAI's GPT-2 text generation model (specifical

Max Woolf 3.1k Jan 07, 2023
Transformer training code for sequential tasks

Sequential Transformer This is code for training Transformers on sequential tasks such as language modeling. Unlike the original Transformer archite

Meta Research 578 Dec 13, 2022
End-2-end speech synthesis with recurrent neural networks

Introduction New: Interactive demo using Google Colaboratory can be found here TTS-Cube is an end-2-end speech synthesis system that provides a full p

Tiberiu Boros 214 Dec 07, 2022
Demo programs for the Talking Head Anime from a Single Image 2: More Expressive project.

Demo Code for "Talking Head Anime from a Single Image 2: More Expressive" This repository contains demo programs for the Talking Head Anime

Pramook Khungurn 901 Jan 06, 2023
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides

ASYML 2.3k Jan 07, 2023
LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation

LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation Tasks | Datasets | LongLM | Baselines | Paper Introduction LOT is a ben

46 Dec 28, 2022
CATs: Semantic Correspondence with Transformers

CATs: Semantic Correspondence with Transformers For more information, check out the paper on [arXiv]. Training with different backbones and evaluation

74 Dec 10, 2021
Ray-based parallel data preprocessing for NLP and ML.

Wrangl Ray-based parallel data preprocessing for NLP and ML. pip install wrangl # for latest pip install git+https://github.com/vzhong/wrangl See exa

Victor Zhong 33 Dec 27, 2022
Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages

Coreferee Author: Richard Paul Hudson, Explosion AI 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 French 1.2.3 German 1.2

Explosion 70 Dec 12, 2022
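Medical entity extraction on the CMeEE dataset

医学实体抽取_GlobalPointer_torch (medical entity extraction) Introduction The idea comes from Su's GlobalPointer. The original version was implemented in Keras; the model structure follows existing PyTorch reimplementations (thanks!), and this torch version fully reproduces Su's original results. Dataset Chinese medical named entity dataset: apply here, it is simple. It contains nine categories of medical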

85 Dec 28, 2022
Persian Bert For Long-Range Sequences

ParsBigBird: Persian Bert For Long-Range Sequences The Bert and ParsBert algorithms can handle texts with token lengths of up to 512, however, many ta

Sajjad Ayoubi 63 Dec 14, 2022
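Universal End2End Training Platform, including pre-training, classification tasks, machine translation, etc.

Background Installation Quick Start (1) Pre-trained models (2) Machine translation (3) Text classification Advanced TenTrans 1. Multilingual machine translation 2. Cross-lingual pre-training Background TenTrans is a unified end-to-end multilingual, multi-task pre-training platform that supports multiple pre-training methods as well as sequence generation and natural language understanding tasks. Installation git clone git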

Tencent Minority-Mandarin Translation Team 42 Dec 20, 2022
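Chinese NewsTitle Generation Project by GPT2. A Chinese GPT-2 news title generation project with extremely detailed comments.

GPT2-NewsTitle A GPT-2 news title generation project with very detailed comments. Update 01.02.2021: Data was collected from the web; news datasets such as the Tsinghua and Sogou news data, along with some open-source summarization data, were organized and cleaned to build a fairly complete Chinese summarization dataset. Only simple rule-based cleaning was applied.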

logCong 785 Dec 29, 2022
Ecommerce product title recognition package

revizor This package solves the task of splitting a product title string into components, like type, brand, model and article (or SKU or product code or you

Bureaucratic Labs 16 Mar 03, 2022
Estimation of the CEFR complexity score of a given word, sentence or text.

NLP-Swedish … allows estimating the CEFR (Common European Framework of Reference) complexity score of a given word, sentence or text. CEFR scores come f

3 Apr 30, 2022