Image captioning

End-to-end image captioning with EfficientNet-b3 + LSTM with Attention

The model is a seq2seq model. The encoder uses a pretrained EfficientNet-b3 to extract image features. The decoder is an LSTM with Bahdanau attention.
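
To make the attention step concrete, here is a minimal sketch of Bahdanau-style (additive) attention in plain Python. It is not the repository's implementation: the function name, the matrix names (W_enc, W_dec, v) and the toy values are assumptions chosen for illustration, and a real model would use learned PyTorch tensors instead of lists.

```python
import math


def additive_attention(encoder_feats, decoder_state, W_enc, W_dec, v):
    """Additive attention over N encoder feature vectors.

    encoder_feats: list of N vectors of length F (e.g. image regions)
    decoder_state: current decoder hidden state, length H
    W_enc: A x F matrix, W_dec: A x H matrix, v: vector of length A
    Returns (weights, context): N attention weights and a length-F context.
    All names here are hypothetical, not taken from the repository.
    """
    def matvec(matrix, vector):
        return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

    proj_dec = matvec(W_dec, decoder_state)
    scores = []
    for feat in encoder_feats:
        proj_enc = matvec(W_enc, feat)
        # score = v^T tanh(W_enc * feat + W_dec * state)
        hidden = [math.tanh(a + b) for a, b in zip(proj_enc, proj_dec)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the encoder features.
    n_dims = len(encoder_feats[0])
    context = [
        sum(w * feat[j] for w, feat in zip(weights, encoder_feats))
        for j in range(n_dims)
    ]
    return weights, context


# Toy example: three "image regions" with two features each.
encoder_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5]
W_enc = [[1.0, 0.0], [0.0, 1.0]]
W_dec = [[1.0], [1.0]]
v = [1.0, 1.0]

weights, context = additive_attention(encoder_feats, decoder_state, W_enc, W_dec, v)
```

At each decoding step the LSTM would receive the context vector together with the embedding of the previous word, so the decoder can focus on different image regions for different words.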

Dataset

The dataset is available on Kaggle and contains 8,000 images, each paired with five different captions.
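
As a rough illustration of the image-to-captions pairing, the sketch below groups captions by image. It assumes the caption file is a CSV with an "image,caption" header row; that layout, the function name and the sample rows are assumptions, not something this README confirms.

```python
import csv
import io
from collections import defaultdict


def group_captions(lines):
    """Group captions by image file name.

    `lines` is any iterable of text lines, e.g. an open file.
    Assumes a CSV with the header row `image,caption` (hypothetical layout).
    """
    captions = defaultdict(list)
    for row in csv.DictReader(lines):
        captions[row["image"]].append(row["caption"].strip())
    return dict(captions)


# Made-up sample rows; real files would hold five captions per image.
sample = io.StringIO(
    "image,caption\n"
    "img_001.jpg,A dog runs across a field .\n"
    "img_001.jpg,\"A brown dog, mid-stride, on grass .\"\n"
    "img_002.jpg,Two children play on a beach .\n"
)
grouped = group_captions(sample)
```

With a real file, the same function could be called as group_captions(open(path, newline="", encoding="utf-8")), where path points at the caption file.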

Usage

Run in a terminal: python -m img_caption

Config

The user interface consists of a single file:

config.yaml - general configuration with data and model parameters

Default config.yaml:

data:
  path_to_data_folder: "data"
  caption_file_name: "captions.txt"
  images_folder_name: "Images"
  output_folder_name: "output"
  logging_file_name: "logging.txt"
  model_file_name: "model.pt"

batch_size: 32
num_worker: 1
gensim_model_name: "glove-wiki-gigaword-200"

model:
  embedding_dimension: 200
  decoder_hidden_dimension: 300
  learning_rate: 0.0001
  momentum: 0.9
  n_epochs: 50
  clip: 5
  fine_tune_encoder: false
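
The sketch below shows one way these settings might fit together once config.yaml has been parsed into a dictionary (for example with a YAML parser). The helper names are hypothetical. It assumes that the caption file and the images folder live inside path_to_data_folder and that the checkpoint is written inside output_folder_name; the pipeline may resolve paths differently. The check relies on the fact that the trailing number in the gensim model name is the vector size.

```python
from pathlib import Path

# Subset of the default config above, written as the dictionary a YAML
# parser would produce for those keys.
config = {
    "data": {
        "path_to_data_folder": "data",
        "caption_file_name": "captions.txt",
        "images_folder_name": "Images",
        "output_folder_name": "output",
        "model_file_name": "model.pt",
    },
    "gensim_model_name": "glove-wiki-gigaword-200",
    "model": {"embedding_dimension": 200},
}


def resolve_paths(cfg):
    """Build file paths from the `data` section (assumed layout)."""
    data = cfg["data"]
    data_dir = Path(data["path_to_data_folder"])
    return {
        "captions": data_dir / data["caption_file_name"],
        "images": data_dir / data["images_folder_name"],
        "checkpoint": Path(data["output_folder_name"]) / data["model_file_name"],
    }


def embedding_dim_from_name(model_name):
    """Return the trailing integer of a name like 'glove-wiki-gigaword-200'."""
    suffix = model_name.rsplit("-", 1)[-1]
    return int(suffix) if suffix.isdigit() else None


def check_embedding_dimension(cfg):
    """True if embedding_dimension matches the pretrained vector size."""
    expected = embedding_dim_from_name(cfg["gensim_model_name"])
    return expected is None or expected == cfg["model"]["embedding_dimension"]


paths = resolve_paths(config)
embedding_ok = check_embedding_dimension(config)
```

If gensim_model_name is switched to vectors of another size, embedding_dimension would need to change with it, which is what the last check is meant to catch.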

Output

After training the model, the pipeline produces the following file:

model.pt - checkpoint with:
- epoch - last epoch
- model_state_dict - model parameters
- optimizer_state_dict - the state of the optimizer
- train_history - training history from a model
- valid_history - validation history from a model
- best_valid_loss - the best validation loss
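
A checkpoint with these keys could be inspected along the following lines. This is a sketch only: the helper name is hypothetical, the demo dictionary contains made-up values, and it assumes train_history and valid_history are sequences with one entry per epoch. With PyTorch, the dictionary would typically come from torch.load(path, map_location="cpu").

```python
EXPECTED_KEYS = (
    "epoch",
    "model_state_dict",
    "optimizer_state_dict",
    "train_history",
    "valid_history",
    "best_valid_loss",
)


def summarize_checkpoint(checkpoint):
    """Validate the checkpoint keys listed above and return a short summary.

    `checkpoint` is a plain dict, e.g. the result of
    torch.load(path, map_location="cpu") for a file such as model.pt.
    """
    missing = [key for key in EXPECTED_KEYS if key not in checkpoint]
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")
    return {
        "epoch": checkpoint["epoch"],
        "best_valid_loss": checkpoint["best_valid_loss"],
        # Assumes the histories hold one entry per recorded epoch.
        "n_train_records": len(checkpoint["train_history"]),
        "n_valid_records": len(checkpoint["valid_history"]),
    }


# Stand-in for a loaded checkpoint; the values are invented for illustration.
demo_checkpoint = {
    "epoch": 3,
    "model_state_dict": {},
    "optimizer_state_dict": {},
    "train_history": [4.1, 3.6, 3.3],
    "valid_history": [4.0, 3.7, 3.5],
    "best_valid_loss": 3.5,
}
summary = summarize_checkpoint(demo_checkpoint)
```

To resume training or run inference, model_state_dict and optimizer_state_dict would be passed to the load_state_dict methods of a model and optimizer built with the same configuration.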
