Deep-Learning-Image-Captioning - Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Last update: Apr 06, 2022

Overview

Deep Learning - Image Captioning with Convolutional and Recurrent Neural Nets

========================================================================

Author: Jonathan Kuo
Python: 3.6.1
TensorFlow: 1.0.1 Keras: 2.0.4

Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Introduction

The Keras deep learning architecture of this project was inspired by Deep Visual-Semantic Alignments for Generating Image Descriptions by Andrej Karpathy and Fei-Fei Li.

Given input of a dataset of images and their sentence descriptions, define a Keras (TensorFlow backend) deep learning model that corresponds detected regions on image with description segments. This learning allows the model to output novel descriptions for test images.

Dataset

Microsoft Common Objects in Context (MSCOCO) is an image recognition, segmentation, and captioning dataset. Training data includes 123,000 images and caption pairs. Validation and testing data are both 5,000 images and caption pairs.

Architecture

VGG16 CNN architecture (loaded in Keras) with pre-trained weights on ImageNet are used as the CNN to detect objects in the image. Then, the last dense softmax 200-classification layer was removed in order to pass the 4096-D activations into into the RNN (LSTM). CNN weights are frozen and RNN weights are updated in backpropagation through time (BPTT). The CNN and LSTM is merged before passing into a second LSTM to predict the next word in the sequence. RMSprop is used as the optimizer to combat the vanishing gradient problem.

Demo

View the demo iPython notebook for the model training and prediction on the MSCOCO dataset.

Deep-Learning-Image-Captioning - Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Related tags

Overview

Deep Learning - Image Captioning with Convolutional and Recurrent Neural Nets

Introduction

Dataset

Architecture

Demo

Owner

Aircraft design optimization made fast through modern automatic differentiation

Codes for CyGen, the novel generative modeling framework proposed in "On the Generative Utility of Cyclic Conditionals" (NeurIPS-21)

Pytorch implementation of SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation

FCN (Fully Convolutional Network) is deep fully convolutional neural network architecture for semantic pixel-wise segmentation

Official Keras Implementation for UNet++ in IEEE Transactions on Medical Imaging and DLMIA 2018

Source code for CAST - Crisis Domain Adaptation Using Sequence-to-sequence Transformers (Accepted to ISCRAM 2021, CorePaper).

Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training

Example for AUAV 2022 with obstacle avoidance.

Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting

The audio-video synchronization of MKV Container Format is exploited to achieve data hiding

Codebase for Image Classification Research, written in PyTorch.

Python/Rust implementations and notes from Proofs Arguments and Zero Knowledge

Training Certifiably Robust Neural Networks with Efficient Local Lipschitz Bounds (Local-Lip)

Learning to Stylize Novel Views

Repositorio oficial del curso IIC2233 Programación Avanzada 🚀✨

A framework for Quantification written in Python

Easy and comprehensive assessment of predictive power, with support for neuroimaging features

Py4fi2nd - Jupyter Notebooks and code for Python for Finance (2nd ed., O'Reilly) by Yves Hilpisch.

Music Source Separation; Train & Eval & Inference piplines and pretrained models we used for 2021 ISMIR MDX Challenge.

Styled Augmented Translation