Project : Image Captioning
Description
In this project we combine Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to build a deep learning model that produces captions for an input image.
Image captioning requires that you create a complex deep learning model with two components: a CNN that transforms an input image into a set of features, and an RNN that turns those features into rich, descriptive language.
Files
- Notebook 0 : Explore the MS COCO dataset using the COCO API
- Notebook 1 : Load and pre-process data from the MS COCO dataset and design the CNN-RNN model for automatically generating image captions
- Notebook 2 : Training phase of the CNN-RNN model
- Notebook 3 : Use the previously trained model to generate captions for images in the test dataset
- data_loader.py : Custom PyTorch data loader combining the dataset and the sampler
- vocabulary.py : Vocabulary constructor built from the captions in the training dataset
- vocab.pkl : Pre-built vocabulary file, stored so the data loader can load it immediately
CNN Encoder
The encoder is based on a convolutional neural network that encodes an image into a compact representation.
The CNN encoder is a ResNet (Residual Network). This kind of network helps with the vanishing and exploding gradient problems. The main idea relies on skip (residual) connections, which take the activations from one layer and feed them directly to a layer much deeper in the network; with these, we can build ResNets that make it possible to train very deep networks. In this project I used the pre-trained ResNet-152 model which, among those available from PyTorch (https://pytorch.org/docs/master/torchvision/models.html), performs best on the ImageNet dataset.
It might seem unreasonable to choose the encoder architecture based on accuracy on a totally different dataset, but interestingly, in section 5.2 (Evaluation Procedures) of the paper Neural Image Caption Generation with Visual Attention (2015), the authors found that using more recent architectures such as GoogLeNet (Inception, winner of ILSVRC 2014) or Oxford VGG (3rd place, ILSVRC 2014) gives a boost in performance over AlexNet (winner of ILSVRC 2012). So the encoder architecture matters!
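As a minimal sketch (not necessarily the repository's exact code), the encoder described above might look like this in PyTorch; `embed_size` is a hyperparameter, and the final ImageNet classification layer of ResNet-152 is replaced with a trainable linear embedding:

```python
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """Encode an image into a fixed-size feature vector with a pre-trained ResNet-152."""

    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet152(pretrained=True)
        for param in resnet.parameters():
            param.requires_grad = False          # freeze the pre-trained weights
        modules = list(resnet.children())[:-1]   # drop the ImageNet classifier head
        self.resnet = nn.Sequential(*modules)
        # Project the 2048-d ResNet features down to the decoder's embedding size.
        self.embed = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        features = self.resnet(images)                   # (batch, 2048, 1, 1)
        features = features.view(features.size(0), -1)   # (batch, 2048)
        return self.embed(features)                      # (batch, embed_size)
```

Freezing the ResNet weights keeps training cheap: only the new embedding layer has to learn.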
RNN Decoder
The CNN encoder is followed by a recurrent neural network that generates a corresponding sentence.
The RNN decoder consists of a word embedding layer followed by an LSTM. This architecture was presented in the paper Show and Tell: A Neural Image Caption Generator (2014), https://arxiv.org/pdf/1411.4555.pdf (figure 3.1).
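A corresponding sketch of the decoder, assuming the standard Show-and-Tell setup in which the image features are fed to the LSTM as the first input, followed by the embedded caption words:

```python
import torch
import torch.nn as nn

class DecoderRNN(nn.Module):
    """Generate caption word scores from image features: embedding -> LSTM -> linear."""

    def __init__(self, embed_size, hidden_size, vocab_size, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Embed the caption tokens, dropping the last one: the decoder predicts
        # each next word, so the final word is never used as an input.
        embeddings = self.embed(captions[:, :-1])
        # Prepend the image features as the first "word" of the input sequence.
        inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)
        hiddens, _ = self.lstm(inputs)
        return self.linear(hiddens)   # (batch, seq_len, vocab_size)
```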
CNN-RNN model
Now that we have chosen architectures for the encoder and the decoder, we can look at the whole picture of our image captioning system!
By merging the CNN encoder and the RNN decoder, we get a model that can find patterns in images and then use that information to generate a description of those images. The input image is processed by the CNN, and the output of the CNN is connected to the input of the RNN, which allows us to generate descriptive text, as in the sketch below.
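Building on the two sketches above, a hypothetical training step might be wired up as follows; the hyperparameter values are illustrative, and the `images`/`captions` batch would come from the data loader in data_loader.py:

```python
import torch.nn as nn

# Illustrative hyperparameters; the real values live in the training notebook.
embed_size, hidden_size, vocab_size = 256, 512, 9000

encoder = EncoderCNN(embed_size)
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)
criterion = nn.CrossEntropyLoss()

# `images` and `captions` are one batch from the data loader in data_loader.py.
features = encoder(images)                    # (batch, embed_size)
outputs = decoder(features, captions)         # (batch, seq_len, vocab_size)
loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))
loss.backward()
```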
Image descriptions
Azure AI Vision can analyze an image and generate a human-readable phrase that describes its contents. The algorithm returns several descriptions based on different visual features, and each description is assigned a confidence score. The final output is a list of descriptions ordered from the highest confidence score to the lowest.
At this time, English is the only language supported for image descriptions.
You can try out the image captioning feature quickly and easily in your browser using Vision Studio.
Image description example
The following JSON response shows what the Analyze API returns when describing the example image based on its visual features.
< "description":< "tags":[ "outdoor", "city", "white" ], "captions":[ < "text":"a city with tall buildings", "confidence":0.48468858003616333 >] >, "requestId":"7e5e5cac-ef16-43ca-a0c4-02bd49d379e9", "metadata":< "height":300, "width":239, "format":"Png" >, "modelVersion":"2021-05-01" >
Use the API
The image description feature is part of the Image Analysis API. You can call this API through a native SDK or through REST calls. Include Description in the visualFeatures query parameter. Then, when you get the full JSON response, parse the string for the contents of the "description" section.
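As an illustration, a REST call with Python's requests library might look like the sketch below; the endpoint, key, and image URL are placeholders you would substitute with your own Azure resource values:

```python
import requests

# Placeholder endpoint, key, and image URL: substitute your own values.
endpoint = "https://<your-resource>.cognitiveservices.azure.com"
key = "<your-subscription-key>"

response = requests.post(
    endpoint + "/vision/v3.2/analyze",
    headers={"Ocp-Apim-Subscription-Key": key},
    params={"visualFeatures": "Description"},   # request the Description feature
    json={"url": "https://example.com/sample.png"},
)
response.raise_for_status()

# Parse the "description" section of the JSON response.
for caption in response.json()["description"]["captions"]:
    print(caption["text"], caption["confidence"])
```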
Sources used in preparing this material:
https://github.com/codehacpj/Image_description
https://learn.microsoft.com/ru-RU/azure/cognitive-services/Computer-vision/concept-describing-images