Project : Image Captioning
Description
In this project we combine Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to build a deep learning model that produces captions for an input image.
Image captioning requires that you create a complex deep learning model with two components: a CNN that transforms an input image into a set of features, and an RNN that turns those features into rich, descriptive language.
Files
- Notebook 0 : Explore the MS COCO dataset using the COCO API
- Notebook 1 : Load and pre-process data from the MS COCO dataset and design the CNN-RNN model for automatically generating image captions
- Notebook 2 : Training phase of the CNN-RNN model
- Notebook 3 : Use the previously trained model to generate captions for images in the test dataset
- data_loader.py : Custom PyTorch data loader combining the dataset and the sampler
- vocabulary.py : Vocabulary constructor built from the captions in the training dataset
- vocab.pkl : Pre-built vocabulary file, stored so the data loader can load it immediately
CNN Encoder
The encoder is based on a convolutional neural network that encodes an image into a compact representation.
The CNN encoder is a ResNet (Residual Network). This kind of network helps with the vanishing and exploding gradient problems. The main idea relies on skip (residual) connections, which take the activations from one layer and feed them directly to a layer much deeper in the network; with these, we can build ResNets that make it possible to train very deep networks. In this project I used the pre-trained ResNet-152 model which, among those available from PyTorch (https://pytorch.org/docs/master/torchvision/models.html), performs best on the ImageNet dataset.
It might seem unreasonable to choose the encoder architecture based on accuracy on a totally different dataset, but interestingly, in section 5.2 (Evaluation Procedures) of the paper Neural Image Caption Generation with Visual Attention (2015), the authors found that using more recent architectures such as GoogLeNet (Inception, winner of ILSVRC 2014) or Oxford VGG (3rd place, ILSVRC 2014) gives a boost in performance over AlexNet (winner of ILSVRC 2012). So the encoder architecture matters!
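As a minimal sketch (not necessarily the repository's exact code), the encoder described above might look like this in PyTorch; `embed_size` is a hyperparameter, and the final ImageNet classification layer of ResNet-152 is replaced with a trainable linear embedding:

```python
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """Encode an image into a fixed-size feature vector with a pre-trained ResNet-152."""

    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet152(pretrained=True)
        for param in resnet.parameters():
            param.requires_grad = False          # freeze the pre-trained weights
        modules = list(resnet.children())[:-1]   # drop the ImageNet classifier head
        self.resnet = nn.Sequential(*modules)
        # Project the 2048-d ResNet features down to the decoder's embedding size.
        self.embed = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        features = self.resnet(images)                   # (batch, 2048, 1, 1)
        features = features.view(features.size(0), -1)   # (batch, 2048)
        return self.embed(features)                      # (batch, embed_size)
```

Freezing the ResNet weights keeps training cheap: only the new embedding layer has to learn.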
RNN Decoder
The CNN encoder is followed by a recurrent neural network that generates a corresponding sentence.
The RNN decoder consists of a word embedding layer followed by an LSTM. This architecture was presented in the paper Show and Tell: A Neural Image Caption Generator (2014), https://arxiv.org/pdf/1411.4555.pdf (figure 3.1).
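A corresponding sketch of the decoder, assuming the standard Show-and-Tell setup in which the image features are fed to the LSTM as the first input, followed by the embedded caption words:

```python
import torch
import torch.nn as nn

class DecoderRNN(nn.Module):
    """Generate caption word scores from image features: embedding -> LSTM -> linear."""

    def __init__(self, embed_size, hidden_size, vocab_size, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Embed the caption tokens, dropping the last one: the decoder predicts
        # each next word, so the final word is never used as an input.
        embeddings = self.embed(captions[:, :-1])
        # Prepend the image features as the first "word" of the input sequence.
        inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)
        hiddens, _ = self.lstm(inputs)
        return self.linear(hiddens)   # (batch, seq_len, vocab_size)
```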
CNN-RNN model
Now that we have chosen architectures for the encoder and the decoder, we can look at the whole picture of our image captioning system!
By merging the CNN encoder and the RNN decoder, we get a model that can find patterns in images and then use that information to generate a description of those images. The input image is processed by the CNN, and the output of the CNN is connected to the input of the RNN, which allows us to generate descriptive text, as in the sketch below.
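Building on the two sketches above, a hypothetical training step might be wired up as follows; the hyperparameter values are illustrative, and the `images`/`captions` batch would come from the data loader in data_loader.py:

```python
import torch.nn as nn

# Illustrative hyperparameters; the real values live in the training notebook.
embed_size, hidden_size, vocab_size = 256, 512, 9000

encoder = EncoderCNN(embed_size)
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)
criterion = nn.CrossEntropyLoss()

# `images` and `captions` are one batch from the data loader in data_loader.py.
features = encoder(images)                    # (batch, embed_size)
outputs = decoder(features, captions)         # (batch, seq_len, vocab_size)
loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))
loss.backward()
```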
Image descriptions
Azure AI Vision can analyze an image and generate a human-readable phrase that describes its contents. The algorithm returns several descriptions based on different visual features, and each description is assigned a confidence score. The final output is a list of descriptions ordered from the highest confidence score to the lowest.
At this time, English is the only language supported for image descriptions.
You can try out the image captioning feature quickly and easily in your browser using Vision Studio.
Image description example
The following JSON response shows what the Analyze API returns when describing the example image based on its visual features.
< "description":< "tags":[ "outdoor", "city", "white" ], "captions":[ < "text":"a city with tall buildings", "confidence":0.48468858003616333 >] >, "requestId":"7e5e5cac-ef16-43ca-a0c4-02bd49d379e9", "metadata":< "height":300, "width":239, "format":"Png" >, "modelVersion":"2021-05-01" >
Use the API
The image description feature is part of the Image Analysis API. You can call this API through a native SDK or through REST calls. Include Description in the visualFeatures query parameter. Then, when you get the full JSON response, parse the string for the contents of the "description" section.
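As an illustration, a REST call with Python's requests library might look like the sketch below; the endpoint, key, and image URL are placeholders you would substitute with your own Azure resource values:

```python
import requests

# Placeholder endpoint, key, and image URL: substitute your own values.
endpoint = "https://<your-resource>.cognitiveservices.azure.com"
key = "<your-subscription-key>"

response = requests.post(
    endpoint + "/vision/v3.2/analyze",
    headers={"Ocp-Apim-Subscription-Key": key},
    params={"visualFeatures": "Description"},   # request the Description feature
    json={"url": "https://example.com/sample.png"},
)
response.raise_for_status()

# Parse the "description" section of the JSON response.
for caption in response.json()["description"]["captions"]:
    print(caption["text"], caption["confidence"])
```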
Sources used in preparing this material:
https://github.com/codehacpj/Image_description
https://learn.microsoft.com/ru-RU/azure/cognitive-services/Computer-vision/concept-describing-images