Image Captioning using Deep Learning
Image captioning is the process of generating a textual description of an image. It combines Natural Language Processing and Computer Vision, and it is an important computer vision problem with a multitude of applications. This thesis aims to replicate a well-known attention-based captioning paper and improve on it. Attention-based models are used to describe the content of the images, and the model is trained deterministically using standard back-propagation. We also show through visualization how the model uses attention layers to learn, without explicit supervision, to fix its gaze on salient objects while generating the corresponding words in the output sequence. Finally, the thesis experiments with several hyper-parameter settings and network architectures to improve upon the existing work.
Concepts:
- Convolutional Recurrent Neural Network (CRNN)
- Attention Networks
- Long Short-Term Memory (LSTM)
- Image Captioning
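
The deterministic (soft) attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the toy feature vectors, and the dot-product scoring are all assumptions chosen for brevity, and an actual model would typically score regions with a small learned network. The sketch only shows how per-region weights are computed with a softmax and then used to form a weighted context vector, which is what makes the whole operation differentiable and trainable with back-propagation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def soft_attention(features, hidden):
    """Return (weights, context) for one decoding step.

    features: list of L image-region vectors, each of dimension D
    hidden:   decoder state of dimension D (hypothetical; assumed to
              match the feature dimension so a dot product is valid)
    """
    # Assumption: dot-product relevance score per region.
    scores = [dot(f, hidden) for f in features]
    # Attention weights are non-negative and sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, context


# Toy example: three regions, two-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]
weights, context = soft_attention(features, hidden)
```

In this toy case the first region aligns best with the decoder state, so it receives the largest weight. Plotting such weights over the image grid at each generated word is one way to produce the kind of attention visualization described above.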