This PhD will focus on neural sequence-to-sequence models and their application to several modalities, mainly images and text, for tasks such as machine translation and image captioning.