Seminar Report on Multimodal Deep Learning | PDF

Download PDF Seminar Report on the topic Multimodal Deep Learning, submitted in partial fulfillment of the requirements for the award of Bachelor of Technology in Computer Science and Engineering. The seminar report focuses mainly on deep learning models for multimodal learning. The multimodal deep learning approaches discussed are all based on Restricted Boltzmann Machines (RBMs).

Seminar Report on Multimodal Deep Learning

Abstract of Seminar Report on Multimodal Deep Learning

Deep learning is an area of machine learning research that imitates the way the human brain works. It has a great number of successful applications in speech recognition, image classification, and natural language processing. It is a particular approach to building and training neural networks. A deep neural network consists of a hierarchy of layers, where each layer transforms the input data into progressively more abstract representations.
Deep networks have been successfully applied to unsupervised and supervised feature learning for single modalities such as text, images, or audio. With advances in technology, applications of deep networks that learn features over multiple modalities have emerged. This involves relating information from multiple sources. The relevance of multimodality has grown tremendously with the widespread use of social media and online advertising. Social media has become a convenient platform for voicing opinions, from posting messages to uploading media files, or any combination of the two. There are a number of methods that can be used for multimodal deep learning, but the most effective is the Deep Boltzmann Machine (DBM). The DBM is a fully generative model that can be used to extract features from data even when some modalities are missing. It is constructed by stacking one Gaussian RBM and one standard binary RBM. An RBM has three components: a visible layer, a hidden layer, and a weight matrix containing the weights of the connections between visible and hidden units. There are no connections between visible units or between hidden units; this restriction is what gives the model its name.
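The bipartite structure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the report's implementation: the layer sizes, random initialization, and function names are assumptions made for the example. The key point is that the only parameters coupling the two layers are the entries of the weight matrix W, so the conditional distribution of each layer factorizes given the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration: 6 visible units, 4 hidden units.
n_visible, n_hidden = 6, 4

# The three RBM components named above: a weight matrix W between the
# visible and hidden layers, plus one bias vector per layer. There are
# no visible-visible or hidden-hidden weights -- that is the "restriction".
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_visible = np.zeros(n_visible)
b_hidden = np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v):
    """P(h_j = 1 | v) depends only on the visible layer, so every hidden
    unit can be sampled independently, in parallel."""
    p = sigmoid(v @ W + b_hidden)
    return p, (rng.random(n_hidden) < p).astype(float)

def sample_visible(h):
    """Symmetrically, P(v_i = 1 | h) factorizes over the visible units."""
    p = sigmoid(h @ W.T + b_visible)
    return p, (rng.random(n_visible) < p).astype(float)

# One step of alternating Gibbs sampling: v -> h -> v'.
v0 = rng.integers(0, 2, size=n_visible).astype(float)
p_h, h0 = sample_hidden(v0)
p_v, v1 = sample_visible(h0)
```

Stacking such layers (e.g. a Gaussian RBM over raw inputs followed by a binary RBM over its hidden activations, as the report describes for the DBM) reuses exactly this conditional-sampling pattern at each level.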

Multimodal Deep Learning: Seminar Report PDF

This PDF seminar report presents a deep model for learning multimodal signals coupled with emotions and semantics. Multimodal sensing and processing have shown promising results in detection, recognition, and identification across applications such as human-computer interaction, surveillance, medical diagnosis, and biometrics. Multiple modalities can arise in several ways: through sensor diversity (especially in everyday tasks) or through feature diversity (using engineered and/or learned features). Over the last few decades, many machine learning models have been proposed to handle multimodal data. The report shows encouraging results when the learned deep features are applied to cross-modal retrieval, a task that is not feasible with hand-crafted features.
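Cross-modal retrieval works because the deep model maps both modalities into one shared feature space, where a text query can be compared directly against image features. The sketch below is a hedged illustration of that idea only: the feature arrays are synthetic stand-ins (real ones would come from the trained network), and all names and dimensions are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical learned features: 5 images and their 5 paired captions,
# embedded in the same 8-dimensional joint space. In practice these would
# be produced by the deep model; here the captions are simulated as noisy
# copies of their paired images to mimic a well-aligned embedding.
image_feats = rng.normal(size=(5, 8))
text_feats = image_feats + 0.05 * rng.normal(size=(5, 8))

def retrieve(query, gallery):
    """Rank gallery items by cosine similarity to the query vector."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

# Text-to-image retrieval: use caption 2 as the query against all images.
ranking = retrieve(text_feats[2], image_feats)
```

With hand-crafted features this comparison is generally not possible, because descriptors computed separately for text and images do not live in a common space.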

Download PDF Seminar Report on Multimodal Deep Learning