Martin Renqiang Min, NEC Laboratories America, Inc.
Deep learning models have achieved revolutionary successes in many real-world applications by leveraging huge amounts of training data to generate powerful representations. Rather than training standard end-to-end supervised models, learning adaptive models with explicit or implicit attention mechanisms, capable of understanding different input contexts, is one step closer to building self-aware machine reasoning systems. In this talk, I will discuss adaptive deep representation learning for understanding the interactions between video and text. First, I will describe how to generate dual adaptive spatiotemporal feature representations for translating videos into natural language descriptions. Then, I will discuss how to generate high-quality videos from text. Finally, I will present our recent work on analyzing why the attention mechanisms that play key roles in generating adaptive representations work in deep neural networks.