Cohere For AI - Guest Speaker: Muhammad Gohar Javed, Master's Student

Date: May 29, 2024

Time: 4:00 PM - 5:00 PM

Location: Online

About the speaker: Gohar is currently pursuing his Master's degree in Electrical and Computer Engineering at the University of Alberta, under the guidance of Dr. Li Cheng and Dr. Xingyu Li. His research focuses on applying Generative AI and Reinforcement Learning for 3D digital human motion control. With a solid background that includes two years of industry experience in Machine Learning and Computer Vision at Alta ML, Teradata and Hotpot.AI, Gohar has worked on projects involving Image Matting, Image Inpainting, and 3D Fault Detection. Prior to his Master's, he earned his Bachelor's degree in Electrical Engineering from the National University of Sciences and Technology, where he explored Neural Network Acceleration for Embedded Hardware.

About the session: We introduce MoMask, a novel masked modeling framework for text-driven 3D human motion generation. In MoMask, a hierarchical quantization scheme represents human motion as multi-layer discrete motion tokens with high-fidelity detail. Starting at the base layer with a sequence of motion tokens obtained by vector quantization, residual tokens of increasing order are derived and stored at the subsequent layers of the hierarchy. Two distinct bidirectional transformers then operate on these tokens. For the base-layer motion tokens, a Masked Transformer is trained to predict randomly masked motion tokens conditioned on text input. At the generation (i.e., inference) stage, starting from an empty sequence, the Masked Transformer iteratively fills in the missing tokens; a Residual Transformer then learns to progressively predict the next-layer tokens based on the results from the current layer. Extensive experiments demonstrate that MoMask outperforms state-of-the-art methods on the text-to-motion generation task, with an FID of 0.045 (vs., e.g., 0.141 for T2M-GPT) on the HumanML3D dataset and 0.228 (vs. 0.514) on KIT-ML. MoMask can also be applied seamlessly to related tasks, such as text-guided temporal inpainting, without further model fine-tuning. Abstract: https://arxiv.org/abs/2312.00063
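The residual (hierarchical) quantization idea described above can be sketched in plain Python: the base codebook quantizes the motion feature vector, and each deeper layer quantizes the residual left over by the layer above, so every layer contributes one token. This is only an illustrative sketch under assumed toy codebooks; the function and variable names are invented here and are not MoMask's actual implementation.

```python
import math

def nearest(codebook, v):
    # index of the codebook entry closest to v (Euclidean distance)
    return min(range(len(codebook)),
               key=lambda i: math.dist(codebook[i], v))

def residual_quantize(x, codebooks):
    """Encode vector x as one discrete token per layer.

    The base layer quantizes x itself; each subsequent layer
    quantizes the residual error left by the layers above it,
    so the reconstruction gains fidelity with depth.
    """
    residual = list(x)
    tokens = []
    recon = [0.0] * len(x)
    for cb in codebooks:                 # one codebook per hierarchy layer
        idx = nearest(cb, residual)      # this layer's discrete token
        tokens.append(idx)
        recon = [r + c for r, c in zip(recon, cb[idx])]
        residual = [v - c for v, c in zip(residual, cb[idx])]
    return tokens, recon

# Toy example: a coarse base codebook plus a fine residual codebook.
base = [(0.0, 0.0), (1.0, 1.0)]
fine = [(0.0, 0.0), (0.1, -0.1)]
tokens, recon = residual_quantize((1.1, 0.9), [base, fine])
```

In the toy example, the base layer picks the coarse code (1.0, 1.0) and the residual layer picks (0.1, -0.1), so the two tokens together reconstruct (1.1, 0.9) exactly; in MoMask this per-layer token structure is what the Masked Transformer (base layer) and Residual Transformer (deeper layers) predict.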
