Abstract: Recently, video recognition is emerging with the help of multi-modal learning, which focuses on integrating distinct modalities to improve the performance or robustness of the model.
Abstract: The aim of the violent recognition task is to determine whether a video contains violent behaviors. Given that violent behavior often comes with visual and audio anomalies, multimodal ...