v-iashin

👤 Developer

7 repositories on SrcLog

1.5k Stars

237 Forks

1.5k Watchers

Repositories (7)

Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.

Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)

Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)

Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors" (Spotlight at BMVC 2022)

PyTorch/TensorFlow solutions for Stanford's CS231n: "CNNs for Visual Recognition"

Personal webpage