University of Verona Researchers Introduce ‘SEAM Match-RCNN’ and ‘MovingFashion’ Dataset For Retrieving e-Fashion in Social Media Videos Using Computer Vision

Source: https://arxiv.org/pdf/2110.02627v1.pdf

The growing use of social media has given rise to a new trend in e-fashion known as ‘video-to-shop.’ The idea is to match the clothing items that appear in videos against images, potentially from e-commerce databases. Being able to identify what social influencers are wearing is a powerful tool for advertising: any video could effectively serve as a commercial, making it much easier for brands to reach their target customers.

Video-to-shop increases the available information by providing additional frames as probes. However, these frames can be noisy because of challenging illumination, drastic zooming, varied human poses, and multiple people appearing or disappearing during the recording, all of which complicate image analysis on these videos. Another problem is that accurate video-to-shop matching needs millions of training data points.

University of Verona researchers have developed ‘MovingFashion’, a dataset of 14855 social videos that is the first publicly available one to address this challenge. They also present a baseline model called SEAM Match-RCNN.

The MovingFashion dataset comprises ∼15K video sequences, each associated with at least one shop image. The videos were obtained from the fashion e-shop Net-A-Porter (10132 videos) and from social media platforms such as Instagram and TikTok (4723 videos). They contain hundreds of frames per item and have been partitioned into Regular and Hard setups for analysis purposes (ripped outfits).

Apart from the dataset, the researchers also introduce SElf-Attention Multiframe (SEAM) Match-RCNN, a video-to-shop baseline that identifies products in street videos and extracts their features by applying feature collection and aggregation mechanisms. It then matches the product against a “shop” image gallery. SEAM Match-RCNN extends the state-of-the-art Match-RCNN by applying image-to-video domain adaptation with a new Multi Frame Matching Head.
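
To make the aggregate-then-match idea more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the attention query vector, the toy three-dimensional features, and the function names (`aggregate_frames`, `rank_gallery`) are all assumptions made for illustration. In the real model, features would come from a detection network and the attention weights would be learned. The sketch only shows how per-frame features might be combined into one video descriptor by an attention-weighted sum and then compared against a shop gallery by cosine similarity.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_frames(frame_features, query):
    """Attention-weighted sum of per-frame features.

    `query` is a hypothetical stand-in for a learned attention vector;
    frames that score higher against it contribute more to the result.
    """
    weights = softmax([dot(f, query) for f in frame_features])
    dim = len(frame_features[0])
    return [sum(w * f[i] for w, f in zip(weights, frame_features))
            for i in range(dim)]


def rank_gallery(video_descriptor, gallery):
    """Return shop ids sorted by cosine similarity to the video, best first."""
    v = l2_normalize(video_descriptor)
    scored = [(dot(v, l2_normalize(feat)), shop_id)
              for shop_id, feat in gallery.items()]
    return [shop_id for _, shop_id in sorted(scored, reverse=True)]


# Toy data (assumed): two clean frames of a dress and one noisy frame.
frames = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.1, 0.9]]
query = [1.0, 0.0, -1.0]  # hand-picked, not learned
gallery = {
    "shop_dress": [1.0, 0.1, 0.0],
    "shop_jacket": [0.0, 1.0, 0.0],
    "shop_shoes": [0.0, 0.0, 1.0],
}

descriptor = aggregate_frames(frames, query)
ranking = rank_gallery(descriptor, gallery)
print(ranking)
```

In this toy setup the noisy third frame receives a small attention weight, so the aggregated descriptor stays close to the two clean frames and the dress image ranks first in the gallery.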

The researchers explain that their proposed SEAM Match-RCNN, trained on the new MovingFashion dataset, provides a solid baseline showing that video-to-shop matching can be performed on videos from real-time social platforms like TikTok. This could surface major fashion trends directly from social platforms and draw the attention of large brands that want to know what is trending before investing heavily in TV or print advertising.

Paper: https://arxiv.org/pdf/2110.02627v1.pdf

GitHub: https://github.com/humaticslab/seam-match-rcnn