This page serves as the online companion to Yiwei Zhang’s final project for the course CS 639: Computer Vision in Fall 2020.

Neural Style Transfer

The term image style transfer (or image stylization) refers to the image processing task of extracting the style of one set of images and applying that style to another set of images. Style transfer performs well when the images to be stylized are landscape photographs, but it runs into limitations when the photographs contain other elements, for example a portrait. Motivated by this issue, this project develops a better image style transfer method for photographs that contain both a portrait and a landscape. Applying this method frame by frame to Zoom videos yields video style transfer.

Style

[image]

Content

[image]

Result

[image]


Motivation

Most of us are taking courses online this semester. Some of these courses are held over Zoom and involve video chat. Zoom offers two virtual background modes.

Current Zoom Background

(a) The first mode assumes you have a flat, solid-color background: Zoom replaces the dominant background color with a static picture. This mode fails when the background is cluttered.

[image]

(b) The second mode segments the person from the background: Zoom detects the person and replaces everything else with a static picture. This mode loses the detail of the background, and the result is neither fun nor dynamic.

[image]

Goal

The goal of this project is to propose and implement a video style transfer method that stylizes the video background while keeping the portrait intact.


Approach

  1. Dump frames and the audio track from the video file.

  2. Generate person segmentation masks.

  3. Apply neural style transfer to the dumped frames.

  4. Combine the masks with the stylized frames, masking out either the person or the background.

  5. Assemble the result videos with the original audio (steps 1 and 5 are sketched below).
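As a rough sketch of steps 1 and 5, the frame and audio handling could look like the following. This is a minimal illustration, not the project's actual code; the file names, the use of OpenCV, and the availability of an ffmpeg binary are all assumptions.

```python
import os
import subprocess
import cv2  # OpenCV for frame I/O

def dump_frames(video_path, out_pattern="frames/%05d.png"):
    """Step 1a: write every frame of the video to disk; return the frame rate."""
    os.makedirs(os.path.dirname(out_pattern), exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(out_pattern % i, frame)
        i += 1
    cap.release()
    return fps

def dump_audio(video_path, audio_path="audio.aac"):
    """Step 1b: copy the audio track out of the container without re-encoding."""
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn",
                    "-acodec", "copy", audio_path], check=True)

def assemble(frame_pattern, audio_path, fps, out_path="result.mp4"):
    """Step 5: mux the processed frames back into a video with the original audio."""
    subprocess.run(["ffmpeg", "-y", "-framerate", str(fps), "-i", frame_pattern,
                    "-i", audio_path, "-c:v", "libx264", "-pix_fmt", "yuv420p",
                    "-shortest", out_path], check=True)
```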


Implementation

Pipeline

First, the person segmentation mask is obtained with a state-of-the-art semantic segmentation model available in PyTorch (DeepLabv3 [2, 3]). Second, a stylized version of the whole frame is obtained from the style transfer model. Finally, combining the mask with the stylized frame gives the final result (a code sketch follows the diagram).

[Pipeline diagram]
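A minimal sketch of the masking and compositing steps, assuming the pretrained DeepLabv3 [2, 3] model that ships with torchvision; the exact model, weights, and compositing code used by the project may differ.

```python
import torch
import torchvision
from torchvision import transforms

PERSON = 15  # "person" class index in the Pascal VOC label set

seg_model = torchvision.models.segmentation.deeplabv3_resnet101(pretrained=True).eval()
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def person_mask(frame):
    """Return a boolean (H, W) mask that is True on person pixels."""
    # frame: float tensor of shape (3, H, W) with values in [0, 1]
    x = normalize(frame).unsqueeze(0)
    with torch.no_grad():
        out = seg_model(x)["out"][0]       # (21, H, W) per-class scores
    return out.argmax(0) == PERSON

def composite(frame, stylized):
    """Keep the person from the original frame, stylize everything else."""
    m = person_mask(frame).unsqueeze(0)    # (1, H, W), broadcasts over channels
    return torch.where(m, frame, stylized)
```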

Neural Style Transfer Model

[Model diagram]
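The model diagram from the original page is not reproduced here. As a hedged sketch of what such a model is trained against, following the perceptual-loss formulation of Johnson et al. [1]: a loss network (pretrained VGG-16) supplies a content loss at one layer and Gram-matrix style losses at several layers. The layer choices and weight below are illustrative assumptions, not the project's exact configuration.

```python
import torch
import torch.nn.functional as F
from torchvision import models

vgg = models.vgg16(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = {3, 8, 15, 22}  # relu1_2, relu2_2, relu3_3, relu4_3
CONTENT_LAYER = 8              # relu2_2

def vgg_features(x):
    """Collect activations at the layers used by the losses."""
    feats, h = {}, x
    for i, layer in enumerate(vgg):
        h = layer(h)
        if i in STYLE_LAYERS or i == CONTENT_LAYER:
            feats[i] = h
    return feats

def gram(f):
    """Gram matrix of a (B, C, H, W) feature map, normalized by its size."""
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def perceptual_loss(output, content, style, style_weight=1e5):
    """Content loss at one layer plus Gram-matrix style loss at several layers."""
    fo, fc, fs = vgg_features(output), vgg_features(content), vgg_features(style)
    content_loss = F.mse_loss(fo[CONTENT_LAYER], fc[CONTENT_LAYER])
    style_loss = sum(F.mse_loss(gram(fo[i]), gram(fs[i])) for i in STYLE_LAYERS)
    return content_loss + style_weight * style_loss
```

A feed-forward transformation network is then trained to minimize this loss, so that stylizing a frame at inference time is a single forward pass [1].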


Results

Original Video

Fully Transferred Video

Partially Transferred Video

Combined Result Video


Future Work

  • Apply image blending near the mask boundary to smooth the seam between the stylized and original regions (a sketch follows below).
  • Increase inference speed and decrease computation cost for real-time deployment.
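One simple way the boundary blending could be done is feathered alpha compositing. This is a sketch under the assumption of an OpenCV-based pipeline; the function name and kernel size are hypothetical.

```python
import cv2
import numpy as np

def feathered_composite(original, stylized, person_mask, ksize=21):
    """Blend the original person into the stylized background with a soft edge."""
    # Soften the hard 0/1 mask with a Gaussian blur to feather the seam.
    alpha = cv2.GaussianBlur(person_mask.astype(np.float32), (ksize, ksize), 0)
    alpha = alpha[..., None]  # shape (H, W, 1) so it broadcasts over color channels
    blended = alpha * original.astype(np.float32) \
        + (1.0 - alpha) * stylized.astype(np.float32)
    return blended.astype(np.uint8)
```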

Presentation

Presentation material link


Code

Code link


Footnotes

  1. Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual Losses for Real-Time Style Transfer and Super-Resolution. ECCV.

  2. Chen, L., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv:1706.05587.

  3. Chen, L., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. ECCV.

  4. Shen, X., Hertzmann, A., Jia, J., Paris, S., Price, B.L., Shechtman, E., & Sachs, I. (2016). Automatic Portrait Segmentation for Image Stylization. Computer Graphics Forum, 35.

  5. Gatys, L.A., Ecker, A.S., & Bethge, M. (2016). Image Style Transfer Using Convolutional Neural Networks. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2414-2423.