Awesome Augmentations
Pixel-level Transforms
Arithmetic
- Add values to the pixels of images with possibly different values for neighbouring pixels. Source: Imgaug
- Fill one or more rectangular areas in an image using a fill mode. See paper “Improved Regularization of Convolutional Neural Networks with Cutout” by DeVries and Taylor. Source: Imgaug
- Invert the input image by subtracting pixel values from 255. Source: Albumentations
- Add noise sampled from gaussian distributions elementwise to images. Source: Imgaug
- Add noise sampled from laplace distributions elementwise to images. Source: Imgaug
- Add noise sampled from poisson distributions elementwise to images. Source: Imgaug
- Multiply all pixels in an image with a specific value, thereby making the image darker or brighter. Source: Imgaug
- Multiply values of pixels with possibly different values for neighbouring pixels, making each pixel darker or brighter. Source: Imgaug
- Augmenter that sets a certain fraction of pixels in images to zero. Source: Imgaug
- Replace pixels in images with salt noise, i.e. white-ish pixels. Source: Imgaug
- Replace rectangular areas in images with white-ish pixel noise. Source: Imgaug
- Replace pixels in images with pepper noise, i.e. black-ish pixels. Source: Imgaug
- Replace rectangular areas in images with black-ish pixel noise. Source: Imgaug
- Replace pixels in images with salt/pepper noise (white/black-ish colors). Source: Imgaug
- Replace rectangular areas in images with white/black-ish pixel noise. Source: Imgaug
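A minimal sketch of how several of the arithmetic transforms above can be chained with imgaug; the parameter ranges are illustrative placeholders, not recommendations.

```python
import numpy as np
import imgaug.augmenters as iaa

# Chain a few arithmetic augmenters: value shift, elementwise Gaussian
# noise, brightness multiplication, and salt-and-pepper pixel noise.
aug = iaa.Sequential([
    iaa.Add((-40, 40), per_channel=0.5),
    iaa.AdditiveGaussianNoise(scale=(0, 0.05 * 255)),
    iaa.Multiply((0.8, 1.2)),
    iaa.SaltAndPepper(0.03),
])

images = np.random.randint(0, 255, (4, 128, 128, 3), dtype=np.uint8)
images_aug = aug(images=images)
```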
Artistic
Blend
- Blend images from two branches along a vertical linear gradient. Source: Imgaug
- Blend images from two branches along a horizontal linear gradient. Source: Imgaug
- Alpha-blend two image sources using alpha/opacity values sampled per pixel. Source: Imgaug
- Alpha-blend two image sources using non-binary masks generated per image. Source: Imgaug
Blur
- Blur the input image using a Gaussian filter with a random kernel size. Source: Albumentations
- Blur the input image using a median filter with a random aperture linear size. Source: Albumentations
- Bilateral filters blur homogeneous and textured areas, while trying to preserve edges. Source: Imgaug
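A short sketch of the blur transforms above using Albumentations; the `blur_limit` values and probabilities are placeholders.

```python
import numpy as np
import albumentations as A

# Apply either a Gaussian blur (random odd kernel size in [3, 7]) or a
# median blur (random odd aperture size up to 5), each with 50% chance.
transform = A.Compose([
    A.OneOf([
        A.GaussianBlur(blur_limit=(3, 7)),
        A.MedianBlur(blur_limit=5),
    ], p=0.5),
])

image = np.random.randint(0, 255, (128, 128, 3), dtype=np.uint8)
blurred = transform(image=image)["image"]
```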
Color
- Convert the input RGB image to grayscale. If the mean pixel value for the resulting image is greater than 127, invert the resulting grayscale image. Source: Albumentations
- Augment RGB image using FancyPCA from Krizhevsky's paper "ImageNet Classification with Deep Convolutional Neural Networks". Source: Albumentations
- Randomly shift values for each channel of the input RGB image. Source: Albumentations
- Randomly change hue, saturation and value of the input image. Source: Albumentations
- Randomly change brightness and contrast of the input image. Source: Albumentations
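A sketch composing the color transforms above with Albumentations; all probabilities and shift limits are placeholder values.

```python
import numpy as np
import albumentations as A

transform = A.Compose([
    A.ToGray(p=0.1),               # grayscale, inverted if mean > 127
    A.FancyPCA(alpha=0.1, p=0.3),  # Krizhevsky-style PCA color shift
    A.RGBShift(r_shift_limit=20, g_shift_limit=20, b_shift_limit=20, p=0.3),
    A.HueSaturationValue(p=0.3),
    A.RandomBrightnessContrast(p=0.3),
])

image = np.random.randint(0, 255, (128, 128, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]
```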
Contrast
- Apply Contrast Limited Adaptive Histogram Equalization to the input image. Source: Albumentations
- Apply CLAHE to all channels of images in their original colorspaces. Source: Imgaug
- Adjust image contrast by scaling pixel values to 255*((v/255)**gamma). Source: Imgaug
- Adjust image contrast to 255*1/(1+exp(gain*(cutoff-I_ij/255))). Source: Imgaug
- Adjust image contrast by scaling pixels to 255*gain*log_2(1+v/255). Source: Imgaug
- Apply Histogram Eq. to L/V/L channels of images in HLS/HSV/Lab colorspaces. Source: Imgaug
- Apply Histogram Eq. to all channels of images in their original colorspaces. Source: Imgaug
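The three formula-based entries above translate directly into NumPy; a minimal sketch for uint8 images, with all gains and cutoffs as placeholder arguments.

```python
import numpy as np

def gamma_contrast(img, gamma):
    # 255 * ((v / 255) ** gamma)
    v = img.astype(np.float32) / 255.0
    return (255 * v ** gamma).astype(np.uint8)

def sigmoid_contrast(img, gain, cutoff):
    # 255 * 1 / (1 + exp(gain * (cutoff - v / 255)))
    v = img.astype(np.float32) / 255.0
    return (255 / (1 + np.exp(gain * (cutoff - v)))).astype(np.uint8)

def log_contrast(img, gain):
    # 255 * gain * log_2(1 + v / 255), clipped back to the uint8 range
    v = img.astype(np.float32) / 255.0
    return np.clip(255 * gain * np.log2(1 + v), 0, 255).astype(np.uint8)
```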
Compression
- Decreases image quality by downscaling and upscaling back. Source: Albumentations
Convolutional
- Augmenter that sharpens images and overlays the result with the original image. Source: Imgaug
- Augmenter that embosses images and overlays the result with the original image. Source: Albumentations
Corruption
Edges
- Augmenter that detects all edges in images, marks them in a black and white image and then overlays the result with the original image. Source: Imgaug
- Augmenter that detects edges that have certain directions and marks them in a black and white image and then overlays the result with the original image. Source: Imgaug
Pooling
Segmentation
- Completely or partially transform images to their superpixel representation. Source: Imgaug
- Uniformly sample Voronoi cells on images and average colors within them. Source: Imgaug
- Sample Voronoi cells from regular grids and color-average them. Source: Imgaug
- Sample Voronoi cells from image-dependent grids and color-average them. Source: Imgaug
Weather
Spatial-level transforms
Affine
- Apply affine transformations that differ between local neighbourhoods. Source: Imgaug
- Randomly apply affine transforms: translate, scale and rotate the input. Source: Albumentations
Crop
- Crop images down until their height/width is a multiple of a value. Source: Imgaug
- Crop images equally on all sides until H/W are multiples of given values. Source: Imgaug
- Torchvision's variant of cropping a random part of the input and rescaling it to some size. Source: Albumentations
- Crop a random part of the input and rescale it to some size. Source: Albumentations
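A sketch of random resized cropping with Albumentations, assuming the classic `height`/`width` signature (newer releases use a `size` argument instead); the target size and scale range are placeholders.

```python
import numpy as np
import albumentations as A

# Crop a random region covering 50-100% of the image area,
# then rescale it to a fixed 224x224 output.
transform = A.Compose([
    A.RandomResizedCrop(height=224, width=224, scale=(0.5, 1.0), p=1.0),
])

image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
cropped = transform(image=image)["image"]  # always 224x224
```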
Distortion
- Augmenter that applies other augmenters in a polar-transformed space. Source: Imgaug
Flip
- Flip the input either horizontally, vertically or both horizontally and vertically. Source: Albumentations
Pad
- Pad images equally on all sides up to given minimum heights/widths. Source: Imgaug
- Pad images equally on all sides until H/W are multiples of given values. Source: Imgaug
- Pad images equally on all sides until H/W is a power of a base. Source: Imgaug
- Pad images equally on all sides until H/W matches an aspect ratio. Source: Imgaug
- Pad images equally on all sides until their height & width are identical. Source: Imgaug
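A sketch of the padding helpers above, assuming imgaug ≥ 0.4 where these augmenters are available; the multiple of 32 is an illustrative choice.

```python
import numpy as np
import imgaug.augmenters as iaa

# Pad height/width up to multiples of 32, then pad to a square shape.
aug = iaa.Sequential([
    iaa.PadToMultiplesOf(height_multiple=32, width_multiple=32),
    iaa.PadToSquare(),
])

image = np.random.randint(0, 255, (100, 90, 3), dtype=np.uint8)
padded = aug(image=image)
```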
Rotate
- Rotate the input by an angle selected randomly from the uniform distribution. Source: Albumentations
- Randomly rotate the input by 90 degrees zero or more times. Source: Albumentations
Size
- Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image. Source: Albumentations
- Rescale an image so that minimum side is equal to max_size, keeping the aspect ratio of the initial image. Source: Albumentations
Awesome Articles:
A list of awesome articles and tutorials for easy understanding of deep learning and data augmentation!
- Automating Data Augmentation: Practice, Theory and New Direction
- A Beginner's Guide To Understanding Convolutional Neural Networks
- A Beginner's Guide to Generative Adversarial Networks (GANs)
- Overview of GAN Structure | Generative Adversarial Networks
- Generative Adversarial Network (GAN) for Dummies — A Step By Step Tutorial
- Data Augmentation For Deep Learning Algorithms
Awesome Libraries and Frameworks
A list of awesome deep learning libraries and frameworks in Python!
- Tensorflow
- Keras
- Pytorch
- Catalyst Classification
- Catalyst Semantic Segmentation
- Detectron 2
- Imgaug
- Albumentations
Awesome Surveys:
A list of awesome surveys in many different subjects of deep learning!
- A survey on Image Data Augmentation for Deep Learning
- Link: Springer
- Authors: Connor Shorten and Taghi M. Khoshgoftaar
- Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy
- Link: Arxiv
- Authors: Zhengwei Wang, Qi She, Tomas E. Ward
- A Survey on Generative Adversarial Networks: Variants, Applications, and Training
- Link: Arxiv
- Authors: Abdul Jabbar, Xi Li, Bourahla Omar
Awesome Papers:
- Generative Visual Manipulation on the Natural Image Manifold
- Link: Arxiv
- Authors: Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros
- Code: Github
- Project: http://efrosgans.eecs.berkeley.edu/iGAN/
The authors proposed a method to help non-artistic users modify or create new images with simple image transformations. First, the input image is converted into the closest feature vector in the latent space of a Generative Adversarial Network (GAN). Then, through a user-friendly interface, the user can apply transformations like color change, sketching, and warping to the image. The feature vector in the latent space is adjusted to match those transformations. In the end, a new photo-realistic image is generated, with the object in the original image transformed according to the user's modifications.
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
The proposed approach is a Generative Adversarial Network (GAN), called CycleGAN, that learns how to translate images from a source domain to a target domain. For example, it can translate images containing horses into images containing zebras, change the weather, and translate one painting style into another. The advantage of the proposed method over its predecessors is that CycleGAN doesn't need paired training data to achieve this task.
- Improved Regularization of Convolutional Neural Networks with Cutout
The proposed method, cutout, is a data augmentation technique that removes square regions of the image by placing a grey square mask at random positions in the image. This technique forces the network to learn the object representation even under partial occlusion, while reducing overfitting.
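A minimal NumPy sketch of the cutout idea described above: fill a square region at a random position with a constant value. The `mask_size` and `fill` values are placeholders.

```python
import numpy as np

def cutout(image, mask_size=16, fill=127):
    img = image.copy()
    h, w = img.shape[:2]
    # Pick a random center; the square is clipped at the image borders.
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(0, cy - mask_size // 2), min(h, cy + mask_size // 2)
    x1, x2 = max(0, cx - mask_size // 2), min(w, cx + mask_size // 2)
    img[y1:y2, x1:x2] = fill  # grey square mask
    return img
```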
- Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
The authors proposed a neural network to perform style transfer between images. That is, the proposed method takes an image of a real scene and outputs the same image as if it were a painting. The network uses a VGG-19 encoder to extract features from both the real image A and a painting example B. These features are fed into the Adaptive Instance Normalization (AdaIN) layer, proposed by the authors, to perform style transfer between the images. The output of AdaIN is passed to a decoder to generate the image in the new painting style; the decoder is trained, with the help of the same VGG-based encoder, to learn how to perform this conversion.
- Random Erasing Data Augmentation
In the proposed data augmentation, a rectangle is positioned on top of training images. The size and position of the rectangle are random, that is, it can have any size and lie anywhere in the image. The pixels of the image inside the rectangle are set to random values, creating a noise area inside the image. This technique improves classification by making the Convolutional Neural Network (CNN) more robust to object occlusion. It also helps to reduce overfitting when training the network model.
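A NumPy sketch of the mechanism described above: a rectangle of random size and position is filled with random pixel values. The area and aspect-ratio ranges are placeholder choices, and uint8 images are assumed.

```python
import numpy as np

def random_erasing(image, min_area=0.02, max_area=0.2):
    img = image.copy()
    h, w = img.shape[:2]
    # Sample a rectangle with random area and aspect ratio.
    area = np.random.uniform(min_area, max_area) * h * w
    aspect = np.random.uniform(0.3, 3.3)
    rh = min(h, int(round(np.sqrt(area * aspect))))
    rw = min(w, int(round(np.sqrt(area / aspect))))
    y = np.random.randint(0, h - rh + 1)
    x = np.random.randint(0, w - rw + 1)
    # Fill the rectangle with random values (uint8 assumed).
    img[y:y + rh, x:x + rw] = np.random.randint(
        0, 256, (rh, rw) + img.shape[2:], dtype=img.dtype)
    return img
```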
- Dataset Augmentation in Feature Space
- Link: Arxiv
- Authors: Terrance DeVries, Graham W. Taylor
In general, data augmentations are applied to the input images, usually before the training step of a deep neural network. In this work, the authors proposed applying data augmentations in the feature space generated by a neural network. To achieve that, they used an encoder-decoder network. The encoder converts the input image into the feature space, where augmentations such as noise, interpolation, or extrapolation are applied. The augmented features can then be converted back into an image through the decoder, or used directly for classification through fully connected layers.
- Improving Deep Learning using Generic Data Augmentation
- Link: Arxiv
- Authors: Luke Taylor, Geoff Nitschke
The authors performed a series of experiments and comparisons of different data augmentation techniques in the classification problem. They focused on geometric methods (flipping, rotation, and cropping), and photometric methods (color jittering, edge enhancement, and fancy PCA).
- Emotion Classification with Data Augmentation Using Generative Adversarial Networks
- Link: Arxiv
- Authors: Xinyue Zhu, Yifan Liu, Zengchang Qin, Jiahong Li
The authors utilized a CycleGAN to generate images of emotions for the classification problem. More specifically, they performed dataset balancing by applying the CycleGAN to generate new examples for the classes with fewer samples in the training dataset. They used the classes with more examples in the training set as the reference to generate the new images of the other classes.
- The Effectiveness of Data Augmentation in Image Classification using Deep Learning
- Link: Arxiv
- Authors: Luis Perez, Jason Wang
The authors evaluated different augmentation techniques for the classification problem. They compared classical augmentations like shift, zoom, rotation, flip, and distortion with GAN-based augmentations. They also proposed an augmentation called neural augmentation. In the proposed approach, a CNN takes two images and combines them to generate a new one.
- Smart Augmentation: Learning an Optimal Data Augmentation Strategy
- Link: IEEE Xplore
- Authors: Joseph Lemley, Shabab Bazrafkan, Peter Corcoran
In this work, the authors proposed a data augmentation called Smart Augmentation. In this approach, they use two networks, A and B. The former is a generative model, and the latter is a standard classification model. The generative model, network A, receives a set of images from a given class and learns to generate new examples of that class. The loss of network B is used to improve the results of network A, so A learns to generate images in a way that improves the classification results of B.
- Biomedical Data Augmentation Using Generative Adversarial Neural Networks
- Link: Research Gate
- Authors: Francesco Calimeri, Aldo Marzullo, Claudio Stamile, Giorgio Terracina
The authors evaluated a data augmentation technique based on a Generative Adversarial Network (GAN) to generate new images for the Magnetic Resonance Imaging (MRI) classification problem.
- Research on Data Augmentation for Image Classification Based on Convolution Neural Networks
- Link: IEEE Xplore
- Authors: Jia Shijie, Wang Ping, Jia Peiyi, Hu Siping
The authors evaluated several data augmentation techniques on CIFAR10 and ImageNet classification datasets. They evaluated Flipping, Cropping, Shifting, PCA jittering, Color jittering, Noise, Rotation, GAN, and WGAN.
- Albumentations: Fast and Flexible Image Augmentations
- Link: Arxiv
- Authors: Alexander Buslaev, Alex Parinov, Eugene Khvedchenya, Vladimir I. Iglovikov, Alexandr A. Kalinin
- Code: Github
- Documentation: https://albumentations.ai/docs/
The authors proposed Albumentations, a Python library with a large number of data augmentation techniques for deep learning applications. They showed that the augmentations from the Albumentations library helped to improve results in different deep learning applications such as image classification, object detection, and semantic segmentation.
- AutoAugment: Learning Augmentation Policies from Data
- Link: Arxiv
- Authors: Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le
- Code: Github
- Details: https://ai.googleblog.com/2018/06/improving-deep-learning-performance.html
The proposed augmentation framework learns the best augmentation strategies for a given classification problem. The authors created a search space with several augmentation techniques. A Recurrent Neural Network (RNN) samples an augmentation strategy used to train a Convolutional Neural Network (CNN). The RNN is updated based on the validation accuracy achieved by the CNN, and thereby learns the best augmentation strategy for the problem at hand.
- Data Augmentation by Pairing Samples for Images Classification
- Link: Arxiv
- Authors: Hiroshi Inoue
The author proposed a data augmentation called SamplePairing. This technique takes two random images from the training dataset and mixes them by taking a per-pixel average of both images. The label of the first image is used as the label of the mixed image. Training with this augmentation alone does not achieve good results; however, a network trained with this augmentation and then fine-tuned without it showed improvements on the classification problem.
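A sketch of the pairing step described above: average two training images per pixel and keep the label of the first one. Function and argument names are illustrative.

```python
import numpy as np

def sample_pairing(image_a, label_a, image_b):
    # Per-pixel average of the two images; the label of image A is kept.
    mixed = (image_a.astype(np.float32) + image_b.astype(np.float32)) / 2
    return mixed.astype(image_a.dtype), label_a
```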
- Data Augmentation using Random Image Cropping and Patching for Deep CNNs
A mixing augmentation method called random image cropping and patching (RICAP) is proposed. In this method, four images are randomly cropped and patched together, generating a new mixed image. For the label, the one-hot codes of the four images are summed, with each label weighted by its proportion in the newly generated image.
- DADA: Deep Adversarial Data Augmentation for Extremely Low Data Regime Classification
The authors proposed a data augmentation technique called deep adversarial data augmentation (DADA). The proposed augmentation is based on Generative Adversarial Networks (GANs) to generate new examples for small datasets.
- Modeling Visual Context is Key to Augmenting Object Detection Datasets
- Link: Arxiv
- Authors: Nikita Dvornik, Julien Mairal, Cordelia Schmid
The authors trained a Convolutional Neural Network to learn the context of the objects in the dataset. They cropped each object from its image to create a context image, and the network is trained to learn the probability of different objects appearing in that background. This network then guides the proposed data augmentation, where objects are placed on new background images.
- GAN-based Synthetic Medical Image Augmentation for Increased CNN Performance in Liver Lesion Classification
- Link: Arxiv
- Authors: Maayan Frid-Adar, Idit Diamant, Eyal Klang, Michal Amitai, Jacob Goldberger, Hayit Greenspan
The authors evaluated data augmentation techniques for the computed tomography (CT) classification problem. They evaluated classical augmentations like translation, rotation, scaling, flipping, and shearing. They also evaluated two Generative Adversarial Networks (GANs): the Deep Convolutional GAN (DCGAN) and the Auxiliary Classifier GAN (ACGAN).
- Chest x-ray generation and data augmentation for cardiovascular abnormality classification
- Link: SPIE Digital Library
- Authors: Ali Madani, Mehdi Moradi, Alexandros Karargyris, Tanveer Syeda-Mahmood
The authors applied a Generative Adversarial Network (GAN) to generate chest x-ray images.
- AugGAN: Cross Domain Adaptation with GAN-based Data Augmentation
- Link: Open Access
- Authors: Sheng-Wei Huang, Che-Tsung Lin, Shu-Ping Chen, Yen-Yi Wu, Po-Hao Hsu, Shang-Hong Lai
The authors proposed a Generative Adversarial Network (GAN) for the image-to-image translation problem. The input image is fed into an encoder network and transformed into an encoded feature space. The encoded features feed two networks in parallel: a decoder network and a semantic segmentation network, which share their weights. The output of the decoder network is fed into an inverse network to reconstruct the original image.
- Biomedical image augmentation using Augmentor
- Link: Academic
- Authors: Marcus D. Bloice, Peter M. Roth, Andreas Holzinger
- Code: Github
- Documentation: https://augmentor.readthedocs.io/en/master/
The authors developed a package for biomedical image augmentation.
- CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
- Link: Arxiv
- Authors: Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo
- Code: Github
The proposed augmentation strategy mixes two training samples A and B by cutting a rectangular area out of sample A and replacing it with a patch of sample B, generating a new training sample C as the combination of both. The label is also adjusted to match the proportion of each sample in the new image. The advantage of this method over those that simply fill a rectangular region of the image with zeros or random noise is that no information is lost, which increases training efficiency.
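A NumPy sketch of this recipe; the Beta(1, 1) sampling of the mixing ratio is an illustrative default, and one-hot float labels are assumed.

```python
import numpy as np

def cutmix(image_a, label_a, image_b, label_b):
    h, w = image_a.shape[:2]
    lam = np.random.beta(1.0, 1.0)  # proportion to keep from sample A
    # Rectangle whose area is (1 - lam) of the image, at a random center.
    rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip(cy - rh // 2, 0, h), np.clip(cy + rh // 2, 0, h)
    x1, x2 = np.clip(cx - rw // 2, 0, w), np.clip(cx + rw // 2, 0, w)
    mixed = image_a.copy()
    mixed[y1:y2, x1:x2] = image_b[y1:y2, x1:x2]   # paste patch from B
    lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)     # adjust for border clipping
    mixed_label = lam * label_a + (1 - lam) * label_b  # one-hot labels
    return mixed, mixed_label
```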
- Fast AutoAugment
The authors proposed an algorithm to automatically find the best augmentation policies to train a classification network. The proposed framework is similar to its predecessor, AutoAugment, but focuses on reducing the time needed to find the best augmentations for the problem. In the proposed algorithm, the training dataset is split into K folds, with each fold split into two sets, Dm and Da. The K sets of Dm are used to train neural networks, and the K sets of Da are used to find the best augmentation policies. On each Da, a Bayesian optimization algorithm selects B augmentation policies, which are validated on the neural networks trained with the Dm sets. For each fold, the top-N policies are selected and concatenated, generating a single set of augmentation policies. In the end, the network is trained on the entire training dataset plus the best augmentations and evaluated on the validation dataset.
- RandAugment: Practical Automated Data Augmentation with a Reduced Search Space
The proposed method also tries to remove the human from the loop of selecting the best augmentation techniques for network training. Similar methods make the training step extremely complex and time-consuming; this method simplifies and speeds up the process by simply selecting a random set of N augmentations from the list of all possible augmentations.
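A toy sketch of that random selection step: pick N transforms uniformly from a candidate list. The candidate list and the absence of per-transform magnitudes are simplifications, not the paper's exact setup.

```python
import random

def rand_augment(image, transforms, n=2):
    # Sample n transforms (with replacement) and apply them in sequence.
    for op in random.choices(transforms, k=n):
        image = op(image)
    return image

# Usage with any image-in/image-out callables, e.g.:
# augmented = rand_augment(img, [autocontrast, equalize, rotate30], n=2)
```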
- Augmentation for small object detection
- Link: Arxiv
- Authors: Mate Kisantal, Zbigniew Wojna, Jakub Murawski, Jacek Naruniec, Kyunghyun Cho
- Code: Github
In the proposed work, the authors showed that the detection and segmentation of small objects in images is still an open problem. In state-of-the-art networks based on anchor boxes, small objects match a limited number of anchors, which harms the detection process. To improve the network's ability to find small objects, a data augmentation is proposed: in images containing small objects, the authors copy each small object and paste many copies over the image. This augmentation lets the network generate more anchor boxes for such objects, which helps it detect them.
- InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting
- Link: Arxiv
- Authors: Hao-Shu Fang, Jianhua Sun, Runzhong Wang, Minghao Gou, Yong-Lu Li, Cewu Lu
- Code: Github
The authors proposed an augmentation technique called Random InstaBoost. In this approach, given an image and its respective segmentation mask, they cut the object from the scene and place it elsewhere on the image. The hole left by the object in its original location is closed by an inpainting technique.
- Few-Shot Unsupervised Image-to-Image Translation
- Link: Arxiv
- Authors: Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, Jan Kautz
- Code: Github
The idea behind the proposed method is to perform image-to-image translation with few examples of a target class. To achieve this, a Generative Adversarial Network called FUNIT is proposed. FUNIT works in two stages: training and deployment. In training, the network is trained with several images from different classes, also called source images. In deployment, FUNIT receives a small set of images from a target class not seen in training, and the network is able to translate examples from the source classes to the target class.
-
- Link: ACM Digital Library
- Authors: Rui Ma, Pin Tao, Huiyun Tang
The authors evaluated seven popular data augmentations in a semantic segmentation problem. They evaluated color transform, flipping, projection transform, JPEG compression, cropping, local shift, and local copy.
-
- Link: IEEE Xplore
- Authors: Shuangting Liu, Jiaqi Zhang, Yuxin Chen, Yifan Liu, Zengchang Qin, Tao Wan
The authors proposed a data augmentation technique based on Generative Adversarial Networks (GANs). They trained a GAN to generate realistic images from segmentation masks. Then, they manually created segmentation masks containing the classes they wanted to augment and used the GAN to generate a corresponding image.
- PanDA: Panoptic Data Augmentation
- Link: Arxiv
- Authors: Yang Liu, Pietro Perona, Markus Meister
The authors proposed a data augmentation called PanDA for panoptic segmentation problems. In this augmentation, they use the panoptic labels to separate foreground objects from the background. The background is filled with noise to hide the regions where objects were removed. Then they apply operations like shifting and resizing to the foreground objects and place them back on the background image.
- Self-Ensembling with GAN-based Data Augmentation for Domain Adaptation in Semantic Segmentation
- Link: Arxiv
- Authors: Jaehoon Choi, Taekyung Kim, Changick Kim
The authors proposed a data augmentation called Target-Guided and Cycle-Free Data Augmentation (TGCF-DA). This augmentation is based on Generative Adversarial Networks (GANs) and generates realistic, labeled images from synthetic labeled images and realistic unlabeled images.
- Contrastive Learning for Unpaired Image-to-Image Translation
- Link: Arxiv
- Authors: Taesung Park, Alexei A. Efros, Richard Zhang, Jun-Yan Zhu
- Code: Github
- Project: http://taesung.me/ContrastiveUnpairedTranslation/
The authors use a patch-based strategy to improve the image-to-image translation problem. They use a Generative Adversarial Network to translate images from a specific domain to another. Then they refine the translation by comparing patches from the input image with patches from the resulting image.
- AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
- Link: Arxiv
- Authors: Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan
- Code: Github
The authors proposed mixing different augmentation techniques. Instead of performing a series of augmentations in sequence, they use several augmentation chains in parallel, where each chain is a sequence of augmentation techniques. Each chain generates a different augmented image, and these are mixed through an elementwise convex combination, with each augmented image receiving a different weight. In the end, the resulting augmented image is blended with the original image.
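A NumPy sketch of this mixing scheme: k parallel chains are combined by a Dirichlet-weighted convex sum, and the result is blended with the original image using a Beta-sampled weight. Each chain is assumed to be a list of callables operating on float arrays; `alpha` is an illustrative default.

```python
import numpy as np

def augmix(image, chains, alpha=1.0):
    ws = np.random.dirichlet([alpha] * len(chains))  # per-chain weights
    m = np.random.beta(alpha, alpha)                 # final blend weight
    x = image.astype(np.float32)
    mix = np.zeros_like(x)
    for w, chain in zip(ws, chains):
        aug = x.copy()
        for op in chain:        # apply the chain's ops in sequence
            aug = op(aug)
        mix += w * aug          # elementwise convex combination
    # Blend the mixed result with the original image.
    return ((1 - m) * x + m * mix).astype(image.dtype)
```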
- GridMask Data Augmentation
The proposed augmentation generates a mask with a sequence of black squares uniformly distributed in a grid on top of the image. The regions of the image corresponding to the black squares in the mask are removed by setting their pixels to zero. Like other methods that remove regions from the image, this forces the network to learn the same concepts from different regions of the image, improves its ability to recognize occluded objects, and helps to reduce overfitting. The advantage over similar methods is that those use random positions and sizes for the removed region, which can produce examples where the entire object is removed or where no relevant information is removed at all.
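A simplified NumPy sketch of the grid-of-squares dropout described above; the full method also randomizes the grid offset and rotation, which this sketch omits, and `unit`/`ratio` are placeholder values.

```python
import numpy as np

def gridmask(image, unit=32, ratio=0.5):
    img = image.copy()
    h, w = img.shape[:2]
    d = int(unit * ratio)  # side length of each dropped square
    # Tile the squares at a fixed spacing and zero out the covered pixels.
    for y in range(0, h, unit):
        for x in range(0, w, unit):
            img[y:y + d, x:x + d] = 0
    return img
```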
- COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder
- Link: Arxiv
- Authors: Kuniaki Saito, Kate Saenko, Ming-Yu Liu
- Code: Coming soon
- Project: https://nvlabs.github.io/COCO-FUNIT/
Motivated by content losses when translating more complex images using the FUNIT network, the authors proposed some improvements to the original work. The idea is the same as in the original FUNIT: learn to translate images from a source class to a target class given a few examples of the target class. To mitigate the content loss in FUNIT's image-to-image translation results, the authors proposed an adaptation called the COntent-COnditioned style encoder (COCO) to replace the original style encoder of FUNIT. In the COCO encoder, both the style and the content of the images are used to encode the target class, which reduces the content loss problem.
- SuperMix: Supervising the Mixing Data Augmentation
- Link: Arxiv
- Authors: Ali Dabouei, Sobhan Soleymani, Fariborz Taherkhani, Nasser M. Nasrabadi
- Code: Github
The authors proposed a mixing augmentation based on saliency regions. In SuperMix, for a set of images to be mixed, a set of binary mixing masks is created. A trained model then optimizes these mixing masks so that the salient regions of the input images are preserved in the final mixed image.
- Learning Data Augmentation Strategies for Object Detection
- Link: Arxiv
- Authors: Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le
- Code: Github
The authors studied the behavior of several augmentation techniques, usually applied to classification problems, on object detection problems. They also proposed a search method to find the best augmentation policies for an object detection problem. To achieve this goal, they defined an augmentation policy as a set of K sub-policies, with each sub-policy being a set of N image transformations, and trained a Recurrent Neural Network (RNN) to find the K sub-policies that best compose an augmentation policy.
- Data Augmentation Using GANs for Crop/Weed Segmentation in Precision Farming
- Link: Arxiv
- Authors: Mulham Fawakherji, Ciro Potena, Alberto Pretto, Domenico D. Bloisi, Daniele Nardi
The proposed work uses a Generative Adversarial Network to generate new instances for the crop/weed segmentation problem. The idea behind this method is to crop regions of the image close to the crop/weed and use the GAN to generate only a new instance of the object, instead of an entirely new image. Then they use another GAN to replace instances of crop/weed in real images with the ones generated in the previous step.
- ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning
A data augmentation for the semantic segmentation problem, called ClassMix, is proposed. The method takes two random unlabeled images A and B, and uses a neural network to segment both images, generating two segmentation masks Sa and Sb. From Sa, a binary mask is created by randomly selecting half of the classes in Sa. This binary mask is used to mix the images A and B and the segmentation masks Sa and Sb.
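A NumPy sketch of the mixing step described above, given two images and their predicted segmentation masks (H x W integer class maps); function names are illustrative.

```python
import numpy as np

def classmix(image_a, mask_a, image_b, mask_b):
    classes = np.unique(mask_a)
    k = max(1, len(classes) // 2)
    # Randomly select half of the classes present in A.
    chosen = np.random.choice(classes, size=k, replace=False)
    binary = np.isin(mask_a, chosen)  # H x W boolean mask
    # Paste A's selected classes onto B, for both image and mask.
    mixed_img = np.where(binary[..., None], image_a, image_b)
    mixed_mask = np.where(binary, mask_a, mask_b)
    return mixed_img, mixed_mask
```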
-
- Link: Arxiv
- Authors: Iñigo Azqueta-Gavaldon, Florian Fröhlich, Klaus Strobl, Rudolph Triebel
The authors applied CycleGAN to convert non-realistic images (renderings of 3D models) of surgical instruments into realistic images. These new realistic images are then used as augmentations to train a semantic segmentation network.