Jiaqi Zhang

Pixel Level Data Augmentation with GANs


Fig. 1 Pipeline of our proposed method for data augmentation.

Unbalanced semantic label distribution can have a negative influence on segmentation accuracy. In this paper, we investigate a data augmentation approach that balances the label distribution in order to improve segmentation performance. We propose using generative adversarial networks (GANs) to generate realistic images that improve the performance of semantic segmentation networks. Experimental results show that the proposed method not only improves the segmentation accuracy of classes with low accuracy, but also obtains a 1.3% to 2.1% increase in average segmentation accuracy.


Label distribution analysis:

To verify the correlation between the frequency of each label class and its segmentation accuracy, we analyse the dataset statistics.

We choose Cityscapes as the test dataset. This dataset records city street scenes in 50 different cities. It defines 30 visual classes (labels) for annotation, of which 19 classes are used for evaluation; in our experiments we use only these 19 semantic label classes. We first calculate the label distribution: for each label class, we derive the frequency with which it appears in the training set and the validation set, which we call the appearance frequency. We then calculate the average segmentation accuracy of the top 5 ranked models on the Cityscapes website. Fig. 2 illustrates the correlation between the label distribution and segmentation accuracy. Comparing the classes with low appearance frequency against those with low segmentation accuracy, we find that the two groups largely overlap. In other words, it should be possible to balance the data distribution, and thereby improve segmentation accuracy, by increasing the appearance frequency of some specific classes.
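As an illustration, here is a minimal sketch of how such an appearance frequency could be computed from Cityscapes-style label maps. The file layout, the helper name, and the choice of image-level counting are assumptions for illustration, not the exact procedure used in the thesis.

```python
import numpy as np
from glob import glob
from PIL import Image

# The 19 Cityscapes evaluation classes (train IDs 0-18).
NUM_CLASSES = 19

def appearance_frequency(label_map_paths):
    """Count, for each class, the fraction of images in which it appears."""
    counts = np.zeros(NUM_CLASSES)
    for path in label_map_paths:
        label_map = np.array(Image.open(path))   # H x W array of train IDs
        for c in np.unique(label_map):            # classes present in this image
            if c < NUM_CLASSES:                   # skip the 'ignore' label (255)
                counts[c] += 1
    return counts / len(label_map_paths)          # appearance frequency per class

# Hypothetical path layout; adjust to your local Cityscapes copy.
train_maps = glob("cityscapes/gtFine/train/*/*_labelTrainIds.png")
freq = appearance_frequency(train_maps)
print({c: round(f, 3) for c, f in enumerate(freq)})
```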


Method:

The main idea of our method is to generate supplementary data for semantic segmentation in order to balance the distribution of semantic labels and improve the segmentation results. Fig. 1 schematically shows the procedure of our proposed approach. Our method consists of three steps: 1. training the data generator, 2. synthesising training data, and 3. segmentation.

Fig. 2 An original image (a) and its corresponding semantic label map (b). We select several semantic labels, including street, car and vegetation, to reconstruct a new semantic label map (c). Then we use GANs to generate its corresponding realistic image (d).

1. Training the data generator:

We use the Pix2pixHD model as our data generator to synthesise realistic images from a given semantic label map. Real images (e.g. Fig. 2a) and their corresponding semantic label maps (e.g. Fig. 2b) from the original dataset are used as training pairs. Besides the generator G, there is a discriminator D that helps drive the training process. Together, G and D constitute generative adversarial networks (GANs): the generator G aims to translate semantic label maps into realistic images, while the discriminator D tries to distinguish real images (original images) from the realistic but fake images produced by G. We model this adversarial training as a minimax game.
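To make the minimax idea concrete, the sketch below shows one training step of a generic conditional GAN in PyTorch: G maps a one-hot label map to an image, and D scores (label map, image) pairs. This is a simplified, hypothetical stand-in using a binary cross-entropy objective; the actual pix2pixHD training uses its own networks and losses, which are not reproduced here.

```python
import torch
import torch.nn.functional as F

# G, D, opt_G, opt_D are assumed to be defined elsewhere.
# label: one-hot semantic map  (B, 19, H, W)
# real:  corresponding photo   (B, 3, H, W)
def gan_step(G, D, opt_G, opt_D, label, real):
    # --- Discriminator: push D(label, real) -> 1 and D(label, fake) -> 0 ---
    fake = G(label).detach()
    d_real = D(torch.cat([label, real], dim=1))
    d_fake = D(torch.cat([label, fake], dim=1))
    loss_D = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # --- Generator: push D(label, G(label)) -> 1 (the other side of the minimax game) ---
    fake = G(label)
    d_fake = D(torch.cat([label, fake], dim=1))
    loss_G = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```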


2. Synthesising training data

Synthesising the supplementary data involves three sub-steps (a small code sketch follows the list):

  • Class extraction: separating each class and forming class repositories.

  • Reconstruction: selecting classes from the repositories and forming new semantic label maps.

  • Synthesis: using the trained GANs to translate the new label maps into realistic images.
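The following sketch illustrates the three sub-steps on label maps stored as NumPy arrays of class IDs. The function names, the mask-pasting strategy, and the `to_one_hot` helper are assumptions for illustration; the thesis describes the exact reconstruction procedure.

```python
import numpy as np

NUM_CLASSES = 19

def extract_classes(label_map):
    """Class extraction: split a label map into one binary mask per class present."""
    repo = {}
    for c in np.unique(label_map):
        if c < NUM_CLASSES:
            repo[int(c)] = (label_map == c)
    return repo

def reconstruct(base_map, repo, wanted_classes):
    """Reconstruction: paste selected class masks onto a base label map."""
    new_map = base_map.copy()
    for c in wanted_classes:          # e.g. rare classes we want to appear more often
        if c in repo:
            new_map[repo[c]] = c
    return new_map

# Synthesis: the trained generator from step 1 turns the new label map into a
# realistic image (to_one_hot is a hypothetical helper producing a one-hot tensor):
# fake_image = generator(to_one_hot(new_map))
```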


Results:

Our pixel-level data augmentation method can effectively balance an unbalanced dataset and specifically improve the accuracy of low-accuracy labels, with almost no accuracy drop for the other labels. The results show that the mean accuracy of a specific class can increase by up to 5.5%, and the average segmentation accuracy increases by about 2%.

More details are in our thesis: Pixel-level data augmentation with GANs.


Demo display:

