A generative adversarial network (GAN) is a deep learning architecture in which two competing models "work against" each other. The two adversarial models, one that generates data (the generator) and one that evaluates it (the discriminator), push each other to produce outputs that are increasingly difficult to distinguish from real data.
Like much of deep learning, GANs are an exciting and quickly evolving space with a wide range of both current and potential applications. Read on as we cover how GANs work, their use cases, and their different types.
How does a GAN work?
The GAN deep-learning architecture involves two models that train against one another: the generator and the discriminator. You can begin to think about the two in the following way:
- The generator works to create data that mimics real data, attempting to fool the discriminator into believing that the generated data is genuine
- The discriminator evaluates the data presented to it, determining whether it is real or produced by the generator
More specifically, the generator model takes a fixed-length random vector from a Gaussian distribution as input. This vector represents a point in the latent space—a compressed representation of the data distribution. The generator uses this vector to generate a new data sample that attempts to mimic the real data distribution as closely as possible. As it generates new samples, it learns to map regions of the latent space to plausible data points in the target domain.
The discriminator model takes these generated samples, along with real samples from the target domain, as input. It then assesses each sample to classify it as either "real" (from the actual dataset) or "fake" (produced by the generator). The discriminator's task is to accurately identify the source of each input it receives, enhancing its ability to discern between genuine and artificial data.
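To make the two roles concrete, here is a minimal sketch of the two networks in PyTorch. The framework choice, layer sizes, and the flattened 28x28 sample shape are illustrative assumptions, not part of the GAN definition itself:

```python
import torch
import torch.nn as nn

LATENT_DIM = 100    # length of the random input vector (assumed)
DATA_DIM = 28 * 28  # flattened size of each sample (assumed, e.g. 28x28 images)

# The generator maps a point in the latent space to a data sample.
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, DATA_DIM),
    nn.Tanh(),  # outputs scaled to [-1, 1] to match normalized real data
)

# The discriminator maps a data sample to a single "probability of being real".
discriminator = nn.Sequential(
    nn.Linear(DATA_DIM, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)

# A fixed-length random vector drawn from a Gaussian distribution is the generator's input.
z = torch.randn(16, LATENT_DIM)       # a batch of 16 latent vectors
fake_samples = generator(z)           # 16 generated samples
scores = discriminator(fake_samples)  # 16 real/fake scores in (0, 1)
```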
This training takes place in an unsupervised manner. Over time, the process refines both models: The generator improves at creating data that resembles the real dataset, making it increasingly difficult for the discriminator to make accurate classifications. Meanwhile, the discriminator hones its ability to detect the nuances that distinguish real data from fake. This iterative, adversarial cycle ultimately drives the generator toward producing increasingly convincing data.
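Put together, the adversarial cycle can be sketched as a simple training loop. This is a minimal, illustrative sketch in PyTorch that reuses the `generator`, `discriminator`, and `LATENT_DIM` from the sketch above; `real_loader` and `num_epochs` are placeholder names, not part of the original description:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # binary cross-entropy: "real vs. fake" classification loss
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for epoch in range(num_epochs):            # num_epochs: placeholder
    for real_batch in real_loader:         # real_loader: placeholder DataLoader of real samples
        batch_size = real_batch.size(0)
        real_labels = torch.ones(batch_size, 1)
        fake_labels = torch.zeros(batch_size, 1)

        # 1) Discriminator step: label real samples as real and generated samples as fake.
        z = torch.randn(batch_size, LATENT_DIM)
        fake_batch = generator(z).detach()  # detach so only the discriminator updates here
        d_loss = bce(discriminator(real_batch), real_labels) + \
                 bce(discriminator(fake_batch), fake_labels)
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # 2) Generator step: try to make the discriminator classify generated samples as real.
        z = torch.randn(batch_size, LATENT_DIM)
        g_loss = bce(discriminator(generator(z)), real_labels)
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()
```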
4 GAN use cases
GANs have a wide range of applications, from generating images and creating 2D/3D models to producing training data and filling gaps in missing data.
- Generating new images. GANs can create, and even edit, images based on user prompts. The created images are novel and applicable to the real world. For example, if a user asks for an image of a futuristic city, a GAN can generate a unique, detailed visualization that matches the request.
- Creating 2D and 3D models. Applicable from gaming to healthcare and everything in between, GANs can synthesize detailed and realistic 2D/3D models that can be used for virtual reality experiences, simulations, architectural visualization, prototyping, and more.
- Producing training data. GANs can be employed to augment training datasets, especially in scenarios where data is scarce or sensitive in nature. By generating synthetic yet realistic examples, GANs can enhance the variety and volume of data available for training machine learning models (see the sketch after this list). For example, GANs can create detailed and diverse medical images for conditions that are rare or hard to document, helping to train diagnostic tools with a broader dataset.
- Filling data gaps. GANs can infer and complete missing information in datasets. In a similar vein to our first example, you can train GANs to generate detailed cityscapes by learning from vast databases of urban imagery. By analyzing patterns in building designs, street layouts, and urban planning concepts, these networks can produce realistic yet entirely new city images.
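As a quick illustration of the training-data use case above, a trained generator can be sampled to pad out a scarce dataset. This is a minimal sketch in PyTorch; it reuses the `generator`, `LATENT_DIM`, and `DATA_DIM` names from the earlier sketch, and the dataset sizes are placeholder assumptions:

```python
import torch

# Suppose we only have a small set of real samples.
real_data = torch.randn(500, DATA_DIM)  # placeholder for a scarce real dataset

# Draw synthetic samples from the (already trained) generator.
with torch.no_grad():
    z = torch.randn(2_000, LATENT_DIM)
    synthetic_data = generator(z)

# Combine real and synthetic samples into an augmented training set.
augmented_data = torch.cat([real_data, synthetic_data], dim=0)
print(augmented_data.shape)  # torch.Size([2500, 784])
```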
Beyond these four applications, GAN use cases also include video enhancement, text-to-speech synthesis, style transfer, and deepfake generation, among others. More specifically, a 2024 review published in Mathematics examined the integration and impact of GANs in gene expression data analysis, focusing on the period between 2019 and 2023. It explains how GANs, through their capability to generate synthetic data that mirrors real data distributions, address significant challenges in gene expression studies, chiefly the constraints of data scarcity, ethical issues in data collection, and the need for diverse datasets.
The review particularly focuses on the role of GANs in augmenting gene expression datasets, allowing for enhanced analysis by circumventing the limitations posed by sparse and imbalanced datasets.
Furthermore, the review speaks to the innovative applications of GANs in generating synthetic genetic data, aiding in the exploration of genomic structures and functions with greater depth. That said, the review, along with other researchers who work closely with GAN architectures, acknowledges the challenges inherent in the use of GANs, such as:
- Mode collapse. This occurs when the generator starts producing a limited variety of outputs or even the same output repeatedly, rather than a diverse range of outputs that accurately reflects the distribution of the training data.
- Non-convergence. In this situation, the training process fails to reach a stable solution, with the generator and discriminator continuously oscillating or failing to improve over time.
- Instability. Training GANs can be highly unstable, with small changes to parameters, architecture, or the training process leading to significantly different outcomes. This instability makes it challenging to design and train GANs effectively.
The review ultimately points to the potential of GANs to advance the field of genomics and gene expression analysis, and suggests they may help overcome longstanding data-related challenges in other fields as well.
Why use GANs?
Since their inception, GANs have rapidly grown in sophistication and capability, and the pace of progress continues to accelerate. Data augmentation through GANs has moved beyond traditional image manipulations to become a robust methodology for generating high-quality, widely applicable data.
By synthesizing complex data that maintains the underlying characteristics of real datasets, GANs help in mitigating overfitting—thus improving the generalization capabilities of models on unseen data. Moreover, the advanced capabilities of GANs in generating multi-modal outputs have opened new avenues for research and application, including but not limited to sophisticated image and video synthesis, voice generation, and even the creation of virtual environments and realistic simulations for training autonomous systems.
The continuous improvement of GAN architectures, training stability, and efficiency contributes to advancing not only data augmentation techniques but also the broader field of artificial intelligence (AI).
Types of GANs
There are many types of GANs: CycleGAN, Super-Resolution GAN (SRGAN), Laplacian Pyramid GAN (LAPGAN), and so on. Each type is distinguished by its specific application or architectural modification—aimed at improving performance, efficiency, or output quality in different scenarios.
Among the most prevalent are vanilla GANs, conditional GANs, and deep convolutional GANs.
Vanilla GAN
A vanilla GAN is, as the name implies, the simplest architecture in GAN development, characterized by its straightforward generator and discriminator networks. It serves as a basic model for generating data, though it often requires enhancements for complex applications.
Conditional GAN (cGAN)
Conditional GANs introduce an element of control into the generative process by conditioning both the generator and discriminator on additional information, such as class labels, data from other modalities, or even textual descriptions. This allows for the targeted generation of data, making cGANs particularly useful in tasks requiring specific output characteristics.
For example, a cGAN could be used in fashion design to generate clothing items matching specific attributes or styles. By conditioning the model on textual descriptions, such as "winter coat" or "summer dress," the cGAN can produce images of clothing that fit the given descriptions. This, in turn, enables designers to visualize and refine new ideas based on season-specific requirements.
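A minimal sketch of the conditioning idea, again in PyTorch and showing only the generator side: the desired class label is embedded and concatenated with the latent vector so the output is steered toward that class. The number of classes, layer sizes, and label indices are illustrative assumptions:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 10  # e.g. ten clothing categories (assumed)

class ConditionalGenerator(nn.Module):
    """Generator conditioned on a class label."""
    def __init__(self, latent_dim=100, data_dim=28 * 28, embed_dim=32):
        super().__init__()
        self.label_embedding = nn.Embedding(NUM_CLASSES, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, data_dim),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # Concatenate the latent vector with an embedding of the desired label,
        # so the generated sample is steered toward that class.
        conditioned = torch.cat([z, self.label_embedding(labels)], dim=1)
        return self.net(conditioned)

gen = ConditionalGenerator()
z = torch.randn(4, 100)
labels = torch.tensor([0, 1, 2, 3])  # e.g. indices standing in for "winter coat", "summer dress", ...
samples = gen(z, labels)             # four samples, one per requested class
```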
Deep convolutional GAN (DCGAN)
Deep convolutional GANs enhance the generation of high-quality, detailed images by leveraging deep convolutional neural network (CNN) architectures. This integration improves the GAN's ability to capture and reproduce intricate patterns and textures, yielding highly effective image synthesis.
Practically, a DCGAN could be specifically used to generate detailed architectural designs. After being trained on a vast dataset of building images from various architectural styles, the DCGAN can synthesize new building designs that combine elements from the learned styles, producing innovative and detailed architectural visuals that maintain structural realism.
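For intuition, here is a sketch of a DCGAN-style generator in PyTorch that upsamples a latent vector into a 64x64 RGB image with transposed convolutions; the channel counts and output resolution are illustrative assumptions rather than a prescribed architecture:

```python
import torch
import torch.nn as nn

# The latent vector is treated as a 1x1 "feature map" with many channels and is
# repeatedly upsampled by transposed convolutions until it reaches image size.
dcgan_generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, kernel_size=4, stride=1, padding=0),  # 1x1  -> 4x4
    nn.BatchNorm2d(512), nn.ReLU(),
    nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),  # 4x4  -> 8x8
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 8x8  -> 16x16
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 16x16 -> 32x32
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # 32x32 -> 64x64
    nn.Tanh(),
)

z = torch.randn(1, 100, 1, 1)   # a single latent vector shaped as a 1x1 feature map
image = dcgan_generator(z)      # shape: (1, 3, 64, 64)
```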