
Generative Adversarial Networks

A Generative Adversarial Network (GAN) trains two neural networks in opposition: a Generator (G) that maps latent noise (optionally conditioned on input) into synthetic samples, and a Discriminator (D) that distinguishes real data from generated data and provides the gradients that drive G to improve. Training is a minimax game: D maximizes correct real/fake classification while G minimizes D's ability to detect fakes. The result is a model that can sample from a learned data distribution without explicit likelihood modeling.

This repo contains notebook implementations of multiple GAN families and a production-grade system design that describes how to operationalize generative models end-to-end (versioned data → training → evaluation gates → deployment → monitoring → retraining).

GAN Architecture Progression + Production Design

The notebooks trace an architecture progression from a minimal vanilla GAN up to StyleGAN2 + ADA. The production layer is a design spec, not an infra codebase (no Kubernetes manifests, CI pipelines, or serving service implementation).


What is a GAN

A GAN is a two-network minimax game:

  • Generator (G): learns a mapping from latent/condition input to a synthetic sample.
    G: (z, c) → x̂
  • Discriminator (D): learns a binary classifier over samples (real vs fake) and supplies gradients to G.
    D: x → [0, 1]

Objective

min_G max_D  E_{x~p_data}[log D(x)] + E_{z~p(z)}[log(1 − D(G(z)))]

Training loop

  • Discriminator step (maximize): increase D(x) for real data and decrease D(G(z)) for generated samples.
  • Generator step (minimize): update G to increase D(G(z)) while D is frozen.
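The two alternating steps above can be sketched in PyTorch. This is a minimal illustration, not the repo's code: the 2-D "real" distribution and layer sizes are made up, and the generator step uses the common non-saturating form (maximize log D(G(z)) rather than minimize log(1 − D(G(z)))), which gives stronger gradients early in training.

```python
import torch
import torch.nn as nn

# Illustrative sizes: 8-D latent, 2-D "data" drawn from a shifted Gaussian.
latent_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.LeakyReLU(0.2),
                  nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(100):
    real = torch.randn(64, data_dim) * 0.5 + 1.0   # stand-in "real" data
    z = torch.randn(64, latent_dim)
    fake = G(z)

    # Discriminator step: push D(real) toward 1 and D(G(z)) toward 0.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step: with D frozen, push D(G(z)) toward 1 (non-saturating loss).
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()
```

Note `fake.detach()` in the discriminator step: it blocks gradients from flowing into G while D updates, which is what "D is frozen / G is frozen" means in practice.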

Implication

GAN training is not likelihood-based. It is a coupled game. Loss curves are weak quality signals; use sample grids across checkpoints and multi-seed diversity as primary diagnostics.
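One concrete multi-seed diversity diagnostic (a sketch, not the repo's code): mean pairwise distance over a batch of generated samples, which drops toward zero under mode collapse and can be tracked across checkpoints alongside the sample grids.

```python
import numpy as np

def diversity_score(samples: np.ndarray) -> float:
    """Mean pairwise L2 distance across a batch of samples.

    A value near zero suggests mode collapse (all seeds produce
    near-identical outputs); compare the trend across checkpoints.
    """
    flat = samples.reshape(len(samples), -1)
    diffs = flat[:, None, :] - flat[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(-1)).mean())
```

Used on batches generated from different seeds at each checkpoint, this gives a cheap scalar to plot next to the loss curves.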


Project Summary

 #  Project              Architecture          Capability demonstrated                       Input → Output
 1  Slanted Land         Vanilla GAN           adversarial dynamics in the smallest setting  noise → 2×2 binary image
 2  Fake Faces           DCGAN                 convolutional synthesis + improved stability  noise → face image
 3  Frontalization       Pix2Pix (cGAN)        paired conditional translation                angled face → frontal face
 4  Text-to-Image        StackGAN Stage-I/II   text conditioning + staged refinement         caption embedding → image
 5  High-Res Synthesis   StyleGAN2 + ADA       high-res realism + small-data stabilization   latent → high-res image

Key design decisions

Vanilla GAN (2×2)
Focus: isolate the adversarial loop without convolutional complexity.

  • shallow MLP G/D
  • tiny binary image domain to visualize collapse and instability quickly

DCGAN
Focus: add convolutional inductive bias to stabilize image synthesis.

  • conv G/D
  • BatchNorm + ReLU / LeakyReLU
  • latent sampling + grid visualization as primary eval
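The conv G pattern can be sketched as follows. Channel sizes and the 16×16 single-channel output are illustrative assumptions; the notebook's actual dimensions will differ.

```python
import torch
import torch.nn as nn

# DCGAN-style generator: transposed convs + BatchNorm + ReLU, Tanh output.
G = nn.Sequential(
    nn.ConvTranspose2d(100, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(),  # 1x1 -> 4x4
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),    # 4x4 -> 8x8
    nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Tanh(),                          # 8x8 -> 16x16
)

z = torch.randn(8, 100, 1, 1)   # batch of latent vectors, shaped as 1x1 maps
imgs = G(z)                     # (8, 1, 16, 16), values in [-1, 1]
```

The Tanh output pairs with data rescaled to [-1, 1], the usual DCGAN convention.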

Pix2Pix (cGAN)
Focus: enforce conditional consistency using paired supervision.

  • G conditioned on source image
  • D judges realism + (source, output) consistency
  • inspect alignment artifacts (edges, symmetry, texture continuity)
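The conditional-discriminator idea above can be sketched as a PatchGAN-style judge over the channel-concatenated (source, output) pair; the layer sizes and 128×128 resolution here are assumptions, not the notebook's values.

```python
import torch
import torch.nn as nn

# D sees source and output stacked on the channel axis, so it can penalize
# outputs that look realistic but are inconsistent with the source image.
D = nn.Sequential(
    nn.Conv2d(3 + 3, 64, 4, 2, 1), nn.LeakyReLU(0.2),  # 6-channel pair -> 64 maps
    nn.Conv2d(64, 1, 4, 1, 1),                          # per-patch realism logits
)

source = torch.randn(2, 3, 128, 128)   # e.g. angled face
output = torch.randn(2, 3, 128, 128)   # e.g. generated frontal face
logits = D(torch.cat([source, output], dim=1))  # one logit per local patch
```

Judging patches rather than whole images is what makes this discriminator sensitive to the local alignment artifacts (edges, symmetry, texture continuity) mentioned above.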

StackGAN Stage-I/II
Focus: split the problem into layout then refinement.

  • Stage-I: coarse structure from text embedding
  • Stage-II: refine into higher fidelity
  • primary risk: text-image mismatch; inspect adherence vs realism trade-off
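The two-stage split can be sketched shape-wise as follows; the 1024-d caption embedding, 128-d condition, and 64→256 resolutions are assumptions chosen for illustration, and the real networks are far deeper.

```python
import torch
import torch.nn as nn

# Stage-I: caption embedding + noise -> coarse 64x64 layout.
embed = nn.Linear(1024, 128)                              # compress text embedding
stage1 = nn.Sequential(nn.Linear(128 + 100, 64 * 64 * 3), nn.Tanh())

cond = embed(torch.randn(2, 1024))                        # fake caption embeddings
z = torch.randn(2, 100)
coarse = stage1(torch.cat([cond, z], dim=1)).view(2, 3, 64, 64)

# Stage-II: refine the coarse image into a higher-resolution output.
refine = nn.Sequential(
    nn.Conv2d(3, 32, 3, 1, 1), nn.ReLU(),
    nn.Upsample(scale_factor=4), nn.Conv2d(32, 3, 3, 1, 1), nn.Tanh())
fine = refine(coarse)   # (2, 3, 256, 256)
```

The key design point survives even at this toy scale: Stage-I owns layout from the text condition, Stage-II owns fidelity.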

StyleGAN2 + ADA
Focus: photorealistic high-res synthesis with small-data stability.

  • style-based generator (controllable latent transformations)
  • ADA adjusts augmentation strength to reduce discriminator overfit
  • showcase: style mixing and latent interpolation panels
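The ADA feedback loop can be sketched as a scalar controller. This follows the published heuristic r_t = E[sign(D(x_real))] as an overfit signal, but the target value and step size below are illustrative, not the notebook's settings.

```python
import numpy as np

def update_ada_p(p: float, d_real_logits: np.ndarray,
                 target: float = 0.6, step: float = 0.01) -> float:
    """Nudge augmentation probability p based on discriminator overfitting.

    r_t = E[sign(D(x_real))] rises toward 1 as D grows overconfident on
    real data; raise p when r_t exceeds the target, lower it otherwise.
    """
    r_t = float(np.sign(d_real_logits).mean())
    p += step if r_t > target else -step
    return min(max(p, 0.0), 1.0)   # keep p a valid probability
```

Because p adapts to the overfit signal rather than being fixed, small datasets get strong augmentation only when the discriminator actually needs it.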
[Diagram: gan_system_design.png]

System design

The design doc frames GAN work as a full ML lifecycle:

  • versioned data + artifacts
  • training + validation gates
  • orchestration + CI/CD
  • GPU serving + autoscaling
  • monitoring + drift triggers
  • security (IAM/VPC/KMS/secrets)
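A validation gate from the lifecycle above might look like the following sketch; the metric names and thresholds are hypothetical placeholders, not values from the design doc.

```python
# Promote a checkpoint only if every quality gate passes; otherwise keep
# the previous model serving and flag the run for review.
GATES = {
    "fid": lambda v: v <= 25.0,             # realism (lower is better)
    "diversity": lambda v: v >= 0.1,        # multi-seed sample spread
    "condition_match": lambda v: v >= 0.8,  # e.g. caption-image agreement
}

def passes_gates(metrics: dict) -> tuple[bool, list]:
    """Return (ok, failed_gate_names) for a dict of evaluated metrics."""
    failures = [name for name, ok in GATES.items() if not ok(metrics[name])]
    return len(failures) == 0, failures

ok, failed = passes_gates({"fid": 18.2, "diversity": 0.3, "condition_match": 0.9})
```

Encoding gates as named predicates keeps the promote/block decision auditable: a failed deployment reports which gate tripped, which feeds the monitoring and retraining triggers listed above.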