MNIST Diffusion Baseline

Status: Complete
Type: Baseline

Objective

Establish a working baseline for image diffusion using the standard DDPM approach on MNIST digits. This validates our diffusion implementation before we move to text modalities.

Configuration

  • Model: SimpleUNet with residual blocks and temporal embeddings (a minimal sketch follows this list)
  • Training: Standard DDPM on the MNIST dataset
  • Architecture:
      • UNet backbone with down/up sampling
      • Temporal embedding for timestep conditioning
      • Residual connections throughout
  • Dataset: MNIST handwritten digits (28×28 grayscale)
  • Hardware: NVIDIA T4
  • Git Commit: 4422ce927fbf61e226157e4a3f2ac8de91b583bb
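
The SimpleUNet itself lives in the repository at the commit above. As a reference for the two components named in the list, here is a minimal PyTorch sketch of a sinusoidal temporal embedding and a time-conditioned residual block; names and dimensions are illustrative, not the repository's exact code:

```python
import math
import torch
import torch.nn as nn

def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal embedding of integer timesteps: (batch,) -> (batch, dim)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

class ResidualBlock(nn.Module):
    """Conv block with a residual connection, conditioned on the timestep embedding."""

    def __init__(self, channels: int, temb_dim: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.temb_proj = nn.Linear(temb_dim, channels)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor, temb: torch.Tensor) -> torch.Tensor:
        h = self.act(self.conv1(x))
        h = h + self.temb_proj(temb)[:, :, None, None]  # inject t as a per-channel bias
        h = self.conv2(self.act(h))
        return x + h                                    # residual connection
```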

Hypothesis

Standard DDPM should work well for MNIST generation, providing a solid foundation for understanding diffusion mechanics before tackling text generation challenges.
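
Concretely, the mechanics being validated are the standard DDPM recipe: corrupt a clean image with the closed-form forward process x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε at a random timestep t, and train the network to predict ε. A minimal sketch of one training step, assuming the usual DDPM defaults (T = 1000 steps, linear β schedule; the experiment's actual hyperparameters are not recorded here):

```python
import torch
import torch.nn.functional as F

T = 1000                                       # assumed: DDPM default step count
betas = torch.linspace(1e-4, 0.02, T)          # assumed: linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product ᾱ_t

def training_step(model, x0: torch.Tensor) -> torch.Tensor:
    """One DDPM step: noise a clean batch at a random t, predict the noise, MSE."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    eps = torch.randn_like(x0)
    ab = alpha_bar.to(x0.device)[t][:, None, None, None]
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps  # closed-form forward process
    return F.mse_loss(model(x_t, t), eps)           # ε-prediction objective
```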

Results

Quantitative

  • Training converged successfully

Qualitative

  • Generated samples are recognizable MNIST digits
  • Clear progression from noise to structured digits during reverse process

The training progression shows the model learning to generate increasingly coherent MNIST digits:

Early Training (Epochs 1-3)

  • Epoch 1: Initial noise, barely recognizable patterns
  • Epoch 2: Some digit-like shapes emerging
  • Epoch 3: More defined structures appearing

Training Progression (Epochs 100-1000)

  • Epoch 100: Clear digit shapes, some noise remaining
  • Epoch 500: Well-formed digits, improved clarity
  • Epoch 1000: High-quality, recognizable MNIST digits

Key Observations:

  • Epochs 1-3: Model learns basic structure and shape concepts
  • Epoch 100: Recognizable digits with some artifacts
  • Epochs 500-1000: Converged to high-quality digit generation

Next Steps

  • Move to text diffusion experiments using similar architecture principles
  • Investigate embedding space approaches for text generation

Sample Generation:

```
uv run python -m src.mnist --sample
```
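
For reference, the --sample path in a standard DDPM implementation runs ancestral sampling: start from pure Gaussian noise and apply the learned reverse step T times. A minimal sketch under the same assumed schedule as above (illustrative, not the repository's exact code):

```python
import torch

T = 1000  # assumed: DDPM default step count, matching the training sketch above

@torch.no_grad()
def sample(model, shape=(16, 1, 28, 28), device="cpu"):
    """Ancestral DDPM sampling: start from noise, denoise step by step."""
    betas = torch.linspace(1e-4, 0.02, T, device=device)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = model(x, t_batch)  # predicted noise ε_θ(x_t, t)
        # Mean of the reverse step p(x_{t-1} | x_t) given the predicted noise.
        x = (x - (1 - alphas[t]) / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:  # add fresh noise at every step except the final one
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x
```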