MNIST Diffusion Baseline

Status: Complete
Type: Baseline

Objective

Establish a working baseline for image diffusion using the standard DDPM approach on MNIST digits. This validates our diffusion implementation before we move to text modalities.

Configuration

  • Model: SimpleUNet with residual blocks and temporal embeddings (a minimal sketch follows this list)
  • Training: Standard DDPM on the MNIST dataset
  • Architecture:
      • UNet backbone with down/up sampling
      • Temporal embedding for timestep conditioning
      • Residual connections throughout
  • Dataset: MNIST handwritten digits (28×28 grayscale)
  • Hardware: NVIDIA T4
  • Git Commit: 4422ce927fbf61e226157e4a3f2ac8de91b583bb
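
The SimpleUNet itself lives in the repository at the commit above. As a reference for the two components named in the list, here is a minimal PyTorch sketch of a sinusoidal temporal embedding and a time-conditioned residual block; names and dimensions are illustrative, not the repository's exact code:

```python
import math
import torch
import torch.nn as nn

def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal embedding of integer timesteps: (batch,) -> (batch, dim)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

class ResidualBlock(nn.Module):
    """Conv block with a residual connection, conditioned on the timestep embedding."""

    def __init__(self, channels: int, temb_dim: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.temb_proj = nn.Linear(temb_dim, channels)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor, temb: torch.Tensor) -> torch.Tensor:
        h = self.act(self.conv1(x))
        h = h + self.temb_proj(temb)[:, :, None, None]  # inject t as a per-channel bias
        h = self.conv2(self.act(h))
        return x + h                                    # residual connection
```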

Hypothesis

Standard DDPM should work well for MNIST generation, providing a solid foundation for understanding diffusion mechanics before tackling text generation challenges.
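
Concretely, the mechanics being validated are the standard DDPM recipe: corrupt a clean image with the closed-form forward process x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε at a random timestep t, and train the network to predict ε. A minimal sketch of one training step, assuming the usual DDPM defaults (T = 1000 steps, linear β schedule; the experiment's actual hyperparameters are not recorded here):

```python
import torch
import torch.nn.functional as F

T = 1000                                       # assumed: DDPM default step count
betas = torch.linspace(1e-4, 0.02, T)          # assumed: linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product ᾱ_t

def training_step(model, x0: torch.Tensor) -> torch.Tensor:
    """One DDPM step: noise a clean batch at a random t, predict the noise, MSE."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    eps = torch.randn_like(x0)
    ab = alpha_bar.to(x0.device)[t][:, None, None, None]
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps  # closed-form forward process
    return F.mse_loss(model(x_t, t), eps)           # ε-prediction objective
```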

Results

Quantitative

  • Training converged successfully

Qualitative

  • Generated samples are recognizable MNIST digits
  • Clear progression from noise to structured digits during reverse process

The training progression shows the model learning to generate increasingly coherent MNIST digits:

Early Training (Epochs 1-3)

  • Epoch 1: Initial noise, barely recognizable patterns
  • Epoch 2: Some digit-like shapes emerging
  • Epoch 3: More defined structures appearing

Training Progression (Epochs 100-1000)

  • Epoch 100: Clear digit shapes, some noise remaining
  • Epoch 500: Well-formed digits, improved clarity
  • Epoch 1000: High-quality, recognizable MNIST digits

Key Observations:

  • Epochs 1-3: Model learns basic structure and shape concepts
  • Epoch 100: Recognizable digits with some artifacts
  • Epochs 500-1000: Converged to high-quality digit generation

Next Steps

  • Move to text diffusion experiments using similar architecture principles
  • Investigate embedding space approaches for text generation

Sample Generation:

```
uv run python -m src.mnist --sample
```
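
For reference, the --sample path in a standard DDPM implementation runs ancestral sampling: start from pure Gaussian noise and apply the learned reverse step T times. A minimal sketch under the same assumed schedule as above (illustrative, not the repository's exact code):

```python
import torch

T = 1000  # assumed: DDPM default step count, matching the training sketch above

@torch.no_grad()
def sample(model, shape=(16, 1, 28, 28), device="cpu"):
    """Ancestral DDPM sampling: start from noise, denoise step by step."""
    betas = torch.linspace(1e-4, 0.02, T, device=device)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = model(x, t_batch)  # predicted noise ε_θ(x_t, t)
        # Mean of the reverse step p(x_{t-1} | x_t) given the predicted noise.
        x = (x - (1 - alphas[t]) / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:  # add fresh noise at every step except the final one
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x
```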