🧪 ResNet vs Plain CNN on CIFAR-10 (Degradation Study)

This repository contains implementations, trained weights, and experiment notebooks comparing plain convolutional networks and residual networks on the CIFAR-10 dataset, demonstrating the degradation problem and how residual connections mitigate it.

📋 Contents

  • checkpoints/ – trained model weights: Plain20, Plain56, ResNet20, ResNet56
  • notebooks/ – Jupyter notebooks covering architectures, training, evaluation, and comparisons
  • results/ – performance plots (accuracy, loss curves, degradation behaviour)
  • README.md – this file

🧠 Experiment Summary

Motivation

As network depth increases, plain convolutional nets may suffer increased training error, not just increased test error (the "degradation" problem). Residual networks (ResNets) mitigate this with identity skip connections: each block only has to learn a residual correction F(x), and the block output is F(x) + x.
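For intuition, here is a minimal sketch of a CIFAR-style basic block (not necessarily the exact block defined in this repo's models module; stride-1, equal-channel case only). With the shortcut enabled the block outputs F(x) + x; setting use_shortcut=False gives the corresponding plain block.

import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Two 3x3 convs; with the identity shortcut the block learns a residual F(x)."""
    def __init__(self, channels, use_shortcut=True):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.use_shortcut = use_shortcut

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if self.use_shortcut:
            out = out + x  # identity skip connection
        return F.relu(out)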

Models

  • Plain20 – plain CNN, 20 layers
  • Plain56 – plain CNN, 56 layers (demonstrates degradation)
  • ResNet20 – residual network, 20 layers
  • ResNet56 – residual network, 56 layers (benefits from the added depth)

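The 20- and 56-layer depths match the CIFAR-style 6n + 2 scheme from He et al. (2016): three stages of 2n convolutional layers each, plus the initial convolution and the final fully connected layer (assuming this repo follows that layout).

def cifar_depth(n):
    # 3 stages x 2n conv layers + initial conv + final fully connected layer
    return 6 * n + 2

assert cifar_depth(3) == 20  # Plain20 / ResNet20
assert cifar_depth(9) == 56  # Plain56 / ResNet56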
Dataset

  • CIFAR-10: 10 classes, 32×32 colour images

Key Findings

  • Plain56 shows higher training/test error than Plain20 (degradation).
  • ResNet56 trains and generalises better than Plain56, illustrating the value of skip-connections.
  • Detailed curves and comparisons are available in notebooks/ and results/.

βš™οΈ How to Use

import torch
from huggingface_hub import hf_hub_download
from models import create_model

repo_id = "arpit-gour02/resnet-vs-plainnets-cifar10"

ckpt = hf_hub_download(repo_id=repo_id, filename="resnet56.pth")  # download the checkpoint from the Hub
model = create_model("resnet56", num_classes=10)  # architecture factory from this repo's models module
state_dict = torch.load(ckpt, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Example inference
x = torch.randn(1, 3, 32, 32)
logits = model(x)
pred = logits.argmax(dim=1).item()
print("Predicted class:", pred)
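To sanity-check a loaded checkpoint against the CIFAR-10 test set, a sketch along these lines works (it assumes torchvision is installed, reuses the model loaded above, and uses the normalization statistics listed under Data Preprocessing):

import torch
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

test_tf = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
test_set = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=test_tf)
test_loader = DataLoader(test_set, batch_size=256, shuffle=False)

correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {correct / total:.4f}")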

βš™οΈ Training Configuration

The training setup strictly follows the original ResNet paper (He et al., 2016, "Deep Residual Learning for Image Recognition") to ensure a fair comparison between the plain and residual networks; the snippet after the table sketches the corresponding PyTorch setup.

  • Dataset: CIFAR-10
  • Batch Size: 128
  • Optimizer: SGD (Stochastic Gradient Descent)
  • Initial Learning Rate: 0.1
  • Momentum: 0.9
  • Weight Decay: 0.0001 ($10^{-4}$)
  • Total Epochs: ~164 (64k iterations)
  • Initialization: He Normal (kaiming_normal_)
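In PyTorch terms this corresponds roughly to the following (a sketch, reusing model from the usage snippet above; whether the repo passes a specific mode to kaiming_normal_ is not stated, so the call sticks to defaults):

import torch.nn as nn
import torch.optim as optim

# He-normal initialization for the convolutional layers
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight)  # default fan_in; the repo's exact mode is an assumption

optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)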

Learning Rate Schedule

We use a MultiStepLR scheduler to drop the learning rate by a factor of 10 at fixed epoch milestones (sketched after the list):

  • Epochs 0–81: lr = 0.1
  • Epochs 82–122: lr = 0.01
  • Epochs 123–end: lr = 0.001
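With torch.optim.lr_scheduler.MultiStepLR this corresponds to milestones at epochs 82 and 123 with gamma=0.1, continuing from the optimizer sketch above:

from torch.optim.lr_scheduler import MultiStepLR

scheduler = MultiStepLR(optimizer, milestones=[82, 123], gamma=0.1)

for epoch in range(164):
    # ... run one training epoch over the CIFAR-10 training loader ...
    scheduler.step()  # lr: 0.1 -> 0.01 at epoch 82, -> 0.001 at epoch 123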

πŸ–ΌοΈ Data Preprocessing

Standard data augmentation is applied to prevent overfitting on the small CIFAR-10 images (a torchvision sketch follows the list):

  1. Normalization: Per-channel mean subtraction and division by standard deviation.
    • Mean: (0.4914, 0.4822, 0.4465)
    • Std: (0.2023, 0.1994, 0.2010)
  2. Padding: Pad 4 pixels on each side (image becomes $40 \times 40$).
  3. Random Crop: Crop back to $32 \times 32$.
  4. Horizontal Flip: Applied with probability 0.5.
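In torchvision terms the training transform is roughly the following (a sketch; in the usual pipeline the crop and flip are applied before normalization):

import torchvision.transforms as T

train_tf = T.Compose([
    T.RandomCrop(32, padding=4),        # pad to 40x40, then crop back to 32x32
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])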