🧪 ResNet vs Plain CNN on CIFAR-10 (Degradation Study)
This repository contains implementations, trained weights, and experiment notebooks comparing plain convolutional networks and residual networks on the CIFAR-10 dataset, demonstrating the degradation problem and how residual connections mitigate it.
📂 Contents
- `checkpoints/` – trained model weights: Plain20, Plain56, ResNet20, ResNet56
- `notebooks/` – Jupyter notebooks covering architectures, training, evaluation, and comparisons
- `results/` – performance plots (accuracy, loss curves, degradation behaviour)
- `README.md` – this file
🧠 Experiment Summary
Motivation
As network depth increases, plain convolutional nets may suffer increased training error (the "degradation" problem). Residual networks (ResNets) with skip-connections address this problem.
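For reference, below is a minimal sketch of the kind of residual basic block used in these CIFAR networks (two 3×3 conv + batch-norm layers with a shortcut). The projection shortcut for shape changes is one common choice and may differ from the exact implementation in this repository's `models.py`.

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Residual block: out = ReLU(F(x) + shortcut(x))."""
    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        # Shortcut is the identity unless the shape changes (stride or channel count);
        # a 1x1 projection is used here, which is an assumption, not necessarily this repo's choice.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes, 1, stride=stride, bias=False),
                nn.BatchNorm2d(planes),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)  # the skip-connection
        return F.relu(out)
```

A "plain" block is the same module with the `out + self.shortcut(x)` addition removed, which is precisely the difference this study isolates.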
Models
- Plain20 – plain CNN, 20 layers
- Plain56 – plain CNN, 56 layers (demonstrates degradation)
- ResNet20 – residual network, 20 layers
- ResNet56 – residual network, 56 layers (benefits from the added depth; layer counts are explained below)
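All four depths follow the CIFAR layer-count scheme from the paper: with $n$ two-layer blocks in each of the three feature-map stages, plus the initial convolution and the final classifier layer, a network has $6n + 2$ weighted layers, so $n = 3$ gives 20 layers and $n = 9$ gives 56.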
Dataset
- CIFAR-10: 10 classes, 32×32 colour images
Key Findings
- Plain56 shows higher training/test error than Plain20 (degradation).
- ResNet56 trains and generalises better than Plain56, illustrating the value of skip-connections.
- Detailed curves and comparisons are in `notebooks/` and `results/`.
⚙️ How to Use
```python
import torch
from huggingface_hub import hf_hub_download
from models import create_model  # model factory from this repository

repo_id = "arpit-gour02/resnet-vs-plainnets-cifar10"

# Download a checkpoint from the Hub and load it into the matching architecture.
ckpt = hf_hub_download(repo_id=repo_id, filename="resnet56.pth")
model = create_model("resnet56", num_classes=10)
state_dict = torch.load(ckpt, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Example inference on a random 32x32 RGB input.
x = torch.randn(1, 3, 32, 32)
logits = model(x)
pred = logits.argmax(dim=1).item()
print("Predicted class:", pred)
```
⚙️ Training Configuration
The training setup follows the original ResNet paper (He et al., "Deep Residual Learning for Image Recognition") so that the plain and residual networks are compared under identical conditions. The table below summarizes the hyperparameters; a sketch of the corresponding PyTorch setup follows it.
| Hyperparameter | Value |
|---|---|
| Dataset | CIFAR-10 |
| Batch Size | 128 |
| Optimizer | SGD (Stochastic Gradient Descent) |
| Initial Learning Rate | 0.1 |
| Momentum | 0.9 |
| Weight Decay | 0.0001 ($10^{-4}$) |
| Total Epochs | ~164 (64k iterations) |
| Initialization | He normal (`kaiming_normal_`) |
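A sketch of the optimizer and weight initialization under these settings is shown below; the `fan_out` mode and the batch-norm constants are common defaults and may differ from the exact code in `notebooks/`.

```python
import torch
import torch.nn as nn

def init_weights(m):
    # He (Kaiming) normal initialization for conv layers, as listed in the table above.
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1.0)
        nn.init.constant_(m.bias, 0.0)

model.apply(init_weights)  # `model` is any of the four networks

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,            # initial learning rate
    momentum=0.9,
    weight_decay=1e-4,
)
```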
Learning Rate Schedule
We use a MultiStepLR scheduler to drop the learning rate by a factor of 10 at fixed epoch milestones (roughly where the training loss plateaus); the equivalent PyTorch call is shown after the list.
- Epochs 0–81: `lr = 0.1`
- Epochs 82–122: `lr = 0.01`
- Epochs 123–end: `lr = 0.001`
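In PyTorch this schedule corresponds to a `MultiStepLR` stepped once per epoch (a sketch; `train_one_epoch` is a hypothetical training helper):

```python
from torch.optim.lr_scheduler import MultiStepLR

# Drop the LR by 10x entering epoch 82 and epoch 123 (0-indexed epochs).
scheduler = MultiStepLR(optimizer, milestones=[82, 123], gamma=0.1)

for epoch in range(164):
    train_one_epoch(model, train_loader, optimizer)  # hypothetical helper, not part of this repo
    scheduler.step()
```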
🖼️ Data Preprocessing
Standard data augmentation is applied to prevent overfitting on the small CIFAR-10 images; a `torchvision` sketch of the pipeline follows the list.
- Normalization: per-channel mean subtraction and division by the standard deviation.
  - Mean: `(0.4914, 0.4822, 0.4465)`
  - Std: `(0.2023, 0.1994, 0.2010)`
- Padding: pad 4 pixels on each side (image becomes $40 \times 40$).
- Random crop: crop back to $32 \times 32$.
- Horizontal flip: applied with probability 0.5.
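A `torchvision` version of this training pipeline might look like the following sketch (the exact transform order in `notebooks/` may differ):

```python
from torchvision import datasets, transforms

train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # pad to 40x40, then crop back to 32x32
    transforms.RandomHorizontalFlip(),      # flips with probability 0.5 by default
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=train_tf)
```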