---
language:
- en
license: mit
tags:
- handwriting-recognition
- ocr
- computer-vision
- pytorch
- crnn
- ctc
- iam-dataset
library_name: pytorch
datasets:
- Teklia/IAM-line
metrics:
- cer
- wer
model-index:
- name: handwriting-recognition-iam
  results:
  - task:
      type: image-to-text
      name: Handwriting Recognition
    dataset:
      name: IAM Handwriting Database
      type: Teklia/IAM-line
    metrics:
    - type: cer
      value: 0.1295
      name: Character Error Rate
    - type: wer
      value: 0.4247
      name: Word Error Rate
---
# Handwriting Recognition
A complete handwriting recognition system using a CNN-BiLSTM-CTC model trained on the IAM dataset.
## Files
### 1. **analysis.ipynb** - Dataset Analysis
- Exploratory Data Analysis (EDA)
- 5 detailed charts saved to the `charts/` folder
- Run locally or on Colab (no GPU needed)
### 2. **train_colab.ipynb** - Model Training (GPU)
- **Google Colab GPU compatible**
- Full training pipeline
- CNN-BiLSTM-CTC model (~9.1M parameters)
- Automatic model saving
- Download trained model for deployment
## Quick Start
### Option 1: Analyze Dataset (Local/Colab)
```bash
jupyter notebook analysis.ipynb
```
- No GPU needed
- Generates 5 EDA charts
- Fast (~2 minutes)
### Option 2: Train Model (Google Colab GPU)
1. **Upload `train_colab.ipynb` to Google Colab**
2. **Change runtime to GPU:**
   - Runtime → Change runtime type → GPU (T4 recommended)
3. **Run all cells**
4. **Download trained model** (last cell)
**Training Time:** ~1-2 hours for 20 epochs on a T4 GPU
## Charts Generated
From `analysis.ipynb`:
1. `charts/01_sample_images.png` - 10 sample handwritten texts
2. `charts/02_text_length_distribution.png` - Text statistics
3. `charts/03_image_dimensions.png` - Image analysis
4. `charts/04_character_frequency.png` - Character distribution
5. `charts/05_summary_statistics.png` - Summary table
## Model Details
**Architecture** (a PyTorch sketch follows the list):
- **CNN**: 7 convolutional blocks (feature extraction)
- **BiLSTM**: 2 layers, 256 hidden units (sequence modeling)
- **CTC Loss**: Alignment-free training
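A minimal PyTorch sketch of this stack, for orientation only: the channel widths and pooling layout are illustrative assumptions (they will not reproduce the ~9.1M parameter count exactly), and the authoritative definition lives in `train_colab.ipynb`. The key idea is that the CNN squeezes image height while mostly preserving width, so width becomes the time axis the BiLSTM and CTC operate over:
```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """CNN feature extractor -> 2-layer BiLSTM -> per-timestep character logits."""

    def __init__(self, num_chars, hidden=256):
        super().__init__()
        chans = [1, 64, 128, 256, 256, 512, 512, 512]  # 7 conv blocks (assumed widths)
        blocks = []
        for i in range(7):
            blocks += [
                nn.Conv2d(chans[i], chans[i + 1], 3, padding=1),
                nn.BatchNorm2d(chans[i + 1]),
                nn.ReLU(inplace=True),
            ]
            if i < 4:
                # Halve height four times (64 -> 4) but width only twice,
                # so width survives as a long time axis for CTC.
                blocks.append(nn.MaxPool2d((2, 2 if i < 2 else 1)))
        self.cnn = nn.Sequential(*blocks)
        self.rnn = nn.LSTM(512 * 4, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(hidden * 2, num_chars + 1)  # +1 for the CTC blank

    def forward(self, x):                      # x: (B, 1, 64, W) grayscale lines
        f = self.cnn(x)                        # (B, 512, 4, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)   # (B, W/4, 512*4): width as time
        out, _ = self.rnn(f)
        return self.fc(out)                    # (B, T, num_chars + 1) logits

# Smoke test, plus the shape convention nn.CTCLoss expects: (T, B, C) log-probs
model = CRNN(num_chars=79)
logits = model(torch.randn(2, 1, 64, 256))     # -> (2, 64, 80)
log_probs = logits.log_softmax(2).permute(1, 0, 2)
criterion = nn.CTCLoss(blank=0, zero_infinity=True)
```
CTC is what makes training alignment-free: the loss marginalizes over every way of aligning the T output steps to the target character string, so no per-character bounding boxes are needed.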
**Dataset:** Teklia/IAM-line (Hugging Face; loading snippet after the list)
- Train: 6,482 samples
- Validation: 976 samples
- Test: 2,915 samples
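Loading the data is one call with the `datasets` library; the `image`/`text` column names below follow the dataset card and are worth double-checking:
```python
from datasets import load_dataset

ds = load_dataset('Teklia/IAM-line')   # splits: train / validation / test
print({s: len(ds[s]) for s in ds})     # should match the counts above
ex = ds['train'][0]
print(ex['text'])                      # transcription string
ex['image']                            # PIL image of the handwritten line
```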
**Metrics** (a toy `jiwer` example follows the list):
- **CER** (Character Error Rate)
- **WER** (Word Error Rate)
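Both are edit distances normalized by reference length, at character and word granularity respectively. `jiwer` (pinned in the requirements) computes them directly; a quick check with hypothetical strings:
```python
import jiwer

ref = 'a move to stop mr. gaitskell'   # ground-truth transcription
hyp = 'a mave to stop mr gaitskell'    # model output with two small errors
print(jiwer.cer(ref, hyp))             # character edits / reference characters
print(jiwer.wer(ref, hyp))             # word edits / reference words
```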
## Model Files
After training in Colab:
- `best_model.pth` - Trained model weights
- `training_history.png` - Loss/CER/WER plots
- `predictions.png` - Sample predictions
## Requirements
```
torch>=2.0.0
datasets>=2.14.0
pillow>=9.5.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.13.0
jupyter>=1.0.0
jiwer>=3.0.0
```
## Usage
### Load Trained Model
```python
import torch

# Load the checkpoint (weights and character mapper are saved together)
checkpoint = torch.load('best_model.pth', map_location='cpu')
char_mapper = checkpoint['char_mapper']

# Recreate the model: copy the CRNN class out of train_colab.ipynb first
model = CRNN(num_chars=len(char_mapper.chars))
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Predict
# ... (preprocessing + inference; see the sketch below)
```
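A sketch of the elided steps, continuing from the block above. It assumes 64-pixel-tall grayscale input, blank index 0, and that `char_mapper.chars` lists the non-blank characters in class order; check all three against `train_colab.ipynb`:
```python
import numpy as np
import torch
from PIL import Image

def preprocess(path, height=64):
    # Grayscale, fixed height, values scaled to [0, 1] (assumed pipeline)
    img = Image.open(path).convert('L')
    width = max(1, round(img.width * height / img.height))
    img = img.resize((width, height))
    x = torch.from_numpy(np.asarray(img, dtype=np.float32) / 255.0)
    return x.unsqueeze(0).unsqueeze(0)  # (1, 1, H, W)

def greedy_ctc_decode(logits, chars, blank=0):
    # Standard greedy CTC decoding: collapse repeats, then drop blanks
    ids = logits.argmax(-1).squeeze(0).tolist()
    out, prev = [], blank
    for i in ids:
        if i != prev and i != blank:
            out.append(chars[i - 1])  # assumes chars[k] maps to class k + 1
        prev = i
    return ''.join(out)

with torch.no_grad():
    logits = model(preprocess('line.png'))  # (1, T, num_chars + 1)
    print(greedy_ctc_decode(logits, char_mapper.chars))
```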
## Notes
- **GPU strongly recommended** for training (use Colab T4)
- Training on CPU will be extremely slow (~20x slower)
- Colab free tier: 12-hour limit, sufficient for 20 epochs
- Model checkpoint includes character mapper for deployment
## Training Tips
1. **Start with fewer epochs** (5-10) to test
2. **Monitor CER/WER** - stop if not improving
3. **Increase epochs** if still improving (up to 50)
4. **Save checkpoint** before Colab disconnects (see the sketch after this list)
5. **Download model immediately** after training
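For tips 4-5, a sketch using the same checkpoint keys the Usage section reads back; the `epoch` and optimizer fields are assumed extras that make resuming easier:
```python
import torch
from google.colab import files  # Colab-only helper

# One file carries everything deployment needs: weights + character mapper
torch.save({
    'model_state_dict': model.state_dict(),
    'char_mapper': char_mapper,
    'epoch': epoch,                                  # assumed extra, for resuming
    'optimizer_state_dict': optimizer.state_dict(),  # assumed extra, for resuming
}, 'best_model.pth')

files.download('best_model.pth')  # pull it to your machine right away
```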
## License
Code: MIT (per the metadata above). Dataset: IAM Handwriting Database (free for non-commercial research use).