---
tags:
- fraud-detection
- ensemble-learning
- e-commerce
- imbalanced-data
license: mit
metrics:
- accuracy
- precision
- recall
- f1
- auc
---

# E-Commerce Fraud Detection Model

## Model Description

This is an ensemble fraud detection system trained on 1.47M e-commerce transactions, of which 5.01% are fraudulent.

### Architecture

**Weighted Ensemble Strategy (70%-30%)**

- **Stage 1 - Recall Specialists (70% weight):** Logistic Regression + Random Forest
- **Stage 2 - Precision Specialists (30% weight):** Neural Network + XGBoost

### Performance Metrics

| Model | Accuracy | Precision | Recall | F1-Score | AUC-ROC |
|-------|----------|-----------|--------|----------|---------|
| Logistic Regression | 0.5723 | 0.0988 | 0.9273 | 0.1786 | 0.8619 |
| Random Forest | 0.6203 | 0.1075 | 0.8999 | 0.1920 | 0.8712 |
| Neural Network | 0.9569 | 0.7013 | 0.2442 | 0.3623 | 0.8748 |
| XGBoost | 0.9558 | 0.6632 | 0.2389 | 0.3513 | 0.8459 |
| Stacking Ensemble | 0.8973 | 0.2640 | 0.5868 | 0.3642 | 0.8731 |

### Key Features

- **52 engineered features** including:
  - Transaction patterns (amount, quantity, frequency)
  - Customer behavior (account age, transaction history)
  - Temporal features (time-based patterns)
  - Risk indicators (unusual patterns, high-value flags)
  - Interaction features (multi-dimensional risk signals)

### Training

- **Resampling:** ADASYN (1:1 balance)
- **GPU Acceleration:** RAPIDS cuML, PyTorch, XGBoost
- **Threshold Optimization:** F-beta score optimization
- **Validation:** Stratified K-Fold Cross-Validation

### Usage

**Warning:** A GPU environment with CUDA installed is required.

```python
import joblib
import numpy as np

# Load the trained models and the feature scaler
lr_model = joblib.load("lr_model.pkl")
rf_model = joblib.load("rf_model.pkl")
nn_model = joblib.load("nn_model.pkl")
xgb_model = joblib.load("xgb_model.pkl")
ensemble_model = joblib.load("ensemble_model.pkl")
scaler = joblib.load("scaler.pkl")

# Prepare your data
df = ...

# Features are every column except the target label.
# Note: Index.difference returns the column names in sorted order.
X = df[df.columns.difference(['Is Fraudulent'])].copy()
y = df['Is Fraudulent'].copy()

# Predict with ensemble
fraud_proba = ensemble_model.predict_proba(X)[:, 1]  # probability of the fraud class
fraud_pred = ensemble_model.predict(X)               # hard 0/1 labels

# Evaluate predictions
# evaluate_models is not defined in this card; supply your own helper with this signature.
evaluate_models(
    [lr_model, rf_model, nn_model, xgb_model, ensemble_model],
    X,
    y,
    ['Logistic Regression', 'Random Forest', 'Neural Network', 'XGBoost', 'Stacking Ensemble'],
)
```

### License

MIT License

### Contact

COMPSCI 4AL3 - Group 34

- Viransh Shah (shahv47@mcmaster.ca)
- Ellen Xiong (xionge1@mcmaster.ca)
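
### Example: Weighted Blending and Threshold Selection

The Architecture and Training sections mention a 70%-30% stage weighting and F-beta threshold optimization without showing how they fit together. The following is a minimal, dependency-free sketch of one way this could work, not the project's actual implementation. The function names (`blend_probabilities`, `fbeta`, `best_threshold`), the choice of `beta=2.0`, the 0.01 threshold grid, and the toy data are all assumptions made for illustration. It also assumes each stage has already been reduced to a single fraud probability per transaction (for example by averaging its two models).

```python
from typing import List, Optional, Sequence, Tuple


def blend_probabilities(stage1: Sequence[float], stage2: Sequence[float],
                        w1: float = 0.7, w2: float = 0.3) -> List[float]:
    """Weighted average of two stages' fraud probabilities, per transaction."""
    if len(stage1) != len(stage2):
        raise ValueError("both stages must score the same transactions")
    return [w1 * a + w2 * b for a, b in zip(stage1, stage2)]


def fbeta(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 2.0) -> float:
    """F-beta score for binary labels; beta > 1 favours recall over precision."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(y_true: Sequence[int], proba: Sequence[float], beta: float = 2.0,
                   candidates: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Scan candidate thresholds and return (threshold, score) with the highest F-beta."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # assumed grid: 0.01 .. 0.99
    best_t, best_score = candidates[0], -1.0
    for t in candidates:
        preds = [1 if p >= t else 0 for p in proba]
        score = fbeta(y_true, preds, beta)
        if score > best_score:  # strict comparison keeps the lowest tied threshold
            best_t, best_score = t, score
    return best_t, best_score


# Toy data (hypothetical): 1 = fraudulent, 0 = legitimate.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
stage1 = [0.30, 0.55, 0.20, 0.60, 0.90, 0.40, 0.85, 0.70]  # recall specialists
stage2 = [0.05, 0.10, 0.02, 0.15, 0.80, 0.05, 0.30, 0.10]  # precision specialists

blended = blend_probabilities(stage1, stage2)
threshold, score = best_threshold(y_true, blended, beta=2.0)
print(f"chosen threshold={threshold:.2f}, F2={score:.4f}")
```

In practice the threshold would be chosen on a held-out validation fold rather than on the data used for the final evaluation, so that the reported metrics are not optimistically biased.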