Jan 5, 2025

Comparing Python ML Libraries: Scikit-learn vs TensorFlow vs PyTorch

A comprehensive comparison of the three most popular machine learning libraries with practical examples, benchmarks, and guidance on choosing the right tool for your projects.

Dery Febriantara Developer

Choosing the right machine learning library can significantly impact your productivity, model performance, and deployment options. This guide compares the three most popular Python ML libraries: Scikit-learn, TensorFlow, and PyTorch. By the end, you should have a clear sense of which tool fits which kind of project.

Overview: The Big Three

Before we dive into details, let’s understand what each library is designed for:

Library	Primary Use	Created By	First Release
Scikit-learn	Classical ML	David Cournapeau (GSoC project), later INRIA	2007 (first public release 2010)
TensorFlow	Deep Learning	Google	2015
PyTorch	Deep Learning	Facebook/Meta	2016

When to Use Each

Scikit-learn: Tabular data, quick prototyping, classical algorithms
TensorFlow: Production deployment, mobile/edge devices, large-scale systems
PyTorch: Research, experimentation, custom architectures, rapid iteration

Scikit-learn: The Swiss Army Knife

Scikit-learn is the go-to library for traditional machine learning. It provides a consistent, well-documented API that makes it easy to experiment with different algorithms.

Philosophy and Design

Scikit-learn follows a simple design philosophy:

Consistent API: Every estimator learns with .fit(); predictors add .predict() and transformers add .transform()
Composability: Build pipelines with preprocessing and models
Sensible defaults: Most estimators work out of the box with reasonable default hyperparameters
Extensive documentation: Every function is thoroughly documented
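To make the consistent API and composability points concrete, here is a minimal sketch on synthetic data (the dataset and variable names are illustrative assumptions, not part of the examples that follow). It runs a transformer and a predictor by hand, then chains the same two steps into one pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# A transformer learns statistics in fit() and applies them in transform()
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# A predictor learns in fit() and outputs labels in predict()
clf = LogisticRegression().fit(X_scaled, y)
labels = clf.predict(X_scaled)

# Composability: the same two steps chained into a single estimator
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
pipeline_labels = model.predict(X)

# Both routes perform the same computation
print(np.array_equal(labels, pipeline_labels))
```

Because the pipeline is itself an estimator, it can be passed to cross-validation or grid search like any single model.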

Strengths

Simple, consistent API: Learn once, apply everywhere
Excellent documentation: Tutorials, examples, user guide
Great for classical ML: SVM, Random Forest, Gradient Boosting, etc.
Built-in preprocessing: Scaling, encoding, feature selection
Model selection tools: Cross-validation, grid search, metrics
Integration: Works seamlessly with NumPy, Pandas, and visualization libraries

Complete Example: Classification Pipeline

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
import matplotlib.pyplot as plt

# Load and prepare data
# Using a sample dataset structure
np.random.seed(42)
n_samples = 1000

data = pd.DataFrame({
    'age': np.random.randint(18, 70, n_samples),
    'income': np.random.normal(50000, 20000, n_samples),
    'education': np.random.choice(['high_school', 'bachelor', 'master', 'phd'], n_samples),
    # Stored as float so that NaN can be assigned to this column below
    'credit_score': np.random.randint(300, 850, n_samples).astype(float),
    'years_employed': np.random.randint(0, 40, n_samples),
    'approved': np.random.randint(0, 2, n_samples)
})

# Add some missing values
data.loc[np.random.choice(data.index, 50), 'income'] = np.nan
data.loc[np.random.choice(data.index, 30), 'credit_score'] = np.nan

# Separate features and target
X = data.drop('approved', axis=1)
y = data['approved']

# Identify column types
numeric_features = ['age', 'income', 'credit_score', 'years_employed']
categorical_features = ['education']

# Create preprocessing pipelines
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

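# Option 1: integer-encode categories. OrdinalEncoder is the feature-side
# encoder; LabelEncoder is meant for target labels and fails inside a Pipeline.
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('encoder', OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1))
])

# Option 2 (used below): one-hot encoding, which implies no order between categories
categorical_transformer_onehot = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('encoder', OneHotEncoder(handle_unknown='ignore'))
])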

# Combine preprocessors
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer_onehot, categorical_features)
    ])

# Create full pipeline with classifier
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(random_state=42))
])

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train and evaluate
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
y_proba = pipeline.predict_proba(X_test)[:, 1]

print("Classification Report:")
print(classification_report(y_test, y_pred))
print(f"ROC-AUC Score: {roc_auc_score(y_test, y_proba):.4f}")

# Cross-validation
cv_scores = cross_val_score(pipeline, X, y, cv=5, scoring='roc_auc')
print(f"\nCross-validation ROC-AUC: {cv_scores.mean():.4f} (+/- {cv_scores.std()*2:.4f})")
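The pipeline above passes handle_unknown='ignore' to OneHotEncoder. A small sketch of what that setting does, using a made-up category ('bootcamp') that is assumed not to appear in the training data:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({'education': ['high_school', 'bachelor', 'master', 'phd']})
# 'bootcamp' is a hypothetical value the encoder never saw during fit()
new = pd.DataFrame({'education': ['bachelor', 'bootcamp']})

encoder = OneHotEncoder(handle_unknown='ignore').fit(train)

# transform() returns a sparse matrix by default; convert it for printing
encoded = encoder.transform(new).toarray()

print(encoder.categories_[0])  # learned categories, in sorted order
print(encoded)                 # the unseen category becomes an all-zero row
```

Without handle_unknown='ignore', the transform call would raise an error on the unseen value, which is usually not what you want at prediction time.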

Hyperparameter Tuning

# Grid search with cross-validation
param_grid = {
    'classifier__n_estimators': [50, 100, 200],
    'classifier__max_depth': [5, 10, 20, None],
    'classifier__min_samples_split': [2, 5, 10],
    'classifier__min_samples_leaf': [1, 2, 4]
}

grid_search = GridSearchCV(
    pipeline,
    param_grid,
    cv=5,
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1
)

grid_search.fit(X_train, y_train)

print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.4f}")

# Evaluate best model
best_model = grid_search.best_estimator_
y_pred_best = best_model.predict(X_test)
y_proba_best = best_model.predict_proba(X_test)[:, 1]
print(f"Test ROC-AUC: {roc_auc_score(y_test, y_proba_best):.4f}")
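Grid search cost grows multiplicatively with each parameter, so it is worth counting the work before starting. A quick sketch for the grid above (the counting helper is plain Python, not a Scikit-learn API):

```python
from math import prod

param_grid = {
    'classifier__n_estimators': [50, 100, 200],
    'classifier__max_depth': [5, 10, 20, None],
    'classifier__min_samples_split': [2, 5, 10],
    'classifier__min_samples_leaf': [1, 2, 4],
}
cv_folds = 5

# Every combination of values is one candidate: 3 * 4 * 3 * 3
n_candidates = prod(len(values) for values in param_grid.values())

# Each candidate is trained once per cross-validation fold
n_fits = n_candidates * cv_folds

print(n_candidates, n_fits)  # 108 540
```

With the default refit=True, GridSearchCV trains the best candidate one more time on the full training set. If 540 fits is too slow, RandomizedSearchCV samples a fixed number of candidates from the same grid.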

Comparing Multiple Algorithms

from sklearn.model_selection import cross_validate

# Define models to compare
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(n_estimators=100, random_state=42),
    'SVM': SVC(probability=True, random_state=42)
}

# Compare models
results = {}
for name, model in models.items():
    # Create pipeline with each model
    clf_pipeline = Pipeline(steps=[
        ('preprocessor', preprocessor),
        ('classifier', model)
    ])

    # Cross-validate
    cv_results = cross_validate(
        clf_pipeline, X, y,
        cv=5,
        scoring=['accuracy', 'roc_auc', 'f1'],
        return_train_score=True
    )

    results[name] = {
        'accuracy': cv_results['test_accuracy'].mean(),
        'roc_auc': cv_results['test_roc_auc'].mean(),
        'f1': cv_results['test_f1'].mean()
    }

# Display results
results_df = pd.DataFrame(results).T
print(results_df.round(4))

When to Use Scikit-learn

Perfect for:

Tabular/structured data (CSV, databases)
Quick prototyping and experimentation
Classical ML algorithms (SVM, trees, linear models)
Feature engineering and preprocessing
Model selection and hyperparameter tuning
Small to medium datasets

Not ideal for:

Deep learning and neural networks
Image, video, or audio processing
Large-scale distributed training
GPU acceleration
State-of-the-art NLP models

TensorFlow: Production-Ready Deep Learning

TensorFlow is Google’s deep learning framework, designed for production deployment at scale.

Philosophy and Design

TensorFlow prioritizes:

Production readiness: Easy deployment to servers, mobile, browsers
Ecosystem: TensorBoard, TensorFlow Lite, TensorFlow.js, TFX
Scalability: Distributed training across multiple GPUs/TPUs
Keras integration: High-level API for rapid development

Strengths

Production deployment: TensorFlow Serving, TF Lite, TF.js
Comprehensive ecosystem: Tools for every part of the ML lifecycle
TensorBoard: Excellent visualization and debugging
Keras API: User-friendly high-level interface
TPU support: Native support for Google’s TPUs
Mobile and edge: Deploy to phones, IoT devices, browsers

Complete Example: Image Classification

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np

# Check GPU availability
print(f"TensorFlow version: {tf.__version__}")
print(f"GPU available: {tf.config.list_physical_devices('GPU')}")

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Preprocessing
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Convert labels to categorical
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")

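# Data augmentation (ImageDataGenerator is deprecated in recent TensorFlow releases;
# Keras preprocessing layers such as RandomFlip and RandomRotation are the suggested replacement)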
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    zoom_range=0.1
)
datagen.fit(x_train)

# Build CNN model
def create_cnn_model(input_shape, num_classes):
    model = keras.Sequential([
        # First convolutional block
        layers.Conv2D(32, (3, 3), padding='same', input_shape=input_shape),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.Conv2D(32, (3, 3), padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),

        # Second convolutional block
        layers.Conv2D(64, (3, 3), padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.Conv2D(64, (3, 3), padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),

        # Third convolutional block
        layers.Conv2D(128, (3, 3), padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.Conv2D(128, (3, 3), padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),

        # Dense layers
        layers.Flatten(),
        layers.Dense(512),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax')
    ])

    return model

model = create_cnn_model((32, 32, 3), num_classes)

# Compile model
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()

# Callbacks
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=10,
        restore_best_weights=True
    ),
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=5,
        min_lr=1e-6
    ),
    keras.callbacks.TensorBoard(
        log_dir='./logs',
        histogram_freq=1
    ),
    keras.callbacks.ModelCheckpoint(
        'best_model.keras',
        save_best_only=True,
        monitor='val_accuracy'
    )
]

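# Train model (the test set doubles as validation data here for brevity;
# hold out a separate validation split if you need an unbiased test score)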
history = model.fit(
    datagen.flow(x_train, y_train, batch_size=64),
    epochs=100,
    validation_data=(x_test, y_test),
    callbacks=callbacks,
    verbose=1
)

# Evaluate
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"\nTest accuracy: {test_acc:.4f}")

Transfer Learning with TensorFlow

from tensorflow.keras.applications import ResNet50, VGG16, MobileNetV2
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout

def create_transfer_learning_model(base_model_name='resnet50', num_classes=10):
    # Choose base model
    base_models = {
        'resnet50': ResNet50,
        'vgg16': VGG16,
        'mobilenet': MobileNetV2
    }

    # Load pretrained model without top layers
    base_model = base_models[base_model_name](
        weights='imagenet',
        include_top=False,
        input_shape=(224, 224, 3)
    )

    # Freeze base model layers
    base_model.trainable = False

    # Build model
    inputs = keras.Input(shape=(224, 224, 3))
    x = base_model(inputs, training=False)
    x = GlobalAveragePooling2D()(x)
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.5)(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    model = keras.Model(inputs, outputs)

    return model, base_model

# Create and compile
model, base_model = create_transfer_learning_model('mobilenet', num_classes=10)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# After initial training, fine-tune the base model
def fine_tune_model(model, base_model, fine_tune_at=100):
    # Unfreeze top layers of base model
    base_model.trainable = True

    # Freeze layers before fine_tune_at
    for layer in base_model.layers[:fine_tune_at]:
        layer.trainable = False

    # Recompile with lower learning rate
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-5),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    return model

Custom Training Loop in TensorFlow

# For more control, use custom training loops
@tf.function
def train_step(model, optimizer, loss_fn, x_batch, y_batch):
    with tf.GradientTape() as tape:
        predictions = model(x_batch, training=True)
        loss = loss_fn(y_batch, predictions)

    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    return loss

@tf.function
def test_step(model, loss_fn, x_batch, y_batch):
    predictions = model(x_batch, training=False)
    loss = loss_fn(y_batch, predictions)
    accuracy = tf.reduce_mean(
        tf.cast(tf.argmax(predictions, axis=1) == tf.argmax(y_batch, axis=1), tf.float32)
    )
    return loss, accuracy

# Training loop
def custom_training(model, train_dataset, test_dataset, epochs=10):
    optimizer = keras.optimizers.Adam(learning_rate=0.001)
    loss_fn = keras.losses.CategoricalCrossentropy()

    for epoch in range(epochs):
        # Training
        train_losses = []
        for x_batch, y_batch in train_dataset:
            loss = train_step(model, optimizer, loss_fn, x_batch, y_batch)
            train_losses.append(loss.numpy())

        # Validation
        test_losses = []
        test_accuracies = []
        for x_batch, y_batch in test_dataset:
            loss, acc = test_step(model, loss_fn, x_batch, y_batch)
            test_losses.append(loss.numpy())
            test_accuracies.append(acc.numpy())

        print(f"Epoch {epoch+1}: "
              f"Train Loss = {np.mean(train_losses):.4f}, "
              f"Test Loss = {np.mean(test_losses):.4f}, "
              f"Test Acc = {np.mean(test_accuracies):.4f}")
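The custom_training function above expects train_dataset and test_dataset to yield (features, labels) batches. One way to build them is with tf.data; the sketch below uses random placeholder arrays in the CIFAR-10 shape, and make_dataset is a hypothetical helper name:

```python
import numpy as np
import tensorflow as tf

# Placeholder data with CIFAR-10-like shapes (random values, illustration only)
rng = np.random.default_rng(0)
x = rng.random((256, 32, 32, 3), dtype=np.float32)
labels = rng.integers(0, 10, size=256)
y = tf.keras.utils.to_categorical(labels, 10)

def make_dataset(features, targets, batch_size=64, training=False):
    """Wrap arrays in a batched tf.data pipeline (hypothetical helper)."""
    ds = tf.data.Dataset.from_tensor_slices((features, targets))
    if training:
        # Shuffle only the training split
        ds = ds.shuffle(buffer_size=len(features))
    # prefetch overlaps data preparation with model execution
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

train_dataset = make_dataset(x[:192], y[:192], training=True)
test_dataset = make_dataset(x[192:], y[192:])

x_batch, y_batch = next(iter(train_dataset))
print(x_batch.shape, y_batch.shape)
```

Datasets built this way can be passed directly as the train_dataset and test_dataset arguments of the training loop.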

Saving and Deploying TensorFlow Models

# Save full model (native Keras format)
model.save('my_model.keras')

# Export in SavedModel format (recommended for serving).
# Recent Keras releases require model.export() for this; model.save() with an
# extension-less path only works on older TensorFlow 2.x versions.
model.export('saved_model/my_model')

# Convert to TensorFlow Lite for mobile
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/my_model')
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

# Quantize for smaller model size
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_model = converter.convert()

with open('model_quantized.tflite', 'wb') as f:
    f.write(quantized_model)

# Export to TensorFlow.js for browser
# Run in terminal: tensorflowjs_converter --input_format=tf_saved_model saved_model/my_model tfjs_model/

When to Use TensorFlow

Perfect for:

Production deployment at scale
Mobile and edge deployment (TF Lite)
Browser-based ML (TensorFlow.js)
Large-scale distributed training
TPU training on Google Cloud
Enterprise environments

Not ideal for:

Quick research prototyping (use PyTorch)
When you need maximum flexibility
Small projects where Keras overhead isn’t worth it

PyTorch: Research-First Deep Learning

PyTorch is Facebook/Meta’s deep learning framework, beloved by researchers for its flexibility and Pythonic design.

Philosophy and Design

PyTorch prioritizes:

Pythonic: Feels like writing regular Python
Dynamic graphs: Define-by-run for flexibility
Debugging: Standard Python debugging tools work
Research-friendly: Easy to experiment with new ideas
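A short sketch of what define-by-run means in practice. RepeatNet is a hypothetical toy module whose forward pass uses an ordinary Python loop, so the graph is rebuilt on every call and can differ between calls:

```python
import torch
import torch.nn as nn

class RepeatNet(nn.Module):
    """Hypothetical toy module: applies one layer a caller-chosen number of times."""

    def __init__(self, dim=4):
        super().__init__()
        self.layer = nn.Linear(dim, dim)

    def forward(self, x, n_steps):
        # Plain Python control flow decides the shape of the graph on each call
        for _ in range(n_steps):
            x = torch.tanh(self.layer(x))
            # Ordinary Python tools work mid-forward: print(), pdb, assertions
            assert torch.isfinite(x).all()
        return x

torch.manual_seed(0)
net = RepeatNet()
x = torch.randn(2, 4)

out_short = net(x, n_steps=1)  # graph with one layer application
out_long = net(x, n_steps=3)   # graph with three, no recompilation needed

# Autograd follows whichever graph was actually executed
out_long.sum().backward()
print(out_short.shape, net.layer.weight.grad.shape)
```

The same pattern is what makes variable-length inputs, recursion, and early exits straightforward to express.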

Strengths

Dynamic computation graphs: Flexibility for complex architectures
Intuitive: Feels like native Python
Excellent debugging: Use pdb, print statements, etc.
Strong community: Widely used in published research code
Hugging Face integration: State-of-the-art NLP models
TorchScript: Production deployment option

Complete Example: Image Classification

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms
from tqdm import tqdm
import numpy as np

# Check GPU availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Data transformations
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

# Load CIFAR-10
train_dataset = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=train_transform
)
test_dataset = torchvision.datasets.CIFAR10(
    root='./data', train=False, download=True, transform=test_transform
)

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=256, shuffle=False, num_workers=4)

# Define CNN architecture
class CNN(nn.Module):
    def __init__(self, num_classes=10):
        super(CNN, self).__init__()

        # Feature extractor
        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            nn.Dropout(0.25),

            # Block 2
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            nn.Dropout(0.25),

            # Block 3
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            nn.Dropout(0.25),
        )

        # Classifier
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

# Initialize model
model = CNN(num_classes=10).to(device)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)

# Learning rate scheduler
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=0.01,
    epochs=50,
    steps_per_epoch=len(train_loader)
)

# Training function
def train_epoch(model, loader, criterion, optimizer, scheduler, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    pbar = tqdm(loader, desc='Training')
    for inputs, targets in pbar:
        inputs, targets = inputs.to(device), targets.to(device)

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, targets)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

        optimizer.step()
        scheduler.step()

        # Statistics
        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        pbar.set_postfix({
            'loss': running_loss / (pbar.n + 1),
            'acc': 100. * correct / total
        })

    return running_loss / len(loader), correct / total

# Evaluation function
@torch.no_grad()
def evaluate(model, loader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)

        outputs = model(inputs)
        loss = criterion(outputs, targets)

        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

    return running_loss / len(loader), correct / total

# Training loop
best_acc = 0
epochs = 50

for epoch in range(epochs):
    print(f"\nEpoch {epoch+1}/{epochs}")

    train_loss, train_acc = train_epoch(
        model, train_loader, criterion, optimizer, scheduler, device
    )

    test_loss, test_acc = evaluate(model, test_loader, criterion, device)

    print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2%}")
    print(f"Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.2%}")

    # Save best model
    if test_acc > best_acc:
        best_acc = test_acc
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'best_acc': best_acc,
        }, 'best_model.pth')
        print(f"New best model saved with accuracy: {best_acc:.2%}")

print(f"\nBest Test Accuracy: {best_acc:.2%}")
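Two details of the OneCycleLR setup above are easy to miss: the scheduler is stepped once per batch rather than once per epoch, and it overrides the learning rate passed to the optimizer. A minimal sketch with a toy model and made-up epoch counts (no real training happens here):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)  # toy stand-in for the CNN
optimizer = optim.AdamW(model.parameters(), lr=0.001)

epochs, steps_per_epoch = 3, 4  # illustrative values
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01, epochs=epochs, steps_per_epoch=steps_per_epoch
)

lrs = []
for _ in range(epochs * steps_per_epoch):
    lrs.append(scheduler.get_last_lr()[0])
    optimizer.step()   # no gradients exist, so nothing is updated; keeps call order valid
    scheduler.step()   # once per batch

# The schedule starts at max_lr / div_factor (default 25), not at the optimizer's lr=0.001
print(f"start={lrs[0]:.6f} peak={max(lrs):.6f} end={lrs[-1]:.8f}")
```

Calling scheduler.step() more times than epochs * steps_per_epoch raises an error, so the epoch count given to the scheduler has to match the training loop.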

Transfer Learning with PyTorch

import torchvision.models as models

def create_transfer_model(num_classes, model_name='resnet18', pretrained=True):
    # Load pretrained model
    if model_name == 'resnet18':
        model = models.resnet18(weights='IMAGENET1K_V1' if pretrained else None)
        num_features = model.fc.in_features
        model.fc = nn.Linear(num_features, num_classes)

    elif model_name == 'resnet50':
        model = models.resnet50(weights='IMAGENET1K_V1' if pretrained else None)
        num_features = model.fc.in_features
        model.fc = nn.Linear(num_features, num_classes)

    elif model_name == 'efficientnet':
        model = models.efficientnet_b0(weights='IMAGENET1K_V1' if pretrained else None)
        num_features = model.classifier[1].in_features
        model.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(num_features, num_classes)
        )

    return model

# Freeze and unfreeze layers for fine-tuning
def freeze_layers(model, freeze_until='layer3'):
    for name, param in model.named_parameters():
        if freeze_until in name:
            break
        param.requires_grad = False

def unfreeze_all(model):
    for param in model.parameters():
        param.requires_grad = True

# Usage
model = create_transfer_model(num_classes=10, model_name='resnet18')
freeze_layers(model, freeze_until='fc')  # First train only the classifier head

# After a few epochs, unfreeze for fine-tuning
unfreeze_all(model)

Custom Loss Functions in PyTorch

class FocalLoss(nn.Module):
    """Focal Loss for addressing class imbalance."""

    def __init__(self, alpha=1, gamma=2, reduction='mean'):
        super(FocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma
        self.reduction = reduction

    def forward(self, inputs, targets):
        ce_loss = nn.functional.cross_entropy(inputs, targets, reduction='none')
        pt = torch.exp(-ce_loss)
        focal_loss = self.alpha * (1 - pt) ** self.gamma * ce_loss

        if self.reduction == 'mean':
            return focal_loss.mean()
        elif self.reduction == 'sum':
            return focal_loss.sum()
        return focal_loss

class LabelSmoothingLoss(nn.Module):
    """Label smoothing for better generalization."""

    def __init__(self, num_classes, smoothing=0.1):
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.num_classes = num_classes

    def forward(self, pred, target):
        pred = pred.log_softmax(dim=-1)
        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.num_classes - 1))
            true_dist.scatter_(1, target.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=-1))
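The arithmetic behind both losses is easier to see on single numbers. The sketch below uses plain Python rather than PyTorch; focal_weight and smoothed_targets are illustrative helper names that mirror the formulas in the classes above:

```python
import math

def focal_weight(pt, gamma=2.0):
    """Modulating factor (1 - pt)^gamma that scales the cross-entropy term."""
    return (1.0 - pt) ** gamma

def smoothed_targets(num_classes, true_class, smoothing=0.1):
    """Target distribution built the same way as in LabelSmoothingLoss."""
    off_value = smoothing / (num_classes - 1)
    dist = [off_value] * num_classes
    dist[true_class] = 1.0 - smoothing
    return dist

# pt is the predicted probability of the true class
for pt in (0.9, 0.1):
    ce = -math.log(pt)
    print(f"pt={pt}: ce={ce:.4f}, focal={focal_weight(pt) * ce:.4f}")

# The true class keeps 1 - smoothing; the remainder is shared by the other classes
print(smoothed_targets(num_classes=4, true_class=2))
```

With gamma=2, a confident correct prediction (pt=0.9) has its loss scaled by 0.01, while a hard example (pt=0.1) keeps a factor of 0.81, which is how focal loss shifts attention to hard examples.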

Mixed Precision Training

# Note: recent PyTorch releases deprecate torch.cuda.amp in favor of torch.amp
from torch.cuda.amp import autocast, GradScaler

# Initialize gradient scaler for mixed precision
scaler = GradScaler()

def train_epoch_mixed_precision(model, loader, criterion, optimizer, scaler, device):
    model.train()
    running_loss = 0.0

    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)

        optimizer.zero_grad()

        # Mixed precision forward pass
        with autocast():
            outputs = model(inputs)
            loss = criterion(outputs, targets)

        # Scaled backward pass
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        running_loss += loss.item()

    return running_loss / len(loader)

When to Use PyTorch

Perfect for:

Research and experimentation
Custom neural network architectures
When debugging and flexibility are important
Working with Hugging Face transformers
Academic projects and publications
Quick prototyping of new ideas

Not ideal for:

Production deployment without additional tools
Mobile deployment (though improving)
When TensorFlow ecosystem is required

Head-to-Head Comparison

Syntax Comparison

Creating a simple neural network:

# Scikit-learn (using MLPClassifier)
from sklearn.neural_network import MLPClassifier
model = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# TensorFlow/Keras
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(X_train, y_train, epochs=10)
predictions = model.predict(X_test)

# PyTorch
class Net(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

model = Net(input_size=X_train.shape[1])
optimizer = optim.Adam(model.parameters())
# Unlike the two snippets above, PyTorch has no built-in fit():
# you write the training loop yourself

Performance Comparison

Aspect	Scikit-learn	TensorFlow	PyTorch
Training Speed (CPU)	Fast	Medium	Medium
Training Speed (GPU)	N/A	Fast	Fast
Inference Speed	Fast	Fast	Fast
Memory Efficiency	Good	Good	Good
Startup Time	Fast	Slow	Medium
Model Size	Small	Medium	Medium
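The ratings in this table are qualitative and depend heavily on hardware, data size, and model. To check them on your own workload, a simple timing harness is enough; time_call below is a hypothetical helper and the workload is a placeholder:

```python
import time
from statistics import median

def time_call(fn, repeats=5, warmup=1):
    """Return the median wall-clock time of fn() in seconds (hypothetical helper)."""
    # Warm-up runs absorb one-off costs such as imports, caching, or graph tracing
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    # The median is less sensitive to occasional slow runs than the mean
    return median(durations)

# Placeholder workload; swap in model.fit(...) or a single training step
elapsed = time_call(lambda: sum(i * i for i in range(100_000)))
print(f"median: {elapsed * 1000:.2f} ms")
```

For GPU code, remember that kernels run asynchronously: synchronize the device (for example with torch.cuda.synchronize() in PyTorch) before reading the clock, or the measured time will be too short.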

Ecosystem Comparison

Feature	Scikit-learn	TensorFlow	PyTorch
Visualization	Matplotlib	TensorBoard	TensorBoard/Weights&Biases
Serving	Limited	TF Serving	TorchServe
Mobile	No	TF Lite	PyTorch Mobile
Browser	No	TensorFlow.js	ONNX Runtime Web (successor to ONNX.js)
NLP	Limited	TF Hub	Hugging Face
CV	Limited	TF Hub	torchvision

Decision Framework

Choose Scikit-learn if:

Working with tabular/structured data
Using classical ML algorithms (not deep learning)
Need quick prototyping and experimentation
Dataset fits in memory
Interpretability is important
Team is new to ML

Choose TensorFlow if:

Deploying to production at scale
Need mobile/edge deployment
Using Google Cloud/TPUs
Building end-to-end ML pipelines
Need comprehensive ecosystem
Enterprise environment

Choose PyTorch if:

Doing research or experimentation
Building custom architectures
Working with NLP (Hugging Face)
Need maximum flexibility
Debugging is important
Academic environment

Using Multiple Libraries Together

In practice, you’ll often use multiple libraries:

# Preprocessing with Scikit-learn
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

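# Model with PyTorch
import numpy as np
import torch
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(np.asarray(y_train), dtype=torch.long)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
# ... define and train `model` (an nn.Module) on the training tensors ...

# Evaluation with Scikit-learn metrics
from sklearn.metrics import classification_report
model.eval()
with torch.no_grad():
    predictions = model(X_test_tensor).argmax(dim=1).cpu().numpy()
print(classification_report(y_test, predictions))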

# Visualization with TensorBoard
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()

Conclusion

Each library has its strengths:

Scikit-learn: Best for classical ML, tabular data, and quick experimentation
TensorFlow: Best for production deployment and comprehensive ecosystem
PyTorch: Best for research, flexibility, and modern NLP

The “best” library depends on your specific needs. Many practitioners use all three, choosing the right tool for each task. Start with one, become proficient, then expand your toolkit as needed.

Recommended Learning Path

Start with Scikit-learn: Learn ML fundamentals
Add PyTorch or TensorFlow: Choose based on your focus (research vs. production)
Master one deep learning framework: Go deep before going wide
Learn the ecosystem: TensorBoard, Weights & Biases, MLflow
Production skills: Docker, APIs, model serving

Further Resources

Scikit-learn: Official docs and user guide
TensorFlow: TensorFlow tutorials and Keras documentation
PyTorch: PyTorch tutorials and documentation
Fast.ai: Practical deep learning course (uses PyTorch)
Coursera/Deeplearning.ai: Comprehensive ML courses