
    Welcome to Day 14 of the 30 Days of Data Science Series! Today, we’re diving into Linear Discriminant Analysis (LDA), a powerful technique for classification and dimensionality reduction. By the end of this lesson, you’ll understand the concept, implementation, and evaluation of LDA in Python.


    1. What is Linear Discriminant Analysis (LDA)?

    LDA is a supervised learning algorithm used for classification and dimensionality reduction. It projects data points onto a lower-dimensional space while maximizing the separation between multiple classes. LDA assumes that the data for each class is generated from a Gaussian distribution with the same covariance matrix.

    Key Concepts:

    1. Mean Vectors: Compute the mean vector for each class.

    2. Scatter Matrices:

      • Within-Class Scatter Matrix: Measures the spread of features within each class.

      • Between-Class Scatter Matrix: Measures the spread of the means of each class.

    3. Eigenvalue Problem: Solve the generalized eigenvalue problem to find the linear discriminants.

    4. Linear Discriminants: Select the top eigenvectors to form a matrix for projecting the data.

    5. Projection: Transform the original data onto the new subspace (a from-scratch sketch of these steps follows this list).
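
    To make these steps concrete, here's a minimal from-scratch sketch in NumPy. It's for illustration only (the helper name lda_components is our own, not a library function); in practice we'll use scikit-learn's battle-tested implementation below.

    python
    import numpy as np

    def lda_components(X, y, n_components=2):
        # Illustrative from-scratch LDA; assumes S_W is invertible
        classes = np.unique(y)
        overall_mean = X.mean(axis=0)
        n_features = X.shape[1]
        S_W = np.zeros((n_features, n_features))  # within-class scatter
        S_B = np.zeros((n_features, n_features))  # between-class scatter
        for c in classes:
            X_c = X[y == c]                            # samples of class c
            mean_c = X_c.mean(axis=0)                  # step 1: class mean vector
            S_W += (X_c - mean_c).T @ (X_c - mean_c)   # step 2: within-class scatter
            d = (mean_c - overall_mean).reshape(-1, 1)
            S_B += X_c.shape[0] * (d @ d.T)            # step 2: between-class scatter
        # Step 3: solve the generalized eigenvalue problem S_W^-1 S_B w = lambda w
        eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
        top = np.argsort(eigvals.real)[::-1][:n_components]
        W = eigvecs[:, top].real                       # step 4: top discriminants
        return X @ W                                   # step 5: projection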


    2. When to Use LDA?

    • When you need to reduce dimensionality while preserving class separability.

    • For classification tasks where the data is assumed to be Gaussian distributed.

    • Applications include face recognition, bioinformatics, and marketing.


    3. Implementation in Python

    Let’s implement LDA on the Iris dataset for classification and visualization.

    Step 1: Import Libraries

    python
    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
    import matplotlib.pyplot as plt
    import seaborn as sns

    Step 2: Load and Prepare the Data

    We’ll use the Iris dataset, which has four features (sepal length, sepal width, petal length, petal width) and three classes (species of iris flowers).

    python
    # Load Iris dataset
    iris = load_iris()
    X = iris.data  # Features
    y = iris.target  # Target (species)
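
    As a quick, optional sanity check, you can confirm the shape of the data and the class names:

    python
    # 150 samples, 4 features, 3 species
    print(X.shape)            # (150, 4)
    print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']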

    Step 3: Train-Test Split

    python
    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
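
    As a side note, passing stratify=y to train_test_split keeps the class proportions equal across the two sets. The outputs shown below come from the unstratified split above, so switching to a stratified split would change the exact counts.

    python
    # Optional: a stratified split preserves class proportions in both sets
    # X_train, X_test, y_train, y_test = train_test_split(
    #     X, y, test_size=0.2, random_state=0, stratify=y)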

    Step 4: Train the LDA Model

    python
    # Create and train the LDA model
    lda = LinearDiscriminantAnalysis()
    lda.fit(X_train, y_train)
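
    Optionally, you can inspect how much of the between-class variance each discriminant direction captures; with the default svd solver, scikit-learn exposes this as explained_variance_ratio_:

    python
    # Share of between-class variance captured by each discriminant
    print(lda.explained_variance_ratio_)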

    Step 5: Make Predictions

    python
    # Make predictions on the test set
    y_pred = lda.predict(X_test)
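
    Because LDA is a probabilistic classifier, you can also inspect posterior class probabilities rather than just the hard labels:

    python
    # Posterior class probabilities for the first three test samples
    print(lda.predict_proba(X_test[:3]).round(3))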

    Step 6: Evaluate the Model

    Accuracy

    python
    accuracy = accuracy_score(y_test, y_pred)
    print("Accuracy:", accuracy)

    Output:

    Accuracy: 1.0
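
    Perfect accuracy on a 30-sample test set is encouraging, but it depends on the particular split. For a more stable estimate, a quick cross-validation helps (a minimal sketch using scikit-learn's cross_val_score):

    python
    from sklearn.model_selection import cross_val_score

    # 5-fold cross-validation on the full dataset
    scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
    print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))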

    Confusion Matrix

    python
    conf_matrix = confusion_matrix(y_test, y_pred)
    print("Confusion Matrix:n", conf_matrix)

    Output:

    Confusion Matrix:
     [[11  0  0]
      [ 0 13  0]
      [ 0  0  6]]
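
    Since seaborn is already imported, you can optionally render the confusion matrix as a heatmap for easier reading:

    python
    # Visualize the confusion matrix as a labeled heatmap
    sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
                xticklabels=iris.target_names, yticklabels=iris.target_names)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()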

    Classification Report

    python
    class_report = classification_report(y_test, y_pred)
    print("Classification Report:n", class_report)

    Output:

    Classification Report:
                   precision    recall  f1-score   support
               0       1.00      1.00      1.00        11
               1       1.00      1.00      1.00        13
               2       1.00      1.00      1.00         6
        accuracy                           1.00        30
       macro avg       1.00      1.00      1.00        30
    weighted avg       1.00      1.00      1.00        30

    Step 7: Transform the Data for Visualization

    LDA can also be used for dimensionality reduction. Because LDA produces at most one fewer component than the number of classes, the three-class Iris dataset yields exactly two components, and we'll project the data onto both.

    python
    # Transform the data
    X_lda = lda.transform(X)
    
    # Plot the LDA result
    plt.figure(figsize=(8, 6))
    sns.scatterplot(x=X_lda[:, 0], y=X_lda[:, 1], hue=iris.target_names[y], palette='Set1')
    plt.title('LDA of Iris Dataset')
    plt.xlabel('LDA Component 1')
    plt.ylabel('LDA Component 2')
    plt.show()

    4. Key Takeaways

    • LDA is a supervised technique for classification and dimensionality reduction.

    • It maximizes class separability by projecting data onto a lower-dimensional space.

    • It assumes that the data for each class is Gaussian distributed with the same covariance matrix.


    5. Applications of LDA

    • Face Recognition: Reducing the dimensionality of facial features while preserving class separability.

    • Bioinformatics: Classifying gene expression data.

    • Marketing: Segmenting customers based on purchasing behavior.


    6. Practice Exercise

    1. Experiment with different datasets (e.g., Wine dataset) and observe how LDA performs.

    2. Compare LDA with PCA for dimensionality reduction on the same dataset (a starter sketch follows this list).

    3. Apply LDA to a real-world classification problem (e.g., email spam detection) and evaluate the results.
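
    As a starting point for exercise 2, here's a minimal sketch comparing PCA and LDA projections side by side (it assumes the Iris variables X, y, and X_lda from the steps above):

    python
    from sklearn.decomposition import PCA

    # PCA is unsupervised (maximizes total variance, ignores labels);
    # LDA is supervised (maximizes class separation)
    X_pca = PCA(n_components=2).fit_transform(X)
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    for ax, Z, title in [(axes[0], X_pca, 'PCA'), (axes[1], X_lda, 'LDA')]:
        sns.scatterplot(x=Z[:, 0], y=Z[:, 1], hue=iris.target_names[y],
                        palette='Set1', ax=ax)
        ax.set_title(title)
    plt.show()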


    That’s it for Day 14! Tomorrow, we’ll explore Gaussian Mixture Models (GMM), a powerful probabilistic clustering technique. Keep practicing, and feel free to ask questions in the comments! 🚀
