
Welcome to Day 14 of the 30 Days of Data Science Series! Today, we’re diving into Linear Discriminant Analysis (LDA), a powerful technique for classification and dimensionality reduction. By the end of this lesson, you’ll understand the concept, implementation, and evaluation of LDA in Python.


1. What is Linear Discriminant Analysis (LDA)?

LDA is a supervised learning algorithm used for classification and dimensionality reduction. It projects data points onto a lower-dimensional space while maximizing the separation between multiple classes. LDA assumes that the data for each class is generated from a Gaussian distribution with the same covariance matrix.

Key Concepts:

  1. Mean Vectors: Compute the mean vector for each class.

  2. Scatter Matrices:

    • Within-Class Scatter Matrix: Measures the spread of features within each class.

    • Between-Class Scatter Matrix: Measures the spread of the means of each class.

  3. Eigenvalue Problem: Solve the generalized eigenvalue problem to find the linear discriminants.

  4. Linear Discriminants: Select the top eigenvectors to form a matrix for projecting the data.

  5. Projection: Transform the original data onto the new subspace (a from-scratch sketch of these steps follows below).
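
To make these five steps concrete, here is a minimal from-scratch sketch of the LDA projection on the Iris data. This is an illustrative addition, not the library implementation; scikit-learn's LinearDiscriminantAnalysis, used later in this lesson, handles the numerics more robustly.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
n_features = X.shape[1]
overall_mean = X.mean(axis=0)

# Steps 1-2: per-class mean vectors and the two scatter matrices
S_W = np.zeros((n_features, n_features))  # within-class scatter
S_B = np.zeros((n_features, n_features))  # between-class scatter
for c in classes:
    X_c = X[y == c]
    mean_c = X_c.mean(axis=0)
    S_W += (X_c - mean_c).T @ (X_c - mean_c)
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += len(X_c) * (diff @ diff.T)

# Step 3: generalized eigenvalue problem, solved here as eig(S_W^-1 @ S_B)
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)

# Step 4: keep the top eigenvectors (at most n_classes - 1 are meaningful)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:2]].real

# Step 5: project the data onto the 2-D discriminant subspace
X_projected = X @ W
print(X_projected.shape)  # (150, 2)
```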


2. When to Use LDA?

  • When you need to reduce dimensionality while preserving class separability.

  • For classification tasks where the data is assumed to be Gaussian distributed.

  • Applications include face recognition, bioinformatics, and marketing.


3. Implementation in Python

Let’s implement LDA on the Iris dataset for classification and visualization.

Step 1: Import Libraries

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
```

Step 2: Load and Prepare the Data

We’ll use the Iris dataset, which has four features (sepal length, sepal width, petal length, petal width) and three classes (species of iris flowers).

```python
# Load Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target (species)
```
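
As a quick optional sanity check, you can print the shapes and class names:

```python
print(X.shape)            # (150, 4): 150 samples, 4 features
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
```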

Step 3: Train-Test Split

```python
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```
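
Note that with random_state=0 this split is random but not stratified, so the three species are unevenly represented in the test set (11, 13, and 6 samples, as the confusion matrix below shows). If you prefer proportional class counts, one optional variant is:

```python
# Optional: stratify so each species keeps roughly its share in the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
```

Keep the unstratified split above if you want to reproduce the exact outputs shown below.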

Step 4: Train the LDA Model

```python
# Create and train the LDA model
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
```

Step 5: Make Predictions

```python
# Make predictions on the test set
y_pred = lda.predict(X_test)
```
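
Because LDA models each class as a Gaussian with a shared covariance matrix, it also provides class posterior probabilities. A quick optional look at the first few test samples:

```python
# Posterior probability of each species for the first three test samples
print(np.round(lda.predict_proba(X_test[:3]), 3))
```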

Step 6: Evaluate the Model

Accuracy

```python
# Fraction of test samples classified correctly
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

Output:

```
Accuracy: 1.0
```

Confusion Matrix

```python
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)
```

Output:

```
Confusion Matrix:
 [[11  0  0]
  [ 0 13  0]
  [ 0  0  6]]
```

Classification Report

```python
class_report = classification_report(y_test, y_pred)
print("Classification Report:\n", class_report)
```

Output:

```
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00         6

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
```

Step 7: Transform the Data for Visualization

LDA can also be used for dimensionality reduction. Since the Iris dataset has three classes, LDA yields at most two components, and we'll project the data onto both.

```python
# Transform the data
X_lda = lda.transform(X)

# Plot the LDA result
plt.figure(figsize=(8, 6))
sns.scatterplot(x=X_lda[:, 0], y=X_lda[:, 1], hue=iris.target_names[y], palette='Set1')
plt.title('LDA of Iris Dataset')
plt.xlabel('LDA Component 1')
plt.ylabel('LDA Component 2')
plt.show()
```
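
Optionally, you can check how much of the between-class variance each of the two discriminants captures via the fitted model's explained_variance_ratio_ attribute:

```python
# Share of between-class variance captured by each discriminant
print(lda.explained_variance_ratio_)  # on Iris, the first component dominates
```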

4. Key Takeaways

  • LDA is a supervised technique for classification and dimensionality reduction.

  • It maximizes class separability by projecting data onto a lower-dimensional space.

  • It assumes that the data for each class is Gaussian distributed with the same covariance matrix.


5. Applications of LDA

  • Face Recognition: Reducing the dimensionality of facial features while preserving class separability.

  • Bioinformatics: Classifying gene expression data.

  • Marketing: Segmenting customers based on purchasing behavior.


6. Practice Exercise

  1. Experiment with different datasets (e.g., Wine dataset) and observe how LDA performs.

  2. Compare LDA with PCA for dimensionality reduction on the same dataset (a starter sketch follows below).

  3. Apply LDA to a real-world classification problem (e.g., email spam detection) and evaluate the results.
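
For exercise 2, here is a minimal starter sketch on the Wine dataset (an assumed setup; adapt it to your own data). The key contrast: PCA is unsupervised and maximizes total variance, while LDA uses the labels to maximize class separation.

```python
# Starter sketch: compare PCA (unsupervised) and LDA (supervised) projections
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
import matplotlib.pyplot as plt

X, y = load_wine(return_X_y=True)
X_pca = PCA(n_components=2).fit_transform(X)                            # ignores y
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # uses y

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, Z, title in [(axes[0], X_pca, 'PCA'), (axes[1], X_lda, 'LDA')]:
    ax.scatter(Z[:, 0], Z[:, 1], c=y, cmap='Set1')
    ax.set_title(f'{title} of Wine Dataset')
plt.show()
```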


That’s it for Day 14! Tomorrow, we’ll explore Gaussian Mixture Models (GMM), a powerful probabilistic clustering algorithm. Keep practicing, and feel free to ask questions in the comments! 🚀
