This project analyzes electric vehicle data to identify frequent itemsets and generate association rules using the Apriori algorithm. Below are the steps for running the code and understanding the results.
Ensure you have the following installed in your Python environment:
- pandas
- mlxtend
You can install the required packages using the following command:
pip install pandas mlxtend
The analysis is based on the dataset Updated_Electric_Vehicle_Data_VIN.csv. Ensure the file is present in the same directory as the script. The following columns are used:
- County
- City
- State
- Model Year
- Make
- Model
- Electric Vehicle Type
- Clean Alternative Fuel Vehicle CAFV Eligibility
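A minimal loading sketch follows. It assumes the column names above match the CSV header exactly, and the helper name load_ev_data is hypothetical, not taken from the project's script. Passing usecols makes pandas raise a ValueError if any listed column is missing, which catches a misnamed column or the wrong file early:

```python
import pandas as pd

# Columns listed in this README; assumed to match the CSV header exactly.
COLUMNS = [
    "County",
    "City",
    "State",
    "Model Year",
    "Make",
    "Model",
    "Electric Vehicle Type",
    "Clean Alternative Fuel Vehicle CAFV Eligibility",
]


def load_ev_data(source="Updated_Electric_Vehicle_Data_VIN.csv"):
    """Read only the analysed columns from a path or file-like object.

    pandas raises ValueError if any name in COLUMNS is absent from the header.
    """
    return pd.read_csv(source, usecols=COLUMNS)
```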
- Data Preprocessing:
  - Rows with missing values are removed.
  - The selected columns of each row are combined into one transaction (a list of items).
- Transaction Encoding:
  - Transactions are transformed into a binary matrix using TransactionEncoder.
- Frequent Itemsets:
  - The Apriori algorithm is applied with a minimum support of 0.2.
  - Results are saved to frequent_itemsets.csv if frequent itemsets are found.
- Association Rules:
  - Rules are generated from the frequent itemsets with a minimum confidence of 0.7.
  - Results are sorted by confidence and saved to association_rules.csv.
- Place the Updated_Electric_Vehicle_Data_VIN.csv dataset in the project directory.
- Execute the script using Python:
  python script_name.py
- Check the terminal output for the frequent itemsets and the top association rules.
- The output files frequent_itemsets.csv and association_rules.csv will be generated in the project directory.
- Frequent Itemsets: combinations of items that appear together frequently, with their support.
  Example:
    Itemset       Support
    {'Model A'}   0.25
- Association Rules: relationships between items (antecedents and consequents), with metrics such as support, confidence, and lift.
  Example:
    Antecedents   Consequents   Support   Confidence   Lift
    {'Model A'}   {'City X'}    0.25      0.8          1.5
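The metrics in these tables follow the usual definitions, which mlxtend also uses: support(X) is the fraction of transactions containing every item of X, confidence(A ⇒ C) = support(A ∪ C) / support(A), and lift(A ⇒ C) = confidence(A ⇒ C) / support(C). The pure-Python sketch below computes them on ten hypothetical transactions (not taken from the dataset) so the arithmetic can be checked by hand:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A ∪ C) / support(A): how often C holds when A holds."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(A ⇒ C) / support(C); above 1 means A and C co-occur more than if independent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Ten hypothetical transactions (not taken from the EV dataset).
transactions = (
    [["Model A", "City X"]] * 4
    + [["Model A", "City Y"]]
    + [["Model B", "City X"]]
    + [["Model B", "City Y"]] * 4
)

print(support({"Model A"}, transactions))                  # 0.5
print(support({"Model A", "City X"}, transactions))        # 0.4
print(confidence({"Model A"}, {"City X"}, transactions))   # 0.8
print(lift({"Model A"}, {"City X"}, transactions))         # 1.6
```

Here Model A appears in 5 of 10 transactions and together with City X in 4, giving confidence 0.4 / 0.5 = 0.8; City X appears in 5 transactions, so lift = 0.8 / 0.5 = 1.6.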
You can adjust the following parameters:
- Minimum Support: change the value of min_support (currently 0.2) in the apriori function call.
- Minimum Confidence: change min_threshold (currently 0.7) in the association_rules function call.
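- Rows with missing values are dropped during preprocessing, but check the dataset for inconsistent entries (for example, the same city spelled in different ways) before running the script.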
- The output files can be used for further analysis or visualization.
For issues or suggestions, please contact [Your Name/Email].