Population of all US Cities 2024

Overview

This notebook uses regression modeling to predict the annual population change of US cities based on their population in 2024 and 2020, population density, and area.

Objectives

Predict the annual population change.
Calculate the $R^2$ value and visualize the results with matplotlib.

Tools Used

numpy
pandas
scikit-learn
matplotlib
pickle

Dataset

This dataset provides detailed information about the population of 300 US cities for the years 2024 and 2020. It includes:

US City
US State
Population 2024 (x1)
Population 2020 (x2)
Annual change (y)
Density (x3)
Area (x4)

Model

We will use the KNeighborsRegressor model for this task. KNeighborsRegressor predicts a city's annual population change (the dependent variable) from the annual changes of the training cities whose independent variables (population in 2024 and 2020, population density, and area) are closest to its own.
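
To make the idea concrete, here is a minimal NumPy sketch of a distance-weighted nearest-neighbor prediction for a single query point. It is an illustration under stated assumptions, not scikit-learn's implementation: the helper name `knn_predict` and the toy data are made up for this example, and the handling of exact matches only approximates what the library does.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, n_neighbors=3):
    """Distance-weighted k-nearest-neighbors prediction for one query point.

    Hypothetical helper for illustration only.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Euclidean distance from the query to every training row
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))

    # Indices of the k closest training rows
    nearest = np.argsort(distances)[:n_neighbors]
    d = distances[nearest]

    # An exact match (distance 0) would divide by zero, so return the mean
    # target of the exact matches instead (similar to how scikit-learn
    # treats zero distances when weights="distance")
    if np.any(d == 0):
        return float(y_train[nearest][d == 0].mean())

    # Weight each neighbor by the inverse of its distance
    weights = 1.0 / d
    return float(np.sum(weights * y_train[nearest]) / np.sum(weights))

# Toy data (made up): one feature, four training points
X_toy = [[0.0], [1.0], [2.0], [10.0]]
y_toy = [0.0, 1.0, 2.0, 10.0]
prediction = knn_predict(X_toy, y_toy, [1.5], n_neighbors=2)
print(prediction)
```

The query 1.5 sits halfway between the training points 1.0 and 2.0, so both neighbors get equal weight and the prediction is their average.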

Credits

Dataset Author:

Ibrar Hussain

Model Author:

Kevin Thomas

Date:

07-06-24

Version:

1.0

import os
import requests
import zipfile
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
import pickle
import matplotlib.pyplot as plt

Step 1: Data Preparation

# Download and extract the dataset
url = 'https://www.kaggle.com/api/v1/datasets/download/dataanalyst001/population-of-all-us-cities-2024?datasetVersionNumber=1'
local_filename = 'archive.zip'
response = requests.get(url, stream=True)
if response.status_code == 200:
    # Stream the archive to disk in chunks
    with open(local_filename, 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f'Download completed: {local_filename}')
    # Extract the CSV into the current directory
    with zipfile.ZipFile(local_filename, 'r') as zip_ref:
        zip_ref.extractall('.')
    print('Unzipping completed')
else:
    print(f'Failed to download the file. Status code: {response.status_code}')
    print('Skipping unzipping due to download failure')

# Load the dataset
data = pd.read_csv('Population of all US Cities 2024.csv')

# Observe data
data.head()

Download completed: archive.zip
Unzipping completed

	Rank	US City	US State	Population 2024	Population 2020	Annual Change	Density (/mile2)	Area (mile2)
0	1	New York	New York	8097282	8740292	-0.0195	26950	300.46
1	2	Los Angeles	California	3795936	3895848	-0.0065	8068	470.52
2	3	Chicago	Illinois	2638159	2743329	-0.0099	11584	227.75
3	4	Houston	Texas	2319119	2299269	0.0021	3620	640.61
4	5	Phoenix	Arizona	1662607	1612459	0.0076	3208	518.33

Step 2: Feature Engineering

# Drop unnecessary features
data = data.drop(["Rank", "US City", "US State"], axis=1)
data

# Split into X, y
X = data.drop("Annual Change", axis=1)
y = data["Annual Change"]

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    test_size=0.2)
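
The split above is random on every run, so the score in the next step will vary between runs. KNeighborsRegressor also relies on Euclidean distances, which are dominated by whichever feature has the largest magnitude (here, the population columns). The sketch below shows one possible way to address both points, using `random_state` and a `StandardScaler` pipeline. It runs on synthetic stand-in data, not the Kaggle dataset, and names such as `X_demo`, `scaled_knn`, and `raw_knn` are made up for illustration. Whether scaling helps on the real data is not something this sketch establishes.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the Kaggle dataset): two features on very
# different scales, with a target that depends only on the small-scale one
rng = np.random.default_rng(0)
big = rng.uniform(1e5, 1e7, size=200)      # population-like magnitude
small = rng.uniform(0.0, 1.0, size=200)    # small-magnitude feature
X_demo = np.column_stack([big, small])
y_demo = small

# random_state fixes the shuffle so the split is the same on every run
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

# The pipeline fits the scaler on the training rows only, then applies the
# same transformation to any rows passed to predict() or score()
scaled_knn = make_pipeline(
    StandardScaler(),
    KNeighborsRegressor(n_neighbors=3, weights="distance"))
scaled_knn.fit(X_tr, y_tr)

# Same model without scaling, for comparison
raw_knn = KNeighborsRegressor(n_neighbors=3, weights="distance")
raw_knn.fit(X_tr, y_tr)

scaled_score = scaled_knn.score(X_te, y_te)
raw_score = raw_knn.score(X_te, y_te)
print(f"R^2 with scaling:    {scaled_score:.3f}")
print(f"R^2 without scaling: {raw_score:.3f}")
```

Wrapping the scaler and the model in one pipeline means the fitted pipeline can be pickled and reused for inference as a single object.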

Step 3: Modeling

# Train the model
model = KNeighborsRegressor(
    algorithm="auto", 
    leaf_size=20, 
    metric="euclidean", 
    n_neighbors=3, 
    weights="distance")
model.fit(X_train, y_train)
model.score(X_test, y_test)

0.8527974958533183
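
For regressors, scikit-learn's `score` method returns the coefficient of determination $R^2$, so the value above is the $R^2$ named in the objectives. As a sketch of what that number means, the snippet below computes $R^2 = 1 - SS_{res} / SS_{tot}$ by hand. The helper name `r_squared` and the sample values are made up for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Sum of squared prediction errors
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Sum of squared deviations from the mean of the true values
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up annual-change values, symmetric around zero
y_true_demo = [-0.02, -0.01, 0.00, 0.01, 0.02]

# Perfect predictions leave no residual error
perfect = r_squared(y_true_demo, y_true_demo)

# Always predicting the mean explains none of the variance
mean_only = r_squared(y_true_demo, [0.0] * 5)

print(perfect, mean_only)
```

A score of 1 means the predictions match the test targets exactly, 0 means the model does no better than predicting the mean, and negative values mean it does worse.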

Step 4: Visualization

# Plot the results
y_pred = model.predict(X_test)
plt.scatter(y_test, y_pred, color='blue', alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
plt.xlabel('Actual Annual Change')
plt.ylabel('Predicted Annual Change')
plt.title('Actual vs Predicted Annual Change')
plt.show()

Step 5: Save & Load Model

# Save model 
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Load the saved model
with open("model.pkl", "rb") as f:
    loaded_model = pickle.load(f)
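
A quick way to check that serialization preserved the model is to compare predictions before and after the round trip. The sketch below does this in memory with `pickle.dumps` and `pickle.loads` on a tiny made-up model. The variable names and data are illustrative and not part of the notebook. Note that unpickling can execute arbitrary code, so only load pickle files from sources you trust.

```python
import pickle
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Tiny made-up training set, only to have a fitted model to serialize
X_small = np.array([[0.0], [1.0], [2.0], [3.0]])
y_small = np.array([0.0, 1.0, 4.0, 9.0])
original = KNeighborsRegressor(n_neighbors=2).fit(X_small, y_small)

# Serialize to bytes and back; pickle.dump / pickle.load do the same via a file
restored = pickle.loads(pickle.dumps(original))

# The restored model should give identical predictions on new inputs
X_new = np.array([[0.5], [2.5]])
same = np.array_equal(original.predict(X_new), restored.predict(X_new))
print(same)
```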

Step 6: Inference

# Inference
# Note: these feature values are New York's row from the dataset
new_york_data = np.array([[8097282, 8740292, 26950, 300.46]])
columns = ['Population 2024', 'Population 2020', 'Density (/mile2)', 'Area (mile2)']
new_york_data_df = pd.DataFrame(new_york_data, columns=columns)
predicted_metrics = loaded_model.predict(new_york_data_df)
print(f"Predicted Annual Change: {predicted_metrics}")

Predicted Annual Change: [-0.00375231]

mytechnotalent/Population-of-all-US-Cities-2024