Data Preparation and Aggregation of Local Climatological Data Dataset

Overview

The dataset includes data points of climatic values at monthly, daily, and hourly intervals.

The average, variance, and standard deviation were calculated for the data points at each interval. Monthly data points are grouped by year, daily data points by day of the week, and hourly data points by hour of the day.
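The grouping and aggregation described above can be illustrated with a minimal, standard-library Python sketch. It is not the repository's Spark.NET code: the helper name `aggregate_by_key`, the sample records, and the choice of sample (n - 1) variance and standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev, variance

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def aggregate_by_key(rows, key_fn, value_fn):
    """Group rows by key_fn(row); return {key: (mean, variance, stdev)}."""
    groups = defaultdict(list)
    for row in rows:
        value = value_fn(row)
        if value is not None:  # skip missing observations
            groups[key_fn(row)].append(value)

    summary = {}
    for key, values in groups.items():
        if len(values) < 2:
            # Sample variance and stdev are undefined for a single value.
            summary[key] = (mean(values), None, None)
        else:
            summary[key] = (mean(values), variance(values), stdev(values))
    return summary


# Hypothetical daily records: (observation date, maximum temperature in °F).
daily = [
    (date(2023, 8, 21), 80.0),  # Monday
    (date(2023, 8, 28), 84.0),  # Monday
    (date(2023, 8, 22), 75.0),  # Tuesday
    (date(2023, 8, 29), 79.0),  # Tuesday
]

# Daily data points are grouped by day of the week.
by_weekday = aggregate_by_key(
    daily,
    key_fn=lambda r: WEEKDAYS[r[0].weekday()],
    value_fn=lambda r: r[1],
)
```

The same helper covers the other two intervals by swapping the key function: `lambda r: r[0].year` for monthly data grouped by year, or the hour component of a timestamp for hourly data grouped by hour of the day.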

All three intervals share several columns, such as:

Wind speed (miles per hour)
Minimum/Maximum temperature (Fahrenheit)
Atmospheric pressure (in Hg)
Precipitation (inches)

Several "outlier" columns appear in only one interval:

Monthly: Days with Heavy Fog
Monthly: Days with Thunderstorms
Hourly: Relative Humidity (percentage)
Hourly: Visibility (miles)

Dataset and documentation:

West-Chicago-DuPage-Airport-Local-Climatological-Data-FROM-2014-01-01-TO-2023-09-03.csv contains the dataset.
LCD_documentation.pdf goes into greater detail about the contents of the dataset.

Tech

The Spark.NET library is used to implement the Spark job that performs data preparation and aggregation. Serverless Spark pools in Azure Synapse Analytics are used to run the Spark job. The CSV dataset file and the resulting Parquet files are stored in Azure Data Lake Storage Gen2.

Implementation Steps Summary

1. Write the CSV dataset to Parquet files.
2. Filter out unnecessary columns, including:
   - Columns that have no values or only one distinct value
   - Redundant columns
   - The raw data column
3. Split the filtered DataFrame into three DataFrames based on report type:
   - Monthly summary DataFrame
   - Daily summary DataFrame
   - Hourly DataFrame
4. Cast several columns and perform aggregation on each DataFrame. The results are listed below.
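The filtering, splitting, and casting steps above can be sketched in plain Python on a list of dictionaries. This is only an illustration under stated assumptions, not the repository's Spark.NET implementation: the helper names, the column names (`station`, `report_type`, `max_temp_f`, `empty_col`, `raw`), and the report type values are hypothetical, and blank strings are assumed to stand for missing values.

```python
from collections import defaultdict


def drop_uninformative_columns(rows, extra_drop=()):
    """Drop columns that are empty or hold one distinct value, plus named extras."""
    columns = list(rows[0].keys()) if rows else []
    keep = []
    for col in columns:
        # Blank strings and None are treated as missing, not as values.
        distinct = {row[col] for row in rows if row[col] not in (None, "")}
        if len(distinct) > 1 and col not in extra_drop:
            keep.append(col)
    return [{col: row[col] for col in keep} for row in rows]


def split_by_report_type(rows, type_column):
    """Partition rows into one list per distinct value of type_column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[type_column]].append(row)
    return dict(parts)


def cast_column(rows, column, cast):
    """Return copies of rows with `column` converted by `cast`; blanks become None."""
    return [
        {**row, column: cast(row[column]) if row[column] not in (None, "") else None}
        for row in rows
    ]


# Hypothetical rows; every value starts as a string, as when read from CSV.
rows = [
    {"station": "A", "report_type": "monthly", "date": "2023-07-31",
     "max_temp_f": "88", "empty_col": "", "raw": "r1"},
    {"station": "A", "report_type": "daily", "date": "2023-08-01",
     "max_temp_f": "85", "empty_col": "", "raw": "r2"},
    {"station": "A", "report_type": "hourly", "date": "2023-08-01T01:00",
     "max_temp_f": "", "empty_col": "", "raw": "r3"},
]

# Filter: "station" has one distinct value, "empty_col" has none,
# and "raw" is dropped explicitly as the raw data column.
filtered = drop_uninformative_columns(rows, extra_drop=("raw",))

# Split: one group of rows per report type.
parts = split_by_report_type(filtered, "report_type")

# Cast: convert a string column to float before aggregating.
daily = cast_column(parts["daily"], "max_temp_f", float)
```

Filtering before splitting means each column is judged against the whole dataset, so a column populated for only one report type is still kept.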

Results (text files)

Other

"Home page" for U.S. Local Climatological Data.
Other climate datasets provided by NOAA can be found here.

jkrajcir/LocalClimateData
