Fetching Data (#4)
s2t2 authored Jun 28, 2024
1 parent e25b826 commit 4824928
Showing 20 changed files with 338 additions and 72 deletions.
4 changes: 4 additions & 0 deletions Makefile
@@ -1,5 +1,9 @@



pip:
	pip install -r docs/requirements.txt

build:
	quarto render docs/
	open docs/_build/index.html
126 changes: 70 additions & 56 deletions docs/_quarto.yml
@@ -47,22 +47,22 @@ website:
contents:
- section:
href: notes/dev-tools/google-colab/overview.qmd
#contents:
# - section:
# href: notes/dev-tools/google-colab/table-of-contents.qmd
# text: "Table of Contents"
# - section:
# href: notes/dev-tools/google-colab/filesystem.ipynb
# text: "Accessing the Filesystem"
# - section:
# href: notes/dev-tools/google-colab/form-inputs.ipynb
# text: "Forms and Inputs"
# - section:
# href: notes/dev-tools/google-colab/notebook-secrets.qmd
# text: "Notebook Secrets"
# - section:
# href: notes/dev-tools/google-colab/advanced-integrations.ipynb
# text: "Advanced Integrations"
contents:
- section:
href: notes/dev-tools/google-colab/table-of-contents.qmd
text: "Table of Contents"
- section:
href: notes/dev-tools/google-colab/filesystem.ipynb
text: "Accessing the Filesystem"
- section:
href: notes/dev-tools/google-colab/form-inputs.ipynb
text: "Forms and Inputs"
- section:
href: notes/dev-tools/google-colab/notebook-secrets.qmd
text: "Notebook Secrets"
- section:
href: notes/dev-tools/google-colab/advanced-integrations.ipynb
text: "Advanced Integrations"
- section:
href: notes/dev-tools/pip.ipynb
text: "Installing Packages with Pip"
@@ -91,11 +91,6 @@ website:
href: notes/python-lang/basic-datatypes/numbers.qmd
- section:
href: notes/python-lang/basic-datatypes/strings.qmd
# container datatypes here:
#- section:
# href: notes/python-lang/container-datatypes/lists.qmd
#- section:
# href: notes/python-lang/container-datatypes/dictionaries.qmd

- section:
href: notes/python-lang/python-operators.qmd
@@ -116,20 +111,19 @@ website:
# href: notes/python-lang/control-flow/errors.qmd
- section:
href: notes/python-lang/control-flow/while-loops.qmd
#text: "While Loops, Counters, and Accumulators"

- section:
#href: notes/python-lang/container-datatypes/index.qmd
href: notes/python-lang/container-datatypes/index.qmd
text: "Container Datatypes"
contents:
- section:
href: notes/python-lang/container-datatypes/lists.qmd
- section:
href: notes/python-lang/container-datatypes/dictionaries.qmd

#- section:
# href: notes/python-modules/index.qmd
# text: "Python Modules"
- section:
href: notes/python-modules/index.qmd
text: "Python Modules"
# contents:
# - section:
# href: notes/python-modules/datetime.qmd
@@ -145,9 +139,9 @@ website:
href: notes/data-processing/index.qmd
text: "Data Processing"
contents:
#- section:
# href: notes/data-processing/looping.qmd
# text: "List Iteration and Looping"
- section:
href: notes/data-processing/for-loops.qmd
text: "List Iteration and Looping"
- section:
href: notes/data-processing/sorting.qmd
text: "Sorting Lists"
@@ -180,11 +174,17 @@ website:
- section:
href: notes/fetching-data/overview.qmd
text: "Fetching Data from the Internet"
#contents:
# - section:
# href: notes/data-processing/sorting.qmd
# text: "Sorting Lists"

contents:
- section:
href: notes/fetching-data/json-data.qmd
- section:
href: notes/fetching-data/csv-data.qmd
- section:
href: notes/fetching-data/html-web-scraping.qmd
#text: "HTML Data (Web Scraping)"
#- section:
# href: notes/fetching-data/apis.qmd
# text: "APIs"



@@ -196,8 +196,22 @@ website:
- section:
href: courses/applied-ds.qmd
text: "II. Applied Data Science" #"II. Applied Data Science for Finance in Python" #"II. Applied Data Science"
#contents:
# - applied-ds/unit2.qmd
contents:
#
# PANDAS PACKAGE OVERVIEW
#
#- section:
# href: notes/pandas/overview.ipynb
# text: "Pandas Package Overview"
# contents:
# - section:
# href: notes/pandas/dataframes.qmd
# text: "Dataframes"






- "---------"
- section:
@@ -212,27 +226,27 @@ website:
#
# APPENDICES ???
#
- "---------"
- text: "Appendices"
#- "---------"
- section:
text: "A. Google Colab Extras" # In Depth
contents:
- section:
href: notes/dev-tools/google-colab/table-of-contents.qmd
text: "Table of Contents"
- section:
href: notes/dev-tools/google-colab/filesystem.ipynb
text: "Accessing the Filesystem"
- section:
href: notes/dev-tools/google-colab/form-inputs.ipynb
text: "Forms and Inputs"
- section:
href: notes/dev-tools/google-colab/notebook-secrets.qmd
text: "Notebook Secrets"
- section:
href: notes/dev-tools/google-colab/advanced-integrations.ipynb
text: "Advanced Integrations"
#- text: "Appendices"
##- "---------"
#- section:
# text: "A. Google Colab Extras" # In Depth
# contents:
# - section:
# href: notes/dev-tools/google-colab/table-of-contents.qmd
# text: "Table of Contents"
# - section:
# href: notes/dev-tools/google-colab/filesystem.ipynb
# text: "Accessing the Filesystem"
# - section:
# href: notes/dev-tools/google-colab/form-inputs.ipynb
# text: "Forms and Inputs"
# - section:
# href: notes/dev-tools/google-colab/notebook-secrets.qmd
# text: "Notebook Secrets"
# - section:
# href: notes/dev-tools/google-colab/advanced-integrations.ipynb
# text: "Advanced Integrations"

format:
# https://quarto.org/docs/reference/formats/html.html#table-of-contents
6 changes: 3 additions & 3 deletions docs/notes/data-processing/filtering.qmd
@@ -12,9 +12,9 @@ execute:
A **filter operation** applies a filter condition to arrive at a subset of the data, where only the items that match the filter condition are retained.


We saw we can access a particular item in a list by its numeric position, but a filter condition will allow us to access only the items that match some condition.
We saw we can access a particular item in a list by its numeric position, and we can access a sequential subset using list slicing, but a filter condition will allow us to access only the items that match some condition.

The simplest way to implement this is by introducing an "if" statement into the scope of the loop:
The simplest way to implement this is by introducing an \"if\" statement into the scope of the loop:

```{python}
my_numbers = [1, 2, 3, 4, 5, 6, 7]
@@ -29,7 +29,7 @@ for n in my_numbers:

We see we are only printing numbers that match the condition.

However in this case we lose access to the matching items. To retain access for later, we can implement a familiar collection operation using the `append` function, similar to the [mapping](./mapping.qmd) operation:
However, in this case we lose access to the matching items. To retain access for later, we can implement a familiar collection operation using the `append` method, similar to the [mapping](./mapping.qmd) operation:

```{python}
matching_nums = []
3 changes: 1 addition & 2 deletions docs/notes/data-processing/for-loops.qmd
@@ -25,7 +25,6 @@ We can use a **\"for\" loop** to access each item one at a time:
```{python}
print("TOP")
symbols = [""]
for item in symbols:
    print("--------")
    print(item)
@@ -60,4 +59,4 @@ for symbol in symbols:
print("BOTTOM")
```

Loops are essentially important and foundational. They will form the basis of more advanced operations, such as [mapping](./mapping.qmd) and [filtering](./filtering.qmd).
Loops are essential and foundational. They will form the basis of more advanced operations, such as [mapping](./mapping.qmd) and [filtering](./filtering.qmd).
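
As a rough preview of how a single loop can underlie both operations, here is a minimal sketch. The `symbols` list and the four-character filter condition are made up for illustration and are not taken from the notes themselves:

```python
# hypothetical example data, chosen only for illustration
symbols = ["MSFT", "AAPL", "GOOGL", "NFLX"]

lowercased = []     # will hold a transformed copy of every item (mapping)
short_symbols = []  # will hold only the items that match a condition (filtering)

for symbol in symbols:
    # mapping: transform each item and collect the result
    lowercased.append(symbol.lower())

    # filtering: collect the item only if it matches the condition
    if len(symbol) == 4:
        short_symbols.append(symbol)

print(lowercased)
print(short_symbols)
```

In both cases the pattern is the same: start with an empty list, loop through the original list, and append items along the way.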
6 changes: 3 additions & 3 deletions docs/notes/data-processing/index.qmd
@@ -2,8 +2,8 @@

Let's work with lists in much more detail, as we study list-based data processing techniques:

+ [List Sorting](./sorting.qmd)
+ [Sorting Lists](./sorting.qmd)
+ [List Iteration and Looping](./for-loops.qmd)
+ [List Mapping](./mapping.qmd)
+ [List Filtering](./filtering.qmd)
+ [Mapping Lists](./mapping.qmd)
+ [Filtering Lists](./filtering.qmd)
+ [List Comprehensions](./list-comprehensions.qmd)
6 changes: 4 additions & 2 deletions docs/notes/data-processing/mapping.qmd
@@ -27,7 +27,7 @@ for symbol in symbols:

To retain a list of the transformed items, we'll need to store them for later.

In practice, to perform a mapping operation, we start with an empty list that will contain the transformed values. Then we loop through the original list as normal, but within the loop we can collect or append each transformed item into the new list.
In practice, to perform a mapping operation, we start with an empty list that will contain the transformed values. Then we loop through the original list as normal, but within the loop we can collect or append each transformed item into the new list. Then when the loop is finished, our new list will be full.

```{python}
new_list = []
@@ -39,7 +39,7 @@ for symbol in symbols:
print(new_list)
```

To illustrate the iterative collection of items, we can print the full list within the scope of the loop, and see it iteratively grow with each passing of the loop, however we will seldom do this in practice:
To illustrate the iterative collection of items, we can print the full list within the scope of the loop, and see it incrementally grow with each passing of the loop (however we will seldom do this in practice):



@@ -51,5 +51,7 @@ for symbol in symbols:
    new_list.append(symbol.lower()) # COLLECT FOR LATER
    print(new_list) # JUST FOR ILLUSTRATIVE PURPOSES
    print("---------")
print("BOTTOM")
print(new_list)
```
9 changes: 8 additions & 1 deletion docs/notes/data-processing/sorting.qmd
@@ -57,12 +57,19 @@ sorted(symbols, reverse=True) # DESCENDING ORDER

We see that if we have a simple list, such as a list of numbers or list of strings, the `sorted` function will know how to sort the items. It understands numeric order in which 2 is greater than 1, and it understands alphabetical order in which "b" is greater than "a".


```{python}
print(1 < 2)
print("a" < "b")
```
But it is not possible to compare dictionaries using greater than or less than operators:

```{python}
# print({} < {}) # INVALID
#> TypeError: '<' not supported between instances of 'dict' and 'dict'
```

However if we have a complex list, such as a list of dictionaries, we need to specify on the basis of which key we should sort on.
So if we have a complex list, such as a list of dictionaries, we need to specify which key to sort on.
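
As a minimal sketch of this idea, we can pass a function to the `key` parameter of `sorted`, and that function tells `sorted` which value to compare for each item. The `teams` data and the `team_name` helper below are hypothetical, made up for illustration:

```python
from operator import itemgetter

# hypothetical example data, not taken from the notes
teams = [
    {"city": "New York", "name": "Yankees"},
    {"city": "Boston", "name": "Red Sox"},
    {"city": "Chicago", "name": "Cubs"},
]

def team_name(team):
    # given a single dictionary, return the value we want to sort on
    return team["name"]

# sort by the "name" key, in ascending order
by_name = sorted(teams, key=team_name)

# itemgetter("city") builds an equivalent lookup function for the "city" key
by_city_desc = sorted(teams, key=itemgetter("city"), reverse=True)

print([team["name"] for team in by_name])
print([team["city"] for team in by_city_desc])
```

Either a custom function or `itemgetter` works here. The custom function is more explicit, while `itemgetter` is more concise.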


```{python}
4 changes: 2 additions & 2 deletions docs/notes/dataviz/candlestick-charts.qmd
@@ -9,10 +9,10 @@ execute:

# Candlestick Charts with `plotly`

In financial applications, we often have access to OHLC data (containing the open, high, low, and close price. We can use a candlestick chart can help us see the movement of the price within each day.
In financial applications, we often have access to OHLC data (containing the open, high, low, and close price on each day). A candlestick chart can help us see the movement of the price within each day.


Let's make a [candlestick chart](ttps://plotly.com/python/candlestick-charts/), using the [`Candlestick` class](https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.Candlestick.html) from plotly's Graph Objects sub-library.
To implement a [candlestick chart](https://plotly.com/python/candlestick-charts/), we can use the [`Candlestick` class](https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.Candlestick.html) from plotly's Graph Objects sub-library.

We start with some OHLC data:

2 changes: 2 additions & 0 deletions docs/notes/dataviz/overview.qmd
@@ -138,6 +138,8 @@ fig.show()

### Scatter Plots

We can use a scatter plot to examine the relationship between two variables (`x` and `y`).

Starting with some example data:

```{python}
6 changes: 5 additions & 1 deletion docs/notes/dataviz/trendlines.qmd
@@ -9,6 +9,10 @@ execute:

# Charts with Trendlines





Consider the previous scatter plot example:


@@ -64,7 +68,7 @@ fig.show()
Under the hood, `plotly` uses the `statsmodels` package to calculate the trend, so you may have to install that package as well.
:::

In addition to "ols" trend, which is an Ordinary Least Squares linear trend, we can use a "lowess" trend which is non-parametric:
In addition to the \"ols\" trend, which is an Ordinary Least Squares linear trend, we can use a \"lowess\" trend, which is a [non-parametric method](https://www.investopedia.com/terms/n/nonparametric-statistics.asp) that can be a better fit for non-linear relationships:

```{python}
from plotly.express import scatter
1 change: 1 addition & 0 deletions docs/notes/fetching-data/apis.qmd
@@ -0,0 +1 @@
# APIs
28 changes: 28 additions & 0 deletions docs/notes/fetching-data/csv-data.qmd
@@ -0,0 +1,28 @@
# Fetching CSV Data


If the data we want to fetch is in CSV format, we can use the `pandas` package to fetch and process it.

First we note the URL where the data resides. Then we pass that URL as an argument to the [`read_csv` function](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) from the `pandas` package, which issues an HTTP GET request:

```{python}
from pandas import read_csv
# the URL of some CSV data we stored online:
request_url = "https://raw.githubusercontent.com/prof-rossetti/python-for-finance/main/docs/data/gradebook.csv"
df = read_csv(request_url)
print(type(df))
df
```

The resulting data is a [`DataFrame` object](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) from `pandas`. We will return to working with dataframes in more detail in the future. But as some foreshadowing, if we wanted to work with the column of grades, we could access them like this:

```{python}
df["final_grade"]
```

```{python}
print(df["final_grade"].mean())
print(df["final_grade"].median())
```
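
For intuition about what it means to parse CSV text into rows and then summarize a column, here is a rough sketch using only Python's standard library. This is not how `pandas` works internally. The CSV text below is made up for illustration (only the `final_grade` column name comes from the example above), and it stands in for the contents of a downloaded file:

```python
import csv
import io
from statistics import mean, median

# made-up CSV text, standing in for the contents of a file fetched from a URL
csv_text = """student_id,final_grade
1,76.7
2,84.2
3,92.5
4,88.0
"""

# DictReader treats the first line as column names,
# and produces one dictionary per remaining line
rows = list(csv.DictReader(io.StringIO(csv_text)))

# values arrive as strings, so we convert them to numbers before doing math
grades = [float(row["final_grade"]) for row in rows]

print(len(rows))
print(mean(grades))
print(median(grades))
```

The `pandas` approach handles the fetching, parsing, and type conversion for us, which is one reason we tend to prefer it for tabular data.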