Fetching Data (#4)

prof-rossetti · Jun 28, 2024 · 4824928
1 parent e25b826
commit 4824928

Showing 20 changed files with 338 additions and 72 deletions.
diff --git a/Makefile b/Makefile
@@ -1,5 +1,9 @@
 
 
+
+pip:
+	pip install -r docs/requirements.txt
+
 build:
 	quarto render docs/
 	open docs/_build/index.html

diff --git a/docs/_quarto.yml b/docs/_quarto.yml
@@ -47,22 +47,22 @@ website:
         contents:
           - section:
             href: notes/dev-tools/google-colab/overview.qmd
-            #contents:
-            #  - section:
-            #    href: notes/dev-tools/google-colab/table-of-contents.qmd
-            #    text: "Table of Contents"
-            #  - section:
-            #    href: notes/dev-tools/google-colab/filesystem.ipynb
-            #    text: "Accessing the Filesystem"
-            #  - section:
-            #    href: notes/dev-tools/google-colab/form-inputs.ipynb
-            #    text: "Forms and Inputs"
-            #  - section:
-            #    href: notes/dev-tools/google-colab/notebook-secrets.qmd
-            #    text: "Notebook Secrets"
-            #  - section:
-            #    href: notes/dev-tools/google-colab/advanced-integrations.ipynb
-            #    text: "Advanced Integrations"
+            contents:
+              - section:
+                href: notes/dev-tools/google-colab/table-of-contents.qmd
+                text: "Table of Contents"
+              - section:
+                href: notes/dev-tools/google-colab/filesystem.ipynb
+                text: "Accessing the Filesystem"
+              - section:
+                href: notes/dev-tools/google-colab/form-inputs.ipynb
+                text: "Forms and Inputs"
+              - section:
+                href: notes/dev-tools/google-colab/notebook-secrets.qmd
+                text: "Notebook Secrets"
+              - section:
+                href: notes/dev-tools/google-colab/advanced-integrations.ipynb
+                text: "Advanced Integrations"
           - section:
             href: notes/dev-tools/pip.ipynb
             text: "Installing Packages with Pip"
@@ -91,11 +91,6 @@ website:
                     href: notes/python-lang/basic-datatypes/numbers.qmd
                   - section:
                     href: notes/python-lang/basic-datatypes/strings.qmd
-                  # container datatypes here:
-                  #- section:
-                  #  href: notes/python-lang/container-datatypes/lists.qmd
-                  #- section:
-                  #  href: notes/python-lang/container-datatypes/dictionaries.qmd
 
               - section:
                 href: notes/python-lang/python-operators.qmd
@@ -116,20 +111,19 @@ website:
                   #  href: notes/python-lang/control-flow/errors.qmd
                   - section:
                     href: notes/python-lang/control-flow/while-loops.qmd
-                    #text: "While Loops, Counters, and Accumulators"
 
               - section:
-                #href: notes/python-lang/container-datatypes/index.qmd
+                href: notes/python-lang/container-datatypes/index.qmd
                 text: "Container Datatypes"
                 contents:
                   - section:
                     href: notes/python-lang/container-datatypes/lists.qmd
                   - section:
                     href: notes/python-lang/container-datatypes/dictionaries.qmd
 
-              #- section:
-              #  href: notes/python-modules/index.qmd
-              #  text: "Python Modules"
+              - section:
+                href: notes/python-modules/index.qmd
+                text: "Python Modules"
               #  contents:
               #    - section:
               #      href: notes/python-modules/datetime.qmd
@@ -145,9 +139,9 @@ website:
             href: notes/data-processing/index.qmd
             text: "Data Processing"
             contents:
-              #- section:
-              #  href: notes/data-processing/looping.qmd
-              #  text: "List Iteration and Looping"
+              - section:
+                href: notes/data-processing/for-loops.qmd
+                text: "List Iteration and Looping"
               - section:
                 href: notes/data-processing/sorting.qmd
                 text: "Sorting Lists"
@@ -180,11 +174,17 @@ website:
           - section:
             href: notes/fetching-data/overview.qmd
             text: "Fetching Data from the Internet"
-            #contents:
-            #  - section:
-            #    href: notes/data-processing/sorting.qmd
-            #    text: "Sorting Lists"
-
+            contents:
+              - section:
+                href: notes/fetching-data/json-data.qmd
+              - section:
+                href: notes/fetching-data/csv-data.qmd
+              - section:
+                href: notes/fetching-data/html-web-scraping.qmd
+                #text: "HTML Data (Web Scraping)"
+              #- section:
+              #  href: notes/fetching-data/apis.qmd
+              #  text: "APIs"
 
 
 
@@ -196,8 +196,22 @@ website:
       - section:
         href: courses/applied-ds.qmd
         text: "II. Applied Data Science" #"II. Applied Data Science for Finance in Python" #"II. Applied Data Science"
-        #contents:
-        #  - applied-ds/unit2.qmd
+        contents:
+          #
+          # PANDAS PACKAGE OVERVIEW
+          #
+          #- section:
+          #  href: notes/pandas/overview.ipynb
+          #  text: "Pandas Package Overview"
+          #  contents:
+          #    - section:
+          #      href: notes/pandas/dataframes.qmd
+          #      text: "Dataframes"
+
+
+
+
+
 
       - "---------"
       - section:
@@ -212,27 +226,27 @@ website:
       #
       # APPENDICES ???
       #
-      - "---------"
-      - text: "Appendices"
       #- "---------"
-      - section:
-        text: "A. Google Colab Extras" # In Depth
-        contents:
-          - section:
-            href: notes/dev-tools/google-colab/table-of-contents.qmd
-            text: "Table of Contents"
-          - section:
-            href: notes/dev-tools/google-colab/filesystem.ipynb
-            text: "Accessing the Filesystem"
-          - section:
-            href: notes/dev-tools/google-colab/form-inputs.ipynb
-            text: "Forms and Inputs"
-          - section:
-            href: notes/dev-tools/google-colab/notebook-secrets.qmd
-            text: "Notebook Secrets"
-          - section:
-            href: notes/dev-tools/google-colab/advanced-integrations.ipynb
-            text: "Advanced Integrations"
+      #- text: "Appendices"
+      ##- "---------"
+      #- section:
+      #  text: "A. Google Colab Extras" # In Depth
+      #  contents:
+      #    - section:
+      #      href: notes/dev-tools/google-colab/table-of-contents.qmd
+      #      text: "Table of Contents"
+      #    - section:
+      #      href: notes/dev-tools/google-colab/filesystem.ipynb
+      #      text: "Accessing the Filesystem"
+      #    - section:
+      #      href: notes/dev-tools/google-colab/form-inputs.ipynb
+      #      text: "Forms and Inputs"
+      #    - section:
+      #      href: notes/dev-tools/google-colab/notebook-secrets.qmd
+      #      text: "Notebook Secrets"
+      #    - section:
+      #      href: notes/dev-tools/google-colab/advanced-integrations.ipynb
+      #      text: "Advanced Integrations"
 
 format:
   # https://quarto.org/docs/reference/formats/html.html#table-of-contents

diff --git a/docs/notes/data-processing/filtering.qmd b/docs/notes/data-processing/filtering.qmd
@@ -12,9 +12,9 @@ execute:
 A **filter operation** applies a filter condition to arrive at a subset of the data, where only the items that match the filter condition are retained.
 
 
-We saw we can access a particular item in a list by its numeric position, but a filter condition will allow us to access only the items that match some condition.
+We saw we can access a particular item in a list by its numeric position, and we can access a sequential subset using list slicing, but a filter condition will allow us to access only the items that match some condition.
 
-The simplest way to implement this is by introducing an "if" statement into the scope of the loop:
+The simplest way to implement this is by introducing an \"if\" statement into the scope of the loop:
 
 ```{python}
 my_numbers = [1, 2, 3, 4, 5, 6, 7]
@@ -29,7 +29,7 @@ for n in my_numbers:
 
 We see we are only printing numbers that match the condition.
 
-However in this case we lose access to the matching items. To retain access for later, we can implement a familiar collection operation using the `append` function, similar to the [mapping](./mapping.qmd) operation:
+However in this case we lose access to the matching items. To retain access for later, we can implement a familiar collection operation using the `append` method, similar to the [mapping](./mapping.qmd) operation:
 
 ```{python}
 matching_nums = []
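
Note on the hunk above: the context stops right after `matching_nums = []`, so the collection step itself is not visible here. A minimal sketch of how that pattern might continue, reusing the `my_numbers` list from the earlier hunk; the `n > 3` condition is a hypothetical choice for illustration, not necessarily the one used in the notes:

```python
my_numbers = [1, 2, 3, 4, 5, 6, 7]

matching_nums = []  # start empty, then collect matches as we loop
for n in my_numbers:
    if n > 3:  # hypothetical filter condition (assumption)
        matching_nums.append(n)  # COLLECT FOR LATER

print(matching_nums)
```

After the loop finishes, `matching_nums` holds only the items that passed the condition, and it stays available for later use.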

diff --git a/docs/notes/data-processing/for-loops.qmd b/docs/notes/data-processing/for-loops.qmd
@@ -25,7 +25,6 @@ We can use a **\"for\" loop** to access each item one at a time:
 ```{python}
 print("TOP")
 
-symbols = [""]
 for item in symbols:
     print("--------")
     print(item)
@@ -60,4 +59,4 @@ for symbol in symbols:
 print("BOTTOM")
 ```
 
-Loops are essentially important and foundational. They will form the basis of more advanced operations, such as [mapping](./mapping.qmd) and [filtering](./filtering.qmd).
+Loops are essential and foundational. They will form the basis of more advanced operations, such as [mapping](./mapping.qmd) and [filtering](./filtering.qmd).
diff --git a/docs/notes/data-processing/index.qmd b/docs/notes/data-processing/index.qmd
@@ -2,8 +2,8 @@
 
 Let's work with lists in much more detail, as we study list-based data processing techniques:
 
-   + [List Sorting](./sorting.qmd)
+   + [Sorting Lists](./sorting.qmd)
    + [List Iteration and Looping](./for-loops.qmd)
-   + [List Mapping](./mapping.qmd)
-   + [List Filtering](./filtering.qmd)
+   + [Mapping Lists](./mapping.qmd)
+   + [Filtering Lists](./filtering.qmd)
    + [List Comprehensions](./list-comprehensions.qmd)
diff --git a/docs/notes/data-processing/mapping.qmd b/docs/notes/data-processing/mapping.qmd
@@ -27,7 +27,7 @@ for symbol in symbols:
 
 To retain a list of the transformed items, we'll need to store them for later.
 
-In practice, to perform a mapping operation, we start with an empty list that will contain the transformed values. Then we loop through the original list as normal, but within the loop we can collect or append each transformed item into the new list.
+In practice, to perform a mapping operation, we start with an empty list that will contain the transformed values. Then we loop through the original list as normal, but within the loop we can collect or append each transformed item into the new list. Then when the loop is finished, our new list will be full.
 
 ```{python}
 new_list = []
@@ -39,7 +39,7 @@ for symbol in symbols:
 print(new_list)
 ```
 
-To illustrate the iterative collection of items, we can print the full list within the scope of the loop, and see it iteratively grow with each passing of the loop, however we will seldom do this in practice:
+To illustrate the iterative collection of items, we can print the full list within the scope of the loop, and see it incrementally grow with each passing of the loop (however we will seldom do this in practice):
 
 
 
@@ -51,5 +51,7 @@ for symbol in symbols:
     new_list.append(symbol.lower()) # COLLECT FOR LATER
     print(new_list) # JUST FOR ILLUSTRATIVE PURPOSES
 
+print("---------")
+print("BOTTOM")
 print(new_list)
 ```
diff --git a/docs/notes/data-processing/sorting.qmd b/docs/notes/data-processing/sorting.qmd
@@ -57,12 +57,19 @@ sorted(symbols, reverse=True) # DESCENDING ORDER
 
 We see that if we have a simple list, such as a list of numbers or list of strings, the `sorted` function will know how to sort the items. It understands numeric order in which 2 is greater than 1, and it understands alphabetical order in which "b" is greater than "a".
 
+
 ```{python}
 print(1 < 2)
 print("a" < "b")
 ```
+But it is not possible to compare dictionaries using greater than or less than operators:
+
+```{python}
+# print({} < {}) # INVALID
+#> TypeError: '<' not supported between instances of 'dict' and 'dict'
+```
 
-However if we have a complex list, such as a list of dictionaries, we need to specify on the basis of which key we should sort on.
+So if we have a complex list, such as a list of dictionaries, we need to specify which key to sort on.
 
 
 ```{python}
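
Note on the hunk above: the context ends at the opening of the next code cell, so the dictionary-sorting example is not shown here. A minimal sketch of one way to specify the sort key, using the `key` parameter of the built-in `sorted` function; the `teams` data and variable names are hypothetical and not taken from the notes:

```python
from operator import itemgetter

# hypothetical example data (assumption, not from the original notes)
teams = [
    {"city": "New York", "name": "Yankees"},
    {"city": "Boston", "name": "Red Sox"},
    {"city": "New Haven", "name": "Ravens"},
]

# the key function tells sorted() which value to compare for each dictionary
teams_by_name = sorted(teams, key=itemgetter("name"))

# an equivalent key written as a lambda, this time in descending order
teams_by_name_desc = sorted(teams, key=lambda team: team["name"], reverse=True)

print([team["name"] for team in teams_by_name])
print([team["name"] for team in teams_by_name_desc])
```

Either form works; `itemgetter` is a little more compact, while the lambda makes the lookup explicit.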

diff --git a/docs/notes/dataviz/candlestick-charts.qmd b/docs/notes/dataviz/candlestick-charts.qmd
@@ -9,10 +9,10 @@ execute:
 
 # Candlestick Charts with `plotly`
 
-In financial applications, we often have access to OHLC data (containing the open, high, low, and close price. We can use a candlestick chart can help us see the movement of the price within each day.
+In financial applications, we often have access to OHLC data (containing the open, high, low, and close price on each day). A candlestick chart can help us see the movement of the price within each day.
 
 
-Let's make a [candlestick chart](ttps://plotly.com/python/candlestick-charts/), using the [`Candlestick` class](https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.Candlestick.html) from plotly's Graph Objects sub-library.
+To implement a [candlestick chart](https://plotly.com/python/candlestick-charts/), we can use the [`Candlestick` class](https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.Candlestick.html) from plotly's Graph Objects sub-library.
 
 We start with some OHLC data:
 

diff --git a/docs/notes/dataviz/overview.qmd b/docs/notes/dataviz/overview.qmd
@@ -138,6 +138,8 @@ fig.show()
 
 ### Scatter Plots
 
+We can use a scatter plot to examine the relationship between two variables (`x` and `y`).
+
 Starting with some example data:
 
 ```{python}

diff --git a/docs/notes/dataviz/trendlines.qmd b/docs/notes/dataviz/trendlines.qmd
@@ -9,6 +9,10 @@ execute:
 
 # Charts with Trendlines
 
+
+
+
+
 Consider the previous scatter plot example:
 
 
@@ -64,7 +68,7 @@ fig.show()
 Under the hood, `plotly` uses the `statsmodels` package to calculate the trend, so you may have to install that package as well.
 :::
 
-In addition to "ols" trend, which is an Ordinary Least Squares linear trend, we can use a "lowess" trend which is non-parametric:
+In addition to the \"ols\" trend, which is an Ordinary Least Squares linear trend, we can use a \"lowess\" trend, which is a [non-parametric method](https://www.investopedia.com/terms/n/nonparametric-statistics.asp) that can be a better fit for non-linear relationships:
 
 ```{python}
 from plotly.express import scatter

diff --git a/docs/notes/fetching-data/apis.qmd b/docs/notes/fetching-data/apis.qmd
@@ -0,0 +1 @@
+# APIs
diff --git a/docs/notes/fetching-data/csv-data.qmd b/docs/notes/fetching-data/csv-data.qmd
@@ -0,0 +1,28 @@
+# Fetching CSV Data
+
+
+If the data we want to fetch is in CSV format, we can use the `pandas` package to fetch and process it.
+
+First we note the URL where the data resides. Then we pass that URL to the [`read_csv` function](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) from the `pandas` package, which issues an HTTP GET request and parses the response:
+
+```{python}
+from pandas import read_csv
+
+# the URL of some CSV data we stored online:
+request_url = "https://raw.githubusercontent.com/prof-rossetti/python-for-finance/main/docs/data/gradebook.csv"
+
+df = read_csv(request_url)
+print(type(df))
+df
+```
+
+The resulting data is a [`DataFrame` object](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) from `pandas`. We will return to working with dataframes in more detail in the future. But as some foreshadowing, if we wanted to work with the column of grades, we could access them like this:
+
+```{python}
+df["final_grade"]
+```
+
+```{python}
+print(df["final_grade"].mean())
+print(df["final_grade"].median())
+```