new content (#26)

* up to date (#1)
* Cleanup
* Use latest version of docker
* Remove Docker first
* Remove more of Docker
* Start Docker after installing
* fixed misspecification
* breaking apart the workshop
CrunchyData · May 17, 2019 · 781849f
1 parent 5e54e84
Showing 20 changed files with 418 additions and 4 deletions.
diff --git a/basic-postgresql-devel-pathway.json b/basic-postgresql-devel-pathway.json
@@ -1,4 +1,15 @@
 {
   "title": "CrunchyData Basic PostgreSQL for Developers",
+<<<<<<< HEAD
+  "courses": [
+    {
+      "external_link": "https://crunchydata.katacoda.com/basic-postgresql-devel/runcontainers/",
+      "course_id": "runcontainers",
+      "title": "Quick Intro. To PostgreSQL in Containers"
+    }
+
+  ]
+=======
   "courses": []
+>>>>>>> 2930e32889613de3534aabf3bf13b86bb8514a27
 }
diff --git a/basic-postgresql-devel/.gitkeep b/basic-postgresql-devel/.gitkeep
diff --git a/basic-postgresql-devel/runcontainers/01-running-in-containers.md b/basic-postgresql-devel/runcontainers/01-running-in-containers.md
@@ -0,0 +1,119 @@
+In this exercise we will introduce containers (which you may know as Docker) and then spin up our PostgreSQL instance 
+using containers. 
+
+## A Little Background on Containers
+
+Containers have quite a long history in computing before Docker. At a simplistic level, containers "package up" applications
+and their dependencies so they run with everything they need above the operating system kernel. This allows for a cleaner 
+separation of dependencies, since the container has everything it needs to run except the kernel. Here is a 
+[good introduction](https://medium.freecodecamp.org/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b) 
+to Docker containers. Be aware that there are [other](https://containerd.io/) container runtimes and specifications besides
+Docker.
+
+Containers are spun up from a container image. In this class we will use "container" to denote the running container 
+and "image" to denote the binary used to spin up the container.
+
+Another advantage of images is that not only do they contain the binaries for the application, but they are also configured 
+and ready to run. With a container you can skip most of the configuration and just do some version of "container run".
+
+In this class we will be using an image that contains PostgreSQL, PostGIS, embedded R, and some other extensions. If 
+you have ever tried to install all these pieces you know what a hassle it can be. Let's see how easy it can be with containers. 
+
+## Running PostgreSQL in Containers
+
+Crunchy Data has produced a full [suite of containers](https://github.com/CrunchyData/crunchy-containers) to make PostgreSQL
+simpler and easier to run in containerized environments. Today we will be using a container that was purposefully built for 
+developers. The container makes some tradeoffs:
+1. It has the most-used extensions already included in the binaries
+1. It only requires one environment variable, a password. Everything else is optional
+1. It doesn't require any volume mappings but allows for optional ones
+1. Its target user is a developer on their primary development machine
+1. It is not supported or intended for production use
+1. It does not support replication or high availability scenarios
+
+Its goal is to get you up and running quickly and easily for your development work.  
+
+#### Simplest method
+
+Let's start with the quickest and easiest way to start up PostgreSQL using a container.
+
+`docker run -e PG_PASSWORD=password thesteve0/postgres-appdev`{{execute}}
+
+If you click the little check mark in the box above, it will execute the command in the terminal window. 
+This tells Docker to run the image 
+[_thesteve0/postgres-appdev_](https://cloud.docker.com/u/thesteve0/repository/docker/thesteve0/postgres-appdev) and pass 
+in an environment variable that sets the password for both the standard user and the postgres (DBAdmin) user. 
+
+1. The default name for the primary database will be: mydb
+1. The default username is: rnduser2w3
+1. The default port will be: 5432
+1. And the postgres user password will be equal to the user password which you set in the command.
+
+**CONGRATULATIONS you just spun up a fully working PostgreSQL database with a bunch of functionality!**  
+
+But this is a pretty simplistic way to start PostgreSQL: great if you just want to "get going quickly". 
+
+Because we didn't run the container in "detached" mode, we never got our prompt back. Detached mode runs the container 
+in the background and gives us back our prompt. To shut down the container, click the tab titled "Terminal 2" and 
+get information about the running container:
+
+`docker ps`{{execute}}
+
+![dockerps](assets/docker_ps.jpg)
+
+Please note either the name or the ID of your running container (highlighted in red above). Now in the same terminal type 
+in the following command:     
+
+`docker kill <id or name of your container>`
+
+`docker kill` stops your running container by sending a signal (SIGKILL by default) to the container's main process, 
+in this case the PostgreSQL server process. (`docker stop` is the gentler alternative: it sends SIGTERM first.)
+If you go back to the first tab, "Terminal", you will see that you get your prompt back. Let's start PostgreSQL more 
+appropriately for your daily work. 
+
+#### Better way to start the container
+
+Let's set a new username, give the container a fixed (rather than random) name, expose port 5432 from the container 
+into the VM we are running, and have it detach so we can get our prompt back. 
+
+`docker run -d -p 5432:5432 -e PG_USER=groot -e PG_PASSWORD=password -e PG_DATABASE=workshop --name=pgsql thesteve0/postgres-appdev`{{execute}}
+
+And with that we have now spun up PostgreSQL with:
+1. The ability to connect from our VM to the instance running in the container
+1. Username: groot
+1. Password: password
+1. A database named: workshop
+1. A container named: pgsql
+
+If you want to now log into that running instance of PostgreSQL you can do:
+
+`psql -U groot -h localhost workshop`
+
+We don't need to give psql a port because it connects to port 5432 by default, which is the host port we mapped.
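From application code you would usually connect with a single connection URI rather than `psql` flags. Here is a minimal Python sketch, assuming the settings from the detached `docker run` command above (user groot, password password, database workshop, port 5432 published on localhost). `build_pg_uri` is a hypothetical helper, not part of any library; the `postgresql://user:password@host:port/dbname` form it produces is the standard libpq URI format.

```python
from urllib.parse import quote


def build_pg_uri(user: str, password: str, dbname: str,
                 host: str = "localhost", port: int = 5432) -> str:
    """Return a postgresql:// URI; user, password, and dbname are percent-encoded."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


if __name__ == "__main__":
    # Values from the detached `docker run` command in this exercise.
    print(build_pg_uri("groot", "password", "workshop"))
    # -> postgresql://groot:password@localhost:5432/workshop
```

Because psql is built on libpq, `psql "postgresql://groot:password@localhost:5432/workshop"` should connect the same way as the flags above.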
+
+#### A little container management
+
+The advantage of naming the container is that we can refer to it by name. For example, we can stop the container:
+
+`docker kill pgsql`{{execute}}
+
+and then start it again with all the same settings as last time:
+
+`docker start pgsql`{{execute}}
+
+or 
+
+`docker restart pgsql`{{execute}}
+
+Not only will this retain the settings, but all the data you added before will also be there when you restart the container, as long as you don't remove it with `docker rm`. 
+
+If you wanted PostgreSQL instances with different data or even different versions, you could start containers from the 
+images under different names (and, if they run at the same time, on different host ports). This way you could spin them up and down as needed.
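If you do run more than one of these containers at once, each needs its own name and its own host port, since only one container can publish host port 5432. A small Python sketch, assuming the same image and environment variables used above; `docker_run_args` and the container name `pgsql-scratch` are hypothetical, and the function only builds the argument list rather than running Docker.

```python
import shlex
from typing import List


def docker_run_args(name: str, host_port: int, password: str,
                    user: str = "groot", database: str = "workshop") -> List[str]:
    """Argument list for a detached thesteve0/postgres-appdev container.

    Only flags used in this exercise appear here. Each instance gets its own
    container name and host port; PostgreSQL inside the container stays on 5432.
    """
    return [
        "docker", "run", "-d",
        "-p", f"{host_port}:5432",
        "-e", f"PG_USER={user}",
        "-e", f"PG_PASSWORD={password}",
        "-e", f"PG_DATABASE={database}",
        f"--name={name}",
        "thesteve0/postgres-appdev",
    ]


if __name__ == "__main__":
    # Two hypothetical instances; the second uses host port 5433 so both can run together.
    for name, port in [("pgsql", 5432), ("pgsql-scratch", 5433)]:
        print(shlex.join(docker_run_args(name, port, "password")))
```

To actually start a container you could pass the list to `subprocess.run(args, check=True)`, and then connect to the second one with `psql -U groot -h localhost -p 5433 workshop`.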
+
+If you want to see all the images on your machine, run the following command:
+
+`docker images`{{execute}}
+
+
+
+
diff --git a/basic-postgresql-devel/runcontainers/assets/docker_ps.jpg b/basic-postgresql-devel/runcontainers/assets/docker_ps.jpg
diff --git a/basic-postgresql-devel/runcontainers/env-init.sh b/basic-postgresql-devel/runcontainers/env-init.sh
@@ -0,0 +1,3 @@
+#!/usr/bin/env bash
+
+# nothing for now
diff --git a/basic-postgresql-devel/runcontainers/finish.md b/basic-postgresql-devel/runcontainers/finish.md
@@ -0,0 +1,10 @@
+# Final Notes 
+
+The container used in this class is available [on Docker Hub](https://cloud.docker.com/u/thesteve0/repository/docker/thesteve0/postgres-appdev). 
+As long as you have Docker on your machine you can use the same version of PostgreSQL as the workshop. There is also 
+[data for playing with](https://github.com/CrunchyData/crunchy-demo-data/releases/tag/v0.1). The data in the workshop was intentionally chosen 
+from public domain or permissively licensed sources so that you can use it for commercial and non-commercial purposes. Feel free 
+to download it and play some more at your own pace on your own machine.
+
+Now you have a quick and easy way to spin up PostgreSQL without installing binaries, compiling software, or any other 
+administrative tasks. And, if your whole team uses the same images to start their containers, you will all be running PostgreSQL
+the exact same way, making it easier to share knowledge.
diff --git a/basic-postgresql-devel/runcontainers/index.json b/basic-postgresql-devel/runcontainers/index.json
@@ -0,0 +1,30 @@
+{
+  "title": "Quick Intro. to Running PostgreSQL in Containers",
+  "description": "A brief introduction to running a developer-centric container of PostgreSQL",
+  "difficulty": "beginner",
+  "time": "10 minutes",
+  "details": {
+    "steps": [
+      {"title": "Running in Containers", "text": "01-running-in-containers.md"}
+    ],
+    "intro": {
+      "courseData": "env-init.sh",
+      "code": "set-env.sh",
+      "text": "intro.md",
+      "credits": ""
+    },
+    "finish": {
+      "text": "finish.md"
+    }
+  },
+  "environment": {
+    "uilayout": "terminal",
+    "uimessage1": "\u001b[32mYour Interactive Bash Terminal.\u001b[m\r\n",
+    "terminals": [
+      {"name": "Terminal 2", "target": "host01"}
+    ]
+  },
+  "backend": {
+    "imageid": "crunchydata-single1"
+  }
+}
diff --git a/basic-postgresql-devel/runcontainers/intro.md b/basic-postgresql-devel/runcontainers/intro.md
@@ -0,0 +1,10 @@
+# Using PostgreSQL In Containers
+
+This class is going to give you a quick introduction to running a PostgreSQL container geared towards the needs of application 
+developers. You probably know containers by the name Docker, but they are actually a technology that has been around for 
+a while. 
+
+The goal of this class is to teach a little about containers, introduce you to the containers produced by Crunchy Data, and
+show you how to use the container specifically built to make the lives of application developers easier.
+
+Enjoy!
diff --git a/basic-postgresql-devel/runcontainers/set-env.sh b/basic-postgresql-devel/runcontainers/set-env.sh
@@ -0,0 +1,3 @@
+#!/usr/bin/env bash
+
+# Nothing Yet
diff --git a/homepage-pathway.json b/homepage-pathway.json
@@ -8,14 +8,14 @@
       "action": "Coming Soon"
     },
     {
-      "external_link": "https://crunchydata.katacoda.com/comingsoon",
+      "external_link": "https://crunchydata.katacoda.com/basic-postgresql-devel",
       "title": "Basic PostgreSQL for Developers",
-      "action": "Coming Soon"
+      "action": "Start Course"
     },
     {
-      "external_link": "https://crunchydata.katacoda.com/comingsoon",
+      "external_link": "https://crunchydata.katacoda.com/postgis",
       "title": "PostGIS",
-      "action": "Coming Soon"
+      "action": "Start Course"
     },
     {
       "external_link": "https://crunchydata.katacoda.com/comingsoon",

diff --git a/learning-resources.txt b/learning-resources.txt
@@ -1,3 +1,28 @@
+# Katacoda instructions for CrunchyData
+
+The top level of our site has courses. Courses are made up of scenarios, and each scenario has a JSON file that defines the intro,
+the instructional pages, the final page, the shell scripts to run when the scenario starts, which Katacoda image to run, and
+what the scenario layout will look like in terms of terminals and web pages.
+
+## Here is how the site "works"
+1. homepage-pathway.json controls what shows up on the home page of crunchydata.katacoda.com. It lists all the courses on our site.
+    * The first item is the URL of the course, which is "https://crunchydata.katacoda.com/<name of the pathway file without the -pathway.json suffix>"
+        For example, "external_link": "https://crunchydata.katacoda.com/workshops/" means there will be a workshops-pathway.json.
+        Inside that JSON file will be the layout for which scenarios will be in the course.
+    * The title of the course you want to show up to the end user
+    * And then the action, which should be "Start Course" if you have content in the directory
+2.
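The naming rule in the first item can be written as a tiny function. This is a hypothetical Python helper that only encodes the convention described above (it matches the links this commit adds to homepage-pathway.json); Katacoda does not provide it.

```python
BASE_URL = "https://crunchydata.katacoda.com"
SUFFIX = "-pathway.json"


def course_url(pathway_file: str) -> str:
    """Map e.g. 'postgis-pathway.json' to the course URL used in homepage-pathway.json."""
    if not pathway_file.endswith(SUFFIX):
        raise ValueError(f"expected a file ending in {SUFFIX!r}: {pathway_file!r}")
    return f"{BASE_URL}/{pathway_file[:-len(SUFFIX)]}"


if __name__ == "__main__":
    print(course_url("basic-postgresql-devel-pathway.json"))
    # -> https://crunchydata.katacoda.com/basic-postgresql-devel
```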
+
+## Setting up a future workshop
+1. Go to https://dashboard.katacoda.com
+2. Log in using the account and credentials that you used to log in to katacoda
+3.
+
+The list of available trainings comes from the pathways in the training directory. Whatever pathway you choose
+will determine the home page that attendees will see when they sign in.
+
+
+
 For learning more on what you can do with Katacoda
 
 The main doc site

diff --git a/postgis-pathway.json b/postgis-pathway.json
@@ -1,4 +1,14 @@
 {
   "title": "CrunchyData PostGIS",
+<<<<<<< HEAD
+  "courses": [
+    {
+      "external_link": "https://crunchydata.katacoda.com/postgis/qpostgisinto/",
+      "course_id": "qpostgisinto",
+      "title": "Quick Intro. To PostGIS"
+    }
+  ]
+=======
   "courses": []
+>>>>>>> 2930e32889613de3534aabf3bf13b86bb8514a27
 }
diff --git a/postgis/.gitkeep b/postgis/.gitkeep
diff --git a/postgis/qpostgisinto/01-spatial-data.md b/postgis/qpostgisinto/01-spatial-data.md
@@ -0,0 +1,128 @@
+# Working with Spatial Data in PostGIS
+
+PostgreSQL has the gold standard in spatial extensions for any RDBMS on the market: PostGIS. If you have data that has 
+direct spatial information, like coordinates, or indirect spatial information, such as an address, you can leverage the power of spatial 
+analysis to enhance the insights into your data. In this workshop we will barely be scratching the surface of what you can 
+do with PostGIS, so please don't consider this exhaustive in the slightest.
+
+One final note before we dig in: to work with spatial data you usually need to run
+
+```CREATE EXTENSION postgis;```
+
+in your database to enable all the functionality. We don't have to do it in the workshop because we already enabled the 
+extension when we created the DB in the container. 
+
+#### Spatial Tables
+
+Let's go ahead and log in to our PostgreSQL database:
+
+```psql -U groot -h localhost workshop```{{execute}}
+
+Remember that the password is the word 'password'.
+
+Now if you do:
+
+`\d county_geometry`{{execute}}
+
+psql will show you a full description of the county_geometry table. To see all the backslash commands in psql, type 
+`\?` (though don't do it right now).
+
+You will see two spatial columns: 
+```
+interior_pnt | geography(Point,4326)        |
+the_geom     | geography(MultiPolygon,4326) |           
+
+```
+You can tell they are spatial because we declared them as type Geography, with the type of spatial feature and the SRID 
+(4326, the WGS 84 longitude/latitude system) in the parentheses. 
+The other spatial type is Geometry. I give a brief discussion at the end of the page about the difference between the two types. 
+Just know that Geography is perfect for data coming from most GPS units or if you want to deal with data at continental scale.
+
+You can also see we made indices for these two columns:
+```
+"countygeom_interiorpt_indx" gist (interior_pnt)
+"countygeom_the_geom_indx" gist (the_geom)
+
+```
+
+Making GiST indices on spatial data allows for efficient querying of the data by creating bounding rectangles and putting them
+in the index. The database can then use these simple rectangles to quickly filter out features that are not in the area of interest, because
+geometric operations on rectangles are much quicker than operations on complex shapes. 
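The filter-then-refine idea can be sketched in a few lines of Python. This is an illustration only: it uses planar x/y coordinates and hypothetical shapes, and it scans the precomputed boxes linearly, whereas a real GiST index arranges the boxes in a tree so most of them are never looked at. A row whose box matches is only a candidate; the exact (and more expensive) point-in-polygon test decides the final answer.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bbox(poly: Polygon) -> BBox:
    """Smallest axis-aligned rectangle around a polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def in_bbox(box: BBox, p: Point) -> bool:
    """Cheap filter: four comparisons."""
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


def in_polygon(p: Point, poly: Polygon) -> bool:
    """Exact test by ray casting: count edges crossed by a ray going right from p."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_then_refine(p: Point, polys: Dict[str, Polygon]) -> Tuple[List[str], List[str]]:
    boxes = {name: bbox(poly) for name, poly in polys.items()}  # built once, like an index
    candidates = [name for name, box in boxes.items() if in_bbox(box, p)]  # filter
    hits = [name for name in candidates if in_polygon(p, polys[name])]  # refine
    return candidates, hits


if __name__ == "__main__":
    shapes = {
        "square": [(0, 0), (4, 0), (4, 4), (0, 4)],
        "triangle": [(5, 0), (9, 0), (5, 4)],
        "far_away": [(20, 20), (21, 20), (21, 21)],
    }
    print(filter_then_refine((8, 3), shapes))  # (['triangle'], []): box matches, shape does not
    print(filter_then_refine((6, 1), shapes))  # (['triangle'], ['triangle'])
```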
+
+#### Simple spatial query
+
+Let's start with one of the simplest queries, a distance query. Let's find the 3 counties closest to the geographic center 
+of the United States (including Alaska and Hawaii):  44.967244 Latitude, -103.771555 Longitude. 
+
+Now let's select the id, county names, and distance from the 3 closest counties using the 
+[ST_Distance function](https://postgis.net/docs/manual-2.5/ST_Distance.html):
+
+```SELECT id, county_name, ST_Distance('POINT(-103.771555 44.967244)'::geography, the_geom) FROM county_geometry ORDER BY ST_Distance('POINT(-103.771555 44.967244)'::geography, the_geom) LIMIT 3;```{{execute}} 
+
+This query may take a while to return because we are calculating the distance between that point and every county in the U.S. 
+so we can ORDER the results by distance from the point. But I mentioned that PostGIS was the gold standard for a reason: it has a 
+method to handle this use case, called a K-Nearest Neighbor (KNN) search, with 
+[its own operator](http://postgis.net/workshops/postgis-intro/knn.html).
+
+```SELECT id, county_name, ST_Distance('POINT(-103.771555 44.967244)'::geography, the_geom) FROM county_geometry ORDER BY the_geom <-> 'POINT(-103.771555 44.967244)'::geography LIMIT 3;```{{execute}}
+
+When the column has a spatial index, the query orders by the <-> operator, and we set a relatively small LIMIT (X), the database 
+can use the index to find the X spatially closest results first and THEN calculate everything else (here, ST_Distance) on just those rows. Spatial indices to the
+rescue!
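For comparison, here is what the first, non-KNN query amounts to, sketched in Python: compute a distance for every row, then keep the closest few. The sketch uses the spherical haversine formula, so its numbers will differ slightly from `ST_Distance` on geography, which uses a spheroid by default. The sample points are hypothetical, not real counties.

```python
import heapq
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; treats the Earth as a sphere


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_brute_force(origin: Tuple[float, float],
                        points: Dict[str, Tuple[float, float]],
                        k: int = 3) -> List[Tuple[str, float]]:
    """Score every point, then keep the k closest (no index involved)."""
    lon0, lat0 = origin
    scored = ((name, haversine_m(lon0, lat0, lon, lat)) for name, (lon, lat) in points.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    center = (-103.771555, 44.967244)  # lon, lat: same order as the WKT POINT
    sample = {  # hypothetical points, not real counties
        "one_degree_north": (-103.771555, 45.967244),
        "same_spot": (-103.771555, 44.967244),
        "two_degrees_north": (-103.771555, 46.967244),
        "far_east": (-90.0, 44.967244),
    }
    for name, metres in nearest_brute_force(center, sample, k=2):
        print(f"{name}: {metres:,.0f} m")
```

`heapq.nsmallest` avoids sorting everything but still calls `haversine_m` once per row; the `<->` version lets the index hand back rows in distance order so that work is skipped.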
+
+#### Spatial Join
+
+This next query will demonstrate joining data based on spatial coincidence rather than a shared primary key-foreign key 
+relationship. Our example will join the storm location data to the county geometry data, giving us the county of the 
+incident even though it is not in our original file.
+
+```select geo.statefp, geo.county_name, geo.aland, se.event_id, se.location from county_geometry as geo, se_locations as se where ST_Covers(geo.the_geom, se.the_geom) limit 10;```{{execute}}
+
+The spatial function we use is [ST_Covers](https://postgis.net/docs/ST_Covers.html), which returns true if no point of the second geography lies outside the first. 
+We set the limit to 10 because we don't want to wait for all the results for over 48K rows. The results also show 
+the [state FIPS](https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code) code, which identifies 
+the state. This way you can check that there is a county with that name in that state and that it overlaps 
+the event location.
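A spatial join pairs rows with a predicate such as `ST_Covers` instead of matching key columns. This Python sketch uses hypothetical rectangles in place of county polygons and planar coordinates, and it tests every pair where the database would normally lean on the spatial index. It also shows why the choice of predicate matters: a storm sitting exactly on a shared county line is covered by both counties but, under `ST_Contains`-style semantics, contained by neither.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def covers(r: Rect, p: Point) -> bool:
    """True when no part of p lies outside r; boundary points count (like ST_Covers)."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def contains(r: Rect, p: Point) -> bool:
    """ST_Contains-style test for a point: it must lie in r's interior, not on its edge."""
    return r[0] < p[0] < r[2] and r[1] < p[1] < r[3]


def spatial_join(areas: Dict[str, Rect], events: Dict[str, Point],
                 predicate: Callable[[Rect, Point], bool]) -> List[Tuple[str, str]]:
    """Pair rows by a spatial predicate instead of equal key columns."""
    return [(a, e) for a, rect in areas.items()
            for e, pt in events.items() if predicate(rect, pt)]


if __name__ == "__main__":
    counties = {"west": (0, 0, 10, 10), "east": (10, 0, 20, 10)}  # hypothetical; share the line x = 10
    storms = {"s1": (3, 4), "s2": (10, 5), "s3": (25, 5)}
    print(spatial_join(counties, storms, covers))    # [('west', 's1'), ('west', 's2'), ('east', 's2')]
    print(spatial_join(counties, storms, contains))  # [('west', 's1')]
```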
+
+#### Spatial buffer and then select
+
+Finally, let's do a more complicated query that you could not do without sophisticated spatial operations. Suppose we were 
+trying to place emergency response centers in counties with high potential for storms. We are going to buffer 12.5 km (about 8 miles) 
+around each storm event's center point and then select all the counties that intersect that buffered circle. We will use a grouping 
+query to count the storm circles per county.
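Buffering a point by a radius and testing intersection is the same as asking whether the county comes within that radius of the point. This planar Python sketch, with a hypothetical rectangular county measured in metres, shows the equivalence. In PostGIS, `ST_DWithin(geo.the_geom, se.the_geom, 12500)` expresses the same test without building the buffer polygon and can use the spatial index; results can differ slightly near the edge because `ST_Buffer` approximates the circle with a polygon.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y), planar units


def distance_to_rect(p: Point, r: Rect) -> float:
    """Shortest planar distance from p to rectangle r (0 when p is inside)."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def intersects_buffer(r: Rect, p: Point, radius: float) -> bool:
    """'Does r touch the disc of this radius around p?' equals 'is r within radius of p?'."""
    return distance_to_rect(p, r) <= radius


if __name__ == "__main__":
    county = (0.0, 0.0, 10_000.0, 10_000.0)  # hypothetical 10 km square, coordinates in metres
    for event in [(5_000.0, 5_000.0), (22_000.0, 5_000.0), (25_000.0, 5_000.0)]:
        print(event, intersects_buffer(county, event, 12_500.0))
```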
+
+First is the query returning all the counties within 12.5 km of a storm event location:
+
+```select geo.statefp, geo.county_name, se.locationid from county_geometry as geo, se_locations as se where ST_intersects(geo.the_geom, ST_Buffer(se.the_geom, 12500.0))  limit 200;```{{execute}}
+
+
+and then do the grouping and counting:
+
+```sql
+-- LIMIT 200 inside the CTE keeps the workshop query fast; the counts cover only those 200 rows
+WITH all_counties AS (
+    SELECT geo.statefp, geo.county_name, se.locationid
+    FROM county_geometry AS geo, se_locations AS se
+    WHERE ST_Intersects(geo.the_geom, ST_Buffer(se.the_geom, 12500.0))
+    LIMIT 200
+)
+SELECT statefp, county_name, count(*)
+FROM all_counties
+GROUP BY statefp, county_name
+ORDER BY statefp, count(*) DESC;
+```
+
+The _with x as ()_ syntax is called a [Common Table Expression](https://www.postgresql.org/docs/11/queries-with.html) 
+(CTE) in PostgreSQL and makes writing subqueries a lot easier. The CTE creates a temporary result set that exists for just 
+one query. The tradeoff is that in PostgreSQL 11 and earlier, CTEs are an optimization boundary (PostgreSQL 12 and later can inline them). 
+They are fine to use for the workshop, but if you use them in future work please dig deeper into the tradeoffs of CTEs. 
+
+## Final Note
+
+Today we worked with a Geography type for our spatial data because our geographic extent was the entire U.S., and it
+also made our calculations simpler syntax-wise while staying accurate. This is also the data format you get natively from
+GPS units, such as the one in your phone.   
+
+In the future, if you deal with data of a geographic extent smaller than a province or state, you are going to need to learn 
+to work with coordinate systems and projections of your coordinates (basically the process to take a globe and make a map).
+
+The way you specify your coordinates would change, since you would now have to give the projection you are using. Local 
+governments, non-profits, and companies will often give you coordinates in a projected system, so you will need to learn 
+how to use it in PostGIS. The ideas above remain exactly the same; only the way you store your data will change.
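As a taste of what a projection does, here is the math for one common one, spherical Web Mercator (EPSG:3857), as a Python sketch. It is only an illustration: data from local governments is more often in systems such as State Plane or UTM, and in PostGIS you would reproject geometry with `ST_Transform` rather than by hand. The scale function shows the trade-off: Web Mercator stretches lengths by 1/cos(latitude), so distances measured on the flat map are not ground distances.

```python
import math
from typing import Tuple

WGS84_A = 6_378_137.0  # WGS 84 semi-major axis in metres; spherical Web Mercator uses it as the radius


def to_web_mercator(lon_deg: float, lat_deg: float) -> Tuple[float, float]:
    """Project lon/lat degrees onto a flat x/y plane in metres (spherical Web Mercator)."""
    x = WGS84_A * math.radians(lon_deg)
    y = WGS84_A * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_scale(lat_deg: float) -> float:
    """How much the projection stretches lengths at a given latitude (1 at the equator)."""
    return 1 / math.cos(math.radians(lat_deg))


if __name__ == "__main__":
    x, y = to_web_mercator(-103.771555, 44.967244)  # the center point used earlier
    print(f"x={x:,.0f} m, y={y:,.0f} m, scale={mercator_scale(44.967244):.2f}")
```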
+
+To learn more, there is a great, but slightly outdated, [discussion](http://postgis.net/workshops/postgis-intro/geography.html) in this other workshop content.
+
+
diff --git a/postgis/qpostgisinto/assets/docker_ps.jpg b/postgis/qpostgisinto/assets/docker_ps.jpg
diff --git a/postgis/qpostgisinto/env-init.sh b/postgis/qpostgisinto/env-init.sh
@@ -0,0 +1,3 @@
+#!/usr/bin/env bash
+
+# nothing for now