new content (#26)
* up to date (#1)

* Cleanup

* Use latest version of docker

* Remove Docker first

* Remove more of Docker

* Start Docker after installing

* fixed misspecification

* breaking apart the workshop
thesteve0 authored May 17, 2019
1 parent 5e54e84 commit 781849f
Showing 20 changed files with 418 additions and 4 deletions.
11 changes: 11 additions & 0 deletions basic-postgresql-devel-pathway.json
Original file line number Diff line number Diff line change
@@ -1,4 +1,15 @@
{
"title": "CrunchyData Basic PostgreSQL for Developers",
"courses": [
{
"external_link": "https://crunchydata.katacoda.com/basic-postgresql-devel/runcontainers/",
"course_id": "runcontainers",
"title": "Quick Intro. To PostgreSQL in Containers"
}
]
}
Empty file removed basic-postgresql-devel/.gitkeep
119 changes: 119 additions & 0 deletions basic-postgresql-devel/runcontainers/01-running-in-containers.md
@@ -0,0 +1,119 @@
In this exercise we will introduce containers (which you may know as Docker) and then spin up our PostgreSQL instance
using containers.

## A Little Background on Containers

Containers have quite a long history in computing that predates Docker. At a simplistic level, containers "package up" applications
and their dependencies together with everything needed above the operating system kernel. This allows for a cleaner separation
of dependencies, as the container has everything it needs to run except the kernel. Here is a
[good introduction](https://medium.freecodecamp.org/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b)
to Docker containers. Be aware that there are [other](https://containerd.io/) container runtimes and specifications besides
Docker.

Containers are spun up from a container image. In this class we will use "container" to denote the running container
and "image" to denote the binary used to spin up the container.

Another advantage of images is that they not only contain the binaries for the application but are also configured
and ready to run. With a container you can skip most of the configuration and just run some version of "container run".

In this class we will be using an image that contains PostgreSQL, PostGIS, embedded R, and some other extensions. If
you have ever tried to install all these pieces yourself, you know what a hassle it can be. Let's see how easy it is with containers.

## Running PostgreSQL in Containers

Crunchy Data has produced a full [suite of containers](https://github.com/CrunchyData/crunchy-containers) to make PostgreSQL
simpler and easier to run in containerized environments. Today we will be using a container that was purposefully built for
developers. The container makes some tradeoffs:
1. It has the most commonly used extensions already included in the binaries
1. It only requires one environment variable, a password. Everything else is optional
1. It doesn't require any volume mappings but allows for optional ones
1. Its target user is a developer on their primary development machine
1. It is not supported or intended for production use
1. It does not support replication or high availability scenarios

Its goal is to get you up and running quickly and easily for your development work.

#### Simplest method

Let's start with the quickest and easiest way to start up PostgreSQL using a container.

`docker run -e PG_PASSWORD=password thesteve0/postgres-appdev`{{execute}}

If you click the little check mark in the box above, it will execute the command in the terminal window.
This tells Docker to run the image
[_thesteve0/postgres-appdev_](https://cloud.docker.com/u/thesteve0/repository/docker/thesteve0/postgres-appdev) and pass
in an environment variable that sets the password for both the standard user and the postgres (database admin) user.

1. The default name for the primary database will be: mydb
1. The default username is: rnduser2w3
1. The default port will be: 5432
1. The postgres user's password will be the same as the user password you set in the command.

**CONGRATULATIONS you just spun up a fully working PostgreSQL database with a bunch of functionality!**

But this is a pretty simplistic way to start PostgreSQL, great if you just want to "get going quickly".

Because we didn't run the container in "detached" mode, we never got our prompt back. Detached mode lets the container
run in the background and gives us back our prompt. To shut down the container, click the tab titled "Terminal 2" and
get information about our running container:

`docker ps`{{execute}}

![dockerps](assets/docker_ps.jpg)

Please note either the name or the ID of your running container (highlighted in red above). Now, in the same terminal, type
the following command:

`docker kill <id or name of your container>`

`docker kill` is the quickest way to stop your running container: it sends a signal (SIGKILL by default) to the main process
in the container (in this case the PostgreSQL server process), which stops the container. For a more graceful shutdown,
`docker stop` sends SIGTERM first and only kills the process if it does not exit in time.
If you go back to the first tab, "Terminal", you will see that you have your prompt back. Let's start PostgreSQL more
appropriately for your daily work.

#### Better way to start the container

Let's set a new username, give the container a fixed (rather than random) name, map port 5432 from the container
to the VM we are running on, and have it detach so we can get our prompt back.

`docker run -d -p 5432:5432 -e PG_USER=groot -e PG_PASSWORD=password -e PG_DATABASE=workshop --name=pgsql thesteve0/postgres-appdev`{{execute}}

And with that, we have spun up PostgreSQL with:
1. The ability to connect from our VM to the instance running in the container
1. Username: groot
1. Password: password
1. A database named: workshop
1. A container named: pgsql
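If you end up scripting this setup, a minimal Python sketch can assemble the same arguments. The helper name `build_run_args` and its defaults are hypothetical; the flags and `PG_*` variables are the ones from the command above, and the sketch only prints the command rather than starting Docker.

```python
import shlex  # shlex.join needs Python 3.8+


def build_run_args(user, password, database, name="pgsql", host_port=5432,
                   image="thesteve0/postgres-appdev"):
    """Return the argument list for a detached `docker run` of the dev image."""
    return [
        "docker", "run", "-d",
        "-p", f"{host_port}:5432",       # host (VM) port -> container port 5432
        "-e", f"PG_USER={user}",
        "-e", f"PG_PASSWORD={password}",
        "-e", f"PG_DATABASE={database}",
        f"--name={name}",                # fixed name so we can stop/start it later
        image,
    ]


args = build_run_args("groot", "password", "workshop")
# Print the equivalent shell command. Passing `args` to subprocess.run(args, check=True)
# would actually start the container.
print(shlex.join(args))
```

Passing a different `host_port`, such as 5433, is one way to run a second instance next to this one without the two fighting over the same port on the VM.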

To log in to that running instance of PostgreSQL, run:

`psql -U groot -h localhost workshop`

We don't need to pass a port (`-p`) to psql because it connects to port 5432 by default, which is the port we mapped to the VM.
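psql and libpq-based client libraries (such as psycopg2) also accept a single connection URI of the form `postgresql://user:password@host:port/dbname`. Here is a small Python sketch that builds one from the settings above with the default port spelled out; `pg_uri` is a hypothetical helper. It percent-encodes the user name, password, and database name so characters such as `@` or `:` do not break the URI.

```python
from urllib.parse import quote


def pg_uri(user, password, host="localhost", port=5432, dbname="workshop"):
    """Build a libpq connection URI: postgresql://user:password@host:port/dbname."""
    # safe="" makes quote() encode every reserved character, including '/', '@' and ':'
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{quote(dbname, safe='')}")


print(pg_uri("groot", "password"))  # postgresql://groot:password@localhost:5432/workshop
```

The printed value could then be used as `psql "postgresql://groot:password@localhost:5432/workshop"`.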

#### A little container management

The advantage of naming the container is that we can refer to it by name, for example to stop the container:

`docker kill pgsql`{{execute}}

and then start it again with all the same settings as last time:

`docker start pgsql`{{execute}}

or

`docker restart pgsql`{{execute}}

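Not only will this retain the settings, but all the data you added before will still be there when you restart the container
(as long as you don't remove the container with `docker rm`).

If you wanted PostgreSQL instances with different data or even different versions, you could start containers from the images
under different names. This way you could spin them up and down as needed; if two of them run at the same time, map each one
to a different host port (for example `-p 5433:5432`).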

To see all the images on your machine, run the following command:

`docker images`{{execute}}




3 changes: 3 additions & 0 deletions basic-postgresql-devel/runcontainers/env-init.sh
@@ -0,0 +1,3 @@
#!/usr/bin/bash

# nothing for now
10 changes: 10 additions & 0 deletions basic-postgresql-devel/runcontainers/finish.md
@@ -0,0 +1,10 @@
# Final Notes

The container used in this class is available [on Docker Hub](https://cloud.docker.com/u/thesteve0/repository/docker/thesteve0/postgres-appdev).
As long as you have Docker on your machine, you can use the same version of PostgreSQL as the workshop. There is also
[demo data](https://github.com/CrunchyData/crunchy-demo-data/releases/tag/v0.1) to play with. The data in the workshop was intentionally chosen
from public domain or permissively licensed sources so that you can use it for commercial and non-commercial purposes. Feel free
to download it and play some more at your own pace on your own machine.

Now you have a quick and easy way to spin up PostgreSQL without installing binaries, compiling software, or performing any other
administrative tasks. And if your whole team uses the same images to start their containers, you will all be running PostgreSQL
the exact same way, making it easier to share knowledge.
30 changes: 30 additions & 0 deletions basic-postgresql-devel/runcontainers/index.json
@@ -0,0 +1,30 @@
{
"title": "Quick Intro. to Running PostgreSQL in Containers",
"description": "A brief introduction to running a developer-centric container of PostgreSQL",
"difficulty": "beginner",
"time": "10 minutes",
"details": {
"steps": [
{"title": "Running in Containers", "text": "01-running-in-containers.md"}
],
"intro": {
"courseData": "env-init.sh",
"code": "set-env.sh",
"text": "intro.md",
"credits": ""
},
"finish": {
"text": "finish.md"
}
},
"environment": {
"uilayout": "terminal",
"uimessage1": "\u001b[32mYour Interactive Bash Terminal.\u001b[m\r\n",
"terminals": [
{"name": "Terminal 2", "target": "host01"}
]
},
"backend": {
"imageid": "crunchydata-single1"
}
}
10 changes: 10 additions & 0 deletions basic-postgresql-devel/runcontainers/intro.md
@@ -0,0 +1,10 @@
# Using PostgreSQL In Containers

This class gives you a quick introduction to running a PostgreSQL container geared toward the needs of application
developers. You probably know containers by the name Docker, but the technology has actually been around for
a while.

The goal of this class is to teach a little about containers, introduce you to the containers produced by CrunchyData, and
show you how to use the container specifically built to make the lives of application developers easier.

Enjoy!
3 changes: 3 additions & 0 deletions basic-postgresql-devel/runcontainers/set-env.sh
@@ -0,0 +1,3 @@
#!/usr/bin/bash

# Nothing Yet
8 changes: 4 additions & 4 deletions homepage-pathway.json
@@ -8,14 +8,14 @@
"action": "Coming Soon"
},
{
-"external_link": "https://crunchydata.katacoda.com/comingsoon",
+"external_link": "https://crunchydata.katacoda.com/basic-postgresql-devel",
"title": "Basic PostgreSQL for Developers",
-"action": "Coming Soon"
+"action": "Start Course"
},
{
-"external_link": "https://crunchydata.katacoda.com/comingsoon",
+"external_link": "https://crunchydata.katacoda.com/postgis",
"title": "PostGIS",
-"action": "Coming Soon"
+"action": "Start Course"
},
{
"external_link": "https://crunchydata.katacoda.com/comingsoon",
25 changes: 25 additions & 0 deletions learning-resources.txt
@@ -1,3 +1,28 @@
# Katacoda instructions for CrunchyData

The top level of our site has courses. Courses are made up of scenarios, and each scenario has a JSON file that defines the intro,
the instructional pages, the final page, the shell scripts to run when the scenario starts, which Katacoda image to run, and
what the scenario layout will look like in terms of terminals and web pages.

## Here is how the site "works"
1. homepage-pathway.json controls what shows up on the home page of crunchydata.katacoda.com. It lists all the courses on our site. Each entry has:
  * First, the URL of the course, which is "https://crunchydata.katacoda.com/<name of the pathway file minus "-pathway">".
    For example, "external_link": "https://crunchydata.katacoda.com/workshops/" means there will be a workshops-pathway.json file.
    That JSON file defines which scenarios are in the course.
  * The title of the course you want the end user to see
  * And then the action, which should be "Start Course" if you have content in the directory
2.
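The naming rule in step 1 above is mechanical enough to check with a script before publishing. A minimal Python sketch, assuming you have already pulled the list of entries out of homepage-pathway.json (for example with `json.load`; the enclosing key is not shown here) and collected the existing file names (for example with `os.listdir`). `pathway_file` and `missing_pathways` are hypothetical helpers, not part of Katacoda.

```python
from urllib.parse import urlparse


def pathway_file(external_link):
    """Map a homepage "external_link" to the pathway JSON file it implies.

    https://crunchydata.katacoda.com/workshops/ -> workshops-pathway.json
    """
    name = urlparse(external_link).path.strip("/")
    if not name or "/" in name:
        raise ValueError(f"expected a single top-level name in {external_link!r}")
    return f"{name}-pathway.json"


def missing_pathways(entries, existing_files):
    """List pathway files that "Start Course" entries need but that don't exist yet."""
    missing = []
    for entry in entries:
        if entry.get("action") == "Start Course":
            needed = pathway_file(entry["external_link"])
            if needed not in existing_files:
                missing.append(needed)
    return missing


entries = [
    {"external_link": "https://crunchydata.katacoda.com/basic-postgresql-devel",
     "title": "Basic PostgreSQL for Developers", "action": "Start Course"},
    {"external_link": "https://crunchydata.katacoda.com/comingsoon",
     "title": "Some future course", "action": "Coming Soon"},
]
print(missing_pathways(entries, {"basic-postgresql-devel-pathway.json"}))  # []
```

"Coming Soon" entries are skipped on purpose, since they all share the placeholder link.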

## Setting up a future workshop
1. Go to https://dashboard.katacoda.com
2. Log in with your Katacoda account credentials
3.

The list of available trainings comes from the pathways in the training directory. Whichever pathway you choose
determines the home page that attendees see when they sign in.



To learn more about what you can do with Katacoda:

The main doc site
10 changes: 10 additions & 0 deletions postgis-pathway.json
@@ -1,4 +1,14 @@
{
"title": "CrunchyData PostGIS",
"courses": [
{
"external_link": "https://crunchydata.katacoda.com/postgis/qpostgisinto/",
"course_id": "qpostgisintro",
"title": "Quick Intro. To PostGIS"
}
]
}
Empty file removed postgis/.gitkeep
128 changes: 128 additions & 0 deletions postgis/qpostgisinto/01-spatial-data.md
@@ -0,0 +1,128 @@
# Working with Spatial Data in PostGIS

PostgreSQL has the gold standard in spatial extensions for any RDBMS on the market: PostGIS. If you have data that has
direct spatial information, like coordinates, or indirect spatial information, such as an address, you can leverage the power of spatial
analysis to enhance the insights into your data. In this workshop we will barely be scratching the surface of what you can
do with PostGIS, so please don't consider this exhaustive in the slightest.

One final note before we dig in: to work with spatial data you usually need to run

```CREATE EXTENSION postgis;```

in your database to enable all the functionality. We don't have to do it in the workshop because we already enabled the
extension when we created the DB in the container.

#### Spatial Tables

Let's go ahead and log in to our PostgreSQL database:

```psql -U groot -h localhost workshop```{{execute}}

Remember that the password is the word 'password'.

Now if you run:

`\d county_geometry`{{execute}}

psql will show you a full description of the county_geometry table. To see all the backslash commands in psql, run
`\?` (though don't do it right now).

You will see two spatial columns:
```
interior_pnt | geography(Point,4326) |
the_geom | geography(MultiPolygon,4326) |
```
You can tell they are spatial because we declared them as type Geography, with the type of spatial feature and the SRID
(4326, the WGS 84 longitude/latitude system) in the parentheses. The other spatial type is Geometry. I give a brief discussion
of the difference between the two types at the end of the page. For now, just know that Geography is well suited to data
coming from most GPS devices or to data at a continental scale.

You can also see we made indices for these two columns:
```
"countygeom_interiorpt_indx" gist (interior_pnt)
"countygeom_the_geom_indx" gist (the_geom)
```

Making GiST indices on spatial data allows for efficient querying of the data by creating bounding rectangles and putting them
in the index. The database can then use these simple rectangles to quickly filter out features that are not in the area of interest, because
geometric operations on rectangles are much quicker than operations on complex shapes.
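The filter-then-refine idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how GiST is implemented: shapes are plain lists of (x, y) vertices, the helper names are hypothetical, and the sketch scans every rectangle, whereas a GiST index arranges the rectangles in a tree so most of them are never visited. In PostGIS, the same rectangle-overlap test is available directly as the `&&` operator.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]                # (x, y), e.g. (longitude, latitude)
BBox = Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)


def bbox_of(shape: List[Point]) -> BBox:
    """Smallest axis-aligned rectangle enclosing all vertices of a shape."""
    xs = [x for x, _ in shape]
    ys = [y for _, y in shape]
    return (min(xs), min(ys), max(xs), max(ys))


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    """Four comparisons, which is why rectangle tests are so cheap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidates(shapes: Dict[str, List[Point]], area: BBox) -> List[str]:
    """Filter stage: keep only shapes whose rectangle touches the area.

    Only these candidates would go on to the expensive exact geometry test.
    """
    return [name for name, shape in shapes.items()
            if bboxes_overlap(bbox_of(shape), area)]


shapes = {  # made-up shapes
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(5, 5), (7, 5), (6, 8)],
    "C": [(1, 3), (3, 3), (3, 6)],
}
print(candidates(shapes, (1, 1, 4, 4)))  # ['A', 'C']
```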

#### Simple spatial query

Let's start with one of the simplest queries, a distance query. Let's find the 3 counties closest to the geographic center
of the United States (including Alaska and Hawaii): 44.967244 Latitude, -103.771555 Longitude.

Now let's select the id, county name, and distance (in meters, since the column is Geography) of the 3 closest counties using the
[ST_Distance function](https://postgis.net/docs/manual-2.5/ST_Distance.html):

```SELECT id, county_name, ST_Distance('POINT(-103.771555 44.967244)'::geography, the_geom) FROM county_geometry ORDER BY ST_Distance('POINT(-103.771555 44.967244)'::geography, the_geom) LIMIT 3;```{{execute}}

This query may take a while to return because it calculates the distance between that point and every county in the U.S.
so that it can ORDER the results by distance. But I mentioned that PostGIS was the gold standard for a reason: it has a
method for exactly this use case, called K Nearest Neighbor (KNN) search, with
[its own operator](http://postgis.net/workshops/postgis-intro/knn.html).

```SELECT id, county_name, ST_Distance('POINT(-103.771555 44.967244)'::geography, the_geom) FROM county_geometry ORDER BY the_geom <-> 'POINT(-103.771555 44.967244)'::geography LIMIT 3;```{{execute}}

When there is a spatial index on the column, we set a relatively small limit (X), and we order by the `<->` operator, the database
"knows" to find the X spatially closest results first and only THEN calculate everything else on them. Spatial indices to the
rescue!
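The brute-force query and the KNN query should normally return the same three counties; the difference is how much work the database does to find them. A minimal Python sketch of the brute-force version, under loud assumptions: each county is reduced to a single point (like the `interior_pnt` column), distances use the spherical haversine formula rather than the spheroid PostGIS uses by default for Geography, and the example points are made up. `heapq.nsmallest` avoids sorting everything, but it still computes every distance, which is the work the index-backed `<->` operator lets PostgreSQL skip.

```python
import heapq
import math

# Mean Earth radius in meters. PostGIS Geography measures on a spheroid by default,
# so its distances will differ slightly from this spherical approximation.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def nearest(points, origin, k=3):
    """Brute-force ORDER BY distance LIMIT k: every distance is computed.

    points: iterable of (name, lon, lat); origin: (lon, lat).
    """
    lon0, lat0 = origin
    return heapq.nsmallest(k, points, key=lambda p: haversine_m(lon0, lat0, p[1], p[2]))


origin = (-103.771555, 44.967244)  # the point used in the queries above
points = [("far", -80.0, 40.0), ("near", -103.7, 45.0), ("mid", -100.0, 44.0)]  # made up
print([name for name, _, _ in nearest(points, origin, k=2)])  # ['near', 'mid']
```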

#### Spatial Join

This next query will demonstrate joining data based on spatial coincidence rather than a shared primary key-foreign key
relationship. Our example will join the storm location data to the county geometry data, giving us the county of each
incident even though it is not in our original file.

```select geo.statefp, geo.county_name, geo.aland, se.event_id, se.location from county_geometry as geo, se_locations as se where ST_Covers(geo.the_geom, se.the_geom) limit 10;```{{execute}}

The spatial function we use is [ST_Covers](https://postgis.net/docs/ST_Covers.html), which returns true if no point of the second geometry lies outside the first geometry.
We set the limit to 10 because we don't want to wait for all of the over 48K rows to return. The results also show
the [state FIPS](https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code) code, which tells us
the state name given the number. This way we can check that there is a county with that name in that state, and that the
county and the storm location overlap.
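Conceptually, the join above is a nested loop that keeps every (county, location) pair passing a containment test. A minimal planar Python sketch with hypothetical helpers and made-up squares: it treats coordinates as flat x/y (PostGIS Geography works on the sphere), uses even-odd ray casting for the containment test, and does not treat points lying exactly on a boundary the way ST_Covers does (ST_Covers counts boundary points as covered).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # planar (x, y); lon/lat treated as flat coordinates


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Even-odd ray casting: cast a ray to the right of pt and count edge crossings."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(polygons: Dict[str, List[Point]],
                 points: Dict[str, Point]) -> List[Tuple[str, str]]:
    """Every (polygon, point) pair where the polygon contains the point.

    A nested loop with no index: fine for a sketch, slow for 48K rows.
    """
    return [(pname, pid)
            for pname, ring in polygons.items()
            for pid, pt in points.items()
            if point_in_polygon(pt, ring)]


polygons = {"west": [(0, 0), (5, 0), (5, 5), (0, 5)],    # made-up "counties"
            "east": [(5, 0), (10, 0), (10, 5), (5, 5)]}
points = {"e1": (2, 2), "e2": (7, 3), "e3": (20, 20)}    # made-up storm locations
print(spatial_join(polygons, points))  # [('west', 'e1'), ('east', 'e2')]
```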

#### Spatial buffer and then select

Finally, let's do a more complicated query that you could not do without sophisticated spatial operations. Suppose we were
trying to place emergency response centers in counties with high potential for storms. We are going to buffer 12.5 km (about 8 miles)
around each storm event's center point and then select all the counties that intersect that buffered circle. We will use a grouping
query to count the storm circles per county.

First is the query returning all the counties within 12.5 km of a storm event location:

```select geo.statefp, geo.county_name, se.locationid from county_geometry as geo, se_locations as se where ST_intersects(geo.the_geom, ST_Buffer(se.the_geom, 12500.0)) limit 200;```{{execute}}


Then we do the grouping and counting:

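```sql
-- Counties whose geometry intersects a 12.5 km buffer around a storm location.
-- LIMIT 200 keeps the workshop query fast, so the counts cover only those 200 rows.
with all_counties as (
    select geo.statefp, geo.county_name, se.locationid
    from county_geometry as geo, se_locations as se
    where ST_Intersects(geo.the_geom, ST_Buffer(se.the_geom, 12500.0))
    limit 200
)
select statefp, county_name, count(*)
from all_counties
group by statefp, county_name
order by statefp, count(*) DESC;
```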

The _with x as ()_ syntax is called a [Common Table Expression](https://www.postgresql.org/docs/11/queries-with.html)
(CTE) and makes writing subqueries a lot easier. A CTE acts like a temporary table that exists for just
one query. The tradeoff is that in PostgreSQL 11 and earlier a CTE is an optimization boundary (PostgreSQL 12 and later
can inline simple CTEs). It is fine to use one here for the workshop, but if you use CTEs in future work, please dig
deeper into their tradeoffs.
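The outer query is ordinary grouping and counting. A rough Python analogue, assuming the CTE rows have been fetched as (statefp, county_name, locationid) tuples; `count_per_county` is a hypothetical helper and the rows in the example are made up. `collections.Counter` plays the role of `count(*) ... group by`, and the sort key reproduces `order by statefp, count(*) DESC`.

```python
from collections import Counter


def count_per_county(rows):
    """Group by (statefp, county_name), count rows, then order by
    statefp ascending and count descending.

    rows: iterable of (statefp, county_name, locationid) tuples, like the CTE output.
    """
    counts = Counter((statefp, county) for statefp, county, _ in rows)
    return sorted(((s, c, n) for (s, c), n in counts.items()),
                  key=lambda r: (r[0], -r[2]))


rows = [("08", "Alpha", 1), ("08", "Beta", 2), ("08", "Beta", 3), ("01", "Gamma", 4)]  # made up
print(count_per_county(rows))
# [('01', 'Gamma', 1), ('08', 'Beta', 2), ('08', 'Alpha', 1)]
```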

## Final Note

Today we worked with the Geography type for our spatial data because our geographic extent was the entire U.S.; it
also made our calculations simpler to write while staying accurate. This is also the data format you get natively from
GPS units, such as the one in your phone.

In the future, if you deal with data of a geographic extent smaller than a province or state, you are going to need to learn
to work with coordinate systems and projections of your coordinates (basically the process of taking a globe and making a map).

The way you specify your coordinates would change, since you would now have to give the projection you are using. Local
governments, non-profits, and companies will usually give you coordinates in a projected system, so you will need to learn
how to use it in PostGIS. The ideas above remain exactly the same; only the way you store your data changes.
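As one concrete example of taking a globe and making a map, here is the spherical Web Mercator projection (EPSG:3857, the one most web maps use) as a short Python sketch. It is only an illustration: the agencies mentioned above more often use regional projections such as UTM zones or State Plane, and inside PostGIS you would let `ST_Transform` do this conversion rather than computing it by hand. The helper name `to_web_mercator` is hypothetical.

```python
import math

# Web Mercator treats the Earth as a sphere with the WGS 84 semi-major axis (meters).
R = 6_378_137.0
MAX_LAT = 85.05112878  # Web Mercator is conventionally clipped at this latitude


def to_web_mercator(lon, lat):
    """Project WGS 84 lon/lat in degrees (SRID 4326) to Web Mercator x/y in meters."""
    if abs(lat) > MAX_LAT:
        raise ValueError("latitude outside the Web Mercator range")
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# The workshop's center-of-the-U.S. point, now as planar meters instead of degrees.
print(to_web_mercator(-103.771555, 44.967244))
```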

To learn more, there is a great, but slightly outdated, [discussion](http://postgis.net/workshops/postgis-intro/geography.html) in the Introduction to PostGIS workshop.


Binary file added postgis/qpostgisinto/assets/docker_ps.jpg
3 changes: 3 additions & 0 deletions postgis/qpostgisinto/env-init.sh
@@ -0,0 +1,3 @@
#!/usr/bin/bash

# nothing for now