Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* up to date (#1)
* Cleanup
* Use latest version of docker
* Remove Docker first
* Remove more of Docker
* Start Docker after installing
* fixed misspecification
* breaking apart the workshop
Showing 20 changed files with 418 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,15 @@ | ||
{ | ||
"title": "CrunchyData Basic PostgreSQL for Developers", | ||
<<<<<<< HEAD | ||
"courses": [ | ||
{ | ||
"external_link": "https://crunchydata.katacoda.com/basic-postgresql-devel/runcontainers/", | ||
"course_id": "runcontainers", | ||
"title": "Quick Intro. To PostgreSQL in Containers" | ||
} | ||
|
||
] | ||
======= | ||
"courses": [] | ||
>>>>>>> 2930e32889613de3534aabf3bf13b86bb8514a27 | ||
} |
Empty file.
119 changes: 119 additions & 0 deletions
basic-postgresql-devel/runcontainers/01-running-in-containers.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
In this exercise we will introduce containers (which you may know as Docker) and then spin up our PostgreSQL instance | ||
using containers. | ||
|
||
## A Little Background on Containers | ||
|
||
Containers have quite a long history in computing before Docker. At a simplistic level, containers "package up" applications | ||
and their dependencies to run with everything that is needed above the kernel OS. This allows for a cleaner separation | ||
of dependencies as the container has all the things it needs to run except the kernel. Here is a | ||
[good introduction](https://medium.freecodecamp.org/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b) | ||
to Docker containers. Be aware that there are [other](https://containerd.io/) container runtimes and specifications besides | ||
Docker. | ||
|
||
Containers are spun up from a container image. In this class we will use "container" to denote the running container | ||
and "image" to denote the binary used to spin up the container. | ||
|
||
Another advantage of images is that not only do they contain the binaries for the application but they are also configured | ||
and ready to run. With a container you can skip most of the configuration and just do some version of "container run". | ||
|
||
In this class we will be using an image that contains PostgreSQL, PostGIS, embedded R, and some other extensions. If | ||
you have ever tried to install all these pieces you know what a hassle it can be. Let's see how easy it can be with containers. | ||
|
||
## Running PostgreSQL in Containers | ||
|
||
Crunchy Data has produced a full [suite of containers](https://github.com/CrunchyData/crunchy-containers) to make PostgreSQL | ||
simpler and easier to run in containerized environments. Today we will be using a container that was purposefully built for | ||
developers. The container makes some tradeoffs: | ||
1. It has the most used extensions already included in the binaries | ||
1. It only requires one environment variable - a password. Everything else is optional | ||
1. It doesn't require any volume mappings but allows for optional ones | ||
1. Its target user is a developer on their primary development machine | ||
1. It's not supported or intended for production use | ||
1. It does not support replication or high availability scenarios | ||
|
||
Its goal is to get you up and running quickly and easily for your development work. | ||
|
||
#### Simplest method | ||
|
||
Let's start with the quickest and easiest way to start up PostgreSQL using a container. | ||
|
||
`docker run -e PG_PASSWORD=password thesteve0/postgres-appdev`{{execute}} | ||
|
||
If you click the little check mark in the box above it will execute the command in the terminal window. | ||
What you are doing is telling Docker to run the image | ||
[_thesteve0/postgres-appdev_](https://cloud.docker.com/u/thesteve0/repository/docker/thesteve0/postgres-appdev) and pass | ||
in the environment variable that sets the password for both the standard user and the postgres (DBAdmin) user. | ||
|
||
1. The default name for the primary database will be: mydb | ||
1. The default username is: rnduser2w3 | ||
1. The default port will be: 5432 | ||
1. And the postgres user password will be equal to the user password which you set in the command. | ||
|
||
**CONGRATULATIONS you just spun up a fully working PostgreSQL database with a bunch of functionality!** | ||
|
||
But this is a pretty simplistic way to start PostgreSQL - great if you wanna just "get going quickly". | ||
|
||
Because we didn't run the container in "detached" mode we never got our prompt back. Detached mode allows the container | ||
to run in the background and give us back our prompt. To shut down the container, click on the tab titled "Terminal 2" and | ||
look up information on our running container: | ||
|
||
`docker ps`{{execute}} | ||
|
||
![dockerps](assets/docker_ps.jpg) | ||
|
||
Please note either the name or the ID of your running container (highlighted in red above). Now in the same terminal type | ||
in the following command: | ||
|
||
`docker kill <id or name of your container>` | ||
|
||
`docker kill` is a way to stop your running container. By default it sends the SIGKILL signal to the primary process in the | ||
container (in this case the PostgreSQL server process), which terminates it immediately. | ||
If you go back to the first tab, "Terminal", you will see that you get your prompt back. Let's start PostgreSQL more | ||
appropriately for your daily work. | ||
|
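As a gentler alternative to `docker kill`, `docker stop` sends SIGTERM first and only falls back to SIGKILL after a timeout, which gives the PostgreSQL server a chance to shut down cleanly. The sketch below also looks up running containers by image name so you don't have to copy the ID by hand. The function names `running_ids` and `stop_image_containers` are made up for this illustration, and the 30-second timeout is an arbitrary choice.

```shell
# Print the IDs of running containers that were started from the given image.
running_ids() {
  docker ps -q --filter "ancestor=$1"
}

# Gracefully stop every running container started from the given image.
# `docker stop` sends SIGTERM, waits up to the -t timeout, then sends SIGKILL.
stop_image_containers() {
  ids="$(running_ids "$1")"
  if [ -z "$ids" ]; then
    echo "no running containers for $1"
    return 0
  fi
  # Unquoted on purpose: each container ID becomes its own argument.
  docker stop -t 30 $ids
}

# Example (requires Docker):
#   stop_image_containers thesteve0/postgres-appdev
```

Looking containers up by image is handy here because the first container was started without a `--name`, so Docker gave it a random one.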
||
#### Better way to start the container | ||
|
||
Let's set a new username, give the container a fixed (rather than random) name, expose port 5432 from the container | ||
into the VM we are running, and have it detach so we can get our prompt back. | ||
|
||
`docker run -d -p 5432:5432 -e PG_USER=groot -e PG_PASSWORD=password -e PG_DATABASE=workshop --name=pgsql thesteve0/postgres-appdev`{{execute}} | ||
|
||
And with that we have now spun up PostgreSQL with: | ||
1. The ability to connect from our VM to the instance running in the container | ||
1. Username: groot | ||
1. Password: password | ||
1. A database named: workshop | ||
1. A container named: pgsql | ||
|
||
If you want to now log into that running instance of PostgreSQL you can do: | ||
|
||
`psql -U groot -h localhost workshop` | ||
|
||
We don't need to pass a port to psql because the psql CLI assumes PostgreSQL is running on port 5432, which is the host port we mapped to the container. | ||
|
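If you ever map the container to a different host port (for example `-p 5433:5432`), you do have to tell psql about it. One way to keep the connection details in one place is a connection URI, which psql accepts in place of the database name. A minimal sketch, where `pg_uri` is a hypothetical helper and the values are the ones used in this exercise:

```shell
# Build a PostgreSQL connection URI from its parts.
# usage: pg_uri USER PASSWORD HOST PORT DATABASE
# Note: special characters in the password would need percent-encoding,
# which this sketch does not do.
pg_uri() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example (requires psql and the running container):
#   psql "$(pg_uri groot password localhost 5432 workshop)"
```

Putting the password in the URI is fine for a throwaway workshop database, but it will show up in your shell history.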
||
#### A little container management | ||
|
||
The good part about naming the container is that we can do things like - stop the container | ||
|
||
`docker kill pgsql`{{execute}} | ||
|
||
and then start it again with all the same settings as last time | ||
|
||
`docker start pgsql`{{execute}} | ||
|
||
or | ||
|
||
`docker restart pgsql`{{execute}} | ||
|
||
Not only will this retain the settings but all the data you added before will be there when you restart the container. | ||
|
||
If you wanted to have PostgreSQL instances with different data or even different versions you could start up images into | ||
containers with different names. This way you could spin them up and down as needed. | ||
|
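As a sketch of that idea, the hypothetical helper below wraps the `docker run` command from earlier so that each instance gets its own container name, host port, and database. The name `start_pg`, the example names, and port 5433 are assumptions for illustration; the image and environment variables are the ones used above.

```shell
# Start a detached PostgreSQL container with its own name and host port.
# usage: start_pg CONTAINER_NAME HOST_PORT DATABASE
start_pg() {
  docker run -d -p "$2:5432" \
    -e PG_USER=groot -e PG_PASSWORD=password -e "PG_DATABASE=$3" \
    --name="$1" thesteve0/postgres-appdev
}

# Example: a second instance next to the "pgsql" container from above.
#   start_pg pgscratch 5433 scratch
#   psql -U groot -h localhost -p 5433 scratch
```

Each instance needs a different host port because only one container at a time can be bound to port 5432 on the VM.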
||
If you want to see all the images on your machine just do the following command: | ||
|
||
`docker images`{{execute}} | ||
|
||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
#!/usr/bin/bash | ||
|
||
# nothing for now |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Final Notes | ||
|
||
The container used in this class is available [on Docker Hub](https://cloud.docker.com/u/thesteve0/repository/docker/thesteve0/postgres-appdev). | ||
As long as you have Docker on your machine you can use the same version of PostgreSQL as the workshop. There is also [data for playing with](https://github.com/CrunchyData/crunchy-demo-data/releases/tag/v0.1). The data in the workshop was intentionally chosen | ||
from public domain or permissively licensed sources so that you can use it for commercial and non-commercial purposes. Feel free | ||
to download it and play some more at your own pace on your own machine. | ||
|
||
Now you have a quick and easy way to spin up PostgreSQL without installing binaries, compiling software, or any other | ||
administrative tasks. And, if your whole team uses the same images to start their containers, you will all be running PostgreSQL | ||
the exact same way, making it easier to share knowledge. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
{ | ||
"title": "Quick Intro. to Running PostgreSQL in Containers", | ||
"description": "A brief introduction to running a developer-centric container of PostgreSQL", | ||
"difficulty": "beginner", | ||
"time": "10 minutes", | ||
"details": { | ||
"steps": [ | ||
{"title": "Running in Containers", "text": "01-running-in-containers.md"} | ||
], | ||
"intro": { | ||
"courseData": "env-init.sh", | ||
"code": "set-env.sh", | ||
"text": "intro.md", | ||
"credits": "" | ||
}, | ||
"finish": { | ||
"text": "finish.md" | ||
} | ||
}, | ||
"environment": { | ||
"uilayout": "terminal", | ||
"uimessage1": "\u001b[32mYour Interactive Bash Terminal.\u001b[m\r\n", | ||
"terminals": [ | ||
{"name": "Terminal 2", "target": "host01"} | ||
] | ||
}, | ||
"backend": { | ||
"imageid": "crunchydata-single1" | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Using PostgreSQL In Containers | ||
|
||
This class is going to give you a quick introduction to running a PostgreSQL container geared towards the needs of application | ||
developers. You probably know of containers by the name Docker, but they are actually a technology that has been around for | ||
a while. | ||
|
||
The goal of this class is to teach a little about containers, introduce you to the containers produced by CrunchyData, and | ||
show you how to use the container specifically built to make the lives of application developers easier. | ||
|
||
Enjoy! |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
#!/usr/bin/bash | ||
|
||
# Nothing Yet |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,14 @@ | ||
{ | ||
"title": "CrunchyData PostGIS", | ||
<<<<<<< HEAD | ||
"courses": [ | ||
{ | ||
"external_link": "https://crunchydata.katacoda.com/postgis/qpostgisinto/", | ||
"course_id": "qpostgisintro", | ||
"title": "Quick Intro. To PostGIS" | ||
} | ||
] | ||
======= | ||
"courses": [] | ||
>>>>>>> 2930e32889613de3534aabf3bf13b86bb8514a27 | ||
} |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,128 @@ | ||
# Working with Spatial Data in PostGIS | ||
|
||
PostgreSQL has the gold standard in spatial extensions for any RDBMS on the market: PostGIS. If you have data that has | ||
direct spatial information, like coordinates, or indirect spatial information, such as an address, you can leverage the power of spatial | ||
analysis to enhance the insights into your data. In the workshop we will barely be scratching the surface of what you can | ||
do with PostGIS, so please don't consider this exhaustive in the slightest. | ||
|
||
One final note before we dig in: remember that to work with spatial data you usually need to run | ||
|
||
```CREATE EXTENSION postgis;``` | ||
|
||
in your database to enable all the functionality. We don't have to do it in the workshop because we already enabled the | ||
extension when we created the DB in the container. | ||
|
||
#### Spatial Tables | ||
|
||
Let's go ahead and log in to our PostgreSQL database: | ||
|
||
```psql -U groot -h localhost workshop```{{execute}} | ||
|
||
Remember that the password is the word 'password'. | ||
|
||
Now if you do: | ||
|
||
`\d county_geometry`{{execute}} | ||
|
||
PostgreSQL will show you a full description of the county_geometry table. To see all the \ commands in PostgreSQL just do | ||
`\?` (though don't do it right now). | ||
|
||
You will see two spatial columns: | ||
``` | ||
interior_pnt | geography(Point,4326) | | ||
the_geom | geography(MultiPolygon,4326) | | ||
``` | ||
You can tell they are spatial because we declared them as type Geography with the type of spatial feature in the parentheses. | ||
The other spatial type is Geometry. I give a brief discussion at the end of the page about the difference between the two types. | ||
Just know that Geography is perfect for data coming from most GPSs or if you want to deal with data on a continent scale. | ||
|
||
You can also see we made indices for these two columns: | ||
``` | ||
"countygeom_interiorpt_indx" gist (interior_pnt) | ||
"countygeom_the_geom_indx" gist (the_geom) | ||
``` | ||
|
||
Making GiST indices on spatial data allows for efficient querying of the data by creating bounding rectangles and putting them | ||
in the index. The database can then use these simple rectangles to quickly filter out which features are not in the area of interest because | ||
geometric operations on rectangles are much quicker than on complex shapes. | ||
|
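To see why rectangles are cheap to work with, note that checking whether two bounding boxes overlap takes only four comparisons, no matter how complex the underlying shapes are. The sketch below is only an illustration of that check and not how PostGIS implements its index; `bbox_overlaps` is a made-up helper and the coordinates are invented.

```shell
# Succeed (exit status 0) if two axis-aligned bounding boxes overlap.
# usage: bbox_overlaps XMIN1 YMIN1 XMAX1 YMAX1 XMIN2 YMIN2 XMAX2 YMAX2
bbox_overlaps() {
  awk -v ax1="$1" -v ay1="$2" -v ax2="$3" -v ay2="$4" \
      -v bx1="$5" -v by1="$6" -v bx2="$7" -v by2="$8" \
      'BEGIN { exit !(ax1 <= bx2 && bx1 <= ax2 && ay1 <= by2 && by1 <= ay2) }'
}

# Example: does a box around an area of interest overlap a feature's box?
if bbox_overlaps -104 44 -103 45 -103.5 44.5 -102 46; then
  echo "candidate: run the exact (expensive) geometry test"
else
  echo "skip: the shapes cannot intersect"
fi
```

Only the features whose boxes pass this cheap test need the exact geometry comparison, which is where the index saves time.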
||
#### Simple spatial query | ||
|
||
Let's start with one of the simplest queries, a distance query. Let's find the 3 counties closest to the geographic center | ||
of the United States (including Alaska and Hawaii): 44.967244 Latitude, -103.771555 Longitude. | ||
|
||
Now let's select the id, county name, and distance for the 3 closest counties using the | ||
[ST_Distance function](https://postgis.net/docs/manual-2.5/ST_Distance.html): | ||
|
||
```SELECT id, county_name, ST_Distance('POINT(-103.771555 44.967244)'::geography, the_geom) FROM county_geometry ORDER BY ST_Distance('POINT(-103.771555 44.967244)'::geography, the_geom) LIMIT 3;```{{execute}} | ||
|
||
This result may take a while to return because we are calculating the distance between all the counties in the U.S. and | ||
that point so we can ORDER the results on distance from the point. But I mentioned that PostGIS was the gold standard for a reason. It has a | ||
method to handle our use case. It's called a K Nearest Neighbor Search (KNN) with | ||
[its own operator](http://postgis.net/workshops/postgis-intro/knn.html). | ||
|
||
```SELECT id, county_name, ST_Distance('POINT(-103.771555 44.967244)'::geography, the_geom) FROM county_geometry ORDER BY the_geom <-> 'POINT(-103.771555 44.967244)'::geography LIMIT 3;```{{execute}} | ||
|
||
When we have a spatial index on the column, set a relatively small limit (X) on the return, and we use the <-> operator, the database | ||
"knows" to find the first X spatially closest results and THEN calculate everything else on them. Spatial indices to the | ||
rescue! | ||
|
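If you want to try the KNN pattern with other points, the query text can be generated from a small shell helper and piped to psql, which reads SQL from standard input. This is a convenience sketch for the workshop only: `knn_sql` is a hypothetical helper, it selects just the id and county name, and it pastes its arguments into the SQL without any escaping, so do not feed it untrusted input.

```shell
# Print a KNN query for the counties nearest to a longitude/latitude pair.
# usage: knn_sql LONGITUDE LATITUDE LIMIT
knn_sql() {
  printf "SELECT id, county_name FROM county_geometry ORDER BY the_geom <-> 'POINT(%s %s)'::geography LIMIT %s;\n" "$1" "$2" "$3"
}

# Example (requires psql and the running workshop database):
#   knn_sql -103.771555 44.967244 3 | psql -U groot -h localhost workshop
```

Note that the point is written as longitude first and latitude second, matching the order used in the queries above.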
||
#### Spatial Join | ||
|
||
This next query will demonstrate joining data based on spatial coincidence rather than a shared primary key-foreign key | ||
relationship. Our example will be to join the storm location data to the county geometry data, giving us the county of the | ||
incident even though it is not in our original file. | ||
|
||
```select geo.statefp, geo.county_name, geo.aland, se.event_id, se.location from county_geometry as geo, se_locations as se where ST_Covers(geo.the_geom, se.the_geom) limit 10;```{{execute}} | ||
|
||
The spatial function we use is [ST_Covers](https://postgis.net/docs/ST_Covers.html), which returns true if no point of the second geometry lies outside the first geometry. | ||
We set the limit to 10 because we don't want to wait for all the results to return for over 48K rows. The results also show | ||
the [state FIPS](https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code) code, which identifies | ||
the state by number. This way we can check whether there is a county with that name in that state as well as a | ||
location, and whether the two overlap. | ||
|
||
#### Spatial buffer and then select | ||
|
||
Finally let's do a more complicated query that you could not do without sophisticated spatial operations. Suppose we were | ||
trying to put together emergency response centers in counties with high potential for storms. We are going to buffer 12.5 km (about 8 miles) | ||
off a storm event center point and then select all the counties that intersect that buffered circle. We will use a grouping | ||
query to do a count of the storm circles per county. | ||
|
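On the geography type, the distance passed to ST_Buffer is in meters, so the 12500.0 in the queries below is 12.5 km. A quick way to sanity-check such conversions from the shell is sketched here; `meters_to_miles` is a made-up helper and uses the international mile of 1609.344 meters.

```shell
# Convert a distance in meters to miles, rounded to two decimals.
# usage: meters_to_miles METERS
meters_to_miles() {
  awk -v m="$1" 'BEGIN { printf "%.2f\n", m / 1609.344 }'
}

meters_to_miles 12500   # prints 7.77
```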
||
First is the query returning all the counties within 12.5 km of a storm event location: | ||
|
||
```select geo.statefp, geo.county_name, se.locationid from county_geometry as geo, se_locations as se where ST_intersects(geo.the_geom, ST_Buffer(se.the_geom, 12500.0)) limit 200;```{{execute}} | ||
|
||
|
||
and then do the grouping and counting: | ||
|
||
```sql | ||
with all_counties as ( | ||
select geo.statefp, geo.county_name, se.locationid from county_geometry as geo, se_locations as se where ST_intersects(geo.the_geom, ST_Buffer(se.the_geom, 12500.0)) limit 200 | ||
) | ||
select statefp, county_name, count(*) from all_counties group by statefp, county_name order by statefp, count(*) DESC; | ||
|
||
``` | ||
|
||
The _with x as ()_ syntax is called a [Common Table Expression](https://www.postgresql.org/docs/11/queries-with.html) | ||
(CTE) in PostgreSQL and makes writing subqueries a lot easier. The CTE creates a temporary result set that exists for just | ||
one query. The tradeoff is that, in PostgreSQL 11 and earlier, they are an optimization boundary. In this case they are fine to use for the workshop but | ||
if you use them in future work please dig deeper into the tradeoffs of CTEs. | ||
|
||
## Final Note | ||
|
||
Today we worked with a Geography type for our spatial data because our geographic extent was the entire U.S. and it | ||
also made our calculations easier syntax-wise as well as being accurate. This is also the data format you get natively from | ||
GPS units, such as from your phone. | ||
|
||
In the future, if you deal with data of a geographic extent less than a province or state, you are going to need to learn | ||
to work with coordinate systems and projections of your coordinates (basically the process to take a globe and make a map). | ||
|
||
The way you specify your coordinates would change, since you would now have to give the projection you are using. Local | ||
governments, non-profits, and companies will usually give you the coordinates in a projected system so you will need to learn | ||
how to use it in PostGIS. The ideas above remain exactly the same, it's just the way you store your data will change. | ||
|
||
To learn more, there is a great, but slightly outdated, [discussion](http://postgis.net/workshops/postgis-intro/geography.html) in this other workshop content. | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
#!/usr/bin/bash | ||
|
||
# nothing for now |