Merge branch 'master' into doc_remove_kubeflow_pipelines
StanHatko authored Feb 7, 2023
2 parents 72e6fd9 + c55115f commit 1e2f11d
Showing 17 changed files with 477 additions and 44 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -24,6 +24,7 @@ The AAW includes tools that allow data science users to open almost any file. Th
- sqlite
- many others... just ask :-)

### How much does the AAW cost?

#### CPU Only

92 changes: 49 additions & 43 deletions docs/en/1-Experiments/Kubeflow.md
@@ -52,9 +52,11 @@ for your team.

## Image

-You will need to choose an image. There are JupyterLab, RStudio, and Ubuntu remote
-desktop images available. Select the drop down menu to select additional options
-within these (for instance, CPU, PyTorch, and TensorFlow images for JupyterLab).
+You will need to choose an image. There are JupyterLab, RStudio, Ubuntu remote
+desktop, and SAS images available. The SAS image is only available to StatCan
+employees (due to licensing restrictions); the others are available to everyone.
+Select the drop down menu to select additional options within these (for
+instance, CPU, PyTorch, and TensorFlow images for JupyterLab).

Check the name of the images and choose one that matches what you want to do. Don't know
which one to choose? Check out your options [here](./Selecting-an-Image.md).
@@ -63,33 +65,43 @@ which one to choose? Check out your options [here](./Selecting-an-Image.md).

## CPU and Memory

-- At the time of writing (December 23, 2021) there are two types of computers in
-  the cluster
-
-  - **CPU:** `D16s v3` (16 CPU cores, 64 GiB memory; for user use 15 CPU cores
-    and 48 GiB memory are available; 1 CPU core and 16 GiB memory reserved for
-    system use).
-  - **GPU:** `NC6s_v3` (6 CPU cores, 112 GiB memory, 1 GPU; for user use 96 GiB
-    memory are available; 16 GiB memory reserved for system use). The available
-    GPU is the NVIDIA Tesla V100 GPU with specs
-    [here](https://images.nvidia.com/content/technologies/volta/pdf/volta-v100-datasheet-update-us-1165301-r5.pdf).
-
-  When creating a notebook server, the system will limit you to the maximum
-  specifications above. For CPU notebook servers, you can specify the exact
-  amount of CPU and memory that you require. This allows you to meet your
-  compute needs while minimising cost. For a GPU notebook server, you will
-  always get the full server (6 CPU cores, 96 GiB accessible memory, and 1 GPU).
-  See below section on GPUs for information on how to select a GPU server.
-
-In the future there may be larger machines available, so you may have looser
-restrictions.
+At the time of writing (December 23, 2021) there are two types of computers in
+the cluster:
+
+- **CPU:** `D16s v3` (16 CPU cores, 64 GiB memory; for user use 15 CPU cores
+  and 48 GiB memory are available; 1 CPU core and 16 GiB memory reserved for
+  system use).
+- **GPU:** `NC6s_v3` (6 CPU cores, 112 GiB memory, 1 GPU; 96 GiB of memory is
+  available for user use; 16 GiB memory reserved for system use). The available
+  GPU is the NVIDIA Tesla V100 GPU with specs
+  [here](https://images.nvidia.com/content/technologies/volta/pdf/volta-v100-datasheet-update-us-1165301-r5.pdf).
+
+When creating a notebook server, the system will limit you to the maximum
+specifications above. For CPU notebook servers, you can specify the exact
+amount of CPU and memory that you require. This allows you to meet your
+compute needs while minimising cost. For a GPU notebook server, you will
+always get the full server (6 CPU cores, 96 GiB accessible memory, and 1 GPU).
+See the section on GPUs below for information on how to select a GPU server.
+
+In the advanced options, you can select a higher limit than the number of CPU
+cores and RAM requested. The amount requested is the amount guaranteed to be
+available to your notebook server, and you will always pay for at least this
+much. If the limit is higher than the amount requested and additional RAM or
+CPU cores are available on that shared server in the cluster, your notebook
+server can use them as needed. One use case for this is jobs that usually need
+only one CPU core but can benefit from multithreading to speed up certain
+operations. By requesting one CPU core but a higher limit, you can pay much
+less for the notebook server while allowing it to use spare unused CPU cores
+as needed to speed up computations.
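
To see how this plays out from inside a running notebook server, here is a
minimal sketch comparing the node's total core count with the container's CPU
quota; the cgroup v1 file paths are an assumption about how the cluster is
configured:

```python
import os

# Total cores on the underlying node (not your guaranteed request).
print("Node CPU cores:", os.cpu_count())

# The cgroup CPU quota reflects the notebook server's CPU limit.
# These paths assume cgroup v1; nodes using cgroup v2 expose the quota elsewhere.
try:
    with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
        quota = int(f.read())
    with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
        period = int(f.read())
    if quota > 0:
        print("Effective CPU limit:", quota / period)
    else:
        print("No CPU quota set.")
except FileNotFoundError:
    print("cgroup v1 CPU files not found (the node may use cgroup v2).")
```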

![Select CPU and RAM](../images/cpu-ram-select.png)

## GPUs

If you want a GPU server, select `1` as the number of GPUs and `NVIDIA` as the GPU
vendor (the create button will be greyed out until the GPU vendor is selected if
-you have a GPU specified). Multi-GPU servers are not currently supported on the
-AAW system.
+you have a GPU specified). Multi-GPU servers are currently supported on the AAW
+system only on a special on-request basis; please contact the AAW maintainers if
+you would like a multi-GPU server.

![GPU Configuration](../images/kubeflow_gpu_selection.jpg)
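
To confirm from inside the notebook that a GPU was actually allocated, a quick
check like the following can help; it assumes a PyTorch-based image (`torch` is
not installed on every image):

```python
import torch

# Check that the NVIDIA GPU is visible to the notebook server.
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # expected: a Tesla V100
else:
    print("No GPU detected; check the notebook server's GPU settings.")
```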

@@ -110,10 +122,6 @@ are various configuration options available:

- You can specify the size of the workspace volume, from 4 GiB to 32 GiB.

-- You can choose the option to not use persistent storage for home, in which case the
-  home folder will be deleted as soon as the notebook server is closed. Otherwise the
-  home folder will remain and can be used again for a new notebook server in the future.

![Create a Workspace Volume](../images/workspace-volume.PNG)

<!-- prettier-ignore -->
@@ -124,17 +132,23 @@
## Data Volumes

You can also create data volumes that can be used to store additional data. Multiple
-data volumes can be created. Click the add volume button to create a new volume and specify
-its configuration. There are the following configuration parameters as for data volumes:
-
-- **Type**: Create a new volume or use an existing volume.
+data volumes can be created. Click the add new volume button to create a new volume and
+specify its configuration. Click the attach existing volume button to mount an existing
+data volume to the notebook server. There are the following configuration parameters for
+data volumes:

- **Name**: Name of the volume.

- **Size in GiB**: From 4 GiB to 512 GiB.

-- **Mount Point**: Path where the data volume can be accessed on the notebook server, by
-  default `/home/jovyan/<volume name>`.
+- **Mount path**: Path where the data volume can be accessed on the notebook server, by
+  default `/home/jovyan/vol-1`, `/home/jovyan/vol-2`, etc. (an incrementing counter per
+  data volume mounted).

+When mounting an existing data volume, the name option becomes a drop-down list of the
+existing data volumes. Only a volume not currently mounted to an existing notebook server
+can be used. The mount path option remains user-configurable, with the same defaults as
+when creating a new volume.
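
Once the notebook server is running, a mounted data volume behaves like any other
directory. A minimal sketch, assuming the default `/home/jovyan/vol-1` mount path:

```python
from pathlib import Path

# Default mount path of the first data volume (adjust if you chose another).
vol = Path("/home/jovyan/vol-1")

# Write and read back a small file to confirm the volume is writable;
# files here persist with the volume, independently of the notebook server.
(vol / "hello.txt").write_text("data volumes outlive notebook servers\n")
print((vol / "hello.txt").read_text())
```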

The garbage can icon on the right can be used to delete an existing or accidentally created
data volume.
@@ -152,14 +166,6 @@ There are currently three checkbox options available here:
access to any Protected B resources. Protected B notebook servers run with many
security restrictions and have access to separate MinIO instances specifically
designed for Protected B data.
-- **Allow access to Kubeflow Pipelines**: This will allow the notebook server to
-  create and manage Kubeflow pipelines. Enable this if you want to use Kubeflow
-  pipelines.
-
-## Affinity / Tolerations
-
-<!-- prettier-ignore -->
-!!! note "This section needs to be filled in."

## Miscellaneous Settings

5 changes: 5 additions & 0 deletions docs/en/3-Pipelines/Kubeflow-Pipelines.md
@@ -1,5 +1,10 @@
# Overview

+<!-- prettier-ignore -->
+!!! warning "Kubeflow pipelines are in the process of being removed from AAW."
+    No new development should use Kubeflow pipelines. If you have questions
+    about this removal, please speak with the AAW maintainers.
+
[Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/overview/) are in the process of being removed from AAW.

They will be replaced by [Argo Workflows](https://argoproj.github.io/argo-workflows/), which will be implemented on AAW soon.
190 changes: 190 additions & 0 deletions docs/en/4-Collaboration/Geospatial-Analytical-Environment.md
@@ -0,0 +1,190 @@
# Geospatial Analytical Environment (GAE) - Cross Platform Access

<!-- prettier-ignore -->
??? danger "Unprotected data only; SSI coming soon"
    At this time, our Geospatial server can only host and provide access to non-sensitive statistical information.

## Getting Started

<!-- prettier-ignore -->
??? success "Prerequisites"
    1. An onboarded project with access to the DAS GAE ArcGIS Portal
    2. An ArcGIS Portal Client Id (API Key)

The ArcGIS Enterprise Portal can be accessed in either the AAW or the CAE using the API, from any service that leverages the Python programming language.

For example, in the AAW this includes [Jupyter Notebooks](https://statcan.github.io/daaas/en/1-Experiments/Jupyter/), and in the CAE it includes [Databricks](https://statcan.github.io/cae-eac/en/DataBricks/), DataFactory, etc.

[The DAS GAE ArcGIS Enterprise Portal can be accessed directly here](https://geoanalytics.cloud.statcan.ca/portal)

[For help with self-registering as a DAS Geospatial Portal user](https://statcan.github.io/daaas-dads-geo/english/portal/)

<hr>

## Using the ArcGIS API for Python

### Connecting to ArcGIS Enterprise Portal using ArcGIS API

1. Install packages:

    ```bash
    conda install -c esri arcgis
    ```

    or, using Artifactory:

    ```bash
    conda install -c https://jfrog.aaw.cloud.statcan.ca/artifactory/api/conda/esri-remote arcgis
    ```

2. Import the necessary libraries that you will need in the Notebook.

    ```python
    from arcgis.gis import GIS
    from arcgis.gis import Item
    ```

3. Access the Portal. Your project group will be provided with a Client ID upon
   onboarding. Paste the Client ID between the quotation marks in
   `client_id='######'`.

    ```python
    gis = GIS("https://geoanalytics.cloud.statcan.ca/portal", client_id='######')
    print("Successfully logged in as: " + gis.properties.user.username)
    ```

4. The output will redirect you to a login Portal.
    - Use the StatCan Azure Login option, and your Cloud ID.
    - After successful login, you will receive a code to sign in using SAML.
    - Paste this code into the output.


![OAuth2 Approval](../images/OAuth2Key.png)

<hr>

### Display user information
Using `gis.users.me`, we can display various information about the logged-in user.
```python
me = gis.users.me
username = me.username
description = me.description
display(me)
```

<hr>

### Search for Content
Search for the content you have hosted on the DAaaS Geo Portal. Using `gis.users.me`, we can search for all of the hosted content on the account. There are multiple ways to search for content; two different methods are outlined below.

**Search all of your hosted items in the DAaaS Geo Portal.**
```python
my_content = me.items()
my_content
```
**Search for specific content you own in the DAaaS Geo Portal.**

This is similar to the example above; however, if you know the title of the layer you want to use, you can save it to a variable.
```python
my_items = me.items()
for item in my_items:
    print(item.title, " | ", item.type)
    if item.title == "Flood in Sorel-Tracy":
        flood_item = item

print(flood_item)
```

**Search all content you have access to, not just your own.**

```python
flood_items = gis.content.search("tags: flood", item_type="Feature Service")
flood_items
```

<hr>

### Get Content
We need to get the item from the DAaaS Geo Portal in order to use it in the Jupyter Notebook. This is done by providing the unique identification number of the item you want to use. Three examples are outlined below, all accessing the same layer.
```python
item1 = gis.content.get(my_content[5].id)  # from searching your content above
display(item1)

item2 = gis.content.get(flood_item.id)  # from the example above - searching for specific content
display(item2)

item3 = gis.content.get('edebfe03764b497f90cda5f0bfe727e2')  # the actual content id number
display(item3)
```

<hr>

### Perform Analysis
Once the layers are brought into the Jupyter notebook, we are able to perform the types of analysis you would expect to find in GIS software such as ArcGIS. There are many modules, each containing sub-modules, that can perform multiple types of analyses.
<br/>

Using the arcgis.features module, import the use_proximity submodule with `from arcgis.features import use_proximity`. This submodule allows us to use `.create_buffers`, which creates areas of equal distance around features. Here, we specify the layer we want to use, the distance, the units, and the output name (you may also specify other characteristics such as field, ring type, end type, and others). By specifying an output name, a new layer containing the feature you just created will be automatically uploaded into the DAaaS GEO Portal after running the buffer command.
<br/>

```python
buffer_lyr = use_proximity.create_buffers(item1, distances=[1],
                                          units="Kilometers",
                                          output_name='item1_buffer')

display(buffer_lyr)
```

Some users prefer to work with open-source packages. Translating from ArcGIS to spatial DataFrames is simple.
```python
import pandas as pd

# create a Spatially Enabled DataFrame object (feature_layer, e.g. item1.layers[0])
sdf = pd.DataFrame.spatial.from_layer(feature_layer)
```
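
Once converted, ordinary pandas operations apply. For example, a quick inspection of the `sdf` object created above (the `SHAPE` column holds the geometry):

```python
# Standard pandas calls on the spatially enabled DataFrame.
print(sdf.shape)    # (number of features, number of columns)
print(sdf.columns)  # attribute fields plus the SHAPE geometry column
sdf.head()
```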

<hr>

### Update Items
By getting the item as we did in the examples above, we can use the `.update` method to update an existing item within the DAaaS GEO Portal. We can update item properties, data, thumbnails, and metadata.
```python
item1_buffer = gis.content.get('c60c7e57bdb846dnbd7c8226c80414d2')
item1_buffer.update(item_properties={'title': 'Enter Title',
                                     'tags': 'tag1, tag2, tag3, tag4',
                                     'description': 'Enter description of item'})
```

<hr>

### Visualize Your Data on an Interactive Map

**Example: Matplotlib Library**
In the code below, we create an `ax` object, which is a map-style plot. We then plot our data's 'Population Change' column on the axes.
```python
import matplotlib.pyplot as plt

# 'Population Change' is an example attribute column in the layer
ax = sdf.boundary.plot(figsize=(10, 5))
sdf.plot(ax=ax, column='Population Change', legend=True)
plt.show()
```

**Example: ipyleaflet Library**
In this example we will use the ipyleaflet library to create an interactive map. This map will be centered around Toronto, ON. The data being used is outlined below.
Begin by running `conda install -c conda-forge ipyleaflet` to install the ipyleaflet library in the Python environment.
<br/>
Import the necessary libraries.
```python
from ipyleaflet import Map, basemaps, LayersControl, ScaleControl
```
Now that we have imported the ipyleaflet module, we can create a simple map by specifying the latitude and longitude of the location we want, zoom level, and basemap [(more basemaps)](https://ipyleaflet.readthedocs.io/en/latest/map_and_basemaps/basemaps.html). Extra controls have been added such as layers and scale.
```python
toronto_map = Map(center=[43.69, -79.35], zoom=11, basemap=basemaps.Esri.WorldStreetMap)

toronto_map.add_control(LayersControl(position='topright'))
toronto_map.add_control(ScaleControl(position='bottomleft'))
toronto_map
```
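
Further layers can be added to the same map object. For instance, a simple marker (the coordinates here are purely illustrative):

```python
from ipyleaflet import Marker

# Add a fixed marker at the map centre.
marker = Marker(location=(43.69, -79.35), draggable=False)
toronto_map.add_layer(marker)
toronto_map
```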
<br/>

## Learn More about the ArcGIS API for Python

[Full documentation for the ArcGIS API can be located here](https://developers.arcgis.com/python/)

## Learn More about the DAS Geospatial Analytical Environment (GAE) and Services

[GAE Help Guide](https://statcan.github.io/daaas-dads-geo/)
3 changes: 3 additions & 0 deletions docs/en/5-Storage/MinIO.md
@@ -16,6 +16,7 @@ S3 storage). Buckets are good at three things:

## MinIO Mounted Folders on a Notebook Server

+<!-- prettier-ignore -->
!!! warning "MinIO mounts are not currently working on Protected B servers."

Your MinIO storage is mounted as directories if you select the option
@@ -104,6 +105,7 @@ This lets you browse, upload/download, delete, or share files.

## Browse Datasets

+<!-- prettier-ignore -->
!!! warning "The link below is not currently working."

Browse some [datasets](https://datasets.covid.cloud.statcan.ca) here. These data
@@ -218,6 +220,7 @@ send to a collaborator!

## Get MinIO Credentials

+<!-- prettier-ignore -->
!!! warning "The methods below have not been tested recently, since certain MinIO changes. These may require adjustment."

<!-- prettier-ignore -->