Merge branch 'master' into doc_remove_kubeflow_pipelines
StanHatko authored Feb 7, 2023
2 parents 72e6fd9 + c55115f commit 1e2f11d
Showing 17 changed files with 477 additions and 44 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -24,6 +24,7 @@ The AAW includes tools that allow data science users to open almost any file. Th
- sqlite
- many others... just ask :-)

### How much does the AAW cost?

#### CPU Only

92 changes: 49 additions & 43 deletions docs/en/1-Experiments/Kubeflow.md
@@ -52,9 +52,11 @@ for your team.

## Image

-You will need to choose an image. There are JupyterLab, RStudio, and Ubuntu remote
-desktop images available. Select the drop down menu to select additional options
-within these (for instance, CPU, PyTorch, and TensorFlow images for JupyterLab).
+You will need to choose an image. There are JupyterLab, RStudio, Ubuntu remote
+desktop, and SAS images available. The SAS image is only available to StatCan
+employees (due to licensing restrictions); the others are available to everyone.
+Select the drop down menu to select additional options within these (for
+instance, CPU, PyTorch, and TensorFlow images for JupyterLab).

Check the name of the images and choose one that matches what you want to do. Don't know
which one to choose? Check out your options [here](./Selecting-an-Image.md).
@@ -63,33 +65,43 @@ which one to choose? Check out your options [here](./Selecting-an-Image.md).

## CPU and Memory

-- At the time of writing (December 23, 2021) there are two types of computers in
-  the cluster
-
-  - **CPU:** `D16s v3` (16 CPU cores, 64 GiB memory; for user use 15 CPU cores
-    and 48 GiB memory are available; 1 CPU core and 16 GiB memory reserved for
-    system use).
-  - **GPU:** `NC6s_v3` (6 CPU cores, 112 GiB memory, 1 GPU; for user use 96 GiB
-    memory are available; 16 GiB memory reserved for system use). The available
-    GPU is the NVIDIA Tesla V100 GPU with specs
-    [here](https://images.nvidia.com/content/technologies/volta/pdf/volta-v100-datasheet-update-us-1165301-r5.pdf).
-
-  When creating a notebook server, the system will limit you to the maximum
-  specifications above. For CPU notebook servers, you can specify the exact
-  amount of CPU and memory that you require. This allows you to meet your
-  compute needs while minimising cost. For a GPU notebook server, you will
-  always get the full server (6 CPU cores, 96 GiB accessible memory, and 1 GPU).
-  See below section on GPUs for information on how to select a GPU server.
-
-In the future there may be larger machines available, so you may have looser
-restrictions.
+At the time of writing (December 23, 2021) there are two types of computers in
+the cluster:
+
+- **CPU:** `D16s v3` (16 CPU cores, 64 GiB memory; for user use 15 CPU cores
+  and 48 GiB memory are available; 1 CPU core and 16 GiB memory reserved for
+  system use).
+- **GPU:** `NC6s_v3` (6 CPU cores, 112 GiB memory, 1 GPU; 96 GiB of memory is
+  available for user use; 16 GiB memory reserved for system use). The available
+  GPU is the NVIDIA Tesla V100 GPU with specs
+  [here](https://images.nvidia.com/content/technologies/volta/pdf/volta-v100-datasheet-update-us-1165301-r5.pdf).
+
+When creating a notebook server, the system will limit you to the maximum
+specifications above. For CPU notebook servers, you can specify the exact
+amount of CPU and memory that you require. This allows you to meet your
+compute needs while minimising cost. For a GPU notebook server, you will
+always get the full server (6 CPU cores, 96 GiB accessible memory, and 1 GPU).
+See the section on GPUs below for information on how to select a GPU server.
+
+In the advanced options, you can select a higher limit than the number of CPU
+cores and RAM requested. The amount requested is the amount guaranteed to be
+available to your notebook server, and you will always pay for at least this
+much. If the limit is higher than the amount requested and additional RAM or
+CPU cores are available on that shared server in the cluster, your notebook
+server can use them as needed. One use case for this is jobs that usually need
+only one CPU core but can benefit from multithreading to speed up certain
+operations. By requesting one CPU core but a higher limit, you can pay much
+less for the notebook server while allowing it to use spare unused CPU cores
+as needed to speed up computations.
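
To see how this plays out from inside a running notebook server, here is a
minimal sketch comparing the node's total core count with the container's CPU
quota; the cgroup v1 file paths are an assumption about how the cluster is
configured:

```python
import os

# Total cores on the underlying node (not your guaranteed request).
print("Node CPU cores:", os.cpu_count())

# The cgroup CPU quota reflects the notebook server's CPU limit.
# These paths assume cgroup v1; nodes using cgroup v2 expose the quota elsewhere.
try:
    with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
        quota = int(f.read())
    with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
        period = int(f.read())
    if quota > 0:
        print("Effective CPU limit:", quota / period)
    else:
        print("No CPU quota set.")
except FileNotFoundError:
    print("cgroup v1 CPU files not found (the node may use cgroup v2).")
```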

![Select CPU and RAM](../images/cpu-ram-select.png)

## GPUs

If you want a GPU server, select `1` as the number of GPUs and `NVIDIA` as the GPU
vendor (the create button will be greyed out until the GPU vendor is selected if
-you have a GPU specified). Multi-GPU servers are not currently supported on the
-AAW system.
+you have a GPU specified). Multi-GPU servers are currently supported on the AAW
+system only on a special on-request basis; please contact the AAW maintainers if
+you would like a multi-GPU server.

![GPU Configuration](../images/kubeflow_gpu_selection.jpg)
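
To confirm from inside the notebook that a GPU was actually allocated, a quick
check like the following can help; it assumes a PyTorch-based image (`torch` is
not installed on every image):

```python
import torch

# Check that the NVIDIA GPU is visible to the notebook server.
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # expected: a Tesla V100
else:
    print("No GPU detected; check the notebook server's GPU settings.")
```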

@@ -110,10 +122,6 @@ are various configuration options available:

- You can specify the size of the workspace volume, from 4 GiB to 32 GiB.

-- You can choose the option to not use persistent storage for home, in which case the
-  home folder will be deleted as soon as the notebook server is closed. Otherwise the
-  home folder will remain and can be used again for a new notebook server in the future.

![Create a Workspace Volume](../images/workspace-volume.PNG)

<!-- prettier-ignore -->
@@ -124,17 +132,23 @@
## Data Volumes

You can also create data volumes that can be used to store additional data. Multiple
-data volumes can be created. Click the add volume button to create a new volume and specify
-its configuration. There are the following configuration parameters as for data volumes:
-
-- **Type**: Create a new volume or use an existing volume.
+data volumes can be created. Click the add new volume button to create a new volume and
+specify its configuration. Click the attach existing volume button to mount an existing
+data volume to the notebook server. There are the following configuration parameters for
+data volumes:

- **Name**: Name of the volume.

- **Size in GiB**: From 4 GiB to 512 GiB.

-- **Mount Point**: Path where the data volume can be accessed on the notebook server, by
-  default `/home/jovyan/<volume name>`.
+- **Mount path**: Path where the data volume can be accessed on the notebook server, by
+  default `/home/jovyan/vol-1`, `/home/jovyan/vol-2`, etc. (an incrementing counter per
+  data volume mounted).

+When mounting an existing data volume, the name option becomes a drop-down list of the
+existing data volumes. Only a volume not currently mounted to an existing notebook server
+can be used. The mount path option remains user-configurable, with the same defaults as
+when creating a new volume.
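
Once the notebook server is running, a mounted data volume behaves like any other
directory. A minimal sketch, assuming the default `/home/jovyan/vol-1` mount path:

```python
from pathlib import Path

# Default mount path of the first data volume (adjust if you chose another).
vol = Path("/home/jovyan/vol-1")

# Write and read back a small file to confirm the volume is writable;
# files here persist with the volume, independently of the notebook server.
(vol / "hello.txt").write_text("data volumes outlive notebook servers\n")
print((vol / "hello.txt").read_text())
```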

The garbage can icon on the right can be used to delete an existing or accidentally created
data volume.
@@ -152,14 +166,6 @@ There are currently three checkbox options available here:
access to any Protected B resources. Protected B notebook servers run with many
security restrictions and have access to separate MinIO instances specifically
designed for Protected B data.
-- **Allow access to Kubeflow Pipelines**: This will allow the notebook server to
-  create and manage Kubeflow pipelines. Enable this if you want to use Kubeflow
-  pipelines.
-
-## Affinity / Tolerations
-
-<!-- prettier-ignore -->
-!!! note "This section needs to be filled in."

## Miscellaneous Settings

5 changes: 5 additions & 0 deletions docs/en/3-Pipelines/Kubeflow-Pipelines.md
@@ -1,5 +1,10 @@
# Overview

+<!-- prettier-ignore -->
+!!! warning "Kubeflow pipelines are in the process of being removed from AAW."
+    No new development should use Kubeflow pipelines. If you have questions
+    about this removal, please speak with the AAW maintainers.
+
[Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/overview/) are in the process of being removed from AAW.

They will be replaced by [Argo Workflows](https://argoproj.github.io/argo-workflows/), which will be implemented on AAW soon.
190 changes: 190 additions & 0 deletions docs/en/4-Collaboration/Geospatial-Analytical-Environment.md
@@ -0,0 +1,190 @@
# Geospatial Analytical Environment (GAE) - Cross Platform Access

<!-- prettier-ignore -->
??? danger "Unprotected data only; SSI coming soon"
    At this time, our Geospatial server can only host and provide access to non-sensitive statistical information.

## Getting Started

<!-- prettier-ignore -->
??? success "Prerequisites"
    1. An onboarded project with access to the DAS GAE ArcGIS Portal
    2. An ArcGIS Portal Client Id (API Key)

The ArcGIS Enterprise Portal can be accessed in either the AAW or the CAE using the API, from any service that leverages the Python programming language.

For example, in the AAW this includes [Jupyter Notebooks](https://statcan.github.io/daaas/en/1-Experiments/Jupyter/), and in the CAE it includes [Databricks](https://statcan.github.io/cae-eac/en/DataBricks/), DataFactory, etc.

[The DAS GAE ArcGIS Enterprise Portal can be accessed directly here](https://geoanalytics.cloud.statcan.ca/portal)

[For help with self-registering as a DAS Geospatial Portal user](https://statcan.github.io/daaas-dads-geo/english/portal/)

<hr>

## Using the ArcGIS API for Python

### Connecting to ArcGIS Enterprise Portal using ArcGIS API

1. Install packages:

    ```bash
    conda install -c esri arcgis
    ```

    or, using Artifactory:

    ```bash
    conda install -c https://jfrog.aaw.cloud.statcan.ca/artifactory/api/conda/esri-remote arcgis
    ```

2. Import the necessary libraries that you will need in the Notebook.

    ```python
    from arcgis.gis import GIS
    from arcgis.gis import Item
    ```

3. Access the Portal. Your project group will be provided with a Client ID upon
   onboarding. Paste the Client ID between the quotation marks in
   `client_id='######'`.

    ```python
    gis = GIS("https://geoanalytics.cloud.statcan.ca/portal", client_id='######')
    print("Successfully logged in as: " + gis.properties.user.username)
    ```

4. The output will redirect you to a login Portal.
    - Use the StatCan Azure Login option, and your Cloud ID.
    - After successful login, you will receive a code to sign in using SAML.
    - Paste this code into the output.


![OAuth2 Approval](../images/OAuth2Key.png)

<hr>

### Display user information
Using `gis.users.me`, we can display various information about the logged-in user.
```python
me = gis.users.me
username = me.username
description = me.description
display(me)
```

<hr>

### Search for Content
Search for the content you have hosted on the DAaaS Geo Portal. Using `gis.users.me`, we can search for all of the hosted content on the account. There are multiple ways to search for content; two different methods are outlined below.

**Search all of your hosted items in the DAaaS Geo Portal.**
```python
my_content = me.items()
my_content
```
**Search for specific content you own in the DAaaS Geo Portal.**

This is similar to the example above; however, if you know the title of the layer you want to use, you can save it to a variable.
```python
my_items = me.items()
for item in my_items:
    print(item.title, " | ", item.type)
    if item.title == "Flood in Sorel-Tracy":
        flood_item = item

print(flood_item)
```

**Search all content you have access to, not just your own.**

```python
flood_items = gis.content.search("tags: flood", item_type="Feature Service")
flood_items
```

<hr>

### Get Content
We need to get the item from the DAaaS Geo Portal in order to use it in the Jupyter Notebook. This is done by providing the unique identification number of the item you want to use. Three examples are outlined below, all accessing the same layer.
```python
item1 = gis.content.get(my_content[5].id)  # from searching your content above
display(item1)

item2 = gis.content.get(flood_item.id)  # from the example above - searching for specific content
display(item2)

item3 = gis.content.get('edebfe03764b497f90cda5f0bfe727e2')  # the actual content id number
display(item3)
```

<hr>

### Perform Analysis
Once the layers are brought into the Jupyter notebook, we are able to perform the types of analysis you would expect to find in GIS software such as ArcGIS. There are many modules, each containing sub-modules, that can perform multiple types of analyses.
<br/>

Using the arcgis.features module, import the use_proximity submodule with `from arcgis.features import use_proximity`. This submodule allows us to use `.create_buffers`, which creates areas of equal distance around features. Here, we specify the layer we want to use, the distance, the units, and the output name (you may also specify other characteristics such as field, ring type, end type, and others). By specifying an output name, a new layer containing the feature you just created will be automatically uploaded into the DAaaS GEO Portal after running the buffer command.
<br/>

```python
buffer_lyr = use_proximity.create_buffers(item1, distances=[1],
                                          units="Kilometers",
                                          output_name='item1_buffer')

display(buffer_lyr)
```

Some users prefer to work with open-source packages. Translating from ArcGIS to spatial DataFrames is simple.
```python
import pandas as pd

# create a Spatially Enabled DataFrame object (feature_layer, e.g. item1.layers[0])
sdf = pd.DataFrame.spatial.from_layer(feature_layer)
```
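
Once converted, ordinary pandas operations apply. For example, a quick inspection of the `sdf` object created above (the `SHAPE` column holds the geometry):

```python
# Standard pandas calls on the spatially enabled DataFrame.
print(sdf.shape)    # (number of features, number of columns)
print(sdf.columns)  # attribute fields plus the SHAPE geometry column
sdf.head()
```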

<hr>

### Update Items
By getting the item as we did in the examples above, we can use the `.update` method to update an existing item within the DAaaS GEO Portal. We can update item properties, data, thumbnails, and metadata.
```python
item1_buffer = gis.content.get('c60c7e57bdb846dnbd7c8226c80414d2')
item1_buffer.update(item_properties={'title': 'Enter Title',
                                     'tags': 'tag1, tag2, tag3, tag4',
                                     'description': 'Enter description of item'})
```

<hr>

### Visualize Your Data on an Interactive Map

**Example: Matplotlib Library**
In the code below, we create an `ax` object, which is a map-style plot. We then plot our data's 'Population Change' column on the axes.
```python
import matplotlib.pyplot as plt

# 'Population Change' is an example attribute column in the layer
ax = sdf.boundary.plot(figsize=(10, 5))
sdf.plot(ax=ax, column='Population Change', legend=True)
plt.show()
```

**Example: ipyleaflet Library**
In this example we will use the ipyleaflet library to create an interactive map. This map will be centered around Toronto, ON. The data being used is outlined below.
Begin by running `conda install -c conda-forge ipyleaflet` to install the ipyleaflet library in the Python environment.
<br/>
Import the necessary libraries.
```python
from ipyleaflet import Map, basemaps, LayersControl, ScaleControl
```
Now that we have imported the ipyleaflet module, we can create a simple map by specifying the latitude and longitude of the location we want, zoom level, and basemap [(more basemaps)](https://ipyleaflet.readthedocs.io/en/latest/map_and_basemaps/basemaps.html). Extra controls have been added such as layers and scale.
```python
toronto_map = Map(center=[43.69, -79.35], zoom=11, basemap=basemaps.Esri.WorldStreetMap)

toronto_map.add_control(LayersControl(position='topright'))
toronto_map.add_control(ScaleControl(position='bottomleft'))
toronto_map
```
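
Further layers can be added to the same map object. For instance, a simple marker (the coordinates here are purely illustrative):

```python
from ipyleaflet import Marker

# Add a fixed marker at the map centre.
marker = Marker(location=(43.69, -79.35), draggable=False)
toronto_map.add_layer(marker)
toronto_map
```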
<br/>

## Learn More about the ArcGIS API for Python

[Full documentation for the ArcGIS API can be located here](https://developers.arcgis.com/python/)

## Learn More about the DAS Geospatial Analytical Environment (GAE) and Services

[GAE Help Guide](https://statcan.github.io/daaas-dads-geo/)
3 changes: 3 additions & 0 deletions docs/en/5-Storage/MinIO.md
@@ -16,6 +16,7 @@ S3 storage). Buckets are good at three things:

## MinIO Mounted Folders on a Notebook Server

+<!-- prettier-ignore -->
!!! warning "MinIO mounts are not currently working on Protected B servers."

Your MinIO storage is mounted as directories if you select the option
@@ -104,6 +105,7 @@ This lets you browse, upload/download, delete, or share files.

## Browse Datasets

+<!-- prettier-ignore -->
!!! warning "The link below is not currently working."

Browse some [datasets](https://datasets.covid.cloud.statcan.ca) here. These data
@@ -218,6 +220,7 @@ send to a collaborator!

## Get MinIO Credentials

+<!-- prettier-ignore -->
!!! warning "The methods below have not been tested recently, since certain MinIO changes. These may require adjustment."

<!-- prettier-ignore -->