Merge branch 'develop' into 3799-noninteractive_installer #3799
pdurbin committed May 2, 2017
2 parents 0f69ad5 + e15eafb commit 03c0dd0
Showing 22 changed files with 467 additions and 90 deletions.
6 changes: 6 additions & 0 deletions doc/sphinx-guides/source/_static/util/robots.txt
@@ -0,0 +1,6 @@
User-agent: *
Allow: /$
Allow: /dataverse.xhtml
Allow: /dataset.xhtml
Disallow: /
Crawl-delay: 20
28 changes: 27 additions & 1 deletion doc/sphinx-guides/source/admin/geoconnect-worldmap.rst
@@ -3,7 +3,33 @@ Geoconnect and WorldMap

.. contents:: :local:

-One of the optional components listed under "Architecture and Components" in the :doc:`/installation/prep` section of the Installation Guide is `Geoconnect <https://github.com/IQSS/geoconnect>`_, piece of middleware that allows Dataverse users to create maps in `WorldMap <http://worldmap.harvard.edu>`_ based on geospatial data stored in Dataverse. For more details on the feature from the user perspective, see the :doc:`/user/data-exploration/worldmap` section of the User Guide.
+One of the optional components listed under "Architecture and Components" in the :doc:`/installation/prep` section of the Installation Guide is `Geoconnect <https://github.com/IQSS/geoconnect>`_, a piece of middleware that allows Dataverse users to create maps in `WorldMap <http://worldmap.harvard.edu>`_ based on geospatial data stored in Dataverse. For more details on the feature from the user perspective, see the :doc:`/user/data-exploration/worldmap` section of the User Guide.

Update "mapitlink"
------------------

SQL commands to point a Dataverse installation at different Geoconnect servers:


**Geoconnect Production** *geoconnect.datascience.iq.harvard.edu*

.. code-block:: sql

   update worldmapauth_tokentype set mapitlink = 'https://geoconnect.datascience.iq.harvard.edu/shapefile/map-it', hostname='geoconnect.datascience.iq.harvard.edu' where name = 'GEOCONNECT';

**Heroku Test** *geoconnect-dev.herokuapp.com*

.. code-block:: sql

   update worldmapauth_tokentype set mapitlink = 'https://geoconnect-dev.herokuapp.com/shapefile/map-it', hostname='geoconnect-dev.herokuapp.com' where name = 'GEOCONNECT';

**View Current Settings**

.. code-block:: sql

   SELECT * from worldmapauth_tokentype;

Removing Dead Explore Links
---------------------------
39 changes: 25 additions & 14 deletions doc/sphinx-guides/source/developers/dev-environment.rst
@@ -74,11 +74,11 @@ Additional Tools

Please see also the :doc:`/developers/tools` page, which lists additional tools that are very useful but not essential.

-Setting up your dev environment
+Setting Up Your Dev Environment
-------------------------------

-SSH keys
-~~~~~~~~
+Set Up SSH Keys
+~~~~~~~~~~~~~~~

You can use git with passwords over HTTPS, but it's much nicer to set up SSH keys. https://github.com/settings/ssh is the place to manage the ssh keys GitHub knows about for you. That page also links to a nice howto: https://help.github.com/articles/generating-ssh-keys

@@ -135,7 +135,7 @@ Once Solr is up and running you should be able to see a "Solr Admin" dashboard a

Once some dataverses, datasets, and files have been created and indexed, you can experiment with searches directly from Solr at http://localhost:8983/solr/#/collection1/query and look at the JSON output of searches, such as this wildcard search: http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true . You can also get JSON output of static fields Solr knows about: http://localhost:8983/solr/schema/fields

-Run installer
+Run Installer
~~~~~~~~~~~~~

Once you install Glassfish and PostgreSQL, you need to configure the environment for the Dataverse app - configure the database connection, set some options, etc. We have a new installer script that should do it all for you. Again, assuming that the clone of the Dataverse repository was retrieved using NetBeans and that it is saved in the path ~/NetBeansProjects:
@@ -150,10 +150,26 @@ The script is a variation of the old installer from DVN 3.x that calls another s

All the future changes to the configuration that are Glassfish-specific and can be done through ``asadmin`` should now go into ``scripts/install/glassfish-setup.sh``.

Rebuilding Your Dev Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you have an old copy of the database and old Solr data and want to start fresh, here are the recommended steps:

- drop your old database
- clear out your existing Solr index: ``scripts/search/clear``
- run the installer script above - it will create the db, deploy the app, populate the db with reference data and run all the scripts that create the domain metadata fields. You no longer need to perform these steps separately.
- confirm you are using the latest Dataverse-specific Solr schema.xml per the "Installing and Running Solr" section of this guide
- confirm http://localhost:8080 is up
- If you want to set some dataset-specific facets, go to the root dataverse (or any dataverse; the selections can be inherited) and click "General Information" and make choices under "Select Facets". There is a ticket to automate this: https://github.com/IQSS/dataverse/issues/619
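
If you want to script these steps, a minimal sketch in Python might look like the following; the database name ``dvndb`` and the installer location are assumptions -- adjust both to match your setup:

.. code-block:: python

   import subprocess

   # Hypothetical rebuild helper, run from the repository root; the database
   # name "dvndb" and the installer location are assumptions.
   subprocess.run(["dropdb", "dvndb"], check=True)        # drop the old database
   subprocess.run(["scripts/search/clear"], check=True)   # clear the Solr index
   subprocess.run(["./install"], cwd="scripts/install", check=True)  # re-run the installer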

You may also find https://github.com/IQSS/dataverse/blob/develop/scripts/deploy/phoenix.dataverse.org/deploy and related scripts interesting because they demonstrate how we have at least partially automated the process of tearing down a Dataverse installation and having it rise again, hence the name "phoenix." See also "Fresh Reinstall" in the :doc:`/installation/installation-main` section of the Installation Guide.

Shibboleth and OAuth
--------------------

-If you are working on anything related to users, please keep in mind that your changes will likely affect Shibboleth and OAuth users. Rather than setting up Shibboleth on your laptop, developers are advised to simply add a value to their database to enable Shibboleth "dev mode" like this:
+If you are working on anything related to users, please keep in mind that your changes will likely affect Shibboleth and OAuth users. For some background on user accounts in Dataverse, see "Auth Modes: Local vs. Remote vs. Both" in the :doc:`/installation/config` section of the Installation Guide.

+Rather than setting up Shibboleth on your laptop, developers are advised to simply add a value to their database to enable Shibboleth "dev mode" like this:

``curl http://localhost:8080/api/admin/settings/:DebugShibAccountType -X PUT -d RANDOM``

@@ -171,14 +187,9 @@ For a list of possible values, please "find usages" on the settings key above an

Now when you go to http://localhost:8080/oauth2/firstLogin.xhtml you should be prompted to create a Shibboleth account.

-Rebuilding your dev environment
--------------------------------
+Geoconnect
+----------

-If you have an old copy of the database and old Solr data and want to start fresh, here are the recommended steps:
+Geoconnect works as a middle layer, allowing geospatial data files in Dataverse to be visualized with Harvard WorldMap. To set up a Geoconnect development environment, you can follow the steps outlined in the `local_setup.md <https://github.com/IQSS/geoconnect/blob/master/local_setup.md>`_ guide. You will need Python and a few other prerequisites.

-- drop your old database
-- clear out your existing Solr index: ``scripts/search/clear``
-- run the installer script above - it will create the db, deploy the app, populate the db with reference data and run all the scripts that create the domain metadata fields. You no longer need to perform these steps separately.
-- confirm you are using the latest Dataverse-specific Solr schema.xml per the "Installing and Running Solr" section of this guide
-- confirm http://localhost:8080 is up
-- If you want to set some dataset-specific facets, go to the root dataverse (or any dataverse; the selections can be inherited) and click "General Information" and make choices under "Select Facets". There is a ticket to automate this: https://github.com/IQSS/dataverse/issues/619
+As mentioned under "Architecture and Components" in the :doc:`/installation/prep` section of the Installation Guide, Geoconnect is an optional component of Dataverse, so this section is only necessary to follow if you are working on an issue related to this feature.
174 changes: 174 additions & 0 deletions doc/sphinx-guides/source/developers/geospatial.rst
@@ -0,0 +1,174 @@
===============
Geospatial Data
===============

How Dataverse Ingests Shapefiles
--------------------------------

A shapefile is a set of files, often uploaded/transferred in ``.zip`` format. This set may contain up to fifteen files. A minimum of three specific files (``.shp``, ``.shx``, ``.dbf``) is needed for a valid shapefile, and a fourth file (``.prj``) is required for WorldMap -- or any type of meaningful visualization.

For ingest and connecting to WorldMap, four files are the minimum required:

- ``.shp`` - shape format; the feature geometry itself
- ``.shx`` - shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly
- ``.dbf`` - attribute format; columnar attributes for each shape, in dBase IV format
- ``.prj`` - projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format

Ingest
~~~~~~

When uploaded to Dataverse, the ``.zip`` is unpacked (as all ``.zip`` files are). Shapefile sets are recognized by a shared base name and specific extensions; together, these individual files constitute a shapefile set. The first four extensions are the minimum required (``.shp``, ``.shx``, ``.dbf``, ``.prj``).

For example:

- bicycles.shp (required extension)
- bicycles.shx (required extension)
- bicycles.prj (required extension)
- bicycles.dbf (required extension)
- bicycles.sbx (NOT required extension)
- bicycles.sbn (NOT required extension)

Upon recognition of the four required files, Dataverse will group them as well as any other relevant files into a shapefile set. Files with these extensions will be included in the shapefile set:

- Required: ``.shp``, ``.shx``, ``.dbf``, ``.prj``
- Optional: ``.sbn``, ``.sbx``, ``.fbn``, ``.fbx``, ``.ain``, ``.aih``, ``.ixs``, ``.mxs``, ``.atx``, ``.cpg``, ``shp.xml``

Dataverse then creates a new ``.zip`` with a shapefile-specific mimetype; the shapefile set persists as this new ``.zip``.
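
To illustrate the grouping logic, here is a rough Python sketch -- not Dataverse's actual (Java) implementation -- of how uploaded file names might be partitioned into shapefile sets by base name:

.. code-block:: python

   import os
   from collections import defaultdict

   REQUIRED = {".shp", ".shx", ".dbf", ".prj"}
   OPTIONAL = {".sbn", ".sbx", ".fbn", ".fbx", ".ain", ".aih",
               ".ixs", ".mxs", ".atx", ".cpg", ".shp.xml"}

   def group_shapefile_sets(filenames):
       """Group file names into shapefile sets keyed by base name."""
       by_base = defaultdict(set)
       for name in filenames:
           if name.endswith(".shp.xml"):  # double extension needs special handling
               base, ext = name[:-len(".shp.xml")], ".shp.xml"
           else:
               base, ext = os.path.splitext(name)
           if ext in REQUIRED or ext in OPTIONAL:
               by_base[base].add(ext)
       # Only groups containing all four required extensions form a set.
       return {base: exts for base, exts in by_base.items() if REQUIRED <= exts}

Applied to the example in the next section, this sketch would yield two sets: ``bicycles`` and ``subway_line``.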

Example
~~~~~~~

**1a.** Original ``.zip`` contents:

A file named ``bikes_and_subways.zip`` is uploaded to Dataverse. This ``.zip`` contains the following files:

- ``bicycles.shp`` (shapefile set #1)
- ``bicycles.shx`` (shapefile set #1)
- ``bicycles.prj`` (shapefile set #1)
- ``bicycles.dbf`` (shapefile set #1)
- ``bicycles.sbx`` (shapefile set #1)
- ``bicycles.sbn`` (shapefile set #1)
- ``bicycles.txt``
- ``the_bikes.md``
- ``readme.txt``
- ``subway_line.shp`` (shapefile set #2)
- ``subway_line.shx`` (shapefile set #2)
- ``subway_line.prj`` (shapefile set #2)
- ``subway_line.dbf`` (shapefile set #2)

**1b.** Dataverse unzips and re-zips files:

Upon ingest, Dataverse unpacks the file ``bikes_and_subways.zip``. Upon recognizing the shapefile sets, it groups those files together into new ``.zip`` files:

- files making up the "bicycles" shapefile become a new ``.zip``
- files making up the "subway_line" shapefile become a new ``.zip``
- remaining files will stay as they are

To ensure that a shapefile set remains intact, individual files such as ``bicycles.sbn`` are kept in the set -- even though they are not used for mapping.

**1c.** Dataverse final file listing:

- ``bicycles.zip`` (contains shapefile set #1: ``bicycles.shp``, ``bicycles.shx``, ``bicycles.prj``, ``bicycles.dbf``, ``bicycles.sbx``, ``bicycles.sbn``)
- ``bicycles.txt`` (separate, not part of a shapefile set)
- ``the_bikes.md`` (separate, not part of a shapefile set)
- ``readme.txt`` (separate, not part of a shapefile set)
- ``subway_line.zip`` (contains shapefile set #2: ``subway_line.shp``, ``subway_line.shx``, ``subway_line.prj``, ``subway_line.dbf``)

For the two "final" shapefile sets, ``bicycles.zip`` and ``subway_line.zip``, a new mimetype is used:

- Mimetype: ``application/zipped-shapefile``
- Mimetype Label: "Shapefile as ZIP Archive"

WorldMap JoinTargets + API Endpoint
-----------------------------------

WorldMap supplies target layers -- or JoinTargets -- that a tabular file may be mapped against. A JSON description of these `CGA <http://gis.harvard.edu>`_-curated JoinTargets may be retrieved via API at ``http://worldmap.harvard.edu/datatables/api/jointargets/``. Please note: login is required. You may use any WorldMap account credentials via HTTP Basic Auth.

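For instance, the JoinTargets can be retrieved with a short Python sketch using the ``requests`` library (the credentials shown are placeholders for a real WorldMap account):

.. code-block:: python

   import requests

   # HTTP Basic Auth with placeholder WorldMap credentials.
   resp = requests.get(
       "http://worldmap.harvard.edu/datatables/api/jointargets/",
       auth=("my_worldmap_user", "my_worldmap_password"),
   )
   resp.raise_for_status()
   join_targets = resp.json()["data"]  # list of JoinTarget descriptions
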
Example of JoinTarget information returned via the API:

.. code-block:: json

   {
     "data":[
       {
         "layer":"geonode:census_tracts_2010_boston_6f6",
         "name":"Census Tracts, Boston (GEOID10: State+County+Tract)",
         "geocode_type_slug":"us-census-tract",
         "geocode_type":"US Census Tract",
         "attribute":{
           "attribute":"CT_ID_10",
           "type":"xsd:string"
         },
         "abstract":"As of the 2010 census, Boston, MA contains 7,288 city blocks [truncated for example]",
         "title":"Census Tracts 2010, Boston (BARI)",
         "expected_format":{
           "expected_zero_padded_length":-1,
           "is_zero_padded":false,
           "description":"Concatenation of state, county and tract for 2010 Census Tracts. Reference: https://www.census.gov/geo/maps-data/data/tract_rel_layout.html\r\n\r\nNote: Across the US, this can be a zero-padded \"string\" but the original Boston layer has this column as \"numeric\" ",
           "name":"2010 Census Boston GEOID10 (State+County+Tract)"
         },
         "year":2010,
         "id":28
       },
       {
         "layer":"geonode:addresses_2014_boston_1wr",
         "name":"Addresses, Boston",
         "geocode_type_slug":"boston-administrative-geography",
         "geocode_type":"Boston, Administrative Geography",
         "attribute":{
           "attribute":"LocationID",
           "type":"xsd:int"
         },
         "abstract":"Unique addresses present in the parcels data set, which itself is derived from [truncated for example]",
         "title":"Addresses 2015, Boston (BARI)",
         "expected_format":{
           "expected_zero_padded_length":-1,
           "is_zero_padded":false,
           "description":"Boston, Administrative Geography, Boston Address Location ID. Example: 1, 2, 3...nearly 120000",
           "name":"Boston Address Location ID (integer)"
         },
         "year":2015,
         "id":18
       },
       {
         "layer":"geonode:bra_neighborhood_statistical_areas_2012__ug9",
         "name":"BRA Neighborhood Statistical Areas, Boston",
         "geocode_type_slug":"boston-administrative-geography",
         "geocode_type":"Boston, Administrative Geography",
         "attribute":{
           "attribute":"BOSNA_R_ID",
           "type":"xsd:double"
         },
         "abstract":"BRA Neighborhood Statistical Areas 2015, Boston. Provided by [truncated for example]",
         "title":"BRA Neighborhood Statistical Areas 2015, Boston (BARI)",
         "expected_format":{
           "expected_zero_padded_length":-1,
           "is_zero_padded":false,
           "description":"Boston, Administrative Geography, Boston BRA Neighborhood Statistical Area ID (integer). Examples: 1, 2, 3, ... 68, 69",
           "name":"Boston BRA Neighborhood Statistical Area ID (integer)"
         },
         "year":2015,
         "id":17
       }
     ],
     "success":true
   }

How Geoconnect Uses Join Target Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a user attempts to map a tabular file, the application looks in the Geoconnect database for ``JoinTargetInformation``. If this information is more than 10 minutes* old, the application will retrieve fresh information and save it to the db.

(* Change the timing via the Django settings variable ``JOIN_TARGET_UPDATE_TIME``.)
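
A sketch of that freshness check (the function and field names here are hypothetical; Geoconnect's actual logic lives in its Django app):

.. code-block:: python

   from datetime import datetime, timedelta

   JOIN_TARGET_UPDATE_TIME = 10  # minutes; configurable via Django settings

   def needs_refresh(last_retrieved):
       """Return True if the cached JoinTarget info is stale and should be re-fetched."""
       return datetime.now() - last_retrieved > timedelta(minutes=JOIN_TARGET_UPDATE_TIME)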

This JoinTarget info is used to populate HTML forms used to match a tabular file column to a JoinTarget column. Once a JoinTarget is chosen, the JoinTarget ID is an essential piece of information used to make an API call to the WorldMap and attempt to map the file.

Retrieving Join Target Information from WorldMap API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``get_join_targets()`` function in ``dataverse_layer_services.py`` uses the WorldMap API to retrieve a list of available tabular file JoinTargets. (See the `dataverse_layer_services code in GitHub <https://github.com/IQSS/geoconnect/blob/master/gc_apps/worldmap_connect/dataverse_layer_services.py#L275>`_.)

Saving Join Target Information to Geoconnect Database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``get_latest_jointarget_information()`` function in ``utils.py`` retrieves recent JoinTarget information from the database. (See the `utils code in GitHub <https://github.com/IQSS/geoconnect/blob/master/gc_apps/worldmap_connect/utils.py#L16>`_.)
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/developers/index.rst
@@ -20,4 +20,5 @@ Contents:
making-releases
tools
unf/index
+geospatial
selinux