Merge branch 'develop' into 3575-usernames #3575
pdurbin committed May 15, 2019
2 parents 49486ed + 5c79354 commit cb95216
Showing 91 changed files with 171,181 additions and 687 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -2,6 +2,9 @@ nb-configuration.xml
target
infer-out
nbactions.xml
.settings
.classpath
.project
michael-local
GPATH
GTAGS
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -57,7 +57,7 @@ If you are interested in working on the main Dataverse code, great! Before you s

Please read http://guides.dataverse.org/en/latest/developers/version-control.html to understand how we use the "git flow" model of development and how we will encourage you to create a GitHub issue (if it doesn't exist already) to associate with your pull request. That page also includes tips on making a pull request.

After making your pull request, your goal should be to help it advance through our kanban board at https://waffle.io/IQSS/dataverse . If no one has moved your pull request to the code review column in a timely manner, please reach out. Thanks!
After making your pull request, your goal should be to help it advance through our kanban board at https://github.com/orgs/IQSS/projects/2 . If no one has moved your pull request to the code review column in a timely manner, please reach out. Thanks!

[dataverse-community Google Group]: https://groups.google.com/group/dataverse-community
[Community Call]: https://dataverse.org/community-calls
2 changes: 1 addition & 1 deletion PULL_REQUEST_TEMPLATE.md
@@ -4,7 +4,7 @@ Welcome! New contributors should at least glance at [CONTRIBUTING.md](/CONTRIBUT

## Related Issues

- connects to #ISSUE_NUMBER: ISSUE_TITLE
- #ISSUE_NUMBER: ISSUE_TITLE

## Pull Request Checklist

2 changes: 0 additions & 2 deletions README.md
@@ -17,8 +17,6 @@ Dataverse is a trademark of President and Fellows of Harvard College and is regi

[![Dataverse Project logo](src/main/webapp/resources/images/dataverseproject_logo.jpg?raw=true "Dataverse Project")](http://dataverse.org)

[![Waffle.io - Columns and their card count](https://badge.waffle.io/IQSS/dataverse.svg?columns=all)](https://waffle.io/IQSS/dataverse)

[![Build Status](https://travis-ci.org/IQSS/dataverse.svg?branch=develop)](https://travis-ci.org/IQSS/dataverse) [![Coverage Status](https://coveralls.io/repos/IQSS/dataverse/badge.svg?branch=develop&service=github)](https://coveralls.io/github/IQSS/dataverse?branch=develop)

[dataverse.org]: https://dataverse.org
4 changes: 4 additions & 0 deletions doc/sphinx-guides/source/admin/dashboard.rst
@@ -29,3 +29,7 @@ Users

This dashboard tool allows you to search a list of all users of your Dataverse installation. You can remove roles from user accounts and assign or remove superuser status. See the :doc:`user-administration` section for more details.

Move Data
---------

This tool allows you to move datasets. To move dataverses, see the :doc:`dataverses-datasets` section.
23 changes: 20 additions & 3 deletions doc/sphinx-guides/source/admin/dataverses-datasets.rst
@@ -46,7 +46,9 @@ Datasets
Move a Dataset
^^^^^^^^^^^^^^

Moves a dataset whose id is passed to a dataverse whose alias is passed. If the moved dataset has a guestbook or a dataverse link that is not compatible with the destination dataverse, you will be informed and given the option to force the move and remove the guestbook or link. Only accessible to users with permission to publish the dataset in the original and destination dataverse. ::
Superusers can move datasets using the dashboard. See also :doc:`dashboard`.

Moves a dataset whose id is passed to a dataverse whose alias is passed. If the moved dataset has a guestbook or a dataverse link that is not compatible with the destination dataverse, you will be informed and given the option to force the move (with ``forceMove=true`` as a query parameter) and remove the guestbook or link (or both). Only accessible to users with permission to publish the dataset in the original and destination dataverse. ::

curl -H "X-Dataverse-key: $API_TOKEN" -X POST http://$SERVER/api/datasets/$id/move/$alias
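
A hedged sketch of forcing such a move when a guestbook or link conflict is reported (``forceMove`` is the query parameter named above; the id and alias values are placeholders)::

    curl -H "X-Dataverse-key: $API_TOKEN" -X POST "http://$SERVER/api/datasets/$id/move/$alias?forceMove=true"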

@@ -64,8 +66,23 @@ Removes a link between a dataset and a dataverse. Only accessible to superusers.

curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE http://$SERVER/api/datasets/$linked-dataset-id/deleteLink/$linking-dataverse-alias

Mint new PID for a Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^
Mint a PID for a File That Does Not Have One
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the following example, the database id of the file is 42::

export FILE_ID=42
curl http://localhost:8080/api/admin/$FILE_ID/registerDataFile

Mint PIDs for Files That Do Not Have Them
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``registerDataFileAll`` endpoint below mints PIDs for all files that lack them in one call. If you have a large number of files, you might want to consider minting PIDs for files individually instead, using the ``registerDataFile`` endpoint above in a for loop, sleeping between each registration::

curl http://localhost:8080/api/admin/registerDataFileAll
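
If you prefer the per-file loop described above, a sketch (the ``file_ids.txt`` input file and one-second sleep are illustrative)::

    # file_ids.txt contains one database id per line
    while read FILE_ID; do
      curl http://localhost:8080/api/admin/$FILE_ID/registerDataFile
      sleep 1  # pause between registrations to reduce load on the PID provider
    done < file_ids.txt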

Mint a New DOI for a Dataset with a Handle
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Mints a new identifier for a dataset previously registered with a handle. Only accessible to superusers. ::

12 changes: 12 additions & 0 deletions doc/sphinx-guides/source/admin/harvestserver.rst
@@ -14,6 +14,14 @@ harvesting protocol. Note that the terms "Harvesting Server" and "OAI
Server" are being used interchangeably throughout this guide and in
the inline help text.

If you want to learn more about OAI-PMH, you could take a look at the
`DataCite OAI-PMH guide <https://support.datacite.org/docs/datacite-oai-pmh>`_
or the `OAI-PMH protocol definition <https://www.openarchives.org/OAI/openarchivesprotocol.html>`_.

You might consider adding your OAI-enabled production instance of Dataverse to
`this shared list <https://docs.google.com/spreadsheets/d/12cxymvXCqP_kCsLKXQD32go79HBWZ1vU_tdG4kvP5S8/>`_
of such instances.

How does it work?
-----------------

@@ -28,6 +36,10 @@ Harvesting server can be enabled or disabled on the "Harvesting
Server" page accessible via the :doc:`dashboard`. Harvesting server is by
default disabled on a brand new, "out of the box" Dataverse.

The OAI-PMH endpoint can be accessed at ``http(s)://<Your Dataverse FQDN>/oai``.
If you want other services to harvest your repository, point them to this URL.
*Example URL for 'Identify' verb*: `demo.dataverse.org OAI <https://demo.dataverse.org/oai?verb=Identify>`_
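
A quick way to exercise the endpoint from the command line (``Identify`` and ``ListMetadataFormats`` are standard OAI-PMH verbs)::

    curl "https://demo.dataverse.org/oai?verb=Identify"
    curl "https://demo.dataverse.org/oai?verb=ListMetadataFormats"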

OAI Sets
--------

9 changes: 1 addition & 8 deletions doc/sphinx-guides/source/admin/metadataexport.rst
@@ -7,14 +7,7 @@ Metadata Export
Automatic Exports
-----------------

Publishing a dataset automatically starts a metadata export job, that will run in the background, asynchronously. Once completed, it will make the dataset metadata exported and cached in all the supported formats:

- Dublin Core
- Data Documentation Initiative (DDI)
- DataCite 4
- native JSON (Dataverse-specific)
- OAI_ORE
- Schema.org JSON-LD
Publishing a dataset automatically starts a metadata export job that runs in the background, asynchronously. Once completed, it exports and caches the dataset metadata in all the supported formats listed under :ref:`Supported Metadata Export Formats <metadata-export-formats>` in the :doc:`/user/dataset-management` section of the User Guide.

A scheduled timer job that runs nightly will attempt to export any published datasets that for whatever reason haven't been exported yet. This timer is activated automatically on deployment or restart of the application, so there is no need to start or configure it manually. (See the "Application Timers" section of this guide for more information.)

4 changes: 2 additions & 2 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -291,7 +291,7 @@ Export Metadata of a Dataset in Various Formats

GET http://$SERVER/api/datasets/export?exporter=ddi&persistentId=$persistentId

.. note:: Supported exporters (export formats) are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, ``schema.org`` , ``OAI_ORE`` , ``Datacite`` and ``dataverse_json``.
.. note:: Supported exporters (export formats) are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, ``schema.org``, ``OAI_ORE``, ``Datacite``, ``oai_datacite``, and ``dataverse_json``.
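
For example, a hedged call retrieving the newly supported ``oai_datacite`` format (the persistent identifier is a placeholder)::

    curl "http://$SERVER/api/datasets/export?exporter=oai_datacite&persistentId=doi:10.5072/FK2/EXAMPLE"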

Schema.org JSON-LD
^^^^^^^^^^^^^^^^^^
@@ -570,7 +570,7 @@ Optionally, you can check if there's a lock of a specific type on the dataset::

curl "$SERVER_URL/api/datasets/{database_id}/locks?type={lock_type}

Currently implemented lock types are ``Ingest, Workflow, InReview, DcmUpload and pidRegister``.
Currently implemented lock types are ``Ingest``, ``Workflow``, ``InReview``, ``DcmUpload``, ``pidRegister``, and ``EditInProgress``.
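
For instance, a sketch checking for an ``Ingest`` lock (``$DB_ID`` stands in for a real database id)::

    curl "$SERVER_URL/api/datasets/$DB_ID/locks?type=Ingest"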

The API will output the list of locks, for example::

4 changes: 2 additions & 2 deletions doc/sphinx-guides/source/conf.py
@@ -65,9 +65,9 @@
# built documents.
#
# The short X.Y version.
version = '4.13'
version = '4.14'
# The full version, including alpha/beta/rc tags.
release = '4.13'
release = '4.14'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
11 changes: 9 additions & 2 deletions doc/sphinx-guides/source/developers/coding-style.rst
@@ -89,11 +89,18 @@ Generally speaking you should use ``fine`` for everything that you don't want to

When adding logging, do not simply add ``System.out.println()`` lines because the logging level cannot be controlled.
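
A minimal sketch of the intended pattern (the class name is illustrative)::

    import java.util.logging.Logger;

    public class ExampleServiceBean {
        // One logger per class; its level can be tuned at runtime, unlike System.out.println().
        private static final Logger logger = Logger.getLogger(ExampleServiceBean.class.getCanonicalName());

        public void doWork() {
            logger.fine("entering doWork"); // only shown when FINE logging is enabled
        }
    }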

Avoid Hard-Coding Strings
~~~~~~~~~~~~~~~~~~~~~~~~~
Avoid Hard-Coding Strings (Use Constants)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Special strings should be defined as public constants. For example, ``DatasetFieldConstant.java`` contains a field for "title" and it's used in many places in the code (try "Find Usages" in NetBeans). This is better than writing the string "title" in all those places.

Avoid Hard-Coding User-Facing Messaging in English
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is an ongoing effort to translate Dataverse into various languages. Look for "lang" or "languages" in the :doc:`/installation/config` section of the Installation Guide for details if you'd like to help or play around with this feature.

The translation effort is hampered if you hard-code user-facing messages in English in the Java code. Put English strings in ``Bundle.properties`` and use ``BundleUtil`` to pull them out. This is especially important for messages that appear in the UI. We are aware that the API has many, many hard-coded English strings in it. If you touch a method in the API and notice English strings, you are strongly encouraged to use that opportunity to move the English to ``Bundle.properties``.
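
A hedged sketch of the pattern (the property key is hypothetical)::

    // In Bundle.properties: dataset.create.success=Your dataset has been created.
    String msg = BundleUtil.getStringFromBundle("dataset.create.success");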

Type Safety
~~~~~~~~~~~

2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/developers/intro.rst
@@ -34,7 +34,7 @@ For the Dataverse development roadmap, please see https://dataverse.org/goals-ro
Kanban Board
------------

You can get a sense of what's currently in flight (in dev, in QA, etc.) by looking at https://waffle.io/IQSS/dataverse
You can get a sense of what's currently in flight (in dev, in QA, etc.) by looking at https://github.com/orgs/IQSS/projects/2

Issue Tracker
-------------
6 changes: 3 additions & 3 deletions doc/sphinx-guides/source/developers/version-control.rst
@@ -62,7 +62,7 @@ For guidance on which issue to work on, please ask! Also, see https://github.com

Let's say you want to tackle https://github.com/IQSS/dataverse/issues/3728 which points out a typo in a page of Dataverse's documentation.

If you tell us your GitHub username we are happy to add you to the "read only" team at https://github.com/orgs/IQSS/teams/dataverse-readonly/members so that we can assign the issue to you while you're working on it. You can also tell us if you'd like to be added to the `Dataverse Community Contributors spreadsheet <https://docs.google.com/spreadsheets/d/1o9DD-MQ0WkrYaEFTD5rF_NtyL8aUISgURsAXSL7Budk/edit?usp=sharing>`_ and the `Dev Efforts by the Dataverse Community spreadsheet <https://groups.google.com/d/msg/dataverse-community/X2diSWYll0w/ikp1TGcfBgAJ>`_.
If you tell us your GitHub username we are happy to add you to the "read only" team at https://github.com/orgs/IQSS/teams/dataverse-readonly/members so that we can assign the issue to you while you're working on it. You can also tell us if you'd like to be added to the `Dataverse Community Contributors spreadsheet <https://docs.google.com/spreadsheets/d/1o9DD-MQ0WkrYaEFTD5rF_NtyL8aUISgURsAXSL7Budk/edit?usp=sharing>`_.

Create a New Branch off the develop Branch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -82,14 +82,14 @@ Push your feature branch to your fork of Dataverse. Your git command may look so
Make a Pull Request
~~~~~~~~~~~~~~~~~~~

Make a pull request to get approval to merge your changes into the develop branch. Feedback on the pull request template we use is welcome! The "connects to #3728" syntax is important because it's used at https://waffle.io/IQSS/dataverse to associate pull requests with issues.
Make a pull request to get approval to merge your changes into the develop branch. Feedback on the pull request template we use is welcome!

Here's an example of a pull request for issue #3728: https://github.com/IQSS/dataverse/pull/3827

Make Sure Your Pull Request Has Been Advanced to Code Review
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now that you've made your pull request, your goal is to make sure it appears in the "Code Review" column at https://waffle.io/IQSS/dataverse
Now that you've made your pull request, your goal is to make sure it appears in the "Code Review" column at https://github.com/orgs/IQSS/projects/2

Look at https://github.com/IQSS/dataverse/blob/master/CONTRIBUTING.md for various ways to reach out to developers who have enough access to the GitHub repo to move your issue and pull request to the "Code Review" column.

4 changes: 1 addition & 3 deletions doc/sphinx-guides/source/installation/prerequisites.rst
@@ -124,9 +124,7 @@ PostgreSQL
Installing PostgreSQL
=======================

Version 9.3 is required. Previous versions have not been tested.

Version 9.6 is strongly recommended::
Version 9.6 is strongly recommended because it is the version developers and QA test with::

# yum install -y https://download.postgresql.org/pub/repos/yum/9.6/redhat/rhel-7-x86_64/pgdg-centos96-9.6-3.noarch.rpm
# yum makecache fast
17 changes: 15 additions & 2 deletions doc/sphinx-guides/source/user/dataset-management.rst
@@ -20,7 +20,20 @@ A dataset contains three levels of metadata:

For more details about what Citation and Domain Specific Metadata is supported please see our :ref:`user-appendix`.

Note that once a dataset has been published its metadata may be exported. A button on the dataset page's metadata tab will allow a user to export the metadata of the most recently published version of the dataset. Currently supported export formats are DDI, Dublin Core, Datacite 4, OAI_ORE, Schema.org JSON-LD, and Dataverse's native JSON format.
.. _metadata-export-formats:

Supported Metadata Export Formats
---------------------------------

Once a dataset has been published, its metadata is exported in a variety of formats. A button on the dataset page's metadata tab will allow a user to export the metadata of the most recently published version of the dataset. Currently supported export formats are:

- Dublin Core
- DDI (Data Documentation Initiative)
- DataCite 4
- JSON (native Dataverse format)
- OAI_ORE
- OpenAIRE
- Schema.org JSON-LD

Adding a New Dataset
====================
@@ -510,4 +523,4 @@ If you deaccession the most recently published version of the dataset but not al
.. |file-upload-prov-window| image:: ./img/prov1.png
:class: img-responsive
.. |image-file-tree-view| image:: ./img/file-tree-view.png
:class: img-responsive
:class: img-responsive
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/user/find-use-data.rst
@@ -121,7 +121,7 @@ rsync is typically used for synchronizing files and directories between two diff

rsync-enabled Dataverse installations offer a new file download process that differs from traditional browser-based downloading. Instead of multiple files, each dataset uploaded via rsync contains a single "Dataverse Package". When you download this package you will receive a folder that contains all files from the dataset, arranged in the exact folder structure in which they were originally uploaded.

In a dataset containing a Dataverse Package, at the bottom of the dataset page, under the **Data Access** tab, instead of a download button you will find the information you need in order to download the Dataverse Package using rsync. If the data is locally available to you (on a shared drive, for example) then you can find it at the folder path under **Local Access**. Otherwise, to download the Dataverse Package you will have to use one of the rsync commands under **Download Access**. There may be multiple commands listed, each corresponding to a different mirror that hosts the Dataverse Package. Go outside your browser and open a terminal (AKA command line) window on your computer. Use the terminal to run the command that corresponds with the mirror of your choice. It's usually best to choose the mirror that is geographically closest to you. Running this command will initiate the download process.
In a dataset containing a Dataverse Package, the information you need to download and/or access the data appears in two places: on the **dataset page** under the **Files** tab, and on the **file page** under the **Data Access** tab. If the data is locally available to you (on a shared drive, for example) you will find the folder path to access the data locally. To download, use one of the rsync commands provided. There may be multiple commands, each corresponding to a different mirror that hosts the Dataverse Package. Go outside your browser and open a terminal (AKA command line) window on your computer. Use the terminal to run the command that corresponds with the mirror of your choice. It's usually best to choose the mirror that is geographically closest to you. Running this command will initiate the download process.

After you've downloaded the Dataverse Package, you may want to double-check that your download went perfectly. Under **Verify Data**, you'll find a command that you can run in your terminal that will initiate a checksum to ensure that the data you downloaded matches the data in Dataverse precisely. This way, you can ensure the integrity of the data you're working with.
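
A hypothetical session (the mirror URL and manifest file name are illustrative, not the exact strings Dataverse displays)::

    rsync -av rsync://mirror.example.edu/dataverse/10.5072/FK2/EXAMPLE .
    cd EXAMPLE && shasum -c manifest.sha   # verify each downloaded file against its checksum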

4 changes: 3 additions & 1 deletion doc/sphinx-guides/source/versions.rst
@@ -6,8 +6,10 @@ Dataverse Guides Versions

This list provides a way to refer to previous versions of the Dataverse guides, which we still host. In order to learn more about the updates delivered from one version to another, visit the `Releases <https://github.com/IQSS/dataverse/releases>`__ page in our GitHub repo.

- 4.13
- 4.14


- `4.13 </en/4.13/>`__
- `4.12 </en/4.12/>`__
- `4.11 </en/4.11/>`__
- `4.10.1 </en/4.10/>`__
9 changes: 8 additions & 1 deletion pom.xml
@@ -7,7 +7,7 @@
-->
<groupId>edu.harvard.iq</groupId>
<artifactId>dataverse</artifactId>
<version>4.13</version>
<version>4.14</version>
<packaging>war</packaging>
<name>dataverse</name>
<properties>
@@ -599,6 +599,12 @@
<artifactId>tika-parsers</artifactId>
<version>1.19</version>
</dependency>
<!-- Named Entity Recognition -->
<dependency>
<groupId>org.apache.opennlp</groupId>
<artifactId>opennlp-tools</artifactId>
<version>1.9.1</version>
</dependency>
</dependencies>
<build>
<!-- <testResources>
@@ -632,6 +638,7 @@
<includes>
<include>**/*.sql</include>
<include>**/*.xml</include>
<include>**/firstNames/*.*</include>
</includes>
</resource>
</resources>
4 changes: 3 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/Dataset.java
@@ -42,8 +42,10 @@
query = "SELECT d FROM Dataset d WHERE d.identifier=:identifier"),
@NamedQuery(name = "Dataset.findByIdentifierAuthorityProtocol",
query = "SELECT d FROM Dataset d WHERE d.identifier=:identifier AND d.protocol=:protocol AND d.authority=:authority"),
@NamedQuery(name = "Dataset.findIdByOwnerId",
@NamedQuery(name = "Dataset.findIdentifierByOwnerId",
query = "SELECT o.identifier FROM Dataset o WHERE o.owner.id=:ownerId"),
@NamedQuery(name = "Dataset.findIdByOwnerId",
query = "SELECT o.id FROM Dataset o WHERE o.owner.id=:ownerId"),
@NamedQuery(name = "Dataset.findByOwnerId",
query = "SELECT o FROM Dataset o WHERE o.owner.id=:ownerId"),
})
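
A hedged sketch of calling the renamed and added queries via JPA (the surrounding service bean and the injected ``EntityManager em`` are assumed):

    // Global persistent identifiers of all datasets under an owner:
    List<String> identifiers = em.createNamedQuery("Dataset.findIdentifierByOwnerId", String.class)
            .setParameter("ownerId", ownerId)
            .getResultList();

    // Database ids of all datasets under an owner:
    List<Long> ids = em.createNamedQuery("Dataset.findIdByOwnerId", Long.class)
            .setParameter("ownerId", ownerId)
            .getResultList();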
