This repository has been archived by the owner on Apr 5, 2021. It is now read-only.

Commit

Cleaning up some documentation.
plbrownrti committed Nov 6, 2019
1 parent 017782b commit ed6c920
Showing 14 changed files with 62 additions and 293 deletions.
48 changes: 27 additions & 21 deletions API.md
@@ -21,21 +21,19 @@ Each query is expressed as a URL, containing:
* The **API Version String**. Currently the only supported version string is: `v1`
* The **Endpoint** representing a particular dataset, e.g. `schools`. Endpoint
names are usually plural.
* The **Format** for the result data. The default output format is JSON ([JavaScript Object Notation](http://json.org/)); CSV is
also available.

* The **Query String** containing a set of named key-value pairs that
represent the query, which include
* **Field Parameters**, specifying a value (or set of values) to match
against a particular field, and
* **Option Parameters**, which affect the filtering and output of the
entire query. Option Parameter names are prefixed with an underscore (`_`).


### Query Example

Here's an example query URL:

```
-https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=2,3&_fields=id,school.name,2013.student.size
+https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=2,3&fields=id,school.name,2013.student.size
```

In this query URL:
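As an illustration, a query URL like the one above can be assembled with Python's standard library (this sketch omits the `api_key` parameter that api.data.gov normally requires, just as the example does):

```python
from urllib.parse import urlencode

# Base pieces of the query URL: host, API version string, endpoint, and format.
base = "https://api.data.gov/ed/collegescorecard/v1/schools.json"

params = {
    # Field parameter: match records whose predominant degree is 2 OR 3.
    "school.degrees_awarded.predominant": "2,3",
    # Option parameter: limit the fields included in the result records.
    "fields": "id,school.name,2013.student.size",
}

# safe="," keeps the comma-separated value lists readable, as in the example URL.
url = base + "?" + urlencode(params, safe=",")
print(url)
```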
@@ -134,7 +132,7 @@ When failing to execute a query, Open Data Maker will attempt to return a JSON e

## Field Parameters

-Parameter names _without_ an underscore prefix are assumed to be field names in the dataset. Supplying a value to a field parameter acts as a query filter, and only returns records where the given field exactly matches the given value.
+Parameter names are assumed to be field names in the dataset. Supplying a value to a field parameter acts as a query filter, and only returns records where the given field exactly matches the given value.

For example: Use the parameter `school.region_id=6` to only fetch records with a `school.region_id` value of `6`.

@@ -176,7 +174,6 @@ For example: `2013.student.size__range=100..500` matches on schools which had be

Open-ended ranges can be performed by omitting one side of the range. For example: `2013.student.size__range=1000..` matches on schools which had over 1000 students.

-You can even supply a list of ranges, separated by commas. For example, For example: `2013.student.size__range=..100,1000..2000,5000..` matches on schools which had under 100 students, between 1000 and 2000 students, or over 5000 students.
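For illustration only, a small helper (hypothetical, not part of the API) can build these range expressions; `None` here stands for an open end:

```python
def range_expr(*bounds):
    """Build a `__range` value from (min, max) pairs; None means open-ended."""
    parts = []
    for low, high in bounds:
        low_s = "" if low is None else str(low)
        high_s = "" if high is None else str(high)
        parts.append(f"{low_s}..{high_s}")
    return ",".join(parts)

# Under 100 students, between 1000 and 2000, or over 5000:
print("2013.student.size__range=" + range_expr((None, 100), (1000, 2000), (5000, None)))
# → 2013.student.size__range=..100,1000..2000,5000..
```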

#### Additional Notes on Ranges

@@ -186,39 +183,48 @@ You can even supply a list of ranges, separated by commas. For example, For exam

## Option Parameters

-You can perform extra refinement and organisation of search results using **option parameters**. These special parameters have names beginning with an underscore character (`_`).
+You can perform extra refinement and organisation of search results using **option parameters**. These special parameters are listed below.

-### Limiting Returned Fields with `_fields`
+### Limiting Returned Fields with `fields`

-By default, records returned in the query response include all their stored fields. However, you can limit the fields returned with the `_fields` option parameter. This parameter takes a comma-separated list of field names. For example: `_fields=id,school.name,school.state` will return result records that only contain those three fields.
+By default, records returned in the query response include all their stored fields. However, you can limit the fields returned with the `fields` option parameter. This parameter takes a comma-separated list of field names. For example: `fields=id,school.name,school.state` will return result records that only contain those three fields.

Requesting specific fields in the response will significantly improve performance and reduce JSON traffic, and is recommended.

-### Pagination with `_page` and `_per_page`
+### Pagination with `page` and `per_page`

-By default, results are returned in pages of 20 records at a time. To retrieve pages after the first, set the `_page` option parameter to the number of the page you wish to retrieve. Page numbers start at zero; so, to return records 21 through 40, use `_page=1`. Remember that the total number of records available for a given query is given in the `total` field of the top-level `metadata` object.
+By default, results are returned in pages of 20 records at a time. To retrieve pages after the first, set the `page` option parameter to the number of the page you wish to retrieve. Page numbers start at zero; so, to return records 21 through 40, use `page=1`. Remember that the total number of records available for a given query is given in the `total` field of the top-level `metadata` object.

-You can also change the number of records returned per page using the `_per_page` option parameter, up to a maximum of 100 records. Bear in mind, however, that large result pages will increase the amount of JSON returned and reduce the performance of the API.
+You can also change the number of records returned per page using the `per_page` option parameter, up to a maximum of 100 records. Bear in mind, however, that large result pages will increase the amount of JSON returned and reduce the performance of the API.
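The paging arithmetic is simple; as a sketch (a hypothetical helper, zero-based like the API's page numbering):

```python
def page_for_record(record_index, per_page=20):
    """Return the zero-based page number that contains the given zero-based record."""
    return record_index // per_page

# With the default page size of 20, records 21 through 40
# (zero-based indices 20..39) are on page 1.
print(page_for_record(20), page_for_record(39))  # 1 1
```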

-### Sorting with `_sort`
+### Sorting with `sort`

-To sort results by a given field, use the `_sort` option parameter. For example, `_sort=2015.student.size` will return records sorted by 2015 student size, in ascending order.
+To sort results by a given field, use the `sort` option parameter. For example, `sort=2015.student.size` will return records sorted by 2015 student size, in ascending order.

-By default, using the `_sort_` option returns records sorted into ascending order, but you can specify ascending or descending order by appending `:asc` or `:desc` to the field name. For example: `_sort=2015.student.size:desc`
+By default, using the `sort` option returns records sorted into ascending order, but you can specify ascending or descending order by appending `:asc` or `:desc` to the field name. For example: `sort=2015.student.size:desc`

-**Note:** Sorting is only availble on fields with the data type `integer`, `float`, `autocomplete` or `name`.
+**Note:** Sorting is only available on fields with the data type `integer`, `float`, `autocomplete` or `name`.

**Note:** Make sure the sort parameter is a field in the data set. For more information, please take a look at the [data dictionary](https://collegescorecard.ed.gov/assets/CollegeScorecardDataDictionary.xlsx).

-### Geographic Filtering with `_zip` and `_distance`
+### Geographic Filtering with `zip` and `distance`

When the dataset includes a `location` at the root level (`location.lat` and
-`location.lon`) then the documents will be indexed geographically. You can use the `_zip` and `_distance` options to narrow query results down to those within a geographic area. For example, `_zip=12345&_distance=10mi` will return only those results within 10 miles of the center of the given zip code.
+`location.lon`) then the documents will be indexed geographically. You can use the `zip` and `distance` options to narrow query results down to those within a geographic area. For example, `zip=12345&distance=10mi` will return only those results within 10 miles of the center of the given zip code.

-Additionally, you can request `location.lat` and `location.lon` in a search that includes a `_fields` filter and it will return the record(s) with respective lat and/or lon coordinates.
+Additionally, you can request `location.lat` and `location.lon` in a search that includes a `fields` filter and it will return the record(s) with respective lat and/or lon coordinates.

#### Additional Notes on Geographic Filtering

-* By default, any number passed in the `_distance` parameter is treated as a number of miles, but you can specify miles or kilometers by appending `mi` or `km` respectively.
+* By default, any number passed in the `distance` parameter is treated as a number of miles, but you can specify miles or kilometers by appending `mi` or `km` respectively.
* Distances are calculated from the center of the given zip code, not the boundary.
* Only U.S. zip codes are supported.


# New for Version 1.7

With the inclusion of the Department of Education's Field of Study data, a number of improvements have been incorporated into Open Data Maker.

* The field of study data is included as an array of objects nested under a specified key. These objects may be queried just like any other data. However, there are additional parameters you can add to your API call to manage what is returned. By default, if you specify a search parameter, only the objects in the array that match that parameter will be returned. You can pass `&all_programs_nested=true` to return all the items in the array instead of just those that match.
* When requesting specific fields from the API, the default response uses a dotted string for the path to each field. As of version 1.7, you can pass the parameter `keys_nested=true` to get back a true JSON object instead of the dotted string.
* Lastly, wildcard fields are now possible with version 1.7. If you want just the latest available data, you can now specify a query such as `fields=id,school,latest`, which will return the `id` field, the `school` object, and the `latest` object, including all the nested objects contained within each.
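The `keys_nested=true` behaviour described above amounts to un-flattening dotted keys into nested objects. A minimal sketch of that transformation (an illustration, not the server's actual implementation):

```python
def nest_keys(flat):
    """Convert dotted keys like 'school.name' into nested dictionaries."""
    nested = {}
    for dotted, value in flat.items():
        node = nested
        *parents, leaf = dotted.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

print(nest_keys({"id": 1, "school.name": "A", "school.state": "NY"}))
# → {'id': 1, 'school': {'name': 'A', 'state': 'NY'}}
```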
46 changes: 5 additions & 41 deletions CONTRIBUTING.md
@@ -1,7 +1,7 @@
## Contributing

We aspire to create a welcoming environment for collaboration on this project.
To that end, we follow the [18F Code of Conduct](https://github.com/18F/code-of-conduct/blob/master/code-of-conduct.md) and ask that all contributors do the same.


### Public domain

@@ -15,11 +15,7 @@ with this waiver of copyright interest.

## Communication

-There are a few ways to communicate with other folks working on this project:
-
-* For general questions, discussion and announcements, please join [Google Group]
-* For noisy, informal chatter, you can join us on the [open-data-maker-pub Slack Channel](https://chat.18f.gov). Notifications from github are posted here.
-* For bug reports, please [file an issue](https://github.com/18F/open-data-maker/issues).
+For bug reports, please [file an issue](https://github.com/18F/open-data-maker/issues).

## About the Tech

@@ -46,7 +42,7 @@ This project follows the [git flow](http://nvie.com/posts/a-successful-git-branc
for review by our design and product folks, then to master.

This project is in alpha, so things are fast moving! We hope you consider it
-a fun time to get involved. In the near term, we have a very specific focus for this app, but we expect it will be generally useful for other projects as well. If you are thinking about deploying this app at your agency or organization, please let us know by introducing yourself in the [Google Group] and telling us a bit about your project or idea.
+a fun time to get involved. In the near term, we have a very specific focus for this app, but we expect it will be generally useful for other projects as well.

### Testing

@@ -98,7 +94,7 @@ chances of your issue being dealt with quickly:
### Submitting a Pull Request
Before you submit your pull request consider the following guidelines:

-* Search [GitHub](https://github.com/18F/open-data-maker/pulls) for an open or closed Pull Request that relates to your submission. You don't want to duplicate effort.
+* Search [GitHub](https://github.com/RTICWDT/open-data-maker/pulls) for an open or closed Pull Request that relates to your submission. You don't want to duplicate effort.
* Make your changes in a new git branch

```shell
@@ -137,37 +133,7 @@ That's it! Thank you for your contribution!
#### After your pull request is merged
After your pull request is merged, you can safely delete your branch and pull the changes from the main (upstream) repository:
* Check out the dev branch:
```shell
git checkout dev -f
```
* Delete the local branch:
```shell
git branch -D dev-my-fix
```
* Update with the latest upstream version:
```shell
git pull --ff upstream dev
```
Note: this assumes that you have already added the `upstream` remote repository, using this command:
```shell
git remote add upstream https://github.com/18F/open-data-maker.git
```
* For folks with write access to the repo: delete the remote branch on GitHub either through the GitHub web UI or your local shell as follows:
```shell
git push origin --dev-my-fix
```
After your pull request is merged, you can safely delete your branch and pull the changes from the main (upstream) repository
### Reviewing Pull Requests
@@ -183,5 +149,3 @@ someone has looked at it. For larger commits, we like to have a +1 from someone
else on the core team and/or from other contributor(s). Please note if you
reviewed the code or tested locally -- a +1 by itself will typically be
interpreted as your thinking it's a good idea, but not having reviewed in detail.
-[Google Group]: https://groups.google.com/d/forum/open-data-maker
19 changes: 19 additions & 0 deletions DICTIONARY.md
@@ -1,3 +1,22 @@
# Data

Details about the data are specified by `DATA_PATH/data.yaml`, where `DATA_PATH` is an environment variable that may be:

* `s3://username:password@bucket_name/path`
* `s3://bucket_name/path`
* `s3://bucket_name`
* a local path like: `./data`


This file is loaded the first time it is needed and then stored in memory. The contents of `data.yaml` are stored as JSON in Elasticsearch in a single document of type `config` with id `1`.

The version field of this document is checked at startup. If the new config has a new version, then we delete the whole index and re-index all of the files referred to in the `files` section of `data.yaml`.

If no `data.yml` or `data.yaml` file is found, then all CSV files in `DATA_PATH` will be loaded, and all fields in their headers will be used.
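The CSV fallback described above can be sketched as follows (an illustration of the behaviour, not the project's actual code):

```python
import csv
from pathlib import Path

def discover_csvs(data_path):
    """Find every CSV under a local DATA_PATH and read its header row as the field list."""
    files = {}
    for path in sorted(Path(data_path).glob("*.csv")):
        with open(path, newline="") as f:
            files[path.name] = next(csv.reader(f))  # header row = imported fields
    return files
```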

For an example data file, visit https://collegescorecard.ed.gov/data/ and download the full data package. A `data.yaml` file will be included in the ZIP download.

# Dictionary Format

The data dictionary format may be (optionally) specified in the `data.yaml` file. If unspecified, all columns are imported as strings.
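As a rough illustration of the idea (the field and column names here are invented; consult the sample `data.yaml` for the real schema), a dictionary entry might map a source CSV column to a typed field:

```yaml
dictionary:
  id:
    source: UNITID
    type: integer
  school.name:
    source: INSTNM
    type: name
```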
38 changes: 11 additions & 27 deletions INSTALL.md
@@ -19,22 +19,14 @@ To run Open Data Maker, you will need to have the following software installed o
* [Elasticsearch] 1.7.3
* [Ruby] 2.2.2

-**NOTE: Open Data Maker does not currently work with Elasticsearch versions 2.x and above.**
-You can follow or assist our progress towards 2.x compatibility [at this GitHub issue](https://github.com/18F/open-data-maker/issues/248).
+**NOTE:** Open Data Maker indexing is currently very slow on Elasticsearch 2.x; however, an index created on 1.x can be restored to 2.x.

### Mac OS X

-On a Mac, we recommend installing Ruby 2.2.2 via [RVM], and Elasticsearch 1.7.3 via
-[Homebrew]. If you don't want to use the bootstrap script above, you can install
-elasticsearch 1.7 with brew using the following command:
-
-```
-brew install elasticsearch17
-```
+On a Mac, we recommend installing [RVM].

If you are contributing to development, you will also need [Git].
If you don't already have these tools, the 18F [laptop] script will install
them for you.


## Get the Source Code

@@ -48,14 +40,6 @@ cd open-data-maker

## Run the App

-### Make sure Elasticsearch is up and running
-If you just ran `script/bootstrap`, then Elasticsearch should already be
-running. But if you stopped it or restarted your computer, you'll need to
-start it back up. Assuming you installed Elasticsearch via our `bootstrap`
-script, you can restart it with this command:
-
-```brew services restart elasticsearch```


### Import the data

@@ -116,24 +100,24 @@ rake es:delete[_all]
The data directory can optionally include a file called `data.yaml` (see [the sample one](sample-data/data.yaml) for its schema) that references one or more `.csv` files and specifies data types,
field name mapping, and other support data.

-## Experimental web UI for indexing
-
-Optionally, you can enable indexing from webapp, but this option is still experimental:
-* `export INDEX_APP=enable`
-* in your browser, go to /index/reindex
## Debugging

Setting the `ES_DEBUG` environment variable will turn on the verbose tracer in the Elasticsearch client.

Optional performance profiling for `rake import`: `rake import[profile=true]`

The old index (if present) will be deleted and re-created from the source files at `DATA_PATH`.

## Want to help?

See [Contribution Guide](CONTRIBUTING.md)

Read additional [implementation notes](NOTES.md)

[Elasticsearch]: https://www.elastic.co/products/elasticsearch
[Homebrew]: http://brew.sh/
[RVM]: https://github.com/wayneeseguin/rvm
[rbenv]: https://github.com/sstephenson/rbenv
[Ruby]: https://www.ruby-lang.org/en/
[Git]: https://git-scm.com/
[laptop]: https://github.com/18F/laptop



23 changes: 0 additions & 23 deletions NOTES.md

This file was deleted.

15 changes: 0 additions & 15 deletions README.md
@@ -84,21 +84,6 @@ options:
```



-## Help Wanted
-
-1. Try out importing multiple data sets with different endpoints and data.yaml configuration
-2. Take a look at our [open issues](https://github.com/18F/open-data-maker/issues) and our [Contribution Guide](CONTRIBUTING.md)
-
-## More Info
-
-Here's how it might look in the future:
-
-![overview of data types, prompt to download data, create a custom data set, or look at API docs](/doc/data-overview.png)
-
-
-![Download all the data or make choices to create a csv with a subset](/doc/csv-download.png)

### Acknowledgements
Zipcode latitude and longitude provided by [GeoNames](http://www.geonames.org/) under a [Creative Commons Attribution 3.0 License](http://creativecommons.org/licenses/by/3.0/).

Binary file removed doc/csv-download.png
Binary file not shown.
Binary file removed doc/data-overview.png
Binary file not shown.
12 changes: 0 additions & 12 deletions manifest-dev.yml

This file was deleted.

14 changes: 0 additions & 14 deletions manifest-ex.yml

This file was deleted.

14 changes: 0 additions & 14 deletions manifest-indexing.yml

This file was deleted.

12 changes: 0 additions & 12 deletions manifest-production.yml

This file was deleted.

