
Commit

shortened one sentence in the csv doc (#3767)
landreev committed Aug 2, 2017
1 parent 310b86b commit 145608c
Showing 1 changed file with 1 addition and 1 deletion.
doc/sphinx-guides/source/user/tabulardataingest/csv.rst
@@ -74,7 +74,7 @@ In character strings, an empty value (a comma followed by another comma, or the
Any non-Latin characters are allowed in character string values, **as long as the encoding is UTF8**.


-**Note:** When the ingest recognizes a CSV columns as a numeric vector, or as a date/time value, this is information is reflected and saved in the database as the *data variable metadata*. To inspect that metadata, click on the *Download* button next to a tabular data file, and select *Variable Metadata*. This will export the variable records in the DDI XML format. (Alternatively, this metadata fragment can be downloaded via the Data Access API; for example: ``http://localhost:8080/api/access/datafile/<FILEID>/metadata/ddi``). The most immediate implication is in the calculation of the UNF signatures for the data vectors, as different normalization rules are applied to numeric, character and date/time values. (see the :doc:`/developers/unf/index` section for more information). If it is important to you that the UNF checksums of your data are accurately calculated, check that the numeric and date/time columns in your file were recognized as such (as ``type=numeric`` and ``type=character, category=date(time)``, respectively). If, for example, a column that was supposed to be numeric is recognized as a vector of character values (strings), double-check that the formatting of the values is consistent. Remember, a single value in the column that prevents it from being parsed as a number (for example, the letter O instead of 0, the numeric zero) will render the entire column a vector of character strings, and result in a different UNF. Fix any formatting errors you find, delete the file from the dataset, and try to ingest it again.
+**Note:** When the ingest recognizes a CSV columns as a numeric vector, or as a date/time value, this is information is reflected and saved in the database as the *data variable metadata*. To inspect that metadata, click on the *Download* button next to a tabular data file, and select *Variable Metadata*. This will export the variable records in the DDI XML format. (Alternatively, this metadata fragment can be downloaded via the Data Access API; for example: ``http://localhost:8080/api/access/datafile/<FILEID>/metadata/ddi``). The most immediate implication is in the calculation of the UNF signatures for the data vectors, as different normalization rules are applied to numeric, character and date/time values. (see the :doc:`/developers/unf/index` section for more information). If it is important to you that the UNF checksums of your data are accurately calculated, check that the numeric and date/time columns in your file were recognized as such (as ``type=numeric`` and ``type=character, category=date(time)``, respectively). If, for example, a column that was supposed to be numeric is recognized as a vector of character values (strings), double-check that the formatting of the values is consistent. Remember, a single improperly-formatted value in the column will turn it into a vector of character strings, and result in a different UNF. Fix any formatting errors you find, delete the file from the dataset, and try to ingest it again.


Tab-delimited Data Files:
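The behavior described by the changed sentence (one badly formatted value turns a whole column into character strings) can be illustrated with a simplified model of column type detection. This is a hypothetical Python sketch, not the actual ingest code: the function names `looks_numeric` and `infer_column_type` are invented for illustration, and ignoring empty values as missing is an assumption of this sketch.

```python
def looks_numeric(value: str) -> bool:
    """Return True if the string parses as a number (illustrative rule only)."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_column_type(values: list[str]) -> str:
    """Classify a CSV column as 'numeric' or 'character'.

    Hypothetical, simplified model: empty values are assumed to be missing
    and are skipped; every remaining value must parse as a number for the
    column to count as numeric. A single failure makes it 'character'.
    """
    non_empty = [v for v in values if v != ""]
    if non_empty and all(looks_numeric(v) for v in non_empty):
        return "numeric"
    return "character"


if __name__ == "__main__":
    print(infer_column_type(["10", "20", "30"]))  # numeric
    # The letter O typed instead of the digit 0 flips the whole column:
    print(infer_column_type(["10", "2O", "30"]))  # character
```

Because the column type decides which UNF normalization rules apply, the second column in this sketch would get a different UNF than the first, even though only one character differs.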