-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #4 from poseidon-framework/discoverPacs
Better .janno discovery options
- Loading branch information
Showing
146 changed files
with
788 additions
and
170 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,8 @@ | ||
- V 1.0.0.0: | ||
- Added more data input options for .janno files (d(), da(), j(), .janno). | ||
- Reorganized golden tests and added new ones. | ||
- Switched to the PVP versioning scheme. | ||
- Switched to a new ghc and stackage resolver version. | ||
- Added a source_file column to the tables upon reading into the SQLite database. | ||
- V 1.0.1: Structural refactoring: To get a proper test coverage report it was necessary to split library, executable and test code. No user-facing changes in qjanno. | ||
- V 1.0.0: Initial version: Fork of qsh with adjustments to directly incorporate .janno files. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
### V 1.0.0.0 | ||
|
||
This release marks the switch to [Haskell's Package Versioning Policy](https://pvp.haskell.org/). Under the hood we also switched to a new GHC version (9.4.7) and a new Stackage resolver version (21.17). | ||
|
||
Feature-wise in this version we changed the semantics of the `d()` pseudo-function to load `.janno` files, introduced a set of additional mechanisms to load specific `.janno` files more conveniently and finally added automatically generated columns (`package_title`, `package_version` and `source_file`) for the SQL tables to distinguish source files in derived queries. We also implemented sorting of the `.janno` columns according to the suggested order in the Poseidon schema. | ||
|
||
#### New and modified pseudo-functions to crawl `.janno` files | ||
|
||
In previous versions `qjanno` included a single method to specify `.janno` files for loading and merging in the `FROM` instruction of the SQL query: The `d(<path_to_directory1>,<path_to_directory2>,...)` pseudo-function. When used in a query, `qjanno` crawled all directories for files with the extension `.janno`, to read them, row-bind them and load them as a table into the SQLite database for querying. This specific functionality is now accessible with a new pseudo-function `j()`. Beyond that, various additional methods are available now for searching and selecting `.janno` files. | ||
|
||
Here is how the updated `FROM` field gets parsed and interpreted: | ||
|
||
- `d(<path_to_directory1>,<path_to_directory2>,...)`: With `d()`, qjanno (recursively) searches all package-defining `POSEIDON.yml` files in all listed directories and reads them to determine the latest package version. It then reads the `.janno` files associated with these latest package versions. | ||
- `da(<path_to_directory1>,<path_to_directory2>,...)`: `da()` behaves just as `d()`, but it does not filter for the latest package version: It loads all packaged `.janno` files. | ||
- `j(<path_to_directory1>,<path_to_directory2>,...)`: `j()` simply searches for files with the extension `.janno` in all listed directories and loads them regardless of whether they are part of a Poseidon package or not. | ||
- `<path_to_one_janno_file>.janno`: Specific `.janno` files can be listed individually. | ||
|
||
Multiple of these methods can be combined in the `FROM` field as a comma-separated list. Each respective mechanism then yields a list of `.janno` files, and the list of lists is flattened to a simple list of `.janno` files. `qjanno` then reads all `.janno` files in this combined list, merges them and makes them available for querying in the in-memory SQLite database. | ||
|
||
This means the following syntax is now valid: | ||
|
||
```bash | ||
qjanno "SELECT Poseidon_ID,Country FROM d(2018_Lamnidis_Fennoscandia,2012_MeyerScience),2010_RasmussenNature/2010_RasmussenNature.janno" | ||
``` | ||
|
||
This loads the `.janno` files in `2018_Lamnidis_Fennoscandia` and `2012_MeyerScience`, and the additional file `2010_RasmussenNature/2010_RasmussenNature.janno`. | ||
|
||
#### Additional columns to distinguish source files | ||
|
||
From this version onwards `qjanno` prepends information about the source of a given observation in the form of three additional columns `package_title`, `package_version` and `source_file` to each SQL table row. | ||
|
||
`package_title`: The title of the source package of a given `.janno` row. | ||
`package_version`: The package version of the source package. | ||
`source_file`: The relative path to the source file. | ||
|
||
The former two can only be added for `.janno` files loaded via the `d()` and `da()` mechanisms, because only they have the necessary information about the source package of a given `.janno` file. | ||
|
||
`source_file` works for all files, including `.janno` files loaded directly or via `d()`, `da()` or `j()`. | ||
|
||
#### Sorting of the `.janno` table columns | ||
|
||
In the process of reading `.janno` files, `qjanno` now not only row-binds them, but also orders their columns according to the specification in the Poseidon schema [here](https://github.com/poseidon-framework/poseidon-schema/blob/master/janno_columns.tsv). | ||
|
||
The just introduced source columns `package_title`, `package_version` and `source_file` are kept at the beginning in this order. Additional columns not specified in the Poseidon schema are appended at the end in alphabetical order. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
packages: ./*.cabal | ||
with-compiler: ghc-9.2.7 | ||
with-compiler: ghc-9.4.7 | ||
allow-newer: table-layout:base |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.