Merge pull request #4 from poseidon-framework/discoverPacs
Better .janno discovery options
nevrome authored Nov 1, 2023
2 parents 4041ec0 + 359ccef commit 34184bb
Showing 146 changed files with 788 additions and 170 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -14,7 +14,7 @@ jobs:
     strategy:
       matrix:
         stack: ["latest"]
-        ghc: ["9.2.7"]
+        ghc: ["9.4.7"]

     steps:
     # setup and loading cache
16 changes: 5 additions & 11 deletions .github/workflows/release.yml
@@ -11,7 +11,7 @@ on:
 jobs:
   create_release:
     name: Create Github Release
-    runs-on: ubuntu-22.04
+    runs-on: ubuntu-latest
     steps:
     - name: Check out code
       uses: actions/checkout@v4
@@ -29,9 +29,8 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        os: [ubuntu-20.04]
-        cabal: ["3.6"]
-        ghc: ["9.2.7"]
+        os: [ubuntu-20.04] # old version is on purpose: to compile with old libc
+        ghc: ["9.4.7"]

     steps:
     - name: Check out code
@@ -49,7 +48,6 @@ jobs:
       id: setup-haskell-cabal
       with:
         ghc-version: ${{ matrix.ghc }}
-        cabal-version: ${{ matrix.cabal }}

     - name: Freeze
       run: cabal freeze
@@ -90,8 +88,7 @@ jobs:
     strategy:
       matrix:
         os: [macOS-latest]
-        cabal: ["3.6"]
-        ghc: ["9.2.7"]
+        ghc: ["9.4.7"]

     steps:
     - name: Check out code
@@ -112,7 +109,6 @@ jobs:
       id: setup-haskell-cabal
       with:
         ghc-version: ${{ matrix.ghc }}
-        cabal-version: ${{ matrix.cabal }}

     - name: Freeze
       run: |
@@ -155,8 +151,7 @@ jobs:
     strategy:
       matrix:
         os: [windows-latest]
-        cabal: ["3.6"]
-        ghc: ["9.2.7"]
+        ghc: ["9.4.7"]

     steps:
     - name: Check out code
@@ -174,7 +169,6 @@ jobs:
       id: setup-haskell-cabal
       with:
         ghc-version: ${{ matrix.ghc }}
-        cabal-version: ${{ matrix.cabal }}

     - name: Freeze
       run: |
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,2 +1,8 @@
+- V 1.0.0.0:
+    - Added more data input options for .janno files (d(), da(), j(), .janno).
+    - Reorganized golden tests and added new ones.
+    - Switched to the PVP versioning scheme.
+    - Switched to a new ghc and stackage resolver version.
+    - Added a source_file column to the tables upon reading into the SQLite database.
 - V 1.0.1: Structural refactoring: To get a proper test coverage report it was necessary to split library, executable and test code. No user-facing changes in qjanno.
 - V 1.0.0: Initial version: Fork of qsh with adjustments to directly incorporate .janno files.
44 changes: 44 additions & 0 deletions CHANGELOGRELEASE.md
@@ -0,0 +1,44 @@
### V 1.0.0.0

This release marks the switch to [Haskell's Package Versioning Policy](https://pvp.haskell.org/). Under the hood we also switched to a new GHC version (9.4.7) and a new Stackage resolver version (21.17).

Feature-wise, this version changes the semantics of the `d()` pseudo-function for loading `.janno` files, introduces a set of additional mechanisms to load specific `.janno` files more conveniently, and adds automatically generated columns (`package_title`, `package_version` and `source_file`) to the SQL tables to distinguish source files in derived queries. It also sorts the `.janno` columns according to the order suggested in the Poseidon schema.

#### New and modified pseudo-functions to crawl `.janno` files

In previous versions, `qjanno` offered a single method to specify `.janno` files for loading and merging in the `FROM` field of the SQL query: the `d(<path_to_directory1>,<path_to_directory2>,...)` pseudo-function. When it was used in a query, `qjanno` crawled all listed directories for files with the extension `.janno`, read them, row-bound them, and loaded them as one table into the SQLite database for querying. This specific functionality is now accessible through the new pseudo-function `j()`. Beyond that, various additional methods are now available for searching and selecting `.janno` files.

Here is how the updated `FROM` field gets parsed and interpreted:

- `d(<path_to_directory1>,<path_to_directory2>,...)`: With `d()`, `qjanno` (recursively) searches all listed directories for package-defining `POSEIDON.yml` files and reads them to determine the latest version of each package. It then reads the `.janno` files associated with these latest package versions.
- `da(<path_to_directory1>,<path_to_directory2>,...)`: `da()` behaves just like `d()`, but it does not filter for the latest package version: it loads all packaged `.janno` files.
- `j(<path_to_directory1>,<path_to_directory2>,...)`: `j()` simply searches for files with the extension `.janno` in all listed directories and loads them regardless of whether they are part of a Poseidon package or not.
- `<path_to_one_janno_file>.janno`: Specific `.janno` files can be listed individually.
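
The difference between `d()` and `da()` can be illustrated with a minimal sketch. It is not the code used in `qjanno`: the `Package` record, its field names and the example paths are assumptions made for illustration, and only the idea (keep the newest version per package title, or keep everything) comes from the description above.

```haskell
import Data.Function (on)
import Data.List     (groupBy, maximumBy, sortOn)
import Data.Ord      (comparing)
import Data.Version  (Version, makeVersion)

-- Hypothetical stand-in for what is read from a POSEIDON.yml file.
data Package = Package
  { pacTitle   :: String
  , pacVersion :: Maybe Version  -- assumed optional; Nothing sorts before any Just
  , pacJanno   :: FilePath
  } deriving (Show, Eq)

-- d()-like selection: one package per title, the one with the highest version.
latestOnly :: [Package] -> [Package]
latestOnly =
    map (maximumBy (comparing pacVersion))
  . groupBy ((==) `on` pacTitle)
  . sortOn pacTitle

-- da()-like selection: no filtering at all.
allVersions :: [Package] -> [Package]
allVersions = id

-- Invented example data; the directory layout is an assumption.
examplePackages :: [Package]
examplePackages =
  [ Package "2012_MeyerScience" (Just (makeVersion [1,0,0])) "2012_MeyerScience-1.0.0/2012_MeyerScience.janno"
  , Package "2012_MeyerScience" (Just (makeVersion [1,2,0])) "2012_MeyerScience-1.2.0/2012_MeyerScience.janno"
  , Package "2018_Lamnidis_Fennoscandia" (Just (makeVersion [2,0,1])) "2018_Lamnidis_Fennoscandia/2018_Lamnidis_Fennoscandia.janno"
  ]
```

With this example data, `latestOnly` keeps two packages (dropping the 1.0.0 copy of `2012_MeyerScience`), while `allVersions` keeps all three.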

Several of these methods can be combined in the `FROM` field as a comma-separated list. Each mechanism then yields a list of `.janno` files, and these lists are concatenated into one flat list. `qjanno` then reads all `.janno` files in this combined list, merges them and makes them available for querying in the in-memory SQLite database.

This means the following syntax is now valid:

```bash
qjanno "SELECT Poseidon_ID,Country FROM d(2018_Lamnidis_Fennoscandia,2012_MeyerScience),2010_RasmussenNature/2010_RasmussenNature.janno"
```

This loads the `.janno` files in `2018_Lamnidis_Fennoscandia` and `2012_MeyerScience`, and the additional file `2010_RasmussenNature/2010_RasmussenNature.janno`.
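
How such a combined `FROM` entry could be split into its individual sources is sketched below. This is a simplified illustration only: the commit itself adds a `parsec` dependency and calls `Parser.readFROM`, whose implementation is not shown here, so the `JannoSource` type, `splitTopLevel`, `classify` and `parseFROM` are hypothetical names. The sketch also ignores whitespace, quoting, standard input and non-`.janno` files.

```haskell
import Data.List (isSuffixOf, stripPrefix)

-- Hypothetical representation of one entry in the FROM field.
data JannoSource
  = LatestPackages [FilePath]  -- d(...)
  | AllPackages    [FilePath]  -- da(...)
  | JannoDirs      [FilePath]  -- j(...)
  | JannoFile      FilePath    -- a path ending in .janno
  deriving (Show, Eq)

-- Split on commas that are not enclosed in parentheses.
splitTopLevel :: String -> [String]
splitTopLevel = go (0 :: Int) ""
  where
    go _ acc [] = [reverse acc]
    go depth acc (c:cs)
      | c == ',' && depth == 0 = reverse acc : go depth "" cs
      | c == '('               = go (depth + 1) (c:acc) cs
      | c == ')'               = go (depth - 1) (c:acc) cs
      | otherwise              = go depth (c:acc) cs

-- Decide which mechanism a single entry refers to.
classify :: String -> Either String JannoSource
classify s
  | Just dirs <- call "da(" = Right (AllPackages dirs)
  | Just dirs <- call "d("  = Right (LatestPackages dirs)
  | Just dirs <- call "j("  = Right (JannoDirs dirs)
  | ".janno" `isSuffixOf` s = Right (JannoFile s)
  | otherwise               = Left ("Not a .janno source: " ++ s)
  where
    -- Strip "name(" and the closing ")" and split the arguments.
    call prefix = do
      rest <- stripPrefix prefix s
      if ")" `isSuffixOf` rest
        then Just (splitTopLevel (init rest))
        else Nothing

-- Parse the whole FROM entry; the first invalid part aborts with an error message.
parseFROM :: String -> Either String [JannoSource]
parseFROM = mapM classify . splitTopLevel
```

Applied to the `FROM` string of the query above, `parseFROM` yields one `LatestPackages` entry with the two directories, followed by one `JannoFile` entry.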

#### Additional columns to distinguish source files

From this version onwards, `qjanno` prepends three additional columns, `package_title`, `package_version` and `source_file`, to each SQL table row to record the source of a given observation.

- `package_title`: The title of the source package of a given `.janno` row.
- `package_version`: The package version of the source package.
- `source_file`: The relative path to the source file.

The former two can only be added for `.janno` files loaded via the `d()` and `da()` mechanisms, because only they have the necessary information about the source package of a given `.janno` file.

`source_file` works for all files, including `.janno` files loaded directly or via `d()`, `da()` or `j()`.

#### Sorting of the `.janno` table columns

In the process of reading `.janno` files, `qjanno` now not only row-binds them, but also orders their columns according to the [column specification in the Poseidon schema](https://github.com/poseidon-framework/poseidon-schema/blob/master/janno_columns.tsv).

The newly introduced source columns `package_title`, `package_version` and `source_file` are kept at the beginning, in this order. Additional columns not specified in the Poseidon schema are appended at the end in alphabetical order.
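
The ordering rule can be sketched as follows. This is an illustration under assumptions, not the `Janno.reorderJannoColumns` function from the commit: `schemaColumns` is only a short illustrative subset in an assumed order (the authoritative list is `janno_columns.tsv`), and `orderColumns` and `reorderTable` are hypothetical names.

```haskell
import Data.List  (sort)
import Data.Maybe (fromMaybe)

-- Source columns, always kept first and in this order.
sourceColumns :: [String]
sourceColumns = ["package_title", "package_version", "source_file"]

-- Illustrative subset standing in for the schema's column order (assumption).
schemaColumns :: [String]
schemaColumns = ["Poseidon_ID", "Genetic_Sex", "Group_Name", "Country"]

-- Source columns first, then schema columns in schema order,
-- then all remaining columns in alphabetical order.
orderColumns :: [String] -> [String]
orderColumns cols = source ++ known ++ sort unknown
  where
    source  = filter (`elem` cols) sourceColumns
    known   = filter (`elem` cols) schemaColumns
    unknown = filter (`notElem` (sourceColumns ++ schemaColumns)) cols

-- Apply the new column order to the header and to every row.
reorderTable :: ([String], [[String]]) -> ([String], [[String]])
reorderTable (cols, rows) = (newCols, map reorderRow rows)
  where
    newCols = orderColumns cols
    -- Look each cell up by its column name; a missing cell becomes "".
    reorderRow row = [ fromMaybe "" (lookup c (zip cols row)) | c <- newCols ]
```

For example, a header of `Country`, `zeta`, `source_file`, `Poseidon_ID`, `alpha` (where `zeta` and `alpha` are invented non-schema columns) is reordered to `source_file`, `Poseidon_ID`, `Country`, `alpha`, `zeta`.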
2 changes: 1 addition & 1 deletion cabal.project
@@ -1,3 +1,3 @@
 packages: ./*.cabal
-with-compiler: ghc-9.2.7
+with-compiler: ghc-9.4.7
 allow-newer: table-layout:base
6 changes: 5 additions & 1 deletion qjanno-hs.cabal
@@ -1,5 +1,5 @@
 name: qjanno-hs
-version: 1.0.1
+version: 1.0.0.0
 author: itchyny <https://github.com/itchyny>, Clemens Schmid
 maintainer: Clemens Schmid <clemens@nevrome.de>
 license: MIT
@@ -31,6 +31,9 @@ library
     , text
     , directory
     , filepath
+    , parsec
+    , aeson
+    , yaml

 executable qjanno
   hs-source-dirs: src-executables
@@ -43,6 +46,7 @@ executable qjanno
     , optparse-applicative
     , sqlite-simple
     , table-layout
+    , directory
   other-modules: Paths_qjanno_hs

 test-suite spec
88 changes: 54 additions & 34 deletions src-executables/Main.hs
@@ -2,8 +2,7 @@ module Main where

 import Control.Monad (forM, forM_, unless, when)
 import Data.Char (isSpace)
-import Data.List (intercalate, isPrefixOf,
-                  transpose)
+import Data.List (intercalate, transpose)
 import qualified Data.Map as Map
 import Data.Maybe
 import Data.Set ((\\))
@@ -26,6 +25,7 @@ import qualified Qjanno.Option as Option
 import qualified Qjanno.Parser as Parser
 import qualified Qjanno.SQL as SQL
 import qualified Qjanno.SQLType as SQLType
+import System.Directory (doesFileExist)

 main :: IO ()
 main = do
@@ -82,15 +82,15 @@ runQuery opts conn (query, tableMap) = do
     else do
       putStrLn $ tableString colSpecs asciiRoundS (titlesH tableH) [rowsG tableB]

-
 fetchQuery :: Option.Option -> IO String
 fetchQuery opts = do
   when (isJust (Option.query opts) && isJust (Option.queryFile opts)) $ do
     hPutStrLn stderr "Can't provide both a query file and a query on the command line."
     exitFailure
-  query <- fromMaybe "" <$> case Option.query opts of
-    Just q -> return (Just q)
-    Nothing -> mapM readFile (Option.queryFile opts)
+  query <- fromMaybe "" <$>
+    case Option.query opts of
+      Just q -> return (Just q)
+      Nothing -> mapM readFile (Option.queryFile opts)
   when (all isSpace query) $ do
     hPutStrLn stderr "Query cannot be empty."
     hPutStrLn stderr "See, qjanno -h for help."
@@ -119,34 +119,54 @@ readFilesCreateTables :: Option.Option -> SQLite.Connection -> Parser.TableNameM
 readFilesCreateTables opts conn tableMap = do
   forM (Map.toList tableMap) $ \(path, name) -> do
     let path' = unquote path
-    if "d(" `isPrefixOf` path'
-    then do
-      let baseDirs = Janno.extractBaseDirs path'
-      allJannoPaths <- concat <$> mapM Janno.findAllJannoFiles baseDirs
-      let jannoOpts = opts {Option.tabDelimited = True}
-      allJannoHandles <- mapM (\p -> openFile p ReadMode) allJannoPaths
-      allJannos <- mapM (File.readFromFile jannoOpts) allJannoHandles
-      let (columns, body) = Janno.mergeJannos allJannos
-      createTable conn name path columns body
-      -- returns all columns for the --showColumns feature
-      return (path, columns)
-    else do
-      handle <- openFile (if path' == "-" then "/dev/stdin" else path') ReadMode
-      (columns, body) <- File.readFromFile opts handle
-      when (length columns == 0) $ do
-        hPutStrLn stderr $ if Option.noHeader opts
-                           then "Warning - data is empty"
-                           else "Header line is expected but missing in file " ++ path
-        exitFailure
-      when (any (elem ',') columns) $ do
-        hPutStrLn stderr "Column name cannot contain commas"
-        exitFailure
-      when (length columns >= 1) $
-        createTable conn name path columns body
-      hClose handle
-      return (path, columns)
-  where unquote (x:xs@(_:_)) | x `elem` "\"'`" && x == last xs = init xs
-        unquote xs = xs
+    case Parser.readFROM path' of
+      Left s -> do
+        hPutStrLn stderr "Invalid FROM string: "
+        hPutStrLn stderr s
+        exitFailure
+      Right (Parser.Jannos j) -> do
+        allJannosWithContext <- concat <$> mapM Janno.findJannos j
+        when (null allJannosWithContext) $ do
+          hPutStrLn stderr "No .janno files found."
+          exitFailure
+        forM_ allJannosWithContext $ \(Janno.JannoWithContext p _) -> do
+          fileExists <- doesFileExist p
+          unless fileExists $ do
+            hPutStrLn stderr $ "File expected, but does not exist: " ++ p
+            exitFailure
+        let jannoOpts = opts {Option.tabDelimited = True}
+        allJannos <- mapM (File.readFromJanno jannoOpts) allJannosWithContext
+        let (columns, body) = Janno.reorderJannoColumns $ Janno.mergeJannos allJannos
+        createTable conn name path columns body
+        -- returns all columns for the --showColumns feature
+        return (path, columns)
+      Right Parser.StdIn -> do
+        makeDBFromNormalFile name "/dev/stdin"
+      Right (Parser.AnyFile _) -> do
+        fileExists <- doesFileExist path'
+        unless fileExists $ do
+          hPutStrLn stderr $ "File does not exist: " ++ path'
+          exitFailure
+        makeDBFromNormalFile name path'
+  where
+    unquote (x:xs@(_:_)) | x `elem` "\"'`" && x == last xs = init xs
+    unquote xs = xs
+    makeDBFromNormalFile :: String -> FilePath -> IO (String, [String])
+    makeDBFromNormalFile name path = do
+      (columns, body) <- File.readFromFile opts path
+      when (length columns == 0) $ do
+        if Option.noHeader opts
+        then hPutStrLn stderr "Warning - data is empty."
+        else hPutStrLn stderr $ "Header line is expected but missing in file " ++ path
+        exitFailure
+      when (any (elem ',') columns) $ do
+        hPutStrLn stderr "Column name cannot contain commas."
+        exitFailure
+      when (length columns >= 1) $
+        createTable conn name path columns body
+      return (path, columns)
+
+

 createTable :: SQLite.Connection -> String -> String -> [String] -> [[String]] -> IO ()
 createTable conn name path columns bodyRaw = do
28 changes: 24 additions & 4 deletions src/Qjanno/File.hs
@@ -1,16 +1,33 @@
+{-# LANGUAGE BangPatterns #-}
+
 module Qjanno.File where

 import Control.Applicative ((<|>))
 import Control.Monad (guard, when)
 import Data.Char (isSpace)
+import Data.Version (Version, showVersion)
 import System.Exit (exitFailure)
 import System.IO

+import qualified Qjanno.Janno as Janno
 import qualified Qjanno.Option as Option

-readFromFile :: Option.Option -> Handle -> IO ([String], [[String]])
-readFromFile opts handle = do
-  contents <- joinMultiLines <$> lines <$> hGetContents handle
+readFromJanno :: Option.Option -> Janno.JannoWithContext -> IO ([String], [[String]])
+readFromJanno opts (Janno.JannoWithContext p Nothing) = do
+  readFromFile opts p
+readFromJanno opts (Janno.JannoWithContext p (Just (_, Janno.PoseidonYml pacTitle pacVersion _))) = do
+  (columns, body) <- readFromFile opts p
+  let columnsWithPacTitleAndVersion = "package_title" : "package_version" : columns
+  let bodyWithPacTitleAndVersion = map (\x -> pacTitle : renderPacVersion pacVersion : x) body
+  return (columnsWithPacTitleAndVersion, bodyWithPacTitleAndVersion)
+  where
+    renderPacVersion :: Maybe Version -> String
+    renderPacVersion Nothing = ""
+    renderPacVersion (Just x) = showVersion x
+
+readFromFile :: Option.Option -> FilePath -> IO ([String], [[String]])
+readFromFile opts path = do
+  !contents <- joinMultiLines <$> lines <$> readFile path
   let contentList = contents ++ [ "", "" ]
       headLine = contentList !! 0
       secondLine = contentList !! 1
@@ -27,7 +44,10 @@ readFromFile opts handle = do
   let skipLine = if Option.noHeader opts then id else tail
   let stripSpaces = dropWhile isSpace
   let body = filter (not . null) $ map (map stripSpaces . splitFixedSize splitter size) (skipLine contents)
-  return (columns, body)
+  -- add file path column
+  let columnsWithSourceFile = "source_file" : columns
+  let bodyWithSourceFile = map (path:) body
+  return (columnsWithSourceFile, bodyWithSourceFile)
   where joinMultiLines (cs:ds:css) | valid True cs = cs : joinMultiLines (ds:css)
                                    | otherwise = joinMultiLines $ (cs ++ "\n" ++ ds) : css
           where valid False ('"':'"':xs) = valid False xs