Added command aggregateintervals
. Documentation here
-
Import from PLINK binary files (must be in SNP-major mode), GEN files, and BGEN files see docs for details
-
Export to GEN file see docs for details
- Overhaul of
table
commands that parse text files.- optional type imputation from the first twenty lines.
- option
-s, --samplecol
inannotatesamples table
has been replaced by-e, --sample-expr
. This no longer takes a column name, but instead takes an expr language transformation of the column names to produce the sample key. This could still be a column name e.g.-e Sample
, but it could also be something like-e FID.split("_")[1]
. - option
-v, --vcolumns
inannotatevariants table
has been replaced by-e, --variant-expr
. This no longer takes a column name, but instead takes an expr language transformation of the column names to produce the variant key. This expression will use the new expr variant constructor,variant(args)
. As before, this can take an imputedVariant
column, one string columnCHR:POS:REF:ALT
(which will look likeVariant(VariantColumn)
) or it can take four arguments of typeString, Int, String, (String || Array[String])
. In the second case, this argument will look something likeVariant(Chr, Pos.toInt, Ref, Alt)
. - added a few more options to all
table
commands: --comment
: optional argument. Will skip any line starting in the given string.--no-header
: indicates that the file(s) don't have a header, and columns will be read as "_0", "_1", ... "_N". You can use the other arguments with these columns, e.g.--sample-expr _2
.--impute
: turns on type imputation
importannotations json
andannotatevariants json
are gone, because you can replicate the functionality withtable
commands- option
--code
added to alltable
commands, as well asannotatevariants vds
andannotatevariants vcf
. This can be used as an alternative to--root
to have more control over which annotations are selected and where they're going. - new command-level documentation for all these commands. See the launch pages for
annotatevariants
,annotatesamples
, andannotateglobal
Added expr function fet
to calculate p-values using Fisher's Exact Test. Invoke this with fet(count1, count2, count3, count4)
see docs for details
Added struct operations merge
, drop
, and select
.
Usage:
merge(struct1, struct2)
will result in one struct with every field from the two. Duplicate field identifiers throws an error.drop
andselect
take the formatselect(struct, identifier1, identifier2, ...)
. These methods return a subset of the struct. One could, for example, remove the horribleCSQ
from the info field of a vds withannotatevariants expr -c 'va.info = drop(va.info, CSQ)
. One can select a subset of fields from a table usingselect(va.EIGEN, field1, field2, field3)
- Added
imputesex
which imputes the sex from variant data using the same method as PLINK. see docs for details
- renamed MAC -> AC and MAF -> AF in the
variantqc
module
- The
count
module no longer counts genotypes by default. Use option--genotypes
to do this. - Changed sample id column header in several modules. They are now all "Sample".
- Added expr function 'hwe'. Invoke this with
hwe(homref_count, het_count, homvar_count)
-
Added several new pieces of functionality to expr language, which can be viewed in the docs here
- array slicing, python-style
- dicts, python-style (maps keyed by string)
- function
index
, which takes anArray[Struct]
and converts it to a Dict. Ifglobal.genes
is anArray[Struct]
with struct typeStruct {geneID: String, PLI: Double, ExAC_LOFs: Int}
, then the functionindex(global.genes, geneID)
will return aDict[Struct]
where the value has struct typeStruct {PLI: Double, ExAC_LOFs: Int}
(the key was pulled out) - Array and struct constructors:
[1, 2]
will give you anArray[Int]
, and{"gene": "SCN1A", "PLI": 0.999, "ExAC_LOFs": 5}
will give you aStruct
. - Added a way to declare values missing:
NA: Type
. For example, you could sayif (condition) 5 else NA: Int
- Support JSON in annotation import.
importannotations
is nowimportannotations table
. Addedimportannotations json
,annotatevariants json
andannotatesamples json
.
- Fixed some bad error messages, now errors will write a full stacktrace in the
hail.log
-
added
annotateglobal table
which reads a text file with a header and stores it as anArray[Struct]
. see docs for details -
renamed all
tsv
modules totable
. We support arbitrary delimiters, so the name should be more general
-
split
annotateglobal
intoannotateglobal expr
andannotateglobal list
. The latter can be used to load a text file as anArray[String]
orSet[String]
to the global annotations, which can then be used to do things likefiltervariants expr
on genes in a gene list. -
exposed
global
annotations throughout the expr language. With aggregators, this allows you to do things like filter samples more than 5 standard deviations from the mean of a metric or the gene list filtering above
-
added aggregators! see what you can do with them here! As a consequence, removed some fields computed by
variantqc
andsampleqc
because these are easy to compute with aggregators. -
added
annotateglobal
, which lets you create global annotations with expressions and aggregators. See docs for details. -
renamed
showannotations
toprintschema
, and addedshowglobals
to print out the current global annotations as JSON. -
filtervariants and filtersamples have been reworked. Much like the annotate commands, you will need to specify
filtersamples expr
orfiltervariants intervals
. See the updated filtering documentation for details. -
filtervariants now supports filtering on .variant_list with lines of the form contig:pos:ref:alt1,alt2,...altN.
-
added new functions in expr language on arrays/sets:
filter
,map
,min
,max
,sum
. -
added new function in expr language
str(x)
which gives a string representation of a thing (comma-delimited for arrays which can be changed witharr.mkString(delim)
, and produces json for structs) -
isNotMissing
renamed toisDefined
.isMissing
andisDefined
are now single-argument functions and can be used with structs, e.g.,isMissing(x)
instead ofx.isMissing
. -
Fixed precedence bug in parser. You can now write
!g.isHet
instead of!(g.isNet)
. -
count:
nLocalSamples
is gone.nSamples
reports the number of samples currently in the dataset.