Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add htsget 1.2.0, OpenAPI v3.0.2 spec #385

Closed
wants to merge 10 commits into from
Closed

Add htsget 1.2.0, OpenAPI v3.0.2 spec #385

wants to merge 10 commits into from

Conversation

brainstorm
Copy link
Contributor

@brainstorm brainstorm commented Feb 21, 2019

As mentioned over twitter:

https://twitter.com/braincode/status/1097808553723670528

I've tried to comply with the official htsget spec, from the human-readable official spec:

https://samtools.github.io/hts-specs/htsget.html

But please let me know if there's something off. Things that concern me:

  1. OpenAPI v3 only seems to have signed int32, while on the spec an unsigned int32 is required by the htsget spec. Granted, this is just for code scaffolding/stubs, but I hope nobody implements it wrong by accident.
  2. I could write in the OAuth aspects in it, but the htsget spec explicitly keeps it out since it's "beyond
    the scope of this [htsget] specification". I feel inclined to add it regardless to ease code generation, thoughts?
    . Just went ahead and added it.
  3. I've added all fields as shown in the spec for BAM, but for VCF, I just picked up some (most used ones?) from the VCF4.2 spec. Would it make sense to list them in the hts-spec as well?

@brainstorm
Copy link
Contributor Author

Adding the Oauth2 bit might not have been the best idea since AWS API Gateway chokes on it:

skarmavbild 2019-02-21 kl 15 08 41

@delagoya
Copy link

@brainstorm thanks a couple of notes on quick review:

  1. unsigned int32 comes from trying to minimize the bytes necessary to store a genomic position in highly compress formats like BAM or CRAM. Since there are query parameters, I think it is OK to define it as int64 with a minimum of 0

  2. Per your AWS API Gateway error, there are some OAS OAuth2 examples. But this seems specific to API Gateway and would not be a good thing to add to a official specification, but rather define a API Gateway specific schema in your own repo, import the schema components using $ref, and add in the custom-authorizer there.

  3. No comment on this yet, will need to dig into the schema details.

@jmarshall
Copy link
Member

Re (1), that OpenAPI documentation says format is free text so you would seem to be entirely justified in putting uint32 there.

(IMHO the spec saying "32-bit unsigned int" here is over-constraining things. It's a text format so just "unsigned int" or "int, minimum 0" says the same thing as far as the protocol is concerned. However "32-bit" has been there since the original Google Docs document and never come up in discussion as far as I know.)

Re (3), please raise this as a separate issue. Subsetting VCF fields via the fields parameter appears to have been overlooked when variants were added to htsget in PR #233.

Also

  • The contact information will need changing before merging. This could be the ga4gh-htsget mailing list; or the refget OpenAPI spec uses a GA4GH Security Notifications email address here.

@daviesrob
Copy link
Member

I think it would be good to avoid 32 bit limitations where not strictly needed. SAM does support positions longer than that, and we're planning to add support for them in HTSlib (see samtools/htslib#709). While not necessary for human sequencing, it is for quite a few other organisms.

@brainstorm
Copy link
Contributor Author

brainstorm commented Feb 22, 2019

Thanks everyone for the prompt and valuable feedback, I just added the proposed changes and opened #386 as suggested.

name: GA4GH
email: security-notification@ga4gh.org
license:
name: GPLv3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“The license information for the exposed API.” Probably best to omit license for the time being.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, various parties may be allergic to GPL in particular -- omitting it is probably easier than selecting an alternative? 😉

pub/htsget-openapi.yaml Outdated Show resolved Hide resolved
- "SAMPLE"
- "FILTER"
- "FORMAT"
- "ALT"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably best to leave this out until #386 is resolved.

@mlin
Copy link
Member

mlin commented May 15, 2019

Sorry for the terrible belatedness of me adding a couple of comments here:

  1. This is a really terrific companion to the spec, so first thank you!!
  2. The example responses should probably illustrate HTTP URLs in addition to data: URIs, since the redirect is a core mechanic of the protocol, which servers designed for scalable operations would usually rely on. We could make an example for publicly accessible 1000 Genomes data or such.
  3. The possible error responses could be fleshed out in more detail following the prose spec.

I would really like to help with these, but have bandwidth problems so would also welcome if somebody else is able to chew on them.

Thanks again this is great!

@brainstorm
Copy link
Contributor Author

brainstorm commented May 17, 2019

Thanks everyone for the feedback and code review, I'm refactoring and while I'm at it, I'll introduce 1.2.0 spec details as well (class), WIP here:

https://app.swaggerhub.com/apis/brainkod/htsget/1.2.0#/default/searchReadId

And some open questions for OpenAPI (just in case somebody here knows the solution), here:

https://twitter.com/braincode/status/1129301936828588038

@brainstorm brainstorm changed the title Add htsget 1.1.1, OpenAPI v3.0.0 spec Add htsget 1.2.0, OpenAPI v3.0.2 spec May 17, 2019
…k from @mlin (thanks), also add class as an extra attribute. /cc @ohofmann
@brainstorm
Copy link
Contributor Author

Spec bumped to 1.2.0, thanks @mlin for the feedback and the heads up @ohofmann.

I hope there's nothing else missing/broken or out of sync... looks valid/ok on SwaggerHub:

https://app.swaggerhub.com/apis/brainkod/htsget/1.2.0

OT: Who is in charge of the CircleCI config on this repo?

@brainstorm
Copy link
Contributor Author

👋

@jmarshall
Copy link
Member

The example of a /variants/{id} response is a BAM stream. It needs to be VCF or BCF instead.

OT: Who is in charge of the CircleCI config on this repo?

You can figure that out by looking at the relevant PR (#328). However it's immaterial — if this PR is rebased to after PR #328 landed, Circle CI will be happy. As this PR does not involve PDFs, it doesn't matter either way.

@brainstorm
Copy link
Contributor Author

brainstorm commented Sep 2, 2019

Thanks @jmarshall, good catch, fixed now!

/cc @ohofmann

@jmarshall
Copy link
Member

Agreed at the meeting that this is now ready for merging. Two remaining minor concerns:

  1. The swaggerhub URL at the top is specific to brainkod:
servers:
    url: https://virtserver.swaggerhub.com/brainkod/htsget/1.2.0

Can this be replaced with a non-specific URL like the one in refget-openapi.yaml?

  1. Last time I looked the treatment of error return codes was a bit haphazard, however we're happy for that to be tidied up at a later date.

@brainstorm: If you would like to address (1), I'll take care of rebasing and merging this. Thanks!

jmarshall pushed a commit to jmarshall/hts-specs that referenced this pull request Sep 16, 2019
Includes barebones authorizationCode Oauth2 flow, which should aid/inform
code generation.

Uses int64 with minimum 0, unsure if that is really an uint64 though.
@jmarshall jmarshall added the Merged For marking PRs that have been merged outwith GitHub and are shown as merely “Closed” label Sep 16, 2019
@jmarshall
Copy link
Member

Merged as 766424f. Thanks!

@jmarshall jmarshall closed this Sep 16, 2019
thefferon added a commit to thefferon/hts-specs that referenced this pull request Oct 9, 2019
* Update CRAM spec section on substitution matrix and codes.

* Respond to review comments.

* CRAM Slice and Container ref seq IDs must match (samtools#401)

* Code review part 2.

* Fix minor typo in the predecessors of BCF2 (samtools#427)

* layed -> laid

* Update MAINTAINERS.md (samtools#432)

Proposal to add Rasko Leinonen as refget maintainer.

* add jmmut to MAINTAINERS.md and move Cristina to "Past Members"

* change order of maintainers

* Clarify that INFO/END is used to form a CHROM:POS-END region (PR samtools#436) (samtools#436)

INFO/END (when present) provides the size of the interval that the
variant is located in, along with the CHROM and POS fields. This is
also used when indexing VCF/BCF files, as can be gleaned from §6.3.1's
description of BCF's rlen field.

The implications of INFO/END have not previously been clear. In the
absence of clear documentation, some SV tools have been using INFO/END
fields for their own semi-related purposes (using INFO/CHR2:INFO/END
as the other side's position in an interchromosomal rearrangement),
leading to broken .csi indexes and region queries that don't work.
Fixes samtools#425.

* Allow C and Java native text spellings of NaN and infinities (samtools#409)

C's printf doesn't output mixed case, while Java's Double.valueOf and
Double.toString parse/output only `Infinity`, not `Inf`. Rather than
requiring special-case code for both input and output in both languages,
relax the VCF specification to allow NAN/INF/INFINITY case-insensitively.

(Add "IEEE-754" to be specific and to improve the line breaks.)

* Adding a note about <NON_REF> (samtools#380)

* Update PDFs (CRAM and VCF additions; others cosmetic)

CRAMv3 PRs samtools#401 and samtools#412.
VCFv4.3 PRs samtools#380 (<NON_REF>), samtools#409 (infinity/NaN), and samtools#436 (INFO/END).
All: Minor typo and whitespace formatting fixes.

* Add htsget 1.2.0 OpenAPI v3.0.2 spec (PR samtools#385)

Includes barebones authorizationCode Oauth2 flow, which should aid/inform
code generation.

Uses int64 with minimum 0, unsure if that is really an uint64 though.

* Codify the existing policy of generally squashing PRs (PR samtools#444)

* Permit AP_Delta in multi-ref slices.

This means AP_delta can become negative.  I have validated this
decodes fine in both htsjdk and htslib.  This is because AP is ITF8
and hence signed, like all other integers, so it would need explicit
code to forbid this (which obviously isn't in the implementations).
Hence the limitation is primarily one of an over-zealous specification.

The impact of this is for position-sorted multi-ref slices AP can
legally be stored efficiently.

Also clarified fields in the container compression header when in
multi-ref mode.

Fixes samtools#431
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
htsget Merged For marking PRs that have been merged outwith GitHub and are shown as merely “Closed”
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants