Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata in download: aminoAcidInsertions included, serialization to be discussed #1421

Closed
1 of 3 tasks
corneliusroemer opened this issue Mar 21, 2024 · 6 comments
Closed
1 of 3 tasks
Labels
backend related to the loculus backend component

Comments

@corneliusroemer
Copy link
Contributor

corneliusroemer commented Mar 21, 2024

I've just tried the metadata download from the ebola-zaire instance on main. This is an example row. Few points for discussion:

  • Why are aminoAcidInsertions (and nuc ones) included at all? We can include them but it's more related to alignments rather than metadata proper. After all we don't include snps, or deletions. Insertions allow the reconstruction of a sequence from alignments, but that's all.
  • If aminoAcidInsertions and (nuc ones) null? It seems we use null for not provided, should we use that or rather use something like N/A? null seems to be simply the default in kotlin.
  • submittedAt and released at is a unix timestamp, should we rather serialize it as human readable timestamp, something like 2023-01-05Z23:11:56?
image
@corneliusroemer corneliusroemer added backend related to the loculus backend component RFC labels Mar 21, 2024
@chaoran-chen
Copy link
Member

chaoran-chen commented Mar 21, 2024

Why are aminoAcidInsertions (and nuc ones) included at all?

They shouldn't be, that's a bug.

It seems we use null for not provided

That's also a bug. In a TSV/CSV, I think that unknown values should just be left empty (please not N/A..) -> GenSpectrum/LAPIS#708

submittedAt and released at is a unix timestamp

LAPIS does not yet support timestamps as a type. We could provide it as 2023-01-05Z23:11:56 but that would then be considered as a string and would not support range filters. We could: (1) leave it as is for now and wait until we add timestamps in LAPIS, or (2) add an additional column, i.e., have submittedAt and submittedAtFormatted?

@corneliusroemer
Copy link
Contributor Author

Thanks! How do we distinguish an empty list from null/None for things like lists of frame shifts?

I guess having a unix timestamp is ok for now, was just not sure if this was intended.

@theosanderson
Copy link
Member

How do we distinguish an empty list from null/None for things like lists of frame shifts?

what's would the former mean? (if the answer is: "preprocessing pipeline configured not to provide frameshifts" then let's just not have the column at all in those cases)

@corneliusroemer
Copy link
Contributor Author

Maybe frameshifts is not the best example, what about something else like associated biosamples: there might be a difference between not knowing about them and knowing there are none.

@chaoran-chen
Copy link
Member

Thanks to GenSpectrum/LAPIS-SILO#354 and GenSpectrum/LAPIS#715, insertions are no longer included in the metadata file.

@corneliusroemer
Copy link
Contributor Author

The insertions have been fixed, there's a new issue for the timestamp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend related to the loculus backend component
Projects
None yet
Development

No branches or pull requests

3 participants