Io to bio with genbank fixes #394

Merged
merged 22 commits into from
Nov 5, 2023
Changes from 21 commits
12 changes: 12 additions & 0 deletions CONTRIBUTING.md
@@ -93,6 +93,18 @@ In order to simplify the development experience, and environment setup, the poly

Whether you're a beginner with Go or an experienced developer, you should see the suggestions pop up automatically when you go to the *Plugins* tab in VSCode. Using these plugins can help accelerate the development experience and also allow you to work more collaboratively with other poly developers.

## Local Checks

Poly runs numerous CI/CD checks via GitHub Actions before a PR can be merged. To be mergeable, your PR must pass all of these checks.

A quick way to check whether your PR will pass is to run:

```sh
gofmt -s -w . && go test ./...
```

Additionally, you may want to [install](https://golangci-lint.run/usage/install/#local-installation) and run the linter.

# How to report a bug

### Security disclosures
2 changes: 1 addition & 1 deletion bio/example_test.go
@@ -253,7 +253,7 @@ ORIGIN
records, _ := parser.Parse()

fmt.Println(records[0].Features[2].Attributes["translation"])
- // Output: MTMITPSLHACRSTLEDPRVPSSNSLAVVLQRRDWENPGVTQLNRLAAHPPFASWRNSEEARTDRPSQQLRSLNGEWRLMRYFLLTHLCGISHRIWCTLSTICSDAA
+ // Output: [MTMITPSLHACRSTLEDPRVPSSNSLAVVLQRRDWENPGVTQLNRLAAHPPFASWRNSEEARTDRPSQQLRSLNGEWRLMRYFLLTHLCGISHRIWCTLSTICSDAA]
}

func ExampleNewSlow5Parser() {
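
The changed expected output in the hunk above (the same translation, now wrapped in square brackets) is what `fmt.Println` prints for a one-element slice. That would be consistent with attribute values having become multi-valued, since GenBank lets a qualifier such as `/db_xref` repeat within one feature. The type change is an assumption inferred from the diff, not something the diff shows directly, and the maps and the `printedForm` helper below are hypothetical stand-ins rather than Poly's own types.

```go
package main

import "fmt"

// printedForm returns what fmt.Println would print for v, without the
// trailing newline. It is a hypothetical helper for this illustration.
func printedForm(v any) string {
	return fmt.Sprint(v)
}

func main() {
	// Assumed "before" shape: one string per attribute key.
	single := map[string]string{"translation": "MTMITPSLH"}

	// Assumed "after" shape: a slice of values per key, so a repeated
	// qualifier such as /db_xref keeps every value.
	multi := map[string][]string{
		"translation": {"MTMITPSLH"},
		"db_xref":     {"GeneID:854630", "SGD:S000001439"},
	}

	fmt.Println(printedForm(single["translation"])) // MTMITPSLH
	fmt.Println(printedForm(multi["translation"]))  // [MTMITPSLH]
	fmt.Println(len(multi["db_xref"]))              // 2
}
```

If the bare string is wanted in an example's output, indexing the first element (for instance `multi["translation"][0]`) would print it without brackets.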
6 changes: 3 additions & 3 deletions bio/fasta/fasta.go
@@ -26,15 +26,15 @@ Fasta Parser begins here
Many thanks to Jordan Campbell (https://github.com/0x106) for building the first
parser for Poly and thanks to Tim Stiles (https://github.com/TimothyStiles)
for helping complete that PR. This work expands on the previous work by allowing
- for concurrent parsing and giving Poly a specific parser subpackage,
+ for concurrent parsing and giving Poly a specific parser subpackage,
as well as a few bug fixes.

Fasta is a very simple file format for working with DNA, RNA, or protein sequences.
It was first released in 1985 and is still widely used in bioinformatics.

- https://en.wikipedia.org/wiki/_format
+ https://en.wikipedia.org/wiki/FASTA_format

- One interesting use of the concurrent parser is working with the Uniprot
+ One interesting use of the concurrent parser is working with the Uniprot
fasta dump files, which are far too large to fit into RAM. This parser is able
to easily handle those files by doing computation actively while the data dump
is getting parsed.
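
The package comment above describes parsing FASTA files too large for RAM by doing work while the file is still being read. A minimal sketch of that idea follows, assuming a plain `>`-prefixed header line followed by sequence lines. The `Record` type and the `streamFasta` and `collectFasta` functions are hypothetical names for illustration only and are not Poly's parser API.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Record is a hypothetical minimal FASTA record.
type Record struct {
	Identifier string
	Sequence   string
}

// streamFasta reads r line by line and sends each record on the returned
// channel as soon as it is complete, so only one record is held in memory.
// Scanner errors are ignored to keep the sketch short.
func streamFasta(r io.Reader) <-chan Record {
	out := make(chan Record)
	go func() {
		defer close(out)
		scanner := bufio.NewScanner(r)
		var id string
		var seq strings.Builder
		started := false
		flush := func() {
			if started {
				out <- Record{Identifier: id, Sequence: seq.String()}
			}
		}
		for scanner.Scan() {
			line := strings.TrimSpace(scanner.Text())
			switch {
			case line == "":
				continue
			case strings.HasPrefix(line, ">"):
				flush() // emit the previous record, if any
				id = line[1:]
				seq.Reset()
				started = true
			case started:
				seq.WriteString(line)
			}
		}
		flush() // emit the final record
	}()
	return out
}

// collectFasta drains the stream into a slice; convenient for small inputs.
func collectFasta(input string) []Record {
	var records []Record
	for rec := range streamFasta(strings.NewReader(input)) {
		records = append(records, rec)
	}
	return records
}

func main() {
	input := ">seq1\nATGC\nGGTA\n>seq2\nMKV\n"
	// The consumer works on each record while later ones are still being read.
	for rec := range streamFasta(strings.NewReader(input)) {
		fmt.Println(rec.Identifier, len(rec.Sequence))
	}
}
```

Two limits of this sketch: the consumer must drain the channel or the parsing goroutine blocks forever, and `bufio.Scanner` rejects lines longer than its default buffer (64 KiB) unless `Scanner.Buffer` is called first.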
147 changes: 147 additions & 0 deletions bio/genbank/data/NC_001141.2_redux.gb
@@ -0,0 +1,147 @@
LOCUS NC_001141 439888 bp DNA linear CON 15-SEP-2023
DEFINITION Saccharomyces cerevisiae S288C chromosome IX, complete sequence.
ACCESSION NC_001141
VERSION NC_001141.2
DBLINK BioProject: PRJNA128
Assembly: GCF_000146045.2
KEYWORDS RefSeq.
SOURCE Saccharomyces cerevisiae S288C
ORGANISM Saccharomyces cerevisiae S288C
Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina;
Saccharomycetes; Saccharomycetales; Saccharomycetaceae;
Saccharomyces.
REFERENCE 1 (bases 1 to 439888)
AUTHORS Engel,S.R., Wong,E.D., Nash,R.S., Aleksander,S., Alexander,M.,
Douglass,E., Karra,K., Miyasato,S.R., Simison,M., Skrzypek,M.S.,
Weng,S. and Cherry,J.M.
TITLE New data and collaborations at the Saccharomyces Genome Database:
updated reference genome, alleles, and the Alliance of Genome
Resources
JOURNAL Genetics 220 (4) (2022)
PUBMED 34897464
REFERENCE 2 (bases 1 to 439888)
AUTHORS Churcher,C., Bowman,S., Badcock,K., Bankier,A., Brown,D.,
Chillingworth,T., Connor,R., Devlin,K., Gentles,S., Hamlin,N.,
Harris,D., Horsnell,T., Hunt,S., Jagels,K., Jones,M., Lye,G.,
Moule,S., Odell,C., Pearson,D., Rajandream,M., Rice,P., Rowley,N.,
Skelton,J., Smith,V., Barrell,B. et al.
TITLE The nucleotide sequence of Saccharomyces cerevisiae chromosome IX
JOURNAL Nature 387 (6632 SUPPL), 84-87 (1997)
PUBMED 9169870
REFERENCE 3 (bases 1 to 439888)
AUTHORS Goffeau,A., Barrell,B.G., Bussey,H., Davis,R.W., Dujon,B.,
Feldmann,H., Galibert,F., Hoheisel,J.D., Jacq,C., Johnston,M.,
Louis,E.J., Mewes,H.W., Murakami,Y., Philippsen,P., Tettelin,H. and
Oliver,S.G.
TITLE Life with 6000 genes
JOURNAL Science 274 (5287), 546 (1996)
PUBMED 8849441
REFERENCE 4 (bases 1 to 439888)
CONSRTM NCBI Genome Project
TITLE Direct Submission
JOURNAL Submitted (14-SEP-2023) National Center for Biotechnology
Information, NIH, Bethesda, MD 20894, USA
REFERENCE 5 (bases 1 to 439888)
CONSRTM Saccharomyces Genome Database
TITLE Direct Submission
JOURNAL Submitted (04-MAY-2012) Department of Genetics, Stanford
University, Stanford, CA 94305-5120, USA
REMARK Protein update by submitter
REFERENCE 6 (bases 1 to 439888)
CONSRTM Saccharomyces Genome Database
TITLE Direct Submission
JOURNAL Submitted (31-MAR-2011) Department of Genetics, Stanford
University, Stanford, CA 94305-5120, USA
REMARK Sequence update by submitter
REFERENCE 7 (bases 1 to 439888)
CONSRTM Saccharomyces Genome Database
TITLE Direct Submission
JOURNAL Submitted (14-DEC-2009) Department of Genetics, Stanford
University, Stanford, CA 94305-5120, USA
COMMENT REVIEWED REFSEQ: This record has been curated by SGD. The reference
sequence is identical to BK006942.

On Apr 26, 2011 this sequence version replaced NC_001141.1.

##Genome-Annotation-Data-START##
Annotation Provider :: SGD
Annotation Status :: Full Annotation
Annotation Version :: R64-4-1
URL :: http://www.yeastgenome.org/
##Genome-Annotation-Data-END##
COMPLETENESS: full length.
FEATURES Location/Qualifiers
source 1..439888
/organism="Saccharomyces cerevisiae S288C"
/mol_type="genomic DNA"
/strain="S288C"
/db_xref="taxon:559292"
/chromosome="IX"
telomere complement(1..7784)
/note="TEL09L; Telomeric region on the left arm of
Chromosome IX; composed of an X element core sequence, X
element combinatorial repeats, a long Y' element, and a
short terminal stretch of telomeric repeats"
/db_xref="SGD:S000028896"
gene complement(<483..>6147)
/locus_tag="YIL177C"
/db_xref="GeneID:854630"
mRNA complement(join(<483..4598,4987..>6147))
/locus_tag="YIL177C"
/product="Y' element ATP-dependent helicase"
/transcript_id="NM_001179522.1"
/db_xref="GeneID:854630"
CDS complement(join(483..4598,4987..6147))
/locus_tag="YIL177C"
/EC_number="3.6.4.12"
/note="Putative Y' element ATP-dependent helicase"
/codon_start=1
/product="Y' element ATP-dependent helicase"
/protein_id="NP_012092.1"
/db_xref="GeneID:854630"
/db_xref="SGD:S000001439"
/translation="MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVR
SFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPI
PSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIA
SARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQ
ASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNF
GAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVS
SCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFA
GPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALG
NSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRE
LPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVD
SFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQ
KISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAP
PGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRCGCLNVAPVRNFI
EEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFET
EVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKR
SEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPES
KAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKL
VTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPP
IKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATAS
MSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTN
ATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVR
TSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTN
SSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTES
TNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHP
VTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDI
YFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYL
EDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQ
VFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKG
GVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPH
WTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYML
MVAVHKELDSDDVPDGRFDILLCRDSSREVGE"
rep_origin 7470..8793
/note="ARS902; Putative replication origin; identified in
multiple array studies, not yet confirmed by plasmid-based
assay"
/db_xref="SGD:S000130156"
mRNA join(<155222,155311..>155765)
/gene="COX5B"
/locus_tag="YIL111W"
/product="cytochrome c oxidase subunit Vb"
/transcript_id="NM_001179459.1"
/db_xref="GeneID:854695"
CONTIG join(BK006942.2:1..439888)
//
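
The record above uses several location forms: nested operators (`complement(join(...))`), partial markers (`<` and `>`), and a single partial position inside a join (`<155222`). The sketch below shows one way such strings could be parsed recursively. It is an illustration under simplifying assumptions, not Poly's GenBank parser: the `Location` type and the `parseLocation`, `unwrap`, and `splitTopLevel` functions are hypothetical, and only the forms that appear in this file are handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Location is a hypothetical, simplified model of a GenBank feature location.
type Location struct {
	Start, End   int        // 1-based, inclusive, as written in the file; 0 for a join
	PartialStart bool       // "<" marker on the start position
	PartialEnd   bool       // ">" marker on the end position
	Complement   bool       // wrapped in complement(...)
	Parts        []Location // sub-locations of a join(...)
}

// unwrap strips "prefix" and the closing ")" if s has that shape.
func unwrap(s, prefix string) (string, bool) {
	if strings.HasPrefix(s, prefix) && strings.HasSuffix(s, ")") {
		return s[len(prefix) : len(s)-1], true
	}
	return "", false
}

// splitTopLevel splits on commas that are not nested inside parentheses.
func splitTopLevel(s string) []string {
	var parts []string
	depth, last := 0, 0
	for i, r := range s {
		switch r {
		case '(':
			depth++
		case ')':
			depth--
		case ',':
			if depth == 0 {
				parts = append(parts, s[last:i])
				last = i + 1
			}
		}
	}
	return append(parts, s[last:])
}

func parseLocation(s string) (Location, error) {
	s = strings.TrimSpace(s)
	if inner, ok := unwrap(s, "complement("); ok {
		loc, err := parseLocation(inner)
		if err != nil {
			return Location{}, err
		}
		loc.Complement = true
		return loc, nil
	}
	if inner, ok := unwrap(s, "join("); ok {
		var loc Location
		for _, part := range splitTopLevel(inner) {
			sub, err := parseLocation(part)
			if err != nil {
				return Location{}, err
			}
			loc.Parts = append(loc.Parts, sub)
		}
		return loc, nil
	}
	// Plain range ("483..4598") or single position ("<155222").
	startText, endText, isRange := strings.Cut(s, "..")
	if !isRange {
		endText = startText
	}
	start, err := strconv.Atoi(strings.TrimLeft(startText, "<>"))
	if err != nil {
		return Location{}, fmt.Errorf("bad start in %q: %w", s, err)
	}
	end, err := strconv.Atoi(strings.TrimLeft(endText, "<>"))
	if err != nil {
		return Location{}, fmt.Errorf("bad end in %q: %w", s, err)
	}
	return Location{
		Start:        start,
		End:          end,
		PartialStart: strings.HasPrefix(startText, "<"),
		PartialEnd:   strings.HasPrefix(endText, ">"),
	}, nil
}

func main() {
	for _, text := range []string{
		"complement(join(<483..4598,4987..>6147))",
		"join(<155222,155311..>155765)",
	} {
		loc, err := parseLocation(text)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> complement=%t parts=%d\n", text, loc.Complement, len(loc.Parts))
	}
}
```

Other location operators and remote references, such as the `BK006942.2:1..439888` form on the CONTIG line, are deliberately out of scope here and would return an error.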

File renamed without changes.