Bank API
Patrick Durand edited this page Apr 14, 2017 · 3 revisions
A Bank represents a file containing sequences. Accepted formats are Fasta and Fastq, either plain text or gzip-compressed.
Bank is the class you use:
- to open a sequence file
- to iterate over its sequences
- to do some work with Sequence objects
This code snippet illustrates the use of the Bank and Sequence APIs:
# We import the pyGATB Bank class
from gatb import Bank

# We will use a file containing some Fasta sequences
F_NAME = '../thirdparty/gatb-core/gatb-core/test/db/query.fa'

# We create the bank representation of the Fasta sequence file
bank = Bank(F_NAME)
print("File '%s' is of type: %s" % (bank.uri, bank.type))

nseqs = 0
# We iterate over all sequences, but print only the first five.
for i, seq in enumerate(bank):
    # 'seq' is of type 'Sequence'.
    # Accessing 'Sequence' internals is done as follows:
    #   sequence header : seq.comment
    #   sequence quality: seq.quality (Fastq only)
    #   sequence letters: seq.sequence
    #   sequence size   : len(seq)
    # The header is a bytes object; its first space-separated token is the ID.
    seqid = seq.comment.decode("utf-8").split(" ")[0]
    if i < 5:
        print('%d: %s: %d letters' % (i, seqid, len(seq)))
    nseqs += 1

print('#sequences: %d' % nseqs)
(This code is taken from here).
The output of this program, run with Python 3 and pyGATB, is:
File '../thirdparty/gatb-core/gatb-core/test/db/query.fa' is of type: fasta
0: ENSTTRP00000007204: 585 letters
1: ENSTTRP00000007206: 232 letters
2: ENSTTRP00000007207: 435 letters
3: ENSTTRP00000007208: 529 letters
4: ENSTTRP00000000008: 529 letters
#sequences: 71
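The loop above can be wrapped in a small reusable helper. The following is a minimal sketch, not part of pyGATB: `FakeSequence`, `sequence_id` and `summarize` are hypothetical names introduced for illustration. `FakeSequence` is a stand-in that mimics only the attributes used on this page (`comment` as bytes, `sequence`, and `len()`), and it assumes `sequence` is also a bytes object, so the sketch runs without pyGATB installed.

```python
# Hypothetical stand-in for pyGATB's Sequence, for illustration only.
class FakeSequence:
    def __init__(self, comment, sequence):
        self.comment = comment    # header as bytes, as in the example above
        self.sequence = sequence  # letters, assumed here to be bytes

    def __len__(self):
        return len(self.sequence)


def sequence_id(seq):
    """Return the first space-separated token of the sequence header."""
    return seq.comment.decode("utf-8").split(" ")[0]


def summarize(sequences):
    """Count sequences and letters, and find the longest one, in one pass."""
    count = 0
    letters = 0
    longest_id, longest_len = None, 0
    for seq in sequences:
        count += 1
        letters += len(seq)
        if len(seq) > longest_len:
            longest_id, longest_len = sequence_id(seq), len(seq)
    return {"count": count, "letters": letters, "longest": longest_id}


# Demo data (made up); any iterable of Sequence-like objects would do.
demo = [
    FakeSequence(b"seqA first record", b"ACGT"),
    FakeSequence(b"seqB", b"ACGTACGT"),
]
stats = summarize(demo)
print(stats)
```

With pyGATB installed, passing `Bank(F_NAME)` instead of `demo` should behave the same way, on the assumption that iterating a Bank yields objects exposing `comment` and `len()` as described above.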