
Bank API

Patrick Durand edited this page Apr 14, 2017 · 3 revisions

A Bank represents a file containing sequences. Accepted formats are Fasta and Fastq, either plain text or gzip-compressed.
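To make the accepted formats concrete, here is a minimal pure-Python sketch of reading Fasta or Fastq records from a plain or gzip-compressed file. It is not pyGATB's implementation and uses no pyGATB API. The names `open_maybe_gzip`, `read_records`, `_fasta` and `_fastq` are hypothetical, and the Fastq part assumes the common 4-line record layout (header, letters, `+` separator, qualities).

```python
import gzip
import itertools


def open_maybe_gzip(path):
    """Open `path` in binary mode, decompressing it if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rb")
    return open(path, "rb")


def _fasta(lines):
    # '>' starts a record; the letters may span several lines until the next '>'.
    header, chunks = None, []
    for line in lines:
        if line.startswith(b">"):
            if header is not None:
                yield header, b"".join(chunks), None
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, b"".join(chunks), None


def _fastq(lines):
    # Assumes the 4-line layout: @header, letters, '+' separator, qualities.
    it = iter(lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if not header.startswith(b"@") or qual is None or not plus.startswith(b"+"):
            raise ValueError("malformed or truncated Fastq record")
        yield header[1:], seq, qual


def read_records(path):
    """Yield (header, letters, quality) tuples; quality is None for Fasta input."""
    with open_maybe_gzip(path) as fh:
        lines = (line.rstrip(b"\r\n") for line in fh)
        first = next((ln for ln in lines if ln), None)  # first non-blank line
        if first is None:
            return  # empty file: no records
        rest = itertools.chain([first], lines)  # put the first line back
        if first.startswith(b">"):
            yield from _fasta(rest)
        elif first.startswith(b"@"):
            yield from _fastq(rest)
        else:
            raise ValueError("neither Fasta ('>') nor Fastq ('@') input")
```

Sniffing the two gzip magic bytes, rather than trusting a `.gz` extension, lets the same call read both `reads.fq` and `reads.fq.gz`; how pyGATB itself detects compression is not described on this page.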

Bank is the class you use to:

  • open a sequence file
  • iterate over its sequences
  • work with the Sequence objects it yields

This code snippet illustrates the use of the Bank and Sequence APIs:

```python
# Import the pyGATB Bank class
from gatb import Bank

# A file containing some Fasta sequences
F_NAME = '../thirdparty/gatb-core/gatb-core/test/db/query.fa'

# Create the Bank representation of the Fasta sequence file
bank = Bank(F_NAME)

print("File '%s' is of type: %s" % (bank.uri, bank.type))

nseqs = 0

# Iterate over all sequences; only the first five are printed.
for i, seq in enumerate(bank):
    # 'seq' is of type 'Sequence'. Its content is accessed as follows:
    #   sequence header : seq.comment (bytes, hence the decode below)
    #   sequence quality: seq.quality (Fastq only)
    #   sequence letters: seq.sequence
    #   sequence size   : len(seq)
    # The sequence ID is the part of the header before the first space.
    seqid = seq.comment.decode("utf-8").split(" ")[0]
    if i < 5:
        print('%d: %s: %d letters' % (i, seqid, len(seq)))
    nseqs += 1

print('#sequences: %d' % nseqs)
```

(This code is taken from here).

Output of this Python3-pyGATB program is:

```
File '../thirdparty/gatb-core/gatb-core/test/db/query.fa' is of type: fasta
0: ENSTTRP00000007204: 585 letters
1: ENSTTRP00000007206: 232 letters
2: ENSTTRP00000007207: 435 letters
3: ENSTTRP00000007208: 529 letters
4: ENSTTRP00000000008: 529 letters
#sequences: 71
```
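The per-record work in the loop of the pyGATB example (decode the bytes header, keep the text before the first space as the identifier, print the first five records, count them all) can be factored out so it runs on any iterable of sequence-like objects. The sketch below assumes only that each item has a `comment` attribute holding bytes and supports `len()`, as the `Sequence` objects in the example do. `FakeSeq`, `seq_id` and `summarize` are hypothetical names; `FakeSeq` exists only so the logic can be checked without pyGATB installed.

```python
class FakeSeq:
    """Hypothetical stand-in for a pyGATB Sequence: only what the loop uses."""

    def __init__(self, comment, sequence):
        self.comment = comment    # header as bytes (no leading '>', as in the output)
        self.sequence = sequence  # letters as bytes

    def __len__(self):
        return len(self.sequence)


def seq_id(comment):
    """Return the identifier: the decoded header up to the first space."""
    return comment.decode("utf-8").split(" ")[0]


def summarize(bank, show=5):
    """Mirror the example loop: format the first `show` records and count all of them."""
    shown, nseqs = [], 0
    for i, seq in enumerate(bank):
        if i < show:
            shown.append("%d: %s: %d letters" % (i, seq_id(seq.comment), len(seq)))
        nseqs += 1
    return shown, nseqs


lines, total = summarize([FakeSeq(b"seqA some description", b"ACGT"),
                          FakeSeq(b"seqB", b"GG")], show=1)
print("\n".join(lines))
print("#sequences: %d" % total)
```

Because `summarize` only iterates over its argument, passing a real `Bank(F_NAME)` instead of the list of `FakeSeq` objects should produce the same kind of lines as the pyGATB example, assuming pyGATB is installed.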