Merge pull request #34 from ewels/master
First release review tweaks
ypriverol authored Apr 26, 2021
2 parents dc85bd2 + defbeda commit b06ebf4
Showing 2 changed files with 17 additions and 15 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/ci.yml
@@ -17,8 +17,6 @@ jobs:
env:
NXF_VER: ${{ matrix.nxf_ver }}
NXF_ANSI_LOG: false
COSMIC_USERNAME: ${{ secrets.COSMIC_USERNAME }}
COSMIC_PASSWORD: ${{ secrets.COSMIC_PASSWORD }}

strategy:
matrix:
30 changes: 17 additions & 13 deletions docs/output.md
@@ -20,30 +20,34 @@ The main source of canonical protein sequence in pgdb is ENSEMBL. The user can t
* [Decoy](#decoys) - Add decoy proteins to the final database.
* [Output](#output) - Output results including clean databases and decoy generation

## Ensembl
## Pipeline modes

The pipeline will download the the ENSEMBL protein reference proteome, this will be added to the final protein database. The protein databae is downloaded from [ENSEMBL FTP](http://www.ensembl.org/info/data/ftp/index.html)
### Ensembl

## Ensembl non canonical
The pipeline will download the the ENSEMBL protein reference proteome, this will be added to the final protein database. The protein database is downloaded from [ENSEMBL FTP](http://www.ensembl.org/info/data/ftp/index.html).

The Ensembl non canonical includes the pseudogenes, lncRNAs, etc. The accessions of each type of kind of novel protein is predefined by the [pypgatk tool](https://github.com/bigbio/py-pgatk)
### Ensembl non canonical

* ncRNA_ENST00000456688 - non coding RNA transcript.
* altorf_ENST00000310473 - alternative open reading frame
* pseudo_ENST00000436135 - pseudo gene translation
The Ensembl non canonical includes the pseudogenes, lncRNAs, etc. The accessions of each type of kind of novel protein is predefined by the [pypgatk tool](https://github.com/bigbio/py-pgatk).

## Variants
* `ncRNA_ENST00000456688` - non coding RNA transcript.
* `altorf_ENST00000310473` - alternative open reading frame
* `pseudo_ENST00000436135` - pseudo gene translation

### Variants

The COSMIC or cBioPortal variants are downloaded automatically from these resources. The accessions of those proteins are:

* COSMIC:ANXA3_ENST00000503570:p.A67T:Substitution-Missense - Accession of the protein includes the position of the aminoacid variant.
* `COSMIC:ANXA3_ENST00000503570:p.A67T:Substitution-Missense` - Accession of the protein includes the position of the aminoacid variant.

## Decoy
### Decoy

Decoy can be added to the final database. Decoys accessions are prefix with `DECOY_` by default, but they can be configured by the users.

## Output
## Output files

The nf-core/pgdb pipeline produces one single output file:

/fasta_database.fa
* `/fasta_database.fa`

The FASTA database including all the protein sequences including the reference proteomes, variants, pseudo-genes, etc.
This FASTA database includes all of the protein sequences including the reference proteomes, variants, pseudo-genes, etc.
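
The accession conventions documented in `docs/output.md` (the `ncRNA_`, `altorf_`, and `pseudo_` prefixes, the `COSMIC:` variant form, and the default `DECOY_` prefix) could be checked with a small script along these lines. This is only a sketch: `classify_accession` and `parse_cosmic_accession` are hypothetical helpers, not part of nf-core/pgdb or pypgatk, and the four-field COSMIC layout is an assumption inferred from the single example accession in the documentation.

```python
# Hypothetical helpers illustrating the accession naming described in
# docs/output.md. They are NOT part of nf-core/pgdb or pypgatk.


def classify_accession(accession: str, decoy_prefix: str = "DECOY_") -> str:
    """Return a coarse label for a protein accession based on its prefix."""
    # Decoys are checked first because a decoy may wrap any other accession.
    if accession.startswith(decoy_prefix):
        return "decoy"
    if accession.startswith("COSMIC:"):
        return "cosmic_variant"
    if accession.startswith("ncRNA_"):
        return "non_coding_rna"
    if accession.startswith("altorf_"):
        return "alternative_orf"
    if accession.startswith("pseudo_"):
        return "pseudogene"
    # Assumption: anything without a known prefix is a canonical protein.
    return "canonical"


def parse_cosmic_accession(accession: str) -> dict:
    """Split a COSMIC accession into its parts.

    Assumes the layout SOURCE:GENE_TRANSCRIPT:CHANGE:CONSEQUENCE, as in the
    documented example; other layouts raise ValueError.
    """
    parts = accession.split(":")
    if len(parts) != 4 or parts[0] != "COSMIC":
        raise ValueError(f"unexpected COSMIC accession: {accession!r}")
    # Split on the last underscore so gene symbols containing '_' survive.
    gene, _, transcript = parts[1].rpartition("_")
    return {
        "gene": gene,
        "transcript": transcript,
        "change": parts[2],
        "consequence": parts[3],
    }
```

For example, `classify_accession("pseudo_ENST00000436135")` yields `"pseudogene"`, and passing a custom `decoy_prefix` mirrors the user-configurable decoy prefix mentioned in the Decoy section.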
