Front matter for website #48

Open
ilectra opened this issue Jul 19, 2018 · 13 comments

@ilectra
Collaborator

ilectra commented Jul 19, 2018

No description provided.

@timlevine
Collaborator

timlevine commented Jul 24, 2018 via email

@timlevine
Collaborator

timlevine commented Jul 24, 2018

Welcome to the HHyeast website.

This website offers the results of remote homology searches of the entire genome of the model budding yeast Saccharomyces cerevisiae. The searches have been carried out using the HHsearch package developed by Johannes Söding and colleagues [1], a tool also known as "HHpred" through its online server [2].

The results show visualisations of the strongest homologies for 100% of 6,713 verified yeast open reading frames (ORFs) in three databases:

  1. PDB (solved structures)
  2. Pfam (curated protein domain families from the European Bioinformatics Institute)
  3. the yeast proteome itself.

For each ORF, a summary of all three is displayed, from which you can investigate each set of hits in more detail, for example by re-setting the thresholds so that more or fewer hits are displayed.

Data Download
As well as saving images of the domain displays through the controls in each window, you can download, for each ORF, the file from which the data for the visualisation were extracted (suffix ".hhr", opened as a text file).

Gaps between the domains in the visualisations
HHyeast has discovered YYY additional hits in the gaps between the domains displayed here [3]. Although these are not in the visualisations, it is clear which ORFs have such hits from the download buttons, which ... insert text here

Job submission
Type a gene identifier, either systematic name or standard name if available. The server will offer valid options. Typically three panels will be displayed, one for each of the databases (PDB, Pfam, Yeast), but a panel will not be displayed if no hits reach the default threshold for display. Each panel can then be examined in detail, allowing re-setting of display thresholds for the other two panels.

NOTES

  1. "Protein homology detection by HMM-HMM comparison". Söding J. Bioinformatics. 2005, updated most recently in "A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core." Zimmermann et al., J Mol Biol. 2018
  2. https://toolkit.tuebingen.mpg.de/#/tools/hhpred
  3. "HHyeast reveals XXX new domains in the yeast proteome". Christidi et al., Manuscript in Preparation

@ilectra
Collaborator Author

ilectra commented Aug 6, 2018

Currently, if there are no hits above the probability threshold, no panel appears in the summary view. My plan is that if a new probability threshold in the detail view brings up some hits, the "missing" panel would re-appear in the summary view, as you described. Is this the functionality you'd like?

@timlevine
Collaborator

I'd like something to appear for every protein even if there are no hits above 50% - maybe a message saying you can look in more detail, as you suggest.

How many proteins have zero hits that meet the length criteria?

@ilectra
Collaborator Author

ilectra commented Aug 6, 2018

ok, I'll try to implement that.

And I've no idea how many proteins have zero hits that meet the length criteria...

@ilectra
Collaborator Author

ilectra commented Sep 27, 2018

Unfortunately it's not at all obvious how to show an empty plot when there are no hits and then fill it up when a lower threshold is provided (see #53, noted for future development). I can show a message that no hits are available for this db, but lower-probability hits will not be available for those ORFs. I suspect there are not many ORFs like that.
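
The fallback described here (show the hits that pass the threshold, otherwise a message for that database) could be sketched roughly as follows. This is only an illustration, not the site's actual code: the function name `panel_content`, the `(name, probability)` representation of a hit and the wording of the message are all assumptions; the 50% default is the figure mentioned earlier in this thread.

```python
from typing import List, Tuple, Union

# Hypothetical representation of a hit: (hit name, probability in percent).
Hit = Tuple[str, float]


def panel_content(hits: List[Hit], db: str,
                  threshold: float = 50.0) -> Union[List[Hit], str]:
    """Return the hits at or above the threshold, or a message if none qualify."""
    shown = [hit for hit in hits if hit[1] >= threshold]
    if shown:
        return shown
    # Nothing passes the threshold: return a message instead of an empty panel.
    return f"No hits above {threshold:g}% probability for {db}."
```

With this shape, the summary view would always have something to render for each database, either a list of hits to plot or a short message.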

@ilectra
Collaborator Author

ilectra commented Dec 5, 2018

@timlevine , @tamuri , I'm trying to finalise this. Can I please have some numbers for the XXX's and YYY's, as well as some text for the "Gaps between the domains in the visualisations" section?

@timlevine
Collaborator

@tamuri , @ilectra ,

So far no time to look for XXX and YYY.

I can re-write this without XXX and YYY if that's going to be better than nothing!

I must admit that this is a problem of my own making. I have not found the time to visit the overall discovery rate of new domains. I have a results file that Asif sent me ("hhrpy_hits_20171010" and similar). It needs some work to reveal what's new in there.

Then there's the domains in the gaps. I have mislaid the file / data I was sent on that.

ilectra reopened this Dec 5, 2018
@ilectra
Collaborator Author

ilectra commented Dec 5, 2018

That's fine, @timlevine , we can come back and revisit the data and their visualisation when there's more funding/time. Shall we say then that I'll skip any reference to gap analysis, as well as Note 3, this time? And restrict the file downloads to the original (whole genome search) .hhr files?

@timlevine
Collaborator

Is there a reason we cannot offer the gap downloads? I'd hope we can do that.

So the thing that I need to do is re-write the text to explain where we've got to. Also, would it be OK to have a special download explaining how to unpack the information within the usual download files as well as in the gap files?
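
As a starting point for such an explanation, here is a rough sketch of how the summary table at the top of a downloaded .hhr file might be read. It assumes the usual HHsearch text layout: a header line beginning "No Hit", then one hit per line with the columns Prob, E-value, P-value, Score, SS, Cols, Query HMM and Template HMM, ending at the first blank line. The function name `read_hit_table` and the returned field names are made up for illustration, and rows that do not fit the assumed layout are silently skipped.

```python
import re
from typing import Dict, List

# One row of the summary table. The numeric columns are matched from the
# right-hand end, so spaces inside the hit description do not matter.
ROW = re.compile(
    r"^\s*(?P<no>\d+)\s+(?P<hit>.*?)\s+(?P<prob>\d+\.\d+)"
    r"\s+(?P<evalue>\S+)\s+\S+"                  # E-value, P-value
    r"\s+-?\d+\.\d+\s+-?\d+\.\d+\s+\d+"          # Score, SS, Cols
    r"\s+(?P<qstart>\d+)-(?P<qend>\d+)"          # Query HMM range
    r"\s+\d+-\d+\s*\(\d+\)\s*$"                  # Template HMM range (length)
)


def read_hit_table(text: str) -> List[Dict[str, object]]:
    """Extract the hit summary table from the text of an .hhr file."""
    hits: List[Dict[str, object]] = []
    in_table = False
    for line in text.splitlines():
        if not in_table:
            # The table starts after the header line beginning "No Hit".
            in_table = line.lstrip().startswith("No Hit")
            continue
        if not line.strip():
            break  # the first blank line ends the table
        match = ROW.match(line)
        if match:
            hits.append({
                "hit": match["hit"],
                "prob": float(match["prob"]),
                "evalue": float(match["evalue"]),
                "query_range": (int(match["qstart"]), int(match["qend"])),
            })
    return hits
```

For a single downloaded file, something like `read_hit_table(open("YAL001C.hhr").read())` would give one dictionary per hit (the file name here is only an example).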

It's been a long while and I have forgotten all about where the gap files are and how to download them. If you could give a simpleton's guide on downloading all of them (batch) that would be helpful

@timlevine
Collaborator

@ilectra
First stab - this needs more work to produce a download or more text to go below this to help users understand the HHR files

I have pasted the text back in here and it's lost the formatting - can you extract the changes anyway from the elongated paragraphs? I think that the "Gaps between the domains in the visualisations" section should be left in and so it needs some more text, but only when we know what the button will look like!
T

Welcome to the HHyeast website.
This website offers the results of remote homology searches of the entire genome of the model budding yeast Saccharomyces cerevisiae. The searches have been carried out using the HHsearch package developed by Johannes Söding and colleagues [1], a tool also known as "HHpred" through its online server [2].
The results show visualisations of the strongest homologies for 100% of 6,713 verified yeast open reading frames (ORFs) in three databases:

  1. PDB (solved structures)
  2. Pfam (curated protein domain families from the European Bioinformatics Institute)
  3. the yeast proteome itself.

Entering the name of an ORF allows you either to “Download file”, which saves the HHpred results (see below), or to choose “Display plot”, which leads to a visualisation that summarises the strong hits to the ORF in all three databases (PDB, Pfam and yeast, minus the ORF itself) in three separate boxes. Note that for these strong hits, similar hits are clustered together and only one hit per cluster is displayed. To see more detail than this, you can delve deeper within “Display plot” by pressing one of the three buttons at the bottom (“Detailed PDB hits” etc.). These show every single hit (multiple per cluster), and provide options to visualise more or fewer hits by re-setting the threshold and the degree of accepted overlap. Once you have chosen new settings for one database, you can apply them to all three by choosing “Go to summary view”.

Data Download

As well as saving images of the domain displays through the controls in each window, you can use “Download file” to download, for each ORF, the file from which the data for the visualisation were extracted (suffix ".hhr", opened as a text file).

Gaps between the domains in the visualisations

HHyeast has discovered many additional hits in the gaps between the domains displayed here [3]. Although these are not in the visualisations, it is clear which ORFs have such hits from the download buttons, which ... XXXX

Job submission

Type a gene identifier, either systematic name or standard name if available. The server will offer valid options. Typically three panels will be displayed, one for each of the databases (PDB, Pfam, Yeast), but a panel will not be displayed if no hits reach the default threshold for display. Each panel can then be examined in detail, allowing re-setting of display thresholds for the other two panels.

NOTES

  1. "Protein homology detection by HMM-HMM comparison". Söding J. Bioinformatics. 2005, updated most recently in "A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core." Zimmermann et al., J Mol Biol. 2018
  2. https://toolkit.tuebingen.mpg.de/#/tools/hhpred
  3. "HHyeast reveals hundreds of new domains in the yeast proteome". Christidi et al., Manuscript in Preparation

@timlevine
Collaborator

@ilectra
here's a further edit - this time leaving the formatting in place by keeping the text in MS Word
Welcome to the HHyeast website.docx

@ilectra
Collaborator Author

ilectra commented Dec 5, 2018

@timlevine , I decided to split the information between the different views, to display the instructions that are relevant to the specific view, instead of explaining everything in the start page. Can you please have a look and let me know what you think?
note: the explanation about the data download and format is what I want to implement, but it's not there yet. It's the next (and final!) item on my list.
