
Proposal of topics and masterclasses + presentation #16

Open
LydiaFrance opened this issue Oct 27, 2021 · 4 comments

Comments

@LydiaFrance
Collaborator

Hi @fedenanni and @malvikasharan

This is the presentation I've put together based on our last meeting. It provides the motivation/overview of what I think the training should cover. I think it would be good to present this to the Crick soon.

https://docs.google.com/presentation/d/1vwRLcj01DiPJ_OWqQzMb8PQxbQgv9dNwyj8jku02UUM/edit?usp=sharing

@fedenanni
Collaborator

fedenanni commented Oct 27, 2021

Hey @LydiaFrance, for me it looks great!! If @malvikasharan is happy, I think you can reach out to the Crick and set up a meeting soon to go through this with them! Great work!

(I’m back on Monday but I saw the notification and was pretty curious :D )

@malvikasharan
Collaborator

This looks great to me Lydia. Please do send it to the full list of recipients. For the specific ask, please tag James B., James F. and Rebecca in your email. I have to add another person to the thread which I will do now, and you can take things from there. Pinging @KirstieJane if she would like to add something.

@malvikasharan
Collaborator

Comment by James Briscoe:

I would aim for fewer than 5-6 distinct classes. I doubt most GLs would commit to that many. One option would be to aim for ~2 core sessions and then additional sessions focused on specific issues. The two core sessions might be along the lines of “Best practices in managing computational biology projects” and “An overview of AI/ML and deep learning in biomedical research for group leaders”.

It will be important that course material is as practical as possible and tailored to biologists, using examples typical of what we do. This is an issue for all computational training (and interdisciplinary training more generally): the content of many computational courses beyond introductory level tends to come from disciplines that have different working practices and use examples that are not directly relevant to what most of us do. (As an example: while there are undoubtedly things we can learn from agile working, biomed research is never going to be the same as software development.)

It would be useful to involve some STP leads from the core facilities that are data heavy (I’m thinking BABS and Imaging) so that the course material is directly aligned with what happens at the Crick. Along the same lines, James Turner is leading on Open Science and reproducibility in the Crick and it will probably be helpful to get his input.

@malvikasharan
Collaborator

Comment by Lydia:

With a new PhD student, which of these training courses are compulsory? I’m trying to get an idea about the role of a group leader in directing their students to training, whether the student has to volunteer, or whether they all go through a basic level of training.
Looking at the in-house training, I can’t access the Crick intranet, so I can’t see what is taught. It would be particularly helpful to see what is in the “Data Science Specialisation with R”, “Programming courses on Tutorialspoint” and “Crick data challenge”.

The training I’m writing is from a top-down perspective and designed for the Group Leaders, but part of that is giving the leaders a view of what their lab members should know. This should therefore feed into the Crick training, so I’ll need to flag up anything that might be missing. For example, I can’t see anything in the training titles about version control/git (the exception is the R Advanced Courses from Jumping Rivers). This will be an important part of the training course I’m building, and if there are no obvious resources for lab members to learn it, then that’s a problem.

It would also be helpful to know about the data management expectations within the Crick, and if there is a standardised data management process.

Response from James Fleming:

None are compulsory, and there is no particular guidance on progression at the moment either: it’s very much a conversation between student and GL. It’s the first focus area we’ve taken away to try to group by level, and then by ‘flow’, i.e. which predecessors you should take.
The Crick data challenge is an event, rather than a training course – it’s designed to pair experts from scientific computing, bioinformatics and computational labs with more ‘wet lab’ scientists with an aim of solving novel problems. Many of these then spawn ongoing projects over time.
Tutorialspoint is available here: https://www.tutorialspoint.com/computer_programming/index.htm
Johns Hopkins Data Science specialisation here: https://www.coursera.org/specializations/jhu-data-science

Agree around the gaps – again, something we are looking at ourselves. Not crowdsourced views yet, but at a glance, the gaps for me are at least the following:
- Software engineering practice – version control, backlog management, unit testing, quality assurance
- Software architecture – principles of design, services, APIs, code documentation
- Software engineering management for teams – agile methodology, source management, sprint management, CI/CD frameworks, deployment management
- Infrastructure – HPC, VMs, Cloud, Containers
- Building and managing databases – choosing the right technology, relational, non-relational, graph, schemas, efficient query design
- TensorFlow/Nextflow – designing and optimising effective pipelines
- Many areas of AI & ML – architecture of networks, designing effective approaches, understanding data suitability, safety & ethics, discoverability/transparency of outcomes
- Data visualisation – Shiny, PowerBI, Tableau etc.
- FAIR Data and effective data management

I’m sure there are many more!

On your question around data management, our policies currently stop at the basics: where to store it, retention policies, expectations around publication etc. There is a lot to do around embedding consistent data approaches in experimentation, metadata management etc. Looping in Karen Ambrose, who leads the Research Data Services team and is leading the programme to address these areas.

@malvikasharan malvikasharan changed the title Proposal Presentation Proposal of topics and masterclasses + presentation Nov 9, 2021