Discussion: should we cover SQL, and if so, in which course? #70

Closed
gvwilson opened this issue May 21, 2019 · 5 comments
Labels
discussion (discussion before a proposal), on hold (Issues to come back to later.)

Comments

@gvwilson
Contributor

  1. Should we explain how to write SQL queries?
  2. If so, should this go in the novice or intermediate course?
gvwilson added the discussion label May 21, 2019
@lwjohnst86
Contributor

I think it would be better suited to the intermediate course. The more we keep the novice course focused on only a few things that are "new" (e.g. how to start thinking in the R language), the better. Plus, at least in my field, we don't really work with SQL databases, and if you do, the dplyr package can connect to the database so you can keep using the dplyr verbs to extract the data (dplyr translates them into SQL commands under the hood).

@ljdursi
Contributor

ljdursi commented May 23, 2019

I feel like being at least somewhat aware of SQL is probably a prerequisite for an RSE. Using pandas / dplyr for these operations on small local data in a couple of tables in the novice material, and then using that as context for SQL queries against bigger or external databases in the intermediate material, might be a good way to go.

@joelostblom
Contributor

I vote yes - intermediate.

My general opinion is that once the concepts of querying datasets are understood in any one of pandas, dplyr, or SQL, transitioning between them is mostly a matter of getting familiar with a new syntax, and it is more important to hammer home the fundamental concepts in one syntax than to teach many different ones. That said, the abundance of SQL (and, not to forget, of people talking about SQL) justifies including an introduction imo. And although I think learners would pick it up quickly once they encounter it in the wild, I see value in introducing it to show how the skills we have taught so far translate well, and to make SQL seem less intimidating.

The pandas docs have a relevant comparison of commands to consider including https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html
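As a minimal sketch of the kind of side-by-side comparison that page makes, the snippet below runs one filter / group / summarise query both as SQL (against an in-memory SQLite database via Python's built-in `sqlite3`) and as a pandas method chain. The `surveys` table and its columns are made up for illustration, and pandas is assumed to be installed.

```python
import sqlite3

import pandas as pd

# Hypothetical example table; the column names are made up for illustration.
surveys = pd.DataFrame({
    "species": ["DM", "DM", "PF", "PF", "DO"],
    "year": [2000, 2001, 2000, 2001, 2001],
    "weight": [40.0, 42.0, 8.0, 9.0, 50.0],
})

# Load the same data into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
surveys.to_sql("surveys", conn, index=False)

# SQL version: filter rows, group, and summarise.
sql = """
SELECT species, AVG(weight) AS mean_weight
FROM surveys
WHERE year = 2001
GROUP BY species
ORDER BY species
"""
sql_result = pd.read_sql_query(sql, conn)

# pandas version: the same filter / group / summarise steps.
pandas_result = (
    surveys[surveys["year"] == 2001]
    .groupby("species", as_index=False)["weight"]
    .mean()
    .rename(columns={"weight": "mean_weight"})
    .sort_values("species")
    .reset_index(drop=True)
)

print(sql_result)
print(pandas_result)
```

Each SQL clause maps onto one pandas step (`WHERE` to a boolean mask, `GROUP BY` plus `AVG` to `groupby(...).mean()`, `ORDER BY` to `sort_values`), which is the "same concepts, new syntax" point made above.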

@ChristinaLK
Contributor

A bit late to this, but I have always liked teaching SQL before R in Data Carpentry because learners get exposed to the basic table operations (selecting columns and rows, grouping, and both row-by-row and summary transformations) in a way that's a little less programmy.

All that to say, I don't think SQL is appropriate in the novice material, but framing the R and Python sections that do this as "these are universal table operations" is important.

@lwjohnst86
Contributor

We'll return to this later (as discussed in the 2019-06-11 meeting) and focus on the current content in the meantime. Closing for now.
