Tutorial #101

Closed
wants to merge 15 commits into from

sameeraxiomine
Contributor

Created the following set of files

  1. tutorial/README.md
  2. SparkRedshiftTutorial.scala
  3. images/loadreadstep.gif
  4. images/loadunloadstep.gif
  5. images/savetoredshift.gif

First Commiting Readme.md, source code and images
Update tutorial/README.md
Update image path
Update images
Fixing the tutorial links
Update Links
Link to the entire program
Fix path
Made more fixes to verbiage
Updates
@codecov-io

Current coverage is 87.16%

Merging #101 into master will decrease coverage by -7.65% as of df65fde

@@            master    #101   diff @@
======================================
  Files           11      11       
  Stmts          444     444       
  Branches       109     109       
  Methods          0       0       
======================================
- Hit            421     387    -34
  Partial          0       0       
- Missed          23      57    +34

Review entire Coverage Diff as of df65fde

Powered by Codecov. Updated on successful CI builds.

*
*/
object SparkRedshiftTutorial {
/*
Contributor

The spacing and indentation seem off in this file. Is it indented using a mixture of spaces and tabs? Please re-indent using only spaces.

Contributor Author

I have fixed this.

@JoshRosen
Contributor

Not to nitpick, but I think PNG might be a better file format for images due to its better quality / compression ratios.

We are ready to interact with Redshift using the spark-redshift library. The skeleton of the program we will be using is shown in Listing 1. The entire `SparkRedshiftTutorial.scala` program can be accessed from [here](SparkRedshiftTutorial.scala). You can also use the Spark REPL to run the lines listed in the program below.

```scala
package com.databricks.spark.redshift.tutorial
```
Contributor

If the full tutorial source is available as a .scala file, do you think we can cut down on some of the skeleton / harness here to make the prose read a bit more smoothly?

Contributor Author

Agreed. Cut down the comments as they are included in the .scala program.

Contributor Author

Also changed the images from GIF to PNG.


Figure 1: UNLOAD action

First the Spark Driver communicates with the Redshift Leader node to obtain the schema of the table (or query) requested. The attribute `override lazy val schema: StructType` in the class `com.databricks.spark.redshift.RedshiftRelation` will obtain the schema on demand by invoking the method `resolveTable` of the class `com.databricks.spark.redshift.JDBCWrapper`. The `JDBCWrapper` class is responsible for fetching the schema from the Redshift Leader.
Contributor

First, comma.

Contributor Author

Fixed
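
For readers following the tutorial paragraph quoted above: the on-demand schema lookup it describes relies on Scala's `lazy val` semantics, where the right-hand side runs on first access and the result is cached afterwards. The sketch below is a minimal, self-contained illustration and is not the spark-redshift source. `StubSchema`, `StubJdbcWrapper`, and `StubRelation` are hypothetical stand-ins for `StructType`, `JDBCWrapper`, and `RedshiftRelation`, and the returned columns are made up.

```scala
// Hypothetical stand-ins; these are NOT the real Spark SQL or spark-redshift classes.
final case class StubField(name: String, dataType: String)
final case class StubSchema(fields: Seq[StubField])

// Plays the role of JDBCWrapper: in the real library this would query the
// Redshift leader node. Here it only counts calls and returns made-up columns.
class StubJdbcWrapper {
  var calls: Int = 0

  def resolveTable(table: String): StubSchema = {
    calls += 1
    StubSchema(Seq(StubField("id", "integer"), StubField("name", "varchar")))
  }
}

// Plays the role of RedshiftRelation.
class StubRelation(wrapper: StubJdbcWrapper, table: String) {
  // lazy val: nothing is fetched at construction time; the first access
  // triggers resolveTable, and later accesses reuse the cached result.
  lazy val schema: StubSchema = wrapper.resolveTable(table)
}

object LazySchemaDemo {
  def main(args: Array[String]): Unit = {
    val wrapper = new StubJdbcWrapper
    val relation = new StubRelation(wrapper, "some_table")

    println(s"calls after construction: ${wrapper.calls}")
    relation.schema // first access performs the lookup
    relation.schema // second access hits the cached value
    println(s"calls after two accesses: ${wrapper.calls}")
  }
}
```

Under these assumptions, constructing the relation does not contact the stub wrapper at all, and reading `schema` twice performs a single lookup.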

@JoshRosen
Contributor

I took one editing pass, but may have additional feedback. I would try reading the current draft aloud to find odd phrasing, typos, and misspellings, then take an editing pass to fix the mechanical issues.

Update based on comments received
Update the tutorial contents
Updates
Code fix
@sameeraxiomine
Contributor Author

I updated the tutorial and source code based on your comments. I also made some changes of my own during a full pass through it.

@JoshRosen
Contributor

I have some additional comments that I'd like to address, but I'm going to take care of them myself by submitting a followup PR. Therefore, I'm going to merge this now. Thanks!

JoshRosen closed this in c72dc89 on Oct 15, 2015
JoshRosen mentioned this pull request on Oct 15, 2015
JoshRosen added a commit that referenced this pull request Oct 17, 2015
This patch is a follow-up to #101 and makes many minor edits in the tutorial text.

/cc sameeraxiomine

Author: Josh Rosen <joshrosen@databricks.com>

Closes #106 from JoshRosen/tutorial-edits.
dorisZ017 pushed a commit to ActionIQ/spark-redshift that referenced this pull request May 18, 2023
3 participants