Tutorial #101
First commit of Readme.md, source code and images
*
*/
object SparkRedshiftTutorial {
/*
The spacing and indentation seem off in this file. Is it indented using a mixture of spaces and tabs? Please re-indent using only spaces.
I have fixed this.
Not to nitpick, but I think PNG might be a better file format for images due to its better quality / compression ratios.
We are ready to interact with Redshift using the spark-redshift library. The skeleton of the program we will be using is shown in Listing 1. The entire `SparkRedshiftTutorial.scala` program can be accessed from [here](SparkRedshiftTutorial.scala). You can also use the Spark REPL to run the lines listed in the program below.

```scala
package com.databricks.spark.redshift.tutorial
```
If the full tutorial source is available as a .scala file, do you think we can cut down on some of the skeleton / harness here to make the prose read a bit more smoothly?
Agreed. I cut down the comments, as they are included in the .scala program.
Also changed the images from GIF to PNG.
Figure 1 : UNLOAD action

First the Spark Driver communicates with the Redshift Leader node to obtain the schema of the table (or query) requested. The attribute `override lazy val schema: StructType` in the class `com.databricks.spark.redshift.RedshiftRelation` will obtain the schema on demand by invoking the method `resolveTable` of the class `com.databricks.spark.redshift.JDBCWrapper`. The `JDBCWrapper` class is responsible for fetching the schema from the Redshift Leader.
First, comma.
Fixed
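For readers of this thread, the on-demand schema lookup described in the tutorial excerpt above can be illustrated with a small sketch. It does not use Spark or spark-redshift: `Field`, `Schema`, `FakeJdbcWrapper`, and `FakeRelation` are hypothetical stand-ins for `StructField`, `StructType`, `JDBCWrapper`, and `RedshiftRelation`, and the simplified `resolveTable(table)` signature and hard-coded columns are assumptions made only for illustration. It shows only the general behavior of a Scala `lazy val`: the lookup runs on first access and the result is cached.

```scala
// Hypothetical stand-ins for Spark's StructField / StructType (not the real classes).
final case class Field(name: String, dataType: String)
final case class Schema(fields: Seq[Field])

// Hypothetical stand-in for JDBCWrapper. It counts lookups instead of
// contacting a Redshift Leader node and returns a hard-coded schema.
class FakeJdbcWrapper {
  var calls: Int = 0
  def resolveTable(table: String): Schema = {
    calls += 1
    Schema(Seq(Field("id", "integer"), Field("name", "string")))
  }
}

// Hypothetical stand-in for RedshiftRelation. The lazy val defers the
// lookup until the schema is first read, then reuses the cached value.
class FakeRelation(wrapper: FakeJdbcWrapper, table: String) {
  lazy val schema: Schema = wrapper.resolveTable(table)
}

object LazySchemaDemo {
  def main(args: Array[String]): Unit = {
    val wrapper = new FakeJdbcWrapper
    val relation = new FakeRelation(wrapper, "example_table")
    println(wrapper.calls) // 0: constructing the relation triggers no lookup
    println(relation.schema.fields.map(_.name).mkString(", ")) // id, name
    val again = relation.schema // second access reuses the cached schema
    println(wrapper.calls) // 1: the lookup ran exactly once
  }
}
```

The counter makes the deferral visible: nothing is fetched when the relation is constructed, which is why the tutorial describes the schema as obtained "on demand".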
I took one editing pass, but may have additional feedback. I would try reading the current draft aloud to find odd phrasing, typos, and misspellings, then take an editing pass to fix the mechanical issues.
I updated the tutorial and source code based on your comments. I made some more of my own as I did a full pass through it.
I have some additional comments that I'd like to address, but I'm going to take care of them myself by submitting a followup PR. Therefore, I'm going to merge this now. Thanks!