Merge pull request #139 from OlivierBlanvillain/publish-docs
Publish docs
OlivierBlanvillain authored May 25, 2017
2 parents 517ca96 + eefecb5 commit 58eef48
Showing 8 changed files with 70 additions and 46 deletions.
13 changes: 7 additions & 6 deletions README.md
@@ -18,12 +18,13 @@ associated channels (e.g. GitHub, Gitter) to be a safe and friendly environment

## Documentation

-* [TypedDataset: Feature Overview](http://olivierblanvillain.github.io/frameless/GettingStarted.html)
-* [Comparing TypedDatasets with Spark's Datasets](http://olivierblanvillain.github.io/frameless/TypedDatasetVsSparkDataset.html)
-* [Typed Encoders in Frameless](http://olivierblanvillain.github.io/frameless/TypedEncoder.html)
-* [Injection: Creating Custom Encoders](http://olivierblanvillain.github.io/frameless/Injection.html)
-* [Using Cats with RDDs](http://olivierblanvillain.github.io/frameless/Cats.html)
-* [Proof of Concept: TypedDataFrame](http://olivierblanvillain.github.io/frameless/TypedDataFrame.html)
+* [TypedDataset: Feature Overview](http://typelevel.org/frameless/FeatureOverview.html)
+* [Comparing TypedDatasets with Spark's Datasets](http://typelevel.org/frameless/TypedDatasetVsSparkDataset.html)
+* [Typed Encoders in Frameless](http://typelevel.org/frameless/TypedEncoder.html)
+* [Injection: Creating Custom Encoders](http://typelevel.org/frameless/Injection.html)
+* [Job\[A\]](http://typelevel.org/frameless/Job.html)
+* [Using Cats with RDDs](http://typelevel.org/frameless/Cats.html)
+* [Proof of Concept: TypedDataFrame](http://typelevel.org/frameless/TypedDataFrame.html)

## Why?

8 changes: 8 additions & 0 deletions build.sbt
@@ -184,3 +184,11 @@ lazy val credentialSettings = Seq(
password <- Option(System.getenv().get("SONATYPE_PASSWORD"))
} yield Credentials("Sonatype Nexus Repository Manager", "oss.sonatype.org", username, password)).toSeq
)

+copyReadme := copyReadmeImpl.value
+lazy val copyReadme = taskKey[Unit]("copy for website generation")
+lazy val copyReadmeImpl = Def.task {
+  val from = baseDirectory.value / "README.md"
+  val to = baseDirectory.value / "docs" / "src" / "main" / "tut" / "README.md"
+  sbt.IO.copy(List((from, to)), overwrite = true, preserveLastModified = true)
+}
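
This task copies the top-level README into the tut sources so that it is picked up by the website build; `scripts/docs-build.sh` below invokes it via `sbt copyReadme tut`.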
43 changes: 22 additions & 21 deletions docs/src/main/tut/Injection.md
@@ -1,4 +1,5 @@
# Injection: Creating Custom Encoders

```tut:invisible
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
@@ -11,41 +12,41 @@ implicit val sqlContext = spark.sqlContext
spark.sparkContext.setLogLevel("WARN")
import spark.implicits._
```
Injection lets us define encoders for types that do not have one by injecting `A` into an encodable type `B`.
This is the definition of the injection typeclass:
```scala
trait Injection[A, B] extends Serializable {
def apply(a: A): B
def invert(b: B): A
}
```
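
The two directions are expected to be mutual inverses. As a minimal sketch (the helper below is ours, for illustration only, not part of frameless):

```scala
// Illustrative property (our own helper, not library code): invert must undo
// apply, so values survive the round-trip through the encodable type B.
def roundTrips[A, B](injection: Injection[A, B])(a: A): Boolean =
  injection.invert(injection(a)) == a
```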

## Example

Let's define a simple case class:

```tut:book
case class Person(age: Int, birthday: java.util.Date)
val people = Seq(Person(42, new java.util.Date))
```

And an instance of a `TypedDataset`:

```tut:book:fail
val personDS = TypedDataset.create(people)
```

Looks like we can't: a `TypedEncoder` instance of `Person` is not available, or more precisely one for `java.util.Date`.
But we can define an injection from `java.util.Date` to an encodable type, like `Long`:

```tut:book
import frameless._
implicit val dateToLongInjection = new Injection[java.util.Date, Long] {
def apply(d: java.util.Date): Long = d.getTime()
def invert(l: Long): java.util.Date = new java.util.Date(l)
}
```

We can be less verbose using the `Injection.apply` function:

@@ -54,37 +55,37 @@ import frameless._
implicit val dateToLongInjection = Injection((_: java.util.Date).getTime(), new java.util.Date((_: Long)))
```

Now we can create our `TypedDataset`:

```tut:book
val personDS = TypedDataset.create(people)
```

## Another example

Let's define a sealed family:

```tut:book
sealed trait Gender
case object Male extends Gender
case object Female extends Gender
case object Other extends Gender
```

And a simple case class:

```tut:book
case class Person(age: Int, gender: Gender)
val people = Seq(Person(42, Male))
```

Again if we try to create a `TypedDataset`, we get a compilation error.

```tut:book:fail
val personDS = TypedDataset.create(people)
```

Let's define an injection instance for `Gender`:

```tut:book
implicit val genderToInt: Injection[Gender, Int] = Injection(
@@ -98,14 +99,14 @@ implicit val genderToInt: Injection[Gender, Int] = Injection(
case 2 => Female
case 3 => Other
})
```
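
As a sanity check, an injection should round-trip its values. The `Male` branch of `apply` is collapsed in this diff view, so the value `1` below is an assumption inferred from the visible `2` and `3` cases:

```scala
// Hedged check: assumes Male <-> 1, by analogy with the visible
// Female <-> 2 and Other <-> 3 branches.
assert(genderToInt(Male) == 1)
assert(genderToInt.invert(1) == Male)
```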

And now we can create our `TypedDataset`:

```tut:book
val personDS = TypedDataset.create(people)
```

```tut:invisible
spark.stop()
```
28 changes: 15 additions & 13 deletions docs/src/main/tut/Job.md
@@ -1,19 +1,19 @@
# Job\[A\]

All operations on `TypedDataset` are lazy. An operation either returns a new
transformed `TypedDataset` or a `Job[A]`, where `A` is the result of running a
non-lazy computation in Spark. `Job` serves several functions:

- Makes all operations on a `TypedDataset` lazy, which makes them more predictable compared to having
  a few operations being lazy and others being strict
- Allows the programmer to make expensive blocking operations explicit
- Allows for Spark jobs to be lazily sequenced using monadic composition via for-comprehension (see the sketch below)
- Provides an obvious place where you can annotate/name your Spark jobs to make it easier
to track different parts of your application in the Spark UI
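
To make the monadic-composition point concrete, here is a minimal sketch of the shape such a type can take (a simplification of ours, not frameless's actual definition):

```scala
// Simplified sketch (not the real frameless source): a Job[A] merely
// describes a Spark action; nothing executes until run() is called.
sealed abstract class Job[A] { self =>
  def run(): A

  def map[B](f: A => B): Job[B] = new Job[B] {
    def run(): B = f(self.run())
  }

  def flatMap[B](f: A => Job[B]): Job[B] = new Job[B] {
    def run(): B = f(self.run()).run()
  }
}
```

Because `map` and `flatMap` are defined, `Job`s compose in for-comprehensions, which is exactly what the next example relies on.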

The toy example below showcases the use of a for-comprehension to explicitly sequence Spark jobs.
First we calculate the size of the `TypedDataset` and then we collect to the driver
exactly 20% of its elements:

```tut:invisible
import org.apache.spark.{SparkConf, SparkContext}
@@ -32,28 +32,28 @@ import spark.implicits._
```tut:book
val ds = TypedDataset.create(1 to 20)
val countAndTakeJob =
  for {
    count <- ds.count()
    sample <- ds.take((count/5).toInt)
  } yield sample
countAndTakeJob.run()
```

The `countAndTakeJob` can either be executed using `run()` (as we show above) or it can
be passed along to other parts of the program to be further composed into more complex sequences
of Spark jobs.

```tut:book
import frameless.Job
def computeMinOfSample(sample: Job[Seq[Int]]): Job[Int] = sample.map(_.min)
val finalJob = computeMinOfSample(countAndTakeJob)
```

Now we can execute this new job by specifying a [group-id][group-id] and a description.
This allows the programmer to see this information on the Spark UI and helps track, say,
performance issues.
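
The invocation itself is collapsed in this diff view; a hedged sketch of what such a call could look like (the combinator names `withGroupId` and `withDescription` are assumptions based on the description above, and the string arguments are invented):

```scala
// Hypothetical chaining (names and strings are ours, not the visible diff):
// tag the job for the Spark UI, then force execution with run().
finalJob.
  withGroupId("samplingJob").
  withDescription("Compute the min of a 20% sample").
  run()
```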

@@ -66,4 +66,6 @@ finalJob.

```tut:invisible
spark.stop()
```

[group-id]: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext@setJobGroup(groupId:String,description:String,interruptOnCancel:Boolean):Unit
Empty file removed: docs/src/main/tut/README.md
4 changes: 2 additions & 2 deletions docs/src/main/tut/SUMMARY.md
@@ -1,7 +1,7 @@
-- [TypedDataset: Feature Overview](GettingStarted.md)
+- [TypedDataset: Feature Overview](FeatureOverview.md)
- [Comparing TypedDatasets with Spark's Datasets](TypedDatasetVsSparkDataset.md)
- [Typed Encoders in Frameless](TypedEncoder.md)
- [Injection: Creating Custom Encoders](Injection.md)
+- [Job\[A\]](Job.md)
- [Using Cats with RDDs](Cats.md)
- [Proof of Concept: TypedDataFrame](TypedDataFrame.md)
4 changes: 3 additions & 1 deletion scripts/docs-build.sh
@@ -2,7 +2,7 @@

set -eux

-sbt tut
+sbt copyReadme tut

gitbook="node_modules/gitbook-cli/bin/gitbook.js"

@@ -13,4 +13,6 @@ fi

$gitbook build docs/target/tut docs/book

+mv docs/book/* .

exit 0
16 changes: 13 additions & 3 deletions scripts/docs-publish.sh
@@ -2,14 +2,24 @@

set -eux

+# Check that the working directory is a git repository and the repository has no outstanding changes.
+git diff-index --quiet HEAD
+
+commit=$(git show -s --format=%h)

git checkout gh-pages

-git checkout master .
+git merge "$commit"

bash scripts/docs-build.sh

git add .

git commit -am "Update book"
git commit -am "Rebuild documentation ($commit)"

echo "git push"
echo "Verify that you didn't break anything:"
echo " $ python -m SimpleHTTPServer 8000"
echo " $ xdg-open http://localhost:8000/"
echo ""
echo "Then push to the gh-pages branch:"
echo " $ git push gh-pages"
