Update to Apache Spark 2.1.1; resolves #43. #82

Merged: 2 commits merged into master from issue-43 on Oct 2, 2017

ruebot commented Oct 2, 2017

Looks like we're good on Altiscale with this.

$ /opt/spark-beta/bin/alti-spark-shell --jars /mnt/ephemeral0/aut/aut-0.9.1-SNAPSHOT-fatjar.jar --conf spark.local.dir=/mnt/ephemeral0/aut/tmp --executor-cores 20 --executor-memory 10240M
/tmp/ruebot-hive-1.2.1-lib.zip: OK
ok - no need to re-generate the same /tmp/ruebot-hive-1.2.1-lib.zip, continuing
mkdir: `/user/ruebot/apps': File exists
put: `/user/ruebot/apps/hive-1.2.1-lib.zip': File exists
/opt/alti-spark-2.1.1 /mnt/ephemeral0/aut
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2017-10-02 18:07:46,110 WARN  org.apache.spark.SparkConf (Logging.scala:logWarning(66)) - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
2017-10-02 18:07:51,680 WARN  org.apache.spark.deploy.yarn.Client (Logging.scala:logWarning(66)) - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Hive history file=/tmp/ruebot/hive_job_log_eafdf2fa-1303-4837-8567-47b74e7fe0ae_711780380.txt
Spark context Web UI available at http://10.252.18.87:45100
Spark context available as 'sc' (master = yarn, app id = application_1506640654827_0337).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

import io.archivesunleashed.spark.matchbox._
import io.archivesunleashed.spark.rdd.RecordRDD._

val r = RecordLoader.loadArchives("/shared/au/wahr/WAHR_womens_march", sc)
  .keepValidPages()
  .map(r => r.getUrl)
  .take(10)

// Exiting paste mode, now interpreting.

import io.archivesunleashed.spark.matchbox._
import io.archivesunleashed.spark.rdd.RecordRDD._
r: Array[String] = Array(http://1988-unreal.tumblr.com/post/156248858195/bangmybox-miley-at-the-womensmarch, http://1000visions.tumblr.com/post/92161694912/1000-visions-of-global-change-alex-tuai, http://1045thecat.iheart.com/articles/trending-104650/madonna-defends-fiery-speech-at-womens-15494262/, http://1027jackfm.iheart.com/articles/inauguration-2017-501436/live-stream-womens-march-on-washington-15490495/, http://1043myfm.iheart.com/onair/lisa-foxx-32262/articles/15/501436/breaking-womens-march-organizers-say-crowd-15490683/, http://100percentfedup.com/gruesome-video-muslim-mob-tears-27-year-old-woman-apart-killing-false-accusation-burning-quran/, http://1454days.com/index.php/2017/01/21/d...

@lintool want to merge if you're good?

codecov bot commented Oct 2, 2017

Codecov Report

Merging #82 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master      #82   +/-   ##
=======================================
  Coverage   44.82%   44.82%           
=======================================
  Files          41       41           
  Lines         821      821           
  Branches      147      147           
=======================================
  Hits          368      368           
  Misses        408      408           
  Partials       45       45

Powered by Codecov. Last update b929a32...5fe9ae7.

ruebot commented Oct 2, 2017

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

import io.archivesunleashed.spark.matchbox._
import io.archivesunleashed.spark.rdd.RecordRDD._

val r = RecordLoader.loadArchives("/shared/au/wahr/WAHR_womens_march", sc)
  .keepValidPages()
  .map(r => ExtractDomain(r.getUrl))
  .countItems()
  .take(10)

// Exiting paste mode, now interpreting.

import io.archivesunleashed.spark.matchbox._
import io.archivesunleashed.spark.rdd.RecordRDD._
r: Array[(String, Int)] = Array((www.instagram.com,66171), (paper.li,17253), (linkis.com,11861), (www.youtube.com,10920), (www.periscope.tv,2592), (www.huffingtonpost.com,1106), (www.nytimes.com,975), (www.buzzfeed.com,940), (myaccount.nytimes.com,744), (www.cnn.com,701))
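
For anyone without cluster access, the shape of the domain-count job above can be approximated in plain Scala, without Spark or the toolkit on the classpath. This is only a rough sketch and not the aut implementation: `extractDomain` and `countItems` below are hypothetical stand-ins, written on the assumption that `ExtractDomain` returns the host part of a URL and that `countItems` yields (item, count) pairs ordered by descending count, which is consistent with the output shown. The sample URLs are made up for illustration.

```scala
import java.net.URL
import scala.util.Try

// Hypothetical stand-in for ExtractDomain (assumption): the URL's host,
// or an empty string when the input cannot be parsed as a URL.
def extractDomain(url: String): String =
  Try(new URL(url).getHost).toOption.filter(_ != null).getOrElse("")

// Hypothetical stand-in for countItems (assumption): count occurrences of
// each item and return (item, count) pairs, most frequent first.
def countItems[T](items: Seq[T]): Seq[(T, Int)] =
  items
    .groupBy(identity)
    .map { case (item, occurrences) => (item, occurrences.size) }
    .toSeq
    .sortBy { case (_, n) => -n }

// Made-up sample input; the real job reads URLs from the web archive records.
val urls = Seq(
  "http://www.instagram.com/p/abc",
  "http://www.instagram.com/p/def",
  "http://www.youtube.com/watch?v=1",
  "not a url"
)

// Drop unparsable entries, then count hosts.
val counts = countItems(urls.map(extractDomain).filter(_.nonEmpty))
counts.foreach { case (domain, n) => println(s"$domain\t$n") }
```

The block can be pasted into any Scala 2.11+ REPL with `:paste`. On the cluster the same two steps run as RDD transformations, so the counting is distributed rather than done in local memory.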

ianmilligan1 merged commit fe5506e into master on Oct 2, 2017
ianmilligan1 deleted the issue-43 branch on October 2, 2017 18:56