-
Notifications
You must be signed in to change notification settings - Fork 0
/
Spark.txt
99 lines (78 loc) · 4.76 KB
/
Spark.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
Spark
官网
http://spark.incubator.apache.org/
1 jupyter中运行spark-submit命令
windows 2019+conda环境,代码单元格运行以下命令
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
!D:\ProgramData\spark-3.2.1-bin-hadoop3.2-scala2.13\bin\spark-submit --conf "spark.pyspark.python=D:\programdata\miniconda3\envs\jupyterpy38\python.exe" D:\ProgramData\spark-3.2.1-bin-hadoop3.2-scala2.13\examples\src\main\python\pi.py 10
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2 在spark上运行Hive
新的推荐用法是 builder.enableHiveSupport()
--------------------------------------------------------------------------------------------------
spark = SparkSession\
.builder\
.master('local[*]')\ # 主机地址
.config("hive.metastore.uris","thrift://127.0.0.1:9083")\ # 元地址,可选
.appName('AppName')\
.enableHiveSupport()\ #启用Hive支持
.getOrCreate()
result = spark.sql("SELECT * FROM xxxxx")
--------------------------------------------------------------------------------------------------
3 WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/D:/ProgramData/spark-3.2.1-bin-hadoop3.2-scala2.13/jars/spark-unsafe_2.13-3.2.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
不影响运行。经查询,一般认为与java版本有关,java11 出现这个,java8 正常。
4 初始化环境
已经逐渐被2 的方式代替
-------------------------------------------------------------------------------------------------------
import pyspark
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("App")
sc = SparkContext()
from pyspark.sql import SQLContext
sqlcontext = SQLContext(sc)
--------------------------------------------------------------------------------------------------------
5 github例子
含java/scala/python/R的代码demo
https://github.com/apache/spark/tree/v3.2.1/examples
在下载的安装包中也可以找到
6 若干文档库
总目录 http://spark.incubator.apache.org/docs/latest/
SQL与DF http://spark.incubator.apache.org/docs/latest/sql-programming-guide.html
RDD编程 http://spark.incubator.apache.org/docs/latest/rdd-programming-guide.html
流 http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html
结构流 http://spark.incubator.apache.org/docs/latest/structured-streaming-programming-guide.html
ml机器学习 http://spark.incubator.apache.org/docs/latest/ml-guide.html
python http://spark.incubator.apache.org/docs/latest/api/python/getting_started/index.html
7 问题:Py4JJavaError: An error occurred while calling o675.showString.
解决:但是重新运行笔记本后无异常
8 分析IIS 等日志
https://blog.csdn.net/weixin_34408717/article/details/93452463
9 java应用中提交spark
https://www.cnblogs.com/javalinux/p/15094833.html
10 python 应用中提交pyspark 代码
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
import subprocess
if __name__=='__main__':
cmd=rf'''"D:\ProgramData\spark-3.2.1-bin-hadoop3.2-scala2.13\bin\spark-submit2.cmd" --conf "spark.pyspark.python=D:\programdata\miniconda3\envs\reportpy38\python.exe" "D:\ProgramData\spark-3.2.1-bin-hadoop3.2-scala2.13\examples\src\main\python\pi.py" 10
'''
print(cmd)
result = subprocess.check_output(cmd)
print(result)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
11 Spark内存资源分配
https://blog.csdn.net/qq_28658621/article/details/122579717
12 Spark JVM参数优化设置及Sparkstreaming优化和反压机制
http://t.zoukankan.com/jiashengmei-p-13746372.html
13 Spark Streaming动态资源分配
https://blog.csdn.net/wangpei1949/article/details/90734304
Spark Streaming资源动态分配和动态控制消费速率
http://t.zoukankan.com/sparkbigdata-p-5544609.html
14 Spark入Hbase的四种方式效率对比
https://www.cnblogs.com/runnerjack/p/10480468.html
15 .NET for Apache Spark
https://docs.microsoft.com/zh-cn/dotnet/spark/what-is-apache-spark-dotnet
https://github.com/dotnet/spark
16 本机启动
启动master
<spark——home>\bin\spark-class org.apache.spark.deploy.master.Master
启动worker
<spark——home>\bin\spark-class org.apache.spark.deploy.worker.Worker spark://{masterUI上显示的 IP}:7077 --cores 2 --memory 2G