Spark SQL Client implements a native ADO.NET connection to a Spark Thrift server, allowing .NET applications to run Spark SQL queries without requiring third-party ODBC drivers.
SparkConnection implements DbConnection and can be used in the same way as any other database connection in ADO.NET:
await using var conn = new SparkConnection("Data Source=https://mydomain.net/path/to/thrift/server; User ID=myusername; Password=pa55w0rd");
conn.Open();
using DbDataReader reader = await conn.ExecuteReaderAsync("SELECT ID, Description FROM Entities");
reader.Read();
var id = reader.GetValue("ID");
var description = reader.GetValue("Description");
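The snippet above reads only the first row. To read every row, a minimal sketch using only the standard ADO.NET base classes might look like the following. It assumes the connection has already been opened and that SparkConnection behaves like any other DbConnection here; the EntityReader class and ReadAllAsync method are hypothetical names used for illustration.

```csharp
using System.Collections.Generic;
using System.Data.Common;
using System.Threading.Tasks;

// Hypothetical helper, not part of the library.
public static class EntityReader
{
    // Reads all rows returned by the query. Expects an already-open connection.
    public static async Task<List<(object Id, object Description)>> ReadAllAsync(DbConnection conn)
    {
        var rows = new List<(object Id, object Description)>();

        await using DbCommand cmd = conn.CreateCommand();
        cmd.CommandText = "SELECT ID, Description FROM Entities";

        await using DbDataReader reader = await cmd.ExecuteReaderAsync();

        // Resolve column positions once instead of looking them up per row.
        int idOrdinal = reader.GetOrdinal("ID");
        int descriptionOrdinal = reader.GetOrdinal("Description");

        // ReadAsync returns false once the result set is exhausted.
        while (await reader.ReadAsync())
        {
            rows.Add((reader.GetValue(idOrdinal), reader.GetValue(descriptionOrdinal)));
        }

        return rows;
    }
}
```

Because the method takes a DbConnection, it could be called with a SparkConnection or any other ADO.NET provider.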
SparkConnection supports the Dapper library, which offers a simpler interface over raw ADO.NET:
await using var conn = new SparkConnection("Data Source=https://mydomain.net/path/to/thrift/server; User ID=myusername; Password=pa55w0rd");
IEnumerable<DapperTests> entities = await conn.QueryAsync<DapperTests>("SELECT ID, Description FROM Entities");
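Dapper maps result columns to properties by name, so the type passed to QueryAsync needs properties matching the selected columns. A rough sketch of such a type and a query helper is shown below; the Entity class, the EntityQueries helper, and the property types are assumptions and should be adjusted to match your table schema.

```csharp
using System.Collections.Generic;
using System.Data.Common;
using System.Threading.Tasks;
using Dapper;

// Hypothetical entity type; property names match the selected column names.
public class Entity
{
    public long ID { get; set; }                    // assumed numeric column
    public string Description { get; set; } = "";   // assumed string column
}

// Hypothetical helper, not part of the library.
public static class EntityQueries
{
    // Dapper's QueryAsync<T> is an extension method on IDbConnection,
    // so any DbConnection (including a SparkConnection) can be passed in.
    public static Task<IEnumerable<Entity>> GetEntitiesAsync(DbConnection conn)
    {
        return conn.QueryAsync<Entity>("SELECT ID, Description FROM Entities");
    }
}
```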
A SparkConnection requires a connection string. The connection string currently supports the following properties:
- Data Source (Required) - The full URL of the Spark server
- User ID (Optional) - The username to use for authentication
- Password (Optional) - The password to use for authentication
If your Spark cluster is running within Databricks, the connection string can be built with the following steps.
To determine the Data Source:
- launch the Databricks workspace
- click on the Clusters icon
- click a cluster
- expand "Advanced Options"
- click the JDBC/ODBC tab
- set the connection string Data Source as https://<server-hostname>/<http-path>
(It should look something like https://adb-1556877622322125.5.azuredatabricks.net/sql/protocolv1/o/1556877622322125/0207-135143-scoot967)
To determine the User ID and Password:
- launch the Databricks workspace
- click on your username and click on user settings
- click "Generate new token"
- fill in the details and click "Generate"
- set the connection string Password to the token value
- set the connection string User ID to the literal value "token"
Your final connection string should look similar to:
Data Source=https://adb-1556877622322125.5.azuredatabricks.net/sql/protocolv1/o/1556877622322125/0207-135143-scoot967; User ID=token; Password=dapi62e4563e092a3a573e034339fbab013d
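The steps above can be sketched as a small helper that assembles the connection string from the values copied out of the Databricks UI. This is only an illustration: the DatabricksConnectionString class and its Build method are hypothetical and not part of the library, and the output simply follows the format shown in this README.

```csharp
using System;

// Hypothetical helper, not part of the library.
public static class DatabricksConnectionString
{
    // serverHostname and httpPath come from the cluster's JDBC/ODBC tab;
    // token is the personal access token generated in user settings.
    public static string Build(string serverHostname, string httpPath, string token)
    {
        if (string.IsNullOrWhiteSpace(serverHostname))
            throw new ArgumentException("Server hostname is required.", nameof(serverHostname));
        if (string.IsNullOrWhiteSpace(httpPath))
            throw new ArgumentException("HTTP path is required.", nameof(httpPath));
        if (string.IsNullOrWhiteSpace(token))
            throw new ArgumentException("Token is required.", nameof(token));

        // Avoid doubled or missing slashes between host and path.
        string host = serverHostname.Trim().TrimEnd('/');
        string path = httpPath.Trim().Trim('/');

        // The User ID is the fixed value "token"; the Password is the token itself.
        return $"Data Source=https://{host}/{path}; User ID=token; Password={token}";
    }
}
```

The result can be passed straight to the SparkConnection constructor. In practice the token would normally be loaded from configuration or a secret store rather than hard-coded.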
An alternative to specifying a User ID and Password is to set the AccessToken property on SparkConnection:
await using var conn = new SparkConnection("Data Source=https://mydomain.net/path/to/thrift/server");
conn.AccessToken = "<JWT token>";
This will be used as a Bearer token for authentication.
Connections to Spark Thrift servers can be made via the Simba Spark JDBC/ODBC drivers. However, installing ODBC drivers for use within .NET can be difficult if you are not in control of the underlying servers. The Spark SQL Client allows connections without requiring additional ODBC drivers.