
Developer Env


Setting up a Developer Env

  • We have only used IntelliJ; other IDEs should also work, since this is a standard sbt project.
  • Import the project into IntelliJ. You should be able to build the project without any changes.
  • To run tests, you need to do the following additional steps.
    • Grant your demo user additional privileges. For example:
-- to run unit tests, add the additional grants
grant drop any table to tpcds;
grant create any function to tpcds;
grant create any procedure to tpcds;
  • In SQL*Plus, connected as the demo user, run the src/test/resources/ddls.sql script. For example:
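A sketch of one way to run it; the connect string depends on your instance, and for a wallet-based ADW connection TNS_ADMIN must point at the unpacked wallet directory:

sqlplus <your_demo_user>/<demo_user_password>@<tnsnames_addressname> @src/test/resources/ddls.sql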
  • Set up src/test/resources/spark-ora-test.properties locally
    • This contains the instance connection information, for example:
# oci is an arbitrary prefix that you reference when you run tests

oci.spark.sql.catalog.oracle.authMethod=ORACLE_WALLET
oci.spark.sql.catalog.oracle.url=jdbc:oracle:thin:@<tnsnames_addressname>
oci.spark.sql.catalog.oracle.user=<your_demo_user>
oci.spark.sql.catalog.oracle.password=<demo_user_password>
oci.spark.sql.catalog.oracle.net.tns_admin=<wallet_location>
  • Create the sparktest oracle user
create user sparktest identified by SparkOnOracle1;
grant CONNECT, RESOURCE, unlimited tablespace to sparktest;
  • Run the ReadPathTestSetup main class. Before running, set oci.spark.sql.catalog.oracle.user=sparktest in spark-ora-test.properties, and run with the -Dspark.oracle.test.db_instance=oci VM option.
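One way to run it from sbt (a sketch: the package name is a placeholder, the class is assumed to live in the sql sub-project's test sources, and a forked run would need the property passed through javaOptions instead):

sbt -Dspark.oracle.test.db_instance=oci "sql/Test/runMain <package>.ReadPathTestSetup"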
  • Don't forget to change oci.spark.sql.catalog.oracle.user back to <your_demo_user> after running ReadPathTestSetup.
  • Create the write test tables, connecting as sparktest
create table unit_test_write(
  c_char_1 char(1),
  c_char_5 char(5),
  c_varchar2_10 varchar2(10),
  c_varchar2_40 varchar2(40),
  c_nchar_1 nchar(1),
  c_nchar_5 nchar(5),
  c_nvarchar2_10 nvarchar2(10),
  c_nvarchar2_40 nvarchar2(40),
  c_byte  number(2),
  c_short number(4),
  c_int number(9),
  c_long number(18),
  c_number number(25),
  c_decimal_scale_5 number(25,5),
  c_decimal_scale_8 number(25,8),
--  c_float binary_float,
--  c_double binary_double,
--  c_binary long,
  c_date date,
  c_timestamp timestamp
--  c_timestamp_with_tz timestamp with time zone,
--  c_timestamp_with_local_tz timestamp with local time zone
);
grant all privileges on unit_test_write  to public;

create table unit_test_write_partitioned(
  c_varchar2_40 varchar2(40),
  c_int number(9),
  state varchar2(2),
  channel varchar2(1)
)
PARTITION BY LIST (state, channel)
(
    PARTITION yearly_west_direct VALUES (('OR','D'),('UT','D'),('WA','D')),
    PARTITION yearly_west_indirect VALUES (('OR','I'),('UT','I'),('WA','I')),
    PARTITION yearly_south_direct VALUES (('AZ','D'),('TX','D'),('GA','D')),
    PARTITION yearly_south_indirect VALUES (('AZ','I'),('TX','I'),('GA','I')),
    PARTITION yearly_east_direct VALUES (('PA','D'), ('NC','D'), ('MA','D')),
    PARTITION yearly_east_indirect VALUES (('PA','I'), ('NC','I'), ('MA','I')),
    PARTITION yearly_north_direct VALUES (('MN','D'),('WI','D'),('MI','D')),
    PARTITION yearly_north_indirect VALUES (('MN','I'),('WI','I'),('MI','I')),
    PARTITION yearly_ny_direct VALUES ('NY','D'),
    PARTITION yearly_ny_indirect VALUES ('NY','I'),
    PARTITION yearly_ca_direct VALUES ('CA','D'),
    PARTITION yearly_ca_indirect VALUES ('CA','I'),
    PARTITION rest VALUES (DEFAULT)
);
grant all privileges on unit_test_write_partitioned  to public;
  • Now you should be able to run the test suite. For any test, set the working directory to <project-root>/sql and set the VM option -Dspark.oracle.test.db_instance=oci.
  • Currently, the UDTTypesTest suite fails on free-tier ADW.
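To run a suite from sbt rather than the IDE, something like this may work (a sketch: the suite name is a placeholder, the working-directory requirement above still applies, and forked test JVMs need the property passed via Test/javaOptions rather than on the sbt command line):

sbt -Dspark.oracle.test.db_instance=oci "sql/Test/testOnly <TestSuiteName>"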

Sub-Projects

  • common: utilities, helpers, and logging code. Examples: timeIt, OraSparkUtils, ...
  • orastuff: code related to Connection Management, Dataset Splits, ResultSet handling,...
  • sql: code related to SQL translation. Major components are:
    • Oracle Catalog
    • Operator Translation
    • Expression Translation
    • Datatype Handling
    • Task co-location and parallelism
    • Dataframe API extensions
  • mllib: extensions related to MLlib
  • packaging projects:
    • spark-extended: creates the spark-oracle-<version>.zip, which can be dropped into almost any Spark deployment.

Build and Package

# to compile and test
sbt clean compile test

# to build the spark-extended package
sbt universal:packageBin

# to build the spark-extended package with Oracle jars included
sbt -DaddOraJars=true universal:packageBin
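With sbt-native-packager defaults, universal:packageBin writes the zip under the packaging sub-project's target/universal directory; the exact path below is an assumption:

# locate the packaged distribution (path assumes sbt-native-packager defaults)
ls spark-extended/target/universal/spark-oracle-*.zip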

Configure

# Oracle Catalog

# Enable Spark Oracle extensions
spark.sql.extensions=org.apache.spark.sql.oracle.SparkSessionExtensions
spark.kryo.registrator=org.apache.spark.sql.connector.catalog.oracle.OraKryoRegistrator

# Enable the Oracle Catalog integration
spark.sql.catalog.oracle=org.apache.spark.sql.connector.catalog.oracle.OracleCatalog

# oracle sql logging, jdbc fetchSize, and query splitting
spark.sql.catalog.oracle.log_and_time_sql.enabled=true
spark.sql.catalog.oracle.log_and_time_sql.log_level=info
spark.sql.catalog.oracle.fetchSize=5000
spark.sql.oracle.enable.querysplitting=false
spark.sql.oracle.querysplit.target=1Mb

# Configure JDBC connection information

# Example for a non-wallet instance
#spark.sql.catalog.oracle.url=jdbc:oracle:thin:@<your_instance_details>
#spark.sql.catalog.oracle.user=tpcds
#spark.sql.catalog.oracle.password=tpcds

# Example config for an ADW instance
#spark.sql.catalog.oracle.authMethod=ORACLE_WALLET
#spark.sql.catalog.oracle.url=jdbc:oracle:thin:@<tns-address>
#spark.sql.catalog.oracle.user=<your_demo_user>
#spark.sql.catalog.oracle.password=<your_passwd>
#spark.sql.catalog.oracle.net.tns_admin=<wallet_location>
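With a configuration like the above in spark-defaults.conf (and the spark-oracle jars on the classpath), Oracle schemas and tables become addressable through the oracle catalog. A minimal sketch, assuming the TPC-DS tables are loaded in the tpcds schema:

# list the Oracle schemas visible through the catalog
spark-sql -e "SHOW NAMESPACES IN oracle"

# query an Oracle table as a Spark catalog table
spark-sql -e "SELECT count(*) FROM oracle.tpcds.store_sales"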