In this lab we will experiment we will explore how to use the JSONB objects within CockroachDB.
- DBeaver
- ./cockroach sql --insecure
- psql
Student, Database, pgurl, adminurl
The lab cluster is configured in Google Clould in a singe zone:
- us-east4-b
https://github.com/glennfawcett/roachcrib
Create a table by importing a CSV file from a cloud storage bucket.
IMPORT TABLE jblob (
id INT PRIMARY KEY,
myblob JSONB
) CSV DATA ('https://raw.githubusercontent.com/cockroachlabs/workshop_labs/master/JSON-optimization/data/raw_test_blob.tsv')
WITH
delimiter = e'\t';
- How many json objects were imported?
- How useful is this within a database?
Create a table with FLATTENED JSONB objects by importing a CSV file from a cloud storage bucket. This CSV file was created by a python3 script to read the JSON file and extract all values into rows.
IMPORT TABLE jflat (
id INT PRIMARY KEY,
myflat JSONB
) CSV DATA ('https://raw.githubusercontent.com/cockroachlabs/workshop_labs/master/JSON-optimization/data/raw_test_flat.tsv')
WITH
delimiter = e'\t';
- How many json objects were imported?
- Create a query that counts the number with the same
c_base_ap_id
.
- Create a query that sums the
r_price
values byc_base_ap_id
showing the TOP 10 sums ofr_price
.
Import more data into the jflat
table:
IMPORT INTO jflat (id, myflat)
CSV DATA (
'https://raw.githubusercontent.com/cockroachlabs/workshop_labs/master/JSON-optimization/data/raw_test_flat2.tsv'
)
WITH
delimiter = e'\t';
- How many json objects are in the table now?
Run the following query:
SELECT id FROM jflat WHERE myflat::JSONB @> '{"c_sattr19": "momjzdfu"}';
- How much can you improve the performance of the above query? Show the query, DDL and amount of improvement.
Run the following query:
select myflat::JSONB->>'c_sattr19' as attr19,
myflat::JSONB->>'r_seat' as seat,
count(*),
sum(CAST(myflat::JSONB->>'r_price' as INT))
from jflat
where myflat::JSONB->>'c_sattr19' like '%mom%'
group by 1,2;
- What is the response time of the above query?
- Does the above query use any indexes?
Tune the above query. You can add Indexes and/or columns to the table. Feel free to create a new table as well and poplulate from the original table.
- How much can you improve the performance of the above query? Show the query, DDL and amount of improvement.
Consider the following query:
SELECT id from jflat where myflat::JSONB @> '{"c_sattr19": "momjzdfu"}';
Is it faster to use an INVERTED INDEX
or create a computed column with an index?
Using the companies.json file, create a table in CockroachDB to include the JSON object. Modify the table to create the most performant queries possible to calculate the following:
-
Top 10 highest aquisition prices by Year and Aquiring Company in USD. Include the aquiring company name, aquisition year, and SUM of the total amount spent that year.
-
Explore multiple methods to improve performance
-
- Raw JSON
-
- JSON with Computed Columns
-
- JSON with Inverted Indexes