Improve load/scan docs
prrao87 committed Feb 5, 2025
1 parent cf3dc11 commit 000a7c5
Showing 2 changed files with 136 additions and 32 deletions.
79 changes: 47 additions & 32 deletions src/content/docs/cypher/query-clauses/load-from.md
@@ -21,14 +21,20 @@ Some example usage for the `LOAD FROM` clause is shown below.

```cypher
LOAD FROM "user.csv" (header = true)
WHERE CAST(age, INT64) > 25
WHERE age > 25
RETURN COUNT(*);
----------------
| COUNT_STAR() |
----------------
| 3 |
----------------
```
This returns:
```
┌──────────────┐
│ COUNT_STAR() │
│ INT64 │
├──────────────┤
│ 3 │
└──────────────┘
```

### Skipping lines

To skip the first 2 lines of the CSV file, you can use the `SKIP` parameter as follows:

@@ -38,20 +44,28 @@ RETURN *;
```

### Create nodes from input file

You can pass the contents of `LOAD FROM` to a `CREATE` clause to create nodes from the input file:

```cypher
// Create a node table
// Scan file and use its contents to create nodes
LOAD FROM "user.csv" (header = true)
CREATE (:User {name: name, age: CAST(age, INT64)});
CREATE (:User {name: name, age: CAST(age AS INT64)});
MATCH (u:User) RETURN u;
----------------------------------------------------
| u |
----------------------------------------------------
| {_ID: 0:0, _LABEL: User, name: Adam, age: 30} |
----------------------------------------------------
| {_ID: 0:1, _LABEL: User, name: Karissa, age: 40} |
----------------------------------------------------
| {_ID: 0:2, _LABEL: User, name: Zhang, age: 50} |
----------------------------------------------------
// Return the nodes we just created
MATCH (u:User) RETURN u.name, u.age;
```
```
┌─────────┬───────┐
│ u.name │ u.age │
│ STRING │ INT64 │
├─────────┼───────┤
│ Adam │ 30 │
│ Karissa │ 40 │
│ Zhang │ 50 │
│ Noura │ 25 │
└─────────┴───────┘
```

### Reorder and subset columns
@@ -64,16 +78,16 @@ input file has more columns specified in a different order.
// Return age column before the name column
LOAD FROM "user.csv" (header = true)
RETURN age, name LIMIT 3;
--------------------
| age | name |
--------------------
| 30 | Adam |
--------------------
| 40 | Karissa |
--------------------
| 50 | Zhang |
--------------------
```
```
┌───────┬─────────┐
│ age │ name │
│ INT64 │ STRING │
├───────┼─────────┤
│ 30 │ Adam │
│ 40 │ Karissa │
│ 50 │ Zhang │
└───────┴─────────┘
```

### Bound variable names and data types
@@ -97,11 +111,12 @@ WHERE name =~ 'Adam*'
RETURN name, age;
```
```
--------------
| name | age |
--------------
| Adam | 30 |
--------------
┌────────┬───────┐
│ name │ age │
│ STRING │ INT64 │
├────────┼───────┤
│ Adam │ 30 │
└────────┴───────┘
```

:::caution[Note]
89 changes: 89 additions & 0 deletions src/content/docs/get-started/scan.mdx
@@ -0,0 +1,89 @@
---
title: "Scan data from various sources"
---

import { LinkCard } from '@astrojs/starlight/components';

Cypher supports the `LOAD FROM` clause to scan data from various sources. Scanning is the act of
reading data from a source, but _not inserting it_ into the database (inserting data into the database
involves "copying"; see the [Import data](/import) section for details).

Scanning data using `LOAD FROM` is useful for inspecting a subset of your data, understanding its
structure, and performing transformations like rearranging columns.
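
To make the difference between scanning and copying concrete, here is a minimal sketch. It assumes a CSV file at `ex_data/user.csv` with `name` and `age` columns; the `User` table and its schema are hypothetical and chosen only to match those columns.

```cypher
// Scan: read the file and return its rows without storing anything
LOAD FROM "ex_data/user.csv" (header = true)
RETURN *;

// Copy: persist the same rows in a node table
// (assumed schema, chosen to match the columns of the file)
CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name));
COPY User FROM "ex_data/user.csv" (header = true);
```

After the scan, the database is unchanged. Only the `COPY` statement inserts data.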

## General usage

Say you have a `user.csv` file that looks like this:

```csv
// user.csv
name,age
Adam,30
Karissa,40
Zhang,50
Noura,25
```

You can scan the file and count the rows it contains:
```cypher
LOAD FROM "ex_data/user.csv" (header = true)
RETURN COUNT(*);
```
This counts the number of rows in the file.
```
┌──────────────┐
│ COUNT_STAR() │
│ INT64 │
├──────────────┤
│ 4 │
└──────────────┘
```

You can also apply filter predicates via the `WHERE` clause, like this:
```cypher
LOAD FROM "ex_data/user.csv" (header = true)
WHERE age > 25
RETURN COUNT(*);
```
The above query counts only the rows where the `age` column is greater than 25.
```
┌──────────────┐
│ COUNT_STAR() │
│ INT64 │
├──────────────┤
│ 3 │
└──────────────┘
```
Note that when scanning from CSV files, all data is parsed as strings, but Kùzu will attempt
to auto-cast the data to the correct type when possible, for proper comparisons with numbers.
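
If you would rather not rely on auto-casting, an explicit cast makes the intended type visible in the query. The sketch below assumes the same `ex_data/user.csv` file and uses the `CAST(... AS INT64)` form:

```cypher
// Explicitly cast age to INT64 before comparing it with a number
LOAD FROM "ex_data/user.csv" (header = true)
WHERE CAST(age AS INT64) > 25
RETURN name, age;
```

This should select the same rows as the auto-cast version, with the conversion spelled out rather than inferred.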

You can reorder the columns by simply returning them in the order you want. The `LIMIT` keyword
can be used to limit the number of rows returned.

```cypher
LOAD FROM "ex_data/user.csv" (header = true)
RETURN age, name LIMIT 2;
```
```
┌───────┬─────────┐
│ age │ name │
│ INT64 │ STRING │
├───────┼─────────┤
│ 30 │ Adam │
│ 40 │ Karissa │
└───────┴─────────┘
```
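
Filtering, column selection, and `LIMIT` can be combined in a single scan. The following is a small sketch that assumes the same `ex_data/user.csv` file; it returns only the `name` column for rows that pass the filter.

```cypher
// Keep rows where age > 25, return only the name column, cap at 2 rows
LOAD FROM "ex_data/user.csv" (header = true)
WHERE age > 25
RETURN name LIMIT 2;
```

Columns that are not listed in `RETURN` (here, `age`) can still be used in the `WHERE` predicate.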

## Explore more features

`LOAD FROM` is a general-purpose clause in Cypher for scanning data from various data sources,
including files and in-memory DataFrames. See the documentation page linked below for details
on how to use the `LOAD FROM` clause.

<LinkCard
title="LOAD (Scan)"
description="Explore the options for the LOAD clause in Kùzu"
href="/cypher/query-clauses/load-from"
/>
