CHANGELOG.md (+3 -3)
@@ -94,7 +94,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Simplified flow statuses within Flow System (no more Queued or Scheduled status)
 - Extended flow start conditions with more debug information for UI needs
 - Simplified flow cancellation API:
-  - Cancelling in Waiting/Running states is accepted, and aborts the flow and it's associated tasks
+  - Cancelling in Waiting/Running states is accepted, and aborts the flow, and it's associated tasks
   - Cancelling in Waiting/Running states also automatically pauses flow configuration

 ## [0.162.1] - 2024-02-28
@@ -142,7 +142,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [0.157.0] - 2024-02-12
 ### Added
-- Complete support for `arm64` architecture (including M-series Apple silicon)
+- Complete support for `arm64` architecture (including M-series Apple Silicon)
 - `kamu-cli` now depends on multi-platform Datafusion, Spark, Flink, and Jupyter images allowing you to run data processing at native CPU speeds
 ### Changed
 - Spark engine is upgraded to latest version of Spark 3.5
@@ -152,7 +152,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [0.156.3] - 2024-02-09
 ### Added
-- Native support for `arm64` architecture (including M-series Apple silicon) in `kamu-cli` and `kamu-engine-datafusion`
+- Native support for `arm64` architecture (including M-series Apple Silicon) in `kamu-cli` and `kamu-engine-datafusion`
 - Note: Flink and Spark engine images still don't provide `arm64` architecture and continue to require QEMU
 ### Changed
 - Flow system scheduling rules improved to respect system-wide throttling setting and take last successful run into account when rescheduling a flow or after a restart
"Note that if you just type `df` in a cell - you will get an error. That's because by default this kernel executes operations in the remore PySpark environment. To access `df` you need to use `%%local` cell command which will execute code in this local Python kernel.\n",
527
+
"Note that if you just type `df` in a cell - you will get an error. That's because by default this kernel executes operations in the remote PySpark environment. To access `df` you need to use `%%local` cell command which will execute code in this local Python kernel.\n",
images/demo/user-home/01 - Kamu Basics (COVID-19 example)/02 - Collaboration.ipynb (+1 -1)
@@ -59,7 +59,7 @@
     "- Decentralized storage like IPFS, Arweave (see next tutorial on \"Web3 Data\")\n",
     "- Or even some old FTP server (see [full list](https://docs.kamu.dev/node/deploy/storage/))\n",
     "\n",
-    "As a reporitory for this demo we will use [**Kamu Node**](https://docs.kamu.dev/node/) - you can think of it as a small server on top of some storage (AWS S3 or Minio in this case) that speaks ORF protocol and provides a bunch of cool additional features, like highly optimized uploads/downloads, dataset search, and even executing remote SQL queries.\n",
+    "As a repository for this demo we will use [**Kamu Node**](https://docs.kamu.dev/node/) - you can think of it as a small server on top of some storage (AWS S3 or Minio in this case) that speaks ORF protocol and provides a bunch of cool additional features, like highly optimized uploads/downloads, dataset search, and even executing remote SQL queries.\n",
resources/cli-reference.md (+5 -5)
@@ -653,7 +653,7 @@ Push local data into a repository

 Use this command to share your new dataset or new data with others. All changes performed by this command are atomic and non-destructive. This command will analyze the state of the dataset at the repository and will only upload data and metadata that wasn't previously seen.

-Similarly to git, if someone else modified the dataset concurrently with you - your push will be rejected and you will have to resolve the conflict.
+Similarly to git, if someone else modified the dataset concurrently with you - your push will be rejected, and you will have to resolve the conflict.

 **Examples:**
@@ -695,7 +695,7 @@ Use this command to rename a dataset in your local workspace. Renaming is safe i

 **Examples:**

-Renaming is often useful when you pull a remote dataset by URL and it gets auto-assigned not the most convenient name:
+Renaming is often useful when you pull a remote dataset by URL, and it gets auto-assigned not the most convenient name:

     kamu pull ipfs://bafy...a0da
     kamu rename bafy...a0da my.dataset
@@ -827,7 +827,7 @@ Manage set of remote aliases associated with datasets
 * `add` — Adds a remote alias to a dataset
 * `delete` — Deletes a remote alias associated with a dataset

-When you pull and push datasets from repositories kamu uses aliases to let you avoid specifying the full remote referente each time. Aliases are usually created the first time you do a push or pull and saved for later. If you have an unusual setup (e.g. pushing to multiple repositories) you can use this command to manage the aliases.
+When you pull and push datasets from repositories kamu uses aliases to let you avoid specifying the full remote reference each time. Aliases are usually created the first time you do a push or pull and saved for later. If you have an unusual setup (e.g. pushing to multiple repositories) you can use this command to manage the aliases.

 **Examples:**
@@ -947,7 +947,7 @@ Executes an SQL query or drops you into an SQL shell

 **Options:**

-* `--url <URL>` — URL of a running JDBC server (e.g jdbc:hive2://example.com:10000)
+* `--url <URL>` — URL of a running JDBC server (e.g. jdbc:hive2://example.com:10000)
 * `-c`, `--command <CMD>` — SQL command to run
 * `--script <FILE>` — SQL script file to execute
 * `--engine <ENG>` — Engine type to use for this SQL session
@@ -1201,7 +1201,7 @@ There are two types of compactions: soft and hard.

 Soft compactions produce new files while leaving the old blocks intact. This allows for faster queries, while still preserving the accurate history of how dataset evolved over time.

-Hard compactions rewrite the history of the dataset as if data was originally written in big batches. They allow to shrink the history of a dataset to just a few blocks, reclaim the space used by old data files, but at the expense of history loss. Hard compactions will rewrite the metadata chain, changing block hashes. Therefore they will **break all downstream datasets** that depend on them.
+Hard compactions rewrite the history of the dataset as if data was originally written in big batches. They allow to shrink the history of a dataset to just a few blocks, reclaim the space used by old data files, but at the expense of history loss. Hard compactions will rewrite the metadata chain, changing block hashes. Therefore, they will **break all downstream datasets** that depend on them.
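Putting the `kamu sql` options documented in the hunk above together, one possible invocation (using the documentation's own example server URL; the query itself is a generic placeholder):

    kamu sql --url jdbc:hive2://example.com:10000 --command 'SELECT 1'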
src/app/cli/src/cli_parser.rs (+5 -5)
@@ -805,7 +805,7 @@ pub fn cli() -> Command {
 r#"
 Use this command to share your new dataset or new data with others. All changes performed by this command are atomic and non-destructive. This command will analyze the state of the dataset at the repository and will only upload data and metadata that wasn't previously seen.

-Similarly to git, if someone else modified the dataset concurrently with you - your push will be rejected and you will have to resolve the conflict.
+Similarly to git, if someone else modified the dataset concurrently with you - your push will be rejected, and you will have to resolve the conflict.

 **Examples:**
@@ -850,7 +850,7 @@ pub fn cli() -> Command {

 **Examples:**

-Renaming is often useful when you pull a remote dataset by URL and it gets auto-assigned not the most convenient name:
+Renaming is often useful when you pull a remote dataset by URL, and it gets auto-assigned not the most convenient name:

     kamu pull ipfs://bafy...a0da
     kamu rename bafy...a0da my.dataset
@@ -1012,7 +1012,7 @@ pub fn cli() -> Command {
 ])
 .after_help(indoc::indoc!(
 r#"
-When you pull and push datasets from repositories kamu uses aliases to let you avoid specifying the full remote referente each time. Aliases are usually created the first time you do a push or pull and saved for later. If you have an unusual setup (e.g. pushing to multiple repositories) you can use this command to manage the aliases.
+When you pull and push datasets from repositories kamu uses aliases to let you avoid specifying the full remote reference each time. Aliases are usually created the first time you do a push or pull and saved for later. If you have an unusual setup (e.g. pushing to multiple repositories) you can use this command to manage the aliases.

 **Examples:**
@@ -1108,7 +1108,7 @@ pub fn cli() -> Command {
 Arg::new("url")
     .long("url")
     .value_name("URL")
-    .help("URL of a running JDBC server (e.g jdbc:hive2://example.com:10000)"),
+    .help("URL of a running JDBC server (e.g. jdbc:hive2://example.com:10000)"),
 Arg::new("command")
     .short('c')
     .long("command")
@@ -1291,7 +1291,7 @@ pub fn cli() -> Command {

 Soft compactions produce new files while leaving the old blocks intact. This allows for faster queries, while still preserving the accurate history of how dataset evolved over time.

-Hard compactions rewrite the history of the dataset as if data was originally written in big batches. They allow to shrink the history of a dataset to just a few blocks, reclaim the space used by old data files, but at the expense of history loss. Hard compactions will rewrite the metadata chain, changing block hashes. Therefore they will **break all downstream datasets** that depend on them.
+Hard compactions rewrite the history of the dataset as if data was originally written in big batches. They allow to shrink the history of a dataset to just a few blocks, reclaim the space used by old data files, but at the expense of history loss. Hard compactions will rewrite the metadata chain, changing block hashes. Therefore, they will **break all downstream datasets** that depend on them.
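For context on the pattern these `cli_parser.rs` hunks touch: help strings are attached to `clap` arguments, and multi-line after-help text goes through `indoc::indoc!` to strip leading indentation. Below is a minimal, self-contained sketch of that pattern; the argument names mirror the diff, but the surrounding assembly is illustrative rather than the file's actual structure:

    use clap::{Arg, Command};

    // Stripped-down sketch of how the `sql` subcommand wires up its `--url`
    // option and after-help text, mirroring the hunks above.
    fn sql_command() -> Command {
        Command::new("sql")
            .about("Executes an SQL query or drops you into an SQL shell")
            .args([
                Arg::new("url")
                    .long("url")
                    .value_name("URL")
                    // The typo fix from the diff: "e.g" -> "e.g."
                    .help("URL of a running JDBC server (e.g. jdbc:hive2://example.com:10000)"),
                Arg::new("command")
                    .short('c')
                    .long("command")
                    .value_name("CMD")
                    .help("SQL command to run"),
            ])
            .after_help(indoc::indoc!(
                r#"
                **Examples:**

                kamu sql --url jdbc:hive2://example.com:10000 -c 'SELECT 1'
                "#
            ))
    }

    fn main() {
        // Print the generated help text to verify the strings render as expected.
        sql_command().print_help().expect("failed to print help");
    }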