One more take at streaming #82

Merged 7 commits on Sep 18, 2024
4 changes: 4 additions & 0 deletions README.md
@@ -131,6 +131,10 @@ includes both reading and writing support.

In simple [perf tests](https://github.com/metosin/jsonista/blob/master/test/jsonista/json_perf_test.clj), tagged JSON is much faster than EDN or Transit.

## Streaming

See [docs/streaming.md](docs/streaming.md).

## Performance

* All standard encoders and decoders are written in Java
72 changes: 72 additions & 0 deletions docs/streaming.md
@@ -0,0 +1,72 @@
# Streaming JSON with Jsonista

## JSON Lines (aka JSONL)

Sometimes you want to store a stream of JSON objects in a file, one object per line. This is common for things like logging.
This format is often called [JSON Lines](https://jsonlines.org/).

### Writing

```clj
(require '[clojure.java.io :as io] 'jsonista.core)
(jsonista.core/write-values (io/output-stream "/tmp/foo.json") [{"foo" 1} {"bar" 1}])
```
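Here is a minimal sketch for checking the JSON Lines layout that `write-values` produces. It writes to an in-memory `java.io.StringWriter` instead of a file. The `StringWriter` target and the `j` alias are illustration choices; `Writer` is one of the targets listed in the `write-values` docstring. Given the `"\n"` root value separator set in `core.clj`, values should be separated by a newline, with no trailing newline after the last value:

```clj
(require '[jsonista.core :as j])

;; Write two values to an in-memory writer to inspect the JSON Lines output.
(def jsonl-out
  (let [w (java.io.StringWriter.)]
    ;; write-values closes the SequenceWriter when done; with the default
    ;; mapper this also closes `w`, which is harmless for a StringWriter.
    (j/write-values w [{"foo" 1} {"bar" 1}])
    (str w)))

;; jsonl-out => "{\"foo\":1}\n{\"bar\":1}"
```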

For actual streaming, use a lazy sequence or an eduction instead of a
vector. For example:

```clj
(jsonista.core/write-values
  (io/output-stream "/tmp/foo.json")
  (eduction (map (fn [i] {:i i})) (range 100)))
```
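The same streaming style can be checked end to end. This sketch uses a temporary file in place of a real log file, which is an illustration choice. It writes 100,000 records from an eduction and then counts them with `read-values` and `reduce`, so neither step should need the whole collection in memory at once:

```clj
(require '[jsonista.core :as j])

;; A temporary file stands in for a real log file (illustration only).
(def jsonl-file
  (doto (java.io.File/createTempFile "jsonista-stream" ".jsonl")
    (.deleteOnExit)))

;; The eduction produces each map on demand while write-values iterates it.
(j/write-values jsonl-file
                (eduction (map (fn [i] {:i i})) (range 100000)))

;; Count the records back one at a time with reduce.
(def record-count
  (reduce (fn [n _] (inc n)) 0 (j/read-values jsonl-file)))

;; record-count => 100000
```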

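Alternatively, you can write one value at a time with `jsonista.core/write-value`. Create the mapper with `:close false` so that each call leaves the writer open: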

```clj
;; :close false keeps `wrt` open between write-value calls
(let [obj-mapper (jsonista.core/object-mapper {:close false})]
  (with-open [out (io/output-stream "/tmp/foo.json")
              wrt (io/writer out)]
    (jsonista.core/write-value wrt {"foo" 1} obj-mapper)
    (.write wrt "\n")
    (jsonista.core/write-value wrt {"bar" 1} obj-mapper)))
```
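A `:close false` mapper also makes an append-only log practical. This sketch uses a hypothetical `append-json-line!` helper, which is not part of jsonista. The helper opens the file in append mode for each record. Without `:close false`, `write-value` would close the `BufferedWriter`, and the following `.write` of the newline would throw an `IOException`:

```clj
(require '[clojure.java.io :as io]
         '[jsonista.core :as j])

;; With :close false, write-value leaves the writer open for the newline.
(def append-mapper (j/object-mapper {:close false}))

(defn append-json-line!
  "Hypothetical helper: appends `value` to file `f` as one JSON Lines record."
  [f value]
  (with-open [w (io/writer f :append true)]
    (j/write-value w value append-mapper)
    (.write w "\n")))

;; A temporary file stands in for a real log file (illustration only).
(def log-file
  (doto (java.io.File/createTempFile "jsonista-append" ".jsonl")
    (.deleteOnExit)))

(append-json-line! log-file {:event "start"})
(append-json-line! log-file {:event "stop"})

;; (slurp log-file) => "{\"event\":\"start\"}\n{\"event\":\"stop\"}\n"
```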

### Reading

```clj
(into [] (jsonista.core/read-values (io/input-stream "/tmp/foo.json")))
```
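`read-values` also accepts a mapper, and its result can feed a transducer directly. This is a minimal sketch that assumes you want keyword keys (`:decode-key-fn true`). It uses a string input for brevity:

```clj
(require '[jsonista.core :as j])

;; Keywordize keys while reading.
(def keyword-mapper (j/object-mapper {:decode-key-fn true}))

(def odd-ns
  ;; read-values returns an IReduceInit, so `into` with a transducer
  ;; consumes it one decoded value at a time.
  (into []
        (comp (map :n) (filter odd?))
        (j/read-values "{\"n\":1}\n{\"n\":2}\n{\"n\":3}" keyword-mapper)))

;; odd-ns => [1 3]
```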

## Top-level array

Sometimes, instead of one value per line, you want a single big JSON
array, but without keeping all of the data in memory at once.

### Writing

Use `jsonista.core/write-values-as-array`, which works just like `jsonista.core/write-values`.
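For example, this sketch writes three maps to an in-memory `StringWriter` and should produce one JSON array. The `StringWriter` is an illustration choice; a file or output stream works the same way:

```clj
(require '[jsonista.core :as j])

(def array-out
  (let [w (java.io.StringWriter.)]
    ;; Same arguments as write-values, but all values go into one JSON array.
    (j/write-values-as-array w (map (fn [i] {:i i}) (range 3)))
    (str w)))

;; array-out => "[{\"i\":0},{\"i\":1},{\"i\":2}]"
```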

### Reading

Use `jsonista.core/read-values`; it detects the format automatically, so the same call reads both a top-level array and JSON Lines.
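Here is a quick sketch of that autodetection, using string inputs for brevity. A top-level array and the same values on separate lines decode to the same elements, because Jackson's `MappingIterator` unwraps a top-level array:

```clj
(require '[jsonista.core :as j])

;; Both inputs yield the elements 1, 2 and 3 one at a time.
(def from-array (into [] (j/read-values "[1,2,3]")))
(def from-lines (into [] (j/read-values "1\n2\n3")))

;; from-array => [1 2 3]
;; from-lines => [1 2 3]
```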

## An array inside an object

Sometimes you need to stream an array that sits inside an object. For this, it's best to drop down to the Jackson [JsonParser API](https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.18.0-rc1/com/fasterxml/jackson/core/JsonParser.html).

```clj
(let [input "{\"foo\": 1, \"bars\": [{\"bar\": 2},{\"bar\": 3}], \"close\": \"end\"}"
      obj-mapper (jsonista.core/object-mapper)]
  (with-open [rdr (java.io.StringReader. input)]
    (let [p (.. obj-mapper getFactory (createParser rdr))]
      ;; position cursor to start of first entry in "bars"
      (.nextToken p) ; START_OBJECT
      (.nextToken p) ; FIELD_NAME "foo"
      (.nextToken p) ; VALUE_NUMBER_INT 1
      (.nextToken p) ; FIELD_NAME "bars"
      (.nextToken p) ; START_ARRAY
      (.nextToken p) ; START_OBJECT
      ;; grab all entries, ignore rest of input
      (doall (iterator-seq (.readValuesAs p Object))))))
```
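Counting tokens ties the code to the exact field order of the input. Here is a sketch of a more tolerant variant. It uses a hypothetical `seek-array-field!` helper, which is not part of jsonista or Jackson. The helper walks the top-level object with `nextToken` and `skipChildren` until it finds the named array. In the sample input, the nested `"bars"` field under `"foo"` is skipped:

```clj
(require '[jsonista.core :as j])
(import '(com.fasterxml.jackson.core JsonParser JsonToken))

(defn seek-array-field!
  "Hypothetical helper (not part of jsonista or Jackson): advances `p` to
  the first element of the array stored under `field-name` in the
  top-level object. Returns true when positioned on a first element,
  false when the field is missing or the array is empty."
  [^JsonParser p ^String field-name]
  (boolean
   (when (= JsonToken/START_OBJECT (.nextToken p))
     (loop []
       (when (= JsonToken/FIELD_NAME (.nextToken p))
         (let [field (.getCurrentName p)
               value (.nextToken p)]
           (if (and (= field-name field) (= JsonToken/START_ARRAY value))
             ;; Step onto the first element; END_ARRAY means it is empty.
             (not= JsonToken/END_ARRAY (.nextToken p))
             ;; Skip this field's value, including any nested object/array.
             (do (.skipChildren p)
                 (recur)))))))))

(def bars
  (let [input "{\"foo\": {\"bars\": 0}, \"bars\": [{\"bar\": 2},{\"bar\": 3}], \"close\": \"end\"}"
        obj-mapper (j/object-mapper)]
    (with-open [rdr (java.io.StringReader. input)
                p (.. obj-mapper getFactory (createParser rdr))]
      (if (seek-array-field! p "bars")
        ;; Same as the token-counting version from here on.
        (doall (iterator-seq (.readValuesAs p Object)))
        []))))

;; bars => ({"bar" 2} {"bar" 3})
```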
162 changes: 157 additions & 5 deletions src/clj/jsonista/core.clj
@@ -63,13 +63,13 @@
(com.fasterxml.jackson.core JsonGenerator$Feature JsonFactory)
(com.fasterxml.jackson.annotation JsonInclude$Include)
(com.fasterxml.jackson.databind
JsonSerializer ObjectMapper
JsonSerializer ObjectMapper SequenceWriter
SerializationFeature DeserializationFeature Module)
(com.fasterxml.jackson.databind.module SimpleModule)
(java.io InputStream Writer File OutputStream DataOutput Reader)
(java.net URL)
(com.fasterxml.jackson.datatype.jsr310 JavaTimeModule)
(java.util List Map Date)
(java.util Iterator List Map Date)
(clojure.lang Keyword Ratio Symbol)))

(defn- ^Module clojure-module
@@ -131,15 +131,16 @@
| `:date-format` | string for custom date formatting. default: `yyyy-MM-dd'T'HH:mm:ss'Z'` |
| `:encode-key-fn` | true to coerce keyword keys to strings, false to leave them as keywords, or a function to provide custom coercion (default: true) |
| `:encoders` | a map of custom encoders where keys should be types and values should be encoder functions |
| `:close` | close OutputStreams & other closeable targets after write-value, write-values and write-values-as-array (default: true) |

Encoder functions take two parameters: the value to be encoded and a
JsonGenerator object. The function should call JsonGenerator methods to emit
the desired JSON.

| Decoding options | |
| ------------------- | -------------------------------------------------------------- |
| `:decode-key-fn` | true to coerce keys to keywords, false to leave them as strings, or a function to provide custom coercion (default: false) |
| `:bigdecimals` | true to decode doubles as BigDecimals (default: false) |"
| `:decode-key-fn` | true to coerce keys to keywords, false to leave them as strings, or a function to provide custom coercion (default: false) |
| `:bigdecimals` | true to decode doubles as BigDecimals (default: false) |"
([] (object-mapper {}))
([options]
(let [factory (:factory options)
@@ -158,7 +159,8 @@
(:strip-nils options) (.setSerializationInclusion JsonInclude$Include/NON_NULL)
(:strip-empties options) (.setSerializationInclusion JsonInclude$Include/NON_EMPTY)
(:do-not-fail-on-empty-beans options) (.disable SerializationFeature/FAIL_ON_EMPTY_BEANS)
(:escape-non-ascii options) (doto (-> .getFactory (.enable JsonGenerator$Feature/ESCAPE_NON_ASCII)))))]
(:escape-non-ascii options) (doto (-> .getFactory (.enable JsonGenerator$Feature/ESCAPE_NON_ASCII)))
(contains? options :close) (.configure JsonGenerator$Feature/AUTO_CLOSE_TARGET (boolean (:close options)))))]
(doseq [module (:modules options)]
(.registerModule mapper module))
(.disable mapper SerializationFeature/WRITE_DATES_AS_TIMESTAMPS)
@@ -212,6 +214,38 @@
(-read-value [this ^ObjectMapper mapper]
(.readValue mapper this ^Class Object)))

(defprotocol ReadValues
(-read-values [this mapper]))

(extend-protocol ReadValues

(Class/forName "[B")
(-read-values [this ^ObjectMapper mapper]
(.readValues (.readerFor mapper ^Class Object) ^bytes this))

nil
(-read-values [_ _])

File
(-read-values [this ^ObjectMapper mapper]
(.readValues (.readerFor mapper ^Class Object) this))

URL
(-read-values [this ^ObjectMapper mapper]
(.readValues (.readerFor mapper ^Class Object) this))

String
(-read-values [this ^ObjectMapper mapper]
(.readValues (.readerFor mapper ^Class Object) this))

Reader
(-read-values [this ^ObjectMapper mapper]
(.readValues (.readerFor mapper ^Class Object) this))

InputStream
(-read-values [this ^ObjectMapper mapper]
(.readValues (.readerFor mapper ^Class Object) this)))

(defprotocol WriteValue
(-write-value [this value mapper]))

@@ -232,6 +266,61 @@
(-write-value [this value ^ObjectMapper mapper]
(.writeValue mapper this value)))

(defprotocol WriteAll
(-write-all [this ^SequenceWriter writer]))

(extend-protocol WriteAll

(Class/forName "[Ljava.lang.Object;")
(-write-all [this ^SequenceWriter w]
(.writeAll w ^"[Ljava.lang.Object;" this))

Iterable
(-write-all [this ^SequenceWriter w]
(.writeAll w this)))

(defprotocol WriteValues
(-write-values [this values mapper])
(-write-values-as-array [this values mapper]))

(defmacro ^:private -write-values*
[method this value mapper]
`(doto ^SequenceWriter
(-write-all
~value
(-> ~mapper
(.writerFor Object)
(.withRootValueSeparator "\n")
(.without SerializationFeature/FLUSH_AFTER_WRITE_VALUE)
(. ~method ~this)))
(.close)))


(extend-protocol WriteValues
File
(-write-values [this value ^ObjectMapper mapper]
(-write-values* writeValues this value mapper))
(-write-values-as-array [this value ^ObjectMapper mapper]
(-write-values* writeValuesAsArray this value mapper))

OutputStream
(-write-values [this value ^ObjectMapper mapper]
(-write-values* writeValues this value mapper))
(-write-values-as-array [this value ^ObjectMapper mapper]
(-write-values* writeValuesAsArray this value mapper))

DataOutput
(-write-values [this value ^ObjectMapper mapper]
(-write-values* writeValues this value mapper))
(-write-values-as-array [this value ^ObjectMapper mapper]
(-write-values* writeValuesAsArray this value mapper))

Writer
(-write-values [this value ^ObjectMapper mapper]
(-write-values* writeValues this value mapper))
(-write-values-as-array [this value ^ObjectMapper mapper]
(-write-values* writeValuesAsArray this value mapper)))

;;
;; public api
;;
@@ -277,3 +366,66 @@
(-write-value to object default-object-mapper))
([to object ^ObjectMapper mapper]
(-write-value to object mapper)))

(defn- wrap-values
[^Iterator iterator]
(when iterator
(reify
Iterable
(iterator [this] iterator)
Iterator
(hasNext [this] (.hasNext iterator))
(next [this] (.next iterator))
(remove [this] (.remove iterator))
clojure.lang.IReduceInit
(reduce [_ f val]
(loop [ret val]
(if (.hasNext iterator)
(let [ret (f ret (.next iterator))]
(if (reduced? ret)
@ret
(recur ret)))
ret)))
clojure.lang.Sequential)))

(defn read-values
"Decodes a sequence of JSON values, returned as an iterator,
from anything that satisfies the [[ReadValues]] protocol.
By default, File, URL, String, Reader, InputStream and byte arrays are supported.

The returned object is an Iterable, Iterator and IReduceInit.
It can be reduced with [[reduce]] and turned into a lazy sequence
with [[iterator-seq]].

To configure, pass in an ObjectMapper created with [[object-mapper]],
see [[object-mapper]] docstring for the available options."
([object]
(wrap-values (-read-values object default-object-mapper)))
([object ^ObjectMapper mapper]
(wrap-values (-read-values object mapper))))

(defn write-values
"Encodes a sequence of values as JSON, separating values with a newline.
By default, `to` can be a File, OutputStream, DataOutput or Writer.

By default, `values` can be an array or an Iterable.

To configure, pass in an ObjectMapper created with [[object-mapper]],
see [[object-mapper]] docstring for the available options."
([to values]
(-write-values to values default-object-mapper))
([to values ^ObjectMapper mapper]
(-write-values to values mapper)))

(defn write-values-as-array
"Encodes a sequence of values as a JSON array.
By default, `to` can be a File, OutputStream, DataOutput or Writer.

By default, `values` can be an array or an Iterable.

To configure, pass in an ObjectMapper created with [[object-mapper]],
see [[object-mapper]] docstring for the available options."
([to values]
(-write-values-as-array to values default-object-mapper))
([to values ^ObjectMapper mapper]
(-write-values-as-array to values mapper)))
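The `read-values` docstring above describes the result as an Iterable, Iterator and IReduceInit. Because `wrap-values` hands out the same underlying iterator every time, the result can be consumed only once. This minimal sketch shows early termination with `reduced`, and what a second pass over the same object sees:

```clj
(require '[jsonista.core :as j])

(def vs (j/read-values "1 2 3 4"))

;; reduce respects `reduced`: stop once a value greater than 2 shows up.
(def first-sum
  (reduce (fn [acc x] (if (> x 2) (reduced acc) (+ acc x))) 0 vs))
;; first-sum => 3

;; A second pass continues from the shared iterator. The 3 was already
;; consumed by the first reduce before it stopped, so only 4 remains.
(def remaining (into [] vs))
;; remaining => [4]
```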