Reduce how much code is generated #745

Open: Marwes wants to merge 34 commits into master

@Marwes Marwes commented Jan 12, 2021

An attempt to replicate the wins in #687 without using unsafe or losing any performance. To achieve this, all commonly duplicated methods have been extracted into less generic methods (generic only over R) which parse up to the visit method and return an enum indicating how to proceed. Specifically, these enums hold the Error themselves instead of being wrapped in a `Result`, since that helps codegen slightly.
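As a rough illustration of the approach described above (not the actual code in this PR), the sketch below splits a sequence deserializer into a prefix that is generic only over the reader and a thin wrapper that is generic over the visitor. The `Read` trait, `SliceRead`, `SeqStart`, `parse_seq_start` and the closure-based visitor are all hypothetical names made up for this example; the real serde_json types differ.

```rust
/// Hypothetical stand-in for serde_json's reader abstraction.
trait Read {
    fn peek(&self) -> Option<u8>;
    fn advance(&mut self);
}

/// Hypothetical slice-backed reader used only for this sketch.
struct SliceRead<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Read for SliceRead<'a> {
    fn peek(&self) -> Option<u8> {
        self.bytes.get(self.pos).copied()
    }
    fn advance(&mut self) {
        self.pos += 1;
    }
}

#[derive(Debug, PartialEq)]
enum Error {
    Eof,
    Unexpected(u8),
}

/// Outcome of the shared prefix. The error is a variant of the enum itself
/// rather than the enum being wrapped in a `Result`.
enum SeqStart {
    Begin,
    Err(Error),
}

/// Generic only over `R`: instantiated once per reader type, no matter how
/// many visitors call it.
fn parse_seq_start<R: Read>(read: &mut R) -> SeqStart {
    // Skip leading whitespace, then expect the opening bracket.
    while let Some(b' ' | b'\n' | b'\t' | b'\r') = read.peek() {
        read.advance();
    }
    match read.peek() {
        Some(b'[') => {
            read.advance();
            SeqStart::Begin
        }
        Some(other) => SeqStart::Err(Error::Unexpected(other)),
        None => SeqStart::Err(Error::Eof),
    }
}

/// Generic over `R` and the visitor `V`: instantiated per visitor, so it is
/// kept as thin as possible.
fn deserialize_seq<R, V, T>(read: &mut R, visit: V) -> Result<T, Error>
where
    R: Read,
    V: FnOnce(&mut R) -> Result<T, Error>,
{
    match parse_seq_start(read) {
        SeqStart::Begin => visit(read),
        SeqStart::Err(err) => Err(err),
    }
}
```

In this sketch, each distinct visitor only duplicates the small `match` in `deserialize_seq`, while the whitespace skipping and bracket check are compiled once per reader type.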

I unfortunately only have access to a laptop that is prone to throttling during benchmarking at the moment, so I don't have reliable measurements, but this seems to give either no difference in performance or a slowdown of a couple of percent, so I'd appreciate it if someone could run this independently. (To get more precise measurements in json-benchmark I hacked in criterion, which can be run with this command: `cargo run --release --features lib-serde,all-files,parse-struct,parse-dom,serde_json,criterion --no-default-features -- --bench`)

cargo llvm-lines --bin json-benchmark --no-default-features --features lib-serde,file-twitter,performance | head -30

Before

  Lines          Copies       Function name
  -----          ------       -------------
  111368 (100%)  1640 (100%)  (TOTAL)
   13186 (11.8%)   43 (2.6%)  <serde_json::de::SeqAccess<R> as serde::de::SeqAccess>::next_element_seed
    9397 (8.4%)    15 (0.9%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct
    5430 (4.9%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    4267 (3.8%)    15 (0.9%)  <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_key_seed
    3939 (3.5%)    39 (2.4%)  <serde_json::ser::Compound<W,F> as serde::ser::SerializeMap>::serialize_value
    3262 (2.9%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Status>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    3151 (2.8%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
    2722 (2.4%)     7 (0.4%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_seq
    2353 (2.1%)    38 (2.3%)  <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_value_seed
    2162 (1.9%)   119 (7.3%)  core::ptr::drop_in_place
    1947 (1.7%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Status>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
    1888 (1.7%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Media>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    1864 (1.7%)    15 (0.9%)  <serde_json::de::MapKey<R> as serde::de::Deserializer>::deserialize_any
    1789 (1.6%)     7 (0.4%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_str
    1626 (1.5%)     6 (0.4%)  serde_json::de::Deserializer<R>::deserialize_number
    1557 (1.4%)    39 (2.4%)  serde::ser::SerializeMap::serialize_entry
    1422 (1.3%)     6 (0.4%)  serde::ser::Serializer::collect_seq
    1330 (1.2%)    25 (1.5%)  core::result::Result<T,E>::map
    1321 (1.2%)    10 (0.6%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_option
    1291 (1.2%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::SearchMetadata>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    1013 (0.9%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Media>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
     988 (0.9%)    28 (1.7%)  <serde::private::de::missing_field::MissingFieldDeserializer<E> as serde::de::Deserializer>::deserialize_any
     946 (0.8%)    38 (2.3%)  serde::private::de::missing_field
     910 (0.8%)     5 (0.3%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
     867 (0.8%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::StatusEntities>::deserialize::__Visitor as serde::de::Visitor>::visit_map
     854 (0.8%)     1 (0.1%)  json_benchmark::copy::twitter::_::<impl serde::ser::Serialize for json_benchmark::copy::twitter::User>::serialize
     817 (0.7%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::UserMention>::deserialize::__Visitor as serde::de::Visitor>::visit_map

After

  Lines         Copies       Function name
  -----         ------       -------------
  90777 (100%)  1617 (100%)  (TOTAL)
   5430 (6.0%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_map
   5118 (5.6%)    43 (2.7%)  <serde_json::de::SeqAccess<R> as serde::de::SeqAccess>::next_element_seed
   4885 (5.4%)    15 (0.9%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct
   3262 (3.6%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Status>::deserialize::__Visitor as serde::de::Visitor>::visit_map
   3151 (3.5%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
   2353 (2.6%)    38 (2.4%)  <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_value_seed
   2301 (2.5%)    39 (2.4%)  <serde_json::ser::Compound<W,F> as serde::ser::SerializeMap>::serialize_value
   2183 (2.4%)     7 (0.4%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_seq
   2162 (2.4%)   119 (7.4%)  core::ptr::drop_in_place
   1947 (2.1%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Status>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
   1888 (2.1%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Media>::deserialize::__Visitor as serde::de::Visitor>::visit_map
   1785 (2.0%)    15 (0.9%)  <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_key_seed
   1744 (1.9%)    15 (0.9%)  <serde_json::de::MapKey<R> as serde::de::Deserializer>::deserialize_any
   1557 (1.7%)    39 (2.4%)  serde::ser::SerializeMap::serialize_entry
   1422 (1.6%)     6 (0.4%)  serde::ser::Serializer::collect_seq
   1299 (1.4%)     7 (0.4%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_str
   1291 (1.4%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::SearchMetadata>::deserialize::__Visitor as serde::de::Visitor>::visit_map
   1013 (1.1%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Media>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
    988 (1.1%)    28 (1.7%)  <serde::private::de::missing_field::MissingFieldDeserializer<E> as serde::de::Deserializer>::deserialize_any
    946 (1.0%)    38 (2.4%)  serde::private::de::missing_field
    910 (1.0%)     5 (0.3%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
    894 (1.0%)     6 (0.4%)  serde_json::de::Deserializer<R>::deserialize_number
    867 (1.0%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::StatusEntities>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    854 (0.9%)     1 (0.1%)  json_benchmark::copy::twitter::_::<impl serde::ser::Serialize for json_benchmark::copy::twitter::User>::serialize
    817 (0.9%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::UserMention>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    769 (0.8%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Url>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    741 (0.8%)    10 (0.6%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_option

@Marwes
Author

Marwes commented Apr 7, 2022

@dtolnay Rebased and re-ran my criterion hack for "json-benchmark". There is some variance from run to run, but it seems like this may actually improve performance (at least, improvements seem more common and larger than any regressions in the variance). Any chance this can get merged?

Gnuplot not found, using plotters backend
parse-dom/data/canada.json
                        time:   [7.1562 ms 7.1770 ms 7.1991 ms]
                        thrpt:  [298.20 MiB/s 299.12 MiB/s 299.99 MiB/s]
                 change:
                        time:   [-0.0684% +0.3025% +0.7077%] (p = 0.11 > 0.05)
                        thrpt:  [-0.7027% -0.3016% +0.0685%]
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

parse-struct/data/canada.json
                        time:   [3.3351 ms 3.3442 ms 3.3533 ms]
                        thrpt:  [640.19 MiB/s 641.94 MiB/s 643.69 MiB/s]
                 change:
                        time:   [-15.564% -15.263% -14.974%] (p = 0.00 < 0.05)
                        thrpt:  [+17.611% +18.012% +18.432%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

parse-dom/data/citm_catalog.json
                        time:   [3.9768 ms 3.9968 ms 4.0209 ms]
                        thrpt:  [409.65 MiB/s 412.12 MiB/s 414.20 MiB/s]
                 change:
                        time:   [+4.1500% +4.8664% +5.7135%] (p = 0.00 < 0.05)
                        thrpt:  [-5.4047% -4.6406% -3.9847%]
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

Benchmarking parse-struct/data/citm_catalog.json: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.3s, enable flat sampling, or reduce sample count to 50.
parse-struct/data/citm_catalog.json
                        time:   [1.5731 ms 1.5777 ms 1.5826 ms]
                        thrpt:  [1.0164 GiB/s 1.0196 GiB/s 1.0226 GiB/s]
                 change:
                        time:   [-2.3439% -1.4440% -0.8393%] (p = 0.00 < 0.05)
                        thrpt:  [+0.8464% +1.4651% +2.4002%]
                        Change within noise threshold.

parse-dom/data/twitter.json
                        time:   [2.0885 ms 2.0933 ms 2.0986 ms]
                        thrpt:  [286.97 MiB/s 287.70 MiB/s 288.37 MiB/s]
                 change:
                        time:   [+0.9558% +1.2591% +1.5778%] (p = 0.00 < 0.05)
                        thrpt:  [-1.5533% -1.2435% -0.9467%]
                        Change within noise threshold.
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) low mild
  6 (6.00%) high mild
  8 (8.00%) high severe

parse-struct/data/twitter.json
                        time:   [846.45 us 848.43 us 850.53 us]
                        thrpt:  [708.10 MiB/s 709.85 MiB/s 711.52 MiB/s]
                 change:
                        time:   [-10.075% -8.8594% -7.6386%] (p = 0.00 < 0.05)
                        thrpt:  [+8.2703% +9.7205% +11.204%]
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  2 (2.00%) low severe
  5 (5.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe
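
For readers comparing the `time` and `thrpt` change rows in the output above: since throughput is bytes divided by time, a relative time change of d corresponds to a relative throughput change of 1 / (1 + d) - 1. A minimal sketch, where the function name `throughput_change` is made up for illustration and is not part of criterion:

```rust
/// Converts a relative change in time (e.g. -0.15263 for -15.263%) into the
/// corresponding relative change in throughput, assuming a fixed input size.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}
```

For example, the -15.263% time estimate for parse-struct/data/canada.json maps to roughly +18.012% throughput, which agrees with the reported figure.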

@Noah-Kennedy

@dtolnay are you able to take a look at this?

@indiv0

indiv0 commented May 15, 2022

👍 this would be great to have, especially in crates that depend on crates with lots and lots of Deserialize types.

@Marwes
Author

Marwes commented Jul 4, 2022

@dtolnay Are you able to take a look at this? Is it something that might be merged at some point?

@Elabajaba

This reduces the overall compile time of the gltf crate by about 50-55% in my testing (and the compilation of the extremely heavy gltf-json part, where all the serde stuff lives, by about 2/3).

Timings with this PR

Timings with the current release version of serde_json

@Elabajaba

I did 5 runs of json-benchmark for both the current master branch and this PR on my home Linux server, with everything I normally have running on it disabled. Setup: 0.00 average load; 3950x CPU, which stayed below 60C the entire time, so no thermal throttling; rust 1.63; sccache disabled. Each run used `rm -rf ./target/release/ && RUSTFLAGS='-C codegen-units=1' cargo run --release --no-default-features --features parse-struct,lib-serde,all-files --timings` instead of `cargo clean`, to preserve the cargo-timings.

For this PR, canada.json was ~8% faster, citm_catalog.json was ~6% slower, and twitter.json was basically the same. Build times were slightly faster as well, though they are dominated by the jemalloc-sys build script. The final json-benchmark bin compile time was reduced by about 1s (from ~8s to ~7s).

Averages

Current 44d9c53

data/canada.json:       538 MB/s
data/citm_catalog.json: 1036 MB/s
data/twitter.json:      754 MB/s
build times:            24.264s

This PR 60e4ac2

data/canada.json:       580 MB/s
data/citm_catalog.json: 974 MB/s
data/twitter.json:      748 MB/s
build times:            23.11s

Differences (PR / Current)

data/canada.json:       107.8067%
data/citm_catalog.json: 94.0154%
data/twitter.json:      99.2042%
build times:            95.2440%
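
The percentages above are simply the PR figure divided by the current figure. A minimal sketch of that arithmetic, with the helper name `percent_of_baseline` made up for illustration:

```rust
/// Expresses the PR's measurement as a percentage of the baseline measurement.
fn percent_of_baseline(pr: f64, current: f64) -> f64 {
    pr / current * 100.0
}
```

Note that the direction differs by row: for the throughput rows (MB/s) a value above 100% means the PR is faster, while for build times (seconds) a value below 100% means the PR is faster.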

@Walther

Walther commented Apr 12, 2023

Kindest little bump - what is the status of this PR?
Is there anything where help would be needed?

@Marwes
Author

Marwes commented Apr 18, 2023

The PR has some conflicts now, but nothing that would be difficult to fix. This is still something I'd like to see merged but in the end it is up to the time and interest of the owner(s).
