Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of indent-region. #47

Merged
merged 1 commit into from
Jul 5, 2023
Merged

Improve performance of indent-region. #47

merged 1 commit into from
Jul 5, 2023

Conversation

taku0
Copy link
Contributor

@taku0 taku0 commented Jun 10, 2023

Add jsonian-indent-region and rewrite jsonian-indent-line to speed up indent-region.

When indenting whole large-json-file.json, jsonian-indent-region takes only 6 seconds while indent-region-line-by-line, the default backend for indent-region, takes 5 minutes.

Background:
I'm author of JSON Par, minor mode for structural editing of JSON, which is based on json-mode. As you know, json-mode is abandoned and I couldn't contact the author, I would like to migrate from json-mode to your jsonian.el. JSON Par overrides indent-region-function and other basic functions to improve performance but it should be in the base mode.

@iwahbe iwahbe self-assigned this Jun 13, 2023
@iwahbe iwahbe self-requested a review June 15, 2023 21:14
@iwahbe iwahbe removed their assignment Jun 15, 2023
Copy link
Owner

@iwahbe iwahbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @taku0. Thanks for the PR. 5 minutes is untenable and 6 seconds seems pretty good. This is definitely an area for improvement for jsonian.

I'm going to keep playing with your code, and I'll do a more through review by the end of the weekend.

From playing with your fork, I see, in addition to the vast speed improvement, an inconsistency.

Given the following (modified) end segment of `large-json-file.json:

                "type": "object",
                "required": [
                    "errors",
                    "id",
                    "location",
                    "msiArmId",
$                     "name",^
                    "provisioningState",
                    "storageAccountArmId",
                    "systemData",
                    "type"
                ]
            }
        }
    }
}

Running jsonian-indent-line with the cursor at $ correctly moves "name", back into line. However, running indent-region with the point at $ and the mark at ^ indents "provisioningState", to match "name",:

                "type": "object",
                "required": [
                    "errors",
                    "id",
                    "location",
                    "msiArmId",
$                     "name",^
                      "provisioningState",
                    "storageAccountArmId",
                    "systemData",
                    "type"
                ]
            }
        }
    }
}

This is obviously not correct. Can you take a look?

(jsonian--forward-to-significant-char)
(current-column))))
(- (max start end) (min start end)))))
(forward-line 0)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be willing to explain why your implementation of jsonian--infer-indentation is faster and the intuition behind how it works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed jsonian--infer-indentation not for performance but for robustness. For example, existing implementation doesn't behave well for the following examples (where ■ is the point):

{
  "a":
    1
  ■
}
[
  [
    [

■1
    ]
  ]
]

The new implementation travels from the innermost parent to the outermost ancestor, inferring the indentation from their first child with conditions:

  • The open bracket must be at the end of the line (excluding spaces).
  • The array/object must not be empty.
  • The first child must be before the point.
  • The inferred indentation must be greater than zero.

If we cannot infer an indentation, inferring from the outermost ancestor without the third condition.

Comment on lines +1501 to +1503
(let ((indent nil)
(origin (point))
(done nil)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be consistent with how we declare variables. parent-position is declared (defaulted to nil), we should do the same with indent and done. This is constant with the rest of the code base.

Suggested change
(let ((indent nil)
(origin (point))
(done nil)
(let (indent
(origin (point))
done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention is that if a variable isn't initialized at declaration but needs to be initialized later, I declare it as a bare identifier. Otherwise, i.e. if a variable is initialized to nil-as-false or nil-as-a-not-found-marker, I explicitly initialize it to nil. This is just a quirk of mine growing up as a Java programmer. I will follow whatever styles you prefer.

Add `jsonian-indent-region' and rewrite `jsonian-indent-line' to speed up
`indent-region'.
@taku0
Copy link
Contributor Author

taku0 commented Jun 16, 2023

This is obviously not correct. Can you take a look?

It's an off-by-one error. Fixed.

@taku0
Copy link
Contributor Author

taku0 commented Jun 18, 2023

Maybe I have to adapt your terminology. The following is my understanding. Is it correct?

Term I use Description Term you use
Level/nesting level Number of ancestors of the item. ?
Indentation How many spaces should we put for the beginning of a line. Indentation level/indent level
Offset/basic offset Indentation per level. Indentation/indent
Member Array element or key-value pair of object. Item/element
Key String before colon in object. Tag/key
Value Array element or something after colon in object. Value

Example:

[
  [
    [
      "abc"
    ]
  ]
]

At string "abc":

  • Level: 3
  • Offset: 2
  • Indentation: 6

@iwahbe iwahbe self-requested a review July 5, 2023 09:38
Copy link
Owner

@iwahbe iwahbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay. I dropped the ball on review.

I have read through the code a couple more times, to the point where I have an at least cursory understanding. As far as I can tell, this is currently an improvement to both accuracy and speed, so I'm going to merge as is.

I'll find some time later to do a cleanup pass to unify naming and comments where my understanding is murky.

Thanks for the PR!

@iwahbe iwahbe merged commit a94c33a into iwahbe:main Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants