implement rolling hyper-log-log algorithm #8068

Merged: 13 commits, Jul 4, 2024

Conversation

knizhnik
Contributor

Problem

See #7466

Summary of changes

Implement the algorithm described in https://hal.science/hal-00465313/document
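For readers unfamiliar with the linked paper, here is a minimal sketch of its central idea: each HyperLogLog register keeps a short list of (timestamp, rank) pairs that could still become the maximum for some future window, so the cardinality of any recent window can be estimated at query time. This is an illustration only, not the code in pgxn/neon/hll.h; the class name `SlidingHLL`, the SHA-256 hash, the register count, and the `max_window` parameter are all assumptions made for the example.

```python
import hashlib
import math


class SlidingHLL:
    """Toy sliding-window HyperLogLog (illustrative names and parameters)."""

    def __init__(self, bits=10, max_window=3600.0):
        self.bits = bits              # number of hash bits used to pick a register
        self.m = 1 << bits            # number of registers
        self.max_window = max_window  # largest window (seconds) that can be queried
        # Per register: (timestamp, rank) pairs, oldest first, ranks strictly decreasing.
        self.lists = [[] for _ in range(self.m)]

    @staticmethod
    def _hash64(item):
        digest = hashlib.sha256(repr(item).encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item, now):
        """Record an access; `now` must not go backwards between calls."""
        h = self._hash64(item)
        rest_bits = 64 - self.bits
        idx = h >> rest_bits                       # top bits select the register
        rest = h & ((1 << rest_bits) - 1)
        rank = rest_bits - rest.bit_length() + 1   # position of the leftmost 1-bit
        entries = self.lists[idx]
        # Drop pairs that have expired or are dominated by the new (newer, >= rank) pair.
        entries[:] = [(t, r) for (t, r) in entries
                      if t >= now - self.max_window and r > rank]
        entries.append((now, rank))

    def estimate(self, now, window):
        """Estimate the number of distinct items seen in the last `window` seconds."""
        cutoff = now - min(window, self.max_window)
        total = 0.0
        zeros = 0
        for entries in self.lists:
            r = max((rk for (t, rk) in entries if t >= cutoff), default=0)
            total += 2.0 ** -r
            if r == 0:
                zeros += 1
        alpha = 0.7213 / (1 + 1.079 / self.m)      # standard constant for m >= 128
        raw = alpha * self.m * self.m / total
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting), as in plain HyperLogLog.
            return self.m * math.log(self.m / zeros)
        return raw
```

With 1024 registers the typical relative error is a few percent. The per-register lists stay short because a new pair evicts every older pair whose rank is not larger.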

A new GUC is added: neon.wss_max_duration, which specifies the size of the sliding window in seconds. The default value is 1 hour.

The working set size within this window can be estimated with the new function approximate_working_set_size_seconds. The old function approximate_working_set_size is preserved for backward compatibility, but its scope is also limited by neon.wss_max_duration.

The version of the Neon extension is changed to 1.4.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@knizhnik knizhnik requested review from a team as code owners June 15, 2024 10:47

github-actions bot commented Jun 15, 2024

3024 tests run: 2909 passed, 0 failed, 115 skipped (full report)


Flaky tests (1)

Postgres 16

  • test_statvfs_pressure_usage: debug

Code coverage* (full report)

  • functions: 32.6% (6932 of 21267 functions)
  • lines: 50.0% (54446 of 108873 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
7df9051 at 2024-07-04T16:39:49.844Z :recycle:

@bayandin bayandin added the run-extra-build-macos When placed on a PR, tells the CI to run a build on macOS. No unit tests are run, though. label Jun 17, 2024
@kelvich kelvich requested a review from save-buffer June 18, 2024 15:44
@sharnoff sharnoff self-requested a review June 18, 2024 16:19
Contributor

@MMeent MMeent left a comment

Some comments on memory efficiency, and which version we install by default.

pgxn/neon/hll.h (outdated, resolved)
pgxn/neon/neon.control (outdated, resolved)
MMeent and others added 2 commits June 20, 2024 10:01
We don't use sliding windows, so we can just drop the historical snapshot
requirement from the implementation, thus removing some tracking overhead.
@knizhnik knizhnik requested a review from MMeent June 20, 2024 07:20
@sharnoff
Member

Some questions, based on discussion from #7466. Not blocking, just trying to get a better understanding:

  1. How do you expect this to be used? e.g. should sql-exporter expose a small number of metrics (maybe just n=1?) based on calling approximate_working_set_size_seconds with different values?
  2. What's the semantics of the new GUC? (can it be changed at runtime? -- not that I think it needs to be, just curious 😄)
  3. Do I understand it correctly that approximate_working_set_size_seconds(N) returns the size estimate for the last N seconds?

@kelvich
Contributor

kelvich commented Jun 25, 2024

todo: add rollback scripts

@knizhnik
Contributor Author

Some questions, based on discussion from #7466. Not blocking, just trying to get a better understanding:

  1. How do you expect this to be used? e.g. should sql-exporter expose a small number of metrics (maybe just n=1?) based on calling approximate_working_set_size_seconds with different values?

Actually, I have no idea. This is why I complained in #7466 that I do not understand how the autoscaler is going to use this working set size estimate for some period of time.

  2. What's the semantics of the new GUC? (can it be changed at runtime? -- not that I think it needs to be, just curious 😄)

There is no GUC any more: I removed it after @MMeent's review. So the window is now unlimited: you can request an estimate of the working set size for any period since compute startup.

  3. Do I understand it correctly that approximate_working_set_size_seconds(N) returns the size estimate for the last N seconds?

Yes
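To make the "last N seconds" semantics concrete, here is an exact reference model of the quantity the function approximates: the number of distinct pages touched in the last N seconds. It is a sketch for illustration only; the access-log format and the name `working_set_size` are assumptions, and the real function returns an estimate rather than an exact count.

```python
def working_set_size(accesses, now, seconds):
    """Exact count of distinct pages accessed in the last `seconds` seconds.

    `accesses` is a hypothetical log of (timestamp, page_id) pairs.
    """
    cutoff = now - seconds
    return len({page for (ts, page) in accesses if ts >= cutoff})


accesses = [(0, "a"), (10, "b"), (20, "a"), (50, "c"), (55, "c")]
print(working_set_size(accesses, now=60, seconds=15))  # 1: only "c"
print(working_set_size(accesses, now=60, seconds=45))  # 2: "a" and "c"
print(working_set_size(accesses, now=60, seconds=60))  # 3: "a", "b", "c"
```

A larger N can never give a smaller exact count, so a caller sampling several windows should expect roughly non-decreasing values, up to estimation error.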

pgxn/neon/file_cache.c (outdated, resolved)
pgxn/neon/neon--1.3--1.4.sql (resolved)
@knizhnik knizhnik requested a review from MMeent July 4, 2024 12:48
pgxn/neon/file_cache.c (outdated, resolved)
@knizhnik knizhnik merged commit 88b13d4 into main Jul 4, 2024
66 checks passed
@knizhnik knizhnik deleted the sliding_hyperloglog branch July 4, 2024 19:04
@Bodobolero Bodobolero added the /release-notes Release notes content label Jul 6, 2024
VladLazar pushed a commit that referenced this pull request Jul 8, 2024
## Problem

See #7466

## Summary of changes

Implement the algorithm described in
https://hal.science/hal-00465313/document

A new GUC is added: `neon.wss_max_duration`, which specifies the size of
the sliding window in seconds. The default value is 1 hour.

The working set size within this window can be estimated with the new
function `approximate_working_set_size_seconds`. The old function
`approximate_working_set_size` is preserved for backward compatibility,
but its scope is also limited by `neon.wss_max_duration`.

Version of Neon extension is changed to 1.4

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Matthias van de Meent <matthias@neon.tech>
Labels
/release-notes Release notes content run-extra-build-macos When placed on a PR, tells the CI to run a build on macOS. No unit tests are run, though.

6 participants