Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SQLite] Add option to execute PRAGMA optimize; on close of a connection #2111

Closed
abonander opened this issue Sep 22, 2022 · 4 comments · Fixed by #2116
Closed

[SQLite] Add option to execute PRAGMA optimize; on close of a connection #2111

abonander opened this issue Sep 22, 2022 · 4 comments · Fixed by #2116
Labels
db:sqlite Related to SQLite E-easy enhancement New feature or request good first issue Good for newcomers

Comments

@abonander
Copy link
Collaborator

abonander commented Sep 22, 2022

While browsing the SQLite manual, I noticed this section on the ANALYZE page: https://www.sqlite.org/lang_analyze.html#recommended_usage_pattern

Applications with long-lived databases that use complex queries should consider running the following commands just prior to closing each database connection:

PRAGMA analysis_limit=400;
PRAGMA optimize;

The optimize pragma is usually a no-op but it will occasionally run ANALYZE if it seems like doing so will be useful to the query planner. The analysis_limit pragma limits the scope of any ANALYZE command that the optimize pragma runs so that it does not consume too many CPU cycles. The constant "400" can be adjusted as needed. Values between 100 and 1000 work well for most applications.

The docs then go on to explain why this is suggested:

Each SQLite database connection records cases when the query planner would benefit from having accurate results of ANALYZE at hand. These records are held in memory and accumulate over the life of a database connection. The PRAGMA optimize command looks at those records and runs ANALYZE on only those tables for which new or updated ANALYZE data seems likely to be useful. In most cases PRAGMA optimize will not run ANALYZE, but it will occasionally do so either for tables that have never before been analyzed, or for tables that have grown significantly since they were last analyzed.

Since the actions of PRAGMA optimize are determined to some extent by prior queries that have been evaluated on the same database connection, it is recommended that PRAGMA optimize be deferred until the database connection is closing and has thus had an opportunity to accumulate as much usage information as possible. It is also reasonable to set a timer to run PRAGMA optimize every few hours, or every few days, for database connections that stay open for a long time.

I could see something like the following added to SqliteConnectOptions:

impl SqliteConnectOptions {
    /// Execute `PRAGMA optimize;` on the SQLite connection before closing.
    ///
    /// The SQLite manual recommends using this for long-lived databases.
    ///
    /// This will collect and store statistics about the layout of data in your tables to help the query planner make better decisions.
    /// Over the connection's lifetime, the query planner will make notes about which tables could use up-to-date statistics so this
    /// command doesn't have to scan the whole database every time. Thus, the best time to execute this is on connection close.
    ///
    /// You may also wish to set the value for [`analyze_limit`][Self::analysis_limit] so database connections close quickly,
    /// even if your database is very large.
    ///
    /// Not enabled by default.
    ///
    /// See [the SQLite manual](https://www.sqlite.org/lang_analyze.html#automatically_running_analyze) for details.
    pub fn optimize_on_close(self, enabled: bool) -> Self { /* ... */ }

    /// Set a soft limit on the number of rows that `ANALYZE` touches per index.
    ///
    /// This also affects `PRAGMA optimize` which is set by [Self::optimize_on_close].
    ///
    /// The value recommended by SQLite is `400`. There is no default.
    /// 
    /// See [the SQLite manual](https://www.sqlite.org/lang_analyze.html#approx) for details.
    pub fn analysis_limit(self, limit: impl Into<Option<u32>>) -> Self { /* ... */ }
}
@abonander abonander added enhancement New feature or request db:sqlite Related to SQLite E-easy labels Sep 22, 2022
@abonander
Copy link
Collaborator Author

It is also reasonable to set a timer to run PRAGMA optimize every few hours, or every few days, for database connections that stay open for a long time.

I don't believe a specific option for this is necessary as this can be done by using a SqlitePool and setting max_lifetime.

@abonander abonander added the good first issue Good for newcomers label Sep 22, 2022
@miles170
Copy link
Contributor

Hello, I would like to try this out. But I have the following questions:

  1. Should we only run PRAGMA optimize when the user explicitly calls close?
  2. If analyze_limit is None, which means we don't have to call PRAGMA analysis_limit = XXX;?

@abonander
Copy link
Collaborator Author

abonander commented Sep 23, 2022

@miles170

  1. Should we only run PRAGMA optimize when the user explicitly calls close?

That one's a hard call. If the connection is dropped without being explicitly closed then either the program is about to exit and the worker thread might get killed before it finishes running, or an error occurred on the connection that may have left it in a bad state, so I would probably say that it should only be run on an explicit close() call.

It might be a good idea to note that in the documentation though.

  1. If analyze_limit is None, which means we don't have to call PRAGMA analysis_limit = XXX;?

Yeah, if it's set to None then don't send it. Because it affects explicit ANALYZE calls as well, the user might want those to run completely and not exit early.

It makes me wonder if we should have a separate option that sets analysis_limit just before sending PRAGMA optimize on-close so it doesn't affect explicit ANALYZE calls. Maybe that should just be an argument to optimize_on_close?

Also, I didn't notice that the pragma is actually called analysis_limit so to avoid confusion please use the same name for the method on SqliteConnectOptions instead of analyze_limit (I've edited the issue to reflect this).

@Garmelon
Copy link

Garmelon commented Aug 4, 2023

It seems like this issue should be closed since the linked PR resolves it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
db:sqlite Related to SQLite E-easy enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants