Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[management] Auto update geolite #2297

Merged
merged 19 commits into from
Sep 9, 2024

Conversation

benniekiss
Copy link
Contributor

@benniekiss benniekiss commented Jul 21, 2024

Describe your changes

Introduces automated updates for GeoLite databases, eliminating the need to manually run the download-geolite2.sh script. Key changes include functions to fetch and update database files based on versioned filenames, auto-download logic when outdated files are detected, and a new --disable-geolite-update flag.

By default, this flag is set to true, disabling automatic updates. Users who want to enable auto-updates can do so by setting --disable-geolite-update=false. Additionally, obsolete scripts and reload functions have been removed.

This update simplifies version tracking and ensures database consistency without manual intervention while offering flexibility for self-hosted users.

Issue ticket number and link

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)
  • Extended the README / documentation, if necessary

@mlsmaycon mlsmaycon requested a review from bcmmbaga July 22, 2024 13:03
@benniekiss benniekiss force-pushed the auto_update_geolite branch from 547018f to 446f67f Compare July 23, 2024 14:32
@benniekiss
Copy link
Contributor Author

I can go ahead and refactor the switch...case... statement into two separate functions if that's necessary for the SonarCloud analysis

@benniekiss benniekiss force-pushed the auto_update_geolite branch from 446f67f to 2427f1b Compare July 27, 2024 14:10
@benniekiss benniekiss force-pushed the auto_update_geolite branch 2 times, most recently from 847e88e to 6b49362 Compare August 6, 2024 15:04
@bcmmbaga
Copy link
Contributor

bcmmbaga commented Aug 8, 2024

Hi @benniekiss, Thank you for your valuable contribution!.

How does it handle automatic geolite db's updates? For instance, I'm currently running the 20240305 db's, and when updates are released, will it upgrade the current GeoLite db's?

@benniekiss
Copy link
Contributor Author

Hi @benniekiss, Thank you for your valuable contribution!.

How does it handle automatic geolite db's updates? For instance, I'm currently running the 20240305 db's, and when updates are released, will it upgrade the current GeoLite db's?

It just downloads the new files and recreates them with the fresh data. The way it decides to download the new files is whether or not the files posted at https://docs.netbird.io/selfhosted/geo-support#downloading-the-databases have changed filenames. None of the logic around database creation has changed -- I really only changed the filename parsing and the heuristics around deciding when to recreate the databases.

Regarding whether it will auto-download when updates are released, I am not 100% sure. However, I have tested by naming the Geolite db's something like GeoLite2-City_20200101.mmdb, starting the management service, and seeing if it will pull the current databases -- and it does successfully download. It does not download if the database filenames match. I've included logging as well that makes this apparent to users.

I noticed that both the geonames.db and Geolite2-Cities.mmdb filenames were hardcoded, and the function getDatabaseFileName always returned download.SUFFIX as the filename since it was parsing the url, not the headers. This meant that it was difficult to tell what version the current geolite database was for admins.

Beyond the auto-updating feature, the new functions name the databases based on the filenames parsed from the Content-Disposition header, not the url, so it's easier to tell what version the database is and whether it is out of date.

The reason I put this together was because I was reviewing the documentation and saw the need to manually run download-geolite2.sh whenever an update was pushed; and I also noticed that outdated geolite data was the cause of some issues users had reported. This should make it easier to diagnose those issues, and also removes the need for users to manually manage the databases.

Copy link
Contributor

@bcmmbaga bcmmbaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add an environment flag to allow skipping the auto-update? Since updates occur frequently (1-2 times a week), this could impact self-hosted users with restricted download speeds.

Additionally, we can remove the download-geolite2.sh script, as it's no longer needed. We should also remove the reloader that checks for changes to geodatabase files and reloads them, as this functionality is no longer necessary with automatic updates.

func (gl *Geolocation) reloader(ctx context.Context) {
for {
select {
case <-gl.stopCh:
return
case <-time.After(gl.reloadCheckInterval):
if err := gl.locationDB.reload(ctx); err != nil {
log.WithContext(ctx).Errorf("geonames db reload failed: %s", err)
}
newSha256sum1, err := calculateFileSHA256(gl.mmdbPath)
if err != nil {
log.WithContext(ctx).Errorf("failed to calculate sha256 sum for '%s': %s", gl.mmdbPath, err)
continue
}
if !bytes.Equal(gl.sha256sum, newSha256sum1) {
// we check sum twice just to avoid possible case when we reload during update of the file
// considering the frequency of file update (few times a week) checking sum twice should be enough
time.Sleep(50 * time.Millisecond)
newSha256sum2, err := calculateFileSHA256(gl.mmdbPath)
if err != nil {
log.WithContext(ctx).Errorf("failed to calculate sha256 sum for '%s': %s", gl.mmdbPath, err)
continue
}
if !bytes.Equal(newSha256sum1, newSha256sum2) {
log.WithContext(ctx).Errorf("sha256 sum changed during reloading of '%s'", gl.mmdbPath)
continue
}
err = gl.reload(ctx, newSha256sum2)
if err != nil {
log.WithContext(ctx).Errorf("mmdb reload failed: %s", err)
}
} else {
log.WithContext(ctx).Tracef("No changes in '%s', no need to reload. Next check is in %.0f seconds.",
gl.mmdbPath, gl.reloadCheckInterval.Seconds())
}
}
}
}

@benniekiss
Copy link
Contributor Author

How should I go about implementing a disable flag? Since the check only happens when the management service is started, I could add a flag, --disable-geolite-update, so that if the geolite database doesnt exist, it is downloaded, and then on subsequent restarts, no further updates are done and only the existing database is loaded.

@bcmmbaga
Copy link
Contributor

You’re on the right track with the --disable-geolite-update flag. I suggest implementing it so that the database files are only downloaded the first time if they do not already exist. The key behavior we want to control with this flag is the check for outdated local databases.

If the flag is set, the system should skip the process of checking whether the local databases are outdated by comparing the names from the download URLs. This means that no further updates are attempted unless the user explicitly opts to check for updates on every restart of the management service.

@benniekiss benniekiss force-pushed the auto_update_geolite branch from 6b49362 to c973006 Compare August 17, 2024 19:48
@benniekiss
Copy link
Contributor Author

benniekiss commented Aug 17, 2024

Had to do some refactoring to allow disabling the auto-update feature, but I think this is a little cleaner approach.

I consolidated getGeoNamesDBFilename and getMMDBFilename into an exported function, GetMaxMindFilenames, which accepts two parameters: a dataDir string and an autoUpdate bool. When autoUpdate is false, it skips the URL name lookup and goes straight to finding an existing database file; if it cannot find a database with the new naming scheme, it defaults to the old database names, geonames.db and GeoLite2-City.mmdb. This should allow current deployments to disable auto-updating the database and essentially maintain the same functionality.

I added a unit test for GetMaxMindFilenames, which I tested locally and was passing.

Finally, I changed the .mmdb naming scheme from GeoLite2-City_{DATE}.mmdb to GeoLite2-City-maxmind_{DATE}.mmdb for clarity.

If users want to migrate to the new auto-update functionality, they have to manually remove the old databases.

Otherwise, once I remove the download-geolite2.sh and the reloader function, all that's left is updating the documentation.

@benniekiss
Copy link
Contributor Author

Before I remove the reloader functions and download-geolite2.sh, should these maybe be kept around for a while for a deprecation period? Users might still have a need for the download-geolite2.sh script if disabling auto-update.

It would be a little more work, but I could also refactor the reloader functions so that it can periodically poll the url for new database updates if enabled, that way the management service doesn't have to be restarted to get new database updates

@bcmmbaga
Copy link
Contributor

Before I remove the reloader functions and download-geolite2.sh, should these maybe be kept around for a while for a deprecation period? Users might still have a need for the download-geolite2.sh script if disabling auto-update.

It would be a little more work, but I could also refactor the reloader functions so that it can periodically poll the url for new database updates if enabled, that way the management service doesn't have to be restarted to get new database updates

The database names have changed from the previous ones. We need to either update the documentation for manual updates or adjust the script to use the new names. Since we have an auto-update option that triggers on management restart, updating the script might be redundant. We should remove them.

@benniekiss benniekiss force-pushed the auto_update_geolite branch from 79b6d36 to 398eff9 Compare August 20, 2024 21:40
@benniekiss benniekiss force-pushed the auto_update_geolite branch from 197e709 to 37a2f67 Compare August 23, 2024 10:49
@bcmmbaga
Copy link
Contributor

@benniekiss can you fix test TestGeolite_Name_Lookup failing on windows. Logs

@benniekiss
Copy link
Contributor Author

@benniekiss can you fix test TestGeolite_Name_Lookup failing on windows. Logs

could you share the logs again? I can't access them.

I think this is because I'm using the path module to parse filenames when I should be using path.filepath so that it targets os-specific file separators

@bcmmbaga
Copy link
Contributor

@benniekiss can you fix test TestGeolite_Name_Lookup failing on windows. Logs

could you share the logs again? I can't access them.

I think this is because I'm using the path module to parse filenames when I should be using path.filepath so that it targets os-specific file separators

Here is the updated link: logs

@benniekiss
Copy link
Contributor Author

I think this should fix it I think. I don't have a windows environment to test locally, though.

bcmmbaga
bcmmbaga previously approved these changes Aug 26, 2024
@bcmmbaga bcmmbaga changed the title [Feature] Auto update geolite [management] Auto update geolite Aug 26, 2024
@bcmmbaga bcmmbaga self-requested a review August 26, 2024 14:04
@bcmmbaga bcmmbaga dismissed their stale review August 26, 2024 14:08

If the auto update is disabled, the management download the files using previous format (GeoLite2-City.mmdb and geonames.db) instead of the new format (GeoLite2-City-geonames_20240305.db and GeoLite2-City-maxmind_20240305.mmdb)

@bcmmbaga bcmmbaga force-pushed the auto_update_geolite branch from 731b9b5 to 7c59902 Compare August 26, 2024 19:20
bcmmbaga
bcmmbaga previously approved these changes Aug 26, 2024
@benniekiss
Copy link
Contributor Author

Do you want me to rebase onto main, or anything else before merging?

I can create a PR updating the docs here, but I was not sure if there were any other places where documentation should be updated.

@bcmmbaga
Copy link
Contributor

Do you want me to rebase onto main, or anything else before merging?

I can create a PR updating the docs here, but I was not sure if there were any other places where documentation should be updated.

We're good to merge this PR without a rebase. For the documentation, it would be great to update the page since we’ve moved away from manual downloads to using the auto-download flag.

# Conflicts:
#	.github/workflows/test-infrastructure-files.yml
Copy link

sonarqubecloud bot commented Sep 8, 2024

@mlsmaycon mlsmaycon merged commit 12c3631 into netbirdio:main Sep 9, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants