Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrade to maven-indexer 7.0.1 and improve index downloads #4999

Merged
merged 2 commits into from
May 5, 2023

Conversation

mbien
Copy link
Member

@mbien mbien commented Nov 22, 2022

upgrade to maven-indexer 7.0.1. (NetBeans entered the modern era! :-) )

  • module requires now jdk 11+
  • apache lucene 9.5.0 with memory mapped index
  • uses panama on JDK 19 (PR1, PR2) (JDK 20 PR)
    look out for: INFO [org.apache.lucene.store.MemorySegmentIndexInputProvider]: Using MemorySegmentIndexInput with Java 19; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
  • multi-threaded remote index extraction (at least twice as fast as ST, disabled by default)
  • removed hack which backported a bugfix
  • no EOL libs anymore in this module
  • remote index download permission UI
  • limited concurrent indexing extracted and already integrated in NB 17

remote index (maven central) extraction takes ~6 minutes (post change) on my system (4 cores, hyperthreading off).
local .m2 indexing speed is about the same

benchmark post-PR (old 4 core intel i7 and NVMe, JDK 19):

4 threads:
INFO [org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl]: Indexing of central took 383.24 s.
index folder: 6.4GB

3 threads:
INFO [org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl]: Indexing of central took 453.92 s.
index folder: 6,2 GB

2 threads:
INFO [org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl]: Indexing of central took 557.24 s.
index folder: 5,9 GB

1 thread:
INFO [org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl]: Indexing of central took 916.25 s.
index folder: 5,6 GB

thread count is capped to 4 since it incurs a on-disk file size penalty, furthermore the parallel extraction is always followed by a single threaded merge, which means more threads would have diminishing returns and move the bottle neck to the ST merge phase.

NB 17 on JDK 19 as comparison:

INFO [org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl]: Indexing of central took 1118.49 s.
index folder: 5,8 GB

note: tmp files aren't cleaned up properly atm. This might fill up your tmp during testing. (edit: fixed via apache/maven-indexer#273) fixed in 7.0.1

https://issues.apache.org/jira/browse/NETBEANS-1268

UI:
repo index download permission notifications:

index-permission-bubble_v2
index-permissions-notificaiton

Options (permission table + multi threading check box):
indexer-options

@mbien mbien added Upgrade Library Library (Dependency) Upgrade Maven [ci] enable "build tools" tests labels Nov 22, 2022
@mbien mbien force-pushed the indexer7 branch 3 times, most recently from b9c9ff2 to 6c8c9c9 Compare November 22, 2022 07:57
@mbien mbien added the ci:all-tests [ci] enable all tests label Jan 13, 2023
@mbien mbien force-pushed the indexer7 branch 2 times, most recently from 4837a0b to c7b719f Compare January 13, 2023 09:27
@mbien mbien added the ci:dev-build [ci] produce a dev-build zip artifact (7 days expiration, see link on workflow summary page) label Feb 27, 2023
@mbien mbien changed the title upgrade to maven-indexer 7.0.0 upgrade to maven-indexer 7.0.1 Feb 27, 2023
@mbien
Copy link
Member Author

mbien commented Feb 27, 2023

bump to maven-indexer 7.0.1

@mbien
Copy link
Member Author

mbien commented Feb 27, 2023

thanks to @cstamas for resolving the silo cleanup issue - works great!

@mbien
Copy link
Member Author

mbien commented Mar 17, 2023

draft: apache/maven-indexer#302 would give us more control over the temp storage location + give us the option to filter the index

some tests (time filter on y axis, field filter on x):

full: 5.6 -> 2.8 GB
2y: 2.6  -> 1.4 GB
1y: 1.4 -> 0,8 GB

mbien added a commit to mbien/netbeans that referenced this pull request Apr 19, 2023
 - disable CV tests on JDK 8
 - move some more jobs to JDK 11

this configuration would allow us to move new modules like rust/go/hcl
to JDK 11 and to upgrade exiting modules, like the maven-indexer apache#4999.
mbien added a commit to mbien/netbeans that referenced this pull request Apr 22, 2023
 - disable CV tests on JDK 8
 - move some more jobs to JDK 11

this configuration would allow us to move new modules like rust/go/hcl
to JDK 11 and to upgrade exiting modules, like the maven-indexer apache#4999.
mbien added a commit to mbien/netbeans that referenced this pull request Apr 22, 2023
 - disable CV tests on JDK 8
 - move some more jobs to JDK 11

this configuration would allow us to move new modules like rust/go/hcl
to JDK 11 and to upgrade exiting modules, like the maven-indexer apache#4999.
@mbien mbien requested a review from lkishalmi April 28, 2023 07:08
@mbien mbien added the Upgrade JDK Upgrade to the JDK requirements of a module. label Apr 28, 2023
@mbien mbien changed the title upgrade to maven-indexer 7.0.1 upgrade to maven-indexer 7.0.1 and improve index downloads Apr 29, 2023
@vieiro
Copy link
Contributor

vieiro commented Apr 30, 2023

Hi @mbien,

I have no remote URLs to download from:

imagen

Is this as expected? How do I add one?

On other news, here's some stuff from my stdout (question: are we expected to System.out all artifacts in the local repo?)

...
     [exec] [Maven Repo Index Transfer/Scan] INFO org.eclipse.aether.internal.impl.DefaultArtifactResolver - Artifact org.jboss.shrinkwrap.descriptors:shrinkwrap-descriptors-bom:pom:2.0.0 is present in the local repository, but cached from a remote repository ID that is unavailable in current build context, verifying that is downloadable from [central (https://repo.maven.apache.org/maven2, default, releases)]
     [exec] [Maven Repo Index Transfer/Scan] INFO org.eclipse.aether.internal.impl.DefaultArtifactResolver - Artifact org.jboss.shrinkwrap.resolver:shrinkwrap-resolver-bom:pom:3.1.4 is present in the local repository, but cached from a remote repository ID that is unavailable in current build context, verifying that is downloadable from [central (https://repo.maven.apache.org/maven2/, default, releases+snapshots)]
     [exec] [Maven Repo Index Transfer/Scan] INFO org.eclipse.aether.internal.impl.DefaultArtifactResolver - Artifact org.jboss.shrinkwrap.descriptors:shrinkwrap-descriptors-bom:pom:2.0.0 is present in the local repository, but cached from a remote repository ID that is unavailable in current build context, verifying that is downloadable from [central (https://repo.maven.apache.org/maven2/, default, releases+snapshots)]
     [exec] [Maven Repo Index Transfer/Scan] INFO org.eclipse.aether.internal.impl.DefaultArtifactResolver - Artifact org.jboss.shrinkwrap:shrinkwrap-bom:pom:1.2.6 is present in the local repository, but cached from a remote repository ID that is unavailable in current build context, verifying that is downloadable from [central (https://repo.maven.apache.org/maven2/, default, releases+snapshots)]
     [exec] INFO [org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl]: Indexing of local took 18,09 s.

@mbien
Copy link
Member Author

mbien commented Apr 30, 2023

@vieiro this is a list of user given permissions. Try opening a pom or something which triggers indexing (e.g add dependency to a project via UI), nb will then show a notification, based on what you answer it will show up in the list. You don't have to change anything on the list unless you want to change the permission in post.

@vieiro
Copy link
Contributor

vieiro commented Apr 30, 2023

@vieiro this is a list of user given permissions. Try opening a pom or something which triggers indexing (e.g add dependency to a project via UI), nb will then show a notification, based on what you answer it will show up in the list. You don't have to change anything on the list unless you want to change the permission in post.

Hi @mbien , thanks for the tip!

Works perfect now. I've tried to unpack the maven index from central, after a few minutes with fans at top speed and ~3.9Gb var/cache/mavenindex/ I decided to cancel it, and it's now disabled forever! Thanks very much for this!

imagen

Copy link
Contributor

@matthiasblaesing matthiasblaesing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work. This is what I noticed:

  • I canceled a running remote index download, then I have no option to restart that, because the "Index Now" button only triggers a local index, not an index of the remote repositories. I did not look deeply into the mechanisms that hold the "indexing was started, but aborted" logic, but somewhere this seems to be stored, because it did not retrigger, when asking for completion. IMHO either the "Index Now" button also covers the remote repositories or the listview for the known repository should allow requesting an update on an individual basis.
  • The update frequency is still "Never", this contradicts my answer to allow downloading
  • Multithreaded indexing should IMHO be enabled by default (we should at least pretend, that hardware manufactures know how to build thermally stable hardware)
  • I suggest for the popup:
    • move text into bundle, so that it might be localiced

    • Text suggestion:

          pane.setText(MessageFormat.format(
                  "Maven index download requested for: {0}<br/>"
                  + "<ul>"
                  + "<li><a href=\"allow-always\">Allow always</a></li>"
                  + "<li><a href=\"allow-once\">Allow once</a></li>"
                  + "<li><a href=\"deny-once\">Deny once</a></li>"
                  + "<li><a href=\"deny-always\">Deny always</a></li>"
                  + "</ul>"
                  + "A repository index contains artifact metadata which is useful for some NetBeans features.<br />"
                  + "<a href=\"disable-downloads\">Disable all remote downloads</a>",
                  repo.getRepositoryUrl()
          ));

      Leads to:

      Bildschirmfoto vom 2023-04-30 21-52-20

      At least Jan noticed, that maven download insisted on downloading while on the bus, that might be a situation where I'd like it to "Deny once".

  • Looking at the indexing notification: Will this be a problem for LSP?

@mbien
Copy link
Member Author

mbien commented Apr 30, 2023

I canceled a running remote index download, then I have no option to restart that, because the "Index Now" button only triggers a local index, not an index of the remote repositories.

right, canceling will add a delay before it will try to run again. This is intended. I am not sure how important the usecase is to start the indexing right after a manual cancel again. I agree that the "index now" button needs to be looked at since it is only for the local index and this is not clearly communicated in the UI. Restart after cancel is more of a dev testing issue for me and less of a common usecase.

Multithreaded indexing should IMHO be enabled by default

I don't want to do this before filtering is available since this has an index size overhead. This would be another PR. We also got feedback of notebook usage etc which I don't want to ignore either.

I suggest for the popup:

I will take a look at this. But I won't implement "run once" for now since this complicates things unnecessarily. Indexing is not a run-once feature, either you turn it on or off. Cancelling pauses it as mentioned before.

Looking at the indexing notification: Will this be a problem for LSP?

likely. IndexingNotificationProvider.java has one method more now, if this isn't implemented and nobody sets the permission, indexing won't run now. Let me add a default implementation which simply allows all downloads to have the same behavior as before (assuming there is a IndexingNotificationProvider in the lookup!).

Copy link
Member Author

@mbien mbien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes:

  • "update now" button will trigger updates for all repos (permission required, download pause is reset)
    • if a remote repo is already up2date, it won't update the repo
    • changed label text to "Check for Updates"
  • added a dummy impl for IndexingNotificationProvider to retain old behavior even when no proper implementation is found
  • extracted text to bundle, updated the notification a bit

index-permission-bubble_v2

IMHO (...) the listview for the known repository should allow requesting an update on an individual basis.

i think this might be a potential upgrade for some time later. I thought about it too but I opted for not allowing too much micromanagement for now.

Comment on lines +735 to +740
if (!RepositoryPreferences.isIndexDownloadAllowedFor(repo)) {
IndexingNotificationProvider np = Lookup.getDefault().lookup(IndexingNotificationProvider.class);
if(np != null) {
np.requestPermissionsFor(repo);
}
return true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought about returning false when no IndexingNotificationProvider is found. But it is probably better to add a dummy implementation which simply allows everything. Once all users of this API have their own implementation, the dummy can be removed.

I think it is a better behavior to not a allow downloads by default if there is no provider.

@mbien mbien added the ci:all-tests [ci] enable all tests label May 1, 2023
@apache apache locked and limited conversation to collaborators May 1, 2023
@apache apache unlocked this conversation May 1, 2023
@matthiasblaesing
Copy link
Contributor

I think this is mostly good to go. The only thing missing is IMHO restoring periodic updates. Given, that you now clear all downloads, there is IMHO no reason to still keep it on never and it IMHO contradicts the "allow" setting.

@vieiro
Copy link
Contributor

vieiro commented May 1, 2023

Great job, @mbien!

There's a "gobally" -> "globally" typo you may want to take a look at before merging.

@mbien
Copy link
Member Author

mbien commented May 1, 2023

There's a "gobally" -> "globally" typo you may want to take a look at before merging.

thanks! I wouldn't have noticed this.

The only thing missing is IMHO restoring periodic updates. Given, that you now clear all downloads, there is IMHO no reason to still keep it on never and it IMHO contradicts the "allow" setting.

good point. Probably better to remove the never option now. The usecase of never is also covered by the index downloads check box. Going to take a look at it.

@mbien mbien removed ci:all-tests [ci] enable all tests do not merge Don't merge this PR, it is not ready or just demonstration purposes. labels May 1, 2023
@mbien
Copy link
Member Author

mbien commented May 2, 2023

last commit removes the never update frequency. Typo is fixed too but already squashed.

planning to squash it into two commits before merge

@mbien mbien added the ci:all-tests [ci] enable all tests label May 2, 2023
mbien added 2 commits May 5, 2023 01:55
 - module requires now jdk 11+
 - apache lucene 9.5.0 with memory mapped index
 - will use panama on JDK 19
 - multi-threaded remote index extraction (at least twice as fast as ST)
 - revert "hack: backported LUCENE-10431 hashcode hotfix to EOL lucene 8.11.1."
   (commit 35b41f6.)
 - implement permission system via netbeans notifications
 - multithreaded extraction setting for maven indexer options
 - better temp storage handling (extract directly to cache folder)
 - cancellation sets a timeout, pausing starts of other repo updates
 - 'update now' button should update everything and reset timeout
 - remove 'never' update frequency since it was replaced by
   a combination of other settings
 - replace some deprecated usages
@mbien
Copy link
Member Author

mbien commented May 5, 2023

last test run with all tests enabled. thanks for the reviews -> going to merge today

@mbien
Copy link
Member Author

mbien commented May 5, 2023

merging earlier than planned since I got some time for testing and the next steps (search service/filters)

Lucene 9.6 is about to be released which has a better panama implementation which supports using panama on JDK 19 and 20 in contrast to only 19. Going to have to bump to that too once available.

@mbien mbien merged commit 5d5a120 into apache:master May 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci:all-tests [ci] enable all tests ci:dev-build [ci] produce a dev-build zip artifact (7 days expiration, see link on workflow summary page) Maven [ci] enable "build tools" tests performance Upgrade JDK Upgrade to the JDK requirements of a module. Upgrade Library Library (Dependency) Upgrade
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants