Potential N^2 issue in large file read #1279

Open · 4 of 12 tasks · gsjaardema opened this issue Jan 9, 2019 · 26 comments
@gsjaardema
Contributor

gsjaardema commented Jan 9, 2019

Please provide as much of the following information as you can, as applicable to the issue being reported. Naturally, not all information is relevant to every issue, but the more information we have to start, the better!

Environment Information

Feel free to skip this if the issue is related to documentation, a feature request, or general discussion.

  • What platform are you using? (please provide specific distribution/version in summary)
    • Linux
    • Windows
    • OSX
    • Other
    • NA
  • 32 and/or 64 bit?
    • 32-bit
    • 64-bit
  • What build system are you using?
    • autotools (configure)
    • cmake
  • Can you provide a sample netCDF file or C code to recreate the issue?
    • Yes (please attach to this issue, thank you!)
    • No
    • Not at this time

Summary of Issue

I have a file with 10,000 dims and 7,000 vars. If I do a nccopy with the file in netcdf-3 classic format, it takes 0.422 seconds. If I do the same nccopy except that the file is in netcdf-4 classic format, it takes 13.360 seconds. These are both for debug, non-optimized builds.

I ran the nccopy under valgrind and it points to the rec_match_dimscales function as taking 32.68% of the time (inclusive, i.e. including the functions it calls). It calls ncindexth and nclistget each about 60,000,000 times.

It looks like rec_match_dimscales loops over the vars and then, inside that loop, loops over the dims to find matches for the var's dimscales. This results in O(nvars * ndims) scaling behavior.

Maybe it is possible to preprocess the dims data so that a var's dimscale dims can be matched without a linear search through the dims list? Perhaps a hash on the objno and fileno would permit O(1) access instead of O(N^2)...
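For illustration, a minimal sketch of that idea in C: pre-index the dims once by their HDF5 (fileno, objno) pair so each dimscale match becomes an O(1) probe. All names here are hypothetical, not netcdf-c internals; the table is assumed to be sized up front to more slots than there are dims.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: hash the dims by (fileno, objno) once, so each
 * dimscale match is an O(1) probe instead of a scan over all dims. */
typedef struct DimEntry {
    unsigned long fileno[2]; /* HDF5 ids of the dimscale dataset */
    unsigned long objno[2];
    void *dim;               /* would be NC_DIM_INFO_T* in the library */
    int used;
} DimEntry;

typedef struct DimTable {
    DimEntry *slots;
    size_t cap;              /* power of two, sized up front from ndims */
} DimTable;

static uint64_t hash_ids(const unsigned long f[2], const unsigned long o[2]) {
    /* FNV-1a over the four id words. */
    uint64_t h = 1469598103934665603ULL;
    unsigned long v[4] = { f[0], f[1], o[0], o[1] };
    for (int i = 0; i < 4; i++) { h ^= v[i]; h *= 1099511628211ULL; }
    return h;
}

static int same_ids(const DimEntry *e, const unsigned long f[2],
                    const unsigned long o[2]) {
    return e->fileno[0] == f[0] && e->fileno[1] == f[1] &&
           e->objno[0] == o[0] && e->objno[1] == o[1];
}

/* Insert one dim; assumes the table always has a free slot. */
static void dimtable_put(DimTable *t, const unsigned long f[2],
                         const unsigned long o[2], void *dim) {
    size_t i = (size_t)(hash_ids(f, o) & (t->cap - 1));
    while (t->slots[i].used)               /* linear probing */
        i = (i + 1) & (t->cap - 1);
    memcpy(t->slots[i].fileno, f, sizeof t->slots[i].fileno);
    memcpy(t->slots[i].objno, o, sizeof t->slots[i].objno);
    t->slots[i].dim = dim;
    t->slots[i].used = 1;
}

static void *dimtable_get(const DimTable *t, const unsigned long f[2],
                          const unsigned long o[2]) {
    size_t i = (size_t)(hash_ids(f, o) & (t->cap - 1));
    while (t->slots[i].used) {
        if (same_ids(&t->slots[i], f, o))
            return t->slots[i].dim;        /* matching dimscale found */
        i = (i + 1) & (t->cap - 1);
    }
    return NULL;                           /* no match for this dimscale */
}
```

Built once before the per-var matching loop, dimtable_get would replace the inner scan over all 10,000 dims.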

The attached file was compressed with gzip.

mix-nc4.g.gz

These are all with current github master.

@gsjaardema
Contributor Author

A correction to the above -- debug nccopy time for netcdf-3 is 0.7 to 0.8 seconds.

Optimized shows:

ceerws2703a:build(master)> time ../bin/nccopy mix-nc4.g tmp-4.g (3 times)
	0m11.405s
	0m11.775s
	0m11.799s

ceerws2703a:build(master)> time ../bin/nccopy mix-nc3.g tmp-3.g (3 times)
	0m0.545s
	0m0.529s
	0m0.589s

@edhartnett
Contributor

Can you try with the code in PR #1262? You must recreate your test file with the library after applying #1262. Then you will avoid the dimscale matching.

@gsjaardema
Contributor Author

I tried it on my laptop (not the system used above) and it only dropped times from about 10.6 to 9.9 seconds. I will try on the system above later this week or early next week.

@DennisHeimbigner
Collaborator

Part of the problem is that nccopy is, in the end, going to access all of the metadata in order to copy it. In that situation there is no way to get better speed than the underlying HDF5 allows. My speculation is that doing the copy using only the HDF5 library would not be significantly faster. If there is an HDF5 copy utility, we could test that out.

@DennisHeimbigner
Collaborator

Actually I see there is an h5copy utility. Might be interesting to
do a complete copy and time it. If it is much faster, then it is doing
some kind of optimization of which we are not aware.

@edhartnett
Contributor

edhartnett commented Jan 10, 2019

Greg, did you recreate your file with the new version of netcdf?

Use h5dump to confirm that all datasets in the file have the _Netcdf4Coordinates attribute, which lists the dimids for that dataset.
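For a programmatic check, something like this against the HDF5 C API should also work (a sketch; the file and dataset names are placeholders, and error checks are omitted):

```c
#include <hdf5.h>
#include <stdio.h>

/* Check whether a dataset carries the hidden _Netcdf4Coordinates attribute. */
int main(void) {
    hid_t file = H5Fopen("file.nc", H5F_ACC_RDONLY, H5P_DEFAULT); /* placeholder name */
    hid_t dset = H5Dopen2(file, "/some_var", H5P_DEFAULT);        /* placeholder name */
    htri_t has = H5Aexists(dset, "_Netcdf4Coordinates");
    printf("_Netcdf4Coordinates present: %s\n", has > 0 ? "yes" : "no");
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}
```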

@gsjaardema
Contributor Author

Using valgrind, the patch reduced the overall execution time from 18.289 to 15.364 seconds, about 16%, which is good.

I haven't found a good HDF5 utility to check how fast it can read/write the files. As best I can tell, h5copy copies only selected datasets.

@edhartnett
Contributor

Can you use a profiler and see where all the time is being spent?

It should not be in rec_match_dimscales(). So where is it?

@gsjaardema
Contributor Author

The percentage of time in rec_match_dimscales() went from 32.7% to 17.8%, but it is still the most expensive function (including the functions it calls, nclistget and ncindexth).

I've attached a screen shot showing the before and after profile sorted by self time.

[screenshot: before and after profiles, sorted by self time]

I recreated the file using nccopy old_file.nc new_file.nc and then did all subsequent runs using new_file.nc. I do see the _Netcdf4Coordinates attribute using h5dump.

@gsjaardema
Contributor Author

It looks like the _Netcdf4Coordinates attribute exists on all vars that have more than one dimension, but if the variable has only a single dimension, it does not have that attribute. In this file, probably about half are multi-dim and about half are ndims=1.

@gsjaardema
Contributor Author

If I modify the library to write the _Netcdf4Coordinates attribute for all variables, regenerate the file, and rerun valgrind, the time drops from 15.364 to 12.871 seconds, another 16%. rec_match_dimscales() now accounts for only 0.01% of the time. The top functions are now HDF5- or malloc/free-related.

[screenshot: profile after writing _Netcdf4Coordinates for all variables, sorted by self time]

(I think the memcpy, which is 50% and coming from nclistsetalloc, could be reduced with smarter reallocation, though probably at a cost in memory use...)

@edhartnett
Contributor

What do you mean "modify to write the _Netcdf4Coordinates attribute for all variables"? This should be happening automatically.

@gsjaardema
Contributor Author

No, currently it only happens for multi-dimensional variables, according to both the documentation and the code. See nc4hdf.c:

```diff
@@ -1012,7 +1012,7 @@ var_create_dataset(NC_GRP_INFO_T *grp, NC_VAR_INFO_T *var, nc_bool_t write_dimid
    /* Always write the hidden coordinates attribute, which lists the
     * dimids of this var. When present, this speeds opens. When not
     * present, dimscale matching is used. */
-   if (var->ndims > 1)
+   /*   if (var->ndims > 1) */
       if ((retval = write_coord_dimids(var)))
          BAIL(retval);
```

@edhartnett
Contributor

OK, thanks for catching that, it should be used even for 1D vars. I will fix that.

Let me construct a test case with 5000 dimensions and see what I can see. ;-)

@DennisHeimbigner
Collaborator

DennisHeimbigner commented Jan 14, 2019

I got h5copy to copy the file using this command:

h5copy -i input.h5 -o output.h5 -f ref -s / -d /x

In order to avoid a name conflict (God knows why), I had to add the extra group /x to receive the content of / in the input file. But it seems to copy everything.

@DennisHeimbigner
Collaborator

If nclistsetalloc is the problem, then it is likely because its default list
allocation size is too small. It starts by allocating 16 elements and
then doubles every time it needs to increase.
The ncindex code probably is the ultimate problem and needs to use
a larger default starting size.

@edhartnett
Contributor

I am modifying the recently-added tst_wrf_reads.c to emulate this case. That way we can try a few things...

@gsjaardema
Contributor Author

gsjaardema commented Jan 14, 2019 via email

@DennisHeimbigner
Collaborator

Oops, you are correct. It doubles only if no size is specified; otherwise it grows to exactly the size specified.
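A simplified sketch of the growth rule as just described (not the actual nclist source; names here are illustrative): doubling applies only when no explicit size is requested.

```c
#include <stdlib.h>
#include <string.h>

#define DEFAULTALLOC 16

typedef struct NCSketchList {
    size_t alloc;   /* slots allocated */
    size_t length;  /* slots in use */
    void **content;
} NCSketchList;

/* Grow the backing array. If sz == 0, double the current allocation
 * (starting from DEFAULTALLOC); otherwise grow to exactly sz slots.
 * Doubling happens only when no explicit size is requested. */
static int sketch_setalloc(NCSketchList *l, size_t sz) {
    void **newcontent;
    if (sz == 0)
        sz = l->alloc ? 2 * l->alloc : DEFAULTALLOC;
    if (sz <= l->alloc)
        return 1; /* already big enough */
    newcontent = calloc(sz, sizeof(void *));
    if (!newcontent)
        return 0;
    if (l->length > 0) /* the memcpy that shows up in the profile above */
        memcpy(newcontent, l->content, l->length * sizeof(void *));
    free(l->content);
    l->content = newcontent;
    l->alloc = sz;
    return 1;
}
```

With exact-size growth, a caller that repeatedly requests one more slot pays a memcpy of the whole array each time, which is consistent with the memcpy cost seen in the profile.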

@DennisHeimbigner
Collaborator

To be precise, regarding

> O(nvars * ndims) scale behavior.

ndims here is the rank of each var, so it is not O(nvars * alldims) but rather much less.

@edwardhartnett
Contributor

Once your changes in #2047 are merged, this should improve a lot...

@gsjaardema
Contributor Author

@DennisHeimbigner I disagree: without the coord dimids on the 1D variables, the behavior scales as O(nvars * ndims) where ndims == alldims, not the rank of each var. You can see this with the 10,000 dims and 7,000 vars calling ncindexth and nclistget each about 60,000,000 times (7,000 × 10,000 = 70,000,000, the right order of magnitude).

I think it actually scales as O((1D nvars) * ndims).

@DennisHeimbigner
Collaborator

Odd. I was looking at the code for rec_match_dimscales and it appears to iterate over the dim ids associated with the variable and then do a direct lookup of the corresponding dimension object.

@gsjaardema
Contributor Author

I think that if it has the dim ids associated with the variable, then it can do the direct lookup, but if it is a 1D variable, it has to search all of the dims for the dimid. That is the speedup of #2047: it gives us the coordinate dimid attribute on all variables, so we get the direct lookup, instead of the current behavior where, without the patch, 1D variables have no way of storing the dimid and therefore there is no direct lookup.

Maybe I am misunderstanding your question/assertion.

@DennisHeimbigner
Collaborator

If you have the NC_VAR_INFO_T* pointer to the var info,
then the field "dimids" should have the set of dimids associated
with the variable whatever the rank.
So if you are seeing nvars*ndims behavior, there is something seriously
wrong with rec_match_dimscales. I will take a closer look.

@edwardhartnett
Contributor

IIRC, rec_match_dimscales() should be called only when our own hidden attribute metadata does not know the dimids. In that case, rec_match_dimscales() does a very slow recursive search through the file for matching dimscales.

Since something like 4.6.x, the dim info is always written to our internal hidden attribute metadata, but before that it was not.
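To make the two paths concrete, here is a hedged sketch of the open-time logic being described; the function and type names are placeholders, not the actual netcdf-c internals.

```c
#include <stdio.h>

/* Hypothetical sketch of the open-time logic described above. */
typedef struct Var {
    int ndims;
    int dimids[8];      /* filled in by one of the two paths below */
    int has_coord_attr; /* stands in for "has _Netcdf4Coordinates" */
} Var;

/* Fast path: read dimids straight from the hidden attribute. */
static int read_coord_dimids_attr(Var *var) {
    return var->has_coord_attr; /* stub: pretend the attr was parsed */
}

/* Slow path: stands in for the recursive dimscale search that scans
 * all dims for each var lacking the attribute (the N^2 behavior). */
static void match_dimscales_recursively(Var *var) {
    printf("slow dimscale matching for var with rank %d\n", var->ndims);
}

static void resolve_var_dims(Var *var) {
    if (read_coord_dimids_attr(var))
        return;                        /* O(rank) per var */
    match_dimscales_recursively(var);  /* triggers the O(nvars*alldims) scan */
}

int main(void) {
    Var with_attr = { 2, {0, 1}, 1 };
    Var without_attr = { 1, {2}, 0 };  /* e.g. a pre-fix 1D variable */
    resolve_var_dims(&with_attr);
    resolve_var_dims(&without_attr);
    return 0;
}
```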

@WardF WardF modified the milestones: 4.8.0, 4.8.2 Aug 30, 2021
@WardF WardF modified the milestones: 4.8.2, 4.9.1 Jun 15, 2022
@WardF WardF modified the milestones: 4.9.1, 4.9.2 Feb 13, 2023
@WardF WardF modified the milestones: 4.9.2, 4.9.3 May 16, 2023