reading large netcdf file with python 3 #535

Closed
liliang8606 opened this issue Mar 4, 2016 · 31 comments

Comments

@liliang8606

I have a netCDF file of about 30 GB. When I try to read the file with Python 3.5, it gives an error:

File "netCDF4_netCDF4.pyx", line 1795, in netCDF4._netCDF4.Dataset.__init__ (netCDF4_netCDF4.c:12278)
RuntimeError: Unknown error

I also tried different options of the netCDF operators, such as versions, but nothing helped. The strange thing is that when I read the same file with Python 2, it works. Is there a compatibility issue between the netCDF4 libraries and Python 3?

@jswhit
Collaborator

jswhit commented Mar 4, 2016

No, there is no known issue with Python 3 compatibility. It's difficult to say without more to go on. I'd suggest posting the file somewhere, but at 30 GB that would be difficult.
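
As a starting point for "more to go on", the following is a minimal sketch of a script that collects the details a maintainer would likely want: Python version, pointer size, platform, file size, and the versions of netCDF4-python and the C libraries it was built against. The function name `environment_report` and the dictionary keys are hypothetical; the three `netCDF4` attributes it reads are only queried if the package can be imported.

```python
import os
import platform
import sys


def environment_report(path=None):
    """Collect details that help triage a failure to open a netCDF file."""
    report = {
        "python": platform.python_version(),
        "pointer_bits": 64 if sys.maxsize > 2**32 else 32,
        "platform": platform.platform(),
    }
    # File size matters here: the reports differ between small and large files.
    if path is not None and os.path.exists(path):
        report["file_size_gb"] = round(os.path.getsize(path) / 1e9, 2)
    try:
        import netCDF4  # optional: skipped if the package is not installed
    except ImportError:
        report["netCDF4"] = None
    else:
        report["netCDF4"] = netCDF4.__version__
        report["libnetcdf"] = netCDF4.__netcdf4libversion__
        report["libhdf5"] = netCDF4.__hdf5libversion__
    return report
```

Calling `environment_report("states.nc")` (any path works; this one is only an example) and pasting the resulting dictionary into the issue gives the platform and library versions in one place.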

@jswhit
Collaborator

jswhit commented Mar 4, 2016

Since the error is occurring when opening the dataset, the variable data has not been read yet (only the metadata about the variables, dimensions and groups). Could you create a version of the file without the data written to the variables (just the variables, dimensions and attributes defined)? If compression is turned on, the file size should be small, since all the variable data would be set to the _FillValue and would compress down to nearly nothing.
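
One way to produce such a header-only file is sketched below. It assumes the original can be opened somewhere (the report says it opens under Python 2), that everything of interest lives in the root group, and that the variables use primitive numeric or character types. The helper name `copy_header_only` is hypothetical, and groups and user-defined types are not handled.

```python
def copy_header_only(src_path, dst_path):
    """Write dst_path with src_path's dimensions, variables and attributes, but no data."""
    import netCDF4  # imported here so the helper can be defined without the package

    with netCDF4.Dataset(src_path) as src, netCDF4.Dataset(dst_path, "w") as dst:
        # Global attributes.
        dst.setncatts({name: src.getncattr(name) for name in src.ncattrs()})
        # Dimensions; unlimited ones stay unlimited (and therefore empty).
        for name, dim in src.dimensions.items():
            dst.createDimension(name, None if dim.isunlimited() else len(dim))
        # Variables: definitions and attributes only, no data is written.
        for name, var in src.variables.items():
            attrs = {key: var.getncattr(key) for key in var.ncattrs()}
            fill = attrs.pop("_FillValue", None)  # must be given at creation time
            out = dst.createVariable(
                name, var.dtype, var.dimensions, zlib=True, fill_value=fill
            )
            out.setncatts(attrs)
```

Because no data is ever written, the variables hold only fill values and the compressed output should stay small enough to attach to an issue.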

@liliang8606
Author

I tried the kitchen sink tool (ncks) to reduce the file size. A file of 8 MB works fine with Python 3.5. Another file of 6 GB returns the same error.

@WardF
Member

WardF commented Mar 4, 2016

What platform are you on?

@liliang8606
Author

I am using Anaconda 3 (64-bit) on Windows.

@WardF
Member

WardF commented Mar 4, 2016

Since you are on Windows, I wonder if it is related to Unidata/netcdf-c#188 , the fix for which should go in today.

@jswhit
Collaborator

jswhit commented Mar 4, 2016

When you trimmed the file size with the nco tool, did you retain the same number of variables, dimensions and attributes?
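
To answer that kind of question, a small sketch like the following can count the header entries of two open datasets and report where they differ. The function names are hypothetical; the code relies only on the `dimensions`, `variables` and `groups` mappings and the `ncattrs()` method that a netCDF4-python `Dataset` provides, and it looks at the root group only.

```python
def header_summary(ds):
    """Count the header entries of an open dataset (root group only)."""
    return {
        "dimensions": len(ds.dimensions),
        "variables": len(ds.variables),
        "global_attributes": len(ds.ncattrs()),
        "groups": len(ds.groups),
    }


def header_differences(full, trimmed):
    """Return {key: (full_count, trimmed_count)} for every count that differs."""
    a, b = header_summary(full), header_summary(trimmed)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

An empty result from `header_differences` means the trimmed file kept the same number of dimensions, variables, global attributes and groups, so only the amount of data differs.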

@liliang8606
Author

No, actually, I trimmed the file size by reducing the number of variables and dimensions.

@jswhit
Collaborator

jswhit commented Mar 10, 2016

Unidata/netcdf-c#188 has been merged into master. Since you're using Anaconda Windows, I understand it may be difficult to try this - but if you have the ability to rebuild the C library from source it would be much appreciated if you could try this fix and let us know if it works.

@ritviksahajpal

ritviksahajpal commented May 16, 2016

I am having the same issue with a large netCDF file (~6 GB) using Python 3.5.1 on Windows 10. I can open the same file using Python 2.7.11 just fine. I get the error: *** OSError: Unknown error

The netCDF file is here: https://www.dropbox.com/s/3ia5wrh5u8z9spr/states.nc?dl=0 (please note that it is ~6 GB in size)

@ReneSalhab

ReneSalhab commented Sep 21, 2016

Although redundant, I think it's necessary to push this issue a little. I have the same issue on Windows 10. I am trying to append to an existing file that is about 5 GB in size. It also gives
*** OSError: Unknown error
and
netCDF4\_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__ (netCDF4\_netCDF4.c:12626)().

Since 1.2.4 (which I'm using with Anaconda) was released after the fix, the problem still seems to exist.

Edit: Tested it with ritviksahajpal's file. The error also occurs there when reading it.

@mrksr

mrksr commented Oct 14, 2016

I experience the same with current Anaconda on Windows 7. The error seems to exist in both the "conda" and the "conda-forge" versions.
I also tried building netcdf4-python from source using the dependencies supplied by Anaconda (libnetcdf-4.3.3.1-3 and hdf5-1.8.17-vc14_6 from conda-forge) and the error still exists on current master.

@bflatmaj7

Any update on this? The error still persists.
I am using Anaconda with netCDF4 version 1.2.4 on Windows 7. I cannot read files of 5 GB and larger. Smaller files (<1 GB) work just fine.
It would be great if this could be fixed soon.
Thanks!

@msquared6

I can report that as of yesterday, the problem still exists. My circumstances are:
Platforms: Windows 8.1 and 7, 64-bit
Python: 3.5.2 | packaged by conda-forge | (default, Sep 8 2016, 14:25:50) [MSC v.1900 64 bit (AMD64)]
and
Python: 3.5.2 | packaged by conda-forge | (default, Jan 23 2017, 20:04:35) [MSC v.1900 64 bit (AMD64)] with the IOOS3 environment installed as per:
https://github.com/ioos/notebooks_demos/wiki/Installing-Conda-Python-with-the-IOOS-environment

For example, I have two netCDF4 files: a big file with over 3 million points in the time series (3.2 GB) and a small file with 9999 points in the time series (9.8 MB). This code will open the small file (using xarray):
ds = xr.open_dataset(smallfile, chunks={'rec': 3600}, decode_times=False)
With the big file, I get the unknown error described previously in this thread.
Thanks!

@jswhit
Collaborator

jswhit commented Feb 17, 2017

Has anybody tried @WardF's suggestion of upgrading the netcdf-c library to the current master, which includes the fix for Unidata/netcdf-c#188? It sure sounds like this could fix the issue.

@msquared6

Here, we've demonstrated that my files, large and small, can be opened on a Mac but not on Windows.

@ocefpaf
Collaborator

ocefpaf commented Feb 17, 2017

"IOOS3 environment installed as per:
https://github.com/ioos/notebooks_demos/wiki/Installing-Conda-Python-with-the-IOOS-environment"

In those instructions we pin libnetcdf to 4.4.0 because of an OPeNDAP bug. (See conda-forge/libnetcdf-feedstock#14)

But conda-forge does ship version 4.4.1.1 which is reported to work with large files.

@WardF I can backport the fix for Unidata/netcdf-c#188 on conda-forge via a patch if you can point me to the PR that fixed it. Unfortunately we cannot unpin libnetcdf in our envs b/c OPeNDAP access is crucial for our work.

@mmartini-usgs

How can I double check if I have the offending versions noted in Unidata/netcdf-c#188? It looks - from my Anaconda Navigator listing - like I have msvc 14 runtime as vs2015_runtime version 14.0.25420, and netcdf4 version 1.2.7, neither indicating an update is needed. I'm trying a conda update --all anyway.

@dopplershift
Member

Try this:
conda list libnetcdf

@mmartini-usgs

Thanks Ryan, I can confirm that it is not working on libnetcdf 4.4.0. I mangled my Python installation trying to upgrade, so I need to reinstall.

@jswhit
Collaborator

jswhit commented Feb 17, 2017

@msquared6, Unidata/netcdf-c#188 fixes a Windows-specific issue with large files.

@dopplershift
Member

So getting libnetcdf >= 4.4.1 should be enough to resolve this, and that will be manageable once they figure out what's going on with some OPeNDAP links on Windows.
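
The ">= 4.4.1" rule can be checked from Python with a sketch like the one below. The names `FIXED_IN`, `parse_version` and `has_large_file_fix` are hypothetical, and the threshold is taken from this thread rather than from the libnetcdf release notes. A version string cannot reveal patches backported into a build of an older release, so such a build would be reported as lacking the fix.

```python
import re

# Assumption from this thread: 4.4.1 is the first release with the Windows large-file fix.
FIXED_IN = (4, 4, 1)


def parse_version(text):
    """Turn '4.4.1.1' (or '4.9.3-development') into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no version number in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def has_large_file_fix(libnetcdf_version):
    """True if the version string is at or above the FIXED_IN threshold."""
    return parse_version(libnetcdf_version) >= FIXED_IN
```

With netCDF4-python installed, the string to test is available as `netCDF4.__netcdf4libversion__`, so `has_large_file_fix(netCDF4.__netcdf4libversion__)` checks the C library that the Python module is actually linked against.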

@rsignell-usgs

@WardF or @jswhit, I just want to reiterate what @ocefpaf said so it doesn't get lost here:

I can backport the fix for Unidata/netcdf-c#188 on conda-forge via a patch if you can point me to the PR that fixed it.

The problem on Windows right now is:

  • libnetcdf 4.4.0 has the large-file issue
  • libnetcdf 4.4.1 has the OPeNDAP issue

So @ocefpaf is offering to backport the large-file fix from 4.4.1 to 4.4.0, but he can't find the relevant code.

@dopplershift
Member

You're assuming they were done in a PR--don't do that.

I'm guessing a81f150e886239 and b19b807e8bbe81.

@mmartini-usgs

OK, I have reinstalled my conda, freshly downloaded as per IOOS3 instructions and I now have this: 3.6.0 | packaged by conda-forge | (default, Feb 9 2017, 14:54:13) [MSC v.1900 64 bit (AMD64)]
with this: libnetcdf 4.4.0 vc14_2 [vc14] conda-forge

Please help a python neophyte properly update to libnetcdf 4.4.1.1, I wrecked my installation thinking I knew the right command. conda update ???

Thanks.

@ocefpaf
Collaborator

ocefpaf commented Feb 17, 2017

@mmartini-usgs I just pushed a patched version for libnetcdf 4.4.0 that should work with all OPeNDAP URLs and has the proper fix for large files on Windows. Can you re-create your env, check with conda list that you have libnetcdf 4.4.0 build 3 (should be vc14_3), and test it with your large file please?

"Please help a python neophyte properly update to libnetcdf 4.4.1.1, I wrecked my installation thinking I knew the right command. conda update ???"

I don't recommend using conda update! I usually just delete and recreate my envs. (But this is a discussion for another place/time. We have already hijacked the netcdf4-python thread for too long 😄)

@mmartini-usgs

@ocefpaf hmm... sorry to continue the hijack - I need a step by step tutorial on how to re-create my env. I thought I knew what I was doing before and clearly... I still don't understand the Anaconda and Python install structure very well. I am thinking this means deleting C:\Users\username\AppData\Local\Continuum\Miniconda3\envs\envinquestion and then redoing the following, which, er, includes conda update?
conda config --add channels conda-forge --force
conda update --yes --all
conda env create --quiet --file environment.yml

@ocefpaf
Collaborator

ocefpaf commented Feb 17, 2017

"I need a step by step tutorial on how to re-create my env"

No need to delete or reconfigure anything. All you need to do is:

deactivate
conda env remove -n IOOS3
conda env create --file environment.yml

The deactivate is only to exit the env in case you are inside it.

@mmartini-usgs

Many thanks, large files now work on my installation.
I did get this error:
ModuleNotFoundError: No module named 'jupyter_client.kernelspec'
WARNING conda.core.link:run_script(510): pre-unlink script failed for package conda-forge::nb_conda_kernels-2.0.0-py36_0
consider notifying the package maintainer

@jswhit
Collaborator

jswhit commented Feb 22, 2017

That's great - so that confirms that the problem is fixed by recent updates to the C lib.

Closing the issue now.

@jswhit jswhit closed this as completed Feb 22, 2017
@TonyXiang8787

TonyXiang8787 commented Mar 6, 2017

Dear fellows,

I have encountered the same problem as in this issue. As I understand it, libnetcdf 4.4.0/4.4.1 together with netCDF4 1.2.7 solves this problem.

However, although I have the latest version of Anaconda for Python 3.6, when I tried to install netCDF4 via "conda install netCDF4", the conda installer still installed libnetcdf 4.3.3.1 and netCDF4 1.2.4. How can I tell the conda installer to install the latest version of libnetcdf and netCDF4?

Any hints are appreciated.

update

I figured it out myself: I need to use the conda-forge channel to get the latest version. Sorry for bothering you.
