Incompatibility with Pandas 1.1.0 #25

Open
lizziel opened this issue Aug 13, 2020 · 0 comments
I ran into an error reading a GEOS-Chem bpch file after upgrading Pandas to 1.1.0. I traced the problem to this section of code in util/diaginfo.py:

    tracer_df = (
        tracer_df
            .apply(_assign_hydrocarbon, axis=1)
            .assign(chemical=lambda x: x['molwt'].astype(bool))
    )

Before that code is executed, tracer_df correctly stores the tracerinfo.dat content:

       name                       full_name    molwt  C  tracer         scale  \
0      ACET                     ACET tracer  0.01200  3       1  1.000000e+09   
1      ACTA                     ACTA tracer  0.06006  1       2  1.000000e+09   
2      AERI                     AERI tracer  0.12690  1       3  1.000000e+09   
3      ALD2                     ALD2 tracer  0.01200  2       4  1.000000e+09  

After the apply, every row is a copy of the first row (ACET), which is wrong:

      name    full_name  molwt  C  tracer         scale  unit  hydrocarbon  \
0     ACET  ACET tracer  0.012  3       1  1.000000e+09  ppbC        False   
1     ACET  ACET tracer  0.012  3       1  1.000000e+09  ppbC        False   
2     ACET  ACET tracer  0.012  3       1  1.000000e+09  ppbC        False   
3     ACET  ACET tracer  0.012  3       1  1.000000e+09  ppbC        False   
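A quick way to catch this symptom is to compare the `name` column before and after the transformation. The following is only a sketch: `rows_preserved` is a hypothetical helper (not part of xbpch or pandas), and the two small frames stand in for the tables shown above.

```python
import pandas as pd


def rows_preserved(before, after, key="name"):
    """Return True if `after` has the same `key` values, in the same order, as `before`.

    Hypothetical helper for illustration; not part of xbpch or pandas.
    """
    return before[key].tolist() == after[key].tolist()


# Stand-ins for the tables above: the original names and the broken result.
before = pd.DataFrame({"name": ["ACET", "ACTA", "AERI", "ALD2"]})
broken = pd.DataFrame({"name": ["ACET"] * 4})

print(rows_preserved(before, before.copy()))
print(rows_preserved(before, broken))
```

A check like this could run right after the `apply` call so that the problem surfaces there instead of later as a `KeyError`.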

I was able to fix it by initializing the new column 'hydrocarbon' prior to the apply:

    tracer_df['hydrocarbon'] = False
    tracer_df = (
        tracer_df
            .apply(_assign_hydrocarbon, axis=1)
            .assign(chemical=lambda x: x['molwt'].astype(bool))
    )

I downgraded pandas to 0.25.1 and verified that this initialization is not necessary in that older version, but it is necessary in 1.1.0.
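Another possible approach, assuming `_assign_hydrocarbon` adds the new column by modifying each row in place, is to build the column with `DataFrame.assign` so that no row is mutated inside `apply`. The pandas documentation for `DataFrame.apply` notes that functions which mutate the passed object are not supported. This is a minimal sketch, not the actual xbpch code: `add_hydrocarbon` is a hypothetical replacement, the rule `C > 1` is an illustrative assumption, and the data is a toy copy of the rows shown earlier.

```python
import pandas as pd

# Toy stand-in for the tracerinfo.dat table, using values from the rows above.
tracer_df = pd.DataFrame({
    "name": ["ACET", "ACTA", "AERI", "ALD2"],
    "molwt": [0.012, 0.06006, 0.1269, 0.012],
    "C": [3, 1, 1, 2],
})


def add_hydrocarbon(df):
    """Return a new frame with a boolean 'hydrocarbon' column.

    Assumption: a tracer is flagged as a hydrocarbon when C > 1.
    This rule is illustrative only and may differ from xbpch's logic.
    """
    return df.assign(hydrocarbon=df["C"] > 1)


# Same chained style as the original code, but without a row-wise apply.
result = (
    add_hydrocarbon(tracer_df)
    .assign(chemical=lambda x: x["molwt"].astype(bool))
)

print(result)
```

Because `assign` returns a new DataFrame and operates on whole columns, the result does not depend on how a particular pandas version handles rows that are modified during `apply`.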

Here is the error message I got to help others find this issue via search:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/anaconda3/envs/gcpy/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2888             try:
-> 2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'name'
The above exception was the direct cause of the following exception:
KeyError                                  Traceback (most recent call last)
<ipython-input-43-2d7bd5a6928f> in <module>
      2     ds = xb.open_bpchdataset(filename=gcc_bpch,
      3                              tracerinfo_file=tracerinfo_f,
----> 4                              diaginfo_file=diaginfo_f)
      5 except FileNotFoundError:
      6     print('Could not find file {}'.format(bpchfile))
/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/core.py in open_bpchdataset(filename, fields, categories, tracerinfo_file, diaginfo_file, endian, decode_cf, memmap, dask, return_store)
     79         tracerinfo_file=tracerinfo_file,
     80         diaginfo_file=diaginfo_file, endian=endian,
---> 81         use_mmap=memmap, dask_delayed=dask
     82     )
     83     ds = xr.Dataset.load_store(store)
/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/core.py in __init__(self, filename, fields, categories, fix_cf, mode, endian, diaginfo_file, tracerinfo_file, use_mmap, dask_delayed)
    278 
    279         # Parse the binary file and prepare to add variables to the DataStore
--> 280         self._bpch._read_var_data()
    281 
    282         # Create storage dicts for variables and attributes, to be used later
/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/bpch.py in _read_var_data(self)
    312             var_attr['unit'] = unit
    313 
--> 314             vname = diag['name']
    315             fullname = category_name.strip() + "_" + vname
    316 
/anaconda3/envs/gcpy/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2897             if self.columns.nlevels > 1:
   2898                 return self._getitem_multilevel(key)
-> 2899             indexer = self.columns.get_loc(key)
   2900             if is_integer(indexer):
   2901                 indexer = [indexer]
/anaconda3/envs/gcpy/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
-> 2891                 raise KeyError(key) from err
   2892 
   2893         if tolerance is not None:
KeyError: 'name'