df.to_dict() accepts orient which are not in the list of options #32515

Closed
elmonsomiat opened this issue Mar 7, 2020 · 11 comments · Fixed by #32516
Labels: DataFrame, good first issue


@elmonsomiat
Contributor

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame(
    {'name': ['alice', 'bob'],
     'age': [30, 28]})

df.to_dict(orient='racoon')

returns:

[{'age': 30, 'name': 'alice'}, {'age': 28, 'name': 'bob'}]

instead of raising a ValueError.

Problem description

The current version of pandas accepts any orient value that starts with d, l, sp, s, r or i (case-insensitively), instead of only the options listed in the documentation.

The function should raise a ValueError if orient is not one of the following: {'dict', 'list', 'series', 'split', 'records', 'index'}.
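
To make the effect of the prefix checks concrete, here is a minimal sketch in plain Python. `PREFIX_BRANCHES` and `resolve_by_prefix` are hypothetical names made up for this illustration and are not part of pandas; the sketch only mirrors the order of the `startswith` checks in the code quoted in this issue and reports which branch would be selected, without building any output.

```python
# Hypothetical stand-in for the prefix dispatch in DataFrame.to_dict.
# It mirrors the order of the startswith() checks quoted in this issue.
PREFIX_BRANCHES = [
    ("d", "dict"),
    ("l", "list"),
    ("sp", "split"),   # checked before "s", so "split" is not read as "series"
    ("s", "series"),
    ("r", "records"),
    ("i", "index"),
]


def resolve_by_prefix(orient: str) -> str:
    """Return the documented orient that a prefix-based check would select."""
    lowered = orient.lower()
    for prefix, full_name in PREFIX_BRANCHES:
        if lowered.startswith(prefix):
            return full_name
    raise ValueError(f"orient '{orient}' not understood")


# Misspelled or unrelated words are silently mapped to a real option.
for value in ["records", "racoon", "Rows", "spam", "sheep", "dog"]:
    print(f"{value!r} -> {resolve_by_prefix(value)!r}")
```

Under this logic a word such as 'racoon' is treated as 'records', which matches the list-of-dicts output shown in the code sample above.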

If this is not fixed, a mistyped orient can silently produce a different output format from the one the caller intended. I would not call this a bug, but the behaviour should be consistent with other methods such as df.to_json(). The part of the code which needs to be fixed in master is:

        if orient.lower().startswith("d"):
            return into_c((k, v.to_dict(into)) for k, v in self.items())
        elif orient.lower().startswith("l"):
            return into_c((k, v.tolist()) for k, v in self.items())
        elif orient.lower().startswith("sp"):
            return into_c(
                (
                    ("index", self.index.tolist()),
                    ("columns", self.columns.tolist()),
                    (
                        "data",
                        [
                            list(map(com.maybe_box_datetimelike, t))
                            for t in self.itertuples(index=False, name=None)
                        ],
                    ),
                )
            )
        elif orient.lower().startswith("s"):
            return into_c((k, com.maybe_box_datetimelike(v)) for k, v in self.items())
        elif orient.lower().startswith("r"):
            columns = self.columns.tolist()
            rows = (
                dict(zip(columns, row))
                for row in self.itertuples(index=False, name=None)
            )
            return [
                into_c((k, com.maybe_box_datetimelike(v)) for k, v in row.items())
                for row in rows
            ]
        elif orient.lower().startswith("i"):
            if not self.index.is_unique:
                raise ValueError("DataFrame index must be unique for orient='index'.")
            return into_c(
                (t[0], dict(zip(self.columns, t[1:])))
                for t in self.itertuples(name=None)
            )
        else:
            raise ValueError(f"orient '{orient}' not understood")

The fix is to replace each .lower().startswith() check with == and the corresponding full string.
I will now open a PR to fix the issue. The bug is still in master.
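
The strict check proposed here could look roughly like the sketch below. `VALID_ORIENTS` and `validate_orient` are hypothetical names used only for illustration; this is not the code from the PR, just one way to express "exact match or ValueError" under the assumption that the six documented options are the only valid values.

```python
# Hypothetical helper illustrating exact-match validation of orient.
# The option names are the six documented ones listed in this issue.
VALID_ORIENTS = ("dict", "list", "series", "split", "records", "index")


def validate_orient(orient: str) -> str:
    """Return orient unchanged if it is a documented option, else raise."""
    if orient not in VALID_ORIENTS:
        raise ValueError(
            f"orient '{orient}' not understood; expected one of {VALID_ORIENTS}"
        )
    return orient


print(validate_orient("records"))

try:
    validate_orient("racoon")
except ValueError as err:
    print(f"ValueError: {err}")
```

Note that an exact comparison like this is also case-sensitive, so 'Records' would be rejected, whereas the existing checks lower-case the input first. Whether to keep case-insensitivity is a separate design decision.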


Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-88-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.24.2
pytest: None
pip: 19.3.1
setuptools: 41.6.0
Cython: None
numpy: 1.17.4
scipy: None
pyarrow: None
xarray: None
IPython: 7.9.0
sphinx: None
patsy: None
dateutil: 2.8.1
pytz: 2019.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.3.11
pymysql: 0.9.3
psycopg2: None
jinja2: 2.10.3
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@ShaharNaveh
Member

take

@elmonsomiat
Contributor Author

@MomIsBestFriend will put a PR up now

@ShaharNaveh ShaharNaveh added DataFrame DataFrame data structure good first issue labels Mar 7, 2020
@ShaharNaveh
Member

@elmonsomiat You want to take a shot at it?

You can assign yourself by commenting the word take (and nothing else, just take).

@ShaharNaveh ShaharNaveh removed their assignment Mar 7, 2020
@elmonsomiat
Contributor Author

take

@elmonsomiat
Contributor Author

@MomIsBestFriend Thanks! This is my first contribution to pandas, let's try it out! :)

@ShaharNaveh
Member

ShaharNaveh commented Mar 7, 2020

@elmonsomiat Good luck!


I'm here if you need a hint :)

@MillanSharma

take

@elmonsomiat
Contributor Author

take

@elmonsomiat
Contributor Author

@MillanSharma there is a PR (#32516) awaiting approval for this issue

@jreback jreback added this to the 1.1 milestone Mar 14, 2020
@luggie

luggie commented Dec 7, 2020

df.to_dict(orient='records')
df.to_dict('records')
both raise the following warning with pandas 1.1.4:
Using short name for 'orient' is deprecated. Only the options: ('dict', list, 'series', 'split', 'records', 'index') will be used in a future version. Use one of the above to silence this warning.

@simonjayhawkins
Member

@luggie have you got a minimal reproducible example? If so please open a new issue.

I'm not seeing any warnings or exceptions raised.

>>> import pandas as pd
>>> pd.__version__
'1.1.4'
>>>
>>> df = pd.DataFrame(
...     {'name': ['alice', 'bob'],
...      'age': [30, 28]})
>>>
>>> df.to_dict('records')
[{'name': 'alice', 'age': 30}, {'name': 'bob', 'age': 28}]
>>>
>>> df.to_dict(orient='records')
[{'name': 'alice', 'age': 30}, {'name': 'bob', 'age': 28}]
>>>
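
When putting together a minimal reproducible example for a warning like the one reported above, it can help to record warnings explicitly rather than rely on what the console shows. The sketch below uses only the standard library `warnings` module; `call_and_collect_warnings` and `noisy_stand_in` are hypothetical names for illustration, and the stand-in function merely imitates a call that warns on a short name.

```python
import warnings


def call_and_collect_warnings(func, *args, **kwargs):
    """Call func and return (result, warnings recorded during the call)."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" prevents the default filters from hiding repeated warnings.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    return result, list(caught)


def noisy_stand_in(orient):
    """Hypothetical stand-in that warns on a short name (demonstration only)."""
    if orient == "r":
        warnings.warn("short name used", FutureWarning)
    return orient


result, caught = call_and_collect_warnings(noisy_stand_in, "r")
for w in caught:
    print(w.category.__name__, w.message)
```

In a real report the stand-in would be replaced by the actual call, for example `call_and_collect_warnings(df.to_dict, orient='records')`, and the printed categories and messages pasted into the new issue together with the pandas version.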

jennydaman added a commit to jennydaman/codecarbon that referenced this issue Jul 23, 2023
to_dict("rows") is not a valid value for parameter orient.

pandas-dev/pandas#32515
6 participants