DataFrame.to_string truncates long strings #9784

Closed
michaelaye opened this issue Apr 2, 2015 · 23 comments · Fixed by #28052
Labels
good first issue Output-Formatting __repr__ of pandas objects, to_string
Comments

@michaelaye
Contributor

I am calling to_string() without any parameters, and it beautifully fixed-formatted my dataframe apart from my very wide filename column, which is being truncated with "...". How can I avoid that?

                                            FILENAME  OBS_ID  XUV  
0  'mvn_iuv_l1a_IPH3-cycle00007-mode040-muvdark_2...      40  MUV  
1  'mvn_iuv_l1a_IPH2-cycle00047-mode050-muvdark_2...      50  MUV  
2  'mvn_iuv_l1a_apoapse-orbit00127-mode2001-muvda...    2001  MUV  
3  'mvn_iuv_l1a_APP1-orbit00087-mode1031-fuvdark_...    1031  FUV  
4  'mvn_iuv_l1a_IPH2-cycle00005-mode060-fuvdark_2...      60  FUV  

I tried calling it like this, but to no avail (same output):

with open('test_summary_out.txt','w') as f:
    f.write(summarydf.head().to_string(formatters={'filename':lambda x: "{:100}".format(x)}))

Version: 0.16 with Python 3.4

@dsm054
Contributor

dsm054 commented Apr 2, 2015

I think it picks that option up from display.max_colwidth. Does pd.set_option("display.max_colwidth", 10000) have an effect?
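A minimal sketch of that suggestion, assuming a made-up long filename (the real ones in the report are elided). It uses pd.option_context so the wider limit applies only inside the with block instead of changing the global setting. The repr call is there only to show the shortened rendering under a 50-character limit.

```python
import pandas as pd

# Made-up filename for illustration; it is longer than the 50-character limit.
long_name = "mvn_iuv_l1a_" + "x" * 80 + ".fits"
df = pd.DataFrame({"FILENAME": [long_name], "OBS_ID": [40]})

# With a 50-character limit, the rendered frame shortens the long cell with "...".
with pd.option_context("display.max_colwidth", 50):
    short_view = repr(df)

# Raising the limit inside a context manager leaves the global option untouched.
with pd.option_context("display.max_colwidth", 10000):
    full_text = df.to_string()

print(short_view)
print(full_text)
```

The underlying data is the same in both cases; only the rendered text differs.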

@michaelaye
Contributor Author

yes, that solved it, thx. I guess it should not pick that up for a to_string() operation, as that is not display? Or, well it is, but maybe one needs a different to_textfile() method that avoids this to be picked up.

@jreback
Contributor

jreback commented Apr 2, 2015

@dsm054 I think it might be worthwhile to point this out in here, maybe in a note box?

@jreback jreback added this to the Next Major Release milestone Apr 2, 2015
@jreback jreback modified the milestones: 0.18.1, Next Major Release Apr 4, 2016
@jreback jreback closed this as completed in 610d3d5 Apr 4, 2016
@JaysonSunshine

Is this issue resolved? I am getting this same issue on 0.20.3

@gfyoung
Member

gfyoung commented Aug 9, 2017

There were no changes, just further documentation.

@JaysonSunshine

Is the solution to modify the display settings? That seems pretty unsatisfactory.

@gfyoung
Member

gfyoung commented Aug 9, 2017

@jreback : Thoughts?

@michaelaye
Contributor Author

I would argue that the "to_string" method should be independent of a display setting meant for real-time analysis. A string object is not necessarily being used for display purposes.

@JaysonSunshine

In my present case, I am accessing a Redshift admin table to get a table DDL. The data frame has just that one column (the DDL), but I want to modify it in memory using a string operation: split(';'). I think the to_string operation should definitely not carry over any display settings.
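For this kind of use case, one option is to skip to_string entirely and take the Python string straight out of the cell. The sketch below assumes a hypothetical one-column frame with a column named ddl standing in for the Redshift query result. The column name and the DDL text are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the admin-table query result: one column, one cell.
df = pd.DataFrame(
    {"ddl": ["CREATE TABLE a (id INT);CREATE TABLE b (name VARCHAR(64));"]}
)

# Pull the stored string out of the cell; no display option is involved here.
ddl_text = df["ddl"].iloc[0]

# Split into individual statements, dropping the empty piece after the last ';'.
statements = [part for part in ddl_text.split(";") if part]
print(statements)
```

Because the value never passes through the formatting code, its length does not matter.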

@JaysonSunshine

Or, at least, to have a parameter we can toggle. That could work.

@jorisvandenbossche jorisvandenbossche modified the milestones: Next Major Release, 0.18.1 Aug 10, 2017
@jorisvandenbossche
Member

Yes, I don't think the documentation addition really solved this issue.

The max_colwidth option is used by the DataFrameFormatter.to_string without being able to change it. At least we could add a keyword to be able to override it without needing to change the display settings.
But other options such as display.max_rows or max_columns are ignored by to_string, so I think it makes sense to ignore max_colwidth as well. Either way, to be able to ignore the option, it has to be added as a keyword so that the output formatting code can pass the correct setting.
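A sketch of what that looks like from the caller's side, assuming a pandas version that includes the fix referenced at the top of this issue. To my knowledge, pandas 1.0 and later give to_string a max_colwidth keyword and no longer read the display option. The column name and value below are made up.

```python
import pandas as pd

# Made-up wide value, 93 characters long.
wide = "column_value_" + "0123456789" * 8
df = pd.DataFrame({"col": [wide]})

# Even with a very small display limit in effect...
with pd.option_context("display.max_colwidth", 20):
    # ...to_string without arguments keeps the full value (assumes pandas >= 1.0),
    untruncated = df.to_string()
    # ...and truncation has to be requested explicitly through the keyword.
    clipped = df.to_string(max_colwidth=20)

print(untruncated)
print(clipped)
```

On older versions the keyword is not available, and changing display.max_colwidth is the only lever.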

@jorisvandenbossche
Member

So for me, PR welcome for this!

@jorisvandenbossche
Member

#1852 is probably a duplicate of this

@matanox

matanox commented Jan 19, 2018

I can hardly see how coupling the display limit with any other processing helps in this benign scenario, nor how padding the strings, which I notice also takes place, helps outside the display scenario. If there's too much history behind it, would you recommend using Python's plain csv package for reading strings without modifying them?

Here's a naive code sample, if it helps anyone:

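import csv

import pandas as pd

# Read the 'message' column with the standard csv module.
messages = []
with open("csv-file") as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        messages.append(row['message'])
messages  # notebook-style echo to inspect the list

messages_df = pd.DataFrame(messages, columns=['message'])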

# then concat to your main DataFrame...

This seems to avoid the truncation, but not the padding. The padding hurts a little with right-to-left text, because it pads as if the text were left-to-right, which skews the semantics of right-to-left text even more.
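As far as I can tell, the truncation and padding discussed in this thread happen when a frame is rendered as text, not when it is read. A quick check, sketched here with an in-memory CSV and an invented message, is to compare what read_csv stores with the original string:

```python
import io

import pandas as pd

# Invented long message, well past the default display width.
long_message = "word " * 30 + "end"
csv_text = 'id,message\n1,"' + long_message + '"\n'

df = pd.read_csv(io.StringIO(csv_text))

# The stored cell value is the full string, with no "..." and no padding.
stored = df["message"].iloc[0]
print(stored == long_message)
```

If that holds for your data, reading with pandas and accessing the values directly (for example with .iloc or .tolist()) should give the unmodified strings.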

@Momut1

Momut1 commented Mar 25, 2019

two days debugging to find out this was the issue. i'm sad...

@simonjayhawkins
Member

see also #24841 for fix in to_html

@addahlin

This was incredibly frustrating to debug. I was executing the code below and getting "..." in my output. I assumed it was just printing the "..." to the console, not in the dataframe! While I'm sure this isn't a "good" approach to wrapping text in a tag, it's the most obvious way when starting out. I'm sure many people will do this and no one would expect this behavior.

df["ValueType"] = "<strong>" + df["ValueType"][df["ValueType"] == "Portfolio"] + "</strong>"

If this were my first experience with Pandas, I'd promptly throw it in the trash. Note: I love Pandas, and thank you everyone for the amazing work you do! I just wanted to share my experience.

@rswgnu

rswgnu commented Jul 29, 2019

yes, that solved it, thx. I guess it should not pick that up for a to_string() operation, as that is not display? Or, well it is, but maybe one needs a different to_textfile() method that avoids this to be picked up.

With Pandas 0.25.0, setting display.max_colwidth to a large number stops the truncation. However, when I try to left-justify columns with df.to_string(justify='left'), that same display setting somehow pads the columns on the left, so they are not left-aligned. Is there currently any way to prevent truncation and get left-justified string columns when printing to a terminal? I know a pull request is in progress, but I would like to do this now. Thanks.
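One possible workaround, offered as a sketch rather than an official recipe, is to pad each value to the width of the longest one through the formatters argument. Once every cell in the column has the same length, the alignment pandas applies no longer shifts the text. The frame and column names below are invented.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["short.txt", "a_much_longer_file_name.txt"], "n": [1, 2]}
)

# Width of the longest value in the column, as a plain int.
width = int(df["name"].str.len().max())

# Pad every value on the right so that all cells are equally wide.
text = df.to_string(
    formatters={"name": lambda value: value.ljust(width)},
    justify="left",  # applies to the column headers
)
print(text)
```

The padded cells carry trailing spaces, which may matter if the output is parsed again later.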

@yamen321

May I ask what the use case is for having the to_string method depend on the display.max_colwidth option? I can't seem to understand why one would ever ask for a DataFrame row as a string with truncated column values.

@TomAugspurger
Contributor

@yamen321 I think it's agreed that to_string shouldn't truncate. Are you interested in working on it?

@lshepard

Hi! I jumped in on the "good first issue" label and put up a PR to solve this. Feedback very welcome.

@yamen321

Thanks a lot for taking the initiative on this @lshepard!

@simonjayhawkins simonjayhawkins added the Output-Formatting __repr__ of pandas objects, to_string label Aug 25, 2019
@TomAugspurger TomAugspurger modified the milestones: Contributions Welcome, 1.0 Aug 30, 2019
@santhoshnumberone

santhoshnumberone commented Oct 21, 2019

Hey

I have a dataframe column with file URLs like

0 http://address/filename1.jpg
1 http://address/filename2.jpg
Name: fileUrl, dtype: object

I want to extract the filenames from the url

so

from pathlib import Path
filenamelist = df.apply(lambda x: Path(x.to_string()).name if x.name == 'fileUrl' else x)

I just want the file names:

0 filename1.jpg
1 filename2.jpg

If the filename is a long string, my output looks like

filena...
filena...

Setting df.fileUrl.max_colwidth = 100 does not solve the issue.

I thought working on the dataframe directly would be much faster than having to:

select the column
iterate through the column elements
extract the name

Is there any workaround other than this?

filenames_list = [str(Path(x).name) for x in list(df['fileUrl'])]
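A sketch of an alternative that stays within pandas and never calls to_string, so no display setting is involved. It maps over the stored URL strings and takes the last path component. The second URL is invented to show a long filename, and using urlparse with PurePosixPath is my choice here, not something from the thread.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

import pandas as pd

long_file = "very_long_" * 8 + "name.jpg"  # invented long filename
df = pd.DataFrame(
    {"fileUrl": ["http://address/filename1.jpg", "http://address/" + long_file]}
)

# Work on the stored strings directly: parse the URL, keep the last path part.
filenames = df["fileUrl"].map(lambda url: PurePosixPath(urlparse(url).path).name)
print(filenames.tolist())
```

The list comprehension above works for the same reason: it reads the values themselves rather than their rendered text.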
