-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #955: Avoid TypeError in read_file #957
base: main
Are you sure you want to change the base?
Conversation
…type consistency
src/deepforest/utilities.py
Outdated
@@ -354,8 +354,8 @@ def read_file(input, root_dir=None): | |||
raise ValueError("No annotations in dataframe") | |||
# If the geometry column is present, convert to geodataframe directly | |||
if "geometry" in df.columns: | |||
df['geometry'] = gpd.GeoSeries.from_wkt(df['geometry']) | |||
df.crs = None | |||
if all(isinstance(x, str) for x in df['geometry']): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way we can solve this issue with out specifically looping through the data frame?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @henrykironde, we have a couple of options :
1)if isinstance(df['geometry'].iloc[0], str): This only checks the first element in the column.
2)if pd.api.types.infer_dtype(df['geometry'])== 'string': This can analyse the whole column without looping.
If the column has mixed types (e.g: some strings, some 'polygon' objects) the first option could fail, while the second one is more robust, which one do you think we should go with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the option you feel works best and less complexity than looping
Sorry - I hadn't seen the initial issue. |
@Bond099 can you show me the motivating example and confirm it no longer errors from the issue.
|
Hey @bw4sz, When you call read_file for the first time, it works perfectly because there’s no 'geometry' column in your DataFrame yet. The function takes your coordinates (like xmin, ymin, etc.), creates polygon shapes from them, and turns your DataFrame into a GeoDataFrame. That’s why the initial call succeeds without any issues. But on the second call, it crashes with a TypeError. This happens because, by then, the 'geometry' column already exists and contains polygon objects (from the first call). The original function tries to process this column using gpd.GeoSeries.from_wkt, which expects text strings in WKT format—like "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))". Since the column already has polygon objects instead of strings, it fails. fix: I’ve added a simple check: pd.api.types.infer_dtype(df['geometry']) == 'string'. This looks at the entire 'geometry' column to determine its data type: -If it’s strings (WKT format), the function converts them into polygon objects. Hope this clears things up! Let me know if you have any questions or need further clarification |
Great, just for reproducibility, please show that code passing. |
![]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm changes debug_predictions.datagrid
I have added "all(isinstance(x, str) for x in df['geometry'])" which checks all elements in the geometry column are strings ,before applying
gpd.GeoSeries.from_wkt
.