Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #955: Avoid TypeError in read_file #957

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Bond099
Copy link

@Bond099 Bond099 commented Mar 5, 2025

I have added "all(isinstance(x, str) for x in df['geometry'])" which checks all elements in the geometry column are strings ,before applying gpd.GeoSeries.from_wkt.

@@ -354,8 +354,8 @@ def read_file(input, root_dir=None):
raise ValueError("No annotations in dataframe")
# If the geometry column is present, convert to geodataframe directly
if "geometry" in df.columns:
df['geometry'] = gpd.GeoSeries.from_wkt(df['geometry'])
df.crs = None
if all(isinstance(x, str) for x in df['geometry']):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can solve this issue with out specifically looping through the data frame?

Copy link
Author

@Bond099 Bond099 Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @henrykironde, we have a couple of options :
1)if isinstance(df['geometry'].iloc[0], str): This only checks the first element in the column.
2)if pd.api.types.infer_dtype(df['geometry'])== 'string': This can analyse the whole column without looping.
If the column has mixed types (e.g: some strings, some 'polygon' objects) the first option could fail, while the second one is more robust, which one do you think we should go with?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the option you feel works best and less complexity than looping

@ethanwhite
Copy link
Member

ethanwhite commented Mar 5, 2025

@Bond099 - what was the situation where you ended up with non-string data in the geometry column?

Sorry - I hadn't seen the initial issue.

@henrykironde henrykironde requested a review from bw4sz March 10, 2025 05:45
@bw4sz
Copy link
Collaborator

bw4sz commented Mar 11, 2025

@Bond099 can you show me the motivating example and confirm it no longer errors from the issue.

import pandas as pd
from deepforest import utilities
from deepforest import get_data 
import os

# Create a sample dataframe with annotations
df = pd.DataFrame({
    'xmin': [100],
    'ymin': [100], 
    'xmax': [200],
    'ymax': [200],
    'label': ['Tree'],
    'image_path': ['OSBS_029.tif']
})

root_dir = os.path.dirname(get_data("OSBS_029.tif"))
# First call to read_file works fine
df = utilities.read_file(df, root_dir=root_dir)

# Second call to read_file raises error if it's a dataframe
df = utilities.read_file(pd.DataFrame(df), root_dir=root_dir)

@Bond099
Copy link
Author

Bond099 commented Mar 11, 2025

Hey @bw4sz,

When you call read_file for the first time, it works perfectly because there’s no 'geometry' column in your DataFrame yet. The function takes your coordinates (like xmin, ymin, etc.), creates polygon shapes from them, and turns your DataFrame into a GeoDataFrame. That’s why the initial call succeeds without any issues.

But on the second call, it crashes with a TypeError. This happens because, by then, the 'geometry' column already exists and contains polygon objects (from the first call). The original function tries to process this column using gpd.GeoSeries.from_wkt, which expects text strings in WKT format—like "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))". Since the column already has polygon objects instead of strings, it fails.

fix: I’ve added a simple check: pd.api.types.infer_dtype(df['geometry']) == 'string'. This looks at the entire 'geometry' column to determine its data type:

-If it’s strings (WKT format), the function converts them into polygon objects.
-If it’s already polygon objects, the function leaves them as is.

Hope this clears things up! Let me know if you have any questions or need further clarification

@bw4sz
Copy link
Collaborator

bw4sz commented Mar 13, 2025

Great, just for reproducibility, please show that code passing.

@Bond099
Copy link
Author

Bond099 commented Mar 13, 2025

image @bw4sz I have attached the screenshot of the code and its output .

Copy link
Contributor

@henrykironde henrykironde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rm changes debug_predictions.datagrid

@henrykironde henrykironde added the Awaiting author contribution Waiting on the issue author to do something before proceeding label Mar 13, 2025
@henrykironde henrykironde removed the Awaiting author contribution Waiting on the issue author to do something before proceeding label Mar 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants