Fix #955: Avoid TypeError in read_file #957

Bond099 · 2025-03-05T00:15:24Z

I have added "all(isinstance(x, str) for x in df['geometry'])", which checks that all elements in the geometry column are strings before applying gpd.GeoSeries.from_wkt.

henrykironde · 2025-03-05T05:41:34Z

src/deepforest/utilities.py

@@ -354,8 +354,8 @@ def read_file(input, root_dir=None):
            raise ValueError("No annotations in dataframe")
        # If the geometry column is present, convert to geodataframe directly
        if "geometry" in df.columns:
-            df['geometry'] = gpd.GeoSeries.from_wkt(df['geometry'])
-            df.crs = None
+            if all(isinstance(x, str) for x in df['geometry']):


Is there a way we can solve this issue without explicitly looping through the data frame?

Hey @henrykironde, we have a couple of options:
1) if isinstance(df['geometry'].iloc[0], str): This only checks the first element in the column.
2) if pd.api.types.infer_dtype(df['geometry']) == 'string': This analyses the whole column without an explicit Python loop.
If the column has mixed types (e.g. some strings, some polygon objects), the first option could give the wrong answer, while the second one is more robust. Which one do you think we should go with?
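
To make the difference between the two options concrete, here is a minimal sketch using only pandas. `FakePolygon` is a hypothetical stand-in for a shapely geometry object, and the helper names `first_is_str` and `column_is_str` are made up for illustration; none of them are part of DeepForest.

```python
import pandas as pd


class FakePolygon:
    """Hypothetical stand-in for a shapely geometry object (assumption)."""


wkt = "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"

# A column that holds only WKT strings
all_strings = pd.Series([wkt, wkt])
# A column whose first element is a string but whose second is not
mixed = pd.Series([wkt, FakePolygon()])


def first_is_str(col):
    # Option 1: inspects only the first element of the column
    return isinstance(col.iloc[0], str)


def column_is_str(col):
    # Option 2: infer_dtype examines the column's values as a whole
    return pd.api.types.infer_dtype(col) == "string"


print(first_is_str(mixed), column_is_str(mixed))
print(first_is_str(all_strings), column_is_str(all_strings))
```

On the mixed column, option 1 reports strings because it only sees the first value, while option 2 does not. Note that `infer_dtype` ignores missing values by default (`skipna=True`), so a WKT column with a few missing entries is still reported as 'string'.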

Use the option you feel works best and is less complex than looping.

ethanwhite · 2025-03-05T16:09:07Z

~~@Bond099 - what was the situation where you ended up with non-string data in the geometry column?~~

Sorry - I hadn't seen the initial issue.

bw4sz · 2025-03-11T20:57:00Z

@Bond099 can you show me the motivating example from the issue and confirm that it no longer errors?

import pandas as pd
from deepforest import utilities
from deepforest import get_data 
import os

# Create a sample dataframe with annotations
df = pd.DataFrame({
    'xmin': [100],
    'ymin': [100], 
    'xmax': [200],
    'ymax': [200],
    'label': ['Tree'],
    'image_path': ['OSBS_029.tif']
})

root_dir = os.path.dirname(get_data("OSBS_029.tif"))
# First call to read_file works fine
df = utilities.read_file(df, root_dir=root_dir)

# Second call to read_file raises error if it's a dataframe
df = utilities.read_file(pd.DataFrame(df), root_dir=root_dir)

Bond099 · 2025-03-11T23:08:17Z

Hey @bw4sz,

When you call read_file for the first time, it works perfectly because there’s no 'geometry' column in your DataFrame yet. The function takes your coordinates (like xmin, ymin, etc.), creates polygon shapes from them, and turns your DataFrame into a GeoDataFrame. That’s why the initial call succeeds without any issues.

But on the second call, it crashes with a TypeError. This happens because, by then, the 'geometry' column already exists and contains polygon objects (from the first call). The original function tries to process this column using gpd.GeoSeries.from_wkt, which expects text strings in WKT format—like "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))". Since the column already has polygon objects instead of strings, it fails.

Fix: I’ve added a simple check: pd.api.types.infer_dtype(df['geometry']) == 'string'. This looks at the entire 'geometry' column to determine its data type:

- If it’s strings (WKT format), the function converts them into polygon objects.
- If it’s already polygon objects, the function leaves them as is.

Hope this clears things up! Let me know if you have any questions or need further clarification.
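
The behaviour described above can be sketched as a small standalone helper. This is an illustration under assumptions, not the code in src/deepforest/utilities.py: `ensure_geometry` is a hypothetical name, and the sketch assumes geopandas and shapely are installed.

```python
import geopandas as gpd
import pandas as pd


def ensure_geometry(df):
    """Return a GeoDataFrame, parsing WKT only when the column holds strings.

    Hypothetical helper for illustration; not part of DeepForest.
    """
    df = df.copy()  # avoid mutating the caller's dataframe
    if pd.api.types.infer_dtype(df["geometry"]) == "string":
        # WKT text such as "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"
        df["geometry"] = gpd.GeoSeries.from_wkt(df["geometry"])
    # Otherwise the column is assumed to hold geometry objects already
    return gpd.GeoDataFrame(df, geometry="geometry")


wkt_df = pd.DataFrame({
    "label": ["Tree"],
    "geometry": ["POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"],
})

# First pass: strings are parsed into geometry objects
first = ensure_geometry(wkt_df)
# Second pass: geometry objects are detected and left untouched
second = ensure_geometry(pd.DataFrame(first))
```

Calling the helper twice mirrors the motivating example: the second call receives geometry objects rather than strings, so the WKT parsing step is skipped instead of raising a TypeError.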

bw4sz · 2025-03-13T17:08:14Z

Great, just for reproducibility, please show that code passing.

Bond099 · 2025-03-13T19:12:05Z

@bw4sz I have attached a screenshot of the code and its output.

henrykironde

Please remove the changes to debug_predictions.datagrid.

Fix weecology#955: Avoid TypeError in read_file by checking geometry type consistency

a412abc

henrykironde reviewed Mar 5, 2025

Use infer_dtype instead of looping

94dd769

henrykironde requested a review from bw4sz March 10, 2025 05:45

henrykironde requested changes Mar 13, 2025

henrykironde added the "Awaiting author contribution" label Mar 13, 2025

Removed debug_predictions.datagrid

d4ee93c

henrykironde approved these changes Mar 14, 2025

henrykironde removed the "Awaiting author contribution" label Mar 14, 2025
