Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

11578 make search-filters request.method agnostic #11582

Open
wants to merge 14 commits into
base: dev/7.6.x
Choose a base branch
from

Conversation

whatisgalen
Copy link
Member

@whatisgalen whatisgalen commented Oct 27, 2024

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Description of Change

Issues Solved

Partly addresses #11578

Checklist

  • I targeted one of these branches:
    • dev/8.0.x (under development): features, bugfixes not covered below
    • dev/7.6.x (main support): regressions, crashing bugs, security issues, major bugs in new features
    • dev/6.2.x (extended support): major security issues, data loss issues
  • I added a changelog in arches/releases
  • I submitted a PR to arches-docs (if appropriate)
  • Unit tests pass locally with my changes
  • I added tests that prove my fix is effective or that my feature works
  • My test fails on the target branch

Accessibility Checklist

Developer Guide

Topic Changed Retested
Color contrast
Form fields
Headings
Links
Keyboard
Responsive Design
HTML validation
Screen reader

Ticket Background

  • Sponsored by:
  • Found by: @
  • Tested by: @
  • Designed by: @

Further comments

@whatisgalen whatisgalen changed the title 11578 search req method 11578 make search-filters request.method agnostic Oct 27, 2024
@chiatt
Copy link
Member

chiatt commented Oct 28, 2024

This change doesn't seem like it addresses a crashing bug or anything that wasn't already in previous versions of Arches. I could be wrong, but it seems like the target should be 8.0.

@whatisgalen whatisgalen changed the base branch from dev/7.6.x to dev/8.0.x October 28, 2024 22:18
@whatisgalen whatisgalen marked this pull request as draft October 28, 2024 22:18
@whatisgalen
Copy link
Member Author

whatisgalen commented Oct 29, 2024

This change doesn't seem like it addresses a crashing bug or anything that wasn't already in previous versions of Arches. I could be wrong, but it seems like the target should be 8.0.

One could argue that issues like #10630 are crashing bugs. Previously, to crash the vanilla map-filter on the frontend you would have had to draw an unrealistic number of vertices. However, given the new "feature by filter" functionality in 7.6.0, you can select an existing geometry that might have a very high vertex-count and this breaks the entire search request as it could exceed the number of characters permitted in the GET header.

The vanilla map filter looks like this on the backend:

class MapFilter(BaseSearchFilter):
    def append_dsl(self, search_query_object, **kwargs):
        permitted_nodegroups = kwargs.get("permitted_nodegroups")
        include_provisional = kwargs.get("include_provisional")
        search_query = Bool()
        querystring_params = kwargs.get("querystring", "{}")

so it's actually already request-agnostic; a developer could make a search-view that sends a POST request instead of a GET request. The issue is that other filters are hard-coded to expect a GET request, so if those other filters were used in combination with a map-filter POST request, only part of the search request would be fulfilled. Those other search filters are: paging-filter, sort-results, and term-filter. The standard search-view already looks for a POST request or a GET request so the scope is just limited to making sure that gets honored fully in all search filters.

@whatisgalen whatisgalen changed the base branch from dev/8.0.x to dev/7.6.x October 29, 2024 19:44
@whatisgalen whatisgalen marked this pull request as ready for review October 29, 2024 19:49
@whatisgalen
Copy link
Member Author

whatisgalen commented Nov 14, 2024

Regarding my comment on PR #10827:

Per my comment, the featureid was removed from the request payload to the backend (map-filter) because if the user edited the drawn polygon on the map, (for example to cover a larger spatial query area) the feature identified by featureid would differ from the query geometry, and there wasn't a great way to confirm whether the geometry had actually changed (given potential precision differences). This would have yielded search results out of sync with the search query.

However, because the only payload is the geojson feature itself, this means large coordinate sets can exceed the character limit on the GET request header (and url length in the browser). Unfortunately, those requests then fail, hence #11578 and the need for #11582

...There's now a solid category of search requests that will always fail whenever a high-vertex feature is used as a map filter. This means that the feature is inherently bugged until the search request can be sent as a POST request rather than GET.

@apeters apeters self-requested a review November 15, 2024 00:09
Copy link
Member

@apeters apeters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comments about structure.

  1. I think we should push most of the logic to parse the request into the base_search_view.py module and centralize it there.
  2. Just like we put the the request object on self, we should put the search_request (currently called search_request_object) on self as well.
  3. We should then not have to pass it into the methods on the BaseSearchFilter

Do that first and then I'll re-review.

@@ -132,15 +133,16 @@ def append_dsl(self, search_query_object, **kwargs):
search_query_object["query"].include("map_popup")
search_query_object["query"].include("provisional_resource")
search_query_object["query"].include("permissions")
load_tiles = get_str_kwarg_as_bool("tiles", self.request.GET)
load_tiles = get_str_kwarg_as_bool("tiles", search_request_object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for clarity, and to improve on something that should have been done in the first place, this should default to False and not to search_request_object

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually already does default to False, see full method from string_utils.py:

def get_str_kwarg_as_bool(
    key, request_dict: Union[Dict[str, Any], QueryDict], default: bool = False
) -> bool:
    value = request_dict.get(key, str(default))
    if isinstance(value, bool):
        return value
    return str_to_bool(str(value))

total = int(self.request.GET.get("total", "0"))
resourceinstanceid = self.request.GET.get("id", None)
search_request_object = kwargs.get("search_request_object", self.request.GET)
for_export = get_str_kwarg_as_bool("export", search_request_object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, should default to False

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above

@whatisgalen
Copy link
Member Author

whatisgalen commented Nov 20, 2024

General comments about structure.

  1. I think we should push most of the logic to parse the request into the base_search_view.py module and centralize it there.

✅ (moved it into SearchFilterFactory which leverages the SearchView.create_query_dict() method anyhow)

  1. Just like we put the the request object on self, we should put the search_request (currently called search_request_object) on self as well.

  1. We should then not have to pass it into the methods on the BaseSearchFilter

Do that first and then I'll re-review.

@whatisgalen whatisgalen requested a review from apeters November 20, 2024 08:00
@@ -50,8 +50,9 @@ define([
doQuery: function() {
const queryObj = JSON.parse(this.queryString());
if (self.updateRequest) { self.updateRequest.abort(); }
const requestMethod = JSON.stringify(queryObj).length > 1800 ? "POST" : "GET";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did this number come from? 2048 seems like the most Edge can handle but most other browsers much more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaves a generous buffer in case the host is really long as can sometimes be the case with development hosts using Azure, for example: "https://longsubdomain.azure.cloudapp.westus3.developmentserverlongname." Is there a reason you think we should increase this limit + forego compatibility with Edge?

Copy link
Contributor

@bferguso bferguso Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe another option is to build the full GET URL string and compare that to the 2048 limit? May need to URL Encode the query string to as that can add a large number of characters to the URL (eg: " " -> "%20", etc).

Something like this might work to get the full URL length?:

const urlLength = (window.location.origin+arches.urls.export_results+"?"+$.param(payload)).length

Also - there's at least one other place that we'll need to handle this on the front end. Maybe worth wrapping the logic in a separate util function as it feels like this may be implemented in a number of places?

$.ajax({
type: "GET",
url: arches.urls.export_results,
data: payload

@@ -95,6 +102,7 @@ def __init__(self, request=None, user=None, componentname=None):
item.componentname, float("inf")
),
)
self.search_request = self.create_query_dict(self.search_request)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt this is neccessary.

Copy link
Member Author

@whatisgalen whatisgalen Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_query_dict is a searchview class method, so if your search view has logic, for example, whitelisting/blacklisting certain query parameters, this method is where that's happening. I can see however that because this variable is called search_request, it should probably have some of the other properties of the request object like user etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also due to the inheritance logic of search-view classes, setting the correct search-view on a blank request here:

def create_query_dict(self, query_dict):
        # check that all searchview required linkedSearchFilters are present
        query_dict[self.searchview_component.componentname] = True # set the database default search-view

... is what enables the python class of that search-view to be initialized. I know it looks circular but this came about as an intentional fix.

@whatisgalen
Copy link
Member Author

making a shortlist of outstanding items on this PR to remind myself later and also lead in to further review:

  • need to add csrf excemption decorator to search_results()
  • need to reconfigure the SearchResultsExporter class' __init__ method also be request-method-agnostic

@whatisgalen whatisgalen requested a review from apeters January 2, 2025 08:02
@apeters
Copy link
Member

apeters commented Jan 10, 2025

I realize that we're going to need some tests for this.

Comment on lines 91 to +94
if not self.request:
searchview_component_name = None
elif self.request.method == "POST":
searchview_component_name = self.request.POST.get("search-view", None)
else:
searchview_component_name = self.request.GET.get("search-view", None)
searchview_component_name = self.search_request.get("search-view", None)
Copy link
Member

@apeters apeters Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only need the "else" portion of this statement. self.request is already assumed to exist in the __init__ method. If it doesn't exist, then that method will fail. Will self.request ever be None?

Comment on lines +49 to +70
if search_request.method == "GET":
search_request.GET = search_request.GET.copy()
search_request.GET["tiles"] = True
search_request.GET["export"] = True
self.report_link = search_request.GET.get("reportlink", False)
self.format = search_request.GET.get("format", "tilecsv")
self.export_system_values = get_str_kwarg_as_bool(
"exportsystemvalues", search_request.GET
)
self.compact = search_request.GET.get("compact", True)
self.precision = int(search_request.GET.get("precision", 5))
else:
search_request.POST = search_request.POST.copy()
search_request.POST["tiles"] = True
search_request.POST["export"] = True
self.report_link = search_request.POST.get("reportlink", False)
self.format = search_request.POST.get("format", "tilecsv")
self.export_system_values = get_str_kwarg_as_bool(
"exportsystemvalues", search_request.POST
)
self.compact = search_request.POST.get("compact", True)
self.precision = int(search_request.POST.get("precision", 5))
Copy link
Member

@apeters apeters Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this if/else could be boiled down to a single chunk of code using something like this:

        the_request = getattr(search_request, method).copy()
        the_request["tiles"] = True
        the_request["export"] = True
        self.report_link = the_request.get("reportlink", False)
        self.format = the_request.get("format", "tilecsv")
        self.export_system_values = get_str_kwarg_as_bool(
            "exportsystemvalues", the_request
        )
        self.compact = the_request.get("compact", True)
        self.precision = int(the_request.get("precision", 5))
        setattr(search_request, method, the_request)

@@ -349,6 +350,7 @@ def get_dsl_from_search_string(request):
return JSONResponse(dsl)


@csrf_exempt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the need for this?

@@ -16,7 +16,7 @@

class SortResults(BaseSearchFilter):
def append_dsl(self, search_query_object, **kwargs):
sort_param = self.request.GET.get(self.componentname, None)
sort_param = kwargs.get("querystring", None)
Copy link
Member

@apeters apeters Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be

sort_param = self.search_request.get(self.componentname, None)

request_object = (
self.request.GET if self.request.method == "GET" else self.request.POST
)
self.search_request = request_object.dict()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should rename this to "self.querystring" because essentially that's what it is. This would reduce confusion with "self.request"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants