Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vector PRF #25

Merged
merged 12 commits into from
Nov 26, 2024
Merged

Vector PRF #25

merged 12 commits into from
Nov 26, 2024

Conversation

cmacdonald
Copy link
Collaborator

WIP.

Our results:
image

Results from Pyserini:
image

(I think they use FAISS as a retrieval backend)

@seanmacavaney
Copy link
Collaborator

Looks great, thanks! I prefer class-based Transformer definitions for stuff that isn't ad hoc. I'm happy to convert it to that format if you're okay with it?

@cmacdonald
Copy link
Collaborator Author

I guess the advantage of a transformer is that its parameters can be tuned in a gridsearch.

I think we should commit a notebook to the examples folder. I have one that can be associted.

@seanmacavaney
Copy link
Collaborator

Sounds good. Yeah, there are other benefits too, like you can get a clearer repr.

I think more extensive tests would also be good. I just put in one basic one to make sure I didn’t mess something up in the new version.

@seanmacavaney
Copy link
Collaborator

Thoughts on replacing this boilerplate with a decorator?

# BEFORE:

    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        """Transforms the input DataFrame query-by-query."""
        return pt.apply.by_query(self.transform_by_query, add_ranks=False)(inp)

    def transform_by_query(self, inp: pd.DataFrame) -> pd.DataFrame:
        pta.validate.result_frame(inp, extra_columns=['query', 'query_vec', 'doc_vec'])
        ...

# AFTER:

    @pta.transform.by_query(add_ranks=False)
    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        pta.validate.result_frame(inp, extra_columns=['query', 'query_vec', 'doc_vec'])
        ...

It could live in alpha as we refine on the idea/name/etc.

@cmacdonald
Copy link
Collaborator Author

sure.

@cmacdonald
Copy link
Collaborator Author

It's a fairly niche style, thus far. Perhaps setup a gist so we can continue discussion there. I guess this in effect making transform() methods that can be slightly specialised (eg guaranteed to be called once per query) without needing to have multiple more transform_*() methods in the API specification

@seanmacavaney
Copy link
Collaborator

Yeah, I'd rather not have any more transform_*() definitions in pt.Transformer if possible.

I'm also no stranger to just doing a groupby in the definition. It ends up being a few lines of boilerplate, but nothing crazy.

@seanmacavaney
Copy link
Collaborator

Alright, I'm happy with this version.

I went ahead and added @pta.transform.by_query to alpha. I'm happy to discuss further for a version to include in the core API, but I didn't want to hold up this PR on those discussions.

@cmacdonald
Copy link
Collaborator Author

ok, I'll make a separate PR with a notebook.

@cmacdonald
Copy link
Collaborator Author

Going to merge it?

@seanmacavaney
Copy link
Collaborator

Ok! Sounds like it's urgent.

@seanmacavaney seanmacavaney merged commit 6fe2564 into master Nov 26, 2024
3 checks passed
@seanmacavaney seanmacavaney deleted the prf branch November 26, 2024 11:29
@seanmacavaney seanmacavaney mentioned this pull request Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants