Expose similarity matching item pairs in Python library (aka crosswalk table) #62

Closed
woodthom2 opened this issue Oct 29, 2024 · 3 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@woodthom2
Contributor

woodthom2 commented Oct 29, 2024

Task: make a function called generate_crosswalk_table(all_questions, similarity, threshold) that takes the output of match_instruments and returns the pairs that match above a threshold.

Description

The web UI lets users see the matching item pairs above a given threshold.

Can we make the Python library also return the matching pairs above a threshold? This output is called the crosswalk table.

A crosswalk table contains the same information that currently comes back in the similarity matrix, just in a different format.

It is a long-format data frame that shows each matching pair of questions above a certain threshold, along with their respective IDs, question texts, and match scores. Here's an example structure:

# Example structure of crosswalk table DataFrame:
# tibble [n × 6]
# $ pair_name      : chr  # Name of the survey pair
# $ question1_no   : chr  # ID of question from first survey
# $ question1_text : chr  # Text of question from first survey
# $ question2_no   : chr  # ID of question from second survey
# $ question2_text : chr  # Text of question from second survey
# $ match_score    : num  # Similarity score between the questions

See also equivalent issue in R: harmonydata/harmony_r#4
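
For illustration, here is a minimal sketch of what such a function could look like. It is not the implementation that was eventually committed, and it makes several assumptions: the `Question` dataclass and its field names are hypothetical stand-ins for whatever objects match_instruments returns, `similarity` is assumed to be a square matrix indexed in the same order as `all_questions`, pairs within the same instrument are skipped, a pair is kept when its score is greater than or equal to the threshold, and the `pair_name` format is made up. The sketch returns a list of dicts, one per row, which could be passed to `pandas.DataFrame` to get a data frame. The similarity scores in the usage example are invented.

```python
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical stand-in for the question objects returned by match_instruments.
    instrument_name: str
    question_no: str
    question_text: str


def generate_crosswalk_table(all_questions, similarity, threshold):
    """Return one row per cross-instrument question pair scoring >= threshold."""
    rows = []
    for i, q1 in enumerate(all_questions):
        # Only look at j > i so each unordered pair is reported once.
        for j in range(i + 1, len(all_questions)):
            q2 = all_questions[j]
            # Assumption: a crosswalk only links questions from different instruments.
            if q1.instrument_name == q2.instrument_name:
                continue
            score = float(similarity[i][j])
            if score >= threshold:
                rows.append(
                    {
                        # Assumed naming scheme for the pair; not taken from the library.
                        "pair_name": f"{q1.instrument_name}_{q2.instrument_name}",
                        "question1_no": q1.question_no,
                        "question1_text": q1.question_text,
                        "question2_no": q2.question_no,
                        "question2_text": q2.question_text,
                        "match_score": score,
                    }
                )
    return rows


# Usage example with invented similarity scores.
questions = [
    Question("GAD-7", "1", "Feeling nervous, anxious, or on edge"),
    Question("GAD-7", "2", "Not being able to stop or control worrying"),
    Question("PHQ-9", "1", "Little interest or pleasure in doing things"),
]
similarity = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.75],
    [0.3, 0.75, 1.0],
]
table = generate_crosswalk_table(questions, similarity, threshold=0.7)
for row in table:
    print(row)
```

Iterating over the upper triangle only avoids listing each pair twice, since the similarity matrix is assumed to be symmetric.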

@vkrithika25
Contributor

Hi! I would love to work on this issue, if possible.

@woodthom2
Contributor Author

woodthom2 commented Nov 5, 2024

Hi @vkrithika25, thanks so much for offering. Nobody is working on this yet, so please feel free to take it on! Let me know if you'd like to have a call about it or if it's already clear.

I also have an FAQ entry illustrating the crosswalk table which might be helpful.

Feel free to message me on Discord. My name there is Thomas Wood.

Also, do you know any R? If you're able to take on the related R issue, that would be so helpful too! Thanks!

woodthom2 added a commit that referenced this issue Nov 17, 2024
#62 Added crosswalk table + unit tests
@woodthom2
Contributor Author

Thanks Krithika!
