Salting Partitions problem #2626
Unanswered
brianfhead
asked this question in
Q&A
Replies: 1 comment 4 replies
The problem does seem to be with |
I started getting an error after adding salting_partitions to a blocking rule. Has the syntax changed, or can you help me spot a problem in my code? I'm using this page to inform my approach. A previous problem in the same step turned out to be caused by a syntax change, which is why I'm asking.
Here's my code:
import splink.comparison_library as cl  # assumed: Splink 4-style import behind the `cl` alias

settings = {
    "link_type": "link_only",
    "blocking_rules_to_generate_predictions": [
        {
            "blocking_rule": "substr(l.mlg_addr,1,4) = substr(r.mlg_addr,1,4) and l.mlg_zip = r.mlg_zip",
            "salting_partitions": 200,
        }
    ],
    "comparisons": [
        cl.DamerauLevenshteinAtThresholds("mlg_addr"),
        cl.ExactMatch("mlg_zip").configure(term_frequency_adjustments=True),
    ],
}
(This is a snippet I pulled out after commenting out a lot of other code to try to isolate the problem.)
Here's the error:
Py4JJavaError: An error occurred while calling o1318.checkpoint.
: java.util.NoSuchElementException: key not found: 765
This is basically the same error I was getting on the page I shared above, where the cause was a change in syntax.
My data is heavily skewed, hence the need for salting. I've also tried several values for the number of partitions. Given the skew, I think 200 is reasonable, but I don't believe the value itself is the problem, since I get the same error with a small number (e.g., 4).
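For readers unfamiliar with salting, here is a rough pure-Python sketch of the general idea: each left-hand record gets a random salt, and the candidate pairs for one heavily populated blocking key are split into n smaller groups by that salt. This is only an illustration under my own assumptions, not Splink's or Spark's implementation; the function name `salted_pairs` and the record labels are hypothetical.

```python
import math
import random
from itertools import product


def salted_pairs(left, right, n_partitions, seed=0):
    """Split the candidate pairs of one blocking key into salt partitions.

    Hypothetical helper for illustration only: every left record gets a
    random salt in (0, 1], and a pair lands in partition ceil(salt * n).
    """
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], so the partition index is always >= 1.
    salts = {rec: 1.0 - rng.random() for rec in left}
    partitions = {k: [] for k in range(1, n_partitions + 1)}
    for l_rec, r_rec in product(left, right):
        k = math.ceil(salts[l_rec] * n_partitions)
        partitions[k].append((l_rec, r_rec))
    return partitions


# One skewed blocking key shared by 100 records on each side.
left = [f"L{i}" for i in range(100)]
right = [f"R{i}" for i in range(100)]

parts = salted_pairs(left, right, n_partitions=4)
total_pairs = sum(len(p) for p in parts.values())
largest = max(len(p) for p in parts.values())
print(f"total pairs: {total_pairs}, largest partition: {largest}")
```

The total number of pairs is unchanged; salting only spreads the work for a hot key across several smaller units, which is why the partition count should affect performance rather than correctness.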