-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SparseLinear_v2
, fixing indexing issues
#754
Add SparseLinear_v2
, fixing indexing issues
#754
Conversation
`SparseLinear` does not correctly index the gradient/weight matrix (explosion#752). This change fixes the indexing, so that the full matrix is used. To retain compatibility with existing models that use `SparseLinear`, which works relatively well if there are not too many hash collisions, the fixed version is renamed to `SparseLinear_v2`. Thanks to @sriram7797 for reporting this issue!
The output of MurMur hashes were mapped to array indices as follows: ``` idx = hash & (nr_weight-1) ``` This works well when `nr_weight` is a power of two. For instance, if we have 16 buckets: ``` idx = hash & 15 idx = hash & 0b1111 ``` However, when the user uses a bucket count that is not a power of two, this breaks down. For instance, if we have 15 buckets: ``` idx = hash & 14 idx = hash & 0b1110 ``` This would mask out all odd indices. We fix this by using the modulus instead. To preserve compatibility with existing models, this change is only added to `SparseLinear_v2`.
Do you really need to do the older version of the hash mapping at all? |
I don't know. If someone used
their models would break by changing from bit masking to modulo. |
But with this as |
I am not sure I follow? There may be existing uses for |
Ah, I thought more was redefined than actually is. I think it would be clearer to call it something like |
Changed the name to |
You'll need to merge in the latest from |
Merged. |
@explosion-bot please test_slow_gpu |
URL: https://buildkite.com/explosion-ai/thinc-slow-gpu-tests/builds/27 |
Would the slow GPU tests pass after the changes Madeesh made for tensorflow? |
The fix for the failing tests is in this commit. So, performing a new rebase on |
@explosion-bot please test_slow_gpu |
URL: https://buildkite.com/explosion-ai/thinc-slow-gpu-tests/builds/36 |
@explosion-bot please test_slow_gpu |
URL: https://buildkite.com/explosion-ai/thinc-slow-gpu-tests/builds/37 |
A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two.
* Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
…13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: explosion/thinc#754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TexCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import
Introduce
SparseLinear_v2
to fix indexing issuesSparseLinear
does not correctly index the gradient/weight matrix (#752). This change fixes the indexing, so that the full matrix is used.To retain compatibility with existing models that use
SparseLinear
, which works relatively well if there are not too many hash collisions, the fixed version is renamed toSparseLinear_v2
.While at it, fix another indexing-related bug:
The output of MurMur hashes were mapped to array indices as follows:
This works well when
nr_weight
is a power of two. For instance,if we have 16 buckets:
However, when the user uses a bucket count that is not a power of two, this breaks down. For instance, if we have 15 buckets:
This would mask out all odd indices. We fix this by using the modulus instead. To preserve compatibility with existing models, this change is only added to
SparseLinear_v2
.