
Negative Loss on gpu ALS model #367

Closed
sikhad opened this issue Jul 14, 2020 · 6 comments · Fixed by #663

Comments

sikhad commented Jul 14, 2020

I'm getting a negative loss value when running ALS using GPU (loss = -.0346) regardless of varying all parameters. When running the same data/parameters on CPU, I'm getting a positive loss. I'm confused why loss could be negative.

It's a ~6500 x 1m csr matrix.

import implicit

params = {'factors': 64,
          'use_gpu': True,
          'use_native': True,
          'use_cg': True,
          'regularization': 0,
          'num_threads': 0,
          'iterations': 5,
          'calculate_training_loss': True}

# initialize a model
model = implicit.als.AlternatingLeastSquares(**params)

# train the model on a sparse matrix of item/user/confidence weights
# (csr is the ~6500 x 1m scipy.sparse CSR matrix described above)
model.fit(csr, show_progress=True)

benfred commented Jan 13, 2022

It's looking like the GPU loss calculation might be buggy (see also #441).

benfred changed the title from "Negative Loss" to "Negative Loss on gpu ALS model" on Jan 22, 2022
benfred added a commit that referenced this issue Jun 6, 2023
The GPU ALS model would sometimes return incorrect results with the
`calculate_training_loss` parameter enabled. This happened when
number_of_users * number_of_items was bigger than 2**31, due to
an overflow in the loss function calculation.

Fix, and add tests that would have caught this bug.

Closes #367
Closes #441
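
For illustration, here is a minimal NumPy sketch of the kind of 32-bit overflow the commit message describes (an illustrative assumption, not the library's actual CUDA code): the reported ~6500 x 1m matrix has about 6.5 billion user/item cells, which does not fit in a signed 32-bit integer, so a 32-bit product wraps around to a negative value.

import numpy as np

n_users, n_items = 6500, 1_000_000   # roughly the matrix size reported above

# correct count in 64-bit arithmetic: well above 2**31
total_64 = np.int64(n_users) * np.int64(n_items)

# the same product forced into 32-bit wraps around and goes negative
total_32 = (np.array([n_users], dtype=np.int32) *
            np.array([n_items], dtype=np.int32))[0]

print(total_64)   # 6500000000
print(total_32)   # -2089934592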

benfred commented Jun 6, 2023

There was a bug with the calculate_training_loss parameter when number_of_items * number_of_users was bigger than 2**31. This will be fixed by #663 in the next release.

Thanks for reporting - sorry about the lengthy delay in getting this resolved.
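
As a quick check (a sketch against the snippet above, not part of the implicit API), you can tell whether a dataset falls into the affected regime by comparing the product of its dimensions against 2**31; Python ints don't overflow, so the comparison itself is safe:

# csr is the sparse matrix passed to model.fit() above
n_rows, n_cols = csr.shape
print(n_rows * n_cols > 2**31)   # True for the ~6500 x 1m matrix reported here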

gallir commented Jun 6, 2023

Thanks. Will you release a new pip module version?

benfred commented Jun 6, 2023

@gallir - I'm working on getting a new version together; I also want to get changes like #661 and #656 pushed out as well.

I'd also like to fix the conda packaging errors with this version - once I have a handle on that, I'll push out a new release.

benfred commented Jun 13, 2023

@gallir - the fix is in v0.7.0

gallir commented Jun 13, 2023

v0.7.0

Thank you very much. I had modified your build yml to use your latest version; it worked better than before: https://github.com/gallir/implicit
