csr_matrices #122
Comments
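Hi @kwchurch, thank you for the detailed dev log! I slightly edited the format to further improve the readability. At first glance, it looks to me like an issue of incompatible dtypes. More specifically, the csr used by PecanPy uses uint32 for both the indices and indptr fields, rather than int32 as used by scipy.sparse.csr. Similarly, PecanPy uses float32 instead of float64 for the data field in the csr object.
I think the most straightforward way to resolve the type issue is to enforce the desired types (i.e., float32 for data; uint32 for indices and indptr) at loading time (src/pecanpy/graph.py, Lines 432 to 438 in 49d6063).
I will first try to reproduce the error here using the example script you provided, and then see if my proposed solution actually fixes the issue. As we also discussed, I will add the option for implicitly assigning node IDs if they are not found in the .csr.npz file. I will make it so that it requires a "soft confirmation" from the user that the implicit assignment is desired by printing a warning message about the implicit assignment, unless a specific flag (e.g., --implicit_node_ids) is set. |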
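A minimal sketch of the dtype enforcement described above, assuming the CSR components are available as plain NumPy arrays. The helper name `enforce_csr_dtypes` is hypothetical and is not PecanPy's actual loading code; only the target dtypes (float32 for data, uint32 for indices and indptr) come from the discussion.

```python
import numpy as np


def enforce_csr_dtypes(data, indices, indptr):
    """Hypothetical helper: cast CSR arrays to float32 / uint32 / uint32."""
    indices = np.asarray(indices)
    indptr = np.asarray(indptr)
    # A negative value would silently wrap around when cast to uint32.
    if indices.size and indices.min() < 0:
        raise ValueError("negative column index cannot be stored as uint32")
    if indptr.size and indptr.min() < 0:
        raise ValueError("negative row pointer cannot be stored as uint32")
    return (
        np.asarray(data).astype(np.float32),
        indices.astype(np.uint32),
        indptr.astype(np.uint32),
    )


# A 3-node graph with boolean weights and int32 indices/indptr,
# similar to what a scipy.sparse CSR matrix would typically hold.
data = np.array([True, True, True, True])
indices = np.array([1, 0, 2, 1], dtype=np.int32)
indptr = np.array([0, 1, 3, 4], dtype=np.int32)

data32, indices32, indptr32 = enforce_csr_dtypes(data, indices, indptr)
print(data32.dtype, indices32.dtype, indptr32.dtype)  # float32 uint32 uint32
```

Casting once at load time keeps the rest of the pipeline free of per-call type checks, at the cost of one extra copy of each array.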
Hi @kwchurch, I've created a new branch (see #124) implementing my suggestions above (explicit dtype setting and implicit node IDs setting). The scipy csr karate test case works fine on my end.
In the meantime, if you would like to give the new changes a try and let me know whether they resolve your issue, that would be great. You can run it as before using
pecanpy --input demo/karate.bool.npz --output demo/karate.int.emb --mode SparseOTF
which will warn you about the implicit node IDs setting. To suppress that warning, you can set the --implicit_ids flag:
pecanpy --input demo/karate.bool.npz --output demo/karate.int.emb --mode SparseOTF --implicit_ids |
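The implicit node ID behavior can be sketched roughly as follows. This is an illustration under assumptions, not the code in the branch: the function `load_ids`, its `implicit_ids` parameter, and the warning text are hypothetical, and the archive is built in memory with only the `data`, `indices`, and `indptr` arrays so that the `IDs` field is missing.

```python
import io
import warnings

import numpy as np


def load_ids(npz, num_nodes, implicit_ids=False):
    """Hypothetical helper: read node IDs, falling back to "0".."n-1"."""
    if "IDs" in npz.files:
        return [str(i) for i in npz["IDs"]]
    if not implicit_ids:
        # "Soft confirmation": tell the user the IDs were made up.
        warnings.warn("no IDs field found; assigning implicit node IDs")
    return [str(i) for i in range(num_nodes)]


# Build an archive without an IDs field.
buf = io.BytesIO()
np.savez(
    buf,
    data=np.ones(2, dtype=np.float32),
    indices=np.array([1, 0], dtype=np.uint32),
    indptr=np.array([0, 1, 2], dtype=np.uint32),
)
buf.seek(0)

with np.load(buf) as npz:
    num_nodes = len(npz["indptr"]) - 1
    ids = load_ids(npz, num_nodes, implicit_ids=True)
print(ids)  # ['0', '1']
```

Calling `load_ids` without `implicit_ids=True` returns the same IDs but emits the warning first.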
ok
do you think it could check the datatypes and make the necessary
conversions automatically?
…On Wed, Jun 29, 2022 at 4:04 AM Remy Liu ***@***.***> wrote:
Hi @kwchurch <https://github.com/kwchurch>, thank you for the detailed
dev log! I slightly edited the format to further improve the readability.
At a first glance, it looks to me like an issue of incompatible dtype. More
specifically, the csr used by PecanPy uses uint32 for both the index and
indptr fields, rather than int32 as used by scipy.sparse.csr. Similarly,
PecanPy uses float32 instead of float64 for the data field in the csr
object.
I think to resolve the type issue, the most straightforward solution is to
enforce the desired types (i.e., float32 for data; uint32 for indices and
indptr) at loading time:
https://github.com/krishnanlab/PecanPy/blob/49d60630b4589eeab992eef2da9c2eaf6b19fab8/src/pecanpy/graph.py#L432-L438
I will first try to reproduce the error here using the example script you
provided, and then see if my proposed solution actually fixes the issue.
As we also discussed, I will add the option for implicitly assigning node
IDs if it is not found in the .csr.npz file. I will make it so that it
requires a "soft confirmation" from the user that the implicit assignment
is desired by printing a warning message about the implicit assignment,
unless a specific flag (e.g., --implicit_node_ids) is set.
|
great
…On Wed, Jun 29, 2022 at 7:48 AM Remy Liu ***@***.***> wrote:
@kwchurch <https://github.com/kwchurch> yes it is doing that now
https://github.com/krishnanlab/PecanPy/blob/a12f27c608bb5b72651481b80380bffdf42053ab/src/pecanpy/graph.py#L443-L445
|
let me know when you have something ready to try out
|
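@kwchurch it is ready to be tried out, but it is not on the main branch. You'll need to check out the scipy-csr branch, and you will find the new changes there. |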
I have some graphs with nodes that have no edges
Is that a problem?
init pecanpy: p = 1, q = 1, workers = 16, verbose = True, extend = True, gamma = 0, random_state = None
/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/rw/sparse_rw.py:30: RuntimeWarning: Mean of empty slice.
  data[indptr[i] : indptr[i + 1]].mean()
/home/k.church/venv/gft/lib/python3.8/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/k.church/venv/gft/lib/python3.8/site-packages/numpy/core/_methods.py:262: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/home/k.church/venv/gft/lib/python3.8/site-packages/numpy/core/_methods.py:222: RuntimeWarning: invalid value encountered in true_divide
  arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/home/k.church/venv/gft/lib/python3.8/site-packages/numpy/core/_methods.py:254: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/var/spool/slurm/d/job27656002/slurm_script", line 8, in <module>
    sys.exit(main())
  File "/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/cli.py", line 333, in main
    walks = simulate_walks(args, g)
  File "/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/wrappers.py", line 18, in wrapper
    result = func(*args, **kwargs)
  File "/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/cli.py", line 320, in simulate_walks
    return g.simulate_walks(args.num_walks, args.walk_length)
  File "/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/pecanpy.py", line 153, in simulate_walks
    walk_idx_mat = self._random_walks(
  File "/home/k.church/venv/gft/lib/python3.8/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/home/k.church/venv/gft/lib/python3.8/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function imul>) found for signature:
>> imul(array(bool, 1d, C), array(float64, 1d, C))
There are 8 candidate implementations:
- Of which 4 did not match due to:
Overload of function 'imul': File: <numerous>: Line N/A.
With argument(s): '(array(bool, 1d, C), array(float64, 1d, C))':
No match.
- Of which 2 did not match due to:
Overload in function 'NumpyRulesInplaceArrayOperator.generic': File: numba/core/typing/npydecl.py: Line 244.
With argument(s): '(array(bool, 1d, C), array(float64, 1d, C))':
Rejected as the implementation raised a specific error:
AttributeError: 'NoneType' object has no attribute 'args'
raised from /home/k.church/venv/gft/lib/python3.8/site-packages/numba/core/typing/npydecl.py:255
- Of which 2 did not match due to:
Operator Overload in function 'imul': File: unknown: Line unknown.
With argument(s): '(array(bool, 1d, C), array(float64, 1d, C))':
…On Wed, Jun 29, 2022 at 8:37 AM Remy Liu ***@***.***> wrote:
@kwchurch <https://github.com/kwchurch> it is ready to be tried out, but
it is not on the main branch. you'll need to checkout the scipy-csr
branch, and you will find the new changes there.
|
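On the question of nodes with no edges: the "Mean of empty slice" and "Degrees of freedom <= 0" warnings in the log above are what NumPy emits when a mean or variance is taken over a zero-length slice, which is what a CSR row with `indptr[i] == indptr[i + 1]` produces. A minimal sketch of guarding against that case follows; the function `row_means` is hypothetical, and using 0.0 for isolated nodes is an assumption made for illustration, not PecanPy's behavior.

```python
import numpy as np


def row_means(data, indptr):
    """Hypothetical helper: mean edge weight per node, 0.0 for isolated nodes."""
    n = len(indptr) - 1
    means = np.zeros(n, dtype=np.float32)
    for i in range(n):
        start, end = int(indptr[i]), int(indptr[i + 1])
        if end > start:  # only rows with at least one edge
            means[i] = data[start:end].mean()
    return means


# Node 1 has no edges: indptr[1] == indptr[2].
data = np.array([2.0, 4.0, 6.0], dtype=np.float32)
indptr = np.array([0, 2, 2, 3], dtype=np.uint32)
print(row_means(data, indptr))
```

Without the `end > start` check, the entry for node 1 would be NaN and the RuntimeWarning would be raised.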
I have a large csr_matrix in npz format. I'd like to use it as input as is, but it doesn't have an IDs field.
added this to graph.py (but it doesn't work)
Created edg2npz.py with this:
called it with
python edg2npz.py demo/karate.bool.npz bool < demo/karate.edg
Unfortunately, I can't use this kind of csr_matrix...
I can write out my matrix to text and then run pecanpy on that, but my matrix is very large and it will take a long time to write it out and read it back. My matrix has N = 300M nodes and E=2B nonzero edges.
There are 6 candidate implementations:
Overload in function 'NumpyRulesInplaceArrayOperator.generic': File: numba/core/typing/npydecl.py: Line 244.
With argument(s): '(array(bool, 1d, C), int64)':
Rejected as the implementation raised a specific error:
AttributeError: 'NoneType' object has no attribute 'args'
raised from /home/k.church/venv/gft/lib/python3.8/site-packages/numba/core/typing/npydecl.py:255
Operator Overload in function 'itruediv': File: unknown: Line unknown.
With argument(s): '(array(bool, 1d, C), int64)':
No match for registered cases:
Overload of function 'itruediv': File: numba/core/typing/npdatetime.py: Line 94.
With argument(s): '(array(bool, 1d, C), int64)':
No match.
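The failing signatures above all have a bool array as the target of an in-place operator (`imul`, `itruediv`). Plain NumPy rejects comparable operations for a related reason: the floating-point result cannot be written back into a bool array. The sketch below uses NumPy only as a stand-in to illustrate this, not the numba-compiled PecanPy code path, and shows that casting the weights to float32 first makes the same operations valid.

```python
import numpy as np

weights = np.array([True, True, False])

# In-place scaling tries to store a float64 result in a bool array.
try:
    weights *= np.array([0.5, 2.0, 1.0])
    failed = False
except TypeError:
    failed = True
print("in-place multiply on bool failed:", failed)

# After casting to float32, in-place multiply and divide both work.
weights32 = weights.astype(np.float32)
weights32 *= np.array([0.5, 2.0, 1.0], dtype=np.float32)
weights32 /= 2
print(weights32.dtype, weights32.tolist())
```

The failed operation leaves `weights` unchanged, so the cast starts from the original boolean values.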