-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PseudoPotentialData
: allow str
or Path
for source
argument
#98
Conversation
There were a few tricky implementation details to consider, but I think this implementation could work. Question for me would be if we should still support |
Thanks @sphuber.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left an important comment.
The rest seems fine to me. I tested it in some siesta calculations and works well.
aiida_pseudo/data/pseudo/pseudo.py
Outdated
if isinstance(source, (str, pathlib.Path)): | ||
with open(source, 'rb') as handle: | ||
source = io.BytesIO(handle.read()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here something needs to be added in order to give a name to the source. Without a name, the attribute filename
would be set to file.txt
when the absolute path is passed, while to the actual file name when the steam is passed.
I think something like that is sufficient:
if isinstance(source, (str, pathlib.Path)):
save_name = str(source)
with open(source, 'rb') as handle:
source = io.BytesIO(handle.read())
source.name = save_name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, although we probably only want to do this when the filename
argument is None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we should just give the name to the stream as in the case an external stream is passed. If this name will be used or not shouldn't matter here, it is decided within the SinglefileData.set_file
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah, that's true
Regarding this I now realized that in the |
I am not convinced this is correct. I gave the accessing of filename attributes as an example, but there are other things that could break. For example the base class checks whether modification is allowed. In general, one should call the superclass as soon as possible and I think here the same applies. The slight increase in efficiency in case an invalid file is passed does not outweigh the potential problems that could arise by calling the super too late.
The check on the existence of the file we already do in fact, albeit indirect. As the |
Nothing comes in my mind that should justify why relative paths are not accepted. Ok to leave it as it is! |
Another idea that is coming to me now. Just an idea we can discuss another time. |
The argument against this approach is that |
6fcfd80
to
0fbd7fc
Compare
Thanks for the review @bosonie . I addressed the question of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to you! Also in this case would be useful to have a release soon...is it possible? Sorry for that!
I will give @mbercx some more time to respond and then I could make a release, we have some other fixes on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sphuber! Gave the code a quick pass and it all looks good. Also did a bit of field testing for the UpfData
class and all 👍 . Just left a few nitpicky comments regarding the docstrings (which still need to be propagated from the JthXmlData
class to others).
aiida_pseudo/data/pseudo/jthxml.py
Outdated
|
||
:param stream: a filelike object with the binary content of the file. | ||
.. note:: this method will first analyse the type of the ``source`` and if it is a filepath will convert it | ||
to a binary stream of the content located at that filepath. This is then passed on to the superclass. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to a binary stream of the content located at that filepath. This is then passed on to the superclass. This | |
to a binary stream of the content located at that filepath, which is then passed on to the superclass. This |
aiida_pseudo/data/pseudo/jthxml.py
Outdated
:param stream: a filelike object with the binary content of the file. | ||
.. note:: this method will first analyse the type of the ``source`` and if it is a filepath will convert it | ||
to a binary stream of the content located at that filepath. This is then passed on to the superclass. This | ||
needs to be done first, because it will properly set the file and filename attributes that are expected to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be done first, because it will properly set the file and filename attributes that are expected to | |
needs to be done first, because it will properly set the file and filename attributes that are expected |
aiida_pseudo/data/pseudo/jthxml.py
Outdated
.. note:: this method will first analyse the type of the ``source`` and if it is a filepath will convert it | ||
to a binary stream of the content located at that filepath. This is then passed on to the superclass. This | ||
needs to be done first, because it will properly set the file and filename attributes that are expected to | ||
be there by other methods. Straight after the superclass call, the source seeker needs to be reset to zero |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be there by other methods. Straight after the superclass call, the source seeker needs to be reset to zero | |
by other methods. Straight after the superclass call, the source seeker needs to be reset to zero |
aiida_pseudo/data/pseudo/jthxml.py
Outdated
this way the conversion from filepath to byte stream will be performed only once. Otherwise, each subclass | ||
would perform the conversion over and over again. | ||
|
||
:param source: the source pseudopotential content, either a binary stream, or a `str` or `Path` to the absolute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:param source: the source pseudopotential content, either a binary stream, or a `str` or `Path` to the absolute | |
:param source: the source pseudopotential content, either a binary stream, or a ``str`` or ``Path`` to the absolute |
would perform the conversion over and over again. | ||
|
||
:param source: the source pseudopotential content, either a binary stream, or a `str` or `Path` to the absolute | ||
path of the file on disk. | ||
:param filename: optional explicit filename to give to the file stored in the repository. | ||
:raises ValueError: if the element symbol is invalid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method also raises a TypeError
in case the source
type is not valid. Should we "bubble up" this raises
information to this docstring from the prepare_source
docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
aiida_pseudo/data/pseudo/pseudo.py
Outdated
"""Get pseudopotenial data node from database with matching md5 checksum or create a new one if not existent. | ||
|
||
:param stream: a filelike object with the binary content of the file. | ||
:param source: the source pseudopotential content, either a binary stream, or a `str` or `Path` to the absolute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:param source: the source pseudopotential content, either a binary stream, or a `str` or `Path` to the absolute | |
:param source: the source pseudopotential content, either a binary stream, or a ``str`` or ``Path`` to the absolute |
Also: Does the path have to be absolute? I did some field testing and it seemed to work just fine for relative paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it shouldn't. The SinglefileData
base class actually imposes this requirement, but I didn't see a good reason for it and so I made it such that it supports relative paths as well. If it is incorrect, it will simply raise FileNotFoundError
. I will update the docstrings.
aiida_pseudo/data/pseudo/pseudo.py
Outdated
|
||
.. note:: if the ``source`` is a valid file on disk, its content is read and returned as a stream of bytes. | ||
|
||
:raises TypeError: if the source is not a str, ``pathlib.Path`` instance or binary stream. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:raises TypeError: if the source is not a str, ``pathlib.Path`` instance or binary stream. | |
:raises TypeError: if the source is not a ``str``, ``pathlib.Path`` instance or binary stream. |
aiida_pseudo/data/pseudo/pseudo.py
Outdated
""" | ||
if not isinstance(source, (str, pathlib.Path)) and not cls.is_readable_byte_stream(source): | ||
raise TypeError( | ||
f'`source` should be a str or `pathlib.Path` of a filepath on disk or a stream of bytes, got: {source}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f'`source` should be a str or `pathlib.Path` of a filepath on disk or a stream of bytes, got: {source}' | |
f'`source` should be a `str` or `pathlib.Path` of a filepath on disk or a stream of bytes, got: {source}' |
0fbd7fc
to
9af79fe
Compare
Currently, the source of the pseudopotential passed at construction time of the `PseudoPotentialData` that represents it had to be a binary stream. Here we introduce the possibility to allow the caller that pass a `str` or `Path` object with the absolute path to the file content on disk that should be wrapped. Note that, as opposed to the base class `SinglefileData`, the filepath is not required to be absolute. In addition, it raises a `FileNotFoundError` instead of a `ValueError` if the provided filepath does not exist. There is a potential inefficiency in the implementation after this change. It is not only the constructor of `SinglefileData`, and therefore our plugin `PseudoPotentialData` and all its subclasses, that needs to be able to support the different source types and convert the paths to a stream if that is what passed, the `set_file` method, which is called by the constructor needs to apply the same checks and conversions. Here, we need to therefore make sure that the conversion from filepath to byte stream is only done once and not in the `set_file` call of every subclass. Therefore it is important that the `set_file` method first calls the `prepare_source` class method, which will only convert to bytestream if isn't already one, and that should then be passed to the superclass call. Note that this should be called straight after the `prepare_source` since the base class will set the actual file and filename attributes that are methods that may be invoked afterwards will rely on.
9af79fe
to
5200c49
Compare
Fixes #87
Currently, the source of the pseudopotential passed at construction time
of the
PseudoPotentialData
that represents it had to be a binarystream. Here we introduce the possibility to allow the caller that pass
a
str
orPath
object with the absolute path to the file content ondisk that should be wrapped.
There is a potential inefficiency in the implementation after this change.
It is not only the constructor of
SinglefileData
, and therefore ourplugin
PseudoPotentialData
and all its subclasses, that needs to beable to support the different source types and convert the paths to a
stream if that is what passed, the
set_file
method, which is called bythe constructor needs to apply the same checks and conversions. Here, we
need to therefore make sure that the conversion from filepath to byte
stream is only done once and not in the
set_file
call of everysubclass.
Therefore it is important that the
set_file
method first calls theprepare_source
class method, which will only convert to bytestream ifisn't already one, and that should then be passed to the superclass
call. Note that this should be called straight after the
prepare_source
since the base class will set the actual file and filename attributes
that are methods that may be invoked afterwards will rely on.