Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[simple-api] simplified method for JNI bridges #221

Closed
wants to merge 4 commits into from

Conversation

NPavie
Copy link
Contributor

@NPavie NPavie commented Nov 20, 2023

  • Handle URIs in options
  • Return the job object for external monitoring
  • Generalize withArgument to allow the use of java objects and raw types instead of string in options
  • handle null arguments
  • when writing result, decode the idx to avoid url-encoded file names

@bertfrees
Copy link
Member

Generalize withArgument to allow the use of java objects and raw types instead of string in options

@NPavie I wonder why you need this. Also I'm not sure whether your change actually did what you intended: I don't see any Java objects provided to startJob() actually going into BoundScript.Builder without being first converted into strings. Or am I missing something? After all, script options are always strings. This is how the script API currently works. You currently can't pass an integer value as an int or a boolean value as a boolean.

Also note that CommandLineJobParser is intended to be a simplfied API on top of the script API for users who want to parse command line arguments. Command line arguments are strings, that is why SimpleAPI doesn't support non-String values for inputs and outputs either.

If you want to provide inputs or fetch outputs through other ways than through (string) paths, CommandLineJobParser is not your friend. For these cases it probably makes more sense to use BoundScript.Builder directly.

handle null arguments

Why do you need this? Actually I'd rather throw an IllegalArgumentException for null values, because null values are more likely to indicate an error than be intended.

Handle URIs in options

Why do you need this? The idea of CommandLineJobParser was to simplify the API for use cases where inputs are always files on the local file system. I'm not totally against making the API a bit more robust against wrong use though. But I'd like to understand your use case better. (For one thing, I don't know which options still exist today with type anyFileURI or anyDirURI and that are not inputs or outputs.)

when writing result, decode the idx to avoid url-encoded file names

Why do you do that? In which case did you get an URL-encoded idx? If this happens, that might be an error in the Pipeline itself (and not something we need to fix in SimpleAPI).

@@ -334,7 +358,7 @@ else if (file.list().length > 0)
if (file.isDirectory()) {
if (file.list().length > 0)
throw new IllegalArgumentException("Directory is not empty: " + file);
resultLocations.put(port, URI.create(file.toURI() + "/"));
resultLocations.put(port, file.toURI());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor Author

@NPavie NPavie Nov 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i recall, It could lead to invalid/malformed path on windows ending by "//" or "\/"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Shouldn't we instead change the

if (result.endsWith("/")) {

above to

if (result.endsWith("/") || result.endsWith("\\")) {

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I did not treat this in the results side (only the parameters initialization side was creating issues during developement of the connection between word addin and pipeline)

@NPavie
Copy link
Contributor Author

NPavie commented Nov 23, 2023

Generalize withArgument to allow the use of java objects and raw types instead of string in options

@NPavie I wonder why you need this. Also I'm not sure whether your change actually did what you intended: I don't see any Java objects provided to startJob() actually going into BoundScript.Builder without being first converted into strings. Or am I missing something? After all, script options are always strings. This is how the script API currently works. You currently can't pass an integer value as an int or a boolean value as a boolean.

Also note that CommandLineJobParser is intended to be a simplfied API on top of the script API for users who want to parse command line arguments. Command line arguments are strings, that is why SimpleAPI doesn't support non-String values for inputs and outputs either.

If you want to provide inputs or fetch outputs through other ways than through (string) paths, CommandLineJobParser is not your friend. For these cases it probably makes more sense to use BoundScript.Builder directly.

Oh Ok, I might need to change the plugin code in that case, or make an alternative entrypoint.
I use Java object because the JNI layer i use autmatically converts csharp raw types and objects to Java on the JVM side equivalent.
I can't use string directly without adding a lot of boilercode in the addin on chsarp side as Java toString and CSharp toString are different (like boolean tostring would produce 'False' on csharp, not 'false').
Using Java object directly on my side seemed the safest as to ensure the correct value is reported for the options at the end.
If you want i can revert back my changes on that and make an alternative copy of the changes in a "JNIJob" and "JNIJobParser" specifically for that.

handle null arguments

Why do you need this? Actually I'd rather throw an IllegalArgumentException for null values, because null values are more likely to indicate an error than be intended.

If i recall, at first i had "unspecified" options in some scripts, and i don't remember why but in that case it would produce "null" values for the options and it was making the script crash.
So i handled it just in case.

Handle URIs in options

Why do you need this? The idea of CommandLineJobParser was to simplify the API for use cases where inputs are always files on the local file system. I'm not totally against making the API a bit more robust against wrong use though. But I'd like to understand your use case better. (For one thing, I don't know which options still exist today with type anyFileURI or anyDirURI and that are not inputs or outputs.)

URI handling in the code was incorrect in the previous code, it would raised exception on windows, especially when trying to check if path is absolute.
I discovered that passing the URI directly to the file object without casting it to URI first was at the origin of the issue.

when writing result, decode the idx to avoid url-encoded file names

Why do you do that? In which case did you get an URL-encoded idx? If this happens, that might be an error in the Pipeline itself (and not something we need to fix in SimpleAPI).

When i checked, the idx values are URI encoded and you create a system path in the code (not an URI path):
for example if you have spaces in file names, they are written as %20 in the final result.
The "path" in the jobresult on the other end is correct, but i was not able to find in time the correct way to retrieve the internal job path as to extract and reuse the relative path of the result in it.

@bertfrees
Copy link
Member

So if I understand correctly you made the withArgument function accept any object because you want to use Java's Object.toString() method, so as to get string values that Pipeline will interpret correctly? That feels a bit of a hack to me. Accepting specific types such as integer, boolean, File and URI would be acceptable, but accepting any object and then converting it to string does not feel right.

Perhaps the real issue we need to solve is that Pipeline should be more relaxed about which values to accept. For example, a "boolean" value should currently be either "true", "false", "1" or "0". In addition we could allow also "True", "False", "TRUE", "FALSE", etc.

URI handling in the code was incorrect in the previous code, it would raised exception on windows, especially when trying to check if path is absolute. I discovered that passing the URI directly to the file object without casting it to URI first was at the origin of the issue.

I think an example would help. What kind of input is causing exceptions?

When i checked, the idx values are URI encoded and you create a system path in the code (not an URI path)

Aha I see. You're right. I think I would fix it as follows though:

File dest = new File(u.resolve(r.strip().getIdx()));

@NPavie
Copy link
Contributor Author

NPavie commented Nov 30, 2023

Hum i have rested the simple API from the current version and i can't reproduce the issue i had before when forwarding URIs.

I realised that the URI fix and most changes i made was based on the first version of the SimpleAPI and with an older JRE so it's possible the fixes i made was only needed with that version, and/or that some fixes like the absolute check for URI path was done at the JRE level.

I'll rework the PR to only provided the fixes that i think are really of use for (my) usage of the SimpleAPI through JNI :

  • the _startJob will now return the job object it creates
  • I'll propose a new static method called startMonitorableJob to start a command line job and return it for external monitoring
  • i'll add your proposed fix on file destination (tested, it works)

@NPavie NPavie changed the title [simple-api] fixes for usage through jni [simple-api] simplified method for JNI bridges Nov 30, 2023
@bertfrees
Copy link
Member

@NPavie So you prefer to use the JobMonitor object then the getNewMessages() convenience method? Then there is no reason to keep getNewMessages() and getLastJobStatus().

Regarding the new startMonitorableJob() method: I think it makes sense to return a CommandLineJob object from startJob() as well, especially if we drop getNewMessages() and getLastJobStatus(). Would it be OK if I replace the startMonitorableJob() method with a new static method to help with the correct serializing?

public static Map<String,Iterable<String>> stringifyOptions(Map<String,Object> options) { ... }

@bertfrees
Copy link
Member

Perhaps the real issue we need to solve is that Pipeline should be more relaxed about which values to accept. For example, a "boolean" value should currently be either "true", "false", "1" or "0". In addition we could allow also "True", "False", "TRUE", "FALSE", etc.

I still think this problem can also (at least partly) be solved by being less strict in the validation of option values. Can you show me what a typical option map coming from the Word add-in looks like?

@NPavie
Copy link
Contributor Author

NPavie commented Dec 13, 2023

@NPavie So you prefer to use the JobMonitor object then the getNewMessages() convenience method? Then there is no reason to keep getNewMessages() and getLastJobStatus().

Yes i prefer, mainly because i have a project in house to do some batch conversions from another C# project with possibly multiple jobs running.

Regarding the new startMonitorableJob() method: I think it makes sense to return a CommandLineJob object from startJob() as well, especially if we drop getNewMessages() and getLastJobStatus(). Would it be OK if I replace the startMonitorableJob() method with a new static method to help with the correct serializing?

public static Map<String,Iterable<String>> stringifyOptions(Map<String,Object> options) { ... }

Oh indeed it would be a good alternative !
I could use that to rely on the original startJob method while having the correction conversion of parameters from the C# JNI code.
i'll update the code on both side to do that.

I still think this problem can also (at least partly) be solved by being less strict in the validation of option values. Can you show me what a typical option map coming from the Word add-in looks like?

I'll try to make an example, but i remember i was facing 2 problems when i had the need of going with the more generic Map<String, Object> options type :

  • An issue with boolean value that are stringified as "True" or "False"
    (that is easy to fix on the addin side for this specifc type, but that made me worry about other types stringification)

  • Something that we did not mention, but there was a "Math equation" map as parameter of the word-to-dtbook script under development at some point that i was testing through the addin at the time.

I think the second point was the main issue that made me change the start function to accept more generic objects at the time.
But I realise just now that with the progress on the word-to-dtbook side with the ongoing port of preprocessing stage; this could not be an issue anymore (and if i recall, the map type for parameters was not ready yet to be used in calabash).

Anyway, i think the "stringifyOptions" approach will help ensure parameters are correctly stringified for usage of SimpleAPI Through JNI, while keeping the normal startJob behavior as targeted for command line usage.

@NPavie NPavie force-pushed the simple-api-jni branch 2 times, most recently from 4a837c6 to 4228251 Compare December 14, 2023 09:05
@NPavie
Copy link
Contributor Author

NPavie commented Dec 14, 2023

@bertfrees I added a bit more documentation to the SimpleAPI and did the following:

  • Replaced the previous startMonitorableJob by a stringifyOptions method to get a map compatible with the startJob method
  • Made the startJob method to return a CommandLineJob object
  • Moved the getNewMessages, andgetErrors to the CommandLineJob object
  • Removed static getLastStatus in favour of the CommandLineJob.getStatus method
  • I kept the getMonitor with a small warning that it should be reserved for advance usage if needing a finer grained monitoring
    (I simplified a bit the API for my usage to get the new messages and the errors through a java function in the CommandLineJob object itself, but i'm planning to use the monitor getter in debug mode to get debug messages)
  • Updated the provided main to reflect the changes

It makes the API more CommandLineJob centric for logging and status checking.
This should ease the use of the API for JNI calls and also for batching if others want to use the pipeline through this API.

(And I also changed the job info messages output of the main to be on the standard output)

I'm thinking of making on the side a little C or C++ version of the main just to illustrate the usage through JNI, maybe in another repo for demo.

@bertfrees
Copy link
Member

Something that we did not mention, but there was a "Math equation" map as parameter of the word-to-dtbook script under development at some point that i was testing through the addin at the time.

I think there has been a bit of a misunderstanding. Calabash indeed supports options with any type of XDM value, and the word-to-dtbook step indeed had an option that accepted maps. However Pipeline scripts are an abstraction layer on top of XProc steps, and script options are unfortunately simple strings. It is of course also possible to create JNI bindings for the XProc API, and if I remember correctly we discussed that, and at one point I had even added it to the SimpleAPI class. But I changed my mind when I realized it was not a good idea. There is a reason we have the abstraction layer, and we should not try to work around it unless we have a very good reason.

NPavie added a commit to daisy/word-save-as-daisy that referenced this pull request Jun 10, 2024
Also prepare pipeline engine rebuild from submodule
(requires the merge of a daisy/pipeline-assembly#221 for finalization)
@NPavie
Copy link
Contributor Author

NPavie commented Sep 25, 2024

Hi @bertfrees, I was wondering if something remained to be changed here regarding the proposal for easier interoperability with JNI ?

@bertfrees
Copy link
Member

I still don't like the stringifyOptions method, I would prefer it if we could eliminate it.

I don't like what it does. It just feels like a hack to me:

  • it accepts any object, rather than only specific types that make sense (e.g. strings, integers, booleans, files and URIs)
  • it uses Object.toString(), rather than specifying exactly the serialization that is needed to match Pipeline's job input parser.

and I don't like the reason for adding it:

  • You mentioned "True" and "False", and I suggested that I make the job input parser more relaxed, by accepting "True", "False", "TRUE", "FALSE", etc. in addition to "true", "false", "1" and "0". IMO it's a much nicer solution. I thought I had already done the change, but it seems I hadn't. I will do it now.
  • You said that you were worried about other types, however I would first like to see that there are actually other issues, before resorting to this hack.

@NPavie
Copy link
Contributor Author

NPavie commented Sep 25, 2024

Ok, we can remove this function then for the PR (I figured how to redo it on the jni side if need be)

@NPavie
Copy link
Contributor Author

NPavie commented Sep 26, 2024

@bertfrees the stringifyOptions is now removed

@bertfrees
Copy link
Member

Thanks.

bertfrees added a commit to daisy/pipeline-framework that referenced this pull request Sep 26, 2024
Accept "True" and "False" in addition to "true", "false", "1" and "0".

see daisy/pipeline-assembly#221
@bertfrees
Copy link
Member

NPavie and others added 4 commits September 26, 2024 23:13
- Return the job object for external monitoring
- The job is wrapped in a CommandLineJob object that has convenience
  methods for handling messages. The CommandLineJob.getNewMessages() method replaces
  the static method from before for per-job logging.

Co-authored-by: Bert Frees <bertfrees@gmail.com>
@bertfrees
Copy link
Member

I did some clean up and rebased onto master.

@NPavie
Copy link
Contributor Author

NPavie commented Sep 27, 2024

Everything works with my jni-side of options' stringification, so I would say its fine for me.

(I assume the version of the framework with the commit you mentioned is not yet referenced in the assembly master branch, if I remove the stringification i get a "value "False" not accepted" error)

@bertfrees
Copy link
Member

No, it is not included yet. Perhaps we need to test it first, just to be sure I did it right. But I'm unsure how. Shall I deploy the snapshot?

@NPavie
Copy link
Contributor Author

NPavie commented Sep 27, 2024

The PR does not strictly depend on it, but if you do I can run a quick test just to be sure

@bertfrees
Copy link
Member

I deployed framework-bom 1.14.20-SNAPSHOT.

@NPavie
Copy link
Contributor Author

NPavie commented Oct 4, 2024

I finally got the time to launch the test with the new bom, and I confirm : it works :)

@bertfrees
Copy link
Member

Nice. I'll merge now.

@bertfrees
Copy link
Member

Merged in ff701bc

@bertfrees bertfrees closed this Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

2 participants