Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUE-2464: exposing scryer prolog functionality in libscryer_prolog.so for client library consumption #2465

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

jjtolton
Copy link

@jjtolton jjtolton commented Aug 3, 2024

Closes: #2464

My barbarian efforts to make the WASM eval_code functionality available from the DLL for use in other languages. I genuinely hope I did not insult any rust or prolog users here with these efforts!

Outstanding tasks from discussion:

**Updated TODOs in response to discussion:**
** C-API goal (future enhancement) **
  • change error reporting from JSON to error codes
  • create a set of bindings with C-style APIs
** lower level pure scryer prolog goals (future enhancement) **
**Possible additional goals:** - provide a way for multiple simultaneous generative queries, either via WAM cloning or scryer continuations?
Possible addition to `README.md` (going to be moved to subdirectory instead of `README.md`)

Steps to Use the Shared Library:

  1. Locate the Library: After building Scryer Prolog, you'll find the shared library in:

    <PATH-TO>/scryer-prolog/target/release.

    Replace <PATH-TO> with the actual path where you installed Scryer Prolog on your computer. The library will have a name like libscryer_prolog.so (Linux), libscryer_prolog.dylib (macOS), or scryer_prolog.dll (Windows).

  2. Load the Library: Here are basic examples for common languages:

    • C/C++ (Linux/macOS):

      #include <dlfcn.h>
      
      void* handle = dlopen("libscryer_prolog.so", RTLD_LAZY); // Load the library
      if (!handle) {
          // Handle error: library not found
      }
      
      // ... Get function pointers from the loaded library ...
    • Python: You'll typically use libraries like ctypes:

      import ctypes
      
      lib = ctypes.cdll("./libscryer_prolog.so") # Replace with correct path
      
      # ... Access functions in lib ...
  3. Access Functions: Once the library is loaded, you can call its functions (like consult_module_string, run_query) just like you would any other function in your code. See the Python reference implementation below.

Important Notes:

  • Memory Management: Be extra careful about memory management when interacting with C libraries from other languages (especially strings). See below.

The following functions are exposed:

Shared Library Functions Summary

This table summarizes the Rust functions you provided, detailing their names and descriptions:

Function Name Description
machine_new() Initializes a new Prolog machine. It is the responsibility of the caller to call machine_free().
machine_free() Deallocates memory for the Prolog machine created with machine_new().
start_new_query_generator() Initializes a new query generator with provided Prolog source code (C string). Returns JSON status.
cleanup_query_generator() Cleans up resources associated with the query generator. Returns JSON status.
run_query_generator() Runs the query generator, returning results as JSON. Expects initialization from previous calls.
load_module_string() Loads Prolog source code into the "facts" module. Returns JSON status.
consult_module_string() Consults (loads and runs) facts from a C string. Returns JSON status.
run_query() Executes a single Prolog query with provided C-style string input. Returns JSON results.
free_c_string() Frees memory allocated for C strings returned by other functions. IMPORTANT: Always call after using the result string!

Shared Library Memory Management Considerations and Usage Overview

  • Ownership Transfer:

    The Rust library uses C-style strings (CString) to return data. To avoid premature cleanup by Rust's garbage collector, ownership of the string pointer is transferred to the client application. This means the client becomes responsible for freeing the memory using free_c_string(). Failure to do so will cause memory leaks.

  • Thread-Local State:

    The library utilizes thread-local storage (thread_local!) to maintain state between function calls within the same thread. You must initialize this state with machine_new() and release it with machine_free() when finished. Neglecting to call machine_free() will also lead to memory leaks.

Using the Library: A Step-by-Step Guide

  1. Initialization (machine_new()): Before using any other function, initialize the Prolog machine using machine_new(). This creates a thread-local state object necessary for subsequent operations.

  2. Loading Data: You have two options for loading Prolog data:

    • Direct Consultation: Use consult_module_string() to load and immediately run Prolog source code from a C string. Think of it as directly providing the source code to Scryer Prolog.

    • Query Generation:

      • Start a new query generator using start_new_query_generator(), passing your query (in the form ?- ...) as input.
      • Use run_query_generator() repeatedly until it returns { "status": "ok", "result": ["false"] }. This indicates that all results have been generated.
  3. Running Queries:

    • For non-generative queries (those with finite results), use run_query().
    • For potentially infinite result sets, utilize the query generator approach described above.
  4. Cleanup:

    • Always call cleanup_query_generator() after finishing with a query generator to release associated resources (unless it's done internally for you).
    • Call machine_free() once you are done using the Prolog machine, regardless of the loading and querying methods used.

Example Python Client Implementation

# example.py
import contextlib
import ctypes
import json

lib = ctypes.cdll.LoadLibrary("<PATH-TO>/target/release/libscryer_prolog.so")

lib.run_query.argtypes = (ctypes.c_char_p,)
lib.run_query.restype = ctypes.POINTER(ctypes.c_char)

lib.free_c_string.argtypes = (ctypes.POINTER(ctypes.c_char),)
lib.free_c_string.restype = None

lib.machine_new.argtypes = []
lib.machine_new.restype = None

lib.machine_free.argtypes = []
lib.machine_free.restype = None

lib.consult_module_string.argtypes = (ctypes.c_char_p,)
lib.consult_module_string.restype = ctypes.POINTER(ctypes.c_char)

lib.load_module_string.argtypes = (ctypes.c_char_p,)
lib.load_module_string.restype = ctypes.POINTER(ctypes.c_char)

lib.start_new_query_generator.argtypes = (ctypes.c_char_p,)
lib.start_new_query_generator.restype = ctypes.POINTER(ctypes.c_char)

lib.cleanup_query_generator.argtypes = []
lib.cleanup_query_generator.restype = ctypes.POINTER(ctypes.c_char)

lib.run_query_generator.argtypes = []
lib.run_query_generator.restype = ctypes.POINTER(ctypes.c_char)


class ScryerPanicException(RuntimeError):
    pass


class ScryerError(Exception):
    pass


def handle_scryer_result(result: str):
    result = json.loads(result)
    if result["status"] == "ok":
        return result["result"] if 'result' in result else True
    elif result["status"] == "error":
        raise ScryerError(result["error"])
    elif result["status"] == "panic":
        raise ScryerPanicException()


def eval_and_free(query: str):
    res_ptr = lib.run_query(query.encode('utf-8'))
    res = ctypes.cast(res_ptr, ctypes.c_char_p).value.decode('utf-8')
    lib.free_c_string(res_ptr)
    return handle_scryer_result(res)


def run_generator_step():
    res_ptr = lib.run_query_generator()
    res = ctypes.cast(res_ptr, ctypes.c_char_p).value.decode('utf-8')
    lib.free_c_string(res_ptr)
    return handle_scryer_result(res)


def handle_loader_deloader(ptr):
    res = ctypes.cast(ptr, ctypes.c_char_p).value.decode('utf-8')
    try:
        return handle_scryer_result(res)
    finally:
        lib.free_c_string(ptr)


@contextlib.contextmanager
def query_generator(query):
    try:
        lib.start_new_query_generator(query.encode('utf-8'))

        def generator():
            seen_answer = False
            while True:
                answer = run_generator_step()
                if answer is False:
                    if seen_answer:
                        break
                    yield answer[0]
                    break
                seen_answer = True
                yield answer[0]

        yield generator()
    finally:
        lib.cleanup_query_generator()


def load_facts(facts: str):
    handle_loader_deloader(lib.load_module_string(facts.encode('utf-8')))


def consult(facts: str):
    handle_loader_deloader(lib.consult_module_string(facts.encode('utf-8')))


def result_set(query, n):
    return f"Result Set {n}:\n{eval_and_free(query)}"


class ScryerMachine:

    def __init__(self, source=None):
        self.source = source

    def __enter__(self, query: str = None):
        lib.machine_new()
        if self.source:
            load_facts(self.source)
        if query:
            return self.lazy_eval(query)
        return self

    def __exit__(self, *_):
        lib.machine_free()

    def lazy_eval(self, query):
        with query_generator(query) as g:
            for answer in g:
                yield answer

    def consult(self, facts):
        consult(facts)

    def eval(self, fact):
        return eval_and_free(fact)


if __name__ == '__main__':
    source = '''
    dynamic(fact/1).
    dynamic(bread/1).
    fact(1).
    fact(2).
    fact(3).
    fact(7).
    sock(1).
    sock(2).
    
    :- use_module(library(dcgs)).
    
    as --> [].
    as --> [a], as.
    
    '''
    with ScryerMachine(source) as wam:
        for answer in wam.lazy_eval('fact(Fact).'):
            print(answer)

        wam.consult('fact(4). fact(5).')
        for answer in wam.lazy_eval('fact(Fact).'):
            print(answer)
            break

        for answer in wam.lazy_eval('sock(Sock).'):
            print(answer)
            break

        wam.eval('assertz(bread(1)).')
        for answer in wam.lazy_eval('bread(Bread).'):
            print(answer)
$ python example.py 
{'Fact': 1}
{'Fact': 2}
{'Fact': 3}
{'Fact': 7}
% Warning: overwriting fact/1 because the clauses are discontiguous
{'Fact': 4}
{'Sock': 1}
{'Bread': 1}
# Concerns
  • I'm not sure if there's really a difference between load_module_string and consult_module_string -- I included both because maybe there is a nuanced difference, but I can't tell which one is which.
  • I'm not a rust specialist, and I feel like I could have done the feature/cfg targets much better. (being moved to inline modules)
  • Still not a rust specialist, and I am a bit concerned about using the thread-local "globals" for maintaining state, but that was the best way I could figure out how to do it. Please let me know if anyone has any better ideas. (refactored to individual machine references)

@jjtolton
Copy link
Author

jjtolton commented Aug 3, 2024

(Edit: I really wasn't sure I was going to be able to figure out how to do this, but the puzzle began to progressively come together. The original description of the PR was more of a cry for help, it is now somewhat organized. The following comments are the ramblings leading to the eventual victory of sorts.)

Oh wow, it DOES work!! I think?? The problem was actually in the Python code (sort of). I examined the scryer playground code and realize there was more going on "under the hood" of the playground page than I had supposed

When the python code is modified as follows:
import ctypes

lib = ctypes.cdll.LoadLibrary("/home/jay/programs/scryer-prolog/target/release/libscryer_prolog.so")

lib.eval_code_c.argtypes = (ctypes.c_char_p,)
lib.eval_code_c.restype = ctypes.POINTER(ctypes.c_char)

lib.free_c_string.argtypes = (ctypes.POINTER(ctypes.c_char),)
lib.free_c_string.restype = None

def eval_and_free(source, query):
    playground = f"""
    {source}
    :- use_module(library(charsio)).
      :- use_module(library(iso_ext)).
      :- use_module(library(format)).
      :- use_module(library(dcgs)).
      :- initialization(playground_main).

      playground_main :-
          QueryStr = "{query}",
          read_term_from_chars(QueryStr, Query, [variable_names(Vars)]),
          bb_put(playground_first_answer, true),
          bb_put(playground_pending, true),
          (   setup_call_cleanup(true, Query, NotPending = true),
              % Query succeeded
              bb_get(playground_first_answer, FirstAnswer),
              (   FirstAnswer == true ->
                  format("   ", []),
                  bb_put(playground_first_answer, false)
              ;   true
              ),
              phrase(playground_answer(Vars), Answer),
              format("~s", [Answer]),
              (   NotPending == true ->
                  bb_put(playground_pending, false)
              ;   format("~n;  ", [])
              ),
              false
          ;   % Query maybe failed
              bb_get(playground_pending, Pending),
              (   Pending == true ->
                  % Query failed
                  bb_get(playground_first_answer, FirstAnswer),
                  (   FirstAnswer == true ->
                      format("   ", []),
                      bb_put(playground_first_answer, false)
                  ;   true
                  ),
                  format("false", [])
              ;   % Query just terminated
                  true
              )
          ),
          format(".", []).

      playground_answer([]) --> "true".
      playground_answer([Var|Vars]) -->
          playground_answer_([Var|Vars]).

      playground_answer_([]) --> "".
      playground_answer_([VarName=Var|Vars]) -->
          {{ write_term_to_chars(Var, [double_quotes(true), quoted(true)], VarChars) }},
          format_("~a = ~s", [VarName, VarChars]),
          (   {{ Vars == [] }} ->
              []
          ;   ", ",
              playground_answer_(Vars)
          ).

    """


    res_ptr = lib.eval_code_c(playground.encode('utf-8'))
    res = ctypes.cast(res_ptr, ctypes.c_char_p).value.decode('utf-8')
    lib.free_c_string(res_ptr)
    return res


if __name__ == '__main__':
    print(eval_and_free(
        """
        fact(1).
        fact(2).
        fact(3).
        """,
        "fact(X)."
    ))
we get
/home/jay/_project/tai/gamedev/ffi-test/env/bin/python /home/jay/_project/tai/gamedev/ffi-test/src/ffigo/ffgo.py 
   X = 1
;  X = 2
;  X = 3.

🥳🥳🥳🥳

Still, I am not a rust (nor scryer/prolog) expert, so I would certainly love feedback (or even disrespect) on this shared library addition so we can make it better.

@jjtolton
Copy link
Author

jjtolton commented Aug 3, 2024

I suppose the next issue is that there does not seem to be any persistence of data between eval calls. If I modify the python code as follows:

def result_set(source, query, n):
    return f"Result Set {n}:\n{eval_and_free(source, query)}"


if __name__ == '__main__':
    print(result_set(
        '''
        fact(1).
        fact(2).
        fact(3).
        ''',
        'fact(X).',
        0))

    print(result_set("", "fact(X).", 1))
    eval_and_free("", "assertz(fact(4)).")
    print(result_set("", "fact(X).", 2))
/home/jay/_project/tai/gamedev/ffi-test/env/bin/python /home/jay/_project/tai/gamedev/ffi-test/src/ffigo/ffgo.py 
Result Set 0:
   X = 1
;  X = 2
;  X = 3.
Result Set 1:
   error(existence_error(procedure,fact/1),fact/1).

Result Set 2:
   error(existence_error(procedure,fact/1),fact/1).

I would have been surprised and delighted had this worked. As you can imagine the lack of persistence between calls would make this a deal breaker for any sort of serious work.

(after this I'm going to start wondering if there isn't some sort of zero-copy pathway available between the scryer system and the calling system 🤔 )

@jjtolton
Copy link
Author

jjtolton commented Aug 4, 2024

Ok, I _think_ I may be close to a way to preserve state between calls.
#[cfg(not(target_arch = "wasm32"))]
use std::ffi::{c_char, CStr, CString};

#[cfg(not(target_arch = "wasm32"))]
use std::collections::HashMap;

#[cfg(not(target_arch = "wasm32"))]
use std::sync::Mutex;

#[cfg(not(target_arch = "wasm32"))]
use lazy_static::lazy_static;

#[cfg(not(target_arch = "wasm32"))]
use machine::mock_wam::*;

#[cfg(not(target_arch = "wasm32"))]
thread_local! {
    pub static MACHINE: RefCell<Option<Machine>> = RefCell::new(None);
}


#[cfg(not(target_arch = "wasm32"))]
#[no_mangle]
pub extern "C" fn machine_new() {
    println!("Engaging the machine!");
    MACHINE.with(|m| *m.borrow_mut() = Some(Machine::with_test_streams()));
}

#[cfg(not(target_arch = "wasm32"))]
#[no_mangle]
pub extern "C" fn machine_free() {
    println!("Releasing the machine!");
    MACHINE.with(|m| *m.borrow_mut() = None);
}


#[cfg(not(target_arch = "wasm32"))]
#[no_mangle]
pub extern "C" fn eval_code_c(input: *const c_char) -> *mut c_char {
    let c_str = unsafe {
        assert!(!input.is_null());
        CStr::from_ptr(input)
    };
    let r_str = c_str.to_str().expect("Not a valid UTF-8 string");

    let bytes = MACHINE.with(|m| {
        let mut machine = m.borrow_mut();
        let machine = machine.as_mut().expect("Machine not initialized.");

        machine.test_load_string(r_str) 
    });
    let result_str = String::from_utf8_lossy(&bytes).to_string();

    println!("Result: {}", result_str);

    let c_string = CString::new(result_str).expect("Failed to convert to CString");
    c_string.into_raw()
}
#[no_mangle]
pub extern "C" fn free_c_string(ptr: *mut c_char) {
    unsafe {
        if ptr.is_null() {
            return;
        }
        let _ = CString::from_raw(ptr);
    };
}

the only issue is with machine.test_load_string(r_str) -- I'm not sure this doesn't wipe out the context between calls.

I can't really tell what this is doing:
    pub fn test_load_string(&mut self, code: &str) -> Vec<u8> {
        let stream = Stream::from_owned_string(code.to_owned(), &mut self.machine_st.arena);

        self.load_file("<stdin>", stream);
        self.user_output.bytes().map(|b| b.unwrap()).collect()
    }

because I can't tell what this is doing:

    pub fn load_file(&mut self, path: &str, stream: Stream) {
        self.machine_st.registers[1] = stream_as_cell!(stream);
        self.machine_st.registers[2] =
            atom_as_cell!(AtomTable::build_with(&self.machine_st.atom_tbl, path));

        self.run_module_predicate(atom!("loader"), (atom!("file_load"), 2));
    }

but I've realize I have some fundamental misunderstanding about how consults work and how they are different from source code.

The way the playground code gets around this is:

  {source}
  :- use_module(library(charsio)).
      :- use_module(library(iso_ext)).
      :- use_module(library(format)).
      :- use_module(library(dcgs)).
      :- initialization(playground_main)

      playground_main :-
          QueryStr = "{query}",
          read_term_from_chars(QueryStr, Query, [variable_names(Vars)]),
          bb_put(playground_first_answer, true),
          bb_put(playground_pending, true),
          ... snip ...

but hypothetically, in python, I should be able to do something like:

if __name__ == '__main__':
    source = '''
            fact(1).
            fact(2).
            fact(3).
            '''
    lib.machine_new()
    eval_and_free(source)
    print(eval_and_free('?- fact(X).'))
    print(eval_and_free('?- assertz(fact(4)).'))
    eval_and_free('?- fact(X).')
    lib.machine_free()

this raises two problems --

  1. we don't want to prepend the source every time
  2. can't use the initialization trick every time.

I was looking at sockets.pl to try to figure out how persistent sessions with scryer work, but I haven't been able to figure it out yet :/

@hurufu
Copy link
Contributor

hurufu commented Aug 4, 2024

Hi! I'm very new here, so I can't help you with your efforts, but using Prolog code in a library is a non-trivial tasks, because of non-determinism and highly dynamic nature of Prolog, it is deceptively easy to evaluate some simple cases, but for a complex Prolog code there even might be no good representation in your host language! For example, do you have a plan what to do if Prolog program doesn't terminate after giving you couple of answers?

Imho, you can always spawn Scryer as a separate process and communicate with it using pipes or sockets.

@jjtolton
Copy link
Author

jjtolton commented Aug 4, 2024

Great point! I want to use this in a C#/Unity project, and since it's for an off-line single player video game it needs to be embedded.

Regarding representation, I'm fine working with strings, and the output from calls to the scryer shared library could be unparsed as json by scryer itself, and that's fine for me.

@jjtolton jjtolton changed the title add eval_code_c and free_mem to lib.rs for use in other langs exposing scryer prolog functionality in libscryer_prolog.so for client library consumption Aug 5, 2024
@jjtolton
Copy link
Author

jjtolton commented Aug 5, 2024

Wow, amazing. Rust is really great for contributing to, and the code was well organized so it was possible to follow what was happening enough to hack this together. Great work!

The secret ended up being using the actual machine designed for shared libraries in lib_machine.rs!

Let me know if you want me to squash the commits.

@jjtolton jjtolton marked this pull request as ready for review August 5, 2024 01:45
@jjtolton jjtolton changed the title exposing scryer prolog functionality in libscryer_prolog.so for client library consumption ISSUE-2464: exposing scryer prolog functionality in libscryer_prolog.so for client library consumption Aug 5, 2024
@bakaq
Copy link
Contributor

bakaq commented Aug 5, 2024

This looks amazing! Lack of iterator-like generation of answer substitution is one of the biggest limitations of the current approach to using Scryer as a library, and this seems to fix it (I will test it latter).

As for the global thread-local storage, I think it would be best if you represented the state of the machine and query generator as actual objects that need to be passed to functions. This could be done in the C API with opaque pointers, and it's a common pattern in C libraries. This would also enable using multiple machines at the same time if I'm not mistaken. Also, while returning JSON is convenient when used from Python, as a general C library it would be best to actually return the data in a format usable in C. If you need to return multiple values, you can use out pointers. So the header could be something like that:

typedef struct Machine_s Machine;
typedef struct QueryState_d QueryState;

Machine *machine_new();
void machine_free(Machine* machine);

// From what I've seen, the errors that you are emitting in these functions
// would just be undefined behavior in this implementation
QueryState *start_new_query_generator(Machine *machine, char *query);
void cleanup_query_generator(QueryState *query_state);

// If returns NULL, check error_msg
// (What exactly should this return? Should we have a struct for variable
// substitutions and return a list of that?)
char *run_query_generator(Machine *machine, QueryState *query_state, char **error_msg);

// Both return NULL on success, else returns error message
char *load_module_string(Machine *machine, char *module);
char *consult_module_string(Machine *machine, char *module);

// If returns NULL, check error_msg 
char *run_query(Machine *machine, char *query, char **error_msg);

// Run on all the (non-NULL) strings generated by the other functions
void free_c_string(char *str);

I think you already did the hard part of actually threading through the low level implementation (though @mthom and @lucksus may want to check if there is anything wrong there), so tidying up the API is basically just plumbing. I think it's very important to have a decently good C API if we can, although it's probably best to just do something usable if the "right way" would be too complicated (like I think is the case for run_query_generator()).

Another thing is, if I understood how this works, run_query_generator() signals that it ends by returning a result of false. I think this is a bit problematic because there is a meaningful (procedural) difference between a query that returns N answers, and a predicate that returns N answers and then a false. This is sometimes called determinism and I think it would be a good idea for this interface to preserve that difference. Maybe something like this:

// Returns:
//      - 0 on success
//      - 1 on end of results
//      - 2 on error, check error_msg
// (What exactly should this return? Should we have a struct for variable
// substitutions and return a list of that?)
int run_query_generator(Machine *machine, QueryState *query_state, char **error_msg, char **results);

@jjtolton
Copy link
Author

jjtolton commented Aug 5, 2024

Great thinking! The thread local storage of the machine state I think was the biggest weakness I could see in the implementation, and I think the approach of giving the client a reference to the machine makes perfect sense. I don't work this close to the metal very often and so I don't have much of an imagination for pointers, thank you for demonstrating this pattern! I will give that a shot when I get some time.

@Skgland
Copy link
Contributor

Skgland commented Aug 5, 2024

You will probably want to heap allocate the machine in a box and then use Box::into_raw to turn the box into a raw pointer and then use Box::from_raw to turn it back into a box when freeing the machine.

@bakaq
Copy link
Contributor

bakaq commented Aug 5, 2024

Another thing: as C doesn't have namespacing, it's probably a good idea to give all the C API functions a prefix like scryer_ to avoid name colisions. This is a common practice in C libraries (see SDL for example, where everything is prefixed by SDL_)

@bakaq
Copy link
Contributor

bakaq commented Aug 5, 2024

Ok, I played with this for a bit and everything seems to be working fine (apart from the things I pointed at in my previous comments). Even library(clpz) seems to be working fine (although without residual goals, which is not a big deal), and that already makes it better than the Wasm interface.

@Skgland
Copy link
Contributor

Skgland commented Aug 5, 2024

Maybe instead of putting the c-interface into the crate root (i.e. lib.rs) it would make sense to put it in a submodule e.g. c_interface.rs
then the #[cfg(not(target_arch = "wasm32"))] could be on the module declaration mod c_interface; in lib.rs instead of on each individual item.

@jjtolton
Copy link
Author

jjtolton commented Aug 5, 2024

You will probably want to heap allocate the machine in a box and then use Box::into_raw to turn the box into a raw pointer and then use Box::from_raw to turn it back into a box when freeing the machine.

Thanks for this. Embarrassingly, I didn't even know it was possible to do what @bakaq suggested

here:

typedef struct Machine_s Machine;
typedef struct QueryState_d QueryState;

Machine *machine_new();
void machine_free(Machine* machine);

// From what I've seen, the errors that you are emitting in these functions
// would just be undefined behavior in this implementation
QueryState *start_new_query_generator(Machine *machine, char *query);
void cleanup_query_generator(QueryState *query_state);

// If returns NULL, check error_msg
// (What exactly should this return? Should we have a struct for variable
// substitutions and return a list of that?)
char *run_query_generator(Machine *machine, QueryState *query_state, char **error_msg);

// Both return NULL on success, else returns error message
char *load_module_string(Machine *machine, char *module);
char *consult_module_string(Machine *machine, char *module);

// If returns NULL, check error_msg 
char *run_query(Machine *machine, char *query, char **error_msg);

// Run on all the (non-NULL) strings generated by the other functions
void free_c_string(char *str);

particularly in regards to the pointer for the machine, so I will dabble with the boxing technique and you can let me know if I got it right.

@jjtolton
Copy link
Author

jjtolton commented Aug 5, 2024

Ok, I played with this for a bit and everything seems to be working fine (apart from the things I pointed at in my previous comments). Even library(clpz) seems to be working fine (although without residual goals, which is not a big deal), and that already makes it better than the Wasm interface.

I can see now why the WASM is structured the way it is for the scryer playground, but we could/should extend the WASM code (probably separate ticket) to support the same paradigm for folks who wanted to use scryer as a library in javascript without revaluating all the source code on every call and requiring the prelude with the :- initialization that the WASM currently requires to function correctly.

@jjtolton
Copy link
Author

jjtolton commented Aug 5, 2024

Maybe instead of putting the c-interface into the crate root (i.e. lib.rs) it would make sense to put it in a submodule e.g. c_interface.rs then the #[cfg(not(target_arch = "wasm32"))] could be on the module declaration mod c_interface; in lib.rs instead of on each individual item.

could you possibly elaborate on this a little more? Ohhh I see. The only thing I have to do is "decorate" (Python terminology, not sure what the rust terminology is) the mod c_interface; and then it will apply to everything. That makes sense. Am I correct?

The only reason why we might not do it that way that I can think of is because I can see it being very possible to use some feature directives to make the WASM code and the DLL code be generated from the same file. Of course we could factor out the differences into their own submodules and then conditionally include them into lib.rs.

Or if that's overly complicated we could do two separate pathways with a small amount of code duplication.

@bakaq
Copy link
Contributor

bakaq commented Aug 5, 2024

I can see now why the WASM is structured the way it is for the scryer playground, but we could/should extend the WASM code (probably separate ticket) to support the same paradigm for folks who wanted to use scryer as a library in javascript without revaluating all the source code on every call and requiring the prelude with the :- initialization that the WASM currently requires to function correctly.

Yes! Indeed this should definitively be available (and be the default recommended way) in every way to embed Scryer (Rust library, Wasm and C library). After your work here, it shouldn't be very difficult to adapt for the other cases. This will be immediately useful in the Scryer Playground, although to implement the best toplevel experience there we will need more than this (some interface to pipe stdin and stdout of the embedded machine), but this is out of scope for this PR.

@Skgland
Copy link
Contributor

Skgland commented Aug 5, 2024

Maybe instead of putting the c-interface into the crate root (i.e. lib.rs) it would make sense to put it in a submodule e.g. c_interface.rs then the #[cfg(not(target_arch = "wasm32"))] could be on the module declaration mod c_interface; in lib.rs instead of on each individual item.

could you possibly elaborate on this a little more? Ohhh I see. The only thing I have to do is "decorate" (Python terminology, not sure what the rust terminology is) the mod c_interface; and then it will apply to everything. That makes sense. Am I correct?

yeah, I think you got what I was trying to say

The only reason why we might not do it that way that I can think of is because I can see it being very possible to use some feature directives to make the WASM code and the DLL code be generated from the same file. Of course we could factor out the differences into their own submodules and then conditionally include them into lib.rs.

Or if that's overly complicated we could do two separate pathways with a small amount of code duplication.

maybe it then makes sense to have the following module structure (using inline modules for simplicity)

// in lib.rs
mod native_interface {
   #[cfg(not(target_arch = "wasm32"))]
   mod c_interface {
     // non-wasm stuff goes here
   }
   
   #[cfg(target_arch = "wasm32")]
   mod wasm_interface {
     // wasm specific stuff goes here
   }
   
   #[cfg(not(target_arch = "wasm32"))]
   use c_interface::*;

   #[cfg(target_arch = "wasm32")]
   use wasm_interface ::*;
   
   // common stuff goes here
}

using non-inline modules this would result in a file structure like

src\
|
+ lib.rs
+ native_interface.rs
+ native_interface/
|  |
|  + c_interface.rs
|  + wasm_interface.rs

@Skgland
Copy link
Contributor

Skgland commented Aug 5, 2024

Based on the docs they are called attributes though I am not sure what the standard verb is that is used when applying them, I tend to use annotate, the docs appear to use apply.

@jjtolton
Copy link
Author

jjtolton commented Aug 5, 2024

tl;dr requesting help on avoiding some footguns in the implementation and usage of scryer as an embedded library based on past experience doing similar embedded language integrations

Ok, I played with this for a bit and everything seems to be working fine (apart from the things I pointed at in my previous comments). Even library(clpz) seems to be working fine (although without residual goals, which is not a big deal), and that already makes it better than the Wasm interface.

I can see now why the WASM is structured the way it is for the scryer playground, but we could/should extend the WASM code (probably separate ticket) to support the same paradigm for folks who wanted to use scryer as a library in javascript without revaluating all the source code on every call and requiring the prelude with the :- initialization that the WASM currently requires to function correctly.

By the way, what is a "residual goal"?

One thing I noticed while developing this (and having developed similar language hacking efforts in the past) is that there tends to always be a bit of a "leaky abstraction" when embedding languages inside of each other. Generally speaking, you need to be aware of the idiosyncrasies of the client/host and the shared library language.

Fortunately, with scryer, I this will be a robust and smooth embedding experience thanks to the design of the language itself.

One potential footgun in the implementation, for instance, is I have no idea what the consequence of terminating a generative query early is on the WAM and what steps must be done to cleanup.

I tried to stick as close as I could to the run_query() formula and call self.trust_me() on early termination but... I don't trust it!!

This could lead to some very subtle and infuriating bugs.

Similarly, the way to interact with embedded scryer will likely be quite different from interacting with scryer via @triska 's (wonderful) ediprolog, for instance.

Meaning -- should consulting (via the shared library consult_module_string() replace all existing facts/rules or
should it add to the existing rules? If it should add to them, there should be some documentation about whatever a "discontiguous" fact/rule is and the consequences of it. Otherwise, there should be some documentation about the proper use of dynamic/1, because I noticed that if I declare a fact dynamic and then later include a fact in the consult (as I do in the reference python code), attempting to assertz/1 a new fact with run_query() results in an error. So, static and dynamic facts can apparently not be mixed in the way the shared library is currently implemented, and I imagine that adding rules would not be any easier.

However, exposing the consulting and query tools as they are will lead to user confusion unless we clarify usage patterns (or put in some guard rails) as I'm not even sure what the right answer is here and I've been staring at the source code!

I bet @triska has some solid ideas regarding this and the language nuances here.

@jjtolton
Copy link
Author

jjtolton commented Aug 5, 2024

This looks amazing! Lack of iterator-like generation of answer substitution is one of the biggest limitations of the current approach to using Scryer as a library, and this seems to fix it (I will test it latter).

As for the global thread-local storage, I think it would be best if you represented the state of the machine and query generator as actual objects that need to be passed to functions. This could be done in the C API with opaque pointers, and it's a common pattern in C libraries. This would also enable using multiple machines at the same time if I'm not mistaken. Also, while returning JSON is convenient when used from Python, as a general C library it would be best to actually return the data in a format usable in C. If you need to return multiple values, you can use out pointers. So the header could be something like that:

typedef struct Machine_s Machine;
typedef struct QueryState_d QueryState;

Machine *machine_new();
void machine_free(Machine* machine);

// From what I've seen, the errors that you are emitting in these functions
// would just be undefined behavior in this implementation
QueryState *start_new_query_generator(Machine *machine, char *query);
void cleanup_query_generator(QueryState *query_state);

// If returns NULL, check error_msg
// (What exactly should this return? Should we have a struct for variable
// substitutions and return a list of that?)
char *run_query_generator(Machine *machine, QueryState *query_state, char **error_msg);

// Both return NULL on success, else returns error message
char *load_module_string(Machine *machine, char *module);
char *consult_module_string(Machine *machine, char *module);

// If returns NULL, check error_msg 
char *run_query(Machine *machine, char *query, char **error_msg);

// Run on all the (non-NULL) strings generated by the other functions
void free_c_string(char *str);

I think you already did the hard part of actually threading through the low level implementation (though @mthom and @lucksus may want to check if there is anything wrong there), so tidying up the API is basically just plumbing. I think it's very important to have a decently good C API if we can, although it's probably best to just do something usable if the "right way" would be too complicated (like I think is the case for run_query_generator()).

Another thing is, if I understood how this works, run_query_generator() signals that it ends by returning a result of false. I think this is a bit problematic because there is a meaningful (procedural) difference between a query that returns N answers, and a predicate that returns N answers and then a false. This is sometimes called determinism and I think it would be a good idea for this interface to preserve that difference. Maybe something like this:

// Returns:
//      - 0 on success
//      - 1 on end of results
//      - 2 on error, check error_msg
// (What exactly should this return? Should we have a struct for variable
// substitutions and return a list of that?)
int run_query_generator(Machine *machine, QueryState *query_state, char **error_msg, char **results);

Ok the comments about JSON you made here and in your comment on the issue are starting to sink in. I'm used to consuming the output in languages like Python, C#, or Clojure, and so from that standpoint, JSON is very convenient and I think not offering a JSON option would be a detriment to adoption unless there was some very clear way to marshal the data.

Also, because I don't operate in the C arena very often, I'm not even sure what the alternative would be 😅

So, if we don't return a JSON string -- what do we return, exactly? A pointer to the raw data structure? Which data structure would this be?

I think it would probably be much more efficient than JSON, I just honestly don't what that would look like -- I assume a pointer to some data structure? And then the client would need to make some kind of struct to extract the data? And what scryer would return from queries are... pointers?

Thanks for being patient with me again as most of the languages I work with don't even have pointers 😅

@bakaq
Copy link
Contributor

bakaq commented Aug 5, 2024

By the way, what is a "residual goal"?

When using constaint libraries in Scryer that are implemented using attributed variables, like library(dif) or library(clpz), if the constraints are not enough to actually fix what the variable substitutions should be then "residual goals" are shown that indicate the extra information of the constraints. For example:

?- use_module(library(dif)).
   true.
?- dif(X, 1).
   % This is a residual goal, notice that its not in the usual X = _ substitution form,
   % because X still is just a variable that isn't bound to anything, but we do know it's
   % different from 1.
   dif:dif(X,1).
?- dif(X, 1), X = 2.
   % Here we have all the information to actually know what X is, so there are no
   % residual goals.
   X = 2.

Meaning -- should consulting (via the shared library consult_module_string() replace all existing facts/rules or
should it add to the existing rules? If it should add to them, there should be some documentation about whatever a "discontiguous" fact/rule is and the consequences of it. Otherwise, there should be some documentation about the proper use of dynamic/1, because I noticed that if I declare a fact dynamic and then later include a fact in the consult (as I do in the reference python code), attempting to assertz/1 a new fact with run_query() results in an error. So, static and dynamic facts can apparently not be mixed in the way the shared library is currently implemented, and I imagine that adding rules would not be any easier.

I'm pretty sure "consulting" has a pretty well defined meaning (it's whatever [filename] does in the toplevel), even if it's not defined in the ISO Standard. The discontiguous part, however, is defined in the ISO Standard. Basically when you define a clause (a fact or a rule), they need to be together in a file. In case a clause is defined discontiguously, then only the latest contiguous block is used (at least that's how Scryer does it, I'm not sure if this is ISO):

?- [user].
a(1).
a(2).
a(3).

b(1).

a(4).

% Warning: overwriting a/1 because the clauses are discontiguous
?- a(X).
   X = 4.

You can, however, define a clause to be discontiguous with a directive:

?- [user].
:- discontiguous(a/1).

a(1).
a(2).
a(3).

b(1).

a(4).

?- a(X).
   X = 1 % Now it works!
;  X = 2
;  X = 3
;  X = 4.
?- [user].
a(5).
a(6).

?- a(X).
   X = 1
;  X = 2
;  X = 3
;  X = 4
;  X = 5
;  X = 6. % Even over different consults!

A predicate being discontiguous is linearly independent from it being dynamic. There is also multifile that is sometimes relevant.

@jjtolton
Copy link
Author

jjtolton commented Aug 5, 2024

By the way, what is a "residual goal"?

When using constaint libraries in Scryer that are implemented using attributed variables, like library(dif) or library(clpz), if the constraints are not enough to actually fix what the variable substitutions should be then "residual goals" are shown that indicate the extra information of the constraints. For example:

?- use_module(library(dif)).
   true.
?- dif(X, 1).
   % This is a residual goal, notice that its not in the usual X = _ substitution form,
   % because X still is just a variable that isn't bound to anything, but we do know it's
   % different from 1.
   dif:dif(X,1).
?- dif(X, 1), X = 2.
   % Here we have all the information to actually know what X is, so there are no
   % residual goals.
   X = 2.

Meaning -- should consulting (via the shared library consult_module_string() replace all existing facts/rules or
should it add to the existing rules? If it should add to them, there should be some documentation about whatever a "discontiguous" fact/rule is and the consequences of it. Otherwise, there should be some documentation about the proper use of dynamic/1, because I noticed that if I declare a fact dynamic and then later include a fact in the consult (as I do in the reference python code), attempting to assertz/1 a new fact with run_query() results in an error. So, static and dynamic facts can apparently not be mixed in the way the shared library is currently implemented, and I imagine that adding rules would not be any easier.

I'm pretty sure "consulting" has a pretty well defined meaning (it's whatever [filename] does in the toplevel), even if it's not defined in the ISO Standard. The discontiguous part, however, is defined in the ISO Standard. Basically when you define a clause (a fact or a rule), they need to be together in a file. In case a clause is defined discontiguously, then only the latest contiguous block is used (at least that's how Scryer does it, I'm not sure if this is ISO):

?- [user].
a(1).
a(2).
a(3).

b(1).

a(4).

% Warning: overwriting a/1 because the clauses are discontiguous
?- a(X).
   X = 4.

You can, however, define a clause to be discontiguous with a directive:

?- [user].
:- discontiguous(a/1).

a(1).
a(2).
a(3).

b(1).

a(4).

?- a(X).
   X = 1 % Now it works!
;  X = 2
;  X = 3
;  X = 4.
?- [user].
a(5).
a(6).

?- a(X).
   X = 1
;  X = 2
;  X = 3
;  X = 4
;  X = 5
;  X = 6. % Even over different consults!

A predicate being discontiguous is linearly independent from it being dynamic. There is also multifile that is sometimes relevant.

Great, makes sense. My guess is the proper balance here would be to provide solid documentation and not do anything surprising like "automatically make every fact in a consult discontiguous/dynamic on behalf of the user" (a thought I had considered) because that would probably piss off people who actually know what they are doing with scryer and prolog. Instead, we should provide adequate documentation for client library developers so that they understand these nuances and provide a good interactive experience to their users.

@bakaq
Copy link
Contributor

bakaq commented Aug 5, 2024

[...] JSON is very convenient and I think not offering a JSON option would be a detriment to adoption [...]

Yes, but when doing a C API for embedding usually you are intending to make it usable in every language (that includes languages like C where JSON is very inconvenient). Providing a JSON (or at least some dict like object) interface for higher level languages makes complete sense, but things like that are usually built on the more raw C API by wrapper libraries. So like, the C API is for everyone, for Python you would use "scryer-python" from pip (whenever someone does that) which provides a Pythonic API built on the low level C API. This low level C API to high level language wrapper pattern is very common because it gives the maximum flexibility for everyone.

So, if we don't return a JSON string -- what do we return, exactly?

That is the difficult part that I think we may have to just ignore for now. Usually at this point we have to create data structures in C that can accurately represent Prolog concepts like terms, and all the machinery to deal with them ((de)allocation, construction, inspection, etc...). As you certainly know, if you are just gonna wrap it in Python or C# then JSON is good enough for now, so I think it's fine if you only implement functions like scryer_query_iterator_next_json() (this is my suggestion for the final name of run_query_generator()) that return JSON and leave the low level and more general API for a later PR. I think that error handling should definitively not depend on JSON though.

EDIT: Actually, now that I thought about it, it wouldn't be as difficult as I imagined, because I suppose Scryer already has all the data structures and the machinery to deal with them implemented in Rust. We would need only to make C wrappers in a way that makes sense, but even this seems very hard and tedious.

@triska
Copy link
Contributor

triska commented Aug 5, 2024

EDIT: Actually, now that I thought about it, it wouldn't be as difficult as I imagined, because I suppose Scryer already has all the data structures and the machinery to deal with them implemented in Rust. We would need only to make C wrappers in a way that makes sense

I think this is a very valuable insight, worth pursuing! An ideal API could maybe find a way to make all residual constraints available, with concrete bindings maybe simply a special case of any goal that can appear in answers. This could be very interesting for users of the API who actually want to reason about remaining constraints, such as in theorem provers etc.

@triska
Copy link
Contributor

triska commented Aug 5, 2024

We can use copy_term/3 from library(iso_ext) to obtain a list of constraints a variable is involved in. For instance:

?- X #> 3, copy_term(X, X, Gs).
   Gs = [clpz:(X in 4..sup)], clpz:(X in 4..sup).

@bakaq
Copy link
Contributor

bakaq commented Aug 5, 2024

We can use copy_term/3 from library(iso_ext) to obtain a list of constraints a variable is involved in.

Yes, the fact that copy_term/3 exists is why I said that it's not a big deal, the API for it can kinda be implemented in Prolog itself, but I can see a lot of cases where this could be very inconvenient.

@jjtolton
Copy link
Author

jjtolton commented Aug 23, 2024

I think we are getting there.

|
+ ffi
| |
| + wasm
| + shared_library (with the content of the current dll module)

4768c01

I believe that meets the requirements you were looking for @Skgland regarding file structure

@jjtolton
Copy link
Author

  • let me know if you want prefixing done via cbindgen or with the export directive.

Based on the documentation of #[export_name], that attribute only works on functions and statics, so for renaming constants and types that must be done via cbindgen.

Also I don't think MAX_ARITY should be excluded as knowing an upper bound for the arity could actually be useful.

464a6b5

@jjtolton
Copy link
Author

I wrote a more elaborate test case that is breaking, not printing results correctly. Should probably be included in this ticket.

tl;dr

?- solve(N, Moves).
%@    N = 1, Moves = [from_to(a,c)]
%@ ;  N = 1, Moves = [from_to(a,d)]
%@ ;  ... .

does not print correctly, @bakaq ?

It's coming out

{"result":[{"Moves":["from_to(a","c)"],"N":1}],"status":"ok"}

Copy link
Contributor

@Skgland Skgland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • A couple comments in the test look like they are outdated.
  • Why is wasm-bindgen now added to the general dependencies in addition to the wasm-only dependencies?

For including the docs it might be easier to use a semi-absolute path by making it relative to the project root.
This can be done by using the environment variable CARGO_MANIFEST_DIR. While building crates cargo sets this to the folder containing the Cargo.toml file.

#![doc = include_str!(concat!(env!("CARGO_MAINFEST_DIR"),"/docs/shared_library/README.md"))]

do we need to document setting this environment variable for building crates or is that already handled in the cargo build?

cargo handles setting the environment variable, no need to document it.

Comment on lines 10 to 11
// uncomment if we can figure out why this isn't working
// use crate::lib::dll::{machine_free};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// uncomment if we can figure out why this isn't working
// use crate::lib::dll::{machine_free};

tests/scryer/shared_library_tests.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated
@@ -80,6 +81,7 @@ ego-tree = "0.6.2"

serde_json = "1.0.122"
serde = "1.0.204"
wasm-bindgen = "0.2.93"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is wasm-bindgen now added as a general dependency in addition to a wasm only dependency below? it should still only be necessary when compiling to wasm

@jjtolton
Copy link
Author

jjtolton commented Aug 23, 2024

I wrote a more elaborate test case that is breaking, not printing results correctly. Should probably be included in this ticket.

tl;dr

?- solve(N, Moves).
%@    N = 1, Moves = [from_to(a,c)]
%@ ;  N = 1, Moves = [from_to(a,d)]
%@ ;  ... .

does not print correctly, @bakaq ?

It's coming out

{"result":[{"Moves":["from_to(a","c)"],"N":1}],"status":"ok"}

Okay, the issue occurs in split_nested_list which happens in the broader context of Value::try_from although it seems I'm not used the code reflect in master hmm


Edit:
Ok, I rebased onto master. Results are slightly better, but not parseable as JSON currently:

"[{\"Moves\":[\"from_to\":[a,c]],\"N\":1}]"

Edit2:

This particular case makes me think we should give up on returning results as JSON and just return the raw results from the heap printer. There is no JSON equivalent for this:

?- solve(N, Moves).
%@    N = 1, Moves = [from_to(a,c)]
%@ ;  N = 1, Moves = [from_to(a,d)]
%@ ;  N = 2, Moves = [fill(c),from_to(a,d)]
%@ ;  N = 2, Moves = [fill(d),from_to(a,c)]
%@ ;  ... .

the best we could do would be return fill(d) and from_to(a,c) as a strings: "fill(d)" and "from_to(a,c)".

Excuse my poor Prolog, but even when I attempted this:

?- solve(N, Moves), maplist(write_canonical, Moves).
%@ from_to(a,c)   N = 1, Moves = [from_to(a,c)]
%@ ;  from_to(a,d)N = 1, Moves = [from_to(a,d)]
%@ ;  fill(c)from_to(a,d)N = 2, Moves = [fill(c),from_to(a,d)]
%@ ;  fill(d)from_to(a,c)N = 2, Moves = [fill(d),from_to(a,c)]
%@ ;  ... .

The canonical form of fill(c) is fill(c)! Given this, I think it makes a lot more sense to return the raw strings and abort on the JSON altogether.

To be really clear about it, it's clear from this example JSON does not support my usecase for making this PR in the first place, I'd have to write my own parser for these strings anyway, so we should probably just return the strings!

Edit3:

It looks like @bakaq 's PR #2493 is actually the perfect solution to this, crisis averted!

Edit4:

I rebased onto #2493 and it works great.

{"result":{"bindings":{"Moves":[{"args":[{"atom":"a"},{"atom":"c"}],"functor":"from_to"}],"N":1}},"status":"ok"}

I couldn't imagine anything better!!

It will even be worth redoing all the documentation. 😑 Again. 😭

@jjtolton jjtolton force-pushed the ISSUE-2464/scryer-prolog-shared-lib-eval-code-c branch from 464a6b5 to 7987c4f Compare August 23, 2024 23:52

See [`scryer_python.py`](scryer_python.py) for a complete example.

* Note that the JSON printer returns results that are not completely accurate when it comes to differentiating strings from symbols, e.g.:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still the case? I think the JSON printer implemented by @bakaq does this correctly!

Comment on lines 225 to 231
let value: Value = serde_json::from_str(&format!("{}", query_resolution_line)).unwrap();
json!( {
"status": "ok",
"result": value
})
.to_string()

This comment was marked as resolved.

Comment on lines 111 to 117
let obj = serde_json::from_str::<serde_json::Value>(&v).expect("Bad JSON");

let output_string = json!({
"status": "ok",
"result": obj
})
.to_string();

This comment was marked as resolved.

@jjtolton jjtolton force-pushed the ISSUE-2464/scryer-prolog-shared-lib-eval-code-c branch from dc1b52f to 8b307a1 Compare August 26, 2024 15:57
@jjtolton
Copy link
Author

jjtolton commented Aug 26, 2024

@Skgland I finally managed to redo the tests for @bakaq 's parser, when I get a chance I will go through and make the changes based on your comments.

@bakaq I made a few comments on your PR #2493 that came up when I was writing tests

@jjtolton
Copy link
Author

Ok, added more tests, cleaned up the tests. All the documentation is already wrong and out of date, so I need to go through and fix that. I think I will wait until @bakaq's #2493 to make any major changes, because I'm going to have to rewrite everything after that anyway.

@jjtolton
Copy link
Author

jjtolton commented Aug 26, 2024

It's also entirely possible that after @Skgland 's PR #2505, we will also accept queries and code in the form of serialized data -- but depending on timing might be better to put that in another PR. However this would be much more secure, especially for queries, than allowing for raw string injection.

@Skgland
Copy link
Contributor

Skgland commented Aug 27, 2024

If the plan is to leave room for non-string based APIs in the he future, are we fine with the string based APIs taking the current names? It feels like we currently use the preferred names for the non-preferred API.

@jjtolton
Copy link
Author

If the plan is to leave room for non-string based APIs in the he future, are we fine with the string based APIs taking the current names? It feels like we currently use the preferred names for the non-preferred API.

I'm fine very with changing them. Any suggestions?

@Skgland
Copy link
Contributor

Skgland commented Aug 28, 2024

First a list of the functions we might want to rename and why

  • run_query_iter takes a prolog term as a string rather than structured input
  • run_query_next returns a json string rather than structured output
  • run_query all of the above

Not sure what to do about consult_module_string I think its probably fine as is, because I find it unlikely one would want to manually construct the structure of a whole module.


My immediate suggestions would be (though I am not completely happy with them)

Old New structured C-API
run_query_iter run_query_string_iter run_query_iter
run_query_next next_query_result_json next_query_result
run_query collect_query_string_results_as_json collect_query_results

On thing that might make this easier to not provide run_query as is, but instead provide a fn all_query_resuls_json(*mut QueryState) -> *const char.

Instead of

Machine *m = scryer_machine_new();
char* resuls = scryer_run_query("A = 'Scryer Prolog'");
scryer_machine_free(m);
scryer_string_free(result);

one could then do

Machine *m = scryer_machine_new();
char* resuls = scryer_all_query_resuls_json(scryer_run_query_iter("A = 'Scryer Prolog'"));
scryer_machine_free(m);
scryer_string_free(result);

This could help limit the combinatoric explosion and with all_query_resuls_json taking care of freeing the QueryState as it is used up anyways we don't even need to temporary to remember the pointer so that we can free it.

With that we could use the following names (I like these better than the ones above)

Old New structured C-API
run_query_iter run_query_string run_query
run_query_next next_query_result_json next_query_result
run_query
all_query_results_json all_query_results

or with query removed/as a verb

Old New structured C-API
run_query_iter query_string query
run_query_next next_result_json next_result
run_query
all_results_json all_results

If we already had a C-API ready I would suggest not adding the _json/_string variants and instead suggest two/three functions one for turning a string into the structured input and one/two for turning the structured output into json.


On a side note: on a quick glance it looks like everything in shared_libray only use the public interface of the scryer_prolog library so it (the complete module) could be turned into a separate crate (e.g. scryer_prolog_c_lib) that then depends on scryer_prolog.

Splitting it like this could help to make sure that everything that can be done from C can also be done from rust i.e. the shared library doesn't accidentally depends on or exposes scryer_prolog internal things.

Related to that thought but not this PR: Could this also be done for the wasm interface i.e. move it into a separate scryer_prolog_wasm crate?


I think it would be good to here the others opinion on all of this as well.

@bakaq @triska

  • Do you have name suggestions, or opinions on the suggested names?
  • What about spitting the c/wasm library/interface of into a separate crate?

Also should we maybe move the discussion to a Discussion? This PR thread is growing rather long.

@triska
Copy link
Contributor

triska commented Aug 28, 2024

@jjtolton, @Skgland, @bakaq: First, thank you all for everything you are doing here! I am following your work with great interest and I think many excellent points were raised and have also been addressed already! I'm amazed by the quality and intensity of your contributions.

I hope two additional points may be useful:

  1. There is library(ffi), which is for the direction where we want to use a foreign library from Prolog, which (as I understand it), is the converse of what is being discussed here, where Prolog functionality is made available to other programming languages. Somehow, and strangely enough for two different functionalities that are both provided under the name ffi, the current PR and its documentation and file/directory structure seem not (yet) to cause conflicts with the existing ffi module and its Rust-part implemented by @aarroyoc. Still, I think it is worth reviewing this PR also with this consideration in mind, especially when thinking about how to provide it (crate etc.) and also how to document and explain it.
  2. Regarding the names of functions, the terminology answer may be useful (instead of "result"). As of a few days ago, we also have the concept of leaf answer available, which may also be useful.

Thank you a lot, and please keep up the great work!

@Skgland
Copy link
Contributor

Skgland commented Aug 28, 2024

@triska regarding 1, we had a conflict, but the ffi module by @aarroyoc was moved from the crate root to under the machine module. It did not contain any public API and and was basically only used inside the machine, so moving it there to free up the module in the crate root for the new API seemed reasonable.
Screenshot_20240828-221113.jpg

@jjtolton
Copy link
Author

jjtolton commented Aug 28, 2024

@Skgland
I think these are great changes.

Also, to everyone --

One additional point to consider -- if we are going to change the API like this -- is there any reason at all we should ask the user to keep track of the QueryState object?

I want to emphasize the only reason it exists is because I didn't want to step on any toes by making it an attribute of the Machine, it was primarily for prototyping purposes, and I didn't even think it would work!

The only other reason we would potentially want to have a QueryState distinct from a Machine would be in the event that we could have multiple simultaneous QueryStates in some sort of concurrency scenario. However, from what I can tell, that would be an enormous overhaul and it would probably be more beneficial to simply make use of the continuations library.

Personally, I think overall the API would be much nicer if we got rid of the QueryState pointer entirely and made it intrinsic to the Machine. That would also decrease the number of APIs needed.

@bakaq
Copy link
Contributor

bakaq commented Aug 29, 2024

is there any reason at all we should ask the user to keep track of the QueryState object?

In the Rust side it makes sense because something has to implement Iterator.

Personally, I think overall the API would be much nicer if we got rid of the QueryState pointer entirely and made it intrinsic to the Machine. That would also decrease the number of APIs needed.

I'm really not sure about that. The number of APIs would still be the same, but then the user would have to handle the state implicitly.

// Current
Machine *machine = scryer_machine_new();
QueryState *query_state = scryer_machine_query_iter("X = 1; X = 2.");
char *leaf_answer_1 = scryer_query_next(query_state);
char *leaf_answer_1 = scryer_query_next(query_state);
scryer_query_state_free(query_state);
scryer_free_c_string(leaf_answer_1);
scryer_free_c_string(leaf_answer_2);
scryer_machine_free(machine);

// Query state in Machine struct
Machine *machine = scryer_machine_new();
scryer_machine_query_iter("X = 1; X = 2."); // Magic happens here
char *leaf_answer_1 = scryer_query_next(machine);
char *leaf_answer_2 = scryer_query_next(machine);
scryer_query_state_free(machine); // We would still need to finish the query somehow
scryer_free_c_string(leaf_answer_1);
scryer_free_c_string(leaf_answer_2);
scryer_machine_free(machine);
// But what would happen in this situation?
Machine *machine = scryer_machine_new();
// !!??? We haven't initialized a query yet!
// What should this do? Undefined Behavior? Should we check
// if a query has been initialized?
char *leaf_answer_1 = scryer_query_next(machine);

In my experience with C APIs, OpenGL-like APIs with global/implicit state (which is what GNU Prolog seems to do) are terrible footguns in an already footgun infested language. I like SDL/Vulkan-like APIs with explicit state with opaque pointers much more, because they are harder to misuse.

@jjtolton
Copy link
Author

is there any reason at all we should ask the user to keep track of the QueryState object?

I'm really not sure about that. The number of APIs would still be the same, but then the user would have to handle the state implicitly.

In my experience with C APIs, OpenGL-like APIs with global/implicit state (which is what GNU Prolog seems to do) are terrible footguns in an already footgun infested language. I like SDL/Vulkan-like APIs with explicit state with opaque pointers much more, because they are harder to misuse.

That's all fine. In that case, at least from the shared library perspective, I think we should refer to the query state pointer as a leaf answer cursor rather than a query state, I think it will seem less like a chore and line up with more familiar concepts.

@bakaq
Copy link
Contributor

bakaq commented Aug 29, 2024

Yes, and Value as Term, QueryResolution as CompleteAnswer, QueryResolutionLine as LeafAnswer, etc... I'm using the current terminology used in the code, but we should definitely get better names for everything. I'm planning to deal with other issues (implementation, visibility, etc...) before The Great Renaming™ because that will be the most breaking of changes. Maybe we should do that earlier? See also #2490.

@jjtolton
Copy link
Author

The Great Renaming™
🤣

Comment on lines +289 to +292
if self.source:
self.consult(self.source)
if query:
return self.lazy_eval_context(query)
Copy link

@kasbah kasbah Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Giving this a bit of a test drive since it's perfect for a prototype I am working on, so thanks so much!

I just wanted to point out quickly that returning three different things like this causes some Python typing issues when used in the way of the examples:

image

Cannot access attribute "lazy_eval_context" for class "ContextManager[Generator[SCRYER_QUERY_RESULT, None, None]]"
  Attribute "lazy_eval_context" is unknown (Pyright reportAttributeAccessIssue)
──────────────────────────────────────────────────────────────────────────────
https://github.com/microsoft/pyright/blob/main/docs/configuration.md#reportAttributeAccessIssue


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ISSUE-2464: Make scryer prolog usable as a shared library for other programs
6 participants