Allowing driver and worker processes to load different modules #5775

timholy · 2014-02-12T11:56:36Z

On a single machine, I have a driver process that needs to interact with the user; it makes use of several graphics/plotting packages & modules. This driver process delegates its computational work to several worker processes, each of which relies on quite a large number of compute-focused packages & modules. Loading all this code into a single process takes approximately 1 minute, which is a large barrier to entry. But in principle, there's no reason the driver process needs to load the compute packages, and there's no reason the worker processes need to load the graphics packages. So it seems this time could be shaved down quite a bit by some partitioning.

I did some experiments trying to get this working, and with current master the strategy I found seems a bit sub-optimal (this PR may improve matters). The main challenge is that the driver needs to call functions inside modules that it knows nothing about. C permits this by allowing you to declare a function without defining it, but I'm not aware of anything similar in Julia. So the solution seems to be to rely on eval inside of Main.

With current master, here are the various files I needed:

module MyWorker

export compute

compute(x) = x+1

end

module Glue

export evalfetch

evalfetch(r::RemoteRef) = eval(Main, fetch(r))

end

module MyDriver

using Glue

export main

function main(wpid)
    # Load the code that the worker process needs to do its job. 
    # It needs to be evaled in Main to prevent the serializer from
    # complaining about not knowing about MyDriver on wpid
    # (this is not an issue if you execute this from the REPL)
    wait(eval(Main, :(@spawnat $wpid include("MyWorker.jl"))))

    # Now set up the function call, but wrap it in an expression to avoid
    # errors due to not knowing about MyWorker
    rr = RemoteRef(wpid)
    x = 15
    put(rr, :(MyWorker.compute($x)))
    return remotecall_fetch(wpid, evalfetch, rr)
end

end

Then run this as follows (I'm using include to avoid any possible funny business with require loading things on all workers):

wpid = addprocs(1)[1]

@everywhere using Glue

include("MyDriver.jl")
MyDriver.main(wpid)

and you get back 16.

In an ideal world, I'd even put these first two lines inside MyDriver.main(); it seems better not to have to bother the user with spawning new processes and loading code on them. But it seems hard to get Glue into MyDriver after-the-fact.
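For readers on a recent Julia, the expression-shipping pattern above can be condensed into one hedged sketch. It assumes Julia 1.x with the Distributed standard library, where `remotecall_fetch` takes the function first (the reverse of the 2014-era calls in this thread). The module name `SketchDriver` and the inline definition of `MyWorker` are illustrative assumptions, not part of this PR. Because only `Core.eval`, `Main`, and an `Expr` are sent, no closure referring to the driver's module is serialized.

```julia
module SketchDriver   # hypothetical name, for illustration only

using Distributed

function main(x)
    wpid = addprocs(1)[1]
    try
        # Define the compute module on the worker only. remotecall_wait does
        # not ship the result (a Module the driver could not resolve) back.
        worker_code = :(module MyWorker; compute(x) = x + 1; end)
        remotecall_wait(Core.eval, wpid, Main, worker_code)

        # Send the call as an expression; it is resolved in the worker's Main,
        # so the driver never needs to know MyWorker.
        return remotecall_fetch(Core.eval, wpid, Main, :(MyWorker.compute($x)))
    finally
        rmprocs(wpid)
    end
end

end # module

result = SketchDriver.main(15)
```

With `compute(x) = x + 1` as defined in the sketch, `result` matches the value reported above, and `MyWorker` is never defined in the driver process.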

With this PR, in addition to MyWorker.jl, here's what you need:

module SimpleDriver

export main

function main()
    wpid = addprocs(1)[1]
    wait(eval(Main, :(@spawnat $wpid include("MyWorker.jl"))))

    remotecall_fetch(wpid, :(MyWorker.compute), 15)
end

end

include("SimpleDriver.jl")
SimpleDriver.main()

I don't (yet!) have a good understanding of how multi.jl works, so this should be viewed as a first attempt.

timholy · 2014-02-12T14:16:48Z

Ah, I did just discover that the following works on master:

module SimpleDriver

export main

function main()
    wpid = addprocs(1)[1]
    wait(eval(Main, :(@spawnat $wpid include("MyWorker.jl"))))
    eval(Main, :(@everywhere compute(x) = MyWorker.compute(x)))
    remotecall_fetch(wpid, Main.compute, 22)
end

end

So this is not necessary. I guess there's still a question of how easy we can/want to make this kind of thing. I can imagine syntax something like this:

MyWorkerRR = @importat 2 MyWorker   # a ModuleRemoteRef
x = MyWorkerRR.compute(22)          # performs the work on 2
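One way such a module handle could behave is sketched below for a recent Julia (1.x, Distributed standard library). This is not what the PR implements: `RemoteModule` is a hypothetical type, and the `@importat` macro is replaced by a plain constructor call. Property access on the handle returns a function that builds the qualified call as an expression and evaluates it in the worker's `Main`.

```julia
using Distributed

# Hypothetical proxy type; not part of Julia or of this PR.
struct RemoteModule
    pid::Int
    name::Symbol
end

# `handle.f` returns a function that runs `name.f(args...)` on process `pid`.
function Base.getproperty(rm::RemoteModule, fname::Symbol)
    pid = getfield(rm, :pid)
    modname = getfield(rm, :name)
    return function (args...)
        # QuoteNode keeps each argument a literal value instead of code.
        call = Expr(:call, Expr(:., modname, QuoteNode(fname)),
                    map(QuoteNode, args)...)
        remotecall_fetch(Core.eval, pid, Main, call)
    end
end

wpid = addprocs(1)[1]
# Define the compute module on the worker only (inline here for brevity).
remotecall_wait(Core.eval, wpid, Main, :(module MyWorker; compute(x) = x + 1; end))

MyWorkerRR = RemoteModule(wpid, :MyWorker)
x = MyWorkerRR.compute(22)   # performs the work on wpid
rmprocs(wpid)
```

The arguments still have to be serializable, and return values must be of types the driver knows, so this only hides the `eval` plumbing rather than removing the underlying constraints.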

JeffBezanson · 2014-02-13T04:57:41Z

I think something like @fetchfrom wpid eval(Main,:f)(x) would also work, so I don't think we need to add this.

timholy · 2014-02-13T12:14:11Z

It's not quite sufficient on its own, because you get errors like these:

julia> MySimpleDriver.main()
fatal error on 2: ERROR: MySimpleDriver not defined
 in deserialize at serialize.jl:333
 in handle_deserialize at serialize.jl:321
 in deserialize at serialize.jl:371
 in handle_deserialize at serialize.jl:321
 in deserialize at serialize.jl:360
 in handle_deserialize at serialize.jl:321
 in anonymous at task.jl:790
Worker 2 terminated.

The problem seems to be that the caller's module becomes part of the serialized data sent to the worker, and that name is not defined on the worker. (This is the same issue that forces you to wrap include inside an eval(Main, ...).)

However, eval(Main, :(@fetchfrom $wpid eval(Main,:(MyWorker.compute))(22))) does work. That's not entirely transparent, so is it worth considering whether we can make that easier to use?

I've come up with one solution that seems promising; I've updated the code and even written some tests (which, naturally, demonstrate how to use this version). To me this seems much easier to use than anything yet considered, so I'm reopening.

timholy · 2014-02-13T12:46:40Z

The test passed with clang; the gcc failure seems to be because the build never even started. I didn't find a way to force it to try again.

carlobaldassi · 2014-02-13T13:06:23Z

> I didn't find a way to force it to try again.

I'd use

git commit --amend --no-edit

which creates a new commit identical to the previous one but with a different sha1 (assuming you didn't git-add anything in the meanwhile, of course). Then force-push the branch, and it should compile again.

tknopp · 2014-02-13T13:06:52Z

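I don't fully see what is not working, but in Julietta I use this approach:

function execute(term::Terminal, cmd::String)
  # Parse the command text on the driver; only the expression is sent over
  ex = Base.parse_input_line(cmd)
  s = @fetchfrom term.id begin
    # Expand and evaluate in the worker's Main, then store the result in `ans`
    ex = expand(ex)
    value = eval(Main,ex)
    eval(Main, :(ans = $(Expr(:quote, value))))
    # Return a printable string rather than the value itself
    sprint(Base.showlimited,value)
  end
end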

When I call execute(terminal,"import REPLCompletions"), this works and I can use REPLCompletions from the worker.

timholy · 2014-02-13T13:54:48Z

@carlobaldassi, thanks for the tip. I will use that in the future. I think I came up with a better name for the macro anyway.

@tknopp, not sure I fully understand what you're doing, but what I want to do is the following: process A has module AA loaded, and process B has module BB loaded. From inside A, I want to be able to call BB.functionB(args...) to run on process B and return the results to A, without having to load module BB into process A.

tknopp · 2014-02-13T14:16:59Z

OK, I see. I am actually doing the same, and now I understand why my "driver side" cannot be in a module (see https://github.com/tknopp/Julietta.jl/blob/master/src/Julietta.jl, lines 1 and 2). So it seems that it works when you remove the module scope from module A.

timholy · 2014-02-13T14:27:43Z

You'd need to remove the module scope from B as well. And if they have different functions defined in Main, I'm still not sure you can call one from the other, because the serializer needs to know to serialize myfunction as a function.

One thing to note is that your import REPLCompletions may be loading it into all processes, as can be inferred from

julia> addprocs(1)
1-element Array{Any,1}:
 2

julia> import REPLCompletions

julia> reload("REPLCompletions")
Warning: replacing module REPLCompletions
Warning: replacing module REPLCompletions

The fact that there are two warnings is evidence that both know about REPLCompletions.

What I'm shooting for here is the ability to have complete separation. While my motivation was purely to decrease load times, another potential application is that if I have code that I'm not willing to share but am willing to let you test-drive, I could open a TCP/IP port on one of my servers and allow you to execute my code remotely.

tknopp · 2014-02-13T14:38:17Z

I was not aware of that. I thought the module would only be loaded locally, and that one would need @anywhere to import it on all processes. So is this a bug or a feature?

pao · 2014-02-13T14:53:07Z

@tknopp reminder: `@anywhere`, with backtick quotes to disable the resulting notification.

timholy · 2014-02-13T15:29:24Z

Mostly a feature, I think. But sometimes you want more fine-grained control.

JeffBezanson · 2014-02-13T21:51:05Z

Ah, now I get it. The entire problem is the reference to the enclosing module that the serialized closure has. The closure needs to be "rebased" on top of Main.

Some of the issues can be avoided by avoiding closures. For example remotecall_wait(p, Core.include, filename) should work.

Constructs like @spawnat could "rebase" their closures at compile time so the inner module reference is Main, and any globals inside are fully qualified. That way if nothing from the offending module is actually needed, it will just work.
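The "avoid closures" suggestion can be sketched for a recent Julia (1.x, Distributed standard library) as follows. Passing a named function plus plain arguments means nothing from the caller's module is serialized. The temporary file and its contents are assumptions made so the example is self-contained, and `Base.include(Main, path)` stands in for the `Core.include` call mentioned above.

```julia
using Distributed

# Write a stand-in MyWorker.jl to a temporary directory (illustrative only).
dir = mktempdir()
path = joinpath(dir, "MyWorker.jl")
write(path, "module MyWorker\ncompute(x) = x + 1\nend\n")

wpid = addprocs(1)[1]

# No closure here: a named function and its arguments are all that is sent.
# remotecall_wait does not fetch the include result back to the driver.
remotecall_wait(Base.include, wpid, Main, path)

# The call itself is again shipped as an expression and resolved on the worker.
result = remotecall_fetch(Core.eval, wpid, Main, :(MyWorker.compute(15)))
rmprocs(wpid)
```

This relies on the worker running on the same machine, so that it can read the temporary file.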

timholy · 2014-02-13T22:06:55Z

In usage this is proving more fragile than I expected, so if you know how to do this...

timholy · 2014-02-13T22:07:21Z

(I keep getting "cannot serialize a pointer" errors, and can't figure out why)

amitmurthy · 2014-02-14T03:32:57Z

You need to write a custom serialize/deserialize for one of your types maybe? Like this - https://github.com/JuliaLang/julia/pull/5788/files

vtjnash · 2015-08-25T23:19:47Z

> Constructs like @spawnat could "rebase" their closures at compile time so the inner module reference is Main, and any globals inside are fully qualified. That way if nothing from the offending module is actually needed, it will just work.

This was one of the hardest parts of tricking the parallel tests into not depending on being run from Main: 165cabc

vtjnash · 2019-06-26T21:09:32Z

I think we've now addressed some of these points (with @everywhere <workers> <expr>, for example) and changed the design of code loading.

JeffBezanson closed this Feb 13, 2014

timholy reopened this Feb 13, 2014

Add @spawnat_by_name, includeat, and requireat to simplify running separate code on workers

d857fc4

JeffBezanson mentioned this pull request Feb 14, 2014

put(RemoteRef(), obj) semantics are troublesome #5819

Closed

jiahao force-pushed the master branch 3 times, most recently from 6c7c7e3 to 1a4c02f Compare October 11, 2014 22:06

jiahao force-pushed the master branch from cdde4df to 7fdc860 Compare October 28, 2014 04:20

MikeInnes force-pushed the master branch from 5c60996 to b1c3df3 Compare November 14, 2014 17:07

vtjnash closed this Jun 26, 2019

DilumAluthge deleted the teh/workermodules branch January 12, 2021 21:26
