Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backend #108

Closed
wants to merge 17 commits into from
Closed

Backend #108

wants to merge 17 commits into from

Conversation

prabhant
Copy link

@prabhant prabhant commented Oct 8, 2020

Replacing custom backend with dask distributed

@PGijsbers
Copy link
Member

PGijsbers commented Oct 9, 2020

It looks like there are style warnings for some files. Did you install pre-commit? See this documentation to help you get started.

@PGijsbers
Copy link
Member

I tried to look into this today. Unfortunately I did not get very far.
I observe a number of warnings being reported by dask, most commonly:

distributed.worker - WARNING -  Compute Failed
Function:  evaluate_individual
args:      (<gama.genetic_programming.components.individual.Individual object at 0x0000021B53B9E430>)
kwargs:    {}
Exception: TimeoutException()

but (when testing) also variations of

distributed.scheduler - ERROR - Workers don't have promised key: [], evaluate_individual-148f7e8adaded337558582d6544dadd1
NoneType: None

distributed.client - WARNING - Couldn't gather 3 keys, rescheduling {'evaluate_individual-ad040a1c4d388915d6ef85532435447f': (), 'evaluate_individual-dc88f8d70fb4dc6880a3d7f81cb34163': (), 'evaluate_individual-148f7e8adaded337558582d6544dadd1': ()}

I tried to reproduce a minimal working example, but

from dask.distributed import Client, as_completed
import stopit


def stopit_work(base):
    with stopit.ThreadingTimeout(1) as c_mgr:
        do_compute = base
        while True:
            do_compute *= base
            do_compute /= base
    if not c_mgr:
        return "Stopped due to timeout"
    return "Done"


def main(func):
    with stopit.ThreadingTimeout(5) as c_mgr:
        with Client() as client:
            ac = as_completed(client.map(func, range(1, 10_000)))
            for future in ac:
                print(future.result())
    print("done")


if __name__ == '__main__':
    main(stopit_work)

seems to function as expected.

I also tried to modify evaluate_pipeline but no variation truly solved the issue.
Finally I tried to re-integrate dask.distributed myself from develop (referencing your work and dask docs) for RandomSearch only, but still seemed to get the same warnings and errors present in the backend branch.

@PGijsbers
Copy link
Member

Also my host machine seems to close connections [WinError 10054] An existing connection was forcibly closed by the remote host, not sure what is happening there. Also seems to occur randomly (when running the same script multiple times, it only shows up sometimes).

@PGijsbers
Copy link
Member

Looks like the TimeoutException warnings don't occur if the timeout is always set to at least one second. So I'll do that for now. But I am unsure about the other random errors/warnings.

@PGijsbers
Copy link
Member

Closing this PR as the #dask branch is much further along with the integration and addresses most issues raised in this PR. (Though that branch also still has problems, so will possibly never be merged >:| )

With your permission I will also remove the stale branch.

@PGijsbers PGijsbers closed this Sep 14, 2022
@PGijsbers
Copy link
Member

Thanks for the effort and first exploration though, it did still help in creating the second iteration 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants