Remote Execution Support #163

Draft
wants to merge 84 commits into
base: develop
Changes from all commits
84 commits
7f78397
Introduce RunInitializer and RunResult
PhilippvK Mar 26, 2024
6c0ecc4
drop run.session
PhilippvK Mar 26, 2024
e52b9e8
add run.has_target()
PhilippvK Mar 26, 2024
e59f893
drop run.session 2
PhilippvK Mar 26, 2024
69e8e2f
fix mlonmcu/cli/common.py: runs_per_stage not detected on CMDLINE
PhilippvK Mar 26, 2024
9e08e44
add run.has_target()
PhilippvK Mar 26, 2024
802c666
Context: add MlonMcuContextMinimal
PhilippvK Mar 26, 2024
fae52a6
Introduce RunInitializer and RunResult
PhilippvK Mar 26, 2024
4ba7dea
Run: remove unsused self.result
PhilippvK Mar 26, 2024
9b4808f
Run: add missing self.dir = None
PhilippvK Mar 26, 2024
d4abd6e
Introduce RunInitializer and RunResult 3
PhilippvK Mar 26, 2024
693b5c9
Session: experiment with process_pool feature to deal with GIL
PhilippvK Mar 26, 2024
c44fe66
fixes
PhilippvK Mar 27, 2024
9c94f14
fix typo
PhilippvK Mar 27, 2024
e72c816
etiss: skip get_metrics script if trace_memory is false, and only upd…
PhilippvK Mar 30, 2024
080840c
WIP: add workaround for run dir with unknown session
PhilippvK Mar 30, 2024
d7f5cf3
session: yield futures once they are completed
PhilippvK May 2, 2024
48c132c
session: update pbar in _join_workers
PhilippvK May 2, 2024
3d1568f
session: move pbar utilities to progress.py
PhilippvK May 4, 2024
001fbcb
session.py: move lots of code to schedule.py
PhilippvK May 7, 2024
5543f66
mlonmcuscheduler fixes
PhilippvK May 7, 2024
26a8959
schedule.py: add comments
PhilippvK May 12, 2024
460ded5
Merge remote-tracking branch 'origin/main' into feature-as-completed
PhilippvK May 12, 2024
c796801
lint
PhilippvK May 12, 2024
0783680
Merge branch 'feature-as-completed' into feature-process-pool-new
PhilippvK May 13, 2024
bd1caf9
sessionscheduler fixes
PhilippvK May 13, 2024
5d77ddf
sessionscheduler fixes
PhilippvK May 13, 2024
9ce8c74
begin ArchivedRun impl
PhilippvK May 13, 2024
39c78e9
update RunInitializer impl
PhilippvK May 13, 2024
c31aa3e
revert init_directory workaround
PhilippvK May 13, 2024
34ea7f4
add run.initializer() method (untested)
PhilippvK May 13, 2024
0e49923
schedule.py: cleanup imports
PhilippvK May 13, 2024
c80fb9c
update scheduler.postprocess()
PhilippvK May 13, 2024
f33ee6c
introduce session.use_init_stage
PhilippvK May 13, 2024
e8ab479
sessionscheduler fixes
PhilippvK May 13, 2024
3d244f0
artifacts: use IntFlag instead of Enum for formats
PhilippvK May 17, 2024
6d8a460
platforms: add missing returns to init_directory
PhilippvK May 17, 2024
95fc537
run: add save and cleanup methods for Run & RunInitializer
PhilippvK May 17, 2024
5489883
run.init_directory: pass runs_dir instead of session
PhilippvK May 17, 2024
a648370
add_model: add better error handling
PhilippvK May 17, 2024
98f0b74
run: add save and cleanup methods for Run & RunInitializer
PhilippvK May 17, 2024
ea0ae6d
scheduler: add error msg for unsupported session postprocesses
PhilippvK May 17, 2024
c290081
store run results in scheduler
PhilippvK May 17, 2024
9d3fd1d
session: expose executor as cfg
PhilippvK May 17, 2024
eb1c44d
refactor session.get_reports to use runresults instead of runs
PhilippvK May 17, 2024
2fb3243
session: cleanup imports
PhilippvK May 17, 2024
ef81706
session: add assertion for empty session
PhilippvK May 17, 2024
f11209b
cli: allow _ as dummy model name
PhilippvK May 17, 2024
d3ccdef
cli: implement --initializer arg
PhilippvK May 19, 2024
6d0fc43
cli: implement --noop arg
PhilippvK May 19, 2024
84822b2
implement session.shuffle
PhilippvK May 19, 2024
97fe937
scheduler: pass clean and save option as args
PhilippvK May 19, 2024
88928be
scheduler: improve handling of runinitializers and used stages
PhilippvK May 19, 2024
afac9ba
cli: implement --initializer arg 2
PhilippvK May 19, 2024
79828a5
add missing import
PhilippvK May 19, 2024
3a69b9e
lint code
PhilippvK May 19, 2024
21d976d
session: implement batching
PhilippvK May 19, 2024
0ea803b
cli: fix
PhilippvK May 20, 2024
515e29e
scheduler: implement cmdline executor
PhilippvK May 20, 2024
9d48190
scheduler: fixes
PhilippvK May 20, 2024
d3251d5
Refactor executors into classes
PhilippvK May 21, 2024
e1cd791
add mlonmcu_eval.sh script
PhilippvK May 22, 2024
58cc23f
add missing imports
PhilippvK May 22, 2024
2543184
fix print_report handling in cmdline
PhilippvK May 22, 2024
18f55c0
update mlonmcu_eval.sh script
PhilippvK May 22, 2024
d695c86
update mlonmcu_eval.sh script
PhilippvK May 22, 2024
ec311db
wip: add rpc scripts
PhilippvK May 22, 2024
e50a493
stoire session results in class
PhilippvK May 22, 2024
ed3d537
implement RPCSessionExecutor
PhilippvK May 22, 2024
f4b7698
implement RPCSessionExecutor 2
PhilippvK May 22, 2024
666f405
finish rpc implementation
PhilippvK May 22, 2024
65f247c
update mlonmcu_eval.sh script
PhilippvK May 22, 2024
dbec826
comment out prints
PhilippvK May 22, 2024
5df6e7f
cleanup fix
PhilippvK May 23, 2024
674d6f6
update mlonmcu_eval.sh script
PhilippvK May 23, 2024
facec1f
update mlonmcu_eval.sh
PhilippvK May 24, 2024
5d4285e
update mlonmcu_eval.sh
PhilippvK May 24, 2024
904960c
add eval_outs/ directory
PhilippvK May 24, 2024
8dead62
update mlonmcu_eval.sh
PhilippvK May 24, 2024
e284efe
run: fix handling of missing sub artifacts
PhilippvK May 25, 2024
bd3250e
drop default environment cfg from class
PhilippvK May 29, 2024
69f1fc2
context: update create_session to allow custom dir
PhilippvK May 29, 2024
b6b0358
move eval outs from repo
PhilippvK Jun 7, 2024
525adf8
update mlonmcu eval script
PhilippvK Jun 19, 2024
52 changes: 34 additions & 18 deletions mlonmcu/artifact.py
@@ -18,7 +18,7 @@
#
"""Artifacts defintions internally used to refer to intermediate results."""

from enum import Enum
from enum import IntFlag, auto
from pathlib import Path

from mlonmcu.setup import utils
@@ -28,24 +28,25 @@
# TODO: decide if inheritance based scheme would fit better


class ArtifactFormat(Enum): # TODO: ArtifactType, ArtifactKind?
# class ArtifactFormat(Enum): # TODO: ArtifactType, ArtifactKind?
class ArtifactFormat(IntFlag):
"""Enumeration of artifact types."""

UNKNOWN = 0
SOURCE = 1
TEXT = 2
MLF = 3
MODEL = 4
IMAGE = 5
DATA = 6
NUMPY = 7
PARAMS = 8
JSON = 9 # TODO: how about YAML or more general: DICT?
PATH = 10 # NOT A DIRECTORY?
RAW = 11
BIN = 11
SHARED_OBJECT = 12 # Here: the parent tar archive
ARCHIVE = 13
UNKNOWN = auto()
SOURCE = auto()
TEXT = auto()
MLF = auto()
MODEL = auto()
IMAGE = auto()
DATA = auto()
NUMPY = auto()
PARAMS = auto()
JSON = auto() # TODO: how about YAML or more general: DICT?
PATH = auto() # NOT A DIRECTORY?
RAW = auto()
BIN = RAW
SHARED_OBJECT = auto() # Here: the parent tar archive
ARCHIVE = auto()


def lookup_artifacts(artifacts, name=None, fmt=None, flags=None, first_only=False):
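
The switch from `Enum` to `IntFlag` with `auto()` changes the numeric values: members become distinct bits, and `UNKNOWN` is no longer `0`. A minimal sketch of that behavior, using a trimmed copy of the enum (only a few members are reproduced; the variable names below are illustrative and not from the PR):

```python
from enum import IntFlag, auto


class ArtifactFormat(IntFlag):
    """Trimmed copy of the enum from the diff (a few members only)."""

    UNKNOWN = auto()  # 1: auto() starts at 1 for flags, not 0 as before
    SOURCE = auto()   # 2
    TEXT = auto()     # 4
    RAW = auto()      # 8
    BIN = RAW         # alias: same member as RAW


# Members are distinct bits, so several formats can be combined in one value.
text_like = ArtifactFormat.SOURCE | ArtifactFormat.TEXT

# Membership tests on a combined value.
is_text_like = ArtifactFormat.TEXT in text_like
is_raw_text_like = ArtifactFormat.RAW in text_like
```

One consequence worth checking in review: any `fmt` value written to disk with the old numbering would not map to the same member under the new numbering.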
@@ -100,12 +101,27 @@ def __init__(
self.optional = optional
self.validate()

def serialize(self):
return {
"name": self.name,
"content": self.content,
"path": str(self.path) if self.path else None,
"data": self.data,
"raw": self.raw,
"fmt": self.fmt.value,
"flags": list(self.flags),
"archive": self.archive,
"optional": self.optional,
}

# TODO: unserialize

def __repr__(self):
return f"Artifact({self.name}, fmt={self.fmt}, flags={self.flags})"

@property
def exported(self):
"""Returns true if the artifact was writtem to disk."""
"""Returns true if the artifact was written to disk."""
return bool(self.path is not None)

def validate(self):
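
The hunk above leaves `# TODO: unserialize` open. One possible shape for the inverse is sketched below. Everything here is an assumption for illustration: `Artifact` is a reduced stand-in for the real class, `unserialize` is a hypothetical name that is not part of the PR, and `flags` is assumed to be a set.

```python
from enum import IntFlag, auto
from pathlib import Path


class ArtifactFormat(IntFlag):
    """Two members only; the enum in the diff has more."""

    TEXT = auto()
    RAW = auto()


class Artifact:
    """Reduced stand-in for mlonmcu's Artifact (serialized fields only)."""

    def __init__(self, name, content=None, path=None, data=None, raw=None,
                 fmt=ArtifactFormat.TEXT, flags=None, archive=False, optional=False):
        self.name = name
        self.content = content
        self.path = Path(path) if path else None
        self.data = data
        self.raw = raw
        self.fmt = fmt
        self.flags = set(flags) if flags else set()  # assumption: flags is a set
        self.archive = archive
        self.optional = optional

    def serialize(self):
        # Same mapping as in the diff.
        return {
            "name": self.name,
            "content": self.content,
            "path": str(self.path) if self.path else None,
            "data": self.data,
            "raw": self.raw,
            "fmt": self.fmt.value,
            "flags": list(self.flags),
            "archive": self.archive,
            "optional": self.optional,
        }

    @classmethod
    def unserialize(cls, state):
        """Hypothetical inverse of serialize(); NOT part of the PR."""
        return cls(
            state["name"],
            content=state["content"],
            path=state["path"],
            data=state["data"],
            raw=state["raw"],
            fmt=ArtifactFormat(state["fmt"]),  # int value back to enum member
            flags=state["flags"],
            archive=state["archive"],
            optional=state["optional"],
        )


original = Artifact("model.c", content="int x;", fmt=ArtifactFormat.TEXT, flags={"src"})
restored = Artifact.unserialize(original.serialize())
```

Note that `raw` may hold bytes, which a YAML or JSON writer would need to encode separately; the sketch does not address that.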
7 changes: 5 additions & 2 deletions mlonmcu/cli/build.py
@@ -22,7 +22,7 @@
from mlonmcu.cli.common import kickoff_runs
from mlonmcu.cli.load import handle as handle_load, add_load_options
from mlonmcu.context.context import MlonMcuContext
from mlonmcu.session.run import RunStage
from mlonmcu.session.run import RunStage, RunInitializer
from mlonmcu.platform.lookup import get_platforms_targets, get_platforms_backends
from .helper.parse import (
extract_backend_names,
@@ -73,9 +73,12 @@ def _handle(args, context, require_target=False):
session = context.sessions[-1]
new_runs = []
for run in session.runs:
if isinstance(run, RunInitializer) and run.frozen:
new_runs.append(run)
continue
for target_name in targets:
for backend_name in backends:
new_run = run.copy()
new_run = run.copy(session=session)
if backend_name is not None:
platform_name = None
for platform in platforms:
23 changes: 20 additions & 3 deletions mlonmcu/cli/common.py
@@ -22,6 +22,7 @@
import logging
import argparse

from mlonmcu.config import str2bool
from mlonmcu.platform import get_platforms
from mlonmcu.session.postprocess import SUPPORTED_POSTPROCESSES
from mlonmcu.feature.features import get_available_feature_names
@@ -93,6 +94,14 @@ def add_flow_options(parser):
choices=get_available_feature_names(),
help="Enabled features for target/framework/backend (choices: %(choices)s)",
)
flow_parser.add_argument(
"--initializer",
type=str,
metavar="INITIALIZER",
nargs="+",
# action="append",
help="List of yml files for initializing runs",
)
flow_parser.add_argument(
"-c",
"--config",
@@ -154,6 +163,11 @@ def add_gen_args(parser, number):
action="store_true",
help="Display progress bar (default: %(default)s)",
)
flow_parser.add_argument(
"--noop",
action="store_true",
help="Skip processing of runs, just initialize (default: %(default)s)",
)
flow_parser.add_argument(
"--resume",
action="store_true",
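
How the two new flags parse can be checked in isolation. The sketch below copies the `--initializer` and `--noop` definitions from the hunks above onto a standalone parser; the parser itself and the file names `a.yml` and `b.yml` are placeholders, not part of the PR.

```python
import argparse

# Standalone parser; in the PR these arguments are added to flow_parser.
parser = argparse.ArgumentParser(prog="flow-sketch")
parser.add_argument(
    "--initializer",
    type=str,
    metavar="INITIALIZER",
    nargs="+",
    help="List of yml files for initializing runs",
)
parser.add_argument(
    "--noop",
    action="store_true",
    help="Skip processing of runs, just initialize (default: %(default)s)",
)

# One or more files after a single --initializer end up in one list.
args = parser.parse_args(["--initializer", "a.yml", "b.yml", "--noop"])

# Without the flags, the list is None (not empty) and noop is False.
defaults = parser.parse_args([])
```

Because `nargs="+"` is greedy, tokens that follow `--initializer` are consumed as initializer files until the next option, so positional arguments placed directly after it would be swallowed.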
@@ -218,16 +232,18 @@ def kickoff_runs(args, until, context):
assert len(context.sessions) > 0
session = context.sessions[-1]
# session.label = args.label
config = extract_config(args)
config, config_gen = extract_config(args)
# TODO: move into context/session
per_stage = True
print_report = True
if "runs_per_stage" in config:
per_stage = bool(config["runs_per_stage"])
value = config["runs_per_stage"]
per_stage = str2bool(value) if isinstance(value, str) else value
elif "runs_per_stage" in context.environment.vars:
per_stage = bool(context.environment.vars["runs_per_stage"])
if "print_report" in config:
print_report = bool(config["print_report"])
value = config["print_report"]
print_report = str2bool(value) if isinstance(value, str) else value
elif "print_report" in context.environment.vars:
print_report = bool(context.environment.vars["print_report"])
with session:
@@ -239,6 +255,7 @@ def kickoff_runs(args, until, context):
progress=args.progress,
context=context,
export=True,
noop=args.noop,
)
if not success:
logger.error("At least one error occured!")
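
The `runs_per_stage` and `print_report` change matters because config values given on the command line arrive as strings, and `bool("false")` is `True`. A minimal sketch of the difference follows. The `str2bool` below is a hypothetical stand-in for `mlonmcu.config.str2bool` (the real helper may accept other spellings), and `resolve_flag` is an illustrative name.

```python
def str2bool(value):
    """Hypothetical stand-in for mlonmcu.config.str2bool."""
    lowered = value.strip().lower()
    if lowered in ("1", "true", "yes", "on"):
        return True
    if lowered in ("0", "false", "no", "off"):
        return False
    raise ValueError(f"Not a boolean string: {value!r}")


def resolve_flag(config, key, default=True):
    """Mirror of the new lookup: parse strings, pass other values through."""
    if key not in config:
        return default
    value = config[key]
    return str2bool(value) if isinstance(value, str) else value


# Old behavior: any non-empty string is truthy.
old_result = bool("false")

# New behavior: the string is parsed.
new_result = resolve_flag({"runs_per_stage": "false"}, "runs_per_stage")
```

The `elif` branches that read from `context.environment.vars` still wrap the value in `bool(...)`, so string values coming from there would keep the old behavior.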
9 changes: 6 additions & 3 deletions mlonmcu/cli/compile.py
@@ -24,7 +24,7 @@
add_build_options,
)
from mlonmcu.context.context import MlonMcuContext
from mlonmcu.session.run import RunStage
from mlonmcu.session.run import RunStage, RunInitializer
from mlonmcu.platform.lookup import get_platforms_targets
from .helper.parse import extract_target_names, extract_platform_names, extract_config_and_feature_names

@@ -54,13 +54,16 @@ def _handle(args, context):
session = context.sessions[-1]
new_runs = []
for run in session.runs:
if run.target is None:
if isinstance(run, RunInitializer) and run.frozen:
new_runs.append(run)
continue
if not run.has_target():
# assert run.compile_platform is None
targets_ = targets
else:
targets_ = [None]
for target_name in targets_:
new_run = run.copy()
new_run = run.copy(session=session)
if target_name is not None:
platform_name = None
for platform in platforms:
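
The loop above passes frozen `RunInitializer` objects through untouched and expands every other run across the requested targets. A reduced sketch of that control flow, with stand-in classes: the real `Run` and `RunInitializer` carry far more state, `copy()` takes `session=session` in the PR, and the platform lookup is omitted here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Run:
    """Stand-in for mlonmcu's Run."""

    model: str
    target: Optional[str] = None

    def has_target(self):
        return self.target is not None

    def copy(self):
        return replace(self)


@dataclass
class RunInitializer:
    """Stand-in for mlonmcu's RunInitializer."""

    model: str
    target: Optional[str] = None
    frozen: bool = False

    def has_target(self):
        return self.target is not None

    def copy(self):
        return replace(self)


def expand_runs(runs, targets):
    new_runs = []
    for run in runs:
        # Frozen initializers are kept as they are, without expansion.
        if isinstance(run, RunInitializer) and run.frozen:
            new_runs.append(run)
            continue
        # Runs that already have a target are copied once, unchanged.
        targets_ = targets if not run.has_target() else [None]
        for target_name in targets_:
            new_run = run.copy()
            if target_name is not None:
                new_run.target = target_name
            new_runs.append(new_run)
    return new_runs


runs = [Run("resnet"), Run("aww", target="etiss"), RunInitializer("init", frozen=True)]
expanded = expand_runs(runs, ["etiss", "host_x86"])
```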
14 changes: 12 additions & 2 deletions mlonmcu/cli/load.py
@@ -18,6 +18,8 @@
#
"""Command line subcommand for the load stage."""

from pathlib import Path

from mlonmcu.cli.common import (
add_common_options,
add_context_options,
@@ -30,7 +32,7 @@
from mlonmcu.context.context import MlonMcuContext
from mlonmcu.models import SUPPORTED_FRONTENDS
from mlonmcu.models.lookup import apply_modelgroups
from mlonmcu.session.run import RunStage
from mlonmcu.session.run import RunStage, RunInitializer


def add_load_options(parser):
@@ -62,11 +64,19 @@ def _handle(args, context):
config = context.environment.vars
new_config, features, gen_config, gen_features = extract_config_and_feature_names(args, context=context)
config.update(new_config)
session = context.get_session(label=args.label, resume=args.resume, config=config)
initializers = args.initializer
if initializers is not None:
for initializer_file in initializers:
initializer_file = Path(initializer_file).resolve()
initializer = RunInitializer.from_file(initializer_file)
session.add_run(initializer, ignore_idx=True)
frontends = extract_frontend_names(args, context=context)
postprocesses = extract_postprocess_names(args, context=context)
session = context.get_session(label=args.label, resume=args.resume, config=config)
models = apply_modelgroups(args.models, context=context)
for model in models:
if model == "_":
continue
for f in gen_features:
for c in gen_config:
all_config = {**config, **c}
25 changes: 20 additions & 5 deletions mlonmcu/context/context.py
@@ -28,7 +28,7 @@

from mlonmcu.utils import ask_user
from mlonmcu.logging import get_logger, set_log_file
from mlonmcu.session.run import Run
from mlonmcu.session.run import Run, ArchivedRun

Check failure on line 31 in mlonmcu/context/context.py (GitHub Actions / Flake8): 'mlonmcu.session.run.Run' imported but unused (F401)
from mlonmcu.session.session import Session
from mlonmcu.setup.cache import TaskCache
import mlonmcu.setup.utils as utils
@@ -186,9 +186,9 @@
run_directory = runs_directory / str(rid)
# run_file = run_directory / "run.txt"
# run = Run.from_file(run_file) # TODO: actually implement run restore
run = Run() # TODO: fix
run.archived = True
run.dir = run_directory
run = ArchivedRun.from_dir(run_directory)
# run.archived = True
# run.dir = run_directory
runs.append(run)
session = Session(idx=sid, archived=True, dir=session_directory)
session.runs = runs
@@ -307,7 +307,12 @@
self.cache = TaskCache()
self.export_paths = set()

def create_session(self, label="", config=None):
def create_session(self, label="", config=None, custom_dir=None):
if custom_dir is not None:
logger.debug("Creating a new session with idx %s", idx)

Check failure on line 312 in mlonmcu/context/context.py (GitHub Actions / Flake8): Undefined name 'idx' (F821)
session_dir = Path(custom_dir)
session = Session(idx=None, label=label, dir=session_dir, config=config)
return session
try:
lock = self.latest_session_link_lock.acquire(timeout=10)
except filelock.Timeout as err:
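
In the `custom_dir` branch of `create_session`, the debug message refers to `idx`, which is not defined at that point (the F821 failure). One way the branch could avoid the undefined name is to log the directory instead. This is a sketch under assumptions, not the author's fix: `Session` is a reduced stand-in, the function is written as a free function rather than a method, and the index-based path is left out.

```python
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

logger = logging.getLogger("mlonmcu-sketch")


@dataclass
class Session:
    """Stand-in for mlonmcu's Session (constructor arguments from the diff)."""

    idx: Optional[int]
    label: str = ""
    dir: Optional[Path] = None
    config: Optional[dict] = None


def create_session(label="", config=None, custom_dir=None):
    if custom_dir is not None:
        session_dir = Path(custom_dir)
        # Log the directory: no session index exists for custom directories.
        logger.debug("Creating a new session in custom directory %s", session_dir)
        return Session(idx=None, label=label, dir=session_dir, config=config)
    # The index-based path (locking, latest-session link) is unchanged in the
    # PR and omitted from this sketch.
    raise NotImplementedError("index-based session creation not sketched here")


session = create_session(label="remote", custom_dir="/tmp/mlonmcu_session")
```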
@@ -574,3 +579,13 @@
logger.debug("Releasing lock on context")
self.deps_lock.release()
return False

def get_read_only_context(self):
return MlonMcuContextMinimal(self)


class MlonMcuContextMinimal:

def __init__(self, context: MlonMcuContext):
self.environment = context.environment
self.cache = context.cache
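
`MlonMcuContextMinimal` stores references to the original `environment` and `cache` objects rather than copies, so despite the `get_read_only_context` name, writes through the minimal object remain visible on the full context. A small sketch with stand-in classes (plain dicts replace the real environment and `TaskCache`; the class names are shortened for illustration):

```python
class Context:
    """Stand-in for MlonMcuContext with plain dicts as members."""

    def __init__(self):
        self.environment = {"home": "/tmp/env"}
        self.cache = {}
        self.sessions = []

    def get_read_only_context(self):
        return ContextMinimal(self)


class ContextMinimal:
    """Stand-in for MlonMcuContextMinimal: same two attributes as in the diff."""

    def __init__(self, context):
        self.environment = context.environment
        self.cache = context.cache


ctx = Context()
view = ctx.get_read_only_context()

# The view exposes only environment and cache.
has_sessions = hasattr(view, "sessions")

# Both objects share the same cache, so this write reaches the full context.
view.cache["tvm.build_dir"] = "/tmp/build"
shares_cache = view.cache is ctx.cache
```

If true read-only behavior is intended, the minimal class would need copies or read-only wrappers; whether that is required depends on how the executors use it.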
88 changes: 6 additions & 82 deletions mlonmcu/environment/environment.py
@@ -18,7 +18,7 @@
#
import logging

from .config import (

Check failure on line 21 in mlonmcu/environment/environment.py (GitHub Actions / Flake8), imported but unused (F401): '.config.RepoConfig', '.config.FrameworkConfig', '.config.FrameworkFeatureConfig', '.config.BackendConfig', '.config.BackendFeatureConfig', '.config.TargetConfig', '.config.TargetFeatureConfig', '.config.PlatformConfig', '.config.PlatformFeatureConfig', '.config.FrontendConfig', '.config.FrontendFeatureConfig'
DefaultsConfig,
PathConfig,
RepoConfig,
@@ -368,90 +368,14 @@
PathConfig("./models"),
],
}
self.repos = {
"tensorflow": RepoConfig("https://github.com/tensorflow/tensorflow.git", ref="v2.5.2"),
"tflite_micro_compiler": RepoConfig(
"https://github.com/cpetig/tflite_micro_compiler.git", ref="master"
), # TODO: freeze ref?
"tvm": RepoConfig(
"https://github.com/tum-ei-eda/tvm.git", ref="tumeda"
), # TODO: use upstream repo with suitable commit?
"utvm_staticrt_codegen": RepoConfig(
"https://github.com/tum-ei-eda/utvm_staticrt_codegen.git", ref="master"
), # TODO: freeze ref?
"etiss": RepoConfig("https://github.com/tum-ei-eda/etiss.git", ref="master"), # TODO: freeze ref?
}
self.frameworks = [
FrameworkConfig(
"tflm",
enabled=True,
backends=[
BackendConfig("tflmc", enabled=True, features=[]),
BackendConfig("tflmi", enabled=True, features=[]),
],
features=[
FrameworkFeatureConfig("muriscvnn", framework="tflm", supported=False),
],
),
FrameworkConfig(
"utvm",
enabled=True,
backends=[
BackendConfig(
"tvmaot",
enabled=True,
features=[
BackendFeatureConfig("unpacked_api", backend="tvmaot", supported=True),
],
),
BackendConfig("tvmrt", enabled=True, features=[]),
BackendConfig("tvmcg", enabled=True, features=[]),
],
features=[
FrameworkFeatureConfig("memplan", framework="utvm", supported=False),
],
),
]
self.frontends = [
FrontendConfig("saved_model", enabled=False),
FrontendConfig("ipynb", enabled=False),
FrontendConfig(
"tflite",
enabled=True,
features=[
FrontendFeatureConfig("packing", frontend="tflite", supported=False),
],
),
]
self.vars = {
"TEST": "abc",
}
self.repos = {}
self.frameworks = []
self.frontends = []
self.vars = {}
self.flags = {}
self.platforms = [
PlatformConfig(
"mlif",
enabled=True,
features=[PlatformFeatureConfig("debug", platform="mlif", supported=True)],
)
]
self.platforms = []
self.toolchains = {}
self.targets = [
TargetConfig(
"etiss_pulpino",
features=[
TargetFeatureConfig("debug", target="etiss_pulpino", supported=True),
TargetFeatureConfig("attach", target="etiss_pulpino", supported=True),
TargetFeatureConfig("trace", target="etiss_pulpino", supported=True),
],
),
TargetConfig(
"host_x86",
features=[
TargetFeatureConfig("debug", target="host_x86", supported=True),
TargetFeatureConfig("attach", target="host_x86", supported=True),
],
),
]
self.targets = []


class UserEnvironment(DefaultEnvironment):
3 changes: 2 additions & 1 deletion mlonmcu/platform/espidf/espidf.py
@@ -125,7 +125,7 @@ def init_directory(self, path=None, context=None):
if self.project_dir is not None:
self.project_dir.mkdir(exist_ok=True)
logger.debug("Project directory already initialized")
return
return self.project_dir
dir_name = self.name
if path is not None:
self.project_dir = Path(path)
@@ -146,6 +146,7 @@ def init_directory(self, path=None, context=None):
self.project_dir = Path(self.tempdir.name) / dir_name
logger.debug("Temporary project directory: %s", self.project_dir)
self.project_dir.mkdir(exist_ok=True)
return self.project_dir

def get_supported_targets(self):
text = self.invoke_idf_exe("--list-targets", live=self.print_outputs)
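
With the added `return self.project_dir` lines, both exits of `init_directory` hand the directory back to the caller. A reduced sketch of that contract follows; `Platform` is a stand-in, and the `context` argument and the directory lookup from the real method are omitted.

```python
import tempfile
from pathlib import Path


class Platform:
    """Stand-in for a platform class with the init_directory contract."""

    def __init__(self, name):
        self.name = name
        self.project_dir = None
        self.tempdir = None

    def init_directory(self, path=None):
        if self.project_dir is not None:
            # Already initialized: return the existing directory.
            self.project_dir.mkdir(exist_ok=True)
            return self.project_dir
        if path is not None:
            self.project_dir = Path(path)
        else:
            # Fall back to a temporary directory owned by the platform.
            self.tempdir = tempfile.TemporaryDirectory()
            self.project_dir = Path(self.tempdir.name) / self.name
        self.project_dir.mkdir(exist_ok=True)
        return self.project_dir


platform = Platform("espidf")
first = platform.init_directory()
second = platform.init_directory()
```

Callers can then use the returned path directly instead of reading `platform.project_dir` afterwards.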