Support for multiple devices #372

Closed
4 tasks done
mato opened this issue Jun 12, 2019 · 17 comments
Labels: api change, design, enhancement


@mato
Member

mato commented Jun 12, 2019

This issue sets out the high-level design of upcoming support for multiple devices in Solo5, from the point of view of the unikernel application ("developer") and deployment operator ("user"). It is intended to be read/discussed in conjunction with the work in progress implementation in PR #373.

Steps to completion:

  1. High-level design document (this issue).
  2. Implementation for hvt, spt, virtio (partial), genode (see "(WIP) Implement support for multiple devices" #373 and "Multiple devices support, application manifest" #375).
  3. Implementation for muen (see "Implement Multidevice support for Muen" #402).
  4. MirageOS integration (see "Solo5: Multiple devices support, spt target" mirage/mirage#992).

Application manifest

We introduce the concept of an application manifest, which is defined by the developer at application build time, using the following JSON format and customarily named manifest.json:

{
    "version": 1,
    "devices": [
        { "name": "<NAME>", "type": "<TYPE>" },
        ...
    ]
}

NAME is a human-readable unique identifier for the device. The intention is that the name conveys some meaning of the intended use ("wiring") of the device, e.g. frontend or storage. NAME must be composed of alphanumeric characters only, and be between 1 and 67 characters in length.

TYPE is the type of device being declared, currently BLOCK_BASIC or NET_BASIC.

Note that NAME must be unique, and there is a limit of 64 devices in the manifest.
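The naming rules above can be sketched as a small validator. This is only an illustration of the stated constraints (alphanumeric, 1 to 67 characters, unique, at most 64 devices); the function names and constants are hypothetical and are not taken from solo5-mfttool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME_LEN 67   /* from the text: 1..67 characters */
#define MAX_DEVICES  64   /* from the text: at most 64 devices */

/* Returns true if name is 1..67 characters of [A-Za-z0-9] only. */
bool name_is_valid(const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    if (len < 1 || len > MAX_NAME_LEN)
        return false;
    for (size_t i = 0; i < len; i++) {
        char c = name[i];
        bool alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                     (c >= '0' && c <= '9');
        if (!alnum)
            return false;
    }
    return true;
}

/* Returns true if there are at most 64 names, each valid and all distinct. */
bool manifest_names_ok(const char *const names[], size_t count)
{
    assert(names != NULL || count == 0);
    if (count > MAX_DEVICES)
        return false;
    for (size_t i = 0; i < count; i++) {
        if (!name_is_valid(names[i]))
            return false;
        for (size_t j = 0; j < i; j++) {
            if (strcmp(names[i], names[j]) == 0)
                return false;   /* duplicate NAME */
        }
    }
    return true;
}
```

With these rules, a manifest declaring "frontend" and "storage" passes, while "front-end" (non-alphanumeric) or two devices both named "storage" would be rejected.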

At build time, manifest.json is pre-processed by the newly introduced solo5-mfttool, generating a C source file with a binary representation. This source file is then compiled using the normal Solo5 tool chain, and linked into the unikernel binary, where it is represented as an ELF NOTE.

API changes

We introduce two new data types: solo5_handle_t, an opaque "handle" to a device, and solo5_handle_set_t, a set of up to 64 handles represented as a bitmask.
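To make the bitmask idea concrete, here is a minimal sketch of a 64-entry handle set. The typedefs and helper names (handle_t, handle_set_t, handle_set_add and so on) are hypothetical stand-ins chosen for this example, and the assumption that a handle is a small index in 0..63 is mine; the text only states that a set holds up to 64 handles as a bitmask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed representations, for illustration only. */
typedef uint64_t handle_t;      /* a small index, 0..63 */
typedef uint64_t handle_set_t;  /* bit i set <=> handle i is in the set */

/* Add handle h to the set. */
void handle_set_add(handle_set_t *set, handle_t h)
{
    assert(set != NULL && h < 64);
    *set |= (UINT64_C(1) << h);
}

/* Test whether handle h is in the set. */
bool handle_set_contains(handle_set_t set, handle_t h)
{
    assert(h < 64);
    return (set & (UINT64_C(1) << h)) != 0;
}

/* Call fn(h, arg) for every handle in the set, lowest index first.
 * fn may be NULL. Returns the number of handles in the set. */
unsigned handle_set_foreach(handle_set_t set,
                            void (*fn)(handle_t, void *), void *arg)
{
    unsigned n = 0;
    for (handle_t h = 0; h < 64; h++) {
        if (handle_set_contains(set, h)) {
            if (fn != NULL)
                fn(h, arg);
            n++;
        }
    }
    return n;
}
```

A caller that receives a "ready" set could walk it with handle_set_foreach and service each ready device in turn.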

The Solo5 public API will change as follows (abridged):

void solo5_yield(solo5_time_t deadline, solo5_handle_set_t *ready_set);
solo5_result_t solo5_net_acquire(const char *name, solo5_handle_t *handle,
        struct solo5_net_info *info);
solo5_result_t solo5_net_write(solo5_handle_t handle, const uint8_t *buf,
        size_t size);
solo5_result_t solo5_net_read(solo5_handle_t handle, uint8_t *buf,
        size_t size, size_t *read_size);
solo5_result_t solo5_block_acquire(const char *name, solo5_handle_t *handle,
        struct solo5_block_info *info);
solo5_result_t solo5_block_write(solo5_handle_t handle, solo5_off_t offset,
        const uint8_t *buf, size_t size);
solo5_result_t solo5_block_read(solo5_handle_t handle, solo5_off_t offset,
        uint8_t *buf, size_t size);

Notably, solo5_yield() now returns void and, if requested, a solo5_handle_set_t as an out parameter with the set of handles ready for input. The solo5_..._info() calls have been renamed to solo5_..._acquire() ("acquire a handle for name") and in addition to the existing info structures return a solo5_handle_t. The actual block and network I/O calls now require a solo5_handle_t.

Tender changes (spt and hvt)

From the operational side, the command line syntax used by the tenders for attaching devices to the unikernel changes as follows:

Network (TAP) devices

  --net:NAME=IFACE | @NN (attach tap at IFACE or at fd @NN as network NAME)
[ --net-mac:NAME=HWADDR ] (set HWADDR for network NAME)

Block devices

--block:NAME=PATH (attach block device/file at PATH as block storage NAME)

In both cases, NAME must be specified and match the device declared as name in the application manifest. The declared device's type must also match the device being attached.

Before launching the unikernel, the tender will verify that all declared devices have been attached. This is intentional, and enforces the contract between the unikernel and tender, to ensure that a misconfigured combination of devices cannot be provided to the unikernel.
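The contract described above (every option names a declared device of the matching type, and every declared device is attached before launch) can be sketched as follows. This is not the tender's actual code: struct dev_entry, attach_option and all_attached are hypothetical names, and the sketch only checks the NAME and type, ignoring what the value after "=" refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum dev_type { DEV_BLOCK_BASIC, DEV_NET_BASIC };

struct dev_entry {
    const char *name;    /* NAME as declared in the manifest */
    enum dev_type type;  /* TYPE as declared in the manifest */
    bool attached;       /* set once a matching option has been seen */
};

/* Handle one "--net:NAME=VALUE" or "--block:NAME=VALUE" argument.
 * Returns 0 on success, -1 if the argument is malformed, names an
 * undeclared device, or the declared type does not match. */
int attach_option(struct dev_entry *devs, size_t ndevs, const char *arg)
{
    assert(devs != NULL && arg != NULL);
    enum dev_type type;
    const char *rest;

    if (strncmp(arg, "--net:", 6) == 0) {
        type = DEV_NET_BASIC;
        rest = arg + 6;
    } else if (strncmp(arg, "--block:", 8) == 0) {
        type = DEV_BLOCK_BASIC;
        rest = arg + 8;
    } else {
        return -1;
    }

    const char *eq = strchr(rest, '=');
    if (eq == NULL || eq == rest || eq[1] == '\0')
        return -1;                      /* missing NAME or VALUE */
    size_t namelen = (size_t)(eq - rest);

    for (size_t i = 0; i < ndevs; i++) {
        if (strlen(devs[i].name) == namelen &&
            strncmp(devs[i].name, rest, namelen) == 0) {
            if (devs[i].type != type)
                return -1;              /* declared type mismatch */
            devs[i].attached = true;
            return 0;
        }
    }
    return -1;                          /* NAME not in the manifest */
}

/* True only if every declared device has been attached. */
bool all_attached(const struct dev_entry *devs, size_t ndevs)
{
    assert(devs != NULL || ndevs == 0);
    for (size_t i = 0; i < ndevs; i++) {
        if (!devs[i].attached)
            return false;
    }
    return true;
}
```

In this sketch the old-style "--net=tap100" is rejected outright, which matches the intent of making the new syntax a breaking change.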


All those affected by this change, please review this design at your earliest convenience for any showstoppers. /cc @djwillia @ricarkol @hannesm @Kensan @ehmry

mato added the enhancement, api change, and design labels Jun 12, 2019
mato self-assigned this Jun 12, 2019
@hannesm
Contributor

hannesm commented Jun 12, 2019

some comments:

  1. while there's a limit of 64 devices in the manifest (which i find sensible), I can't see any documentation about the "name" length (and allowed characters) -- I'd expect "printable ascii characters" (i.e. no control sequences) and the size up to 32 bytes (or whatever number seems suitable to you)?

  2. FWIW I think a breaking change is fine (i.e. no need to support the old --net=yyy command line argument -- instead fail hard and only support --net:<NAME>=yyy)

  3. this solo5_handle_t is now around and constructed by yyy_acquire functions -- are there dynamic or static checks that ensure nobody calls block_read on a network device handle?

  4. the network device can be named or a fd, the block device can only be named, for symmetry shouldn't @NN be supported for block devices as well?

@mato
Member Author

mato commented Jun 12, 2019

@hannesm:

while there's a limit of 64 devices in the manifest (which i find sensible), I can't see any documentation about the "name" length (and allowed characters) -- I'd expect "printable ascii characters" (i.e. no control sequences) and the size up to 32 bytes (or whatever number seems suitable to you)?

It's currently 31 characters (32 bytes including string terminator) of [A-Z][a-z][0-9]. Yes, this needs to be documented and enforced some more in mfttool.

FWIW I think a breaking change is fine (i.e. no need to support the old --net=yyy command line argument -- instead fail hard and only support --net:<NAME>=yyy)

Indeed, that's the idea. Supporting the old syntax would be annoying, and unclear what the behaviour would be.

this solo5_handle_t is now around and constructed by yyy_acquire functions -- are there dynamic or static checks that ensure nobody calls block_read on a network device handle?

Yes.

the network device can be named or a fd, the block device can only be named, for symmetry shouldn't @NN be supported for block devices as well?

Sure, but no one's asked for it. Would this be useful to you?

@hannesm
Contributor

hannesm commented Jun 12, 2019

@mato thx for your quick reply

31 characters

great! fine with me

block @ - would this be useful for you?

not right now tbh, so best to wait until someone needs it.

@ricarkol
Collaborator

Just one comment. The name acquire suggests a one-time operation and a state transition (the device goes from not-acquired to acquired). Looking at the code, the current implementation of the function is more like a query: "what's the handle for this name?". getinfo seems more appropriate than acquire if that's the intent.

Two questions to clarify things for me; I'm assuming the answers are no. Will devices have to be acquired/opened before using them? will a handle be allocated at every acquire (like an fd in posix open)?

@mato
Member Author

mato commented Jun 13, 2019

getinfo seems more appropriate than acquire if that's the intent.

Naming is hard. I'm trying to pick a name based on the purpose of the call, not the implementation. getinfo is too weak in that it doesn't suggest at first glance that you'll need to call this to do something with a net/block. open suggests there is a corresponding close, which there is not. Hence acquire. Feel free to suggest more names.

Two questions to clarify things for me; I'm assuming the answers are no. Will devices have to be acquired/opened before using them? will a handle be allocated at every acquire (like an fd in posix open)?

You seem to be mixing intent/purpose and implementation here. Yes, you have to "acquire a handle to" a device, in order to use it? How else would you expect the libOS to get the handle? (Please ignore the implementation!) In terms of allocation, no. You'll always get the same handle back, but again, that happens to be a property of the implementation right now. Unlikely to change though.

@ricarkol
Collaborator

Yes, you have to "acquire a handle to" a device, in order to use it?

Sorry, i was thinking about the initialization of the device, like a TAP device getting initialized lazily in hvt only after the first acquire. That could be interesting to improve startup times even more (if the net device is only needed at the end of the execution for example).

@mato
Member Author

mato commented Jun 19, 2019

Sorry, i was thinking about the initialization of the device, like a TAP device getting initialized lazily in hvt only after the first acquire

That would violate the security model and prevent capsicumizing hvt (see #366) and in future doing something similar with seccomp on Linux.

mato mentioned this issue Jun 24, 2019
@ehmry
Contributor

ehmry commented Jun 28, 2019

Regarding the size of device names, for block devices I think it's worth storing at least 64 bytes, because then a unikernel can reference a static device image with a 256-bit hash encoded in hexadecimal. Also, UUIDs have a 36-byte encoding, and UUIDs are a good way to refer to specific GPT partitions or partition types.

@mato
Member Author

mato commented Jun 28, 2019

@ehmry:

Regarding the size of device names, for block devices I think it's worth storing at least 64 bytes, because then a unikernel can reference a static device image with a 256-bit hash encoded in hexadecimal. Also, UUIDs have a 36-byte encoding, and UUIDs are a good way to refer to specific GPT partitions or partition types.

That's a very good point, I was thinking along the same lines myself. Regarding a hex-encoded 256-bit hash, with the final \0 for the string terminator that makes for 65 bytes which wastes some space on alignment. We could up the size of name[] to the nearest 4-byte boundary, so 68 bytes? (67 characters)

@mato
Member Author

mato commented Jun 28, 2019

(Edited previous comment with actual alignment values). With a name[68], the total size of a struct mft_entry is 96 bytes (as opposed to the current 64), which is reasonable.
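The sizing arithmetic in the last two comments can be checked with a short sketch. The helper names round_up and name_field_size are made up for this example; the only inputs taken from the thread are the 64-character hex hash, the NUL terminator, and the 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Bytes needed to store a NUL-terminated name of up to max_chars
 * characters, padded to a 4-byte boundary. */
size_t name_field_size(size_t max_chars)
{
    return round_up(max_chars + 1, 4);
}
```

A 64-character hex hash needs 65 bytes with its terminator, which rounds up to 68, so name[68] holds up to 67 characters; the earlier limit of 31 characters corresponds to name[32].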

@mato
Member Author

mato commented Aug 8, 2019

In #379 (comment), @cfcs brings up a good point, that we have not exhaustively defined how we want to represent the Solo5 and manifest ABI versions in the unikernel binary:

I wonder if we can fit some kind of useful information into the entry rather than having it be a dummy placeholder. The Solo5 ABI version, perhaps?

Currently, we have the following version information in the unikernel binary:

  1. The name ("originator") and type of the manifest ELF NOTE itself, defined as "Solo5" and 0x3154464d ("MFT1") respectively. These define the binary layout of the entire manifest NOTE descriptor from the point of view of the ELF loader/toolchain.
  2. mft->version, exposed in the JSON definition as .version, which is almost but not quite the same as (1), in the sense that any code operating on a struct mft gets to see mft->version but not the ELF note type. Among other things, this ensures that the developer-provided JSON manifest, mfttool and tender are all in agreement on the layout and semantics of the manifest.
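As a quick check of the note type constant mentioned in (1), the sketch below unpacks a 32-bit value into its four bytes, least significant first, and shows why 0x3154464d reads as "MFT1". The helper name note_type_to_tag and the MFT_NOTE_TYPE macro are hypothetical; the little-endian reading is an assumption about how the tag is meant to be viewed in the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MFT_NOTE_TYPE UINT32_C(0x3154464d)  /* value quoted in the text */

/* Store the four bytes of type, least significant byte first, in
 * out[0..3] and NUL-terminate the result in out[4]. */
void note_type_to_tag(uint32_t type, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (char)((type >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

The bytes come out as 0x4d, 0x46, 0x54, 0x31, which are the ASCII codes for 'M', 'F', 'T' and '1'.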

What we don't have at the moment is:

  1. Any information about the version of the Solo5 ABI and/or tender/bindings ABI that the binary is expecting. This will become important once we have a stable ABI version and deployments where the tender will be shipped separately from the unikernel.
  2. Any information about the target (spt, hvt, ...) that the unikernel binary is built for. Currently, if you attempt to run a binary for spt with e.g. solo5-hvt, you will just get an obscure error message (fault).

Question: Which of these do we actually want, going forward? My feeling is that to be as future-proof and flexible as possible we should encode all of the above in the binary. That would mean:

  1. Keep as-is. Represents the "outer layer" manifest version.
  2. Keep as-is. Represents the "inner" manifest version, and is coupled to .version in the developer-supplied JSON source.
  3. Add mft->target_abi_version as a uint32_t to denote both the Solo5 and tender/bindings ABI version.
  4. Add mft->target_abi as a uint32_t representing an enum of the supported targets (hvt, spt, ...).

This poses an additional question: in point (3) above, do we want to allow the tender/bindings internal ABI (e.g. in the case of hvt the hypercall ABI) to evolve separately from the Solo5 ABI (function signatures in solo5.h)? If yes, then we need two values here. Irrespective of the option we choose, I would use a single value for each, start with 0 and only set this to 1 once we decide that the ABI is stable.

Thoughts? Especially @ricarkol @cfcs @hannesm. The choices are fairly subtle and will stay with us for some time, so please give this some thought.

@hannesm
Contributor

hannesm commented Aug 8, 2019

IIUC, (1) can change when the format needs to change (and existing tenders (solo5-hvt) will error if they won't find a MFT1 note section -- there could be an mfttool that emits both MFT1 and MFT2 for smooth transitions), (2) is fine as well (same purpose, different layer).

(4) is as well not very controversial imho (at the moment, I keep this in albatross separately in the certificate (https://github.com/hannesm/albatross/blob/50ed6a8d1ead169b3e322aaccb469e870ad72acc/src/vmm_asn.ml#L54)); it would make sense to include this as part of the manifest!

(3) is pretty open-ended -- from an operator's point of view, I'd for sure like to have a solo5-hvt binary that can execute as many (differently, separately compiled) unikernels as possible: as long as the core solo5 (solo5_exit, solo5_abort, solo5_clock_monotonic, solo5_clock_wall, solo5_yield, solo5_console_write) ABI does not change, I'd like to use the very same binary -- if some module (block/net) ABI changes, this can be versioned as a different device type (NET_BASIC2, ..). on the other side, keeping all the modules around because someone may use them is maintenance burden in solo5.

(y) I would as well be fine with a single solo5 ABI version (core API + all modules), and a reliable way to extract this number from both the tender (which one it supports -- maybe a solo5-hvt --abi-version, but fine as is for now) and the unikernel (which one it requires - can be taken from the elf section at offset yy) -- so I could match the right solo5-hvt to the unikernel being deployed.

given (y), i don't think we need to evolve the internal ABI separately from the solo5 ABI.

@ricarkol
Collaborator

1., 2., and 4. are very clear yesses.

Regarding 3:

deployments where the tender will be shipped separately from the unikernel.

This is important to us.

allow the tender/bindings internal ABI (e.g. in the case of hvt the hypercall ABI) to evolve separately from the Solo5 ABI (function signatures in solo5.h)?

Not required, or at least not worth the maintenance cost.

Regarding 4.

I remember we had some discussion (at Marrakesh) about having universal unikernel binaries that could be executed on all backends. If we ever intend to do so, it might be a good idea to store flags in mft->target_abi instead.

@cfcs

cfcs commented Aug 16, 2019

Sorry I'm a bit late with my comments.

the network device can be named or a fd, the block device can only be named, for symmetry shouldn't @NN be supported for block devices as well?

Sure, but no one's asked for it. Would this be useful to you?

This would enable use with Linux' memfd_create, which seems like it could be useful for something. I don't have a specific use case in mind though, but with the sealing API I have a feeling this could be handy for (for instance) sites that need to run many unikernels with read-only data. Another potential use case is passing in stuff when the tender itself is chrooted.

re: 3): I think the main thing is keeping track of compatibility; if a site needs backwards compatibility they can keep the tender binaries for previous versions around when we make breaking changes where it is not feasible to support multiple ABI versions from the latest tender. I think it's also reasonable to say that the site will always have the latest tender available (meaning not trying to deal with cases where the unikernels are using a more recent ABI version than the tender, and only dealing with cases where the unikernels are older than the tender).

re: 1) 2) 4): That sounds reasonable.

  • I wonder how we intend orchestration frameworks like albatross to learn this information. One possible solution is to use the mfttool to dump the JSON manifest, but having an execve()-based API seems both slow and error-prone to me (coming from the GnuPG world where this is a major problem). Do you think that a libsolo5 exposing this would be appropriate?

@mato
Member Author

mato commented Aug 27, 2019

Thanks for the feedback. Having given it some thought, here's what I'll do:

Regarding (1) and (2): i.e. the "outer" and "inner" manifest versions, these will stay as-is.

Regarding (3) and (4): I will add a separate ELF note which will be declared in the bindings (not the manifest!) and contain a major/minor "target abi version" and "target type". These will represent the internal ABI between a tender (or possibly for Muen, the hypervisor) and the bindings. See #386 for the infrastructure needed to support loading different ELF notes from different sources.

Rationale: The internal ABI is represented by everything in hvt_abi.h and spt_abi.h respectively. If those interfaces change in a material way, then the tender is no longer compatible with whichever bindings were linked into the unikernel binary. Therefore, we need a way of versioning this, if for no other purpose than ensuring that the tender errors out on an unsupported ABI version or target mismatch.
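The check described in the rationale (the tender errors out on an unsupported ABI version or a target mismatch) might look roughly like the sketch below. Everything here is hypothetical: the struct abi_info layout, the enum values, and the exact-match version policy are assumptions made for illustration, and a single version number is used in place of the major/minor pair for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical target identifiers; the real values are not given here. */
enum abi_target { ABI_TARGET_HVT = 1, ABI_TARGET_SPT = 2 };

/* Hypothetical contents of an ABI note. */
struct abi_info {
    uint32_t target;   /* one of enum abi_target */
    uint32_t version;  /* internal tender/bindings ABI version */
};

enum abi_result {
    ABI_OK = 0,
    ABI_TARGET_MISMATCH,
    ABI_VERSION_UNSUPPORTED
};

/* Compare what the tender implements against what the binary expects.
 * Assumed policy: the target and the version must both match exactly. */
enum abi_result abi_check_binary(const struct abi_info *tender,
                                 const struct abi_info *binary)
{
    assert(tender != NULL && binary != NULL);
    if (tender->target != binary->target)
        return ABI_TARGET_MISMATCH;
    if (tender->version != binary->version)
        return ABI_VERSION_UNSUPPORTED;
    return ABI_OK;
}
```

Under this sketch, running an spt binary with an hvt tender would be reported as a target mismatch up front rather than failing later with an obscure fault.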

Regarding the relationship to the Solo5 unikernel-facing API (solo5.h). The contract there is enforced by the linker as far as it can be with C symbol naming conventions -- while we could do "more" there, I'd prefer to keep it simple/follow POLA. The version of the Solo5 API is the same as the version of Solo5 (bindings) that are used in the build process. Once we get to a 1.0 release, we can declare that we follow semantic versioning and use the Solo5 version number appropriately to indicate breaking API changes to the user (unikernel developer).

Regarding specific comments:

@hannesm:

(y) I would as well be fine with a single solo5 ABI version (core API + all modules), and a reliable way to extract this number from both the tender (which one it supports -- maybe a solo5-hvt --abi-version, but fine as is for now) and the unikernel (which one it requires - can be taken from the elf section at offset yy) -- so I could match the right solo5-hvt to the unikernel being deployed.

This is effectively what you'll get from (3) and (4) in an ABI ELF note, sans the "building a tender with a non-default or custom set of modules" -- If you do that you'll need to invent your own convention of some kind. I will add --abi-version or similar to the tender, and likewise mfttool (bintool? better name?) will be able to extract the expected values from a binary for you.

(3) is pretty open-ended -- from an operator's point of view, I'd for sure like to have a solo5-hvt binary that can execute as many (differently, separately compiled) unikernels as possible

It is extremely unlikely that we'll go down the route of tenders supporting multiple ABI or manifest versions, mainly due to maintenance load and not wanting to keep old code around (that way lies QEMU!). But, as they say, "no is temporary, yes is forever" and the design does give us the option of doing this.

@ricarkol:

I remember we had some discussion (at Marrakesh) about having universal unikernel binaries that could be executed on all backends. If we ever intend to do so, it might be a good idea to store flags in mft->target_abi instead.

Unlikely to happen in any near future. If it did, then it'd be a new "universal" target type. What kind of flags did you have in mind? Might be worth adding a couple of uint32_t reserved0, reserved1 for this purpose to the ABI note.

@cfcs:

I wonder how we intend orchestration frameworks like albatross to learn this information. One possible solution is to use the mfttool to dump the JSON manifest, but having an execve()-based API seems both slow and error-prone to me (coming from the GnuPG world where this is a major problem). Do you think that a libsolo5 exposing this would be appropriate?

This is what I had in mind. Loading the ELF notes correctly is fiddly at best -- I certainly don't want to provide a library for it at this stage (and deal with the extra maintenance overhead), so it'd be exec() based at least in the medium term. Of course, if you want to write an ELF NOTE loader in OCaml for albatross, there's nothing stopping you from doing that :-)

@cfcs

cfcs commented Aug 27, 2019

@mato that makes sense, I added an issue about sandboxing the mfttool executable. I do actually have some bastardized ELF parsing code lying around, but it's using deprecated parser tooling and would have to undergo a substantial rewrite to be useful again.

@mato
Member Author

mato commented Sep 18, 2019

Removed the release-engineering related steps from this issue as they're not directly related. The implementations here are now complete, so all done here!
