Refactor metadata storage in AOT artifacts (bytecodealliance#5153)
* Refactor metadata storage in AOT artifacts

This commit is a reorganization of how metadata is stored in Wasmtime's
compiled artifacts. Currently Wasmtime's ELF artifacts have data
appended after them to contain metadata about the `Engine` as well as
type information for the module itself. This extra data at the end of
the file is ignored by ELF-related utilities generally and is assembled
during the module serialization process.

In working on AOT-compiling components, though, I've discovered a number
of issues with this:

* Primarily it's possible to mistakenly change an artifact if it's
  deserialized and then serialized again. This issue is probably
  theoretical, but the deserialized artifact records the `Engine`
  configuration at the time of creation, whereas re-serializing it
  writes out the current `Engine` state rather than the original
  `Engine` state.

* Additionally the serialization strategy here is tightly coupled to
  `Module` and its serialization format. While this makes sense it is
  not conducive to future refactorings that use a similar serialization
  format for components. The engine metadata, for example, does not
  necessarily need to be tied up with type information.

* The storage for this extra metadata is a bit wonky by shoving it at
  the end of the ELF file. The original reason for this was to have a
  compiled artifact be multiple objects concatenated with each other to
  support serializing module-linking-using modules. Module linking is no
  longer a thing and I have since decided that for the component model
  all compilation artifacts will go into one object file to assist
  debuggability. This means that the extra stick-it-at-the-end storage
  is no longer necessary.

To solve these issues this commit splits up the
`module/serialization.rs` file in two, mostly moving the logic to
`engine/serialization.rs`. The engine serialization logic now handles
everything related to `Engine` compatibility such as targets, compiler
flags, wasm features, etc. The module serialization logic is now
exclusively interested in type information.
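
As a rough illustration of what an `Engine`-level compatibility check involves, here is a minimal sketch. The `EngineMetadata` struct and the `check_metadata_compatible` function are hypothetical stand-ins invented for this example; they are not the types or the rules used in `engine/serialization.rs`, which this page does not show.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the engine metadata stored in an
/// artifact. The real structure is not shown in this commit page.
#[derive(Debug, Clone, PartialEq)]
pub struct EngineMetadata {
    pub target: String,
    pub compiler_flags: BTreeMap<String, String>,
    pub wasm_features: Vec<String>,
}

/// Returns an error describing the first mismatch between the metadata
/// recorded in an artifact and the metadata of the engine loading it.
/// The rules here (exact flag equality, features must be enabled in the
/// host) are assumptions chosen for illustration.
pub fn check_metadata_compatible(
    artifact: &EngineMetadata,
    host: &EngineMetadata,
) -> Result<(), String> {
    if artifact.target != host.target {
        return Err(format!(
            "artifact targets `{}` but this engine targets `{}`",
            artifact.target, host.target
        ));
    }
    for (flag, expected) in &artifact.compiler_flags {
        match host.compiler_flags.get(flag) {
            Some(actual) if actual == expected => {}
            Some(actual) => {
                return Err(format!(
                    "compiler flag `{}` is `{}` in the artifact but `{}` in this engine",
                    flag, expected, actual
                ))
            }
            None => {
                return Err(format!(
                    "compiler flag `{}` is unknown to this engine",
                    flag
                ))
            }
        }
    }
    for feature in &artifact.wasm_features {
        if !host.wasm_features.contains(feature) {
            return Err(format!(
                "wasm feature `{}` is not enabled in this engine",
                feature
            ));
        }
    }
    Ok(())
}
```

Keeping a check like this separate from type information is what lets the same logic serve both modules and components.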

The engine metadata and serialized type information additionally live in
sections of the final file now instead of at the end. This means that
there are three primary `bincode`-encoded sections that are parsed on
deserializing a file:

1. The `Engine`-specific metadata. This will be the same for both
   modules and components.
2. The `CompiledModuleInfo` structure. For core wasm there's just one of
   these but for the component model there will be multiple, one per
   core wasm module.
3. The type information. For core wasm this is a `ModuleTypes` but for a
   component this will be a `ComponentTypes`.
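
The three kinds of sections listed above could be located with a single pass over the file's sections, roughly as sketched below. This is a toy model: `Section` is an in-memory stand-in for an ELF section, the `example.*` section names are placeholders rather than the names Wasmtime uses, and storing one section per `CompiledModuleInfo` is an assumption made only for this example. Decoding the bytes with `bincode` is left out.

```rust
/// Placeholder section names, invented for this sketch.
pub const ENGINE_SECTION: &str = "example.engine";
pub const INFO_SECTION: &str = "example.info";
pub const TYPES_SECTION: &str = "example.types";

/// Toy stand-in for a named section of an object file.
pub struct Section<'a> {
    pub name: &'a str,
    pub data: &'a [u8],
}

/// Raw bytes of the metadata sections, still to be decoded.
pub struct ArtifactSections<'a> {
    pub engine: &'a [u8],
    pub module_infos: Vec<&'a [u8]>,
    pub types: &'a [u8],
}

/// Walks all sections once, picks out the ones of interest, and reports
/// an error if a required one is missing. Unknown sections are ignored.
pub fn locate_sections<'a>(sections: &[Section<'a>]) -> Result<ArtifactSections<'a>, String> {
    let mut engine = None;
    let mut module_infos = Vec::new();
    let mut types = None;
    for section in sections {
        match section.name {
            ENGINE_SECTION => engine = Some(section.data),
            INFO_SECTION => module_infos.push(section.data),
            TYPES_SECTION => types = Some(section.data),
            _ => {}
        }
    }
    let engine = engine.ok_or_else(|| format!("missing `{}` section", ENGINE_SECTION))?;
    let types = types.ok_or_else(|| format!("missing `{}` section", TYPES_SECTION))?;
    if module_infos.is_empty() {
        return Err(format!("missing `{}` section", INFO_SECTION));
    }
    Ok(ArtifactSections {
        engine,
        module_infos,
        types,
    })
}
```

In this model a core wasm artifact yields exactly one entry in `module_infos`, while a component yields one entry per core wasm module.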

No true functional change is expected from this commit. Binary artifacts
might get inflated by a small handful of bytes due to using ELF sections
to represent this now.

A related change I made in this commit was to the plumbing of the
`is_branch_protection_enabled` flag. This is technically
`Engine`-level metadata but I didn't want to plumb it all over the place
as is done today, so instead a new section was added to the final binary
just for this BTI information. This means that it no longer needs to be
a parameter to `CodeMemory::publish` and additionally is more amenable
to a `Component`-is-just-one-object world where no single module owns
this piece of metadata.
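
The one-byte encoding used for this section can be sketched in isolation as follows. The section name and the rules (exactly one byte, nonzero means enabled, a missing section is an error) mirror the diff in this commit; the helper names `encode_bti`, `decode_bti`, and `bti_from_section` are hypothetical and operate on plain byte slices instead of the `object` crate's types.

```rust
/// Section name taken from the `ELF_WASM_BTI` constant in this commit.
pub const BTI_SECTION: &str = ".wasmtime.bti";

/// Encodes the flag as the single byte stored in the section.
pub fn encode_bti(enabled: bool) -> [u8; 1] {
    [enabled as u8]
}

/// Decodes the section contents: exactly one byte, nonzero means enabled.
pub fn decode_bti(data: &[u8]) -> Result<bool, String> {
    match data {
        [byte] => Ok(*byte != 0),
        _ => Err(format!("invalid `{}` section", BTI_SECTION)),
    }
}

/// Treats an absent section as an error, as `publish` does in the diff.
pub fn bti_from_section(section: Option<&[u8]>) -> Result<bool, String> {
    match section {
        Some(data) => decode_bti(data),
        None => Err(format!("missing `{}` section", BTI_SECTION)),
    }
}
```

Because the flag travels with the object file, whoever publishes the code can recover it without a reference to the `Engine` that compiled it.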

* Exclude some functions in a cranelift-less build
alexcrichton authored Oct 29, 2022
1 parent 2fb76be commit 434fbf2
Showing 9 changed files with 747 additions and 793 deletions.
37 changes: 26 additions & 11 deletions crates/jit/src/code_memory.rs
@@ -1,13 +1,17 @@
//! Memory management for executable code.

use crate::unwind::UnwindRegistration;
use anyhow::{bail, Context, Result};
use anyhow::{anyhow, bail, Context, Result};
use object::read::{File, Object, ObjectSection};
use std::ffi::c_void;
use std::mem::ManuallyDrop;
use wasmtime_jit_icache_coherence as icache_coherence;
use wasmtime_runtime::MmapVec;

/// Name of the section in ELF files indicating that branch protection was
/// enabled for the compiled code.
pub const ELF_WASM_BTI: &str = ".wasmtime.bti";

/// Management of executable memory within a `MmapVec`
///
/// This type consumes ownership of a region of memory and will manage the
@@ -80,7 +84,7 @@ impl CodeMemory {
/// After this function executes all JIT code should be ready to execute.
/// The various parsed results of the internals of the `MmapVec` are
/// returned through the `Publish` structure.
pub fn publish(&mut self, enable_branch_protection: bool) -> Result<Publish<'_>> {
pub fn publish(&mut self) -> Result<Publish<'_>> {
assert!(!self.published);
self.published = true;

@@ -92,7 +96,10 @@ impl CodeMemory {
};
let mmap_ptr = self.mmap.as_ptr() as u64;

// Sanity-check that all sections are aligned correctly.
// Sanity-check that all sections are aligned correctly and
// additionally probe for a few sections that we're interested in.
let mut enable_branch_protection = None;
let mut text = None;
for section in ret.obj.sections() {
let data = match section.data() {
Ok(data) => data,
@@ -108,17 +115,25 @@ impl CodeMemory {
section.align()
);
}
}

// Find the `.text` section with executable code in it.
let text = match ret.obj.section_by_name(".text") {
Some(section) => section,
match section.name().unwrap_or("") {
ELF_WASM_BTI => match data.len() {
1 => enable_branch_protection = Some(data[0] != 0),
_ => bail!("invalid `{ELF_WASM_BTI}` section"),
},
".text" => {
ret.text = data;
text = Some(section);
}
_ => {}
}
}
let enable_branch_protection =
enable_branch_protection.ok_or_else(|| anyhow!("missing `{ELF_WASM_BTI}` section"))?;
let text = match text {
Some(text) => text,
None => return Ok(ret),
};
ret.text = match text.data() {
Ok(data) if !data.is_empty() => data,
_ => return Ok(ret),
};

// The unsafety here comes from a few things:
//
7 changes: 1 addition & 6 deletions crates/jit/src/instantiate.rs
@@ -136,9 +136,6 @@ struct Metadata {
/// Note that even if this flag is `true` sections may be missing if they
/// weren't found in the original wasm module itself.
has_wasm_debuginfo: bool,

/// Whether or not branch protection is enabled.
is_branch_protection_enabled: bool,
}

/// Finishes compilation of the `translation` specified, producing the final
@@ -163,7 +160,6 @@ pub fn finish_compile(
funcs: PrimaryMap<DefinedFuncIndex, FunctionInfo>,
trampolines: Vec<Trampoline>,
tunables: &Tunables,
is_branch_protection_enabled: bool,
) -> Result<(MmapVec, CompiledModuleInfo)> {
let ModuleTranslation {
mut module,
@@ -269,7 +265,6 @@ pub fn finish_compile(
has_unparsed_debuginfo,
code_section_offset: debuginfo.wasm_file.code_section_offset,
has_wasm_debuginfo: tunables.parse_wasm_debuginfo,
is_branch_protection_enabled,
},
};
bincode::serialize_into(&mut bytes, &info)?;
@@ -500,7 +495,7 @@ impl CompiledModule {
dwarf_sections,
};
ret.code_memory
.publish(ret.meta.is_branch_protection_enabled)
.publish()
.context("failed to publish code memory")?;
ret.register_debug_and_profiling(profiler)?;

2 changes: 1 addition & 1 deletion crates/jit/src/lib.rs
@@ -27,7 +27,7 @@ mod instantiate;
mod profiling;
mod unwind;

pub use crate::code_memory::CodeMemory;
pub use crate::code_memory::{CodeMemory, ELF_WASM_BTI};
pub use crate::instantiate::{
finish_compile, mmap_vec_from_obj, subslice_range, CompiledModule, CompiledModuleInfo,
SetupError, SymbolizeContext,
5 changes: 3 additions & 2 deletions crates/wasmtime/src/component/component.rs
@@ -152,7 +152,7 @@ impl Component {
// do so. This should build up a mapping from
// `SignatureIndex` to `VMSharedSignatureIndex` once and
// then reuse that for each module somehow.
Module::from_parts(engine, mmap, info, types.clone())
Module::from_parts(engine, mmap, Some((info, types.clone().into())))
})?;

Ok(modules.into_iter().collect::<PrimaryMap<_, _>>())
@@ -164,7 +164,7 @@ impl Component {
let static_modules = static_modules?;
let (lowerings, always_trap, transcoders, trampolines, trampoline_obj) = trampolines?;
let mut trampoline_obj = CodeMemory::new(trampoline_obj);
let code = trampoline_obj.publish(engine.compiler().is_branch_protection_enabled())?;
let code = trampoline_obj.publish()?;
let text = wasmtime_jit::subslice_range(code.text, code.mmap);

// This map is used to register all known tramplines in the
@@ -266,6 +266,7 @@ impl Component {
trampolines?,
&mut obj,
)?;
engine.append_bti(&mut obj);
return Ok((
lower,
traps,
52 changes: 47 additions & 5 deletions crates/wasmtime/src/engine.rs
@@ -1,16 +1,21 @@
use crate::signatures::SignatureRegistry;
use crate::Config;
use anyhow::Result;
use anyhow::{Context, Result};
use object::write::{Object, StandardSegment};
use object::SectionKind;
use once_cell::sync::OnceCell;
#[cfg(feature = "parallel-compilation")]
use rayon::prelude::*;
use std::path::Path;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
#[cfg(feature = "cache")]
use wasmtime_cache::CacheConfig;
use wasmtime_environ::FlagValue;
use wasmtime_jit::ProfilingAgent;
use wasmtime_runtime::{debug_builtins, CompiledModuleIdAllocator, InstanceAllocator};
use wasmtime_runtime::{debug_builtins, CompiledModuleIdAllocator, InstanceAllocator, MmapVec};

mod serialization;

/// An `Engine` which is a global context for compilation and management of wasm
/// modules.
@@ -216,9 +221,8 @@ impl Engine {
pub fn precompile_module(&self, bytes: &[u8]) -> Result<Vec<u8>> {
#[cfg(feature = "wat")]
let bytes = wat::parse_bytes(&bytes)?;
let (mmap, _, types) = crate::Module::build_artifacts(self, &bytes)?;
crate::module::SerializedModule::from_artifacts(self, &mmap, &types)
.to_bytes(&self.config().module_version)
let (mmap, _) = crate::Module::build_artifacts(self, &bytes)?;
Ok(mmap.to_vec())
}

pub(crate) fn run_maybe_parallel<
@@ -292,6 +296,7 @@ impl Engine {
.clone()
.map_err(anyhow::Error::msg)
}

fn _check_compatible_with_native_host(&self) -> Result<(), String> {
#[cfg(compiler)]
{
@@ -546,6 +551,43 @@ impl Engine {
flag
))
}

#[cfg(compiler)]
pub(crate) fn append_compiler_info(&self, obj: &mut Object<'_>) {
serialization::append_compiler_info(self, obj);
}

#[cfg(compiler)]
pub(crate) fn append_bti(&self, obj: &mut Object<'_>) {
let section = obj.add_section(
obj.segment_name(StandardSegment::Data).to_vec(),
wasmtime_jit::ELF_WASM_BTI.as_bytes().to_vec(),
SectionKind::ReadOnlyData,
);
let contents = if self.compiler().is_branch_protection_enabled() {
1
} else {
0
};
obj.append_section_data(section, &[contents], 1);
}

pub(crate) fn load_mmap_bytes(&self, bytes: &[u8]) -> Result<MmapVec> {
self.load_mmap(MmapVec::from_slice(bytes)?)
}

pub(crate) fn load_mmap_file(&self, path: &Path) -> Result<MmapVec> {
self.load_mmap(
MmapVec::from_file(path).with_context(|| {
format!("failed to create file mapping for: {}", path.display())
})?,
)
}

fn load_mmap(&self, mmap: MmapVec) -> Result<MmapVec> {
serialization::check_compatible(self, &mmap)?;
Ok(mmap)
}
}

impl Default for Engine {