Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Sanitize AppendVec's file_size #7373

Merged
merged 11 commits into from
Jan 6, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
125 changes: 111 additions & 14 deletions runtime/src/append_vec.rs
Original file line number Diff line number Diff line change
@@ -21,6 +21,8 @@ macro_rules! align_up {
};
}

const MAXIMUM_APPEND_VEC_FILE_SIZE: usize = 16 * 1024 * 1024 * 1024; // 16 GiB
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be much lower considering it seems AccountsDB never increases its .file_size from DEFAULT_FILE_SIZE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't increase, it may allocate a larger one than default if the store account is larger than the default file size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirmation, then maybe we could lower to be close in relation to #7320.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I think this is fine as it is now because we can assume the running system will have at least 16GB RAM as per the doc and we can be relieved we'll never be tripped by the limit at that high number depending on current AccountDB's behavior and its possible future changes.

Or, could this be lower, say, a half?: 8GB


/// Meta contains enough context to recover the index from storage itself
#[derive(Clone, PartialEq, Debug)]
pub struct StoredMeta {
@@ -89,6 +91,9 @@ impl Drop for AppendVec {
impl AppendVec {
#[allow(clippy::mutex_atomic)]
pub fn new(file: &Path, create: bool, size: usize) -> Self {
let initial_len = 0;
AppendVec::sanitize_len_and_size(initial_len, size).unwrap();

if create {
let _ignored = remove_file(file);
if let Some(parent) = file.parent() {
@@ -132,12 +137,46 @@ impl AppendVec {
map,
// This mutex forces append to be single threaded, but concurrent with reads
// See UNSAFE usage in `append_ptr`
append_offset: Mutex::new(0),
current_len: AtomicUsize::new(0),
append_offset: Mutex::new(initial_len),
current_len: AtomicUsize::new(initial_len),
file_size: size as u64,
}
}

#[allow(clippy::mutex_atomic)]
mvines marked this conversation as resolved.
Show resolved Hide resolved
fn new_empty_map(current_len: usize) -> Self {
let map = MmapMut::map_anon(1).expect("failed to map the data file");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error handling is made more critical rather undesirably from Err to just move the code here. But combined with this consideration, this can be tolerated to avoid large refactoring. :) And this mmap will almost never fail.


AppendVec {
path: PathBuf::from(String::default()),
map,
append_offset: Mutex::new(current_len),
current_len: AtomicUsize::new(current_len),
file_size: 0, // will be filled by set_file()
}
}

fn sanitize_len_and_size(current_len: usize, file_size: usize) -> io::Result<()> {
if file_size == 0 {
Err(std::io::Error::new(
std::io::ErrorKind::Other,
format!("too small file size {} for AppendVec", file_size),
))
} else if file_size > MAXIMUM_APPEND_VEC_FILE_SIZE {
Err(std::io::Error::new(
std::io::ErrorKind::Other,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know these Errors are not welcomed as explicitly mentioned by #6446.

format!("too large file size {} for AppendVec", file_size),
))
} else if current_len > file_size {
Err(std::io::Error::new(
std::io::ErrorKind::Other,
format!("current_len is larger than file size ({})", file_size),
))
} else {
Ok(())
}
}

pub fn flush(&self) -> io::Result<()> {
self.map.flush()
}
@@ -174,14 +213,25 @@ impl AppendVec {

#[allow(clippy::mutex_atomic)]
pub fn set_file<P: AsRef<Path>>(&mut self, path: P) -> io::Result<()> {
self.path = path.as_ref().to_path_buf();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only update self.path in an atomic way if everything went successful here.

// this AppendVec must not hold actual file;
assert_eq!(self.file_size, 0);

let data = OpenOptions::new()
.read(true)
.write(true)
.create(false)
.open(&path)?;

let current_len = self.current_len.load(Ordering::Relaxed);
assert_eq!(current_len, *self.append_offset.lock().unwrap());

let file_size = std::fs::metadata(&path)?.len();
AppendVec::sanitize_len_and_size(current_len, file_size as usize)?;

let map = unsafe { MmapMut::map_mut(&data)? };

self.file_size = file_size;
Copy link
Contributor Author

@ryoqun ryoqun Dec 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update file_size correctly, will be used later PRs to check the sum of contained accounts doesn't exceed this and moreover exactly matches to the current_len.

Copy link
Contributor Author

@ryoqun ryoqun Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update file_size correctly, will be used later PRs to check the sum of contained acconts doesn't exceed this and moreover exactly matches to the current_len.

As promised by 2 weeks ago's myself, this is done by this and this.

self.path = path.as_ref().to_path_buf();
self.map = map;
Ok(())
}
@@ -406,24 +456,16 @@ impl<'a> serde::de::Visitor<'a> for AppendVecVisitor {
formatter.write_str("Expecting AppendVec")
}

#[allow(clippy::mutex_atomic)]
// Note this does not initialize a valid Mmap in the AppendVec, needs to be done
// externally
fn visit_bytes<E>(self, data: &[u8]) -> std::result::Result<Self::Value, E>
where
E: serde::de::Error,
{
use serde::de::Error;
let mut rd = Cursor::new(&data[..]);
let current_len: usize = deserialize_from(&mut rd).map_err(Error::custom)?;
let map = MmapMut::map_anon(1).map_err(|e| Error::custom(e.to_string()))?;
Ok(AppendVec {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this logic to newly created fn new_empty_map. I can't live with this code. ;)

path: PathBuf::from(String::default()),
map,
append_offset: Mutex::new(current_len),
current_len: AtomicUsize::new(current_len),
file_size: current_len as u64,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: I changed this behavior because this looked just a wrong thing. New code started to initialize this correctly with actual file size.

Or is this intentional to make appencvecs read-only?

})
// Note this does not initialize a valid Mmap in the AppendVec, needs to be done
// externally
Ok(AppendVec::new_empty_map(current_len))
}
}

@@ -440,6 +482,7 @@ impl<'de> Deserialize<'de> for AppendVec {
pub mod tests {
use super::test_utils::*;
use super::*;
use assert_matches::assert_matches;
use log::*;
use rand::{thread_rng, Rng};
use solana_sdk::timing::duration_as_ms;
@@ -451,6 +494,60 @@ pub mod tests {
}
}

#[test]
#[should_panic(expected = "too small file size 0 for AppendVec")]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know those should_panics are rather ugly. I'll brush up once this PR's direction is affirmed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other uses are fixed by now. Only this will remain as this is testing a panic from ::new().

fn test_append_vec_new_bad_size() {
let path = get_append_vec_path("test_append_vec_new_bad_size");
let _av = AppendVec::new(&path.path, true, 0);
}

#[test]
fn test_append_vec_set_file_bad_size() {
let file = get_append_vec_path("test_append_vec_set_file_bad_size");
let path = &file.path;
let mut av = AppendVec::new_empty_map(0);

let _data = OpenOptions::new()
.read(true)
.write(true)
.create(true)
.open(&path)
.expect("create a test file for mmap");

let result = av.set_file(path);
assert_matches!(result, Err(ref message) if message.to_string() == *"too small file size 0 for AppendVec");
}

#[test]
fn test_append_vec_sanitize_len_and_size_too_small() {
let result = AppendVec::sanitize_len_and_size(0, 0);
assert_matches!(result, Err(ref message) if message.to_string() == *"too small file size 0 for AppendVec");
}

#[test]
fn test_append_vec_sanitize_len_and_size_maximum() {
let result = AppendVec::sanitize_len_and_size(0, 16 * 1024 * 1024 * 1024);
assert_matches!(result, Ok(_));
}

#[test]
fn test_append_vec_sanitize_len_and_size_too_large() {
let result = AppendVec::sanitize_len_and_size(0, 16 * 1024 * 1024 * 1024 + 1);
assert_matches!(result, Err(ref message) if message.to_string() == *"too large file size 17179869185 for AppendVec");
}

#[test]
fn test_append_vec_sanitize_len_and_size_full_and_same_as_current_len() {
let result = AppendVec::sanitize_len_and_size(1 * 1024 * 1024, 1 * 1024 * 1024);
assert_matches!(result, Ok(_));
}

#[test]
fn test_append_vec_sanitize_len_and_size_larger_current_len() {
let result = AppendVec::sanitize_len_and_size(1 * 1024 * 1024 + 1, 1 * 1024 * 1024);
assert_matches!(result, Err(ref message) if message.to_string() == *"current_len is larger than file size (1048576)");
}

#[test]
fn test_append_vec_one() {
let path = get_append_vec_path("test_append");