Add a new datastore export command (#166)
* Add a new `datastore export` command
* Update CHANGELOG
bradlarsen authored Apr 3, 2024
1 parent 3f7c173 commit 1a5d3cd
Showing 14 changed files with 184 additions and 17 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -10,17 +10,25 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 ### Additions

 - The README now includes several animated GIFs that demonstrate simple example use cases ([#154](https://github.com/praetorian-inc/noseyparker/pull/154)).
+
 - The `report` command now offers a new `--finding-status=STATUS` filtering option, which causes only the findings with the requested status to be reported ([#162](https://github.com/praetorian-inc/noseyparker/pull/162)).
+
+- A new `datastore export` command has been added ([#166](https://github.com/praetorian-inc/noseyparker/pull/166)).
+This command exports the essential content from a Nosey Parker datastore as a .tgz file that can be extracted wherever it is needed.

 ### Changes

 - The vendored copy of Boost included in the internal `vectorscan-sys` crate has been removed in favor of using the system-provided Boost ([#150](https://github.com/praetorian-inc/noseyparker/pull/150) from @seqre).
 This change is only relevant to building Nosey Parker from source.
+
 - The vendored copy of the Vectorscan regular expression library included in the internal `vectorscan-sys` crate has been removed ([#151](https://github.com/praetorian-inc/noseyparker/pull/151) from @seqre).
 Instead, a copy of the Vectorscan 5.4.11 source tarball is included in this repository, and is extracted and patched during the build phase.
+
 - SARIF reporting format is now listed as experimental.
+
 - In the `scan` and `rules` command, the command-line option to load additional rules and rulesets from files has been renamed from `--rules` to `--rules-path`.
 The old `--rules` option is still supported as an alias, but this is deprecated and will be removed in the v0.19 release.
+
 - The `rules list` command now includes additional fields when using JSON format ([#161](https://github.com/praetorian-inc/noseyparker/pull/161)).


3 changes: 3 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 4 additions & 1 deletion crates/noseyparker-cli/Cargo.toml
@@ -61,6 +61,7 @@ clap_mangen = "0.2"
 console = "0.15"
 content-guesser = { path = "../content-guesser" }
 crossbeam-channel = "0.5"
+flate2 = "1.0"
 indenter = "0.3"
 indicatif = { version = "0.17", features = ["improved_unicode"] }
 indoc = "2.0"
@@ -77,11 +78,13 @@ progress = { path = "../progress" }
 rayon = "1.5"
 regex = "1.7"
 rlimit = "0.10.0"
-schemars = { version = "0.8" }
+schemars = "0.8"
 serde = { version = "1.0", features = ["derive"] }
 serde_json = "1.0"
 serde-sarif = "0.4"
 strum = { version = "0.26", features = ["derive"] }
+tar = "0.4"
+tempfile = "3.1"
 time = "0.3"
 tracing = "0.1"
 tracing-log = "0.2"
48 changes: 43 additions & 5 deletions crates/noseyparker-cli/src/args.rs
@@ -12,6 +12,9 @@ use url::Url;

 use noseyparker::git_url::GitUrl;

+// -----------------------------------------------------------------------------
+// utilities
+// -----------------------------------------------------------------------------
 #[rustfmt::skip]
 fn get_long_version() -> &'static str {
     concat!(
@@ -73,6 +76,13 @@ pub fn validate_github_api_url(github_api_url: &Url, all_github_organizations: b
     }
 }

+fn get_parallelism() -> usize {
+    match std::thread::available_parallelism() {
+        Err(_e) => 1,
+        Ok(v) => v.into(),
+    }
+}
+
 // -----------------------------------------------------------------------------
 // command-line args
 // -----------------------------------------------------------------------------
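
Note on the hunk above: `get_parallelism` is only moved up into the utilities section (its old copy is deleted further down), and it falls back to 1 when `std::thread::available_parallelism()` returns an error. A minimal sketch of the same fallback written with combinators, assuming nothing beyond the standard library; the name `parallelism_or_one` is hypothetical and not part of this commit:

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical equivalent of `get_parallelism`: the parallelism reported
/// by the standard library, or 1 if it cannot be determined.
fn parallelism_or_one() -> usize {
    thread::available_parallelism()
        .map(NonZeroUsize::get) // NonZeroUsize -> usize
        .unwrap_or(1) // same fallback as the `Err(_e) => 1` arm
}
```

Either form always yields a value of at least 1, since the success case is a `NonZeroUsize`.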
@@ -473,6 +483,9 @@ pub struct DatastoreArgs {
 pub enum DatastoreCommand {
     /// Initialize a new datastore
     Init(DatastoreInitArgs),
+
+    /// Export a datastore
+    Export(DatastoreExportArgs),
 }

 #[derive(Args, Debug)]
@@ -489,11 +502,36 @@ pub struct DatastoreInitArgs {
     pub datastore: PathBuf,
 }

-fn get_parallelism() -> usize {
-    match std::thread::available_parallelism() {
-        Err(_e) => 1,
-        Ok(v) => v.into(),
-    }
+#[derive(Args, Debug)]
+pub struct DatastoreExportArgs {
+    #[arg(
+        long,
+        short,
+        value_name = "PATH",
+        value_hint = ValueHint::DirPath,
+        env("NP_DATASTORE"),
+        default_value=DEFAULT_DATASTORE,
+    )]
+    /// Datastore to export
+    pub datastore: PathBuf,
+
+    /// Write output to the specified path
+    #[arg(long, short, value_name = "PATH", value_hint = ValueHint::FilePath)]
+    pub output: PathBuf,
+
+    /// Write output in the specified format
+    #[arg(long, short, value_name = "FORMAT", default_value = "tgz")]
+    pub format: DatastoreExportOutputFormat,
 }
+
+// -----------------------------------------------------------------------------
+// datastore export output format
+// -----------------------------------------------------------------------------
+#[derive(Copy, Clone, Debug, Display, PartialEq, Eq, PartialOrd, Ord, ValueEnum)]
+#[strum(serialize_all = "kebab-case")]
+pub enum DatastoreExportOutputFormat {
+    /// gzipped tarball
+    Tgz,
+}

 // -----------------------------------------------------------------------------
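
Note on `DatastoreExportOutputFormat`: the derives let the `--format` value be parsed from, and printed as, the kebab-case name `tgz`. A dependency-free sketch of that round trip follows. It is hand-written with `FromStr` and `Display` rather than the `clap` and `strum` derives the commit actually uses, and `ExportFormat` is a hypothetical stand-in type:

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for `DatastoreExportOutputFormat`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum ExportFormat {
    /// gzipped tarball
    Tgz,
}

impl fmt::Display for ExportFormat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Kebab-case name, matching what the derives would print for `Tgz`.
        match self {
            ExportFormat::Tgz => f.write_str("tgz"),
        }
    }
}

impl FromStr for ExportFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "tgz" => Ok(ExportFormat::Tgz),
            other => Err(format!("unknown export format: {other}")),
        }
    }
}
```

Keeping the format as an enum with a single variant means a later format would only need a new variant plus a new arm in the `match args.format` of the export command.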
65 changes: 57 additions & 8 deletions crates/noseyparker-cli/src/cmd_datastore.rs
@@ -1,20 +1,69 @@
-use anyhow::Result;
+use anyhow::{Context, Result};
 use tracing::info;

-use crate::args;
+use crate::args::{DatastoreArgs, DatastoreExportArgs, DatastoreInitArgs, GlobalArgs};
 use noseyparker::datastore::Datastore;

-pub fn run(global_args: &args::GlobalArgs, args: &args::DatastoreArgs) -> Result<()> {
+pub fn run(global_args: &GlobalArgs, args: &DatastoreArgs) -> Result<()> {
+    use crate::args::DatastoreCommand::*;
     match &args.command {
-        args::DatastoreCommand::Init(args) => cmd_datastore_init(global_args, args),
+        Init(args) => cmd_datastore_init(global_args, args),
+        Export(args) => cmd_datastore_export(global_args, args),
     }
 }

-fn cmd_datastore_init(
-    global_args: &args::GlobalArgs,
-    args: &args::DatastoreInitArgs,
-) -> Result<()> {
+fn cmd_datastore_init(global_args: &GlobalArgs, args: &DatastoreInitArgs) -> Result<()> {
     let datastore = Datastore::create(&args.datastore, global_args.advanced.sqlite_cache_size)?;
     info!("Initialized new datastore at {}", &datastore.root_dir().display());
     Ok(())
 }
+
+fn cmd_datastore_export(global_args: &GlobalArgs, args: &DatastoreExportArgs) -> Result<()> {
+    let datastore = Datastore::open(&args.datastore, global_args.advanced.sqlite_cache_size)
+        .with_context(|| format!("Failed to open datastore at {}", args.datastore.display()))?;
+    let output_path = &args.output;
+
+    // XXX Move this code into datastore.rs?
+
+    use crate::args::DatastoreExportOutputFormat::*;
+    match args.format {
+        Tgz => {
+            use flate2::write::GzEncoder;
+            use std::ffi::OsStr;
+            use std::path::Path;
+            use tempfile::NamedTempFile;
+
+            let write_tar = |output_path: &Path| -> Result<()> {
+                let prefix: &OsStr = output_path.file_name().unwrap_or("datastore.tgz".as_ref());
+
+                let tmp_output = match output_path.parent() {
+                    Some(p) => NamedTempFile::with_prefix_in(prefix, p),
+                    None => NamedTempFile::with_prefix(prefix),
+                }?;
+
+                let enc = GzEncoder::new(tmp_output, Default::default());
+                let mut tar = tar::Builder::new(enc);
+
+                let root_dir = datastore.root_dir();
+                tar.append_path_with_name(root_dir.join(".gitignore"), ".gitignore")?;
+                tar.append_path_with_name(root_dir.join("datastore.db"), "datastore.db")?;
+                tar.append_dir_all("blobs", datastore.blobs_dir())?;
+                let tmp_output = tar.into_inner()?.finish()?;
+
+                tmp_output.persist(output_path)?;
+
+                Ok(())
+            };
+
+            write_tar(&output_path).context("Failed to write tarfile")?;
+
+            info!(
+                "Exported datastore at {} to {}",
+                &datastore.root_dir().display(),
+                output_path.display()
+            );
+        }
+    }
+
+    Ok(())
+}
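
Note on `write_tar`: the temporary file's prefix and directory come from `Path::file_name()` and `Path::parent()`, whose edge cases are easy to misread. The sketch below isolates just that selection using only the standard library; the function name `temp_location` is hypothetical and does not appear in the commit:

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Mirrors the prefix/parent selection in `write_tar` (hypothetical helper).
/// Returns the temp-file prefix and the directory, if any, in which the
/// temporary file would be created.
fn temp_location(output_path: &Path) -> (&OsStr, Option<&Path>) {
    // `file_name()` is `None` for paths such as `/` or ones ending in `..`,
    // in which case the fixed fallback name is used.
    let prefix = output_path
        .file_name()
        .unwrap_or(OsStr::new("datastore.tgz"));
    // `parent()` is `None` only for a root or empty path; a bare file name
    // such as `export.tgz` yields `Some("")`, not `None`.
    (prefix, output_path.parent())
}
```

For `out/export.tgz` this gives the prefix `export.tgz` and the directory `out`. For a bare `export.tgz` the directory is the empty path, so the `NamedTempFile::with_prefix_in` arm is still taken, and the `None` arm applies only to paths like `/`. Creating the temporary file next to the destination presumably keeps the final `persist` on a single filesystem.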
34 changes: 34 additions & 0 deletions crates/noseyparker-cli/tests/datastore/mod.rs
@@ -0,0 +1,34 @@
+use super::*;
+
+#[test]
+fn init() {
+    let scan_env = ScanEnv::new();
+    assert_cmd_snapshot!(noseyparker_success!("datastore", "init", "-d", scan_env.dspath()));
+}
+
+/// Create a datastore, export it, extract it, and test that Nosey Parker still sees it as a valid
+/// datastore.
+#[test]
+fn export_empty() {
+    let scan_env = ScanEnv::new();
+    // create datastore
+    noseyparker_success!("datastore", "init", "-d", scan_env.dspath());
+
+    // export it
+    let tgz = scan_env.root.child("export.tgz");
+    noseyparker_success!("datastore", "export", "-d", scan_env.dspath(), "-o", tgz.path());
+    tgz.assert(predicates::path::is_file());
+
+    // extract the archive
+    let extract_dir = scan_env.root.child("export.np");
+    std::fs::create_dir(&extract_dir).unwrap();
+
+    let file = std::fs::File::open(tgz.path()).unwrap();
+    let mut archive = tar::Archive::new(flate2::read::GzDecoder::new(file));
+    archive.unpack(&extract_dir).unwrap();
+
+    // make sure the extracted datastore still works
+    assert_cmd_snapshot!(noseyparker_success!("summarize", "-d", extract_dir.path()));
+}
+
+// TODO: add case for exporting to an already-existing output file
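
Note on the TODO: the open case is an output path that already exists. The `tempfile` documentation states that `NamedTempFile::persist` replaces an existing file at the target path, so the export would be expected to overwrite it. The std-only sketch below imitates the write-then-rename sequence to show that behavior. It is an illustration under stated assumptions, not the commit's code: `write_then_rename` is a hypothetical helper, and unlike `tempfile` it uses a fixed `.tmp` suffix that is not collision-safe.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to a sibling temporary file, then rename it over `dest`.
/// Hypothetical std-only stand-in for `NamedTempFile` + `persist`.
fn write_then_rename(dest: &Path, contents: &[u8]) -> io::Result<()> {
    let name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name")
    })?;

    // Build `<name>.tmp` in the same directory as the destination.
    let mut tmp_name = name.to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = dest.with_file_name(tmp_name);

    let mut tmp = fs::File::create(&tmp_path)?;
    tmp.write_all(contents)?;
    tmp.sync_all()?;
    drop(tmp);

    // `fs::rename` replaces `dest` if it already exists.
    fs::rename(&tmp_path, dest)
}
```

A test for the TODO could follow the same shape as `export_empty`: create a file at the output path first, run `datastore export`, and check that the path now holds a valid archive.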
@@ -0,0 +1,6 @@
+---
+source: crates/noseyparker-cli/tests/datastore/mod.rs
+expression: stdout
+---
+Rule Total Findings Total Matches
+───────────────────────────────────────
@@ -0,0 +1,5 @@
+---
+source: crates/noseyparker-cli/tests/datastore/mod.rs
+expression: stderr
+---
+
@@ -0,0 +1,5 @@
+---
+source: crates/noseyparker-cli/tests/datastore/mod.rs
+expression: status
+---
+exit status: 0
@@ -0,0 +1,5 @@
+---
+source: crates/noseyparker-cli/tests/datastore/mod.rs
+expression: stdout
+---
+
@@ -0,0 +1,5 @@
+---
+source: crates/noseyparker-cli/tests/datastore/mod.rs
+expression: stderr
+---
+
@@ -0,0 +1,5 @@
+---
+source: crates/noseyparker-cli/tests/datastore/mod.rs
+expression: status
+---
+exit status: 0
@@ -7,8 +7,9 @@ Manage datastores
 Usage: noseyparker datastore [OPTIONS] <COMMAND>

 Commands:
-  init  Initialize a new datastore
-  help  Print this message or the help of the given subcommand(s)
+  init    Initialize a new datastore
+  export  Export a datastore
+  help    Print this message or the help of the given subcommand(s)

 Options:
   -h, --help
@@ -70,4 +71,3 @@ Advanced Global Options:

           [default: true]
           [possible values: true, false]
-
1 change: 1 addition & 0 deletions crates/noseyparker-cli/tests/test_noseyparker.rs
@@ -3,6 +3,7 @@
 mod common;
 use common::*;

+mod datastore;
 mod github;
 mod help;
 mod report;
