[Feature Request] CAPE Extractor Output Standardization #1716

Closed
cccs-rs opened this issue Aug 25, 2023 · 13 comments

@cccs-rs
Contributor

cccs-rs commented Aug 25, 2023

Hey guys! Long time no see 🤓

Expected Behavior

When running a CAPE extractor located in ./modules/processing/parsers/CAPE/, we'd like to know, in a deterministic manner, what format to expect each piece of information in, similar to other extractor frameworks like MWCP and MACO.

Current Behavior

Currently there's no way to tell the output format of an extractor without examining its source code, which is inconvenient when trying to run multiple extractors in an automated system like Assemblyline.

Possible Solutions?

  • CAPE creates its own documented output standard for its extractors
  • CAPE could adopt an existing standard (like MACO), similar to the PR created a year ago, but this would also require UI changes to account for the data in the new format if used in capesandbox.com
  • The extractors could implement a secondary function that we can call manually to translate the configuration into another structured format

Context

Old PR: #1037

I look forward to picking up on our discussion!

@doomedraven
Collaborator

doomedraven commented Aug 26, 2023

leaving this to @kevoreilly as i don't use/care about public extractors :)

@cccs-rs
Contributor Author

cccs-rs commented Sep 8, 2023

@kevoreilly any spare time to discuss this??

@kevoreilly
Owner

I am planning to implement MACO output in addition to 'raw' output. I plan to achieve this by adding a conversion function that can be called from a separate entrypoint, distinct from the current extract_config().

The idea is for the conversion to internally call extract_config to obtain the 'raw' config dict then iterate through the fields converting to the MACO standard.
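
A minimal sketch of that plan, for illustration only. The raw keys (`C2`, `Version`), the stub body of `extract_config`, the `convert_to_maco` name, and the simplified MACO-like dict are all assumptions; the field names are only loosely modelled on MACO and should be checked against its schema.

```python
from typing import Any, Dict, List, Optional


def extract_config(data: bytes) -> Optional[Dict[str, Any]]:
    """Stand-in for an existing parser's 'raw' entrypoint (hypothetical output)."""
    if not data:
        return None
    return {"C2": ["198.51.100.7:443"], "Version": "1.2"}


def convert_to_maco(data: bytes, family: str) -> Optional[Dict[str, Any]]:
    """Hypothetical second entrypoint: fetch the raw dict, then map its fields."""
    raw = extract_config(data)
    if raw is None:
        return None
    result: Dict[str, Any] = {"family": family, "other": {}}
    for key, value in raw.items():
        if key == "C2":
            conns: List[Dict[str, Any]] = []
            for item in value:
                # Assumes every entry has the form "host:port".
                host, _, port = item.rpartition(":")
                conns.append({"server_ip": host, "server_port": int(port), "usage": "c2"})
            result["tcp"] = conns
        elif key == "Version":
            result["version"] = value
        else:
            # Unmapped raw fields are kept rather than dropped.
            result["other"][key] = value
    return result
```

The raw entrypoint stays untouched, so existing callers of `extract_config()` would see no change.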

@cccs-rs
Contributor Author

cccs-rs commented Sep 8, 2023

That sounds great! Is there anything I can do on my side to help further along the implementation? Maybe add this new entrypoint to the existing parsers that you have?

@doomedraven
Collaborator

someone needs to review all the parsers and standardise the naming of things like cnc, cncs, c2, c2s, and whether they are strings or lists, etc. that would be the first step; the second step is an easy wrapper that transforms that
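
A rough sketch of what that first step could look like in code. The alias table, the choice of `C2` as the canonical key, and the `normalize_config` name are hypothetical; the real standard would come out of the parser review.

```python
from typing import Any, Dict, List

# Hypothetical alias table: every spelling found in the parsers maps to one name.
KEY_ALIASES = {"cnc": "C2", "cncs": "C2", "c2": "C2", "c2s": "C2"}
# Canonical keys that should always hold a list, even for a single value.
LIST_KEYS = {"C2"}


def normalize_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Rename aliased keys and coerce list-typed fields to lists."""
    normalized: Dict[str, Any] = {}
    for key, value in raw.items():
        canonical = KEY_ALIASES.get(key.lower(), key)
        if canonical in LIST_KEYS:
            values: List[Any] = list(value) if isinstance(value, (list, tuple)) else [value]
            # Merge values when several aliases appear in the same config.
            normalized.setdefault(canonical, []).extend(values)
        else:
            normalized[canonical] = value
    return normalized
```

With this in place, a config that mixes `cnc` as a string and `c2s` as a list would end up with a single `C2` list, which is what a later format wrapper could rely on.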

@cccs-rs
Contributor Author

cccs-rs commented Sep 8, 2023

As an aside, not super important: what do you guys think about putting the extractors in a separate repository and submoduling it into this repository, similar to what you do with the test data?

The only reason I ask is that we currently have to clone the whole repository just to get the extractors, which is a bit much but isn't the end of the world if we have to.

@cccs-rs
Contributor Author

cccs-rs commented Sep 8, 2023

someone needs to review all the parsers and standardise the naming of things like cnc, cncs, c2, c2s, and whether they are strings or lists, etc. that would be the first step; the second step is an easy wrapper that transforms that

We can start a review on our side and try to create a naming standard. Once we have that settled, we'll reach out to see what you guys think.

If everyone is happy with the naming, then we can start implementing the MACO conversion entrypoint in each parser and PR it.

How does that sound?

@doomedraven
Collaborator

doomedraven commented Sep 8, 2023 via email

@kevoreilly
Owner

kevoreilly commented Sep 21, 2023

The review and MACO stuff sounds good to me too, happy to help with some conversion functions to get it going.

The idea of putting the extractors separate to the main repo doesn't sound unreasonable; the only problem is the parsers that depend on dynamic config extraction, which obviously won't be any good to anyone outside of CAPE! While there may only be a couple of these currently (Ursnif and AgentTesla off the top of my head), I am hoping to make many more like this in the near future!

@cccs-rs
Contributor Author

cccs-rs commented Sep 22, 2023

Here's something we've whipped up.

The idea is that the extractors would be written following MACO and would probably go into a maco subdirectory rather than the cape directory. From the output you would be able to call flattened() to condense the model into something friendlier for the UI and for people who prefer a flatter representation of the output.

Example 1 - RedLineStealer:

[image: CAPE_RLS]

  1. The current representation of the RLS configuration
  2. The RLS configuration in the MACO format
  3. The RLS configuration in the MACO format after it's been flattened()

Our thought was to keep as much detail as possible while condensing the output.

Example 2 - AgentTesla

[image]
The conversion from MACO to flattened() is similar to the previous example, but this one highlights how we would show nested types like SMTP.

Example 3 - Qakbot

[image]
In this example, the obvious difference is how we handle multiple C2s: flattened() keeps the context of each network request (user agent, request method, headers) and correlates the data by using an identifier.
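
The screenshots are not reproduced here, so the actual output of flattened() is not shown. As a purely hypothetical illustration of the idea, one way to flatten a nested model while keeping each C2's context together is to use the list index as the identifier in every key. The `flatten` function and the sample field names below are assumptions, not the proposed implementation.

```python
from typing import Any, Dict


def flatten(model: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into one level, keeping list indices in the key."""
    flat: Dict[str, Any] = {}
    for key, value in model.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        elif isinstance(value, list):
            for index, item in enumerate(value):
                if isinstance(item, dict):
                    # The index ties every field of one entry together.
                    flat.update(flatten(item, prefix=f"{path}[{index}]."))
                else:
                    flat[f"{path}[{index}]"] = item
        else:
            flat[path] = value
    return flat


# Hypothetical nested model with two C2s, each with its own request context.
model = {
    "family": "Qakbot",
    "http": [
        {"hostname": "a.example", "user_agent": "UA-1"},
        {"hostname": "b.example", "user_agent": "UA-2"},
    ],
}
flat = flatten(model)
```

Here `flat` contains keys such as `http[0].hostname` and `http[0].user_agent`, so a reader of the flat output can still tell which user agent belongs to which C2.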

Would love to hear your thoughts and would appreciate any feedback you can provide 😀

@doomedraven
Collaborator

hello, well i have a suggestion: we had the same problem a few years ago at $dayjob, and our own solution was different.

  1. We enforce a naming standard
  2. We write a wrapper that translates the config into any other format (different ingestion tools want different formats)
  3. We had different entry points. Ex:
    1. raw: config_extract
    2. maco: config_extract_maco - would call config_extract internally to get the raw config, run the wrapper on it for the transformation, and return the MACO-formatted config
  4. This worked very well for us and makes it super easy to integrate our extractors with any other tool, with only an easy mod for its expected output

just my 2 cents :)
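
The entry-point layout in the list above could be sketched roughly as follows. Only `config_extract` and `config_extract_maco` are names from the comment; the stub parser output, the `WRAPPERS` registry, `config_extract_as`, and the simplified MACO-like dict are assumptions made for illustration.

```python
from typing import Any, Callable, Dict

RawConfig = Dict[str, Any]


def config_extract(data: bytes) -> RawConfig:
    """Raw entry point; stub standing in for a real parser (hypothetical output)."""
    return {"family": "ExampleFamily", "C2": ["198.51.100.7:443"]}


def to_maco(raw: RawConfig) -> Dict[str, Any]:
    """Hypothetical wrapper producing a simplified MACO-like dict."""
    return {
        "family": raw["family"],
        "other": {k: v for k, v in raw.items() if k != "family"},
    }


# One wrapper per target format; other ingestion tools would register their own.
WRAPPERS: Dict[str, Callable[[RawConfig], Dict[str, Any]]] = {"maco": to_maco}


def config_extract_as(data: bytes, fmt: str) -> Dict[str, Any]:
    """Run the raw extraction, then the wrapper registered for the format."""
    raw = config_extract(data)
    return WRAPPERS[fmt](raw)


def config_extract_maco(data: bytes) -> Dict[str, Any]:
    """MACO entry point: raw extraction followed by the MACO wrapper."""
    return config_extract_as(data, "maco")
```

Because every format goes through the same raw dict, supporting another tool would mean adding one wrapper rather than touching each parser.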

@kevoreilly
Owner

Yep that is my preferred way too as I am a stickler for the 'raw' config output in addition to a normalised one.

@doomedraven
Collaborator

moved to internal chat
