[Extension Proposal] Antivirus #36611

Closed
melloware opened this issue Oct 21, 2023 · 18 comments · Fixed by quarkiverse/quarkiverse-devops#187
Labels
area/quarkiverse This issue/PR is part of the Quarkiverse organization kind/extension-proposal Discuss and Propose new extensions

Comments

@melloware
Contributor

Description

Interested in this extension, please +1 via the emoji/reaction feature of GitHub (top right).

These days most applications allow some kind of file upload, whether to bulk-load data, to submit proof of documentation, or for any other good reason. These files should always be scanned for viruses on entry and rejected if a virus is found.

Implementations

Currently we have implementations for:

  • ClamAV, a Linux-native antivirus server
  • VirusTotal, a REST API that checks the hash of a file to see whether it has already been reported as containing a virus

The implementation should allow checking against ClamAV, VirusTotal, or both if the user wants to check multiple engines.

Future Proof

Provide a pluggable architecture so users can easily plug in their own AV engine.
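
To make the pluggable idea concrete, here is a minimal sketch of what a plug-in contract and a "scan with every configured engine" loop could look like. The `ScanEngine` interface, `flaggedBy`, and `markerEngine` are hypothetical names invented for illustration; they are not the extension's actual API, and the toy engine only looks for a marker string.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PluggableScanSketch {

    /** Hypothetical plug-in contract; NOT the extension's real interface. */
    public interface ScanEngine {
        String name();

        /** Returns true when this engine considers the content clean. */
        boolean isClean(String fileName, byte[] content);
    }

    /** Runs every engine over the same bytes and returns the names of the engines that flagged them. */
    public static List<String> flaggedBy(List<ScanEngine> engines, String fileName, byte[] content) {
        List<String> flagged = new ArrayList<>();
        for (ScanEngine engine : engines) {
            if (!engine.isClean(fileName, content)) {
                flagged.add(engine.name());
            }
        }
        return flagged;
    }

    /** Toy engine that flags content containing a marker string; stands in for a real AV engine. */
    public static ScanEngine markerEngine(String engineName, String marker) {
        return new ScanEngine() {
            @Override
            public String name() {
                return engineName;
            }

            @Override
            public boolean isClean(String fileName, byte[] content) {
                // ISO-8859-1 maps every byte to one char, so arbitrary binary content is safe here
                String text = new String(content, StandardCharsets.ISO_8859_1);
                return !text.contains(marker);
            }
        };
    }
}
```

With a contract like this, a ClamAV engine, a VirusTotal engine, and a user-supplied engine would all be entries in the same list, and an upload would be rejected as soon as the returned list is non-empty.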

Dev Service

Allow users to connect to an existing ClamAV instance or, if desired, start a dev container running the latest ClamAV.
image

Dev UI

image

Configuration

image

Health Check

image

Repository name

quarkus-antivirus

Short description

Virus scan files using ClamAV or VirusTotal

Repository Homepage URL

https://www.clamav.net/

Repository Topics

  • quarkus-extension
  • antivirus
  • clamav
  • virustotal
  • security
  • file
    ...

Team Members

  • melloware

Additional context

I have been using this code both in PrimeFaces and at my clients for years.

@melloware melloware added area/quarkiverse This issue/PR is part of the Quarkiverse organization kind/extension-proposal Discuss and Propose new extensions labels Oct 21, 2023
@quarkus-bot

quarkus-bot bot commented Oct 21, 2023

/cc @aloubyansky (extension-proposal), @gastaldi (extension-proposal), @gsmet (extension-proposal), @maxandersen (extension-proposal)

@gastaldi
Contributor

+1, That sounds interesting

@sberyozkin
Member

It looks very interesting. How would it work: will the request go to the antivirus server after the uploaded content has been saved to disk, or as part of the multipart upload processing? Can it take a lot of request time?

@gastaldi
Contributor

Maybe it can happen after the file is uploaded, in a scheduled task, but yeah, that's an interesting question.

@melloware
Contributor Author

At my clients we do it in real time, as the multipart upload comes in, so a virus is never saved to disk first. We have done it on 100 MB files and you would be shocked how fast it is to stream to ClamAV. I will post example code here of what it would look like. But I can tell you that in 10 years in production we have never seen performance be an issue, and even if in theory it's a little slower, most security departments can live with slower if it's stopping a virus from being stored on disk.
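
For readers wondering what "streaming to ClamAV" involves on the wire, here is a rough sketch of the clamd `INSTREAM` framing: the command `zINSTREAM` terminated by a NUL byte, then chunks each prefixed with a 4-byte big-endian length, then a zero-length chunk to finish. The class and method names are made up for illustration and this is not necessarily how the extension implements it; in real use the `OutputStream` would belong to a socket connected to clamd.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ClamdInstreamSketch {

    /** Copies content to out using clamd's INSTREAM framing; chunkSize must be positive. */
    public static void writeInstream(InputStream content, OutputStream out, int chunkSize) {
        try {
            DataOutputStream framed = new DataOutputStream(out);
            // "z" prefix means the command is NUL-terminated
            framed.write("zINSTREAM\0".getBytes(StandardCharsets.US_ASCII));
            byte[] buffer = new byte[chunkSize];
            int read;
            while ((read = content.read(buffer)) != -1) {
                framed.writeInt(read);          // 4-byte length in network byte order
                framed.write(buffer, 0, read);  // the chunk itself
            }
            framed.writeInt(0);                 // zero-length chunk ends the stream
            framed.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Frames an in-memory payload, useful for inspecting the wire format without a clamd server. */
    public static byte[] frame(byte[] payload, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInstream(new ByteArrayInputStream(payload), out, chunkSize);
        return out.toByteArray();
    }
}
```

Because the upload is forwarded chunk by chunk, nothing has to be written to disk before the verdict comes back; clamd answers on the same connection with a single line, which the caller would read and check for `OK` or `FOUND`.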

@gastaldi
Contributor

Do you also upload the 100 MB file to the service or just the SHA?

@melloware
Contributor Author

For ClamAV you stream the whole 100 MB to ClamAV. For VirusTotal it's just the SHA, and it checks its database to see if it has results for that file from over 70 antivirus engines.
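
As a small illustration of the hash-only approach, the sketch below computes a hex digest of an upload without holding the whole file in memory. SHA-256 is an assumption here (the thread only says "SHA"), and the class and method names are invented for this example; the resulting string is what would be sent to a lookup service instead of the file itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileHashSketch {

    /** Streams the content through SHA-256 and returns the digest as lowercase hex. */
    public static String sha256Hex(InputStream content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = content.read(buffer)) != -1) {
                digest.update(buffer, 0, read);  // only one buffer is in memory at a time
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A hash lookup can only report on files the service has already seen, so a brand-new malicious file would come back as unknown rather than infected.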

@sberyozkin
Member

sberyozkin commented Oct 22, 2023

Forwarding SHA can be a good option when someone is concerned about the performance.

How would, for example, RESTEasy Reactive users use quarkus-antivirus? I believe it may be saving some transient data to disk for large multipart/form-data, depending on a given part's size. Indeed, seeing some example code would help :-)

Good that you plan to make it pluggable. It would be interesting to plug in, for example, an NVD feed check too, though maybe that is out of scope, and a concept similar to the one you propose, but working with CVEs, could be realized in another extension.

@melloware
Contributor Author

Awesome! Yeah I wanted to give multiple options and allow for users to write their own plug-in for say a proprietary virus engine etc!

I am not as familiar with Reactive but I will test it out also. I will provide performance numbers for 1 MB, 10 MB, and 100 MB files.

@gsmet
Member

gsmet commented Oct 23, 2023

Looks wonderful. I was wondering, though, why you made it one extension with all options in it? Don't you think people will use either one or the other?
I'm asking especially if you plan to add more in the future.

@melloware
Contributor Author

Yes, the plan is to 1) add more if users donate them, 2) make it pluggable so people can implement their own, and 3) allow you to call multiple AV engines if you want to scan against more than one to be sure!

@maxandersen
Member

Interesting.

I'm wondering if antivirus is the right name/abstraction, as it sounds like this is usable for any kind of content scanning/classification?

But that can evolve in the future if it broadens.

@melloware
Contributor Author

@maxandersen you might be right, as we could also check MIME type matching etc., but for now I thought I would just focus on antivirus? We can always make a new extension after this one gets burned in?

@sberyozkin OK, tested with the 139 MB file VSCode-win32-x64-1.83.1.zip, which is my VSCode download, and here are the results.

ClamAV: 1.945 seconds
VirusTotal: 1.825 seconds

Told you it was fast!

@gastaldi
Contributor

I'm surprised that it takes less than 2 s to upload a 139 MB file, scan it, and get the results.

@melloware
Contributor Author

melloware commented Oct 23, 2023

Here is my test case:

  1. Scan the file for viruses.
  2. Reset the InputStream.
  3. Save it to disk.

The whole thing takes 1.9 seconds on my machine.

    @PUT
    @Consumes(MediaType.MULTIPART_FORM_DATA)
    @Produces(MediaType.TEXT_PLAIN)
    @Path("/upload")
    public Response upload(@MultipartForm @Valid final UploadRequest fileUploadRequest) {
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        final String fileName = fileUploadRequest.getFileName();
        final InputStream data = fileUploadRequest.getData();
        log.infof("Uploading document %s", fileUploadRequest);
        try {
            // buffer the upload in memory so it can be read twice (scan, then save)
            final ByteArrayInputStream inputStream = new ByteArrayInputStream(data.readAllBytes());
            // a failed scan surfaces as the AntivirusException caught below
            engine.scan(fileName, inputStream);
            inputStream.reset();

            // write the file out to disk
            final File tempFile = File.createTempFile("fileName", ".tmp");
            tempFile.deleteOnExit();
            // try-with-resources closes the output stream even if the copy fails
            try (FileOutputStream outputStream = new FileOutputStream(tempFile)) {
                IOUtils.copy(inputStream, outputStream);
            }
            log.infof("File '%s' is successfully uploaded.", fileName);
        } catch (AntivirusException | IOException e) {
            throw new BadRequestException(e);
        } finally {
            stopWatch.stop();
            log.infof("File '%s' processed in %s.", fileName, stopWatch.toString());
        }

        return Response.ok(stopWatch.toString()).status(Response.Status.CREATED).build();
    }

@gastaldi
Contributor

The fact that you read the whole file contents into memory is a bit concerning (at least from a resource usage perspective), but I understand that this is a test case and that you can do it from a temporary file.

@melloware
Contributor Author

melloware commented Oct 23, 2023

Yep, and because the InputStream from JAX-RS says "this stream is not markable or resettable" when you call reset, you have to make a copy of the stream if you want to both scan it in ClamAV and then save it to disk if it passes.
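
For comparison, the temporary-file variant mentioned above could look roughly like the sketch below: spool the non-resettable upload to a temp file once, open a fresh stream over it for the scan, and delete it if the scan fails. The `spoolAndScan` name and the `Predicate<InputStream>` standing in for the AV engine are assumptions made for this example, not part of the extension. Note the trade-off: memory use stays flat, but the bytes do touch disk before the verdict.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;
import java.util.function.Predicate;

public class SpoolAndScanSketch {

    /**
     * Spools the upload to a temp file and scans it from there.
     * Returns the file path if the scan passes, or empty (file deleted) if it does not.
     * The isClean predicate is a stand-in for a real AV engine call.
     */
    public static Optional<Path> spoolAndScan(InputStream upload, Predicate<InputStream> isClean) {
        try {
            Path temp = Files.createTempFile("upload-", ".tmp");
            // createTempFile already created the file, so the copy must be allowed to replace it
            Files.copy(upload, temp, StandardCopyOption.REPLACE_EXISTING);

            boolean clean;
            try (InputStream forScan = Files.newInputStream(temp)) {
                clean = isClean.test(forScan);  // fresh stream, so no reset() is needed
            }
            if (!clean) {
                Files.delete(temp);
                return Optional.empty();
            }
            return Optional.of(temp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```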

@melloware
Contributor Author

I put a Beta 0.0.1 out if anyone wants to try it and submit feedback. @sberyozkin I documented in the README.MD how to plug in your own AV engine.

https://github.com/quarkiverse/quarkus-antivirus
