Event-based Malware check (#7249)
* requirements: Introduce yara

* [WIP] malware/check: SetupPatternCheck

In progress.

Introduces SetupPatternCheck, an implementation of an event-based
check that scans the `setup.py`s of release files for suspicious
patterns.

* malware/checks: Give MalwareCheckBase.run/scan args, kwargs

* malware: Add check preparation

Fiddle with the check/run signature a bit more.

* malware/checks: Unpack file path correctly

* docker-compose: Override FILES_BACKEND for worker

The worker needs to be able to see the "files" virtual host
during development so that malware checks can fetch their underlying
release files.

* [WIP] malware/checks: setup.py extraction

* malware/checks: setup_patterns: Fix enum, seek

* malware/checks: setup_patterns: Apply YARA rules

Each rule match becomes a verdict.

* malware/checks: setup_patterns: Prefer get over filter

* warehouse/{admin,malware}: Consistent enum names

Also enforce uniqueness for enum values.
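
A minimal sketch of what enforcing unique enum values can look like, using the standard library's `enum.unique` decorator. The member names follow the CamelCase style seen in the updated tests; the string values and the `define_with_duplicate_values` helper are assumptions for illustration, not code taken from this commit.

```python
import enum


@enum.unique
class MalwareCheckState(enum.Enum):
    # Member names mirror the updated tests; the values are assumed.
    Enabled = "enabled"
    Evaluation = "evaluation"
    Disabled = "disabled"
    WipedOut = "wiped_out"


def define_with_duplicate_values():
    # Hypothetical helper: without @enum.unique, Second would silently
    # become an alias of First. With it, class creation raises ValueError.
    @enum.unique
    class Broken(enum.Enum):
        First = "same"
        Second = "same"

    return Broken
```

Lookup by value still works as usual, for example `MalwareCheckState("wiped_out")`, while a duplicated value is rejected when the class is defined rather than at some later lookup.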

* warehouse/{admin,malware}: More enum changes

* tests: Update admin, malware tests

* tests: Fix enum, more test fixes

* tests: Add prepare tests

* malware/changes: base: Unpack id correctly

* tests: Begin adding SetupPatternCheck tests

* malware/checks: setup_patterns: Fix enum

* tests: More SetupPatternCheck tests

* warehouse/malware: setup_patterns: Fix enums

* tests: More SetupPatternCheck tests

* tests: Add license header

* malware/checks: setup_patterns: Add TODO

* tests: More SetupPatternCheck tests

* tests: More SetupPatternCheck tests

* tests: Complete extraction tests for SetupPatternCheck

* tests: Fix test

* malware/checks: Add docstring for prepare

* malware/checks: blacken

* malware/checks: Document, expand YARA rules

* tests, warehouse: Restructure utilities

* malware: Order some enums, reduce SetupPatternCheck verdicts

* malware/models: Add missing __lt__
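
One way an enum can be made orderable through `__lt__` is sketched below. It compares members by definition order; the `Medium` member and the string values are assumptions, and the actual model in this commit may implement the comparison differently.

```python
import enum


class VerdictConfidence(enum.Enum):
    # Low and High appear in the tests; Medium and the values are assumed.
    Low = "low"
    Medium = "medium"
    High = "high"

    def __lt__(self, other):
        # Only compare members of the same enum; defer otherwise.
        if self.__class__ is not other.__class__:
            return NotImplemented
        members = list(self.__class__)  # definition order: Low, Medium, High
        return members.index(self) < members.index(other)
```

With `__lt__` in place, `sorted()` and `max()` work on lists of members, which is what ordering or reducing verdicts by confidence would need.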

* malware/checks: Always embed the model object in the prepared arguments

Use it instead of performing a DB request in the check itself.

* malware/checks: Avoid raw bytes

* malware/changes: Remove unused import

* tests: Fixup malware tests

* warehouse/malware: blacken

* tests: Fill in malware coverage

* tests, warehouse: Add a benign verdict for SetupPatternCheck

* tests: blacken
woodruffw authored and ewdurbin committed Feb 11, 2020
1 parent 4a9afe0 commit f2b93df
Showing 24 changed files with 863 additions and 92 deletions.
1 change: 1 addition & 0 deletions docker-compose.yml
@@ -93,6 +93,7 @@ services:
     env_file: dev/environment
     environment:
       C_FORCE_ROOT: "1"
+      FILES_BACKEND: "warehouse.packaging.services.LocalFileStorage path=/var/opt/warehouse/packages/ url=http://files:9001/packages/{path}"
     links:
       - db
       - redis
1 change: 1 addition & 0 deletions requirements/main.in
@@ -55,5 +55,6 @@ typeguard
 webauthn
 whitenoise
 WTForms>=2.0.0
+yara-python
 zope.sqlalchemy
 zxcvbn
14 changes: 14 additions & 0 deletions requirements/main.txt
@@ -594,6 +594,20 @@ wired==0.2.1 \
 wtforms==2.2.1 \
     --hash=sha256:0cdbac3e7f6878086c334aa25dc5a33869a3954e9d1e015130d65a69309b3b61 \
     --hash=sha256:e3ee092c827582c50877cdbd49e9ce6d2c5c1f6561f849b3b068c1b8029626f1
+yara-python==3.11.0 \
+    --hash=sha256:105d851e050b32951ee577148c7f1b18c0a7c64432fef8159069191d522fba86 \
+    --hash=sha256:1d35c7f606465015de02143dfa4e1ad2f4ee85fdb5d5af756b51b2bac62ac7bc \
+    --hash=sha256:24cd492d6bf8ecedb128f5b02886770be9df03bd1b84ab06a978d45bb1a8ff92 \
+    --hash=sha256:58cfc837e7769811afbfb19b1db952ec01e50cdbf9df576fb587e1e343694526 \
+    --hash=sha256:5b8d708751a66d1507d819218d06baccdf5527c147c2bd3062f087e2f367a17d \
+    --hash=sha256:6f90bb264470235549e1bb4e355fa82895409cd46f27aceecaddfbf55e66ed71 \
+    --hash=sha256:70d39c2238c5854e7cd8f11595317dc4d89417e88035d8acca24bcc58a93150f \
+    --hash=sha256:8d255349d69d833bca604b4215bdf499c87357172512273feb934f6442b8e6b2 \
+    --hash=sha256:8e44f9600607cb1d74a0f26df5d0a1c06ea54f4601206124f47f1bbb58e6a374 \
+    --hash=sha256:9e4fafc327e3a343c545dcf5f173fa8bc712aebffe5f034d205c0bac1f1c5df6 \
+    --hash=sha256:c919ee656139ed46a0056e8a3de179bbc98d42a2be6fb85c95b1e2ec65396b34 \
+    --hash=sha256:e4124414d3cff9a10669569a89f585f81c8114b283ab48b2e756e0347a89de0a \
+    --hash=sha256:f104f0bb21a0867f22e750bb4e05de629ec9f37facc84daf963385a86371b0d9
 zipp==2.1.0 \
     --hash=sha256:ccc94ed0909b58ffe34430ea5451f07bc0c76467d7081619a454bf5c98b89e28 \
     --hash=sha256:feae2f18633c32fc71f2de629bfb3bd3c9325cd4419642b1f1da42ee488d9b98
8 changes: 6 additions & 2 deletions tests/common/checks/hooked.py
@@ -26,10 +26,14 @@ class ExampleHookedCheck(MalwareCheckBase):
     def __init__(self, db):
         super().__init__(db)
 
-    def scan(self, file_id=None):
+    def scan(self, **kwargs):
+        file_id = kwargs.get("obj_id")
+        if file_id is None:
+            return
+
         self.add_verdict(
             file_id=file_id,
-            classification=VerdictClassification.benign,
+            classification=VerdictClassification.Benign,
             confidence=VerdictConfidence.High,
             message="Nothing to see here!",
         )
12 changes: 6 additions & 6 deletions tests/unit/admin/views/test_checks.py
@@ -72,11 +72,11 @@ def test_no_check_state(self, db_request):
             views.change_check_state(db_request)
 
     @pytest.mark.parametrize(
-        ("final_state"), [MalwareCheckState.disabled, MalwareCheckState.wiped_out]
+        ("final_state"), [MalwareCheckState.Disabled, MalwareCheckState.WipedOut]
     )
     def test_change_to_valid_state(self, db_request, final_state):
         check = MalwareCheckFactory.create(
-            name="MyCheck", state=MalwareCheckState.disabled
+            name="MyCheck", state=MalwareCheckState.Disabled
         )
 
         db_request.POST = {"check_state": final_state.value}
@@ -104,7 +104,7 @@ def test_change_to_valid_state(self, db_request, final_state):
 
         assert check.state == final_state
 
-        if final_state == MalwareCheckState.wiped_out:
+        if final_state == MalwareCheckState.WipedOut:
             assert wipe_out_recorder.delay.calls == [pretend.call("MyCheck")]
 
     def test_change_to_invalid_state(self, db_request):
@@ -134,11 +134,11 @@ class TestRunBackfill:
         ("check_state", "message"),
         [
             (
-                MalwareCheckState.disabled,
+                MalwareCheckState.Disabled,
                 "Check must be in 'enabled' or 'evaluation' state to run a backfill.",
             ),
             (
-                MalwareCheckState.wiped_out,
+                MalwareCheckState.WipedOut,
                 "Check must be in 'enabled' or 'evaluation' state to run a backfill.",
             ),
         ],
@@ -160,7 +160,7 @@ def test_invalid_backfill_parameters(self, db_request, check_state, message):
         assert db_request.session.flash.calls == [pretend.call(message, queue="error")]
 
     def test_sucess(self, db_request):
-        check = MalwareCheckFactory.create(state=MalwareCheckState.enabled)
+        check = MalwareCheckFactory.create(state=MalwareCheckState.Enabled)
         db_request.matchdict["check_name"] = check.name
 
         db_request.session = pretend.stub(
11 changes: 11 additions & 0 deletions tests/unit/malware/checks/__init__.py
@@ -0,0 +1,11 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
11 changes: 11 additions & 0 deletions tests/unit/malware/checks/setup_patterns/__init__.py
@@ -0,0 +1,11 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
145 changes: 145 additions & 0 deletions tests/unit/malware/checks/setup_patterns/test_check.py
@@ -0,0 +1,145 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pretend
import pytest
import yara

from warehouse.malware.checks.setup_patterns import check as c
from warehouse.malware.models import (
    MalwareCheckState,
    VerdictClassification,
    VerdictConfidence,
)

from .....common.db.malware import MalwareCheckFactory
from .....common.db.packaging import FileFactory


def test_initializes(db_session):
    check_model = MalwareCheckFactory.create(
        name="SetupPatternCheck", state=MalwareCheckState.Enabled
    )
    check = c.SetupPatternCheck(db_session)

    assert check.id == check_model.id
    assert isinstance(check._yara_rules, yara.Rules)


@pytest.mark.parametrize(
    ("obj", "file_url"), [(None, pretend.stub()), (pretend.stub(), None)]
)
def test_scan_missing_kwargs(db_session, obj, file_url):
    MalwareCheckFactory.create(
        name="SetupPatternCheck", state=MalwareCheckState.Enabled
    )
    check = c.SetupPatternCheck(db_session)
    check.scan(obj=obj, file_url=file_url)

    assert check._verdicts == []


def test_scan_non_sdist(db_session):
    MalwareCheckFactory.create(
        name="SetupPatternCheck", state=MalwareCheckState.Enabled
    )
    check = c.SetupPatternCheck(db_session)

    file = FileFactory.create(packagetype="bdist_wheel")

    check.scan(obj=file, file_url=pretend.stub())

    assert check._verdicts == []


def test_scan_no_setup_contents(db_session, monkeypatch):
    monkeypatch.setattr(
        c, "fetch_url_content", pretend.call_recorder(lambda *a: pretend.stub())
    )
    monkeypatch.setattr(
        c, "extract_file_content", pretend.call_recorder(lambda *a: None)
    )

    MalwareCheckFactory.create(
        name="SetupPatternCheck", state=MalwareCheckState.Enabled
    )
    check = c.SetupPatternCheck(db_session)

    file = FileFactory.create(packagetype="sdist")

    check.scan(obj=file, file_url=pretend.stub())

    assert len(check._verdicts) == 1
    assert check._verdicts[0].check_id == check.id
    assert check._verdicts[0].file_id == file.id
    assert check._verdicts[0].classification == VerdictClassification.Indeterminate
    assert check._verdicts[0].confidence == VerdictConfidence.High
    assert (
        check._verdicts[0].message
        == "sdist does not contain a suitable setup.py for analysis"
    )


def test_scan_benign_contents(db_session, monkeypatch):
    monkeypatch.setattr(
        c, "fetch_url_content", pretend.call_recorder(lambda *a: pretend.stub())
    )
    monkeypatch.setattr(
        c,
        "extract_file_content",
        pretend.call_recorder(lambda *a: b"this is a benign string"),
    )

    MalwareCheckFactory.create(
        name="SetupPatternCheck", state=MalwareCheckState.Enabled
    )
    check = c.SetupPatternCheck(db_session)

    file = FileFactory.create(packagetype="sdist")

    check.scan(obj=file, file_url=pretend.stub())

    assert len(check._verdicts) == 1
    assert check._verdicts[0].check_id == check.id
    assert check._verdicts[0].file_id == file.id
    assert check._verdicts[0].classification == VerdictClassification.Benign
    assert check._verdicts[0].confidence == VerdictConfidence.Low
    assert check._verdicts[0].message == "No malicious patterns found in setup.py"


def test_scan_matched_content(db_session, monkeypatch):
    monkeypatch.setattr(
        c, "fetch_url_content", pretend.call_recorder(lambda *a: pretend.stub())
    )
    monkeypatch.setattr(
        c,
        "extract_file_content",
        pretend.call_recorder(
            lambda *a: b"this looks suspicious: os.system('cat /etc/passwd')"
        ),
    )

    MalwareCheckFactory.create(
        name="SetupPatternCheck", state=MalwareCheckState.Enabled
    )
    check = c.SetupPatternCheck(db_session)

    file = FileFactory.create(packagetype="sdist")

    check.scan(obj=file, file_url=pretend.stub())

    assert len(check._verdicts) == 1
    assert check._verdicts[0].check_id == check.id
    assert check._verdicts[0].file_id == file.id
    assert check._verdicts[0].classification == VerdictClassification.Threat
    assert check._verdicts[0].confidence == VerdictConfidence.High
    assert check._verdicts[0].message == "process_spawn_in_setup"
93 changes: 93 additions & 0 deletions tests/unit/malware/checks/test_utils.py
@@ -0,0 +1,93 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import io
import tarfile
import zipfile

import pretend

from warehouse.malware.checks import utils


def test_fetch_url_content(monkeypatch):
    response = pretend.stub(
        raise_for_status=pretend.call_recorder(lambda: None), content=b"fake content"
    )
    requests = pretend.stub(get=pretend.call_recorder(lambda url: response))

    monkeypatch.setattr(utils, "requests", requests)

    io = utils.fetch_url_content("hxxp://fake_url.com")

    assert requests.get.calls == [pretend.call("hxxp://fake_url.com")]
    assert response.raise_for_status.calls == [pretend.call()]
    assert io.getvalue() == b"fake content"


def test_extract_file_contents_zip():
    zipbuf = io.BytesIO()
    with zipfile.ZipFile(zipbuf, mode="w") as zipobj:
        zipobj.writestr("toplevelgetsskipped", b"nothing to see here")
        zipobj.writestr("foo/setup.py", b"these are some contents")
    zipbuf.seek(0)

    assert utils.extract_file_content(zipbuf, "setup.py") == b"these are some contents"


def test_extract_file_contents_zip_no_file():
    zipbuf = io.BytesIO()
    with zipfile.ZipFile(zipbuf, mode="w") as zipobj:
        zipobj.writestr("foo/notsetup.py", b"these are some contents")
    zipbuf.seek(0)

    assert utils.extract_file_content(zipbuf, "setup.py") is None


def test_extract_file_contents_tar():
    tarbuf = io.BytesIO()
    with tarfile.open(fileobj=tarbuf, mode="w:gz") as tarobj:
        contents = io.BytesIO(b"these are some contents")
        member = tarfile.TarInfo(name="foo/setup.py")
        member.size = len(contents.getbuffer())
        tarobj.addfile(member, fileobj=contents)

        contents = io.BytesIO(b"nothing to see here")
        member = tarfile.TarInfo(name="toplevelgetsskipped")
        member.size = len(contents.getbuffer())
        tarobj.addfile(member, fileobj=contents)
    tarbuf.seek(0)

    assert utils.extract_file_content(tarbuf, "setup.py") == b"these are some contents"


def test_extract_file_contents_tar_empty():
    tarbuf = io.BytesIO(b"invalid tar contents")

    assert utils.extract_file_content(tarbuf, "setup.py") is None


def test_extract_file_contents_tar_no_file():
    tarbuf = io.BytesIO()
    with tarfile.open(fileobj=tarbuf, mode="w:gz") as tarobj:
        contents = io.BytesIO(b"these are some contents")
        member = tarfile.TarInfo(name="foo/notsetup.py")
        member.size = len(contents.getbuffer())
        tarobj.addfile(member, fileobj=contents)

        contents = io.BytesIO(b"nothing to see here")
        member = tarfile.TarInfo(name="toplevelgetsskipped")
        member.size = len(contents.getbuffer())
        tarobj.addfile(member, fileobj=contents)
    tarbuf.seek(0)

    assert utils.extract_file_content(tarbuf, "setup.py") is None
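
The tests above pin down the expected behaviour of `utils.extract_file_content`: it accepts a zip or tar stream, looks for the named file one directory below the archive root, skips top-level entries, and returns `None` for a missing file or an unreadable archive. The implementation itself is not shown in this part of the diff, so the following is only a sketch of one way to satisfy that behaviour with the standard library. The name `extract_top_level_file` is hypothetical and is not the function from this commit.

```python
import io
import pathlib
import tarfile
import zipfile


def extract_top_level_file(archive_stream, file_name):
    """Return the bytes of <top dir>/<file_name> from an archive, or None."""

    def matches(member_name):
        # Accept only "<single directory>/<file_name>"; entries at the
        # archive root or nested more deeply are skipped.
        parts = pathlib.PurePosixPath(member_name).parts
        return len(parts) == 2 and parts[1] == file_name

    archive_stream.seek(0)
    if zipfile.is_zipfile(archive_stream):
        archive_stream.seek(0)  # is_zipfile does not rewind the stream
        with zipfile.ZipFile(archive_stream) as zipobj:
            for name in zipobj.namelist():
                if matches(name):
                    return zipobj.read(name)
        return None

    archive_stream.seek(0)
    try:
        # Mode "r" (the default) detects gzip/bz2/xz compression itself.
        with tarfile.open(fileobj=archive_stream) as tarobj:
            for member in tarobj:
                if member.isfile() and matches(member.name):
                    return tarobj.extractfile(member).read()
    except tarfile.TarError:
        # Not a readable tar archive (for example, arbitrary bytes).
        return None
    return None
```

Checking for a zip first and falling back to tar keeps the caller from needing to know the sdist format in advance; catching `tarfile.TarError` is what turns an invalid archive into a `None` result instead of an exception.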