DFP Visualization Example #439

Merged: 26 commits, Nov 18, 2022
c51076f dfp viz pipeline (efajardo-nv, Nov 4, 2022)
98f95ce readme update (efajardo-nv, Nov 4, 2022)
a75014f add dfp viz setup instructions to readme (efajardo-nv, Nov 10, 2022)
88b92f9 dfp viz readme updates (efajardo-nv, Nov 10, 2022)
7b212a2 Merge branch 'branch-22.11' of https://github.com/nv-morpheus/Morpheu… (efajardo-nv, Nov 10, 2022)
f26a2d2 add dfp viz screenshot (efajardo-nv, Nov 11, 2022)
defd07b increase dfp viz screenshot size (efajardo-nv, Nov 11, 2022)
5cfc9e5 duo dfp viz pipeline script fix (efajardo-nv, Nov 14, 2022)
e4e0cb4 dfp viz readme update (efajardo-nv, Nov 14, 2022)
b6e6d9f add dfp viz repo link to readme (efajardo-nv, Nov 15, 2022)
402f46e duo dfp viz pipeline update (efajardo-nv, Nov 15, 2022)
7c987f6 Merge branch 'branch-22.11' of https://github.com/nv-morpheus/Morpheu… (efajardo-nv, Nov 15, 2022)
5c1311b style fix (efajardo-nv, Nov 15, 2022)
8e3b574 update screenshot (efajardo-nv, Nov 15, 2022)
4ef25da Merge branch 'branch-22.11' of https://github.com/nv-morpheus/Morpheu… (efajardo-nv, Nov 17, 2022)
c13069e Merge branch 'branch-22.11' of https://github.com/nv-morpheus/Morpheu… (efajardo-nv, Nov 18, 2022)
2fdbff1 readme updates from feedback (efajardo-nv, Nov 18, 2022)
9d3bb88 readme update (efajardo-nv, Nov 18, 2022)
e7932f7 readme update (efajardo-nv, Nov 18, 2022)
670db59 update dfp viz tool link in readme (efajardo-nv, Nov 18, 2022)
74c9910 Merge branch 'branch-22.11' of https://github.com/nv-morpheus/Morpheu… (efajardo-nv, Nov 18, 2022)
d6c821c dfp viz postproc feedback updates (efajardo-nv, Nov 18, 2022)
6846789 add docstring to dfp viz postproc stage (efajardo-nv, Nov 18, 2022)
3c389d7 Merge branch 'branch-22.11' of https://github.com/nv-morpheus/Morpheu… (efajardo-nv, Nov 18, 2022)
9524558 update dfp viz clone from readme (efajardo-nv, Nov 18, 2022)
01b6556 add symlink to dfp viz app (efajardo-nv, Nov 18, 2022)
@@ -0,0 +1,127 @@
# Copyright (c) 2022, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import logging
import os
import typing

import pandas as pd
import srf

from morpheus.config import Config
from morpheus.io import serializers
from morpheus.messages import MessageMeta
from morpheus.messages import MultiAEMessage
from morpheus.pipeline.single_port_stage import SinglePortStage
from morpheus.pipeline.stream_pair import StreamPair

logger = logging.getLogger(__name__)


class DFPVizPostprocStage(SinglePortStage):
    """
    DFPVizPostprocStage performs post-processing on DFP inference output. The inference output is converted
    to the input format expected by the DFP Visualization and saved to multiple files based on the specified
    time period. The time period to group data by must be one of pandas' offset strings. The default period is
    one day (D). Each output file is named by appending the period to the prefix (e.g. dfp-viz-2022-08-30.csv).

    Parameters
    ----------
    c : `morpheus.config.Config`
        Pipeline configuration instance.
    period : str
        Time period to batch input data and save output files by. [default: `D`]
    output_dir : str
        Directory to which the output files will be written. [default: current directory]
    output_prefix : str
        Prefix for output files. [default: `dfp-viz-`]
    """

    def __init__(self, c: Config, period: str = "D", output_dir: str = ".", output_prefix: str = "dfp-viz-"):
        super().__init__(c)

        self._user_column_name = c.ae.userid_column_name
        self._timestamp_column = c.ae.timestamp_column_name
        self._feature_columns = c.ae.feature_columns
        self._period = period
        self._output_dir = output_dir
        self._output_prefix = output_prefix
        self._output_filenames = []

    @property
    def name(self) -> str:
        return "dfp-viz-postproc"

    def accepted_types(self) -> typing.Tuple:
        """
        Accepted input types for this stage are returned.

        Returns
        -------
        typing.Tuple[`morpheus.messages.MultiAEMessage`, ]
            Accepted input types.

        """
        return (MultiAEMessage, )

    def supports_cpp_node(self):
        return False

    def _postprocess(self, x: MultiAEMessage):

        viz_pdf = pd.DataFrame()
        viz_pdf[["user", "time"]] = x.get_meta([self._user_column_name, self._timestamp_column])
        datetimes = pd.to_datetime(viz_pdf["time"], errors='coerce')
        viz_pdf["period"] = datetimes.dt.to_period(self._period)

        for f in self._feature_columns:
            viz_pdf[f + "_score"] = x.get_meta(f + "_z_loss")

        viz_pdf["anomalyScore"] = x.get_meta("mean_abs_z")

        return MessageMeta(df=viz_pdf)

    def _build_single(self, builder: srf.Builder, input_stream: StreamPair) -> StreamPair:

        stream = input_stream[0]

        def write_to_files(x: MultiAEMessage):

            message_meta = self._postprocess(x)

            unique_periods = message_meta.df["period"].unique()

            for period in unique_periods:
                period_df = message_meta.df[message_meta.df["period"] == period]
                period_df = period_df.drop(["period"], axis=1)
                output_file = os.path.join(self._output_dir, self._output_prefix + str(period) + ".csv")

                is_first = False
                if output_file not in self._output_filenames:
                    self._output_filenames.append(output_file)
                    is_first = True

                lines = serializers.df_to_csv(period_df, include_header=is_first, include_index_col=False)
                os.makedirs(os.path.realpath(os.path.dirname(output_file)), exist_ok=True)
                with open(output_file, "a") as out_file:
                    out_file.writelines(lines)

            return x

        dfp_viz_postproc = builder.make_node(self.unique_name, write_to_files)

        builder.make_edge(stream, dfp_viz_postproc)
        stream = dfp_viz_postproc

        return stream, input_stream[1]
examples/digital_fingerprinting/visualization/README.md (166 additions, 0 deletions)
@@ -0,0 +1,166 @@
<!--
# Copyright (c) 2021-2022, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-->

# Digital Fingerprinting (DFP) Visualization Example

We show here how to set up and run the Production DFP pipeline on Azure and Duo log data to generate input files for the DFP visualization UI. You can find more information about the Production DFP pipeline in this [README](../production/README.md) and the [DFP Developer Guide](../../../docs/source/developer_guide/guides/5_digital_fingerprinting.md).

## Prerequisites

To run the demo you will need the following:
- Docker
- `docker-compose` (Tested with version 1.29)

## Pull `morpheus-visualizations` submodule

```bash
git submodule update --init --recursive
```

## Build the Morpheus container
This is necessary to get the latest changes needed for DFP. From the root of the Morpheus repo:
```bash
./docker/build_container_release.sh
```

## Building Services via `docker-compose`

```bash
cd examples/digital_fingerprinting/production
export MORPHEUS_CONTAINER_VERSION="$(git describe --tags --abbrev=0)-runtime"
docker-compose build
```

## Start Morpheus Pipeline Container

From the `examples/digital_fingerprinting/production` directory run:
```bash
docker-compose run -p 3000:3000 morpheus_pipeline bash
```

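The `-p 3000:3000` option maps the visualization app's port 3000 to port 3000 on the host so it can be accessed from a web browser. Starting the `morpheus_pipeline` service will also start the `mlflow` service in the background. For debugging purposes, it can be helpful to view the logs of the running MLflow service (for example, with `docker-compose logs -f mlflow` from the `examples/digital_fingerprinting/production` directory).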

By default, an MLflow dashboard will be available at:
```
http://localhost:5000
```

## Download DFP Example Data from S3

Run the following in your `morpheus_pipeline` container to download example data from S3:

```
/workspace/examples/digital_fingerprinting/fetch_example_data.py all
```

Azure training data will be saved to `/workspace/examples/data/dfp/azure-training-data` and inference data to `/workspace/examples/data/dfp/azure-inference-data`.
Duo training data will be saved to `/workspace/examples/data/dfp/duo-training-data` and inference data to `/workspace/examples/data/dfp/duo-inference-data`.
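To confirm the download finished before training, a quick check such as the following counts the JSON files in each of the four directories listed above. It is a sketch: it assumes the downloaded files use the `.json` extension, as the `--input_file` patterns in the commands below suggest, and `count_json_files` is a hypothetical helper rather than part of the example.

```python
import typing
from pathlib import Path

# Directories populated by fetch_example_data.py, as listed above.
DATA_ROOT = Path("/workspace/examples/data/dfp")
SUBDIRS = ["azure-training-data", "azure-inference-data", "duo-training-data", "duo-inference-data"]


def count_json_files(root: Path, subdirs: typing.List[str]) -> typing.Dict[str, int]:
    """Count *.json files in each subdirectory of root; -1 marks a missing directory."""
    counts = {}
    for name in subdirs:
        directory = root / name
        counts[name] = len(list(directory.glob("*.json"))) if directory.is_dir() else -1
    return counts


if __name__ == "__main__":
    for name, count in count_json_files(DATA_ROOT, SUBDIRS).items():
        status = "missing" if count < 0 else f"{count} JSON files"
        print(f"{name}: {status}")
```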

## Running the pipeline to generate input for DFP Visualization

The pipeline uses `DFPVizPostprocStage` to perform post-processing on DFP inference output. The inference output is converted to the input format expected by the DFP Visualization and saved to multiple files, one per time period. The time period to group data by must be [one of pandas' offset strings](https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases). The default period is one day (`D`). Each output file is named by appending the period to the prefix (e.g. `dfp-viz-2022-08-30.csv`). The following options are available for `DFPVizPostprocStage`:

```
--period Time period to batch input data and save output files by. [default: `D`]
--output_dir Directory to which the output files will be written. [default: current directory]
--output_prefix Prefix for output files.
```
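The grouping and naming rule described above can be sketched in plain pandas, without a running pipeline. Like `DFPVizPostprocStage`, it parses timestamps with `pd.to_datetime(..., errors="coerce")`, buckets them with `Series.dt.to_period`, and builds each file name as prefix + period + `.csv`. It is an illustrative stand-in, not the stage itself: `split_by_period` is a hypothetical helper, it only returns the per-file rows instead of writing them, and the sample data is made up.

```python
import os
import typing

import pandas as pd


def split_by_period(df: pd.DataFrame,
                    timestamp_column: str,
                    period: str = "D",
                    output_dir: str = ".",
                    output_prefix: str = "dfp-viz-") -> typing.Dict[str, pd.DataFrame]:
    """Hypothetical helper: map each output file path to the rows that belong in it."""
    # Unparseable timestamps become NaT, as with errors='coerce' in the stage.
    periods = pd.to_datetime(df[timestamp_column], errors="coerce").dt.to_period(period)

    files = {}
    for p in periods.dropna().unique():
        # Same naming rule as the stage: prefix + str(period) + ".csv"
        path = os.path.join(output_dir, output_prefix + str(p) + ".csv")
        files[path] = df[periods == p]
    return files


if __name__ == "__main__":
    sample = pd.DataFrame({
        "user": ["alice", "bob", "alice"],
        "time": ["2022-08-30T10:00:00", "2022-08-30T23:59:00", "2022-08-31T01:00:00"],
        "anomalyScore": [0.2, 1.7, 0.4],
    })
    for path, rows in split_by_period(sample, "time").items():
        print(path, len(rows))
```

Unlike the stage, this sketch drops rows whose timestamp could not be parsed instead of routing them to a `NaT` period. Passing `period="M"` would instead produce one file per month, named like `dfp-viz-2022-08.csv`.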

Set the `PYTHONPATH` environment variable to allow importing the production DFP Morpheus stages:
```
export PYTHONPATH=/workspace/examples/digital_fingerprinting/production/morpheus
```
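If you would rather not export `PYTHONPATH` in every shell (for example, when experimenting in a notebook inside the container), prepending the same directory to `sys.path` at runtime has much the same effect for that one process. This is a generic Python sketch, not something the example scripts do; `ensure_on_sys_path` is a hypothetical helper.

```python
import os
import sys

# Same directory that the README adds to PYTHONPATH.
PRODUCTION_STAGES_DIR = "/workspace/examples/digital_fingerprinting/production/morpheus"


def ensure_on_sys_path(directory: str) -> bool:
    """Prepend directory to sys.path if it exists and is not already present.

    Returns True if sys.path was modified.
    """
    directory = os.path.abspath(directory)
    if not os.path.isdir(directory) or directory in sys.path:
        return False
    sys.path.insert(0, directory)
    return True


if __name__ == "__main__":
    added = ensure_on_sys_path(PRODUCTION_STAGES_DIR)
    print("added" if added else "not added (missing or already present)")
```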

### Azure

```
cd /workspace/examples/digital_fingerprinting/visualization
```

Train DFP user models using Azure log files in `/workspace/examples/data/dfp/azure-training-data` and save them to MLflow.
```
python dfp_viz_azure_pipeline.py \
--train_users=all \
--log_level=debug \
--start_time=2022-08-01 \
--input_file=/workspace/examples/data/dfp/azure-training-data/AZUREAD_2022-08-*.json
```
**Note:** Since models are persisted to a Docker volume, the above command only needs to be run once even if the `mlflow` service is restarted.
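The `--input_file` arguments in these commands are wildcard patterns rather than single files. The standard-library `fnmatch` module is a quick way to preview which file names a pattern like this selects. The file names below are hypothetical, and the pipeline's own file expansion may differ in details, so treat this as an approximation.

```python
import fnmatch
import typing

PATTERN = "AZUREAD_2022-08-*.json"


def select_files(names: typing.Iterable[str], pattern: str) -> typing.List[str]:
    """Return the names matching a shell-style wildcard pattern, sorted."""
    # fnmatchcase avoids platform-dependent case folding.
    return sorted(name for name in names if fnmatch.fnmatchcase(name, pattern))


if __name__ == "__main__":
    # Hypothetical directory listing, for illustration only.
    listing = ["AZUREAD_2022-08-01.json", "AZUREAD_2022-09-01.json", "DUO_2022-08-01.json"]
    print(select_files(listing, PATTERN))
```

To check against the files actually on disk, `glob.glob("/workspace/examples/data/dfp/azure-training-data/AZUREAD_2022-08-*.json")` applies the same wildcard rules to the real directory.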

Run inference with DFP viz postprocessing using Azure log files in `/workspace/examples/data/dfp/azure-inference-data` to generate input files for Azure DFP visualization:
```
python dfp_viz_azure_pipeline.py \
--train_users=none \
--log_level=debug \
--start_time=2022-08-30 \
--input_file=/workspace/examples/data/dfp/azure-inference-data/AZUREAD_2022-08-*.json \
--output_dir=./azure-dfp-output
```

When the pipeline run completes, you should see `dfp-viz-azure-2022-08-30.csv` and `dfp-viz-azure-2022-08-31.csv` in the `azure-dfp-output` directory. These files can be used as input to the DFP Viz UI.
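To sanity-check the generated files before opening the UI, you can load them with pandas. Based on `DFPVizPostprocStage`, each file should contain a `user` column, a `time` column, one `<feature>_score` column per configured feature, and an `anomalyScore` column (taken from `mean_abs_z`). The sketch below lists the highest-scoring rows across all files. `top_anomalies` is a hypothetical helper, and the directory is the one produced by the inference command above.

```python
import glob
import os

import pandas as pd

OUTPUT_DIR = "/workspace/examples/digital_fingerprinting/visualization/azure-dfp-output"


def top_anomalies(csv_paths, n: int = 10) -> pd.DataFrame:
    """Concatenate DFP viz CSV files and return the n rows with the highest anomalyScore."""
    frames = [pd.read_csv(path) for path in sorted(csv_paths)]
    if not frames:
        return pd.DataFrame()
    combined = pd.concat(frames, ignore_index=True)
    return combined.nlargest(n, "anomalyScore")[["user", "time", "anomalyScore"]]


if __name__ == "__main__":
    paths = glob.glob(os.path.join(OUTPUT_DIR, "*.csv"))
    print(top_anomalies(paths))
```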

### Duo

Train:
```
python dfp_viz_duo_pipeline.py \
--train_users=all \
--log_level=debug \
--start_time=2022-08-01 \
--input_file=/workspace/examples/data/dfp/duo-training-data/DUO_2022-08-*.json
```
Inference:
```
python dfp_viz_duo_pipeline.py \
--train_users=none \
--log_level=debug \
--start_time=2022-08-30 \
--input_file=/workspace/examples/data/dfp/duo-inference-data/DUO_2022-08-*.json \
--output_dir=./duo-dfp-output
```
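Because each output file's period is encoded only in its name (prefix + period + `.csv`), a small helper can recover it when loading files from several periods, for example to tag rows from the Duo output directory. The Duo file prefix is not stated in this README, so the function takes the prefix as a parameter, and the `dfp-viz-duo-` value used below is an assumption. `period_from_filename` is hypothetical and defaults to the daily (`D`) period.

```python
import glob
import os
import typing

import pandas as pd


def period_from_filename(path: str, prefix: str, freq: str = "D") -> typing.Optional[pd.Period]:
    """Recover the pandas Period encoded in a DFP viz output file name.

    Returns None if the name does not follow the prefix + period + ".csv" pattern.
    """
    name = os.path.basename(path)
    if not (name.startswith(prefix) and name.endswith(".csv")):
        return None
    period_text = name[len(prefix):-len(".csv")]
    try:
        return pd.Period(period_text, freq=freq)
    except ValueError:
        # Text between prefix and extension is not a parseable date.
        return None


if __name__ == "__main__":
    # "dfp-viz-duo-" is an assumed prefix; check the names actually written to duo-dfp-output.
    for path in sorted(glob.glob("./duo-dfp-output/*.csv")):
        print(path, period_from_filename(path, prefix="dfp-viz-duo-"))
```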

## Install DFP Visualization Tool

While still in the `morpheus_pipeline` container, perform the following steps to install and run the DFP Visualization Tool:

### Install dependencies
```
cd viz
```
```
corepack enable
```
```
yarn
```

### Configure `dataset_path`
Set the `dataset_path` environment variable to the directory from which the viz input files will be read. For this example, we'll set it to the directory that contains our Azure DFP output files:
```
export dataset_path=/workspace/examples/digital_fingerprinting/visualization/azure-dfp-output
```
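Before starting the server, it can help to confirm that `dataset_path` points at a directory that actually contains the generated files. This sketch only inspects the environment variable and the directory listing; it assumes the tool reads the `.csv` files produced above, which matches this example's output but is not confirmed here. `dataset_files` is a hypothetical helper.

```python
import os
import typing


def dataset_files(env_var: str = "dataset_path") -> typing.List[str]:
    """Return the sorted CSV file names in the directory named by env_var.

    Raises RuntimeError with a readable message if the variable is unset or not a directory.
    """
    directory = os.environ.get(env_var)
    if not directory:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isdir(directory):
        raise RuntimeError(f"{env_var}={directory!r} is not a directory")
    return sorted(name for name in os.listdir(directory) if name.endswith(".csv"))


if __name__ == "__main__":
    print(dataset_files())
```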

### Start server
```
yarn dev
```

The DFP Visualization Tool can now be accessed via web browser at http://localhost:3000.

<img src="./img/screenshot.png">
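`yarn dev` typically keeps running in the foreground. If you script the setup (for example, from a second shell in the container), a small standard-library poll loop can wait until the page responds before you open it. `wait_for_http` is a hypothetical helper; the URL is the one given above.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll url until it returns an HTTP response or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    print("ready" if wait_for_http("http://localhost:3000") else "not reachable")
```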

More information about the DFP Visualization Tool can be found [here](https://github.com/nv-morpheus/morpheus-visualizations/tree/HEAD/DFP).