Split data dir, moving large files into examples/data #130

Merged · 43 commits · Jun 7, 2022

Commits (changes shown from 36 of 43 commits)
61680dc
Add/update unittests to check for issue #60
dagardner-nv Apr 27, 2022
f840074
Ensure default path values are no longer relative to the current dir,…
dagardner-nv Apr 27, 2022
edc75bd
Move simple file reads to a helper function
dagardner-nv May 3, 2022
a7263fc
Merge branch 'branch-22.06' into david-cli-rel-paths
dagardner-nv May 16, 2022
06fb137
WIP
dagardner-nv May 16, 2022
c2c467b
Move data
dagardner-nv May 16, 2022
ce01a4a
Add missing dep for pybind11-stubgen
dagardner-nv May 17, 2022
0b6d959
Don't add deps for pybind11 stub files when we aren't doing an inplac…
dagardner-nv May 17, 2022
827ee41
Add MANIFEST.in to list of installed files
dagardner-nv May 17, 2022
4ef5624
Copy data dir, and files previously set by package_data
dagardner-nv May 17, 2022
c2c5975
Remove package_data, unfortunately the setuptools docs are vague and …
dagardner-nv May 17, 2022
4186357
Remove unused MORPHEUS_ROOT attr
dagardner-nv May 17, 2022
65473c6
Update path in examples for new data location
dagardner-nv May 17, 2022
be44798
Merge branch 'branch-22.06' into david-cli-rel-paths
dagardner-nv May 17, 2022
7ae1e30
Fix import path
dagardner-nv May 17, 2022
329a6a6
Update paths in examples
dagardner-nv May 17, 2022
405b539
Update data path in docs
dagardner-nv May 17, 2022
1c7f421
fix path
dagardner-nv May 17, 2022
c0d5281
Update lfs to reflect data dir move
dagardner-nv May 17, 2022
ce37b33
Remove unneded fea_length
dagardner-nv May 17, 2022
61ebfcf
Style fixes
dagardner-nv May 18, 2022
5a84ff2
Update docs/source/basics/examples.rst
dagardner-nv May 18, 2022
dfdeacc
Merge branch 'branch-22.06' into david-cli-rel-paths
dagardner-nv May 23, 2022
f59dcac
Fixing non-inplace builds install of stub files
mdemoret-nv May 23, 2022
7801803
Move data into previous install command
dagardner-nv May 23, 2022
f398f78
Merge branch 'david-cli-rel-paths' of github.com:dagardner-nv/Morpheu…
dagardner-nv May 23, 2022
798953a
Remove lfs filter for old data location
dagardner-nv May 23, 2022
a94dd62
Merge branch 'branch-22.06' into david-cli-rel-paths
dagardner-nv May 24, 2022
7cfafcf
examples/data/with_data_len.json,examples/data/without_data_len.json:…
dagardner-nv May 27, 2022
950b0d4
Move larger files from morpheus/data into examples/data
dagardner-nv May 27, 2022
4627709
Add new glob path to lfs
dagardner-nv May 27, 2022
0219ae0
Update path in launcher
dagardner-nv May 27, 2022
e32a3c6
Update paths for example data in examples & docs
dagardner-nv May 27, 2022
dac94a9
Add email_with_addresses.jsonlines used in the phishing developer gui…
dagardner-nv May 27, 2022
6448e7f
Merge branch 'branch-22.06' into david-split-data-dir
dagardner-nv May 31, 2022
fc3f06f
Merge branch 'branch-22.06' into david-split-data-dir
dagardner-nv Jun 2, 2022
203c6d6
Remove unused data files
dagardner-nv Jun 3, 2022
e036f7b
Merge branch 'branch-22.06' into david-split-data-dir
dagardner-nv Jun 3, 2022
435c74a
Pin to older neo
dagardner-nv Jun 3, 2022
f13da0b
Merge branch 'david-split-data-dir' of github.com:dagardner-nv/Morphe…
dagardner-nv Jun 3, 2022
5da2d94
Revert "Pin to older neo"
dagardner-nv Jun 6, 2022
d2d35e8
Manually ensure that the build is clean
dagardner-nv Jun 6, 2022
c989b15
Re-source the conda env
dagardner-nv Jun 6, 2022
1 change: 1 addition & 0 deletions .gitattributes
@@ -1,3 +1,4 @@
+examples/data/* filter=lfs diff=lfs merge=lfs -text
 morpheus/_version.py export-subst
 morpheus/data/* filter=lfs diff=lfs merge=lfs -text
 tests/expected_data/* filter=lfs diff=lfs merge=lfs -text
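The line added to `.gitattributes` extends LFS tracking to the relocated data directory. As a rough illustration (not part of the PR), Python's `fnmatch` can approximate which paths a glob like this covers; the file paths below are assumptions chosen from the files this PR moves:

```python
from fnmatch import fnmatch

# The glob added to .gitattributes in this diff.
pattern = "examples/data/*"

# Hypothetical paths for illustration: a file at its new location
# and at its old one.
new_path = "examples/data/pcap_dump.jsonlines"
old_path = "morpheus/data/pcap_dump.jsonlines"

# NOTE: fnmatch only approximates gitattributes glob semantics
# (gitattributes treats '/' specially), but for these flat paths
# the two agree.
print(fnmatch(new_path, pattern))  # True
print(fnmatch(old_path, pattern))  # False
```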
10 changes: 5 additions & 5 deletions docs/source/basics/examples.rst
@@ -35,15 +35,15 @@ This example will copy the values from Kafka into ``out.jsonlines``.
Remove Fields from JSON Objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-This example will only copy the fields 'timestamp', 'src_ip' and 'dest_ip' from ``morpheus/data/pcap_dump.jsonlines`` to
+This example will only copy the fields 'timestamp', 'src_ip' and 'dest_ip' from ``examples/data/pcap_dump.jsonlines`` to
``out.jsonlines``.

.. image:: img/remove_fields_from_json_objects.png

.. code-block:: bash
morpheus run pipeline-nlp --viz_file=basic_usage_img/remove_fields_from_json_objects.png \
-from-file --filename morpheus/data/pcap_dump.jsonlines \
+from-file --filename examples/data/pcap_dump.jsonlines \
deserialize \
serialize --include 'timestamp' --include 'src_ip' --include 'dest_ip' \
to-file --filename out.jsonlines
@@ -58,7 +58,7 @@ This example will report the throughput on the command line.
.. code-block:: console
$ morpheus run pipeline-nlp --viz_file=basic_usage_img/monitor_throughput.png \
-from-file --filename morpheus/data/pcap_dump.jsonlines \
+from-file --filename examples/data/pcap_dump.jsonlines \
deserialize \
monitor --description "Lines Throughput" --smoothing 0.1 --unit "lines" \
serialize \
@@ -79,7 +79,7 @@ decouple one stage from the next. Without the buffers, all monitoring would show
.. code-block:: console
$ morpheus run pipeline-nlp --viz_file=basic_usage_img/multi_monitor_throughput.png \
-from-file --filename morpheus/data/pcap_dump.jsonlines \
+from-file --filename examples/data/pcap_dump.jsonlines \
monitor --description "From File Throughput" \
buffer \
deserialize \
@@ -107,7 +107,7 @@ This example shows an NLP Pipeline which uses most stages available in Morpheus.
$ morpheus run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
pipeline-nlp --viz_file=basic_usage_img/nlp_kitchen_sink.png \
-from-file --filename morpheus/data/pcap_dump.jsonlines \
+from-file --filename examples/data/pcap_dump.jsonlines \
buffer --count=500 \
deserialize \
preprocess \
16 changes: 8 additions & 8 deletions docs/source/developer_guide/guides/1_simple_python_stage.md
@@ -31,8 +31,8 @@ Defining this stage requires us to specify the stage type. Morpheus stages conta
import typing

import neo
-from morpheus.pipeline.pipeline import SinglePortStage
-from morpheus.pipeline.pipeline import StreamPair
+from morpheus.pipeline.single_port_stage import SinglePortStage
+from morpheus.pipeline.stream_pair import StreamPair

class PassThruStage(SinglePortStage):
```
@@ -87,8 +87,8 @@ import typing

import neo

-from morpheus.pipeline.pipeline import SinglePortStage
-from morpheus.pipeline.pipeline import StreamPair
+from morpheus.pipeline.single_port_stage import SinglePortStage
+from morpheus.pipeline.stream_pair import StreamPair

class PassThruStage(SinglePortStage):
@property
@@ -122,8 +122,8 @@ import os

from morpheus.config import Config
from morpheus.pipeline import LinearPipeline
-from morpheus.pipeline.general_stages import MonitorStage
-from morpheus.pipeline.input.from_file import FileSourceStage
+from morpheus.stages.general.monitor_stage import MonitorStage
+from morpheus.stages.input.file_source_stage import FileSourceStage
from morpheus.utils.logging import configure_logging

from pass_thru import PassThruStage
@@ -183,8 +183,8 @@ import os

from morpheus.config import Config
from morpheus.pipeline import LinearPipeline
-from morpheus.pipeline.general_stages import MonitorStage
-from morpheus.pipeline.input.from_file import FileSourceStage
+from morpheus.stages.general.monitor_stage import MonitorStage
+from morpheus.stages.input.file_source_stage import FileSourceStage
from morpheus.utils.logging import configure_logging

from pass_thru import PassThruStage
38 changes: 19 additions & 19 deletions docs/source/developer_guide/guides/2_real_world_phishing.md
@@ -80,9 +80,9 @@ import typing

import neo

-from morpheus.pipeline.messages import MessageMeta
-from morpheus.pipeline.pipeline import SinglePortStage
-from morpheus.pipeline.pipeline import StreamPair
+from morpheus.messages.message_meta import MessageMeta
+from morpheus.pipeline.single_port_stage import SinglePortStage
+from morpheus.pipeline.stream_pair import StreamPair


class RecipientFeaturesStage(SinglePortStage):
@@ -193,7 +193,7 @@ out_dir = os.environ.get('OUT_DIR', '/tmp')
labels_file = os.path.join(morpheus.DATA_DIR, 'labels_phishing.txt')
vocab_file = os.path.join(morpheus.DATA_DIR, 'bert-base-uncased-hash.txt')

-input_file = os.path.join(root_dir, 'examples/data/email.jsonlines')
+input_file = os.path.join(root_dir, 'examples/data/email_with_addresses.jsonlines')
results_file = os.path.join(out_dir, 'detections.jsonlines')
```
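The hunk above joins the example's input file onto `root_dir` rather than relying on the current working directory, which is one of the issues this PR addresses. A minimal, self-contained sketch of the same pattern (the root path here is a stand-in for illustration, not taken from the PR):

```python
import os.path

# Hypothetical repository root; the guide derives this from the checkout
# location instead of assuming the current working directory.
root_dir = "/workspace/morpheus"

# Joining onto an explicit root yields a stable absolute path no matter
# where the script is launched from.
input_file = os.path.join(root_dir, "examples/data/email_with_addresses.jsonlines")
print(input_file)
```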

@@ -290,14 +290,14 @@ import morpheus
from morpheus.config import Config
from morpheus.config import PipelineModes
from morpheus.pipeline import LinearPipeline
-from morpheus.pipeline.general_stages import FilterDetectionsStage
-from morpheus.pipeline.general_stages import MonitorStage
-from morpheus.pipeline.inference.inference_triton import TritonInferenceStage
-from morpheus.pipeline.input.from_file import FileSourceStage
-from morpheus.pipeline.output.serialize import SerializeStage
-from morpheus.pipeline.output.to_file import WriteToFileStage
-from morpheus.pipeline.preprocessing import DeserializeStage
-from morpheus.pipeline.preprocessing import PreprocessNLPStage
+from morpheus.stages.general.monitor_stage import MonitorStage
+from morpheus.stages.inference.triton_inference_stage import TritonInferenceStage
+from morpheus.stages.input.file_source_stage import FileSourceStage
+from morpheus.stages.output.write_to_file_stage import WriteToFileStage
+from morpheus.stages.postprocess.filter_detections_stage import FilterDetectionsStage
+from morpheus.stages.postprocess.serialize_stage import SerializeStage
+from morpheus.stages.preprocess.deserialize_stage import DeserializeStage
+from morpheus.stages.preprocess.preprocess_nlp_stage import PreprocessNLPStage
from morpheus.utils.logging import configure_logging

from recipient_feature_stage import RecipientFeaturesStage
@@ -313,7 +313,7 @@ def run_pipeline():
labels_file = os.path.join(morpheus.DATA_DIR, 'labels_phishing.txt')
vocab_file = os.path.join(morpheus.DATA_DIR, 'bert-base-uncased-hash.txt')

-input_file = os.path.join(root_dir, 'examples/data/email.jsonlines')
+input_file = os.path.join(root_dir, 'examples/data/email_with_addresses.jsonlines')
results_file = os.path.join(out_dir, 'detections.jsonlines')

# It's necessary to configure the pipeline for NLP mode
@@ -453,9 +453,9 @@ import pika
import cudf

from morpheus.config import Config
-from morpheus.pipeline.messages import MessageMeta
-from morpheus.pipeline.pipeline import SingleOutputSource
-from morpheus.pipeline.pipeline import StreamPair
+from morpheus.messages.message_meta import MessageMeta
+from morpheus.pipeline.single_output_source import SingleOutputSource
+from morpheus.pipeline.stream_pair import StreamPair

logger = logging.getLogger(__name__)

@@ -597,9 +597,9 @@ import pika
import cudf

from morpheus.config import Config
-from morpheus.pipeline.messages import MessageMeta
-from morpheus.pipeline.pipeline import SinglePortStage
-from morpheus.pipeline.pipeline import StreamPair
+from morpheus.messages.message_meta import MessageMeta
+from morpheus.pipeline.single_port_stage import SinglePortStage
+from morpheus.pipeline.stream_pair import StreamPair

logger = logging.getLogger(__name__)
10 changes: 5 additions & 5 deletions docs/source/morpheus_quickstart_guide.md
@@ -559,7 +559,7 @@ $ helm install --set ngc.apiKey="$API_KEY" \
pipeline-nlp \
--model_seq_length=128 \
--labels_file=./morpheus/data/labels_phishing.txt \
-from-file --filename=./morpheus/data/email.jsonlines \
+from-file --filename=./examples/data/email.jsonlines \
monitor --description 'FromFile Rate' --smoothing=0.001 \
deserialize \
preprocess --vocab_hash_file=./morpheus/data/bert-base-uncased-hash.txt --truncation=True --do_lower_case=True --add_special_tokens=False \
@@ -635,7 +635,7 @@ $ helm install --set ngc.apiKey="$API_KEY" \
--model_max_batch_size=32 \
pipeline-nlp \
--model_seq_length=256 \
-from-file --filename=./morpheus/data/pcap_dump.jsonlines \
+from-file --filename=./examples/data/pcap_dump.jsonlines \
monitor --description 'FromFile Rate' --smoothing=0.001 \
deserialize \
preprocess --vocab_hash_file=./morpheus/data/bert-base-uncased-hash.txt --truncation=True --do_lower_case=True --add_special_tokens=False \
@@ -685,7 +685,7 @@ Make sure you create input and output Kafka topics before you start the pipeline
$ kubectl -n $NAMESPACE exec -it deploy/broker -c broker -- kafka-console-producer.sh \
--broker-list broker:9092 \
--topic <YOUR_INPUT_KAFKA_TOPIC> < \
-<YOUR_INPUT_DATA_FILE_PATH_EXAMPLE: ${HOME}/morpheus/data/pcap_dump.jsonlines>
+<YOUR_INPUT_DATA_FILE_PATH_EXAMPLE: ${HOME}/examples/data/pcap_dump.jsonlines>
```

**Note**: This should be used for development purposes only via this developer kit. Loading from the file into Kafka should not be used in production deployments of Morpheus.
@@ -708,7 +708,7 @@ $ helm install --set ngc.apiKey="$API_KEY" \
--model_max_batch_size=64 \
--use_cpp=True \
pipeline-fil \
-from-file --filename=./morpheus/data/nvsmi.jsonlines \
+from-file --filename=./examples/data/nvsmi.jsonlines \
monitor --description 'FromFile Rate' --smoothing=0.001 \
deserialize \
preprocess \
@@ -754,7 +754,7 @@ Make sure you create input and output Kafka topics before you start the pipeline
$ kubectl -n $NAMESPACE exec -it deploy/broker -c broker -- kafka-console-producer.sh \
--broker-list broker:9092 \
--topic <YOUR_INPUT_KAFKA_TOPIC> < \
-<YOUR_INPUT_DATA_FILE_PATH_EXAMPLE: ${HOME}/morpheus/data/nvsmi.jsonlines>
+<YOUR_INPUT_DATA_FILE_PATH_EXAMPLE: ${HOME}/examples/data/nvsmi.jsonlines>
```

**Note**: This should be used for development purposes only via this developer kit. Loading from the file into Kafka should not be used in production deployments of Morpheus.
6 changes: 3 additions & 3 deletions examples/abp_nvsmi_detection/README.md
@@ -46,7 +46,7 @@ $ nvidia-smi dmon

Each line in the output represents the GPU metrics at a single point in time. As the tool progresses the GPU begins to be utilized and you can see the SM% and Mem% increase as memory is loaded into the GPU and computations are performed. The model we will be using can ingest this information and determine whether or not the GPU is mining cryptocurrencies without needing additional information from the host machine.

-In this example we will be using the `morpheus/data/nvsmi.jsonlines` dataset that is known to contain mining behavior profiles. The dataset is in the `.jsonlines` format, which means each new line represents a new JSON object. In order to parse this data, it must be ingested, split by lines into individual JSON objects, and parsed into cuDF dataframes. This will all be handled by Morpheus.
+In this example we will be using the `examples/data/nvsmi.jsonlines` dataset that is known to contain mining behavior profiles. The dataset is in the `.jsonlines` format, which means each new line represents a new JSON object. In order to parse this data, it must be ingested, split by lines into individual JSON objects, and parsed into cuDF dataframes. This will all be handled by Morpheus.
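The `.jsonlines` handling described in that context (split by lines, parse each line as a standalone JSON object) can be sketched with just the standard library. The records below are fabricated for illustration; Morpheus itself then loads the parsed objects into cuDF dataframes:

```python
import json

# Two made-up records standing in for lines of nvsmi.jsonlines.
raw = '{"gpu": 0, "sm": 98}\n{"gpu": 0, "sm": 4}\n'

# Each non-empty line is an independent JSON object.
records = [json.loads(line) for line in raw.splitlines() if line.strip()]
print(records)
```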

## Pipeline Architecture

@@ -102,7 +102,7 @@ morpheus --log_level=DEBUG \
`# Specify a NLP pipeline with 256 sequence length (Must match Triton config)` \
pipeline-fil \
`# 1st Stage: Read from file` \
-from-file --filename=$MORPHEUS_ROOT/morpheus/data/nvsmi.jsonlines \
+from-file --filename=$MORPHEUS_ROOT/examples/data/nvsmi.jsonlines \
`# 2nd Stage: Deserialize from JSON strings to objects` \
deserialize \
`# 3rd Stage: Preprocessing converts the input data into BERT tokens` \
@@ -178,7 +178,7 @@ CPP Enabled: True
====Registering Pipeline Complete!====
====Starting Pipeline====
====Building Pipeline====
-Added source: <from-file-0; FileSourceStage(filename=/home/dagardner/work/morpheus/data/nvsmi.jsonlines, iterative=False, file_type=FileTypes.Auto, repeat=1, filter_null=True, cudf_kwargs=None)>
+Added source: <from-file-0; FileSourceStage(filename=/home/dagardner/work/examples/data/nvsmi.jsonlines, iterative=False, file_type=FileTypes.Auto, repeat=1, filter_null=True, cudf_kwargs=None)>
└─> morpheus.MessageMeta
Added stage: <deserialize-1; DeserializeStage()>
└─ morpheus.MessageMeta -> morpheus.MultiMessage
4 changes: 2 additions & 2 deletions examples/abp_pcap_detection/README.md
@@ -83,11 +83,11 @@ Options:
--help Show this message and exit.
```

-To launch the configured Morpheus pipeline with the sample data that is provided at `<MORPHEUS_ROOT>/morpheus/data`, run the following:
+To launch the configured Morpheus pipeline with the sample data that is provided at `<MORPHEUS_ROOT>/examples/data`, run the following:

```bash
python run.py \
-  --input_file ../../morpheus/data/abp_pcap_dump.jsonlines \
+  --input_file ../data/abp_pcap_dump.jsonlines \
  --output_file ./pcap_out.jsonlines \
  --model_name 'abp-pcap-xgb' \
  --server_url localhost:8001
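The `--input_file` change above works because the example's `run.py` lives in `examples/abp_pcap_detection` while the data now sits in the sibling `examples/data` directory. A quick sanity check of that relative path (directory layout assumed from the files shown in this PR):

```python
import posixpath

data_file = "examples/data/abp_pcap_dump.jsonlines"  # new data location
example_dir = "examples/abp_pcap_detection"          # where run.py lives

# Relative to the example directory, the dataset is one level up, under data/.
rel = posixpath.relpath(data_file, start=example_dir)
print(rel)  # ../data/abp_pcap_dump.jsonlines
```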
File renamed without changes.
3 changes: 3 additions & 0 deletions examples/data/email_with_addresses.jsonlines
Git LFS file not shown
File renamed without changes.
File renamed without changes.