
Results about dataset E3-thiea #14

Closed
Acomand opened this issue May 25, 2024 · 6 comments

Acomand commented May 25, 2024

I'd like to know which file is the test set of the theia dataset.

If I use the graphs you provided, I can reproduce the results.

But if I run the preprocessing myself with "python trace_parser.py --dataset theia", the generated file "test0.pkl" differs from the one you provide, and the result is far from the reported one (whether I use the checkpoint you provided or retrain the model).

So I'd like to know: is "ta1-theia-e3-official-6r.json.8" the test set? (I have also tried using other files as the test set, but those results are also bad.) Could you please rerun trace_parser.py to check the code?

(The results on trace and cadets have no such problem.)

Thanks a lot.

@zmkzmkzmkzmkzmk

Do you have any problems reproducing the results on the cadets dataset? After I processed the data and used it for model training, I got the following results during validation:
(screenshot of validation results)

I repeated the experiment five times and the results were similar.


Acomand commented May 26, 2024

When I use the E3-cadets dataset, I get a result similar to yours.

But I cannot reproduce the result on E3-theia. I run the code as:
```
cd utils && python trace_parser.py --dataset theia
cd ..
python train.py --dataset theia
python eval.py --dataset theia
```
And the result is:
(screenshot of evaluation results)

@Jimmyokok
Collaborator

The correct way: put ta1-theia-e3-official-6r.json through ta1-theia-e3-official-6r.json.8 (including ALL of 0-8) into the data directory. The parser will then read entities from 0-8, parse 0-3 into training graphs, and parse 8 into the test graph.
I tried parsing E3-THEIA without 4-7 present, and many entities were lost, including several thousand malicious entities, because they are only defined in 4-7.
Additionally, the peak-detection threshold is pre-defined in the eval script. If you are training and detecting from scratch, you can adjust that threshold to make the confusion matrix look normal, or simply refer to the AUC for a threshold-insensitive evaluation.
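As a quick sanity check before parsing, a minimal sketch (the `data` directory name and the `.json.N` suffix convention are assumptions based on the filenames quoted in this thread) that verifies all nine splits are present:

```python
# Sketch only: check that all nine E3-THEIA splits are in the data
# directory before running trace_parser.py. Directory name and suffix
# convention are assumptions, not taken from the MAGIC codebase.
from pathlib import Path

data_dir = Path("data")  # adjust to wherever the parser reads raw logs
names = ["ta1-theia-e3-official-6r.json"] + [
    f"ta1-theia-e3-official-6r.json.{i}" for i in range(1, 9)
]
missing = [n for n in names if not (data_dir / n).is_file()]
print("all 9 splits present" if not missing else f"missing: {missing}")
```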

@Jimmyokok
Collaborator

> Do you have any problems reproducing the results of the cadets data set? After I processed the data and used it for model training, I got the following results during verification: (screenshot)
>
> I repeated the experiment five times and the results were similar.

Similar to the above, the printed confusion matrix can look quite different as the threshold varies. Please refer to the AUC for a threshold-insensitive evaluation.
Meanwhile, n_neighbors (i.e. k) also affects detection performance, and it is more sensitive on E3-CADETS than on the other datasets.
We are also planning to release a new version of MAGIC, which is more stable, more efficient and easier to reproduce from scratch.
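The threshold-vs-AUC point can be illustrated on synthetic scores (hypothetical data, not MAGIC's actual output): the confusion matrix swings with the cut-off, while the AUC is a single number for the whole score ranking.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical anomaly scores and ground-truth labels (1 = malicious).
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0.3, 0.1, 900),   # benign entities
                         rng.normal(0.7, 0.1, 100)])  # malicious entities
labels = np.concatenate([np.zeros(900), np.ones(100)])

# AUC is threshold-insensitive: it depends only on the score ranking.
auc = roc_auc_score(labels, scores)

# The confusion matrix, by contrast, depends entirely on the cut-off.
for thr in (0.4, 0.5, 0.6):
    preds = (scores >= thr).astype(int)
    print(f"threshold={thr}:\n{confusion_matrix(labels, preds)}")
print(f"AUC = {auc:.4f}")
```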

@Jimmyokok
Collaborator

> Do you have any problems reproducing the results of the cadets data set? After I processed the data and used it for model training, I got the following results during verification: (screenshot)
>
> I repeated the experiment five times and the results were similar.

For reference, I repeated the evaluation with two different k values: k = 100 yields AUC = 0.9933 and k = 400 yields AUC = 0.9968.
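A toy sketch of why n_neighbors matters, using a generic mean k-NN-distance anomaly score on synthetic embeddings (this is an illustration of k sensitivity in general, not MAGIC's exact detector):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
# Hypothetical embeddings: a dense benign cluster plus a small,
# sparser malicious cluster shifted away from it.
benign = rng.normal(0.0, 1.0, size=(500, 16))
malicious = rng.normal(3.0, 1.0, size=(25, 16))
X = np.vstack([benign, malicious])
y = np.array([0] * 500 + [1] * 25)

# Score each point by its mean distance to its k nearest neighbors;
# changing k reshapes the scores and shifts the resulting AUC.
# (k must stay below the number of samples.)
for k in (10, 100, 400):
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    dist, _ = nn.kneighbors(X)
    score = dist.mean(axis=1)
    print(f"k={k}: AUC={roc_auc_score(y, score):.4f}")
```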


Acomand commented May 26, 2024

It works, thanks a lot
(screenshot of reproduced results)

@Acomand Acomand closed this as completed May 26, 2024
@Jimmyokok Jimmyokok pinned this issue May 26, 2024