We evaluate the following techniques. Element-based techniques are simple computer-vision algorithms that aim to detect stand-alone images of the AdChoices logo. Frame-based techniques operate on rendered portions of web content (iframes in this case).
Element-based algorithms:
- Exact matching of AdChoices logos
- Average hashing (perceptual hashing) of logos
- SIFT keypoint matching of logos
- OCR (Tesseract) transcription of textual logos
Frame-based algorithms:
- YOLO-v3 model that detects the AdChoices logo
- ResNet model that recognizes images of ads
- MobileNet model used in Percival
Data used to train models or evaluate attacks can be found here:
https://github.com/ftramer/ad-versarial/releases.
The contents of the data folder are expected to be in ../data. Specifically:
- ../data/ad_logos contains AdChoices logos collected in Ad Highlighter
- ../data/web contains images and rendered iframes collected from 10 news websites
Under scripts are bash scripts that generate adversarial examples for all AdChoices logos in ../data/web/ad_logos, for all element-based classifiers.
Each subdirectory also contains further instructions on how to run or attack specific classifiers.
We evaluate and attack various element-based techniques to detect the AdChoices logo.
To test exact matching of AdChoices logos, run
python exact_match.py ../data/ad_logos/ "../data/web/*/adchoice" temp
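Exact matching flags a candidate image only if it is byte-for-byte identical to one of the known logo templates. A minimal sketch of that idea using a digest index (the helper names here are illustrative, not the repository's API):

```python
import hashlib

import numpy as np

def digest(img):
    """Digest of an image's raw pixel bytes, enabling O(1) template lookup."""
    return hashlib.sha256(np.ascontiguousarray(img).tobytes()).hexdigest()

def build_template_index(templates):
    """Map a collection of template images to a set of digests."""
    return {digest(t) for t in templates}

def is_known_logo(img, index):
    """True iff img is pixel-identical to some template."""
    return digest(img) in index
```

Because the match is exact, changing even one pixel of the logo evades this detector, which is what makes it the weakest baseline.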
Average hashing is a simple template-matching technique that computes binary image hashes by comparing each pixel to the image's mean pixel value.
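As a sketch, an average hash can be computed by block-averaging the image down to an 8x8 grid and thresholding each cell at the grid's mean; two images match when the Hamming distance between their hashes is small. The snippet below assumes a 2D grayscale numpy array whose sides are multiples of the hash size (names are illustrative, not the phash module's API):

```python
import numpy as np

def average_hash(img, hash_size=8):
    """64-bit average hash: block-average to hash_size x hash_size, then threshold."""
    h, w = img.shape
    blocks = img.reshape(hash_size, h // hash_size,
                         hash_size, w // hash_size).mean(axis=(1, 3))
    # one bit per cell: is the cell brighter than the overall mean?
    return (blocks > blocks.mean()).flatten()

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(h1 != h2))
```

A detector would declare a match when `hamming` against some template falls below a fixed threshold; the evasion attack perturbs the logo until every template's distance exceeds that threshold.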
To evaluate the model on AdChoices logos found on CNN.com:
python -m phash.model ../data/ad_logos/ "../data/web/www.cnn.com/adchoice/"
To create an adversarial example that evades detection (i.e., it isn't matched by any of the templates in ../data/ad_logos/):
python -m phash.attack ../data/ad_logos/aol.png temp ../data/ad_logos/
To create a false positive that matches ../data/ad_logos/aol.png:
python -m phash.attack_false_positive ../data/ad_logos/aol.png temp
SIFT is a powerful template-matching algorithm that compares images based on keypoints extracted by the algorithm.
SIFT is no longer included in the latest versions of OpenCV. We tested our code with OpenCV 3.4.1.
We use SIFT to detect the full AdChoices logo (i.e., with text).
To evaluate the model on AdChoices logos found on CNN.com:
python -m sift.model ../data/ad_logos/ "../data/web/www.cnn.com/adchoice/"
To create an adversarial example that evades detection (i.e., it matches less than 5% of the keypoints for all the templates in ../data/ad_logos/):
python -m sift.black_box_attack ../data/web/ad_logos/text/9.png temp ../data/ad_logos/
To create a false positive that matches 50% of keypoints with ../data/ad_logos/aol.png:
python -m sift.black_box_false_positive ../data/ad_logos/aol.png temp
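The detection criterion here is a keypoint-match fraction: an image matches a template if enough of the template's SIFT descriptors find a close nearest neighbor (Lowe's ratio test) among the image's descriptors. The evasion attack drives this fraction below 5% for every template; the false-positive attack drives it above 50% for one template. A minimal sketch of the matching logic over precomputed descriptor arrays (illustrative names; the actual code extracts descriptors with OpenCV's SIFT):

```python
import numpy as np

def match_fraction(template_desc, image_desc, ratio=0.75):
    """Fraction of template descriptors passing Lowe's ratio test against the image."""
    matched = 0
    for d in template_desc:
        dists = np.linalg.norm(image_desc - d, axis=1)
        nearest, second = np.partition(dists, 1)[:2]
        # a keypoint matches if its best neighbor is much closer than its second-best
        if nearest < ratio * second:
            matched += 1
    return matched / len(template_desc)

def detects(template_desc, image_desc, threshold=0.05):
    """Flag the image if at least `threshold` of the template's keypoints match."""
    return match_fraction(template_desc, image_desc) >= threshold
```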
We use Tesseract's OCR model to transcribe textual AdChoices logos.
The model was ported to TensorFlow by Song and Shmatikov, who previously created adversarial examples for it: https://arxiv.org/abs/1802.05385.
Most of the code in this directory was written by Congzheng Song. We modified l2_attack.py to simplify the attack somewhat, and added the boilerplate in eval.py, model.py and evade_or_fp_attack.py to evaluate and attack the model.
To use the model, you first need to build a custom RNN operator for TensorFlow (https://www.tensorflow.org/guide/extend/op#build_the_op_library).
cd cc/
TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
g++ -std=c++11 -shared rnn_ops.cc -o rnn_ops.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2
To run the model on a single image:
python -m OCR.eval --image=../data/ad_logos/aol.png --target_height=30
To evaluate the model on all AdChoices logos found on CNN.com:
python -m OCR.model --glob_path="../data/web/www.cnn.com/adchoice/" --target_height=30
To create an adversarial example that evades detection:
python -m OCR.evade_or_fp_attack --image=../data/ad_logos/aol.png \
--target_height=30 --const=0.1 --iter=200 --lr=1.0 --output_dir=temp
To create a false positive:
python -m OCR.evade_or_fp_attack --image=../data/ad_logos/aol.png \
--target_height=30 --fp=True --start_blank=True --const=100.0 \
--iter=1000 --lr=1.0 --output_dir=temp --threshold=1 \
--chars_target_file=OCR/targets/fp_30.txt
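The attack flags map onto a standard formulation: minimize a misclassification loss plus --const times the squared L2 norm of the perturbation, taking --iter gradient steps of size --lr. A toy sketch of that loop on a stand-in two-class linear classifier (everything below is illustrative; the real attack differentiates through the TensorFlow OCR model):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 16))  # stand-in "model": two-class linear classifier

def predict(x):
    return int(np.argmax(W @ x))

def evade(x, const=0.1, iters=200, lr=1.0):
    """Gradient descent on margin(x + delta) + const * ||delta||^2.

    const, iters, lr play the roles of --const, --iter, --lr above.
    """
    y = predict(x)
    other = 1 - y
    delta = np.zeros_like(x)
    for _ in range(iters):
        # gradient of (W[y] - W[other]) @ (x + delta) + const * ||delta||^2
        grad = (W[y] - W[other]) + 2 * const * delta
        delta -= lr * grad
    return x + delta
```

Larger const keeps the perturbation smaller but makes the misclassification goal harder to reach, which is why the false-positive attack above uses a larger const together with many more iterations.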
The code in logo_detection trains a YOLO-v3 model to locate the AdChoices logo in rendered iframes.
Some code was adapted from yolov3-tensorflow. We used DarkNet for training.
Training data is generated with:
python3 -m logo_detection.data_gen
This uses an ad dataset collected by Hussein et al.
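Conceptually, the training data is synthesized by pasting an AdChoices logo at a random position in an ad image and recording the bounding box in YOLO's normalized (class, x_center, y_center, width, height) label format. A minimal array-based sketch of that idea (names are illustrative, not the data_gen script's API):

```python
import numpy as np

def paste_logo(ad, logo, rng):
    """Overlay `logo` at a random position in `ad`; return the image and its YOLO label."""
    H, W = ad.shape[:2]
    h, w = logo.shape[:2]
    y = int(rng.integers(0, H - h + 1))
    x = int(rng.integers(0, W - w + 1))
    out = ad.copy()
    out[y:y + h, x:x + w] = logo
    # YOLO label: class id, then box center and size, all normalized to [0, 1]
    label = (0, (x + w / 2) / W, (y + h / 2) / H, w / W, h / H)
    return out, label
```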
You can visualize boxes with Yolo_mark:
./PATH_TO_YOLO_MARK/yolo_mark \
logo_detection/data/adchoices \
logo_detection/data/train.txt \
logo_detection/data/adchoices.names
To get a better model, re-compute YOLO's anchors with:
./PATH_TO_DARKNET/darknet detector calc_anchors \
logo_detection/data/adchoices.data \
-num_of_clusters 9 \
-width 416 -height 416
And set the new anchors in logo_detection/yolo_v3.py.
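darknet's calc_anchors clusters the training boxes' (width, height) pairs with k-means and uses the cluster centers as anchors. A rough sketch of that computation (Euclidean k-means for simplicity, whereas darknet clusters by IoU distance; names are illustrative):

```python
import numpy as np

def compute_anchors(box_dims, k=9, iters=50, seed=0):
    """K-means over (width, height) pairs; returns k anchors sorted by area."""
    rng = np.random.default_rng(seed)
    centers = box_dims[rng.choice(len(box_dims), size=k, replace=False)]
    for _ in range(iters):
        # assign each box to its nearest center
        dists = np.linalg.norm(box_dims[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned boxes
        for j in range(k):
            if np.any(labels == j):
                centers[j] = box_dims[labels == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]
```

Anchors tailored to the dataset's box sizes (here, small AdChoices logos) let the detector start from better box priors than the COCO defaults.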
We make the following changes to the default YOLO-v3 configuration in logo_detection/cfg/yolo-adchoices.cfg:
- classes=1 in lines 610, 696 and 783
- filters=18 in lines 603, 689 and 776
- re-computed anchors (lines 609, 695 and 782)
- changes for small classes (layers = -1, 11 and stride = 4 in lines 717 and 720)
- add flip=0 (line 17)
We train the model with:
./PATH_TO_DARKNET/darknet detector train \
logo_detection/data/adchoices.data \
logo_detection/cfg/yolo-adchoices.cfg \
../models/darknet53.conv.74
Finally, we run logo_detection/convert_tf.py to convert the darknet weights to Keras.
A pre-trained model is available here: https://github.com/ftramer/ad-versarial/releases. The code below assumes the model checkpoint is stored under ../models/frame_based_yolov3/.
We run logo_detection/eval_accuracy.py to evaluate accuracy and generate adversarial examples.
We evaluate a ResNet model trained by Hussein et al. (Automatic Understanding of Image and Video Advertisements) to recognize images of ads.
The model should be obtained from the authors and stored under ../external/keras_resnet.h5.
We evaluate the model and create adversarial examples with:
python -m resnet.test
Percival is a recently proposed patch for the Chromium and Brave browsers that adds an ad-detection neural network directly into the browser's rendering pipeline. We evaluate the neural network used by Percival in a Jupyter notebook. To build Percival, follow the official instructions.
We built a proof-of-concept web page to demonstrate that this attack works when deployed in Percival's instrumented Chromium browser: