BFW, proposed in [1], provides data balanced across demographic subgroups for face verification. Specifically, Table 1 characterizes BFW with relevant statistics, Table 2 compares BFW to related face-bias datasets, and Figure 1 shows a sample montage for each of the eight subgroups in BFW.
Fig. 1. Subgroups of BFW. Rows depict different genders, Female (top) and Male (bottom). Columns are grouped by ethnicity (i.e., Asian, Black, Indian, and White, respectively).
Table 1. Database stats and nomenclature. Header: Subgroup definitions. Top: Statistics of Balanced Faces in the Wild (BFW). Bottom: Number of pairs for each partition. Columns grouped by ethnicity and then further split by gender.
Table 2. BFW and related datasets. BFW is balanced across ID, gender, and ethnicity (Table 1). Compared with DemogPairs, BFW provides more samples per subject and more subgroups per set, while using a single resource, VGG2. RFW, on the other hand, supports domain adaptation and focuses on the distribution of race, not of identities.
This folder contains the raw data files used in the paper:
bfw-<version>-datatable.pkl
: List of pairs with corresponding tags for class labels (1/0), subgroups, and scores. Download link: form.
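A minimal loading sketch, assuming a local copy of the datatable (replace the placeholder with the versioned filename obtained through the download form):

```python
import pandas as pd

# Placeholder filename: substitute the actual <version> of the downloaded datatable.
path = "bfw-<version>-datatable.pkl"
df = pd.read_pickle(path)

print(df.shape)             # (number of pairs, number of columns)
print(df.columns.tolist())  # columns described below
print(df.head())            # preview, as in the table below
```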
Paired faces and all corresponding metadata are organized in a pandas dataframe formatted as follows.
ID | fold | p1 | p2 | label | id1 | id2 | att1 | att2 | vgg16 | resnet50 | senet50 | a1 | a2 | g1 | g2 | e1 | e2 | sphereface |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | asian_females/n000009/0010_01.jpg | asian_females/n000009/0043_01.jpg | 1 | 0 | 0 | asian_females | asian_females | 0.820 | 0.703 | 0.679 | AF | AF | F | F | A | A | 0.393 |
1 | 1 | asian_females/n000009/0010_01.jpg | asian_females/n000009/0120_01.jpg | 1 | 0 | 0 | asian_females | asian_females | 0.719 | 0.524 | 0.594 | AF | AF | F | F | A | A | 0.354 |
2 | 1 | asian_females/n000009/0010_01.jpg | asian_females/n000009/0122_02.jpg | 1 | 0 | 0 | asian_females | asian_females | 0.732 | 0.528 | 0.644 | AF | AF | F | F | A | A | 0.302 |
3 | 1 | asian_females/n000009/0010_01.jpg | asian_females/n000009/0188_01.jpg | 1 | 0 | 0 | asian_females | asian_females | 0.607 | 0.348 | 0.459 | AF | AF | F | F | A | A | -0.009 |
4 | 1 | asian_females/n000009/0010_01.jpg | asian_females/n000009/0205_01.jpg | 1 | 0 | 0 | asian_females | asian_females | 0.629 | 0.384 | 0.495 | AF | AF | F | F | A | A | 0.133 |
- ID : index (i.e., row number) of dataframe ([0, N], where N is pair count).
- fold : fold number of five-fold experiment [1, 5].
- p1 and p2 : relative image paths of the faces in the pair.
- label : ground-truth ([0, 1] for non-match and match, respectively)
- id1 and id2 : subject ID for faces in pair ([0, M], where M is number of unique subjects)
- att1 and att2 : attribute (i.e., subgroup) tag of subjects in pair (e.g., asian_females).
- vgg16, resnet50, senet50, and sphereface : cosine similarity score of the respective model (see the sketch after this list).
- a1 and a2 : abbreviated attribute tag of subjects in pair [AF, AM, BF, BM, IF, IM, WF, WM].
- g1 and g2 : abbreviated gender tag of subjects in pair [F, M].
- e1 and e2 : abbreviated ethnicity tag of subjects in pair [A, B, I, W].
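As a sketch of how these columns combine, the snippet below thresholds one model's cosine similarity and reports verification accuracy per subgroup and fold; the fixed threshold and the choice of the senet50 column are illustrative assumptions, not settings from [1].

```python
import pandas as pd

df = pd.read_pickle("bfw-<version>-datatable.pkl")

# Illustrative only: a fixed decision threshold on cosine similarity.
THRESHOLD = 0.5
df["pred"] = (df["senet50"] > THRESHOLD).astype(int)
df["correct"] = (df["pred"] == df["label"]).astype(float)

# Verification accuracy per subgroup (a1) and fold of the five-fold protocol.
accuracy = df.groupby(["a1", "fold"])["correct"].mean().unstack("fold")
print(accuracy.round(3))
```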
Listed here are the known bugs in the data.[^1] Bugs are grouped by the date first reported, with a brief description following each item in parentheses; a filtering sketch follows the list. Future versions of the data will incorporate fixes for the following:
24 July 2020
- asian_females/n002509/0139_03.jpg (incorrect identity)
19 July 2020
- white_females/n003391/0017_02.jpg (cartoon face)
- asian_males/n006741/0275_02.jpg (cartoon face)
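Until a corrected release is available, one option is to drop any pair that references a flagged image; a minimal sketch, assuming `df` is the datatable loaded above:

```python
# Face images flagged in the bug list above.
bad_images = {
    "asian_females/n002509/0139_03.jpg",  # incorrect identity
    "white_females/n003391/0017_02.jpg",  # cartoon face
    "asian_males/n006741/0275_02.jpg",    # cartoon face
}

# Keep only pairs in which neither face is a flagged image.
clean = df[~df["p1"].isin(bad_images) & ~df["p2"].isin(bad_images)]
print(f"Dropped {len(df) - len(clean)} pairs referencing flagged images.")
```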
[1] Robinson, Joseph P., Gennady Livitz, Yann Henon, Can Qin, Yun Fu, and Samson Timoner. "Face recognition: too bias, or not too bias?" In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 0-1. 2020.
[^1]: The list is for contributors to post any errors found in the data. Additionally, anyone is welcome to report bugs by contacting Joseph Robinson directly.