Use weighted sampling and separation of filters
victorlin committed Apr 11, 2024
1 parent 025fd8f commit fc84ca1
Showing 29 changed files with 911 additions and 736 deletions.
221 changes: 107 additions & 114 deletions generate-subsampling-config.py

Large diffs are not rendered by default.

37 changes: 21 additions & 16 deletions subsampling/africa_1m.yaml
@@ -1,35 +1,40 @@
+size: 4000
 samples:
   focal_early:
+    filters:
+      max_date: 1M
+      exclude:
+      - region!=Africa
     group_by:
     - country
     - year
     - month
-    max_sequences: 640
-    max_date: 1M
-    exclude:
-    - region!=Africa
+    weight: 4
   context_early:
+    filters:
+      max_date: 1M
+      exclude:
+      - region=Africa
     group_by:
     - country
     - year
     - month
-    max_sequences: 160
-    max_date: 1M
-    exclude:
-    - region=Africa
+    weight: 1
   focal_recent:
+    filters:
+      min_date: 1M
+      exclude:
+      - region!=Africa
     group_by:
     - country
     - week
-    max_sequences: 2560
-    min_date: 1M
-    exclude:
-    - region!=Africa
+    weight: 16
   context_recent:
+    filters:
+      min_date: 1M
+      exclude:
+      - region=Africa
     group_by:
     - country
     - week
-    max_sequences: 640
-    min_date: 1M
-    exclude:
-    - region=Africa
+    weight: 4
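The weight values in this file line up with the max_sequences values they replace: with size 4000 and weights 4, 1, 16, 4 (sum 25), a proportional split gives 640, 160, 2560, 640. A minimal Python sketch of that arithmetic follows. It assumes the tool reading these files divides size in proportion to weight; the diff itself does not show how weights are applied, and allocate_by_weight is a hypothetical helper, not a function from this repository.

```python
def allocate_by_weight(size, weights):
    """Split `size` across samples in proportion to their weights.

    Assumption: allocation is a plain proportional split, rounded down.
    The real tool may round or redistribute differently.
    """
    total = sum(weights.values())
    return {name: size * w // total for name, w in weights.items()}


# Weights taken from subsampling/africa_1m.yaml after this commit.
africa_1m_weights = {
    "focal_early": 4,
    "context_early": 1,
    "focal_recent": 16,
    "context_recent": 4,
}

targets = allocate_by_weight(4000, africa_1m_weights)
print(targets)
# {'focal_early': 640, 'context_early': 160, 'focal_recent': 2560, 'context_recent': 640}
```

Under that assumption the Africa files keep the per-sample targets they had before the change, now expressed relative to a single size value.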
37 changes: 21 additions & 16 deletions subsampling/africa_2m.yaml
@@ -1,35 +1,40 @@
+size: 4000
 samples:
   focal_early:
+    filters:
+      max_date: 2M
+      exclude:
+      - region!=Africa
     group_by:
     - country
     - year
     - month
-    max_sequences: 640
-    max_date: 2M
-    exclude:
-    - region!=Africa
+    weight: 4
   context_early:
+    filters:
+      max_date: 2M
+      exclude:
+      - region=Africa
     group_by:
     - country
     - year
     - month
-    max_sequences: 160
-    max_date: 2M
-    exclude:
-    - region=Africa
+    weight: 1
   focal_recent:
+    filters:
+      min_date: 2M
+      exclude:
+      - region!=Africa
     group_by:
     - country
     - week
-    max_sequences: 2560
-    min_date: 2M
-    exclude:
-    - region!=Africa
+    weight: 16
   context_recent:
+    filters:
+      min_date: 2M
+      exclude:
+      - region=Africa
     group_by:
     - country
     - week
-    max_sequences: 640
-    min_date: 2M
-    exclude:
-    - region=Africa
+    weight: 4
37 changes: 21 additions & 16 deletions subsampling/africa_6m.yaml
@@ -1,35 +1,40 @@
+size: 4000
 samples:
   focal_early:
+    filters:
+      max_date: 6M
+      exclude:
+      - region!=Africa
     group_by:
     - country
     - year
     - month
-    max_sequences: 640
-    max_date: 6M
-    exclude:
-    - region!=Africa
+    weight: 4
   context_early:
+    filters:
+      max_date: 6M
+      exclude:
+      - region=Africa
     group_by:
     - country
     - year
     - month
-    max_sequences: 160
-    max_date: 6M
-    exclude:
-    - region=Africa
+    weight: 1
   focal_recent:
+    filters:
+      min_date: 6M
+      exclude:
+      - region!=Africa
     group_by:
     - country
     - month
-    max_sequences: 2560
-    min_date: 6M
-    exclude:
-    - region!=Africa
+    weight: 16
   context_recent:
+    filters:
+      min_date: 6M
+      exclude:
+      - region=Africa
     group_by:
     - country
     - month
-    max_sequences: 640
-    min_date: 6M
-    exclude:
-    - region=Africa
+    weight: 4
15 changes: 9 additions & 6 deletions subsampling/africa_all-time.yaml
@@ -1,17 +1,20 @@
+size: 4000
 samples:
   focal:
+    filters:
+      exclude:
+      - region!=Africa
     group_by:
     - country
     - year
     - month
-    max_sequences: 3200
-    exclude:
-    - region!=Africa
+    weight: 4
   context:
+    filters:
+      exclude:
+      - region=Africa
     group_by:
     - country
     - year
     - month
-    max_sequences: 800
-    exclude:
-    - region=Africa
+    weight: 1
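The second half of the change, the "separation of filters" in the commit title, moves min_date, max_date and exclude under a filters key and leaves group_by beside it. A rough Python sketch of that restructuring for a single sample is given here. The helper name separate_filters and the FILTER_KEYS tuple are hypothetical, chosen for illustration; they are not taken from generate-subsampling-config.py, whose diff is not rendered on this page. The weight is passed in explicitly because the diff does not show how it is derived.

```python
# Keys that this commit moves under `filters` (as seen in the YAML diffs).
FILTER_KEYS = ("min_date", "max_date", "exclude")


def separate_filters(old_sample, weight):
    """Return a new-style sample: filters nested, max_sequences dropped."""
    new_sample = {}
    filters = {k: old_sample[k] for k in FILTER_KEYS if k in old_sample}
    if filters:
        new_sample["filters"] = filters
    if "group_by" in old_sample:
        new_sample["group_by"] = list(old_sample["group_by"])
    new_sample["weight"] = weight
    return new_sample


# Old-style `focal` sample from subsampling/africa_all-time.yaml.
old_focal = {
    "group_by": ["country", "year", "month"],
    "max_sequences": 3200,
    "exclude": ["region!=Africa"],
}

new_focal = separate_filters(old_focal, weight=4)
print(new_focal)
```

Keeping the filter keys in one nested mapping makes it clear which settings select records and which (group_by, weight) control how the selected records are sampled.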
81 changes: 45 additions & 36 deletions subsampling/asia_1m.yaml
@@ -1,77 +1,86 @@
+size: 4375
 samples:
   asia_early:
+    filters:
+      max_date: 1M
+      exclude:
+      - region!=Asia
+      - country=China
+      - country=India
     group_by:
     - division
     - year
     - month
-    max_sequences: 300
-    max_date: 1M
-    exclude:
-    - region!=Asia
-    - country=China
-    - country=India
+    weight: 12
   china_early:
+    filters:
+      max_date: 1M
+      exclude:
+      - country!=China
     group_by:
     - division
     - year
     - month
-    max_sequences: 200
-    max_date: 1M
-    exclude:
-    - country!=China
+    weight: 8
   india_early:
+    filters:
+      max_date: 1M
+      exclude:
+      - country!=India
     group_by:
     - division
     - year
     - month
-    max_sequences: 200
-    max_date: 1M
-    exclude:
-    - country!=India
+    weight: 8
   context_early:
+    filters:
+      max_date: 1M
+      exclude:
+      - region=Asia
     group_by:
     - country
     - year
     - month
-    max_sequences: 175
-    max_date: 1M
-    exclude:
-    - region=Asia
+    weight: 1
   asia_recent:
+    filters:
+      min_date: 1M
+      exclude:
+      - region!=Asia
+      - country=China
+      - country=India
     group_by:
     - division
     - year
     - month
-    max_sequences: 1200
-    min_date: 1M
-    exclude:
-    - region!=Asia
-    - country=China
-    - country=India
+    weight: 48
   china_recent:
+    filters:
+      min_date: 1M
+      exclude:
+      - country!=China
     group_by:
     - division
     - year
     - month
-    max_sequences: 800
-    min_date: 1M
-    exclude:
-    - country!=China
+    weight: 32
   india_recent:
+    filters:
+      min_date: 1M
+      exclude:
+      - country!=India
     group_by:
     - division
     - year
     - month
-    max_sequences: 800
-    min_date: 1M
-    exclude:
-    - country!=India
+    weight: 32
   context_recent:
+    filters:
+      min_date: 1M
+      exclude:
+      - region=Asia
     group_by:
     - country
     - year
     - month
-    max_sequences: 700
-    min_date: 1M
-    exclude:
-    - region=Asia
+    weight: 4