This project explores high-level colour patterns present in Instagram posts with the hashtag #selfcare. To this end, it compares pixel colour values of #selfcare-tagged images and generic images.
The code is written in Python and is distributed under the GPL-3.0 License.
For this experiment, two datasets have been created: one containing Instagram images with the hashtag #selfcare and the other containing generic Instagram images.
Read more to prepare your dataset.
A total of 3526 images have been retrieved, mostly from the following days:
- 2021-01-07
- 2021-01-08
- 2021-01-10
However, other dates are also present. Details on the date occurrences can be found in this file.
A total of 3526 images have been retrieved. They come from different hashtags: #tbt, #followme, #repost, #photooftheday, #picoftheday, #follow, #like4like, #nature, #instagood, #instadaily, #instagram, #happy. Data was retrieved on different dates; the specific date occurrences can be found in this file.
We deemed that the images tagged with these 12 hashtags present a wide variety of imagery that may be representative of Instagram as a whole. The hashtags have been obtained from this list of the most used Instagram hashtags.
For both datasets (selfcare and generic):
- Download images: Images are downloaded from Instagram posts with specific hashtags using the instaloader package.
- Process images: Near-square images are resized into (100, 100) pixel images.
- Build collage: A collage is built from all (100, 100) processed images. Example here.
- Extract palette: Finally, the colour palette is extracted from the previously generated collage using the colorgram.py package.
Finally, once results for both datasets are obtained:
- Comparison: Palettes obtained from both datasets are compared.
In the following, results obtained from both datasets are presented.
Find below a graph with the most descriptive 10-colour palette of the selfcare dataset. The horizontal axis shows the RGB colour codes and the vertical axis quantifies the relative share of importance of each palette component (i.e. the higher the bar, the more present a colour is in the dataset). We refer to the latter as relative importance.
Note: The relative importance measures the proportion of all images with a given colour. It is normalized so that the relative importance values of the palette colours add up to 1.
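The normalization mentioned in the note can be sketched as follows. This is a minimal illustration rather than the project's code: the function name `normalise_importance` and the raw proportions in the usage line are hypothetical.

```python
def normalise_importance(raw_proportions):
    """Rescale non-negative proportions so that they sum to 1."""
    total = sum(raw_proportions)
    if total <= 0:
        raise ValueError("proportions must have a positive sum")
    return [p / total for p in raw_proportions]


# Hypothetical raw proportions for a three-colour palette.
relative_importance = normalise_importance([0.30, 0.15, 0.05])
```

After this step each value can be read directly as a colour's share of the palette, which is what makes the bars of two different datasets comparable.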
The table below shows the relative importance values:
RGB colour | Relative importance |
---|---|
(240, 232, 223) | 0.299 |
(186, 159, 134) | 0.170 |
(121, 93, 72) | 0.111 |
(37, 26, 19) | 0.097 |
(216, 226, 236) | 0.072 |
(240, 224, 231) | 0.064 |
(230, 241, 236) | 0.051 |
(21, 28, 44) | 0.050 |
(135, 165, 189) | 0.047 |
(72, 97, 124) | 0.038 |
Likewise, the following graph shows the same results for the generic dataset.
The table below shows the relative importance values:
RGB colour | Relative importance |
---|---|
(181, 157, 134) | 0.169 |
(119, 92, 72) | 0.163 |
(237, 230, 220) | 0.162 |
(36, 25, 18) | 0.157 |
(21, 27, 42) | 0.084 |
(212, 223, 234) | 0.063 |
(139, 163, 184) | 0.060 |
(75, 96, 119) | 0.056 |
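One possible way to compare the two palettes is to pair each selfcare colour with its nearest generic colour and look at the difference in relative importance. The sketch below uses the values from the two tables above; the choice of Euclidean distance in RGB space and the helper names `nearest_colour` and `compare_palettes` are assumptions, since the comparison method used by the project is not described here.

```python
import math

# Palettes copied from the tables above: RGB colour -> relative importance.
SELFCARE = {
    (240, 232, 223): 0.299, (186, 159, 134): 0.170, (121, 93, 72): 0.111,
    (37, 26, 19): 0.097, (216, 226, 236): 0.072, (240, 224, 231): 0.064,
    (230, 241, 236): 0.051, (21, 28, 44): 0.050, (135, 165, 189): 0.047,
    (72, 97, 124): 0.038,
}
GENERIC = {
    (181, 157, 134): 0.169, (119, 92, 72): 0.163, (237, 230, 220): 0.162,
    (36, 25, 18): 0.157, (21, 27, 42): 0.084, (212, 223, 234): 0.063,
    (139, 163, 184): 0.060, (75, 96, 119): 0.056,
}


def nearest_colour(colour, palette):
    """Return the palette colour closest to `colour` in RGB space."""
    return min(palette, key=lambda other: math.dist(colour, other))


def compare_palettes(first, second):
    """Pair each colour in `first` with its nearest colour in `second`.

    Returns a list of (colour, nearest colour, importance difference) tuples.
    """
    rows = []
    for colour, importance in first.items():
        match = nearest_colour(colour, second)
        rows.append((colour, match, round(importance - second[match], 3)))
    return rows


for colour, match, diff in compare_palettes(SELFCARE, GENERIC):
    print(colour, "->", match, f"{diff:+.3f}")
```

Euclidean RGB distance is simple but not perceptually uniform, so a perceptual colour space may give more meaningful pairings.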
The core code of the project lives in folder scripts, where multiple scripts are found.
Make sure to have Python installed.
$ pip install -r requirements.txt
This project was developed using Python 3.8.
Use the script download_images.py. By default, images are stored under data/original (make sure it exists).
$ python scripts/download_images.py
Use the script process_images.py. By default, images are stored under data/processed (make sure it exists).
$ python scripts/process_images.py
This script resizes the images to 224x224 pixels. To minimize the impact of resizing (which can lead to noticeable distortions), only near-square images are used.
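The near-square filter and resize step could look roughly like the sketch below. It is not the code of process_images.py: the `tolerance` threshold, the function names, and the use of Pillow for resizing are all assumptions made for illustration.

```python
def is_near_square(width, height, tolerance=0.1):
    """Return True when the aspect ratio is within `tolerance` of 1:1.

    `tolerance` is a hypothetical threshold, not the project's value.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    ratio = max(width, height) / min(width, height)
    return ratio <= 1 + tolerance


def resize_if_near_square(src_path, dst_path, size=(224, 224)):
    """Resize a near-square image to `size`; return False if it was skipped."""
    # Pillow is imported lazily so the aspect-ratio check has no dependency.
    from PIL import Image

    with Image.open(src_path) as img:
        if not is_near_square(*img.size):
            return False
        img.convert("RGB").resize(size).save(dst_path)
    return True
```

Skipping images that are far from square avoids stretching them, at the cost of discarding part of the dataset.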
Use the script build_collage.py.
$ python scripts/build_collage.py
By default, the generated collage is stored as results/collage.jpg.
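As an illustration of how equally sized tiles can be arranged into a collage, the sketch below computes a near-square grid and the pixel offset of each tile. The layout used by build_collage.py is not documented here, so the row-by-row arrangement and the names `grid_shape` and `tile_offsets` are assumptions.

```python
import math


def grid_shape(n_images):
    """Return (rows, cols) of a near-square grid that holds n_images tiles."""
    if n_images < 1:
        raise ValueError("need at least one image")
    cols = math.isqrt(n_images - 1) + 1  # ceil(sqrt(n_images)) without float error
    rows = -(-n_images // cols)          # ceil(n_images / cols)
    return rows, cols


def tile_offsets(n_images, tile_size):
    """Yield the top-left (x, y) pixel offset of each tile, row by row."""
    _, cols = grid_shape(n_images)
    for index in range(n_images):
        row, col = divmod(index, cols)
        yield col * tile_size, row * tile_size
```

With Pillow, each processed image could then be pasted onto a blank canvas of `cols * tile_size` by `rows * tile_size` pixels at these offsets.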
Use the script get_palette.py.
$ python scripts/get_palette.py
This will do the following (by default):
- Obtain a 10-colour palette and store it as results/palette_rgb_codes.csv.
- Generate the colour palette and save the image as results/palette.png.
- Generate the colour palette bar plot, illustrating presence rate, and save the image as results/palette_proportion.png.
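The first of these steps can be sketched as below, assuming the palette comes from `colorgram.extract` and is written with the standard `csv` module. The function names and the CSV column names are hypothetical and may differ from what get_palette.py produces.

```python
import csv
import io


def extract_palette(image_path, n_colours=10):
    """Return [(r, g, b, proportion), ...] for the image using colorgram.py."""
    import colorgram  # third-party package: pip install colorgram.py

    colours = colorgram.extract(image_path, n_colours)
    return [(c.rgb.r, c.rgb.g, c.rgb.b, c.proportion) for c in colours]


def write_palette_csv(palette, handle):
    """Write palette rows to an open text handle; column names are hypothetical."""
    writer = csv.writer(handle)
    writer.writerow(["r", "g", "b", "proportion"])
    writer.writerows(palette)


# Usage with two colours from the selfcare table, written to an in-memory buffer.
buffer = io.StringIO()
write_palette_csv([(240, 232, 223, 0.299), (186, 159, 134, 0.170)], buffer)
```

Writing to a handle rather than a fixed path keeps the sketch easy to try without touching the results folder.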
Use the script get_stats.py.
$ python scripts/get_stats.py
By default, it saves results as results/stats_dates.csv.
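A minimal sketch of how date occurrences could be counted is shown below. It assumes each downloaded file name starts with its post date in YYYY-MM-DD form; how get_stats.py actually derives the dates is not described here, and the file names in the example are made up.

```python
import re
from collections import Counter

# Matches a leading date such as "2021-01-07".
DATE_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2})")


def count_dates(filenames):
    """Count how many file names start with each YYYY-MM-DD date."""
    counts = Counter()
    for name in filenames:
        match = DATE_PREFIX.match(name)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical file names for illustration.
example = [
    "2021-01-07_10-15-30_UTC.jpg",
    "2021-01-07_11-02-44_UTC.jpg",
    "2021-01-08_09-00-01_UTC.jpg",
    "notes.txt",
]
date_counts = count_dates(example)
```

Files without a leading date are ignored rather than raising an error, so stray files in the image folder do not break the count.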