asianamericancv19archiveproject/cv19archiveprojectdata

The Asian American CV19 Archive Project: GitHub Data Repository

Note that the archive is no longer being updated with new data. It is provided as is for research and historical purposes, as was the original intent. The hope is that this project can, in some small way, spur other community projects to share their data.

The GitHub repository for the Asian American CV19 Archive Project (http://aacv19projectv2.azurewebsites.net/) is divided into the following folders:

  • datafiles: This directory contains the main index JSON file, which lists all archived content.
  • images: This directory holds all images that could be archived.
  • htmlfiles: This directory contains all source data that could be archived as HTML files.
  • pdfs: This directory holds all source data that could be archived as PDF files.
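
As a rough illustration of how the index in datafiles might be read, here is a minimal Python sketch. The README does not name the index file or describe its top-level structure, so the sketch makes two assumptions: it picks the first *.json file it finds in datafiles, and it expects that file to hold a JSON array of entries. The function name load_index and the file name index.json used in the demo are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def load_index(repo_root: Path) -> List[dict]:
    """Load entries from the first *.json file in datafiles/ (assumed layout)."""
    candidates = sorted((repo_root / "datafiles").glob("*.json"))
    if not candidates:
        raise FileNotFoundError("no JSON file found in datafiles/")
    # "utf-8-sig" reads plain UTF-8 and also tolerates a leading byte-order mark.
    with candidates[0].open(encoding="utf-8-sig") as handle:
        data = json.load(handle)
    # Assumption: the index is a top-level JSON array of entry objects.
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of entries")
    return data


# Demo against a throwaway directory that mimics the folder layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "datafiles").mkdir()
    # "index.json" is a made-up file name used only for this demo.
    (root / "datafiles" / "index.json").write_text(
        json.dumps([{"Title": "example", "Url": "https://example.org/"}]),
        encoding="utf-8",
    )
    entries = load_index(root)

print(len(entries))
```

With a real clone, repo_root would be the path to the checked-out repository.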

Format of JSON entries

All entries follow the same format as the example below:

{
    "Title": "Asian man waiting for NYC subway spit on, threatened in coronavirus hate crime",
    "Url": "https://www.nydailynews.com/coronavirus/ny-coronavirus-hate-crime-brooklyn-subway-spit-20200325-h4w4nzb74fbadpx6li4f7xdoc4-story.html",
    "ImgUrl": "89e6033e-c1e5-4dbc-a03d-39a0e42666da.jpg",
    "SiteName": "Daily News",
    "ArticleDate": "2020-03-25T05:00:00Z",
    "HTMLArchiveFileName": "eee15083-68ea-4d46-9f1c-2641a94e1f15.html",
    "PDFArchiveFileName": "eee15083-68ea-4d46-9f1c-2641a94e1f15.html.pdf"
}

The ImgUrl, HTMLArchiveFileName, and PDFArchiveFileName values do not correspond to the content titles or URLs; they are generated as random GUIDs. The HTML and PDF archive file names of an entry are related, however: both contain the same GUID, and the PDF name is simply the HTML file name with a ".pdf" extension appended.
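
The entry format and the naming relationship between the HTML and PDF archives can be sketched in Python as follows. The field names come from the example entry; the ArchiveEntry class and the helper functions parse_entry and archive_names_match are hypothetical names introduced only for illustration. The sketch also assumes that a field may be missing or empty when an asset could not be archived.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ArchiveEntry:
    title: str
    url: str
    img_url: Optional[str]
    site_name: Optional[str]
    article_date: Optional[datetime]
    html_archive: Optional[str]
    pdf_archive: Optional[str]


def parse_entry(raw: dict) -> ArchiveEntry:
    """Convert one raw JSON object into an ArchiveEntry."""
    date_text = raw.get("ArticleDate")
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so replace it with an explicit UTC offset.
    article_date = (
        datetime.fromisoformat(date_text.replace("Z", "+00:00")) if date_text else None
    )
    return ArchiveEntry(
        title=raw.get("Title", ""),
        url=raw.get("Url", ""),
        img_url=raw.get("ImgUrl") or None,
        site_name=raw.get("SiteName") or None,
        article_date=article_date,
        html_archive=raw.get("HTMLArchiveFileName") or None,
        pdf_archive=raw.get("PDFArchiveFileName") or None,
    )


def archive_names_match(entry: ArchiveEntry) -> bool:
    """True when the PDF name is the HTML file name plus ".pdf"."""
    if not entry.html_archive or not entry.pdf_archive:
        return False
    return entry.pdf_archive == entry.html_archive + ".pdf"


SAMPLE = """
{
    "Title": "Asian man waiting for NYC subway spit on, threatened in coronavirus hate crime",
    "Url": "https://www.nydailynews.com/coronavirus/ny-coronavirus-hate-crime-brooklyn-subway-spit-20200325-h4w4nzb74fbadpx6li4f7xdoc4-story.html",
    "ImgUrl": "89e6033e-c1e5-4dbc-a03d-39a0e42666da.jpg",
    "SiteName": "Daily News",
    "ArticleDate": "2020-03-25T05:00:00Z",
    "HTMLArchiveFileName": "eee15083-68ea-4d46-9f1c-2641a94e1f15.html",
    "PDFArchiveFileName": "eee15083-68ea-4d46-9f1c-2641a94e1f15.html.pdf"
}
"""

entry = parse_entry(json.loads(SAMPLE))
print(entry.article_date.isoformat())  # 2020-03-25T05:00:00+00:00
print(archive_names_match(entry))      # True
```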

External Files

Assets such as images and the HTML and PDF archives are stored in their respective directories. However, not every entry has an image, and not every entry could be archived as an HTML or PDF document.
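
Because some assets may be absent, code that reads the archive should probably check for each file before using it. The following Python sketch maps an entry's asset fields to paths in the images, htmlfiles, and pdfs folders and returns None for anything that is missing. The function name resolve_assets is hypothetical, and the README does not say how a missing asset is represented in the index, so the sketch treats an absent, null, or empty field and a file that is not on disk the same way.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional

# Entry field -> repository folder, following the folder list in this README.
ASSET_DIRS = {
    "ImgUrl": "images",
    "HTMLArchiveFileName": "htmlfiles",
    "PDFArchiveFileName": "pdfs",
}


def resolve_assets(entry: dict, repo_root: Path) -> Dict[str, Optional[Path]]:
    """Return a path per asset field, or None when the asset is unavailable."""
    resolved: Dict[str, Optional[Path]] = {}
    for field, folder in ASSET_DIRS.items():
        name = entry.get(field)
        path = repo_root / folder / name if name else None
        # Keep the path only if the file really exists in the checkout.
        resolved[field] = path if path is not None and path.is_file() else None
    return resolved


# Demo against a throwaway directory: only the image exists on disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "images" / "89e6033e-c1e5-4dbc-a03d-39a0e42666da.jpg").touch()
    sample_entry = {
        "ImgUrl": "89e6033e-c1e5-4dbc-a03d-39a0e42666da.jpg",
        "HTMLArchiveFileName": "eee15083-68ea-4d46-9f1c-2641a94e1f15.html",
        "PDFArchiveFileName": None,
    }
    found = resolve_assets(sample_entry, root)
    summary = {field: path is not None for field, path in found.items()}

print(summary)
```

In the demo only the image is reported as available, since the HTML file is not on disk and the PDF field is empty.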
