This repo provides lists of four-digit SIC codes scraped from the websites of two government agencies: the SEC and OSHA. The cleaned lists can be downloaded here and here, respectively, and refresh instructions can be found below.
The Standard Industrial Classification (SIC) is a system used to classify businesses by their primary business activity, or industry. The SIC system was created in the 1930s and has since been replaced as the industry classification system for Federal statistical agencies; however, it is still widely used by many businesses and by some government agencies.
SIC codes were once maintained and assigned by the US government. I've found that only two government agencies currently publish a list of SIC codes and descriptions:
Source | Version | Use case |
---|---|---|
Occupational Safety & Health Administration (OSHA) | 1987 SIC manual | Unknown |
U.S. Securities and Exchange Commission (SEC) | No version provided, but the SEC website indicates the webpage was last modified January 25, 2015 | Used in EDGAR electronic filings |
The SIC codes provided by the SEC generally align with those provided by OSHA; however, OSHA's SIC manual is more comprehensive and contains many more SIC codes than the SEC's list.
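One way to check this overlap yourself is a set comparison of the two code lists. The sketch below is illustrative only: `compare_code_lists` is a hypothetical helper that is not part of this repo, and the sample codes are placeholders rather than the actual scraped data.

```python
def compare_code_lists(sec_codes, osha_codes):
    """Summarize how two collections of four-digit SIC code strings overlap."""
    sec, osha = set(sec_codes), set(osha_codes)
    return {
        "shared": sorted(sec & osha),      # codes present in both lists
        "sec_only": sorted(sec - osha),    # codes only the SEC list has
        "osha_only": sorted(osha - sec),   # codes only the OSHA list has
    }


# Placeholder inputs for illustration; substitute the codes from the two lists.
sample_sec = ["0100", "7372"]
sample_osha = ["0111", "0112", "7372"]
summary = compare_code_lists(sample_sec, sample_osha)
```

Codes are compared as strings, so a code such as `0100` is not confused with `100`.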
There are a number of online sources that provide SIC codes and descriptions, though I've found none that provide all of the following:
- The source of their data
- Their code, if relevant
- Machine-readable data
Taken together, these are important for assessing data quality and reliability. The purpose of this repository is to provide SIC codes in adherence with these standards.
The latest data can be found in the root directory. To refresh:
- Install Python 2.7
- Install Python requirements:
$ pip install -r requirements.txt
- From the command line, run:
$ python src/main.py
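Once refreshed, the lists can be read with Python's standard `csv` module. This is a minimal sketch, assuming the output is a CSV file with a header row; the column names `SIC Code` and `Description` and the helper `load_sic_codes` are assumptions, not something this repo confirms. The sketch is written for Python 3, whereas the refresh script itself targets Python 2.7.

```python
import csv
import io


def load_sic_codes(file_obj, code_col="SIC Code", desc_col="Description"):
    """Return a dict mapping SIC code strings to descriptions.

    The column names are assumptions; adjust them to match the actual header.
    """
    reader = csv.DictReader(file_obj)
    # Keep codes as text: converting to int would turn "0100" into 100.
    return {row[code_col].strip(): row[desc_col].strip() for row in reader}


# Stand-in for an opened file, with placeholder rows for illustration only.
sample = io.StringIO(
    "SIC Code,Description\n"
    "0100,Example industry A\n"
    "7372,Example industry B\n"
)
codes = load_sic_codes(sample)
```

With a real file, pass the result of `open(path, newline="")` in place of the `io.StringIO` stand-in.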