Mission Statement
What does DRAT stand for?
Distributed Release Audit Tool. Built on the shoulders of Apache Creadur's Release Audit Tool (RAT), this project aims to scale license checks out to very large code bases.
What does DRAT want?
The Distributed Release Audit Tool (DRAT) improves on the Apache RAT code audit tool in several ways. RAT is a command-line tool, Java API, and Maven plugin that audits a code base against its declared OSS license: if you say the code is Apache2, RAT checks whether your source files actually carry that license and produces a report stating which files do or don't, and why. RAT has several problems, namely:
- It doesn't scale to large code bases: running it on a code base of 25k files and 10M LOC took ~4 weeks on an ordinary Linux server with 5GB of memory, plenty of disk space, and modern CPUs.
- RAT's crawler is rudimentary: you have to supply explicit white/black lists of files to skip, or it will end up checking binary files for licenses.
- RAT doesn't produce incremental output: it either completes and generates a log, or it doesn't.
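To make the second and third problems concrete, here is a toy per-file audit in Python. It is a minimal sketch under loud assumptions, not how RAT works internally: RAT uses its own configurable license matchers, whereas this sketch looks for a single hypothetical marker string (`APACHE2_MARKER`) and uses a crude NUL-byte heuristic to skip binary files. The names `looks_binary`, `audit_file`, and `audit_tree` are illustrative only. Because `audit_tree` is a generator, results stream out one file at a time instead of arriving only when the whole run finishes.

```python
from pathlib import Path
from typing import Iterator, Tuple

# Hypothetical marker; real RAT relies on configurable license matchers, not one string.
APACHE2_MARKER = "Licensed under the Apache License, Version 2.0"
HEADER_BYTES = 8192  # only sample the start of each file, where license headers live


def looks_binary(head: bytes) -> bool:
    """Crude heuristic (assumption): any NUL byte in the sampled prefix means binary."""
    return b"\x00" in head


def audit_file(path: Path) -> str:
    """Classify one file as 'binary', 'approved' or 'unknown' using toy rules."""
    with path.open("rb") as fh:
        head = fh.read(HEADER_BYTES)
    if looks_binary(head):
        return "binary"  # skipped automatically, no explicit black list needed
    text = head.decode("utf-8", errors="replace")
    # Collapse runs of whitespace so indentation and line breaks do not matter.
    normalized = " ".join(text.split())
    return "approved" if APACHE2_MARKER in normalized else "unknown"


def audit_tree(root: Path) -> Iterator[Tuple[Path, str]]:
    """Walk a directory and yield (path, status) as each file is audited."""
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield path, audit_file(path)


if __name__ == "__main__":
    for path, status in audit_tree(Path(".")):
        print(f"{status:9} {path}")  # incremental, per-file output
```

Streaming results per file is the simplest form of incremental output; DRAT goes further by batching files and distributing the work, as described next.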
DRAT improves upon RAT by addressing all of the above concerns. DRAT is a Map Reduce version of RAT: it uses Apache Tika to automatically sort and classify the files in the code base, and Apache OODT to index metadata and Tika information about those files into Apache Solr. OODT then drives a Map Reduce workflow that runs RAT incrementally on k-sized chunks of files sharing the same MIME type (as detected by Tika), producing incremental, per-type logs that are finally aggregated and reduced into a single combined log.
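The partition, chunk, and reduce shape of that workflow can be sketched in a few lines of Python. This is a hedged, single-process illustration, not DRAT's implementation: it does not model OODT or Solr, it swaps in Python's standard `mimetypes` module (extension-based guessing) as a stand-in for Tika's content-based detection, and it takes the per-file audit as a hypothetical callable rather than invoking RAT. `detect_type`, `map_phase`, and `reduce_phase` are illustrative names.

```python
import mimetypes
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical per-file audit: path -> status such as "approved" or "unknown".
Audit = Callable[[str], str]


def detect_type(path: str) -> str:
    """Stand-in for Tika (assumption): guess a MIME type from the file name only."""
    mime, _ = mimetypes.guess_type(path)
    return mime or "application/octet-stream"


def chunk(items: List[str], k: int) -> Iterator[List[str]]:
    """Split a list into consecutive batches of at most k items."""
    for i in range(0, len(items), k):
        yield items[i:i + k]


def map_phase(paths: Iterable[str], k: int, audit: Audit) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Group files by MIME type, then audit each k-sized batch independently.

    Each yielded (mime, log) pair is a partial, per-type log. Batches do not
    depend on one another, so a cluster could run them in parallel.
    """
    by_type: Dict[str, List[str]] = defaultdict(list)
    for p in paths:
        by_type[detect_type(p)].append(p)
    for mime in sorted(by_type):
        for batch in chunk(sorted(by_type[mime]), k):
            yield mime, {p: audit(p) for p in batch}


def reduce_phase(partials: Iterable[Tuple[str, Dict[str, str]]]) -> Dict[str, Dict[str, int]]:
    """Merge partial logs into one combined summary: MIME type -> status -> count."""
    combined: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for mime, log in partials:
        for status in log.values():
            combined[mime][status] += 1
    return {mime: dict(counts) for mime, counts in combined.items()}
```

For example, with `k=2` the files `a.txt`, `b.txt`, `c.txt`, and `d.html` become three batches (two of plain text, one of HTML). Each batch's log is available as soon as that batch finishes, which is what makes the output incremental; `reduce_phase` only produces the final combined view.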
What's the status of the project?
As of September 2017, the project was granted top-level status after being developed for a while on GitHub.