A CDX file creator written in Scala for Spark

jhzab/CDXCreator

# CDXCreator

CDXCreator is a Spark application that creates CDX files from (W)ARC files.

It must be run under Spark and takes two arguments: '--input' and
'--output'. The input can also be a glob. Output is written in a "CSV" format
with a single space (" ") as the separator. The output filenames are random.
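Because the output filenames are random, downstream code has to discover them rather than hard-code them. The following is a minimal sketch of reading such an output directory back in with plain Scala (2.13 or later, no Spark needed). It is not part of CDXCreator: the object `ReadCdxOutput` and its methods are hypothetical names, the directory is assumed to be on the local filesystem, and files whose names start with `_` or `.` are assumed to be Hadoop-style marker or checksum files rather than data. Lines are split naively on single spaces, so a quoted field containing a space would not be handled.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of CDXCreator.
object ReadCdxOutput {

  // List the data files in an output directory, sorted by name.
  // Assumption: names starting with "_" or "." are markers/checksums, not data.
  def dataFiles(dir: Path): Seq[Path] = {
    val stream = Files.list(dir)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .filter { p =>
          val name = p.getFileName.toString
          !name.startsWith("_") && !name.startsWith(".")
        }
        .toVector
        .sortBy(_.getFileName.toString)
    } finally stream.close()
  }

  // Split one output line on single spaces, keeping empty trailing fields.
  def splitLine(line: String): Array[String] = line.split(" ", -1)

  // Read every non-empty line of every data file as an array of fields.
  def readRecords(dir: Path): Seq[Array[String]] =
    dataFiles(dir).flatMap { file =>
      Files.readAllLines(file).asScala.filter(_.nonEmpty).map(splitLine)
    }
}
```

Calling `ReadCdxOutput.readRecords(Paths.get("/path/to/output"))` would then return one field array per CDX line, whatever the individual files happen to be called.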

The files currently **don't** have a CDX header.
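Tools that require a header can have one added after the fact. The sketch below (plain Scala 2.13 or later) concatenates a list of output files into a single file and writes a caller-supplied header as the first line. It is an illustration, not part of CDXCreator: `AddCdxHeader` and `withHeader` are hypothetical names, and the header text is left to the caller because this README does not state which fields are written or in what order.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of CDXCreator.
object AddCdxHeader {

  // Write `header` as the first line of `target`, followed by every line
  // of each file in `parts`, in the order given.
  // Assumption: the caller knows the header that matches the written columns.
  def withHeader(header: String, parts: Seq[Path], target: Path): Unit = {
    val writer = Files.newBufferedWriter(target, UTF_8)
    try {
      writer.write(header)
      writer.write("\n")
      parts.foreach { part =>
        Files.readAllLines(part, UTF_8).asScala.foreach { line =>
          writer.write(line)
          writer.write("\n")
        }
      }
    } finally writer.close()
  }
}
```

The whole of each input file is read into memory before being written, which keeps the sketch short but would need to be replaced by streaming for very large outputs.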
