akora/digital-vault-archive-builder

Digital Vault Archive Builder

A Python tool for organizing and maintaining a digital file vault. It detects file types, extracts metadata, and renames files according to consistent naming conventions.

Features

  1. Intelligent File Processing

    • Specialized processors for images, videos, PDFs, and text files
    • Automatic file type detection and categorization
    • Consistent file naming across all formats
    • Metadata extraction and preservation
    • Duplicate detection using XXH64 checksums
  2. Hierarchical Organization

    • Configurable date-based directory structure (none/year/month/day)
    • Type-based categorization (images, videos, documents, notes)
    • Smart subcategorization (e.g., scanned vs regular documents)
  3. Metadata Management

    • EXIF data extraction from images
    • Video technical metadata (resolution, fps, HDR)
    • PDF metadata and document type detection
    • Text file analysis (word count, frontmatter parsing)
    • Preservation of creation dates and timestamps
  4. Database Integration

    • SQLite database for file tracking
    • Checksum-based duplicate prevention
    • Quick file lookup and metadata querying
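
The checksum-based duplicate prevention described above could be sketched roughly as follows. This is not the project's actual code: the table layout and the function names (file_checksum, open_db, register_file) are assumptions made for illustration. XXH64 comes from the third-party xxhash package; if it is not installed, the sketch falls back to a 64-bit BLAKE2b digest from the standard library so that it still runs.

```python
import hashlib
import sqlite3

try:
    import xxhash  # third-party package that provides xxh64
except ImportError:  # fallback so the sketch runs without xxhash
    xxhash = None


def file_checksum(path, chunk_size=1 << 20):
    """Hash a file in chunks: XXH64 if available, else 64-bit BLAKE2b."""
    hasher = xxhash.xxh64() if xxhash else hashlib.blake2b(digest_size=8)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def open_db(db_path=":memory:"):
    """Open the tracking database (hypothetical schema: one row per checksum)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " checksum TEXT PRIMARY KEY,"
        " path TEXT NOT NULL)"
    )
    return conn


def register_file(conn, path):
    """Return True if the file is new, False if its checksum is already known."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO files (checksum, path) VALUES (?, ?)",
        (file_checksum(path), str(path)),
    )
    conn.commit()
    return cur.rowcount == 1
```

Making the checksum the primary key lets SQLite reject duplicates itself, so no separate lookup is needed before each insert.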

Setup

  1. Clone this repository
  2. Create a virtual environment: python -m venv venv
  3. Activate the virtual environment:
    • On Unix/macOS: source venv/bin/activate
    • On Windows: .\venv\Scripts\activate
  4. Install dependencies: pip install -r requirements.txt

System Requirements

  • Python 3.8 or higher
  • External dependencies:
    • exiftool for image metadata extraction
    • ffprobe (part of ffmpeg) for video metadata extraction
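
Before a first run, it can help to confirm that both external tools are on the PATH. The helper below is a minimal sketch, not part of the repository; the name missing_tools is hypothetical.

```python
import shutil

# External tools named in the requirements above.
REQUIRED_TOOLS = ("exiftool", "ffprobe")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing external tools:", ", ".join(missing))
    else:
        print("All external tools found.")
```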

Configuration

Edit config.py to customize:

  • Input directory (INBOX_DIR)
  • Vault directory (VAULT_DIR)
  • Date hierarchy level:
    • DATE_HIERARCHY_NONE: flat structure
    • DATE_HIERARCHY_YEAR: year folders
    • DATE_HIERARCHY_MONTH: year/month folders
    • DATE_HIERARCHY_DAY: year/month/day folders
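
To illustrate how the four hierarchy levels could translate into folder paths, here is a small sketch. The constant values, the function name dated_subdir, and the choice to nest date folders under the category folder are all assumptions; the real definitions are in config.py and may differ.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Assumed values for illustration; the real constants are defined in config.py.
DATE_HIERARCHY_NONE = "none"
DATE_HIERARCHY_YEAR = "year"
DATE_HIERARCHY_MONTH = "month"
DATE_HIERARCHY_DAY = "day"


def dated_subdir(category, when, level):
    """Build a relative path such as images/2024/03/09 for the given level."""
    parts = [category]
    if level in (DATE_HIERARCHY_YEAR, DATE_HIERARCHY_MONTH, DATE_HIERARCHY_DAY):
        parts.append(f"{when:%Y}")
    if level in (DATE_HIERARCHY_MONTH, DATE_HIERARCHY_DAY):
        parts.append(f"{when:%m}")
    if level == DATE_HIERARCHY_DAY:
        parts.append(f"{when:%d}")
    return PurePosixPath(*parts)


if __name__ == "__main__":
    when = datetime(2024, 3, 9)
    for level in (DATE_HIERARCHY_NONE, DATE_HIERARCHY_YEAR,
                  DATE_HIERARCHY_MONTH, DATE_HIERARCHY_DAY):
        print(level, "->", dated_subdir("images", when, level))
```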

File Type Support

Images

  • Regular formats (stored in images/): jpg, jpeg, png, gif, bmp, webp, tiff, tif
  • RAW formats (stored in photos/raw/): heic, arw (Sony), cr2 (Canon), nef (Nikon), raf (Fuji), dng, raw

Naming format:

d[YYYYMMDD]-t[HHMMSS]-s[BYTES]-r[WIDTHxHEIGHT][-sc######][-MAKE-MODEL].ext
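
As a rough sketch, the image naming pattern above could be assembled like this. The function name image_name and its parameters are hypothetical. The README does not say what the optional sc field represents, so the sketch only assumes it is a number zero-padded to six digits, as the ###### placeholder suggests.

```python
from datetime import datetime


def image_name(taken, size_bytes, width, height, ext,
               sc=None, make=None, model=None):
    """Assemble d[YYYYMMDD]-t[HHMMSS]-s[BYTES]-r[WxH][-sc######][-MAKE-MODEL].ext"""
    parts = [
        f"d{taken:%Y%m%d}",
        f"t{taken:%H%M%S}",
        f"s{size_bytes}",
        f"r{width}x{height}",
    ]
    if sc is not None:
        parts.append(f"sc{sc:06d}")  # assumed: six-digit, zero-padded
    if make and model:
        parts.append(f"{make}-{model}")
    return "-".join(parts) + "." + ext.lower().lstrip(".")


if __name__ == "__main__":
    print(image_name(datetime(2024, 3, 9, 14, 5, 7), 123456, 4000, 3000, "JPG"))
```

The video, PDF, and text patterns in the following sections share the same d/t/s prefix, so the same approach would apply with different trailing fields.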

Videos

  • Formats: mp4, mov, avi, mkv, wmv, flv, webm, mpg, mpeg, mts, m2ts, m4v, 3gp

Naming format:

d[YYYYMMDD]-t[HHMMSS]-s[BYTES]-[RESOLUTION]-[FPS]-[DURATION][-HDR].ext

PDFs

  • Categories: ebooks, scanned documents, regular PDFs

Naming format:

[scanned-]d[YYYYMMDD]-t[HHMMSS]-s[BYTES]-p[PAGES][-TITLE/SCANNER].pdf

Text Files

  • Formats: txt, md
  • Support for YAML frontmatter in markdown
  • Automatic title extraction from frontmatter or content

Naming format:

d[YYYYMMDD]-t[HHMMSS]-s[BYTES]-w[WORDCOUNT][-TITLE].ext
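
Title extraction and word counting for text files might work along these lines. This is a simplified sketch that uses only the standard library: it recognizes a plain "title: value" line in the frontmatter rather than parsing full YAML, and the function names are hypothetical.

```python
import re


def split_frontmatter(text):
    """Split a leading '---' delimited block from the body: (frontmatter, body)."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text


def extract_title(text):
    """Prefer a frontmatter title; otherwise use the first non-empty body line."""
    frontmatter, body = split_frontmatter(text)
    # Handles only a simple top-level "title: value" line, not full YAML.
    match = re.search(r"^title:\s*(.+)$", frontmatter, flags=re.MULTILINE)
    if match:
        return match.group(1).strip().strip("\"'")
    for line in body.splitlines():
        stripped = line.strip().lstrip("#").strip()
        if stripped:
            return stripped
    return None


def word_count(text):
    """Count whitespace-separated tokens in the body, excluding frontmatter."""
    _, body = split_frontmatter(text)
    return len(body.split())
```

Note that this counts Markdown markers such as a standalone # as tokens; a real implementation might strip markup first.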

Usage

Basic Operation

# Process new files in inbox
python vault_builder.py

# Clean up inbox (remove empty dirs and hidden files)
python inbox_cleaner.py
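
For readers curious what the cleanup step involves, the sketch below removes hidden files and then any directories left empty. It illustrates the general technique only and is not the contents of inbox_cleaner.py; the function name clean_inbox is hypothetical. It deletes files, so try it on a scratch directory first.

```python
import os


def clean_inbox(inbox_dir):
    """Delete hidden files, then remove directories left empty.

    Returns the list of removed paths. The inbox directory itself is kept.
    """
    inbox_dir = os.fspath(inbox_dir)
    removed = []
    # Walk bottom-up so child directories are emptied before their parents.
    for root, _dirs, files in os.walk(inbox_dir, topdown=False):
        for name in files:
            if name.startswith("."):
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(path)
        if root != inbox_dir and not os.listdir(root):
            os.rmdir(root)
            removed.append(root)
    return removed
```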

File Organization

Files are organized into these main directories:

  • images/: Regular image files (jpg, png, etc.)
  • photos/raw/: RAW photo files (arw, cr2, nef, etc.)
  • videos/: All video files
  • documents/: PDFs and other documents
    • documents/scanned/: Scanned documents
    • documents/ebooks/: Detected ebooks
  • notes/: Text and markdown files
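
The extension-to-directory mapping above can be summarized in a short lookup. This sketch uses the extension lists from the File Type Support section; the function name target_directory is hypothetical, and it does not attempt the scanned or ebook detection, which depends on PDF content rather than the extension.

```python
from pathlib import PurePosixPath

IMAGE_EXTS = {"jpg", "jpeg", "png", "gif", "bmp", "webp", "tiff", "tif"}
RAW_EXTS = {"heic", "arw", "cr2", "nef", "raf", "dng", "raw"}
VIDEO_EXTS = {"mp4", "mov", "avi", "mkv", "wmv", "flv", "webm",
              "mpg", "mpeg", "mts", "m2ts", "m4v", "3gp"}
TEXT_EXTS = {"txt", "md"}


def target_directory(filename):
    """Return the top-level vault directory for a file, or None if unsupported."""
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    if ext in IMAGE_EXTS:
        return "images"
    if ext in RAW_EXTS:
        return "photos/raw"
    if ext in VIDEO_EXTS:
        return "videos"
    if ext == "pdf":
        return "documents"  # scanned/ebook subfolders need content inspection
    if ext in TEXT_EXTS:
        return "notes"
    return None
```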

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request
