Storia-AI/screenplay-pdf-to-json


Screenplay Parser

Parse PDF screenplays into a rich JSON format.

Install

pip install screenplay-pdf-to-json

Package Dependencies

Contributing

Clone this repository and run the following:

pipenv install

# or

pip3 install -r requirements.txt

Usage

As a CLI:

python $PATH_OF_PACKAGE/src/convert.py -s path_of_screenplay.pdf --start page_number_to_start_analyzing

As a library:

from screenplay_pdf_to_json import convert
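# Open the PDF in binary mode; the context manager closes the file afterwards.
with open('screenplay.pdf', 'rb') as fp:
    # 0 is the page to start analyzing from, as with --start in the CLI.
    scriptJSON = convert(fp, 0)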
print(scriptJSON)
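
To keep the parsed result instead of printing it, the list can be written to a file with the standard json module. This is a minimal sketch that assumes the value returned by convert is JSON-serializable (a list of dictionaries, as described under "JSON structure"); save_script_json and sample_pages are hypothetical names used only for illustration, with sample_pages standing in for the real return value.

```python
import json


def save_script_json(pages, out_path):
    """Write parsed pages to out_path as indented UTF-8 JSON.

    Hypothetical helper, not part of screenplay-pdf-to-json.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        # ensure_ascii=False keeps curly quotes and accents readable
        json.dump(pages, out, ensure_ascii=False, indent=2)


# Stand-in for the list returned by convert(fp, 0) (assumed shape)
sample_pages = [{
    "page": 1,
    "scene": [{
        "type": "TRANSITION",
        "content": {"text": "SMASH TO:", "metadata": {"x": 448, "y": 720}},
    }],
}]

if __name__ == "__main__":
    save_script_json(sample_pages, "screenplay.json")
```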

Notes

  • Works well for "clean" (text-based) PDF screenplays, but not for scanned PDFs that rely on OCR.

  • Production screenplays work fairly well.

JSON structure

[{
  // page number
  "page": 1,
  // scene info
  "scene_info": {
    "region":  "EXT.",  //region of scene [EXT., INT., EXT./INT, INT./EXT]
    "location":  "VILLA",
    "time": ["DAY"] // time of scene [DAY, NIGHT, DAWN, DUSK, ...]
  },
  "scene": [{
    "type":  "ACTION",  // type of snippet [ACTION, CHARACTER, TRANSITION, DUAL_DIALOGUE]
    "content": {...} // content differs based on type
  }, {...}]
}, {...}]
  • The initial pages of a screenplay (title page, TOC, cast list, ...) are included as type FIRST_PAGES.

  • The top level is an array of page dictionaries rather than a single JSON object.
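
As an illustration of walking this structure, the sketch below builds a scene heading for each page from its scene_info. It assumes pages have exactly the shape shown above; scene_headings and sample_pages are hypothetical names, and pages without scene_info (for example FIRST_PAGES material) are skipped.

```python
def scene_headings(pages):
    """Return (page number, heading) pairs built from each page's scene_info.

    Hypothetical helper; assumes the page shape documented in this README.
    """
    headings = []
    for page in pages:
        info = page.get("scene_info")
        if not info:
            continue  # no scene header on this page
        parts = [info.get("region") or "", info.get("location") or ""]
        heading = " ".join(p for p in parts if p)
        times = ", ".join(info.get("time") or [])
        if times:
            heading += " - " + times
        headings.append((page.get("page"), heading))
    return headings


# Sample data in the documented shape
sample_pages = [{
    "page": 1,
    "scene_info": {"region": "EXT.", "location": "VILLA", "time": ["DAY"]},
    "scene": [],
}]

if __name__ == "__main__":
    for number, heading in scene_headings(sample_pages):
        print(number, heading)
```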

Type Content Structure

  • ACTION
"content": [{
  "text":  "an action paragraph",
  "x": 108,
  "y": 120 // y-coordinate of the last line in the paragraph
}, {...}]
  • CHARACTER
"content": {
  "character":  "MILES",
  "modifier": null,  // V.O., O.S., and more; null if no modifier
  "dialogue": [
  "Hey good morning. How you doing?... Weekend was short, huh? ",
  "(he turns to another kid)",  // parentheticals are separated
  " Oh my gosh this is embarrassing, we wore the same jacket--"
  ]
}
  • DUAL_DIALOGUE
"content": {
  "character1": {
    "character": {
      "character":  "PETER",
      "modifier": null
    },
    "dialogue": [
      "(groggy)",
      " Why are you trying to kill me?--"
    ]
  },
  "character2": {
    "character": {
      "character":  "MILES",
      "modifier":  "CONT'D"
    },
    "dialogue": [
    "--I’m not! I’m trying to save you!"
    ]
  }
}
  • TRANSITION
"content": {
  "text":  "SMASH TO:",
  "metadata": {
    "x": 448,
    "y": 720
  }
}
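
Because CHARACTER and DUAL_DIALOGUE nest the speaker differently, code that gathers dialogue has to handle both shapes. The following is a rough sketch under the assumption that content matches the structures listed above; dialogue_by_character and sample_pages are hypothetical names, and parentheticals are dropped by checking for surrounding parentheses.

```python
from collections import defaultdict


def dialogue_by_character(pages):
    """Group spoken lines by character name, skipping parentheticals.

    Hypothetical helper; assumes the content shapes documented above.
    """
    lines = defaultdict(list)
    for page in pages:
        for snippet in page.get("scene", []):
            kind = snippet.get("type")
            content = snippet.get("content")
            if kind == "CHARACTER":
                speakers = [(content["character"], content["dialogue"])]
            elif kind == "DUAL_DIALOGUE":
                # Each side nests the name one level deeper
                speakers = [
                    (content[key]["character"]["character"], content[key]["dialogue"])
                    for key in ("character1", "character2")
                ]
            else:
                continue  # ACTION, TRANSITION, FIRST_PAGES carry no dialogue
            for name, dialogue in speakers:
                for line in dialogue:
                    text = line.strip()
                    is_parenthetical = text.startswith("(") and text.endswith(")")
                    if text and not is_parenthetical:
                        lines[name].append(text)
    return dict(lines)


# Sample data reusing the examples above
sample_pages = [{
    "page": 1,
    "scene": [
        {"type": "CHARACTER", "content": {
            "character": "MILES", "modifier": None,
            "dialogue": ["Hey good morning. ", "(he turns to another kid)", " Oh my gosh--"],
        }},
        {"type": "DUAL_DIALOGUE", "content": {
            "character1": {"character": {"character": "PETER", "modifier": None},
                           "dialogue": ["(groggy)", " Why are you trying to kill me?--"]},
            "character2": {"character": {"character": "MILES", "modifier": "CONT'D"},
                           "dialogue": ["--I'm not! I'm trying to save you!"]},
        }},
    ],
}]

if __name__ == "__main__":
    print(dialogue_by_character(sample_pages))
```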

Run tests

python -m pytest tests/

Notes

  • Run poetry install outside of the poetry shell, then enter the shell and run the script.

Todos

  • Add unit tests

  • Skip to start of screenplay

  • More documentation

  • Add option to use as a library

  • Detect end of screenplay

Author

👤 Egan Bisma

Show your support

Give a ⭐️ if this project helped you!

