Working with PDFs can be a huge drag. With AI, we can take PDFs and extract custom JSON data which make them much easier to work with.
Check out a live deployment of this app at jsonify.org.
Tech stack includes Next.js, Tailwind, OpenAI, and Typescript. Note that this app does not use LangChain, Pinecone, or even GPT-4 (it is using the GPT-3.5 model). While you would likely want to consider these on a more robust app, the purpose of this app is to simply demonstrate the "JSONification of a PDF" process.
Get in touch via twitter if you have questions
- Clone the repo
git clone https://github.com/AustonianTX/pdf-to-json.git
- Install packages
npm install
- Set up your
.env
file
- Create a
.env
file, and add your OpenAI API key
OPENAI_API_KEY=
- Visit openai to retrieve API keys and insert into your
.env
file.
One you have installed the dependenies, you can run the app using npm run dev
to launch the local dev environment. From there you can define a schema, upload your PDF file(s), and parse them into JSON.
In general, keep an eye out in the issues
and discussions
section of this repo for solutions.
The frontend of this repo was built with Austin Taylor at the Austin AI X Foundation Capital hackathon.