Resume Info Extractor

Resume Info Extractor is a serverless application that automatically extracts information from PDF resumes uploaded to an Amazon S3 bucket. It uses AWS Lambda, the OpenAI API, and MongoDB Atlas to process resumes and store the resulting structured data. The project is bootstrapped with the AWS SAM CLI and includes a Lambda layer containing the pdf-dist Node module, which handles PDF extraction.

How it works

Upload a PDF: Resumes are uploaded to an S3 bucket, which triggers a Lambda function.
Automatic Processing: The Lambda function extracts data using the OpenAI API and stores it in MongoDB Atlas.
Retrieve Candidate Info: Extracted details include:
- Personal Info: Name, Email, Phone Number
- Experience: Total years, Employment History
- Education: Degrees, Schools, Timelines
- Skills & Languages
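
The details listed above could be stored as one document per resume. The following is a minimal sketch of one possible shape, not the project's actual schema: the helper `toCandidateRecord` and every field name in it (`personalInfo`, `experience`, `education`, `skills`, `languages`) are assumptions made for illustration, and the sample data is made up.

```javascript
// Hypothetical helper: normalizes raw extracted output into a candidate record.
// All field names are illustrative assumptions, not the project's schema.
function toCandidateRecord(raw = {}) {
  const asArray = (value) => (Array.isArray(value) ? value : []);
  const asString = (value) => (typeof value === "string" ? value.trim() : "");

  const personal = raw.personalInfo || {};
  const experience = raw.experience || {};

  return {
    personalInfo: {
      name: asString(personal.name),
      email: asString(personal.email).toLowerCase(),
      phone: asString(personal.phone),
    },
    experience: {
      // Fall back to 0 when the value is missing or not a finite number.
      totalYears: Number.isFinite(experience.totalYears) ? experience.totalYears : 0,
      employmentHistory: asArray(experience.employmentHistory),
    },
    education: asArray(raw.education),
    skills: asArray(raw.skills),
    languages: asArray(raw.languages),
  };
}

// Example: a JSON string such as a model might return (made-up data).
const sample = JSON.parse(
  '{"personalInfo":{"name":" Jane Doe ","email":"Jane@Example.com"},"skills":["Node.js"]}'
);
console.log(toCandidateRecord(sample));
```

Filling missing fields with empty strings and arrays keeps every stored document in the same shape, which tends to make later queries simpler.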

Prerequisites

- An AWS account with appropriate permissions.
- The AWS SAM CLI installed locally (see the AWS SAM CLI Installation Guide).
- A MongoDB Atlas cluster URL (MONGO_URI) for storing extracted data.
- An OpenAI API key (OPENAI_API_KEY) for calling the OpenAI API.

Installation

Clone the repository:

```
git clone https://github.com/yaldram/resume-extractor.git
cd resume-extractor
```

Install dependencies and build the project using SAM CLI:
```
npm install
sam build
```
Deploy the application using SAM CLI:
```
sam deploy --guided
```

Environment Variables

Set the following environment variables in your AWS Lambda environment:

- MONGO_URI: MongoDB Atlas cluster URL for storing extracted data.
- OPENAI_API_KEY: OpenAI API key for accessing the OpenAI API.
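
Because the function depends on both variables, it can help to check them once at startup and fail with a clear message. This is a minimal sketch under that assumption; the helper `loadConfig` and the returned property names are hypothetical, and the values in the example are placeholders rather than real credentials.

```javascript
// Names of the environment variables the application expects.
const REQUIRED_VARS = ["MONGO_URI", "OPENAI_API_KEY"];

// Hypothetical helper: reads settings from an env-like object
// (defaults to process.env) and throws if any are missing or empty.
function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return { mongoUri: env.MONGO_URI, openAiApiKey: env.OPENAI_API_KEY };
}

// Example with placeholder values (not real credentials).
const config = loadConfig({
  MONGO_URI: "mongodb+srv://user:pass@cluster.example.mongodb.net/resumes",
  OPENAI_API_KEY: "sk-placeholder",
});
console.log(Object.keys(config));
```

Passing the environment in as an argument keeps the check easy to test without touching the real `process.env`.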

Folder Structure

- extractor/: Lambda function for PDF parsing and data extraction.
- pdfdist-layer/: Lambda layer containing the pdf-dist module, which handles PDF extraction.

SAM Template

The SAM template.yaml file configures the AWS resources and permissions required by the application. It defines the following resources:

S3 Bucket (ResumeBucket):
- Stores uploaded PDF resumes.
Lambda Layer (PdfdistLayer):
- Layer for the pdf-dist node module.
Lambda Function (ResumeInfoExtracterFunction):
- Extracts information from uploaded PDF resumes.
- Triggered by S3 bucket upload events for .pdf files.
- Uses PdfdistLayer for PDF extraction.
- Requires read access to the specified S3 bucket (ResumeBucketName parameter).
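
When the upload trigger fires, the function receives an S3 event notification describing the uploaded objects. As a rough sketch of how a handler might read it (not the project's actual code): the helper name `pdfObjectsFromEvent` and the bucket name are hypothetical, while the `Records[].s3.bucket.name` and `Records[].s3.object.key` paths follow the standard S3 event format. The extra `.pdf` check is a defensive assumption on top of the trigger's own filter.

```javascript
// Extracts { bucket, key } pairs for PDF objects from an S3 event notification.
// Object keys arrive URL-encoded, with spaces encoded as "+".
function pdfObjectsFromEvent(event) {
  return (event.Records || [])
    .map((record) => ({
      bucket: record.s3.bucket.name,
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
    }))
    // Defensive, case-insensitive check in addition to the trigger's filter.
    .filter(({ key }) => key.toLowerCase().endsWith(".pdf"));
}

// Trimmed-down example event; real events carry many more fields.
const exampleEvent = {
  Records: [
    { s3: { bucket: { name: "my-resume-bucket" }, object: { key: "uploads/Jane+Doe+CV.pdf" } } },
    { s3: { bucket: { name: "my-resume-bucket" }, object: { key: "uploads/notes.txt" } } },
  ],
};
console.log(pdfObjectsFromEvent(exampleEvent));
```

Decoding the key before use matters because a file name containing spaces or non-ASCII characters would otherwise not match the object actually stored in the bucket.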

Issues and Troubleshooting

If you encounter issues, please check the Issues section of this repository to see if the problem has already been reported. If not, please feel free to create a new issue.

Contributing

Contributions are welcome! Please fork the repository and create a pull request with your changes. Ensure your code follows the project's coding standards and test your changes thoroughly before submitting.
