MultiFormat-Interpreter

The MultiFormat Interpreter is a Python script that automates content extraction from PDF, TXT, CSV, JSON, and XLSX files. It uses PyPDF2 for PDFs, pandas for CSV and Excel files, and Python's built-in json module for JSON files. The user supplies a file path and a prompt for code generation; the script then passes the extracted content together with the prompt to the GPT-4All model, which generates the code.

GPT-4ALL

GPT-4All is an open-source project from Nomic AI for running large language models locally on consumer hardware. Its models build on the GPT (Generative Pre-trained Transformer) family of transformer architectures and handle a range of natural language processing tasks, including code generation, text completion, and translation. Key features of GPT-4All are:

  1. Local Operation: GPT-4All operates locally without needing API calls or GPUs, enhancing accessibility and reducing reliance on cloud services.
  2. Offline Capability: Once downloaded, GPT-4All works offline, ensuring users can access AI-powered text generation tools even without internet connectivity.
  3. Versatile Applications: The model supports various tasks like natural language understanding, text generation, and summarization across different domains and languages.
  4. User-Friendly Interface: Designed for ease of use, GPT-4All offers an intuitive interface for inputting prompts and receiving text outputs efficiently.
  5. Privacy and Security: Because the model runs locally, prompts and file content stay on the user's machine, which helps when handling sensitive information.
  6. Customization: Users can customize GPT-4All for specific tasks, enhancing its adaptability to different applications.
  7. Continual Updates: Nomic AI and the open-source community release regular updates that improve performance and usability.

Download GPT-4All: click here

To know about Llama.cpp: click here

Objectives:

  • File Content Extraction: The script reads and extracts text content from PDF, TXT, CSV, JSON, and XLSX files.
  • Code Generation: Using the extracted content and a user-provided prompt, the script generates code with the GPT-4All model.

Features:

  1. File Type Support:

    • PDF: Extracts text from PDF documents using the PyPDF2 library.
    • TXT: Reads text files directly.
    • CSV: Reads and converts CSV file content to a string format using the pandas library.
    • JSON: Reads and pretty-prints JSON data.
    • XLSX: Reads and converts Excel file content to a string format using the pandas library.
  2. User Interaction:

    • Prompts the user to enter the path to the file they wish to process.
    • Displays the content of the selected file.
    • Asks the user for a prompt to guide the code generation.
  3. Code Generation:

    • Uses the GPT-4All model to generate code based on the combined input of file content and user prompt.
    • Outputs the generated code for the user.
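
The file type support listed above could be organized as a dispatch on the file extension. The following is a minimal sketch, not the notebook's actual code: the function names (`read_file_content`, `read_pdf`, and so on) and the `READERS` table are hypothetical, and the PDF branch assumes a PyPDF2 version that provides `PdfReader` (2.0 or later). Third-party imports are kept inside the functions that need them, so TXT and JSON files can be read even when pandas or PyPDF2 is not installed.

```python
import json
from pathlib import Path


def read_txt(path):
    """Return the text file's content unchanged."""
    return Path(path).read_text(encoding="utf-8")


def read_json(path):
    """Parse the JSON file and return it pretty-printed."""
    with open(path, encoding="utf-8") as f:
        return json.dumps(json.load(f), indent=2)


def read_csv(path):
    """Load the CSV with pandas and render the table as a string."""
    import pandas as pd  # imported lazily; only needed for CSV/XLSX
    return pd.read_csv(path).to_string()


def read_xlsx(path):
    """Load the first worksheet with pandas and render it as a string."""
    import pandas as pd
    return pd.read_excel(path).to_string()


def read_pdf(path):
    """Concatenate the extracted text of every page."""
    from PyPDF2 import PdfReader  # assumes PyPDF2 >= 2.0
    reader = PdfReader(path)
    # extract_text() may yield nothing for image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Hypothetical lookup table: file extension -> reader function
READERS = {
    ".txt": read_txt,
    ".json": read_json,
    ".csv": read_csv,
    ".xlsx": read_xlsx,
    ".pdf": read_pdf,
}


def read_file_content(path):
    """Pick a reader from the (case-insensitive) file extension."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix!r}") from None
    return reader(path)
```

A call such as `read_file_content("report.pdf")` would return the extracted text, while an unsupported extension raises `ValueError` instead of failing silently. Keeping the readers in a table means a new format only needs one function and one table entry.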

Workflow:

  1. User Input:
    • The user provides the path to a file.
    • The user inputs a prompt for code generation.
  2. Content Reading:
    • The script reads the file content based on its type.
    • The content is displayed to the user.
  3. Prompt Handling:
    • The script combines the file content with the user prompt into a single input for the GPT-4All model.
  4. Code Generation:
    • The GPT-4All model processes the input and generates relevant code.
    • The generated code is displayed to the user.
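
The prompt handling and code generation steps of this workflow might look like the sketch below. It is illustrative only: `build_prompt`, `generate_code`, the prompt layout, and the `max_chars` truncation limit are all assumptions, not taken from the notebook. The generation step uses the `GPT4All` class and its `generate` method from the gpt4all Python package; the model file name is left as a caller-supplied argument because this README does not say which model the notebook loads.

```python
def build_prompt(file_content, user_prompt, max_chars=4000):
    """Combine file content and the user's request into one model input.

    max_chars is a hypothetical guard that truncates long files so the
    combined prompt is less likely to exceed the model's context window.
    """
    content = file_content[:max_chars]
    return (
        "File content:\n"
        f"{content}\n\n"
        "Task:\n"
        f"{user_prompt.strip()}\n"
    )


def generate_code(prompt, model_name, max_tokens=512):
    """Run the prompt through a local GPT4All model and return its text.

    model_name must be a model file that the gpt4all package can load;
    the package downloads it on first use if it is not already present.
    """
    from gpt4all import GPT4All  # imported lazily; requires `pip install gpt4all`
    model = GPT4All(model_name)
    return model.generate(prompt, max_tokens=max_tokens)
```

Separating prompt construction from generation keeps the first function free of any model dependency, so the prompt format can be inspected or adjusted without loading a model.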

Prerequisites:

  • Python 3.x
  • Required libraries: PyPDF2, pandas, gpt4all (json is part of the Python standard library and needs no installation)
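
Before running the notebook, it can help to confirm that the third-party libraries are importable. The helper below is a small sketch using only the standard library; the name `missing_packages` and the `REQUIRED` list are hypothetical.

```python
from importlib.util import find_spec

# Third-party packages the script depends on (json ships with Python)
REQUIRED = ["PyPDF2", "pandas", "gpt4all"]


def missing_packages(names=REQUIRED):
    """Return the subset of names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```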

Installation:

  • Clone the repository:

    git clone https://github.com/Ayushverma135/MultiFormat-Interpreter.git
    cd MultiFormat-Interpreter
    
  • Install required Python libraries:

    • Ensure that the required libraries (PyPDF2, pandas, gpt4all) are imported at the beginning of the notebook.

    • If not already installed, install the libraries directly in a code cell at the beginning of the notebook:

      !pip install PyPDF2 pandas gpt4all
      

Usage:

  • Open and Run the Notebook:
    • Open the Google Colab notebook MultiFormat Interpreter.ipynb in your browser.
    • Run each cell sequentially to execute the script.

Contributing

Contributions to enhance or fix issues in the Google Colab notebook are welcome! Please fork the repository and submit pull requests for any improvements.