A powerful and user-friendly web application that converts various file formats (Excel, CSV, PDF, TXT) into JSON format compatible with OpenAI's knowledge base structure. Perfect for preparing training data or creating custom knowledge bases for AI applications.
- 📊 Multiple Format Support
  - Excel (.xlsx, .xls)
  - CSV files
  - PDF documents
  - Text files (.txt)
- 🎯 Smart Conversion
  - Intelligent text chunking for large documents
  - Maintains document structure
  - Preserves metadata
  - Generates unique document IDs
- 💅 Modern UI/UX
  - Clean, responsive design
  - Real-time file information
  - Progress feedback
  - Error handling with clear messages
- 🛠 Developer Friendly
  - Well-structured codebase
  - Modular architecture
  - Easy to extend
  - Comprehensive documentation
Before you begin, ensure you have the following installed:
- Node.js (version 14.0.0 or higher)
- npm (usually comes with Node.js)
- Clone the repository
  git clone https://github.com/akelaonline/csv-to-json.git
  cd csv-to-json
- Install dependencies
  npm install
- Start the server
  npm start
- Access the application
  Open your browser and navigate to:
  http://localhost:3001
- Click the "Choose File" button to select your file
- Select any supported file (Excel, CSV, PDF, or TXT)
- Click "Convert to JSON" to process the file
- View the converted JSON in the result area
- Use the "Download JSON" button to save the result
Upload and convert a file to JSON format.
Request:
- Method: POST
- Content-Type: multipart/form-data
- Body:
  - file: Your file (Excel, CSV, PDF, or TXT)
Response:
[
  {
    "id": "doc_0",
    "text": "Extracted content from the file",
    "metadata": {
      "sourceType": "excel|csv|pdf|txt",
      "filename": "original_filename.ext",
      "uploadDate": "2025-02-15T19:17:53.000Z",
      // Additional metadata specific to file type
    }
  }
]
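As a rough illustration of the request described above, here is a minimal client sketch. It assumes Node.js 18 or newer (for the global `fetch`, `FormData`, and `Blob`), which is newer than the minimum version this README lists for the server. The endpoint path is not stated in this README, so the URL is left as a parameter; `buildUploadForm` and `convertFile` are hypothetical helper names, not part of the project.

```javascript
// Sketch of a client for the upload endpoint (Node.js 18+).
// ASSUMPTION: the endpoint URL is supplied by the caller; this README does not state the path.

// Build the multipart body with the single "file" field the API expects.
function buildUploadForm(content, filename) {
  const form = new FormData();
  form.append('file', new Blob([content]), filename);
  return form;
}

// POST the form and return the parsed JSON array of documents.
async function convertFile(url, content, filename) {
  const response = await fetch(url, {
    method: 'POST',
    // fetch sets the multipart/form-data header (with boundary) for FormData bodies.
    body: buildUploadForm(content, filename),
  });
  if (!response.ok) {
    throw new Error(`Conversion failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

With the server running locally, a call might look like `convertFile(uploadUrl, 'some text', 'notes.txt')`, where `uploadUrl` is the server's upload route.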
The converter generates JSON in the following structure:
[
  {
    "id": "doc_0",
    "text": "Content chunk 1",
    "metadata": {
      "sourceType": "excel",
      "filename": "example.xlsx",
      "uploadDate": "2025-02-15T19:17:53.000Z",
      "sheetName": "Sheet1",
      "rowCount": 100,
      "columnCount": 5
    }
  },
  {
    "id": "doc_1",
    "text": "Content chunk 2",
    "metadata": {
      // Metadata varies by file type
    }
  }
]
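If you consume this output in your own code, a small shape check can catch malformed input early. The sketch below only verifies the `id`, `text`, and `metadata` fields shown above; the function names are hypothetical and not part of the project.

```javascript
// Returns true when `entry` has the id/text/metadata shape shown above.
function isDocumentEntry(entry) {
  return (
    entry !== null &&
    typeof entry === 'object' &&
    typeof entry.id === 'string' &&
    typeof entry.text === 'string' &&
    entry.metadata !== null &&
    typeof entry.metadata === 'object' &&
    !Array.isArray(entry.metadata)
  );
}

// Returns true when `value` is an array made up only of valid document entries.
function isConverterOutput(value) {
  return Array.isArray(value) && value.every(isDocumentEntry);
}
```

This checks structure only; it does not validate the type-specific metadata fields.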
Excel files:
- sourceType: "excel"
- sheetName: Name of the worksheet
- rowCount: Number of rows
- columnCount: Number of columns
CSV files:
- sourceType: "csv"
- rowCount: Number of rows
- columnCount: Number of columns
- rowNumber: Position in original file
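To make the CSV fields concrete, here is a minimal sketch of how parsed rows could be turned into one document per row. It is not the project's actual parser: the `text` layout and the 1-based `rowNumber` are assumptions, and `csvRowsToDocuments` is a hypothetical name. The input is an array of objects keyed by column header, which is the row shape a parser such as csv-parser emits.

```javascript
// Turn parsed CSV rows into one document entry per row.
function csvRowsToDocuments(rows, filename, uploadDate = new Date().toISOString()) {
  // Column count is taken from the first row's keys (0 for an empty file).
  const columnCount = rows.length > 0 ? Object.keys(rows[0]).length : 0;
  return rows.map((row, index) => ({
    id: `doc_${index}`,
    // ASSUMPTION: "column: value" pairs joined by commas; the real text format may differ.
    text: Object.entries(row)
      .map(([key, value]) => `${key}: ${value}`)
      .join(', '),
    metadata: {
      sourceType: 'csv',
      filename,
      uploadDate,
      rowCount: rows.length,
      columnCount,
      rowNumber: index + 1, // ASSUMPTION: 1-based position in the original file
    },
  }));
}
```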
PDF files:
- sourceType: "pdf"
- pageCount: Total pages
- author: Document author (if available)
- title: Document title (if available)
- chunkIndex: Position in chunked content
- totalChunks: Total number of chunks
Text files:
- sourceType: "txt"
- chunkIndex: Position in chunked content
- totalChunks: Total number of chunks
- characterCount: Length of chunk
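The chunk-related fields can be illustrated with a simple fixed-size splitter. This is only a sketch under stated assumptions: the 1000-character default, the zero-based `chunkIndex`, and plain character slicing are all choices made for the example, and the project's own chunking may instead split on sentence or paragraph boundaries. `chunkText` is a hypothetical name.

```javascript
// Split text into fixed-size chunks and wrap each one as a document entry.
// ASSUMPTION: plain character slicing with a 1000-character default chunk size.
function chunkText(text, filename, chunkSize = 1000, uploadDate = new Date().toISOString()) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const pieces = [];
  for (let start = 0; start < text.length; start += chunkSize) {
    pieces.push(text.slice(start, start + chunkSize));
  }
  return pieces.map((piece, index) => ({
    id: `doc_${index}`,
    text: piece,
    metadata: {
      sourceType: 'txt',
      filename,
      uploadDate,
      chunkIndex: index, // ASSUMPTION: zero-based, matching the doc_0 id style
      totalChunks: pieces.length,
      characterCount: piece.length,
    },
  }));
}
```

Every chunk carries the same `totalChunks`, so a consumer can reassemble the original text by sorting on `chunkIndex`.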
├── index.js                  # Application entry point
├── routes/
│   └── fileRoutes.js         # Route definitions
├── controllers/
│   └── fileController.js     # Request handling logic
├── utils/
│   ├── parseExcel.js         # Excel file parser
│   ├── parseCsv.js           # CSV file parser
│   ├── parsePdf.js           # PDF file parser
│   └── parseTxt.js           # Text file parser
└── public/
    └── index.html            # Web interface
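One plausible way to connect an uploaded file to the parsers listed above is a lookup keyed by file extension. The sketch below is an assumption about how such a dispatch could work, not a description of what `fileController.js` actually does; `PARSER_MODULES` and `parserModuleFor` are hypothetical names, and only the module paths come from the tree.

```javascript
const path = require('path');

// Extension -> parser module, using the file names from the tree above.
const PARSER_MODULES = {
  '.xlsx': './utils/parseExcel.js',
  '.xls': './utils/parseExcel.js',
  '.csv': './utils/parseCsv.js',
  '.pdf': './utils/parsePdf.js',
  '.txt': './utils/parseTxt.js',
};

// Returns the parser module path for a filename, or null if the type is unsupported.
function parserModuleFor(filename) {
  const extension = path.extname(filename).toLowerCase();
  return PARSER_MODULES[extension] ?? null;
}
```

Keeping the mapping in one table means supporting a new format only requires a new parser file and one extra entry.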
- express: Web application framework
- multer: File upload handling
- xlsx: Excel file parsing
- csv-parser: CSV file parsing
- pdf-parse: PDF file parsing
- File size limit: 5MB
- Supported file types only
- Temporary file cleanup
- Error handling for malformed files
- No system file access outside upload directory
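The size and file-type rules above can be expressed as a small check, sketched here as a plain function so it runs without any dependencies. It assumes that 5MB means 5 * 1024 * 1024 bytes and that the type check is based on the file extension; `validateUpload` is a hypothetical name. The `originalname` and `size` properties mirror the file object multer provides, and multer itself offers the `limits.fileSize` and `fileFilter` options for enforcing rules like these during upload.

```javascript
const path = require('path');

// ASSUMPTION: the 5MB limit is interpreted as 5 * 1024 * 1024 bytes.
const MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024;
const ALLOWED_EXTENSIONS = new Set(['.xlsx', '.xls', '.csv', '.pdf', '.txt']);

// Check an uploaded file's extension and size.
// `file` is an object with `originalname` and `size` (in bytes).
function validateUpload(file) {
  const extension = path.extname(file.originalname).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(extension)) {
    return { ok: false, reason: `Unsupported file type: ${extension || '(none)'}` };
  }
  if (file.size > MAX_FILE_SIZE_BYTES) {
    return { ok: false, reason: 'File exceeds the 5MB limit' };
  }
  return { ok: true };
}
```

An extension check alone does not prove a file's real content type, which is why the parsers still need their own error handling for malformed files.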
We welcome contributions! Please follow these steps:
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Need help? Have questions? Here's how to get support:
- Create an Issue
- Email: [your-email@example.com]
- Documentation: Wiki
- Thanks to all contributors
- Inspired by OpenAI's knowledge base format
- Built with modern web technologies
- Add support for more file formats
- Implement batch processing
- Add custom chunking options
- Create API authentication
- Add file compression support
- Implement real-time conversion progress
- Add custom metadata fields
Made with ❤️ by [Your Name/Team]