This repository contains tools that convert chat session data from JSON into several CSV layouts and, with the Go program, into a JSON format for Hugging Face datasets. It provides a Bash script and a Go program, each offering multiple output options to suit different data processing and readability needs.
The tools process a JSON file containing chat session data, which includes fields such as `id`, `topic`, `memoryPrompt`, and a nested `messages` array with message metadata. Both the Bash script and the Go program offer four distinct output options for CSV formatting:
- Inline Formatting: All messages are included in a single cell for each session.
- One Message Per Line: Each message is placed on a new line with session context repeated.
- Separate Files for Sessions and Messages: Two CSV files are created; one for session metadata and one for messages.
- JSON String in CSV: Messages are stored as a JSON string in a single cell, preserving the array structure.
Additionally, the Go program can convert the sessions into a JSON format suitable for use as a Hugging Face dataset.
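
The exact JSON layout the Go program emits for Hugging Face is not documented here. As a minimal sketch, assuming a JSON Lines layout with one session per line (a layout the `datasets` library's `json` loader can read), the conversion could look like the following. The `Message` and `Session` types and the `toJSONLines` function are hypothetical names chosen for illustration, and the record shape is an assumption.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
)

// Message and Session are hypothetical types. The field names follow this
// README; the record shape chosen for the dataset is an assumption.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type Session struct {
	ID           string    `json:"id"`
	Topic        string    `json:"topic"`
	MemoryPrompt string    `json:"memoryPrompt"`
	Messages     []Message `json:"messages"`
}

// toJSONLines encodes each session as one JSON object per line.
func toJSONLines(sessions []Session) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, s := range sessions {
		// Encode appends a newline after each object.
		if err := enc.Encode(s); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	sessions := []Session{{
		ID:           "8dgQves8ClEy0T4vfHjLs",
		Topic:        "New Conversation",
		MemoryPrompt: "Example prompt",
		Messages: []Message{
			{Role: "user", Content: "hello"},
			{Role: "assistant", Content: "Hello! How can I assist you today?"},
		},
	}}

	out, err := toJSONLines(sessions)
	if err != nil {
		fmt.Fprintln(os.Stderr, "encode error:", err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

If the output were saved to a file such as `sessions.jsonl` (a hypothetical name), it could then be loaded in Python with `load_dataset("json", data_files="sessions.jsonl")`.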
Below is an example of what the CSV output might look like for each format option:

Inline Formatting:

id | topic | memoryPrompt | messages |
---|---|---|---|
8dgQves8ClEy0T4vfHjLs | New Conversation | Example prompt | '[user, 11/27/2023, 10:14:00 AM] "hello"; [assistant, 11/27/2023, 10:14:00 AM] "Hello! How can I assist you today?"' |
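
The inline cell above can be built by joining one formatted entry per message. The following is a minimal sketch, not the repository's actual implementation: the `Message` type and `inlineMessages` function are hypothetical, the separator and bracket layout are inferred from the sample row, and the outer single quotes shown in the sample are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical type holding the fields shown in the sample row.
type Message struct {
	Date    string
	Role    string
	Content string
}

// inlineMessages joins messages as `[role, date] "content"` entries
// separated by "; " so that a whole session fits in one CSV cell.
func inlineMessages(msgs []Message) string {
	parts := make([]string, 0, len(msgs))
	for _, m := range msgs {
		parts = append(parts, fmt.Sprintf("[%s, %s] \"%s\"", m.Role, m.Date, m.Content))
	}
	return strings.Join(parts, "; ")
}

func main() {
	msgs := []Message{
		{Date: "11/27/2023, 10:14:00 AM", Role: "user", Content: "hello"},
		{Date: "11/27/2023, 10:14:00 AM", Role: "assistant", Content: "Hello! How can I assist you today?"},
	}
	fmt.Println(inlineMessages(msgs))
}
```

This layout is compact and easy to read, but it is hard to parse back into individual messages, since message content may itself contain `; ` or quotation marks.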

One Message Per Line:

session_id | message_id | date | role | content | memoryPrompt |
---|---|---|---|---|---|
8dgQves8ClEy0T4vfHjLs | ZKSQGCgGKgrtBCSoqLhFe | 11/27/2023, 10:14:00 AM | user | hello | Example prompt |
8dgQves8ClEy0T4vfHjLs | S7DZB9nPoMk4Go_30zESE | 11/27/2023, 10:14:00 AM | assistant | Hello! How can I assist you today? | Example prompt |
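
Producing one row per message amounts to flattening the nested `messages` array while repeating the session fields on every row. The sketch below shows one way this might be done with Go's standard `encoding/json` and `encoding/csv` packages. The `Message` and `Session` types, the `flattenSessions` function, and the assumption that the input file is a top-level JSON array of sessions are all illustrative and not taken from `main.go`.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"os"
)

// Message and Session mirror the fields named in this README.
// The JSON tags and the top-level array layout are assumptions.
type Message struct {
	ID      string `json:"id"`
	Date    string `json:"date"`
	Role    string `json:"role"`
	Content string `json:"content"`
}

type Session struct {
	ID           string    `json:"id"`
	Topic        string    `json:"topic"`
	MemoryPrompt string    `json:"memoryPrompt"`
	Messages     []Message `json:"messages"`
}

// flattenSessions returns a header row followed by one row per message,
// repeating the session context on every row.
func flattenSessions(sessions []Session) [][]string {
	rows := [][]string{{"session_id", "message_id", "date", "role", "content", "memoryPrompt"}}
	for _, s := range sessions {
		for _, m := range s.Messages {
			rows = append(rows, []string{s.ID, m.ID, m.Date, m.Role, m.Content, s.MemoryPrompt})
		}
	}
	return rows
}

func main() {
	input := `[{"id":"8dgQves8ClEy0T4vfHjLs","topic":"New Conversation","memoryPrompt":"Example prompt","messages":[{"id":"ZKSQGCgGKgrtBCSoqLhFe","date":"11/27/2023, 10:14:00 AM","role":"user","content":"hello"}]}]`

	var sessions []Session
	if err := json.Unmarshal([]byte(input), &sessions); err != nil {
		fmt.Fprintln(os.Stderr, "parse error:", err)
		os.Exit(1)
	}

	// WriteAll writes every record and flushes the writer.
	w := csv.NewWriter(os.Stdout)
	if err := w.WriteAll(flattenSessions(sessions)); err != nil {
		fmt.Fprintln(os.Stderr, "write error:", err)
		os.Exit(1)
	}
}
```

Because the date field contains a comma, the CSV writer wraps it in quotation marks in the real output; the table above shows the logical columns only.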

Separate Files for Sessions and Messages:

sessions.csv:
id | topic | memoryPrompt | ... |
---|---|---|---|
8dgQves8ClEy0T4vfHjLs | New Conversation | Example prompt | ... |

messages.csv:
session_id | message_id | date | role | content | memoryPrompt |
---|---|---|---|---|---|
8dgQves8ClEy0T4vfHjLs | ZKSQGCgGKgrtBCSoqLhFe | 11/27/2023, 10:14:00 AM | user | hello | Example prompt |
8dgQves8ClEy0T4vfHjLs | S7DZB9nPoMk4Go_30zESE | 11/27/2023, 10:14:00 AM | assistant | Hello! How can I assist you today? | Example prompt |

JSON String in CSV:

id | topic | memoryPrompt | messages |
---|---|---|---|
8dgQves8ClEy0T4vfHjLs | New Conversation | Example prompt | [{"id": "ZKSQGCgGKgrtBCSoqLhFe", "date": "11/27/2023, 10:14:00 AM", "role": "user", "content": "hello"}, {"id": "S7DZB9nPoMk4Go_30zESE", "date": "11/27/2023, 10:14:00 AM", "role": "assistant", "content": "Hello! How can I assist you today?"}] |

Note: "..." represents other columns that would be present in the CSV but are omitted here for brevity.
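
For the JSON String in CSV option, the `messages` array has to be serialized to a string and then safely embedded in a CSV field. The sketch below illustrates the idea with Go's standard library; `messagesCell` and `csvLine` are hypothetical helper names, not functions from this repository. Note that `json.Marshal` emits compact JSON, so the whitespace differs from the sample row above.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
	"os"
)

// Message is a hypothetical type mirroring the message fields in the sample.
type Message struct {
	ID      string `json:"id"`
	Date    string `json:"date"`
	Role    string `json:"role"`
	Content string `json:"content"`
}

// messagesCell serializes the messages as a compact JSON array.
func messagesCell(msgs []Message) (string, error) {
	b, err := json.Marshal(msgs)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// csvLine renders one CSV record. encoding/csv quotes any field that
// contains commas or quotation marks and doubles the embedded quotes.
func csvLine(fields []string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(fields); err != nil {
		return "", err
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	cell, err := messagesCell([]Message{
		{ID: "ZKSQGCgGKgrtBCSoqLhFe", Date: "11/27/2023, 10:14:00 AM", Role: "user", Content: "hello"},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "encode error:", err)
		os.Exit(1)
	}

	line, err := csvLine([]string{"8dgQves8ClEy0T4vfHjLs", "New Conversation", "Example prompt", cell})
	if err != nil {
		fmt.Fprintln(os.Stderr, "write error:", err)
		os.Exit(1)
	}
	fmt.Print(line)
}
```

A CSV reader that follows the same quoting rules returns the original JSON string, which can then be parsed back into the message array without loss.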
To use the Bash script, follow these steps:
- Clone the repository or download the `chat_session_exporter.sh` file.
- Make the script executable: `chmod +x chat_session_exporter.sh`
- Run the script and follow the prompts: `./chat_session_exporter.sh`

You will be asked to provide the path to your JSON file and to choose your preferred CSV output format. Optionally, you can save the output to a file.

The Bash script requires:
- `jq`: The script relies on the `jq` command-line JSON processor. Make sure it is installed on your system.
- Bash shell: The script is intended to be run in a Bash environment.

To use the Go program (`main.go`), follow these steps:
- Clone the repository or download the latest release.
- Ensure you have Go installed on your system. You can download it from the official Go website.
- Navigate to the directory containing `main.go` in a terminal.
- Compile the program using Go: `go build -o chat_session_exporter main.go`
- Run the compiled program and follow the prompts: `./chat_session_exporter`

You will be asked to provide the path to your JSON file and to choose your preferred output format. Optionally, you can save the output to a file.

The Go program requires:
- The Go programming language installed on your system.
- A JSON file containing the chat session data.
Contributions to improve the tools or extend their functionality are welcome. Please feel free to fork the repository and submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.