VoiceLog

A terminal-based voice memo application built with Go and Bubble Tea.

Current Version: v1.0.8 - Latest release with real-time waveform visualization and audio processing features.

Screenshots

Main Screen

The main interface showing the memo list, ASCII art speaker visualization, and help information.

Settings Screen

Audio configuration interface displaying hardware/audio settings, available devices, and help.

Features

Audio Recording and Playback

Record audio using PortAudio with real-time waveform visualization
Playback with real-time controls and waveform display
WAV file format support with automatic post-processing
Configurable audio devices and settings
Test tone generation (440Hz sine wave)
Real-time clipping detection with visual warnings
Automatic silence trimming and audio normalization
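
For readers unfamiliar with the WAV container, the following is a minimal sketch of how 16-bit PCM samples can be wrapped in a canonical 44-byte RIFF/WAVE header. It is not VoiceLog's actual writer; the function name `encodeWAV` and the mono, 16-bit layout are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodeWAV wraps mono 16-bit PCM samples in a canonical 44-byte
// RIFF/WAVE header. Hypothetical helper, not taken from VoiceLog.
func encodeWAV(samples []int16, sampleRate uint32) []byte {
	const channels, bitsPerSample = 1, 16 // assumed format
	dataSize := uint32(len(samples) * 2)
	blockAlign := uint16(channels * bitsPerSample / 8)

	var buf bytes.Buffer
	le := binary.LittleEndian
	// Writes to a bytes.Buffer cannot fail, so errors are not checked.
	buf.WriteString("RIFF")
	binary.Write(&buf, le, uint32(36+dataSize)) // size of everything after this field
	buf.WriteString("WAVE")
	buf.WriteString("fmt ")
	binary.Write(&buf, le, uint32(16)) // fmt chunk size for PCM
	binary.Write(&buf, le, uint16(1))  // audio format 1 = PCM
	binary.Write(&buf, le, uint16(channels))
	binary.Write(&buf, le, sampleRate)
	binary.Write(&buf, le, sampleRate*uint32(blockAlign)) // byte rate
	binary.Write(&buf, le, blockAlign)
	binary.Write(&buf, le, uint16(bitsPerSample))
	buf.WriteString("data")
	binary.Write(&buf, le, dataSize)
	binary.Write(&buf, le, samples)
	return buf.Bytes()
}

func main() {
	wav := encodeWAV(make([]int16, 4), 44100)
	fmt.Println(len(wav), string(wav[0:4]), string(wav[8:12]))
}
```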

Memo Management

List view with navigation
Rename memos
Add tags for organization
Delete memos
Export memos to Downloads folder
Optional transcription with multiple provider support

User Interface

Terminal user interface using Bubble Tea
Keyboard navigation
Settings screen for audio configuration and processing options
Help screen with keybindings
ASCII art speaker visualization with two-tone coloring
Professional color scheme with rounded borders
Adaptive layout with real-time audio visualizer
Real-time peak level meters and VU meters during recording

Installation

Pre-built Releases

Download the latest release from GitHub Releases:

Windows (amd64): voicelog-v1.0.8-windows-amd64.zip
Linux (amd64): voicelog-v1.0.8-linux-amd64.tar.gz

Windows Installation

Download voicelog-v1.0.8-windows-amd64.zip
Extract the archive
Run voicelog-windows-amd64.exe

Linux Installation

Download voicelog-v1.0.8-linux-amd64.tar.gz
Extract: tar -xzf voicelog-v1.0.8-linux-amd64.tar.gz
Install PortAudio: sudo apt-get install libportaudio2
Run: ./voicelog-linux-amd64

Build from Source

Prerequisites

Go 1.25 or later
PortAudio development libraries

Windows (MSYS2)

pacman -S mingw-w64-x86_64-portaudio

Linux (Ubuntu/Debian)

sudo apt-get install libportaudio2 portaudio19-dev

Build and Run

# Clone the repository
git clone https://github.com/Cod-e-Codes/voicelog.git
cd voicelog

# Download dependencies
go mod download

# Build the binary
go build -o voicelog .

# Run
./voicelog

Usage

Keybindings

Key	Action
`SPACE`	Start/Stop recording
`ENTER`	Play/Pause selected memo
`↑/↓`	Navigate memo list
`ctrl+r`	Rename memo
`ctrl+g`	Add tag
`ctrl+d`	Delete memo
`ctrl+e`	Export memo
`ctrl+x`	Stop playback
`?`	Show help
`ctrl+s`	Settings
`ctrl+t`	Transcribe selected memo
`F5`	Generate test file
`ESC/q`	Quit

Basic Operations

Recording: Press SPACE to start or stop recording
Playback: Select a memo and press ENTER to play it
Transcription: Press ctrl+t to transcribe the selected memo (optional)
Settings: Press ctrl+s to configure audio devices and transcription
Test File: Press F5 to generate a 5-second 440Hz test tone
Export: Press ctrl+e to export the selected memo to the Downloads folder
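
As an illustration of what the F5 test file contains, a 5-second 440Hz sine wave can be generated roughly as follows. This is a sketch, not VoiceLog's code: the function name `sineTone`, the 44100 Hz sample rate, and the 0.5 amplitude are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// sineTone returns a sine wave at freq Hz lasting the given number of
// seconds, as 16-bit PCM samples. amplitude is a fraction of full scale
// (0.0 to 1.0). Hypothetical helper, not taken from VoiceLog.
func sineTone(freq, seconds float64, sampleRate int, amplitude float64) []int16 {
	n := int(seconds * float64(sampleRate))
	out := make([]int16, n)
	for i := range out {
		t := float64(i) / float64(sampleRate) // time of sample i in seconds
		out[i] = int16(amplitude * math.MaxInt16 * math.Sin(2*math.Pi*freq*t))
	}
	return out
}

func main() {
	// Sample rate and amplitude are assumed values for this example.
	tone := sineTone(440, 5, 44100, 0.5)
	fmt.Println("samples:", len(tone))
}
```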

Audio Processing Features

VoiceLog includes advanced audio processing capabilities:

Real-Time Visualization

Waveform Display: Live waveform visualization during recording and playback
Peak Level Meters: Monitor input levels with color-coded peak indicators (during recording)
VU Meters: Left/right channel level monitoring (during recording)
Clipping Detection: Visual warnings when audio levels exceed thresholds (during recording)
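
The peak meter and clipping warning can be thought of as simple per-buffer computations. The sketch below assumes normalized float32 samples in the range -1.0 to 1.0; the function names and the 0.98 threshold are hypothetical and do not come from VoiceLog's source.

```go
package main

import (
	"fmt"
	"math"
)

// peakLevel returns the largest absolute sample value in the buffer.
func peakLevel(samples []float32) float64 {
	peak := 0.0
	for _, s := range samples {
		if v := math.Abs(float64(s)); v > peak {
			peak = v
		}
	}
	return peak
}

// isClipping reports whether the buffer's peak reaches the threshold.
func isClipping(samples []float32, threshold float64) bool {
	return peakLevel(samples) >= threshold
}

// toDBFS converts a linear level to decibels relative to full scale.
// Silence maps to negative infinity.
func toDBFS(level float64) float64 {
	if level <= 0 {
		return math.Inf(-1)
	}
	return 20 * math.Log10(level)
}

func main() {
	buf := []float32{0.1, -0.5, 0.25, -0.99}
	peak := peakLevel(buf)
	// 0.98 is an assumed clipping threshold for this example.
	fmt.Printf("peak=%.2f (%.1f dBFS) clipping=%v\n", peak, toDBFS(peak), isClipping(buf, 0.98))
}
```

A meter would typically call these once per audio callback and color the bar by the dBFS value.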

Automatic Post-Processing

Silence Trimming: Automatically removes silence from the beginning and end of recordings
Audio Normalization: Amplifies recordings to optimal levels (configurable target)
Configurable Thresholds: Adjust silence detection and clipping thresholds in settings
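
To make the two post-processing steps concrete, here is a minimal sketch of threshold-based silence trimming followed by peak normalization. It assumes normalized float32 samples; the function names, the threshold, and the choice of peak (rather than RMS or loudness) normalization are assumptions, since this README does not say which method VoiceLog uses.

```go
package main

import (
	"fmt"
	"math"
)

func abs32(v float32) float32 { return float32(math.Abs(float64(v))) }

// trimSilence drops leading and trailing samples whose absolute value is
// below threshold. It returns a sub-slice of the input without copying.
func trimSilence(samples []float32, threshold float32) []float32 {
	start, end := 0, len(samples)
	for start < end && abs32(samples[start]) < threshold {
		start++
	}
	for end > start && abs32(samples[end-1]) < threshold {
		end--
	}
	return samples[start:end]
}

// normalize scales samples in place so the loudest one reaches targetPeak.
// An all-zero buffer is left unchanged to avoid dividing by zero.
func normalize(samples []float32, targetPeak float32) {
	var peak float32
	for _, s := range samples {
		if a := abs32(s); a > peak {
			peak = a
		}
	}
	if peak == 0 {
		return
	}
	gain := targetPeak / peak
	for i := range samples {
		samples[i] *= gain
	}
}

func main() {
	rec := []float32{0, 0.001, 0.5, -0.25, 0.002, 0}
	trimmed := trimSilence(rec, 0.01) // 0.01 is an assumed silence threshold
	normalize(trimmed, 1.0)           // 1.0 is an assumed target peak
	fmt.Println(trimmed)
}
```

Trimming first matters: normalizing before trimming would raise the noise floor and could push quiet lead-in noise above the silence threshold.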

Adaptive Interface

Smart Layout: Interface adapts during recording/playback to show visualizer
Compact Mode: Memo list becomes compact when audio visualizer is active
Real-Time Updates: Waveform and meters update in real-time during operation

Transcription (Optional)

VoiceLog supports optional voice-to-text transcription through a plugin system. The application works fully without it.

Supported Transcription Providers

whisper.cpp (Recommended - Local & Private)
- High accuracy, supports many languages
- Runs entirely offline, no internet required
- Complete privacy: audio never leaves your machine
- Installation: github.com/ggerganov/whisper.cpp

OpenAI Whisper API (Cloud-based - Highest Accuracy)
- Highest accuracy available
- Requires an internet connection and an API key
- Installation: pip install openai
- Set the OPENAI_API_KEY environment variable

Vosk (Lightweight & Fast)
- Smaller models, faster processing
- Good for real-time applications
- Installation: alphacephei.com/vosk

Custom Python Script
- Use any transcription API (AssemblyAI, Rev.ai, etc.)
- Write your own integration script
- Full flexibility for custom workflows

Quick Setup Examples

whisper.cpp Setup (Linux/macOS):

# Clone and build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && make

# Download a model (base.en for English, base for multilingual)
./models/download-ggml-model.sh base.en

# The whisper binary will be auto-detected by VoiceLog

OpenAI Whisper API Setup:

# Install the OpenAI library
pip install openai

# Set your API key (get one from https://platform.openai.com)
export OPENAI_API_KEY="your-api-key-here"

Using Transcription

Enable in Settings: Press ctrl+s → Navigate to "Transcription:" → Toggle to ON
Select Provider: Navigate to "Default Provider:" → Choose your installed provider
Transcribe: Press ctrl+t on any memo to transcribe it
Auto-Transcribe: Enable "Auto Transcribe:" to automatically transcribe new recordings

Transcription Features

Visual Indicators: Transcribed memos show a 📝 icon in the memo list
Search Integration: Search through transcribed text using the built-in filter
Provider Status: Settings show ✓/✗ status for each provider's availability
Flexible Configuration: Each provider can be configured independently
Auto-Detection: VoiceLog automatically detects available transcription tools
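
One plausible way to implement this kind of auto-detection is to look for provider binaries on PATH and check for the API key. The sketch below is not VoiceLog's actual logic: the binary names `whisper-cli` and `vosk-transcriber` are assumptions about what a local install provides, and only the OPENAI_API_KEY variable comes from this README.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// providerStatus pairs a provider name with its detected availability.
type providerStatus struct {
	Name      string
	Available bool
}

// onPath reports whether any of the candidate binaries is found on PATH.
func onPath(candidates ...string) bool {
	for _, c := range candidates {
		if _, err := exec.LookPath(c); err == nil {
			return true
		}
	}
	return false
}

// detectProviders checks each provider. Binary names are assumptions.
func detectProviders() []providerStatus {
	return []providerStatus{
		{"whisper.cpp", onPath("whisper-cli")},
		{"OpenAI Whisper API", os.Getenv("OPENAI_API_KEY") != ""},
		{"Vosk", onPath("vosk-transcriber")},
	}
}

// mark renders availability the way the settings screen is described.
func mark(ok bool) string {
	if ok {
		return "✓"
	}
	return "✗"
}

func main() {
	for _, p := range detectProviders() {
		fmt.Printf("%s %s\n", mark(p.Available), p.Name)
	}
}
```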

Privacy & Performance

Local Options: whisper.cpp and Vosk run entirely on your machine
Cloud Options: OpenAI Whisper API provides the highest accuracy but requires an internet connection
No Telemetry: VoiceLog sends no data off your machine unless you use a cloud API provider
Storage: Transcriptions are stored locally alongside memo metadata

Configuration

Configuration is stored in ~/.voicelog/config.json and includes:

Audio device settings
Sample rate and format preferences
Audio processing settings (normalization, silence trimming, clipping detection)
Transcription settings (optional)
Memo storage path
Keybindings
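
For orientation, a JSON config of this kind is usually decoded into a struct with defaults applied first. The sketch below is illustrative only: every field name and JSON key is hypothetical, as is the 44100 Hz default, since this README does not document the actual schema of config.json.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the categories listed above. All field names and JSON
// keys here are hypothetical, not VoiceLog's real schema.
type Config struct {
	InputDevice string `json:"input_device"`
	SampleRate  int    `json:"sample_rate"`
	Normalize   bool   `json:"normalize"`
	TrimSilence bool   `json:"trim_silence"`
	MemoPath    string `json:"memo_path"`
}

// parseConfig decodes JSON over a set of defaults, so keys missing from
// the file keep their default values.
func parseConfig(data []byte) (Config, error) {
	cfg := Config{SampleRate: 44100} // assumed default
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	sample := []byte(`{"input_device": "default", "normalize": true}`)
	cfg, err := parseConfig(sample)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```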

File Structure

~/.voicelog/
├── config.json          # Application configuration
├── transcription.json   # Transcription settings (if enabled)
├── memos/               # Voice memo storage
│   ├── metadata.json    # Memo metadata (includes transcriptions)
│   └── memo_*.wav       # Audio files
└── voicelog.log         # Application logs
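
The layout above can be resolved programmatically from the user's home directory. This is a small sketch using only the file and directory names shown in the tree; the `paths` type and `layout` function are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// paths holds the locations shown in the tree above.
type paths struct {
	Root, Config, Transcription, MemoDir, Metadata, Log string
}

// layout builds the ~/.voicelog paths relative to the given home directory.
func layout(home string) paths {
	root := filepath.Join(home, ".voicelog")
	memos := filepath.Join(root, "memos")
	return paths{
		Root:          root,
		Config:        filepath.Join(root, "config.json"),
		Transcription: filepath.Join(root, "transcription.json"),
		MemoDir:       memos,
		Metadata:      filepath.Join(memos, "metadata.json"),
		Log:           filepath.Join(root, "voicelog.log"),
	}
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot resolve home directory:", err)
		return
	}
	p := layout(home)
	fmt.Println("config:", p.Config)

	// Count recorded memos by matching the memo_*.wav pattern.
	matches, _ := filepath.Glob(filepath.Join(p.MemoDir, "memo_*.wav"))
	fmt.Printf("%d memo files found\n", len(matches))
}
```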

Technical Details

Built with:

Bubble Tea - TUI framework
PortAudio - Audio I/O
Go - Programming language

Known Issues

Audio Device Problems

WSL (Windows Subsystem for Linux): ALSA errors occur due to missing audio device access. WSL doesn't provide direct access to Windows audio devices.
Windows Standalone: The pre-built binary fails with a missing libportaudio.dll error when run outside an MSYS2 environment.
Recording Issues: Audio recording may not work properly in some environments, though playback and device detection work correctly.

Workarounds

For WSL: Use the Windows version instead, as WSL doesn't support direct audio device access.
For Windows: Run from an MSYS2 environment or ensure the PortAudio libraries are properly installed.
For Linux: Ensure you have proper audio device permissions and ALSA/PulseAudio configured.

Contributing

This project is a work in progress and contributions are welcome! If you encounter issues or have improvements to suggest, please:

Check existing issues on GitHub
Create a new issue with detailed information about your environment
Submit pull requests for bug fixes or new features

License

This project is licensed under the MIT License - see the LICENSE file for details.
