
Convert audio to images with a sequence of OpenAI API calls


unRARed/audio-images


Audio to Images

This tool takes an MP3 file uploaded through a web form and generates a series of images with the OpenAI API in this sequence:

  • Audio File -> Transcription
  • Transcription -> n Summarized Prompts
  • Summarized Prompts -> Short Global Summary
  • Individual Prompt + Global Summary -> Image
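The four steps above form a simple chain in which each step's output feeds the next, and the last step runs once per prompt. The following is a minimal sketch of that orchestration, not the project's code: `audio_to_images` and the injected step callables are hypothetical stand-ins for the real OpenAI calls, and the way each prompt is joined with the global summary ("Overall context: …") is an assumption.

```ruby
# Hypothetical sketch of the four-step sequence. Each step is injected as a
# callable so real OpenAI calls (transcription, chat summarization, image
# generation) can be swapped in; these names are NOT the project's API.
def audio_to_images(audio_path, count, transcribe:, summarize_prompts:, summarize_global:, generate_image:)
  # Step 1: Audio File -> Transcription
  transcript = transcribe.call(audio_path)
  # Step 2: Transcription -> n Summarized Prompts
  prompts = summarize_prompts.call(transcript, count)
  # Step 3: Summarized Prompts -> Short Global Summary
  global_summary = summarize_global.call(prompts)
  # Step 4: Individual Prompt + Global Summary -> Image (one call per prompt).
  # Joining the two with "Overall context:" is an assumed format.
  prompts.map do |prompt|
    generate_image.call("#{prompt}\n\nOverall context: #{global_summary}")
  end
end

# Usage with stub steps in place of real API calls:
images = audio_to_images(
  "talk.mp3", 2,
  transcribe: ->(path) { "transcript of #{path}" },
  summarize_prompts: ->(text, n) { Array.new(n) { |i| "scene #{i + 1}" } },
  summarize_global: ->(prompts) { "#{prompts.size} scenes" },
  generate_image: ->(prompt) { "image for: #{prompt.lines.first.strip}" }
)
p images # => ["image for: scene 1", "image for: scene 2"]
```

Injecting the steps keeps the sequencing testable without network access; the global summary is computed once and reused for every image, which is what gives the images a shared theme.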

After generation is complete, an "Optimize" option performs a handful of ImageMagick operations on the generated images.
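The README does not say which ImageMagick operations "Optimize" runs. As a hedged illustration only, the sketch below assumes two common ones, `-strip` (remove metadata) and `-quality` (set compression quality), and assumes the ImageMagick 7 `magick` binary; on ImageMagick 6 the command is `convert`. `optimize_command` and `optimize` are hypothetical names.

```ruby
# Hypothetical "Optimize" step. The actual operations are not documented;
# -strip and -quality are assumed examples of real ImageMagick options.
def optimize_command(input, output, quality: 85)
  ["magick", input, "-strip", "-quality", quality.to_s, output]
end

def optimize(input, output, quality: 85)
  # Passing an argument array (no shell) avoids quoting issues with file
  # names; exception: true raises if the command fails or is missing.
  system(*optimize_command(input, output, quality: quality), exception: true)
end

p optimize_command("scene_1.png", "scene_1.jpg")
# => ["magick", "scene_1.png", "-strip", "-quality", "85", "scene_1.jpg"]
```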

Setup

You'll need Ruby and ImageMagick, plus wget if you want to download the upscaler from the command line.

  • Run bundle install
  • Ensure OPENAI_API_KEY is set to your particular API key
  • Add the Real-ESRGAN upscaler (the release below is the macOS build):
      wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.5.0/realesrgan-ncnn-vulkan-20220424-macos.zip
      unzip realesrgan-ncnn-vulkan-20220424-macos.zip
      chmod u+x realesrgan-ncnn-vulkan
      rm realesrgan-ncnn-vulkan-20220424-macos.zip
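Before starting the app, it can help to confirm the setup steps took effect. The following is a hypothetical preflight script, not part of the project: it assumes the upscaler binary sits at `./realesrgan-ncnn-vulkan` (where the unzip step leaves it) and treats either `magick` (ImageMagick 7) or `convert` (ImageMagick 6) on the PATH as sufficient.

```ruby
# Hypothetical preflight check for the setup steps; not part of the project.
# Returns the full path of an executable found on PATH, or nil.
def which(cmd, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Collects human-readable problems; an empty array means "ready".
# The upscaler location is an assumption based on the unzip step.
def preflight(env = ENV, upscaler: "./realesrgan-ncnn-vulkan")
  problems = []
  key = env["OPENAI_API_KEY"]
  problems << "OPENAI_API_KEY is not set" if key.nil? || key.empty?
  problems << "ImageMagick not found on PATH" unless which("magick") || which("convert")
  unless File.file?(upscaler) && File.executable?(upscaler)
    problems << "#{upscaler} missing or not executable"
  end
  problems
end

problems = preflight
puts problems.empty? ? "Ready." : problems.map { |msg| "- #{msg}" }
```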

Up and Running

  • Run ruby app.rb (or DEBUG=1 ruby app.rb for more logging)
  • Visit http://127.0.0.1:4567 and upload your audio
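The README does not document how `DEBUG=1` changes the app's logging. The following sketch shows one conventional way a Ruby app could honor such a flag with the standard library `Logger`; `build_logger` is a hypothetical helper, and treating only the exact value `"1"` as enabled is an assumption.

```ruby
require "logger"

# Hypothetical: one way an app could honor DEBUG=1. The project's actual
# logging setup is not documented. Only DEBUG == "1" enables debug output.
def build_logger(env = ENV, io: $stdout)
  level = env["DEBUG"] == "1" ? Logger::DEBUG : Logger::INFO
  Logger.new(io, level: level)
end

logger = build_logger
logger.debug("only shown when DEBUG=1")
logger.info("expecting the app at http://127.0.0.1:4567")
```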

Example Output

Example projects are included from:

The first example was generated from a music recording, with only a style provided. The second example was generated from the narration of this YouTube video, with both a style and context provided. However, the additional context "There's always bats." was overlooked during prompt generation, so the prompting still needs some fine-tuning.

UI Example

Project Example Output

Custom Actions

If you want to do specific post-processing, you can create your own actions in ./custom_actions: copy the boilerplate starter.rb into your own file, and see colorize.rb for a complete example.
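The interface a custom action must implement is defined by starter.rb, which this README does not reproduce. The following is therefore only a hypothetical sketch in the spirit of colorize.rb: a `SepiaAction` class (invented name) that applies ImageMagick's real `-sepia-tone` option to every PNG in a project directory, overwriting each file in place. The `call(project_dir)` entry point, the `*.png` pattern, and the `magick` binary are all assumptions.

```ruby
# Hypothetical custom action; the real interface comes from
# ./custom_actions/starter.rb, which this sketch does not reproduce.
class SepiaAction
  def initialize(strength: 80)
    @strength = strength
  end

  # Build the ImageMagick command for one image (sepia tone, written in place).
  def command_for(path)
    ["magick", path, "-sepia-tone", "#{@strength}%", path]
  end

  # Apply the action to every generated PNG in a project directory
  # (assumed layout). Returns the list of files processed.
  def call(project_dir)
    Dir.glob(File.join(project_dir, "*.png")).sort.each do |path|
      system(*command_for(path), exception: true)
    end
  end
end
```

Building the command separately from running it keeps the action easy to inspect before it touches any images.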
