Phrame generates captivating and unique art by listening to conversations around it, transforming spoken words and emotions into visually stunning masterpieces. Unleash your creativity and transform the soundscape around you.
Phrame relies on the SpeechRecognition interface of the Web Speech API to transform audio into text. This text is processed by OpenAI, which produces a condensed summary. The summary is then passed to the configured generative AI image services, and the resulting images are saved.
If you would like to make a donation to support development, please use GitHub Sponsors.
Phrame can be used without a microphone, and any modern browser will work. However, to use speech recognition, you will need a browser that supports the SpeechRecognition interface of the Web Speech API.
- Responsive UI and API bundled into a single Docker image
- WebSockets provide instant updates and remote control
- Built-in config editor
- Support for multiple AI services
- Voice commands
Interact with Phrame by using the following voice commands.
| Command | Description |
| --- | --- |
| | Wake word to generate images on demand |
| | Advance to next image |
| | Advance to previous image |
| | Advance to previous image |
```yaml
version: '3.9'

volumes:
  phrame:

services:
  phrame:
    container_name: phrame
    image: jakowenko/phrame
    restart: unless-stopped
    volumes:
      - phrame:/.storage
    ports:
      - 3000:3000
```
Configurable options are saved to `/.storage/config/config.yml` and can be edited through the built-in config editor in the UI.
Note: Default values do not need to be specified in the configuration unless you want to override them.
```yaml
# image settings (default: shown below)
image:
  # time in seconds between image animations
  interval: 60
  # order of images to display: random, recent
  order: recent
```
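The `order` setting can be pictured with a small TypeScript sketch. It is not Phrame's code: the `StoredImage` shape and the `orderImages` function are hypothetical, and the sketch assumes `recent` means newest first while `random` means a fresh shuffle each time.

```typescript
// Hypothetical shape of a saved image record; Phrame's real storage format may differ.
interface StoredImage {
  file: string;
  createdAt: number; // epoch milliseconds
}

type ImageOrder = "random" | "recent";

// Return a new array ordered according to the `image.order` setting.
// The input array is never mutated.
function orderImages(images: StoredImage[], order: ImageOrder): StoredImage[] {
  const copy = [...images];
  if (order === "recent") {
    // Assumption: "recent" means newest first.
    return copy.sort((a, b) => b.createdAt - a.createdAt);
  }
  // Assumption: "random" is a uniform Fisher-Yates shuffle.
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}
```

Returning a copy keeps the stored list intact, so switching between `recent` and `random` in the config never depends on a previous shuffle.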
Images are generated by processing transcripts on a schedule defined by a cron expression. On each run, all transcripts from the last `transcript.minutes` minutes are summarized by OpenAI using the prompt in `openai.chat.prompt`.
```yaml
# transcript settings (default: shown below)
transcript:
  # schedule as a cron expression for processing transcripts (at every 30th minute)
  cron: '*/30 * * * *'
  # how many minutes of files to look back for (process the last 30 minutes of transcripts)
  minutes: 30
  # minimum number of transcripts required to process
  minimum: 5
```
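As a rough illustration of how `minutes` and `minimum` interact on each scheduled run, the sketch below filters transcripts to the look-back window and skips processing when too few remain. The `Transcript` record and the `selectTranscripts` helper are hypothetical; Phrame's actual storage and scheduling code may work differently.

```typescript
// Hypothetical transcript record; the real on-disk format is not documented here.
interface Transcript {
  text: string;
  createdAt: number; // epoch milliseconds
}

// Mirrors the `transcript.minutes` and `transcript.minimum` settings.
interface TranscriptSettings {
  minutes: number; // look-back window in minutes, e.g. 30
  minimum: number; // transcripts required before processing, e.g. 5
}

// Keep transcripts created within the last `minutes` minutes (relative to `now`)
// and return their text, or null when fewer than `minimum` fall inside the window.
function selectTranscripts(
  transcripts: Transcript[],
  settings: TranscriptSettings,
  now: number = Date.now(),
): string[] | null {
  const cutoff = now - settings.minutes * 60 * 1000;
  const inWindow = transcripts.filter((t) => t.createdAt >= cutoff && t.createdAt <= now);
  if (inWindow.length < settings.minimum) {
    return null; // not enough material; skip this run
  }
  return inWindow.map((t) => t.text);
}
```

Returning `null` rather than an empty array keeps the "not enough transcripts" case distinct from a successful selection.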
To configure OpenAI, obtain an API key and add it to your config as shown below. All other default settings listed below are applied automatically; you can override any of them by updating your config.
```yaml
# openai settings (default: shown below)
openai:
  key: sk-XXXXXXX
  chat:
    # model name (https://platform.openai.com/docs/models/overview)
    model: gpt-3.5-turbo
    # the prompt used to generate a summary
    prompt: You are a helpful assistant that will take a string of random conversations and pull out a few keywords and topics that were talked about. You will then turn this into a short description to describe a picture, painting, or artwork. It should be no more than two or three sentences and be something that DALL·E can use. Make sure it doesn't contain words that would be rejected by your safety system.
  image:
    # size of the generated images: 256x256, 512x512, or 1024x1024
    size: 512x512
    # number of images to generate for each style
    n: 1
    # used with summary to generate image (summary, style)
    style:
      - cinematic
```
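The `style` comment suggests that each style is paired with the summary to form an image prompt. The sketch below builds one request body per style for OpenAI's image generation endpoint, which accepts `prompt`, `n`, and `size`. Joining summary and style with `", "` is an assumption, and `buildImageRequests` is a hypothetical helper, not part of Phrame.

```typescript
// Mirrors the `openai.image` settings block.
interface OpenAIImageSettings {
  size: "256x256" | "512x512" | "1024x1024";
  n: number;
  style: string[];
}

// Subset of the body fields accepted by POST /v1/images/generations.
interface ImageRequest {
  prompt: string;
  n: number;
  size: string;
}

// Build one request per configured style (so `n` images are requested per style).
// Assumption: summary and style are joined with ", "; Phrame's exact format may differ.
function buildImageRequests(summary: string, settings: OpenAIImageSettings): ImageRequest[] {
  return settings.style.map((style) => ({
    prompt: `${summary}, ${style}`,
    n: settings.n,
    size: settings.size,
  }));
}
```

With two styles configured, one summary produces two requests, which matches the "number of images to generate for each style" wording of `n`.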
To configure Stability AI, obtain an API key and add it to your config as shown below. All other default settings listed below are applied automatically; you can override any of them by updating your config.
```yaml
# stabilityai settings (default: shown below)
stabilityai:
  key: sk-XXXXXXX
  image:
    # number of seconds before the request times out and is aborted
    timeout: 30
    # engine used for image generation
    engine_id: stable-diffusion-512-v2-1
    # width of the image in pixels, must be in increments of 64
    width: 512
    # height of the image in pixels, must be in increments of 64
    height: 512
    # how strictly the diffusion process adheres to the prompt text (higher values keep your image closer to your prompt)
    cfg_scale: 7
    # number of images to generate
    samples: 1
    # number of diffusion steps to run
    steps: 50
    style:
      - cinematic
```
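Because `width` and `height` must be multiples of 64, it can help to validate them before calling Stability AI. This sketch checks both dimensions and assembles a path and body in the shape of Stability AI's v1 text-to-image REST endpoint. The helper names are hypothetical, and the mapping from Phrame's settings to request fields is an assumption rather than Phrame's own code.

```typescript
// Mirrors the `stabilityai.image` settings block (timeout and style omitted).
interface StabilityImageSettings {
  engine_id: string;
  width: number;
  height: number;
  cfg_scale: number;
  samples: number;
  steps: number;
}

// Hypothetical helper: throw if a dimension is not a positive multiple of 64.
function assertMultipleOf64(name: string, value: number): void {
  if (!Number.isInteger(value) || value <= 0 || value % 64 !== 0) {
    throw new Error(`${name} must be a positive multiple of 64, got ${value}`);
  }
}

// Hypothetical helper: build the path and JSON body for a v1 text-to-image call.
function buildStabilityRequest(prompt: string, s: StabilityImageSettings) {
  assertMultipleOf64("width", s.width);
  assertMultipleOf64("height", s.height);
  return {
    path: `/v1/generation/${s.engine_id}/text-to-image`,
    body: {
      text_prompts: [{ text: prompt }],
      cfg_scale: s.cfg_scale,
      width: s.width,
      height: s.height,
      samples: s.samples,
      steps: s.steps,
    },
  };
}
```

Failing fast on an invalid size surfaces a config mistake locally instead of as a rejected API request. The `timeout` setting is not modeled here; a real client could enforce it with an `AbortController`.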
```yaml
# time settings (default: shown below)
time:
  # defaults to ISO 8601 format with support for token-based formatting
  # https://github.com/moment/luxon/blob/master/docs/formatting.md#table-of-tokens
  format:
  # time zone used in logs
  timezone: UTC
```
```yaml
# log settings (default: shown below)
# options: silent, error, warn, info, http, verbose, debug, silly
logs:
  level: info
```
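The level names above match the npm-style ordering used by loggers such as winston, where a configured level lets through that level and everything more severe. Assuming Phrame follows the same convention, the filtering behaves like this sketch (`shouldLog` is a hypothetical helper, not Phrame's API):

```typescript
// Levels from most to least severe, in the order listed in the config comment.
// Assumption: this order is a severity ranking, as with npm/winston levels.
const LEVELS = ["error", "warn", "info", "http", "verbose", "debug", "silly"] as const;
type Level = (typeof LEVELS)[number];
type ConfigLevel = Level | "silent";

// Return true if a message at `messageLevel` should be written
// when `logs.level` is set to `configured`.
function shouldLog(configured: ConfigLevel, messageLevel: Level): boolean {
  if (configured === "silent") {
    return false; // "silent" suppresses all output
  }
  return LEVELS.indexOf(messageLevel) <= LEVELS.indexOf(configured);
}
```

Under this reading, the default `info` shows errors, warnings, and info messages, while `debug` or `silly` are useful when troubleshooting.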
```yaml
# telemetry settings (default: shown below)
# self hosted version of plausible.io
# 100% anonymous, used to help improve project
# no cookies and fully compliant with GDPR, CCPA and PECR
telemetry: true
```
Run Local Services
Build Local Docker Image