Phrame

Phrame generates captivating and unique art by listening to conversations around it, transforming spoken words and emotions into visually stunning masterpieces. Unleash your creativity and transform the soundscape around you.

How

Phrame relies on the SpeechRecognition interface of the Web Speech API to transform audio into text. This text is processed by OpenAI, producing a condensed summary. The summary is then combined with the configured generative AI image services and the final images are saved.

Donations

If you would like to make a donation to support development, please use GitHub Sponsors.

Minimum Requirements

Phrame can be used without a microphone and any modern browser will work. However, if you would like to use speech recognition, you will need a compatible browser.

Features

Supported Architecture

  • amd64
  • arm64
  • arm/v7

Supported AIs

Voice Commands

Interact with Phrame by using the following voice commands.

Command Action
Hey Phrame Wake word to generate images on demand
Next Image Advance to next image
Previous Image Advance to previous image
Last Image Advance to previous image

Usage

Docker Compose

version: '3.9'

volumes:
  phrame:

services:
  phrame:
    container_name: phrame
    image: jakowenko/phrame
    restart: unless-stopped
    volumes:
      - phrame:/.storage
    ports:
      - 3000:3000

Configuration

Configurable options are saved to /.storage/config/config.yml and are editable via the UI at http://localhost:3000/config.

Note: Default values do not need to be specified in configuration unless they need to be overwritten.

image

# image settings (default: shown below)

image:
  # time in seconds between image animation
  interval: 60
  # order of images to display: random, recent
  order: recent

transcript

Images are generated by processing transcripts. This can be scheduled with a cron expression. All of the transcripts within X minutes will then be processed by OpenAI using openai.chat.prompt to summarize the transcripts.

# transcript settings (default: shown below)

transcript:
  # schedule as a cron expression for processing transcripts (at every 30th minute)
  cron: '*/30 * * * *'
  # how many minutes of files to look back for (process the last 30 minutes of transcripts)
  minutes: 30
  # minimum number of transcripts required to process
  minimum: 5

openai

To configure OpenAI, obtain an API key and add it to your config like the following. All other default settings found bellow will also be applied. You can overwrite the settings by updating your config.yml file.

# openai settings (default: shown below)

openai:
  key: sk-XXXXXXX

  chat:
    # model name (https://platform.openai.com/docs/models/overview)
    model: gpt-3.5-turbo
    # the prompt used to generate a summary
    prompt: You are a helpful assistant that will take a string of random conversations and pull out a few keywords and topics that were talked about. You will then turn this into a short description to describe a picture, painting, or artwork. It should be no more than two or three sentences and be something that DALL·E can use. Make sure it doesn't contain words that would be rejected by your safety system.
  image:
    # size of the generated images: 256x256, 512x512, or 1024x1024
    size: 512x512
    # number of images to generate for each style
    n: 1
    # used with summary to generate image (summary, style)
    style:
      - cinematic

stabilityai

To configure Stability AI, obtain an API key and add it to your config like the following. All other default settings found bellow will also be applied. You can overwrite the settings by updating your config.yml file.

# stabilityai settings (default: shown below)

stabilityai:
  key: sk-XXXXXXX

  image:
    # number of seconds before the request times out and is aborted
    timeout: 30
    # engined used for image generation
    engine_id: stable-diffusion-512-v2-1
    # width of the image in pixels, must be in increments of 64
    width: 512
    # height of the image in pixels, must be in increments of 64
    height: 512
    # how strictly the diffusion process adheres to the prompt text (higher values keep your image closer to your prompt)
    cfg_scale: 7
    # number of images to generate
    samples: 1
    # number of diffusion steps to run
    steps: 50
    style:
      - cinematic

time

# time settings (default: shown below)
time:
  # defaults to iso 8601 format with support for token-based formatting
  # https://github.com/moment/luxon/blob/master/docs/formatting.md#table-of-tokens
  format:
  # time zone used in logs
  timezone: UTC

logs

# log settings (default: shown below)
# options: silent, error, warn, info, http, verbose, debug, silly
logs:
  level: info

telemetry

# telemetry settings (default: shown below)
# self hosted version of plausible.io
# 100% anonymous, used to help improve project
# no cookies and fully compliant with GDPR, CCPA and PECR
telemetry: true

Development

Run Local Services

Service Command URL
UI npm run local:frontend localhost:8080
API npm run local:api localhost:3000

Build Local Docker Image

./.develop/build

GitHub

View Github