Phrame

Phrame generates captivating and unique art by listening to conversations around it, transforming spoken words and emotions into visually stunning masterpieces. Unleash your creativity and transform the soundscape around you.

How

Phrame relies on the SpeechRecognition interface of the Web Speech API to transform audio into text. This text is processed by OpenAI, producing a condensed summary. The summary is then combined with the configured generative AI image services and the final images are saved.

Donations

If you would like to make a donation to support development, please use GitHub Sponsors.

Minimum Requirements

Phrame can be used without a microphone and any modern browser will work. However, if you would like to use speech recognition, you will need a compatible browser.

OpenAI API Key

Features

Responsive UI and API bundled into single Docker image
Websockets provide instant updates and remote control
Built in config editor
Support for multiple AI services
Voice commands

Supported Architecture

amd64
arm64
arm/v7

Supported AIs

Voice Commands

Interact with Phrame by using the following voice commands.

Command	Action
`Hey Phrame`	Wake word to generate images on demand
`Next Image`	Advance to next image
`Previous Image`	Advance to previous image
`Last Image`	Advance to previous image

Usage

Docker Compose

version: '3.9'

volumes:
  phrame:

services:
  phrame:
    container_name: phrame
    image: jakowenko/phrame
    restart: unless-stopped
    volumes:
      - phrame:/.storage
    ports:
      - 3000:3000

Configuration

Configurable options are saved to /.storage/config/config.yml and are editable via the UI at http://localhost:3000/config.

Note: Default values do not need to be specified in configuration unless they need to be overwritten.

`image`

# image settings (default: shown below)

image:
  # time in seconds between image animation
  interval: 60
  # order of images to display: random, recent
  order: recent

`transcript`

Images are generated by processing transcripts. This can be scheduled with a cron expression. All of the transcripts within X minutes will then be processed by OpenAI using openai.chat.prompt to summarize the transcripts.

# transcript settings (default: shown below)

transcript:
  # schedule as a cron expression for processing transcripts (at every 30th minute)
  cron: '*/30 * * * *'
  # how many minutes of files to look back for (process the last 30 minutes of transcripts)
  minutes: 30
  # minimum number of transcripts required to process
  minimum: 5

`openai`

To configure OpenAI, obtain an API key and add it to your config like the following. All other default settings found bellow will also be applied. You can overwrite the settings by updating your config.yml file.

# openai settings (default: shown below)

openai:
  key: sk-XXXXXXX

  chat:
    # model name (https://platform.openai.com/docs/models/overview)
    model: gpt-3.5-turbo
    # the prompt used to generate a summary
    prompt: You are a helpful assistant that will take a string of random conversations and pull out a few keywords and topics that were talked about. You will then turn this into a short description to describe a picture, painting, or artwork. It should be no more than two or three sentences and be something that DALL·E can use. Make sure it doesn't contain words that would be rejected by your safety system.
  image:
    # size of the generated images: 256x256, 512x512, or 1024x1024
    size: 512x512
    # number of images to generate for each style
    n: 1
    # used with summary to generate image (summary, style)
    style:
      - cinematic

`stabilityai`

To configure Stability AI, obtain an API key and add it to your config like the following. All other default settings found bellow will also be applied. You can overwrite the settings by updating your config.yml file.

# stabilityai settings (default: shown below)

stabilityai:
  key: sk-XXXXXXX

  image:
    # number of seconds before the request times out and is aborted
    timeout: 30
    # engined used for image generation
    engine_id: stable-diffusion-512-v2-1
    # width of the image in pixels, must be in increments of 64
    width: 512
    # height of the image in pixels, must be in increments of 64
    height: 512
    # how strictly the diffusion process adheres to the prompt text (higher values keep your image closer to your prompt)
    cfg_scale: 7
    # number of images to generate
    samples: 1
    # number of diffusion steps to run
    steps: 50
    style:
      - cinematic

`time`

# time settings (default: shown below)
time:
  # defaults to iso 8601 format with support for token-based formatting
  # https://github.com/moment/luxon/blob/master/docs/formatting.md#table-of-tokens
  format:
  # time zone used in logs
  timezone: UTC

`logs`

# log settings (default: shown below)
# options: silent, error, warn, info, http, verbose, debug, silly
logs:
  level: info

`telemetry`

# telemetry settings (default: shown below)
# self hosted version of plausible.io
# 100% anonymous, used to help improve project
# no cookies and fully compliant with GDPR, CCPA and PECR
telemetry: true

Development

Run Local Services

Service	Command	URL
UI	`npm run local:frontend`	`localhost:8080`
API	`npm run local:api`	`localhost:3000`

Build Local Docker Image

./.develop/build

GitHub

View Github