Recording Skill

The Mr. Pumpkin recording skill lets you describe an animation in plain English and have it automatically generated and uploaded to your pumpkin server.

How it works

1. You describe the animation you want (e.g. “make the pumpkin look surprised then blink twice”)
2. The skill sends your description to an LLM (Google Gemini by default)
3. The LLM generates a timeline JSON file following the timeline schema
4. The generated timeline is validated locally before upload
5. The timeline is uploaded to the Mr. Pumpkin server and stored as a recording
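
The local validation step can be pictured with a short sketch. This is not the skill's actual validator: the `commands` list, the `command` field, and every name in `KNOWN_COMMANDS` except `play_recording` are assumptions made for illustration, and the real layout is whatever the timeline schema defines.

```python
import json

FENCE = "`" * 3  # Markdown code fence that models often wrap around JSON

# Hypothetical command vocabulary; the real one comes from the timeline schema.
KNOWN_COMMANDS = {"set_expression", "blink", "wink", "play_recording"}


def parse_llm_timeline(raw: str) -> dict:
    """Parse raw LLM output into a timeline dict, or raise ValueError."""
    text = raw.strip()
    # Drop a surrounding Markdown fence if the model added one.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    try:
        timeline = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(timeline, dict):
        raise ValueError("timeline must be a JSON object")
    # Assumed layout: {"commands": [{"command": "<name>", ...}, ...]}
    commands = timeline.get("commands")
    if not isinstance(commands, list):
        raise ValueError("timeline needs a 'commands' list")
    for i, entry in enumerate(commands):
        name = entry.get("command") if isinstance(entry, dict) else None
        if name not in KNOWN_COMMANDS:
            raise ValueError(f"entry {i}: unknown command {name!r}")
    return timeline
```

Rejecting bad output before upload keeps malformed recordings off the server, which is why the check runs locally.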

Prerequisites

Python 3.10 or later. The skill uses the X | Y union type syntax, which was introduced in Python 3.10.

Gemini API key — set the environment variable:

Windows (Command Prompt):
set GEMINI_API_KEY=your-api-key-here

macOS/Linux:
export GEMINI_API_KEY=your-api-key-here

Get a free key at Google AI Studio.

Install the skill's dependencies, which are listed in skill/requirements.txt (run this from the repository root):
```
pip install -r skill/requirements.txt
```

Quick start

python -m skill.cli "make the pumpkin look surprised then blink" --filename my_animation

This generates a timeline, prints it for your review, then uploads it to localhost:5000 via TCP.

Usage

python -m skill.cli <prompt> --filename <name> [options]

Required arguments

| Argument | Description |
| --- | --- |
| `prompt` | Natural language description of the animation |
| `-f`, `--filename` | Name to store the timeline as on the server (no `.json` extension) |

Optional arguments

| Option | Default | Description |
| --- | --- | --- |
| `--host` | `localhost` | Mr. Pumpkin server hostname or IP |
| `--tcp-port` | `5000` | TCP server port |
| `--ws-port` | `5001` | WebSocket server port |
| `--protocol` | `tcp` | Upload protocol: `tcp` or `ws` |
| `--dry-run` | off | Generate and print the timeline without uploading |
| `--provider` | `gemini` | LLM provider (currently only `gemini`) |

Examples

Simple blink animation:

python -m skill.cli "blink twice slowly" --filename slow_blink

Emotional sequence:

python -m skill.cli "look surprised, then happy, then wink" --filename surprise_wink

Preview before uploading:

python -m skill.cli "sleepy pumpkin nodding off" --filename sleepy --dry-run

Upload to a remote server via WebSocket:

python -m skill.cli "happy greeting" --filename hello --host 192.168.1.10 --protocol ws

Error handling

| Error | Cause | Fix |
| --- | --- | --- |
| `No Gemini API key found` | `GEMINI_API_KEY` not set | Set the environment variable |
| `Generation failed` | LLM produced invalid JSON or unknown commands | Rephrase the prompt and try again |
| `Upload failed — ERROR: already exists` | A file with that name already exists on the server | Choose a different `--filename` |
| `Connection failed` | Server not reachable | Verify the server is running and the host/port are correct |
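
Two of these errors can be caught before spending an LLM call. The sketch below is a standalone helper, not part of the skill: the name `preflight` is hypothetical, and it assumes the defaults documented above (`GEMINI_API_KEY`, `localhost`, TCP port `5000`).

```python
import os
import socket


def preflight(host: str = "localhost", port: int = 5000, timeout: float = 2.0) -> list[str]:
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    problems = []
    if not os.environ.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    try:
        # A plain TCP connect only proves that something is listening on host:port.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"cannot reach {host}:{port} ({exc})")
    return problems
```

Calling `preflight()` and printing any returned messages before running the CLI gives a quicker answer than waiting for `Connection failed` after generation. It does not confirm that the listener is actually a Mr. Pumpkin server.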

Using the API directly

You can also use the skill programmatically:

from skill import generate_timeline, upload_timeline

# Generate a timeline dict
timeline = generate_timeline("make the pumpkin look confused")

# Upload it
upload_timeline("confused_pumpkin", timeline, host="localhost")

Custom LLM provider

The skill is designed to be provider-agnostic. You can swap in any LLM by implementing the LLMProvider interface:

from skill.generator import LLMProvider, generate_timeline

class MyLocalModel(LLMProvider):
    def generate(self, system_prompt: str, user_prompt: str) -> str:
        # Call your local model here (my_model is a placeholder for your own client)
        return my_model.chat(system=system_prompt, user=user_prompt)

timeline = generate_timeline("wave hello", provider=MyLocalModel())
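
The same interface is handy for offline tests. The sketch below is a provider that returns a fixed response and records the prompts it receives. It assumes the interface consists only of the `generate` method shown above. The `LLMProvider` class here is a stand-in so the snippet runs without the skill installed (in real use, import it from `skill.generator`), and `CannedProvider` and the timeline shape are hypothetical.

```python
import json
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Stand-in for skill.generator.LLMProvider (assumed: a single generate method)."""

    @abstractmethod
    def generate(self, system_prompt: str, user_prompt: str) -> str: ...


class CannedProvider(LLMProvider):
    """Returns a fixed JSON response and records the prompts it was given."""

    def __init__(self, response: dict):
        self._response = json.dumps(response)
        self.calls: list[tuple[str, str]] = []

    def generate(self, system_prompt: str, user_prompt: str) -> str:
        self.calls.append((system_prompt, user_prompt))
        return self._response


provider = CannedProvider({"commands": []})  # hypothetical timeline shape
text = provider.generate("system rules", "wave hello")
```

Passing such an object as `provider=` would let you exercise the generation path without an API key or network access.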

Recording Chaining

You can embed one recording inside another by using the play_recording command in a timeline. When the playback engine reaches a play_recording command, it pauses the parent timeline, plays the named sub-recording in full, and then resumes the parent from the next command.

This lets you build reusable animation building-blocks (a slow blink, an excited wiggle) and compose them into larger sequences without duplicating content.

The LLM generator already knows this

The recording skill’s LLM generator has play_recording in its command vocabulary. You can describe chained animations in plain English:

python -m skill.cli "do our standard greeting animation, then look surprised and do the excitement wiggle" \
  --filename composed_greeting

The LLM may choose to reference existing recordings by name using play_recording if you describe them in the prompt. To make chaining much more likely, name the sub-recordings explicitly:

python -m skill.cli "play 'slow_blink' then 'excited_wiggle' with a happy expression in between" \
  --filename happy_sequence

Depth limit

Recordings can be nested up to 5 levels deep. If a play_recording command is reached at the maximum depth, it is skipped, an error is logged, and the parent continues playing. The limit also bounds circular references (A → B → A): the chain is cut off at the maximum depth instead of looping forever.
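
The skip-and-continue behaviour can be illustrated with a small model. This is not the playback engine's code: recordings are represented as plain lists, a nested reference as a `("play_recording", name)` tuple, and the top-level recording is counted as level 1. All of these are assumptions, and the engine may count levels differently.

```python
import logging

MAX_DEPTH = 5  # nesting limit stated above
log = logging.getLogger("playback_sketch")


def flatten(name: str, recordings: dict[str, list], depth: int = 1) -> list[str]:
    """Expand a recording into the flat list of commands that would run."""
    out = []
    for cmd in recordings[name]:
        # A nested reference is modelled as a ("play_recording", child) tuple.
        if isinstance(cmd, tuple) and cmd[0] == "play_recording":
            if depth >= MAX_DEPTH:
                log.error("depth limit reached, skipping %s", cmd[1])
                continue  # skip the sub-recording, keep playing the parent
            out.extend(flatten(cmd[1], recordings, depth + 1))
        else:
            out.append(cmd)
    return out
```

With two recordings that reference each other, the expansion alternates between them until level 5 and then stops, so a cycle produces a finite sequence plus one logged error.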

Audio Lip-Sync Recording

The lipsync_cli tool generates a recording where the pumpkin’s mouth moves in sync with spoken audio. It uses Google Gemini to analyze the audio file, extract per-word timing and emotional tone, then produces a timeline with mouth viseme commands automatically timed to the speech.

How it works

1. The audio file is uploaded to the Gemini File API
2. Pass 1, word-level timing extraction: which words occur at which millisecond
3. Pass 2, emotion and pacing analysis: happy/sad/scared/surprised/angry, tempo, and duration
4. A timeline is generated combining expression commands (from emotion) and mouth viseme commands (from word timing)
5. Both the timeline JSON and the audio file are uploaded to the Mr. Pumpkin server
6. Playback plays the audio and timeline simultaneously, fully synchronized
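
To give a feel for how word timing can drive the mouth, here is a deliberately simplified sketch. It reduces visemes to an open/closed mouth. The `Word` type, the event names `mouth_open` and `mouth_closed`, and the 80 ms pause threshold are all hypothetical, and the actual tool's viseme commands and timing rules may differ.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start_ms: int
    end_ms: int


def mouth_events(words: list[Word], min_gap_ms: int = 80) -> list[tuple[int, str]]:
    """Turn word timings into (time_ms, event) pairs for an open/closed mouth."""
    events = []
    is_open = False
    for i, word in enumerate(words):
        if not is_open:
            events.append((word.start_ms, "mouth_open"))
            is_open = True
        nxt = words[i + 1] if i + 1 < len(words) else None
        # Close the mouth only on a real pause, so fast speech does not flutter.
        if nxt is None or nxt.start_ms - word.end_ms >= min_gap_ms:
            events.append((word.end_ms, "mouth_closed"))
            is_open = False
    return events
```

For the words "trick" (0 to 300 ms), "or" (320 to 400 ms), and "treat" (600 to 1000 ms), the mouth opens at 0, closes at 400, reopens at 600, and closes at 1000.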

Quick start

python -m skill.lipsync_cli speech.mp3 --filename halloween_speech

This uploads halloween_speech.json (timeline) and halloween_speech.mp3 (audio) to the server.

Usage

python -m skill.lipsync_cli <audio_file> [options]

| Argument | Default | Description |
| --- | --- | --- |
| `audio_file` | required | Path to audio file (`.mp3`, `.wav`, `.ogg`, `.m4a`, `.aac`, `.flac`) |
| `-f`, `--filename` | stem of audio file | Name to store the recording as (no extension) |
| `--prompt` | auto | Extra context added to the Gemini analysis prompt |
| `--host` | `localhost` | Mr. Pumpkin server hostname or IP |
| `--tcp-port` | `5000` | TCP server port |
| `--ws-port` | `5001` | WebSocket server port |
| `--protocol` | `tcp` | Upload protocol: `tcp` or `ws` |
| `--dry-run` | off | Analyze and print the timeline without uploading |
| `--audio-provider` | `gemini` | Audio analysis provider |

Supported audio formats

| Format | Gemini analysis | Pygame playback |
| --- | --- | --- |
| `.mp3` | ✅ | ✅ |
| `.wav` | ✅ | ✅ |
| `.ogg` | ✅ | ✅ |
| `.m4a` | ✅ | ❌ |
| `.aac` | ✅ | ❌ |
| `.flac` | ✅ | ❌ |

Tip: If your audio is in a format Pygame cannot play (.m4a, .aac, or .flac), convert it first:
ffmpeg -i speech.m4a speech.mp3
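
The format table and the default recording name (the stem of the audio file) can be combined into a small pre-check. The helper name `check_audio` is hypothetical, and the two sets simply restate the table above.

```python
from pathlib import Path

# Restated from the supported-formats table above.
GEMINI_FORMATS = {".mp3", ".wav", ".ogg", ".m4a", ".aac", ".flac"}
PYGAME_FORMATS = {".mp3", ".wav", ".ogg"}


def check_audio(path: str) -> tuple[str, bool]:
    """Return (default recording name, needs_conversion) for an audio file."""
    p = Path(path)
    ext = p.suffix.lower()
    if ext not in GEMINI_FORMATS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    # The file can be analyzed either way; conversion is only needed for playback.
    return p.stem, ext not in PYGAME_FORMATS
```

For example, `check_audio("speech.mp3")` reports the name `speech` with no conversion needed, while an `.m4a` file is flagged for conversion before upload.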

Preview before uploading

python -m skill.lipsync_cli speech.mp3 --dry-run

Prints the generated timeline JSON without uploading anything.

Playing the recording

After upload, play it the same way as any other recording:

play halloween_speech

The pumpkin’s mouth will animate in sync with the audio automatically.

Playing back a recording

After uploading, play the recording using the play_timeline command:

play_timeline my_animation

See Building a Client for the full command reference.