Version 0.2.0a1

Released December 21st, 2025

Added

Global, per-connection, and per-source configuration system for extensive customization (bitrate, channels, volume, etc.).
MEMORY/DISK buffer modes for audio playback. Details below.
VoiceClient now automatically cleans up when signalled by hikari (stopping/closing).

Changed

FFmpeg blocksize changed from default 8 KB to 32 KB to reduce I/O overhead and improve streaming stability.
AudioSource implementations refactored and cleaned up.

Fixed

Added a guard to the internal voice gateway packet receiver for the case where the websocket no longer exists.

Audio Buffer System Implementation

To further hikari-wave's goal of being modern and scalable, we have implemented a custom audio buffering system.
Through configuration, this system lets developers decide how much RAM the audio system uses.
Developers can set the buffer mode to either MEMORY (default) or DISK.
The MEMORY mode stores all pre-encoded audio frames (that play in the audio player) in an internal, in-memory (RAM) buffer.
The DISK mode stores all pre-encoded audio frames in files in the program's directory, under wavecache/<guild_id>/.
When using DISK mode, developers can control the size of these files through the buffer config, set either in the Config object passed to the VoiceClient constructor or via set_config on a specific VoiceConnection. The duration option sets how many seconds of frames are stored in each file.
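To illustrate how a per-file duration could map to on-disk chunks, here is a minimal sketch. It is not hikari-wave's implementation: the 20 ms frame length, the length-prefixed file layout, the numbered .bin file names, and the helper names frames_per_file, write_chunks, and read_chunk are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: 20 ms frames, the frame length commonly used for Discord voice.
FRAME_MS = 20


def frames_per_file(duration_s: float, frame_ms: int = FRAME_MS) -> int:
    """Number of frames that fit in one cache file covering duration_s seconds."""
    return max(1, int(duration_s * 1000) // frame_ms)


def write_chunks(frames: list[bytes], cache_dir: Path, duration_s: float) -> list[Path]:
    """Split pre-encoded frames into numbered files (hypothetical layout).

    Each frame is stored as a 2-byte big-endian length followed by its bytes,
    so frames must be shorter than 65536 bytes.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    per_file = frames_per_file(duration_s)
    paths = []
    for start in range(0, len(frames), per_file):
        path = cache_dir / f"{start // per_file:06d}.bin"
        with path.open("wb") as fh:
            for frame in frames[start:start + per_file]:
                fh.write(len(frame).to_bytes(2, "big"))
                fh.write(frame)
        paths.append(path)
    return paths


def read_chunk(path: Path) -> list[bytes]:
    """Load one chunk file back into a list of frames."""
    data = path.read_bytes()
    frames, pos = [], 0
    while pos < len(data):
        size = int.from_bytes(data[pos:pos + 2], "big")
        pos += 2
        frames.append(data[pos:pos + size])
        pos += size
    return frames
```

In a layout like this, only the chunk currently being played needs to sit in RAM, so a shorter duration means smaller files and less memory per source, at the cost of more frequent file reads.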
When benchmarking with our newest FFmpeg process-pooling system, 100 active audio sessions took ~250 MB of RAM (buffered audio alone). To improve both scalability and developer experience, we wanted a more efficient and cost-effective solution.
We understand most Discord voice bots will be private, so they will either be self-hosted on personal computers or hosted on cheap VPSes/servers.
The buffer system lets developers bring RAM usage down to 4-8 KB per audio source, or leave it as high as 3-5 MB per audio source, depending on mode and implementation, trading RAM usage for disk space. Total RAM usage, including FFmpeg processes, ranges from about 50 MB to 300-500 MB.
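As a rough check on these figures, the per-source numbers can be scaled to a session count. This is a back-of-the-envelope sketch only: it assumes buffer memory scales linearly with the number of sources, reuses the 100-session count from the benchmark above, and ignores FFmpeg process overhead. The helper name total_buffer_ram_mb is hypothetical.

```python
KB = 1024
MB = 1024 * KB


def total_buffer_ram_mb(sessions: int, per_source_bytes: int) -> float:
    """Buffer-only RAM in MB, assuming linear scaling per audio source."""
    return sessions * per_source_bytes / MB


sessions = 100  # session count taken from the benchmark figure above
estimates = {
    "4 KB per source": total_buffer_ram_mb(sessions, 4 * KB),
    "8 KB per source": total_buffer_ram_mb(sessions, 8 * KB),
    "3 MB per source": total_buffer_ram_mb(sessions, 3 * MB),
    "5 MB per source": total_buffer_ram_mb(sessions, 5 * MB),
}
for label, mb in estimates.items():
    print(f"{label}: {mb:.2f} MB across {sessions} sessions")
```

Under these assumptions the buffers themselves stay below 1 MB in total at the low end, so most of the roughly 50 MB floor would have to come from elsewhere, presumably the FFmpeg processes. The high end lands at 300-500 MB from buffers alone.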
The default mode is MEMORY, since that is the behavior expected from most voice libraries, but developers can easily configure it otherwise.