
Version 0.2.0a1

Released December 21st, 2025

Added

  • Global, per-connection, and per-source configuration system for extensive customization (bitrate, channels, volume, etc.).
  • MEMORY/DISK buffer modes for audio playback. Details below.
  • VoiceClient now automatically cleans up when signalled by hikari (stopping/closing).
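
The layering of global, per-connection, and per-source settings can be pictured with a small stand-alone sketch. This is not hikari-wave's API: the `AudioSettings` class, the `resolve` helper, and the rule that the most specific layer wins are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class AudioSettings:
    """Hypothetical settings object; None means 'inherit from a broader layer'."""

    bitrate: Optional[int] = None
    channels: Optional[int] = None
    volume: Optional[float] = None


def resolve(*layers: AudioSettings) -> AudioSettings:
    """Merge layers ordered from least to most specific; later non-None values win."""
    merged = {}
    for layer in layers:
        for field in fields(layer):
            value = getattr(layer, field.name)
            if value is not None:
                merged[field.name] = value
    return AudioSettings(**merged)


# Assumed precedence: per-source overrides per-connection, which overrides global.
global_cfg = AudioSettings(bitrate=96_000, channels=2, volume=1.0)
connection_cfg = AudioSettings(bitrate=64_000)
source_cfg = AudioSettings(volume=0.5)

effective = resolve(global_cfg, connection_cfg, source_cfg)
print(effective)
```

In this sketch, a layer only overrides the fields it sets explicitly, so a per-source volume change leaves the connection's bitrate untouched.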

Changed

  • FFmpeg blocksize raised from the default of 8 KB to 32 KB to reduce I/O overhead and improve streaming stability.
  • AudioSource implementations refactored and cleaned up.

Fixed

  • Added a guard to the internal voice gateway packet receiver for the case where the websocket no longer exists.

Audio Buffer System Implementation

  • To further hikari-wave's goal of being modern and scalable, we have decided to implement a custom audio buffering system.
  • Through configuration, this system lets developers decide how much RAM the audio system uses.
  • The system allows developers to set the buffer mode, either MEMORY (default) or DISK.
  • The MEMORY mode stores all pre-encoded audio frames (that play in the audio player) in an internal, in-memory (RAM) buffer.
  • The DISK mode stores all pre-encoded audio frames in files in the program's directory, under wavecache/<guild_id>/.
  • When using the DISK mode, developers can set the size of these files via the buffer config of the Config object in the VoiceClient constructor, or via set_config on the desired VoiceConnection. The duration option decides how many seconds of frames get stored in each file.
  • When benchmarking with our newest FFmpeg process-pooling system, 100 active audio sessions took ~250 MB of RAM for buffered audio alone (roughly 2.5 MB per session). To improve scalability and the developer experience, we wanted a more memory-efficient and cost-effective solution.
  • We understand most Discord voice bots will be private, so either they'll be self-hosted on personal computers or hosted on cheap VPSes/servers.
  • This gives developers the option to bring RAM usage down to 4-8 KB per audio source, or to let it range up to 3-5 MB per audio source, depending on mode and implementation, trading RAM usage for disk space. Total RAM usage, including FFmpeg processes, then ranges from about 50 MB to 300-500 MB.
  • The default mode is MEMORY, as that is the behavior expected from most voice libraries, but developers can easily configure DISK instead.
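
To make the DISK mode more concrete, here is a minimal sketch of a chunked on-disk frame buffer. It is not hikari-wave's implementation: the `DiskFrameBuffer` class, the `.chunk` file naming, the length-prefixed file layout, and the 20 ms frame length are assumptions. Only the `wavecache/<guild_id>/` directory and the idea of a per-file duration in seconds come from the notes above.

```python
import struct
import tempfile
from pathlib import Path
from typing import Iterator, List

FRAME_MS = 20  # assumed frame length; Discord voice commonly uses 20 ms Opus frames


class DiskFrameBuffer:
    """Hypothetical buffer that spills encoded frames to fixed-duration chunk files."""

    def __init__(self, root: Path, guild_id: int, duration: float) -> None:
        self.dir = root / "wavecache" / str(guild_id)
        self.dir.mkdir(parents=True, exist_ok=True)
        # Number of frames that make up `duration` seconds of audio.
        self.frames_per_file = max(1, int(duration * 1000 / FRAME_MS))
        self._pending: List[bytes] = []
        self._written = 0  # count of chunk files written so far

    def push(self, frame: bytes) -> None:
        """Queue a frame; write a chunk file once enough frames are pending."""
        self._pending.append(frame)
        if len(self._pending) >= self.frames_per_file:
            self.flush()

    def flush(self) -> None:
        """Write pending frames to the next chunk file, each prefixed by its length."""
        if not self._pending:
            return
        path = self.dir / f"{self._written:06d}.chunk"
        with path.open("wb") as fh:
            for frame in self._pending:
                fh.write(struct.pack("<H", len(frame)))  # 2-byte little-endian size
                fh.write(frame)
        self._pending.clear()
        self._written += 1

    def frames(self) -> Iterator[bytes]:
        """Yield flushed frames back in order, reading one chunk file at a time."""
        for path in sorted(self.dir.glob("*.chunk")):
            with path.open("rb") as fh:
                while header := fh.read(2):
                    (size,) = struct.unpack("<H", header)
                    yield fh.read(size)


with tempfile.TemporaryDirectory() as tmp:
    buf = DiskFrameBuffer(Path(tmp), guild_id=1234, duration=1.0)
    sent = [bytes([i % 256]) * 40 for i in range(120)]  # 120 fake 40-byte frames
    for frame in sent:
        buf.push(frame)
    buf.flush()  # write the final, partially filled chunk
    chunk_count = len(list(buf.dir.glob("*.chunk")))
    restored = list(buf.frames())
```

With these assumptions, at most one chunk's worth of frames sits in RAM at a time, so a shorter duration means less memory per source at the cost of more, smaller files.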