Audio Transcription¶
MiniBot can transcribe audio sent via Telegram using faster-whisper.
Setup¶
1. Install the `stt` extra: `poetry install --extras stt`
2. Ensure `ffmpeg` is available on the host.
3. Enable file storage and transcription in `config.toml`:
```toml
[tools.file_storage]
enabled = true
root_dir = "./data/files"

[tools.audio_transcription]
enabled = true
model = "small"
device = "auto"
compute_type = "int8"
beam_size = 5
vad_filter = true
```
4. Send an audio file to your Telegram bot as a document/file attachment (`.mp3`, `.wav`, `.m4a`, etc.).
5. Ask the bot to transcribe it: “transcribe this audio”.
Notes:

- Telegram `voice` and `audio` message types are ingested automatically, as well as file/document uploads.
- If `channels.telegram.allowed_document_mime_types` is set, include your audio MIME types.
- In the Docker yolo profile, Whisper model assets are downloaded lazily on first use and cached under `/app/data/.cache`.
GPU Runtime Dependencies¶
If STT fails with `Library libcublas.so.12 is not found`, your CUDA runtime libraries
are missing from the dynamic loader's search path.
Debian / Ubuntu¶
```bash
sudo apt update
sudo apt install -y nvidia-driver nvidia-cuda-toolkit libcudnn9 libcudnn9-cuda-12
echo '/usr/local/cuda/lib64' | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
ldconfig -p | grep libcublas.so.12
```
Arch / Manjaro¶
```bash
sudo pacman -Syu cuda cudnn
echo '/opt/cuda/lib64' | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
ldconfig -p | grep libcublas.so.12
```
Alternative: CUDA runtime libs inside Poetry venv¶
Useful when the system CUDA version does not match your Python wheels:
```bash
poetry run pip install -U nvidia-cublas-cu12 nvidia-cudnn-cu12
export SP=$(poetry run python -c "import site; print(next(p for p in site.getsitepackages() if 'site-packages' in p))")
export LD_LIBRARY_PATH="$SP/nvidia/cublas/lib:$SP/nvidia/cudnn/lib:${LD_LIBRARY_PATH}"
```
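To confirm the library is actually resolvable before starting the bot, a quick dlopen probe works from any environment. Run it with `poetry run python` so it sees the same loader path (including `LD_LIBRARY_PATH`) as MiniBot; the library name is the one from the error message above:

```python
# Probe whether libcublas.so.12 can be loaded from the current
# environment's loader path (LD_LIBRARY_PATH, ld.so.conf, etc.).
import ctypes

try:
    ctypes.CDLL("libcublas.so.12")
    status = "ok: libcublas.so.12 is loadable"
except OSError as exc:
    status = f"missing: {exc}"

print(status)
```

If this prints `missing`, fix the loader path first; faster-whisper will hit the same failure when it tries to initialize CUDA.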
Recommended STT config for GPU¶
```toml
[tools.audio_transcription]
device = "cuda"
compute_type = "float16"
```