
Whisper Demo App #55


Status: Open (1 commit to merge into base: main)


rohansjoshi

Demo app showcasing Whisper running on-device

Both the audio preprocessing (STFT + Mel filterbank) and the Whisper model itself are exported to .pte files with ExecuTorch and run in the app.

The app does the following:

  1. Press the button to record audio.
  2. Save the input as single-channel PCM.
  3. Convert it to a .wav file with 2 bytes per sample, writing the .wav header manually.
  4. Open the .wav file, read the byte array two bytes at a time, and convert it to a float array (little-endian is the convention for .wav).
  5. Convert the float array to a Tensor using the ET Tensor binding and pass it through the Module that wraps the audio processing .pte.
  6. Since the Qualcomm Whisper Runner reads raw bytes as input (maybe this should be changed), convert the output to a byte array, making sure it is in little-endian order (this is the runner's convention).
  7. Pass the array into WhisperModule, which wraps the runner that runs the actual Whisper model .pte (encoder+decoder).
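The byte-level parts of steps 3, 4, and 6 can be sketched in plain Kotlin as below. This is an illustrative sketch, not the code from this PR: the function names, the 16 kHz sample rate, the division by 32768 to normalize samples, and the choice of 4-byte float32 values for the runner input are all assumptions, since the description only states single-channel PCM, 2 bytes per sample, and little-endian order.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Assumed recording parameters; the PR only states single-channel, 2 bytes per sample.
const val SAMPLE_RATE = 16_000
const val CHANNELS = 1
const val BITS_PER_SAMPLE = 16

/** Step 3: builds a canonical 44-byte PCM WAV header for [pcmByteCount] bytes of samples. */
fun wavHeader(pcmByteCount: Int): ByteArray {
    val blockAlign = CHANNELS * BITS_PER_SAMPLE / 8
    val byteRate = SAMPLE_RATE * blockAlign
    return ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN).apply {
        put("RIFF".toByteArray(Charsets.US_ASCII))
        putInt(36 + pcmByteCount)               // size of everything after this field
        put("WAVE".toByteArray(Charsets.US_ASCII))
        put("fmt ".toByteArray(Charsets.US_ASCII))
        putInt(16)                              // fmt chunk size for PCM
        putShort(1.toShort())                   // audio format 1 = uncompressed PCM
        putShort(CHANNELS.toShort())
        putInt(SAMPLE_RATE)
        putInt(byteRate)
        putShort(blockAlign.toShort())
        putShort(BITS_PER_SAMPLE.toShort())
        put("data".toByteArray(Charsets.US_ASCII))
        putInt(pcmByteCount)
    }.array()
}

/** Step 4: decodes little-endian 16-bit PCM bytes into floats (assumed scale: 1/32768). */
fun pcm16ToFloats(pcm: ByteArray): FloatArray {
    val shorts = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer()
    return FloatArray(shorts.remaining()) { i -> shorts.get(i) / 32768f }
}

/** Step 6: serializes floats as little-endian bytes (assumed: 4 bytes per float32 value). */
fun floatsToLittleEndianBytes(values: FloatArray): ByteArray {
    val buffer = ByteBuffer.allocate(values.size * 4).order(ByteOrder.LITTLE_ENDIAN)
    buffer.asFloatBuffer().put(values)          // the view writes into the backing array
    return buffer.array()
}
```

With these helpers, writing the file would amount to concatenating `wavHeader(pcm.size)` and the PCM bytes, and reading it back would skip the first 44 bytes before calling `pcm16ToFloats`. Setting the byte order explicitly on every `ByteBuffer` matters because `ByteBuffer` defaults to big-endian.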

To build the app, you need to:

  1. Export both .pte files using the scripts in ExecuTorch. For audio preprocessing, run extension/audio/mel_spectrogram. For QNN Whisper, run the script examples/qualcomm/oss_scripts/whisper. Move both .pte files to /data/local/tmp/whisper on the device.
  2. Check out the Whisper JNI bindings PR, build the ExecuTorch Android library with the Qualcomm backend, and save executorch.aar.
  3. Copy executorch.aar into app/libs.
  4. Build the app in Android Studio.
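For step 3, a local AAR in app/libs is typically picked up through a file dependency. The fragment below is a sketch that assumes the app module uses the Gradle Kotlin DSL (app/build.gradle.kts); the PR does not show its actual build file.

```kotlin
// app/build.gradle.kts (fragment; assumes the Kotlin DSL is used)
dependencies {
    // Local AAR built in step 2 and copied into app/libs in step 3.
    implementation(files("libs/executorch.aar"))
}
```

A file dependency carries no transitive dependencies, so any libraries the AAR needs at runtime would have to be declared separately.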

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Aug 22, 2025

private fun runWhisper() {
// The entire audio flow:
val wavFile = File(getExternalFilesDir(null), "audio_record.wav") // do this better
Contributor


So we bundle the wav file with the apk?

Author


No, the wav file is written by the writeWavFile function; runWhisper runs afterward and assumes the file already exists.
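One way to make that assumption explicit is to check for the file before running inference. The helper below is a hypothetical sketch, not part of this PR: `existingWavOrNull` is an invented name, and the 44-byte threshold assumes a canonical PCM WAV header.

```kotlin
import java.io.File

/**
 * Returns the recorded WAV file, or null if it is missing or too short
 * to hold a 44-byte header plus at least one byte of sample data.
 */
fun existingWavOrNull(dir: File?, name: String = "audio_record.wav"): File? {
    val file = File(dir, name)
    return file.takeIf { it.isFile && it.length() > 44 }
}
```

In `runWhisper`, the call could look like `existingWavOrNull(getExternalFilesDir(null)) ?: return`, so that the function exits early when no recording has been written yet.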
