---
license: apache-2.0
tags:
- audio
- speech
- foundation-model
- next-token-prediction
- isoflop
- research
---

# Discrete Audio IsoFLOP Model (discrete-audio-isoflop-3e20-1.68B-d1920-L19-B128-a41e32) 

This model belongs to a suite of discrete audio models trained for our IsoFLOP study as part of **SODA**, which applies unified next-token prediction to interleaved semantic, acoustic, and text tokens.

🥤 **Project Page:** [https://soda-audio.github.io](https://soda-audio.github.io/)

For full usage instructions (e.g., inference code) and more information, please refer to the **[SODA-4B-base](https://huggingface.co/soda-research/soda-4b-base)** model card.

The details for this particular model are as follows:
- `compute_budget`: 3e20
- `param_count` (non-embedding): 1.68B
- `hidden_dim`: 1920
- `num_layers`: 19
- `batch_size`: 128
- `training_step`: 56131
- `hash_key`: a41e32
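
The model ID in the title appears to encode most of the fields listed above. As a minimal sketch, the snippet below parses such an ID back into a config dictionary. The naming pattern is inferred from this card's ID alone, so treat it as an assumption rather than a documented scheme; `parse_isoflop_name` and `approx_training_tokens` are hypothetical helpers, not part of the SODA codebase. The token estimate uses the common C ≈ 6·N·D rule of thumb, which this card does not state and which may differ from the FLOP accounting actually used in the study.

```python
import re

# Pattern inferred from this card's model ID (assumption, not a documented scheme):
# discrete-audio-isoflop-<budget>-<params>B-d<hidden>-L<layers>-B<batch>-<hash>
NAME_PATTERN = re.compile(
    r"^discrete-audio-isoflop-"
    r"(?P<compute_budget>\d+e\d+)-"
    r"(?P<param_count>[\d.]+)B-"
    r"d(?P<hidden_dim>\d+)-"
    r"L(?P<num_layers>\d+)-"
    r"B(?P<batch_size>\d+)-"
    r"(?P<hash_key>[0-9a-f]+)$"
)


def parse_isoflop_name(name: str) -> dict:
    """Split a model ID into the fields listed on the model card (hypothetical helper)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized model ID: {name!r}")
    fields = match.groupdict()
    return {
        "compute_budget": float(fields["compute_budget"]),   # training FLOPs
        "param_count": float(fields["param_count"]) * 1e9,   # non-embedding parameters
        "hidden_dim": int(fields["hidden_dim"]),
        "num_layers": int(fields["num_layers"]),
        "batch_size": int(fields["batch_size"]),
        "hash_key": fields["hash_key"],
    }


def approx_training_tokens(compute_budget: float, param_count: float) -> float:
    """Rough token count under the assumed C ~= 6 * N * D approximation."""
    return compute_budget / (6.0 * param_count)


config = parse_isoflop_name(
    "discrete-audio-isoflop-3e20-1.68B-d1920-L19-B128-a41e32"
)
print(config)

tokens = approx_training_tokens(config["compute_budget"], config["param_count"])
print(f"Approximate training tokens (assuming C ~= 6ND): {tokens:.3e}")
```

Note that `training_step` is not part of the ID, so it has to be read from the list above.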

📈 **WandB**: https://wandb.ai/potsawee/marin/groups/IsoFlop/workspace