Welcome to Fireworks AI

What we offer

Fireworks offers a variety of generative AI services. All services are pay-as-you-go with developer-friendly pricing.

  • Serverless models - Run generative AI models on Fireworks-hosted infrastructure with our optimized FireAttention inference engine. This is the easiest way to get started: the hardware is already set up, so you pay only per token or per image and never wait for boot-ups.
  • On-demand deployments - Run text models on your own private GPU and pay per second of GPU usage. This is a great option if you (a) have high volume, (b) need guaranteed latency, or (c) need models that aren’t offered on serverless (see blog overview).
  • Fine-tuning - Fine-tune text models for use with either serverless or on-demand inference. Fireworks charges only for the tokens used in tuning; there’s no charge for deploying fine-tuned models. You can keep up to 100 fine-tuned models deployed and simultaneously ready for serverless or on-demand inference at no extra cost.
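
To make the "high volume" criterion for on-demand deployments concrete, here is a rough sketch of the break-even arithmetic between per-token and per-GPU-second billing. The function names and both prices are hypothetical placeholders chosen for illustration, not Fireworks' actual rates, and the sketch assumes a single GPU can sustain the required throughput and is billed for the whole hour.

```python
def serverless_cost(tokens: int, price_per_million_tokens: float) -> float:
    """Cost under per-token billing for a given number of tokens."""
    return tokens / 1_000_000 * price_per_million_tokens


def on_demand_cost(seconds: float, price_per_gpu_hour: float, gpus: int = 1) -> float:
    """Cost under per-second GPU billing for a given wall-clock duration."""
    return seconds / 3600 * price_per_gpu_hour * gpus


def break_even_tokens_per_hour(
    price_per_million_tokens: float, price_per_gpu_hour: float, gpus: int = 1
) -> float:
    """Hourly token volume at which both billing models cost the same."""
    return price_per_gpu_hour * gpus / price_per_million_tokens * 1_000_000


# Placeholder prices for illustration only (assumptions, not real rates).
EXAMPLE_PRICE_PER_MILLION_TOKENS = 0.50
EXAMPLE_PRICE_PER_GPU_HOUR = 3.00

threshold = break_even_tokens_per_hour(
    EXAMPLE_PRICE_PER_MILLION_TOKENS, EXAMPLE_PRICE_PER_GPU_HOUR
)
print(f"Break-even volume: {threshold:,.0f} tokens per hour")
```

Under these assumed prices, sustained traffic above the printed threshold would favor an on-demand deployment, while lower or bursty traffic would favor serverless. Substitute current prices from the pricing page before drawing any conclusion.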