Introduction

Welcome to Fireworks AI

Fireworks offers a variety of generative AI services. All services are pay-as-you-go with developer-friendly pricing.

Serverless models - Run different generative AI models on Fireworks-hosted infrastructure with our optimized FireAttention inference engine. This is the easiest way to get started. We’ve set up the hardware, so you only pay per token/image and don’t wait for boot-ups. We offer:
- Text and multi-modal (image understanding) models - Beyond serving popular text models like Llama 3, we also provide features like enabling structured output on all our LLMs
- Image generation models
- Embedding models
On-demand deployments - Run text models on our own, private GPU, and pay per second of GPU usage. This is a great option if you (a) Have high volume (b) Need guaranteed latency (c) Need models that aren’t offered on-demand (see blog overview)
Fine-tuning - Fine-tune text models to use either serverless or on-demand. Fireworks charges only for tokens used for tuning. There’s no charge for deploying fine-tuned models. Fireworks lets you deploy 100 fine-tuned models to be simultaneously ready for serverless or on-demand inference at 0 extra cost.