We need more knowledge sharing about running ML infrastructure at scale! Here's the mix of AWS instances we currently run our serverless Inference API on. For context, the Inference API is the infra service that powers the widgets on Hugging Face Hub model pages; PRO users and Enterprise orgs can also use it programmatically.

64 g4dn.2xlarge
48 g5.12xlarge
48 g5.2xlarge
10 p4de.24xlarge
42 r6id.2xlarge
9 r7i.2xlarge
6 m6a.2xlarge (control plane and monitoring)
–––
Total = 229 instances

This is a thread for AI Infra aficionados 🤓 What mix of instances do you run?
Really nice of you to post this. It would be cool to see how many model instances you get from this deployment, along with throughputs and latencies.
Due to data privacy concerns and high AWS costs, we self-host our instances. We use GPU servers in 4-GPU or 8-GPU configurations and host them in our datacenter in Innsbruck. ROI is much better, but with the caveat that scaling is not as convenient: the classic cloud vs. on-prem dilemma :)
For my own use case, an investment-finance super app where one feature uses transformers to generate company constitutions and resolutions (a combination of Mistral and Mixtral models), I run 4x 32-core Threadripper-based machines in a home lab, each with an Nvidia RTX 4000 GPU, all connected via a 10G SFP+ network with a Docker Swarm overlay. Best part: it only cost us an initial investment of 6 months of the equivalent AWS spend and is almost fire-and-forget.
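The "6 months of the equivalent AWS spend" payback claim can be sanity-checked with simple break-even arithmetic. A minimal sketch, where every dollar figure is a hypothetical placeholder rather than the poster's actual cost:

```python
# Break-even for on-prem hardware vs. equivalent cloud spend.
# All figures are HYPOTHETICAL placeholders, not real costs.
cloud_monthly = 2_000.0    # assumed equivalent AWS bill per month
capex = 6 * cloud_monthly  # "6 months of AWS spend" paid up front
onprem_monthly = 200.0     # assumed power/network/maintenance per month

# Each month on-prem saves (cloud_monthly - onprem_monthly) vs. the cloud.
months_to_break_even = capex / (cloud_monthly - onprem_monthly)
print(f"Break-even after {months_to_break_even:.1f} months")
```

With any positive ongoing on-prem cost, the break-even point lands slightly past the 6-month capex figure, which is why "almost fire and forget" (low ongoing cost) matters to the claim.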
Sum without prepaying or credits: $586,721/month https://www.perplexity.ai/search/find-prices-for-that-instances-nUX2pVsaSHGhYSvfeMKLyQ
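The back-of-the-envelope math behind a figure like this is straightforward: counts times hourly rate times hours per month. A minimal sketch, where the per-hour rates are illustrative placeholders and should not be read as current AWS list prices:

```python
# Estimate a monthly on-demand bill from an instance mix.
# Instance counts are from the thread; the hourly rates are
# PLACEHOLDER values for illustration, not real AWS pricing.
HOURS_PER_MONTH = 730  # average hours in a month

fleet = {
    # type: (count, assumed $/hour)
    "g4dn.2xlarge":  (64, 0.75),
    "g5.12xlarge":   (48, 5.67),
    "g5.2xlarge":    (48, 1.21),
    "p4de.24xlarge": (10, 40.97),
    "r6id.2xlarge":  (42, 0.60),
    "r7i.2xlarge":   (9,  0.53),
    "m6a.2xlarge":   (6,  0.35),
}

def monthly_cost(fleet: dict) -> float:
    """Total monthly cost assuming every instance runs 24/7."""
    return sum(count * rate * HOURS_PER_MONTH
               for count, rate in fleet.values())

print(f"Estimated on-demand bill: ${monthly_cost(fleet):,.0f}/month")
```

Real bills diverge from this kind of estimate via reserved instances, savings plans, spot capacity, and regional price differences, which is why the thread's quoted figure is explicitly "without prepaying or credits".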
What are you running on what? This is a nice, expensive list of servers. The first thing I'd look at is your overall resource footprint: for example, how long each workload actually runs versus which operational resources need to be up all the time. Even your m6a control-plane VMs seem excessively large. How many experiments do you run? How many notebooks are you managing? How often are customer models updated? How many customers are you serving with this hardware? What's your cost to serve a model?
I’m trying to deploy my first one and it’s a terrible experience for now haha
Elektra will soon be open source on Hugging Face. DevSecOps and MLOps AI for free. Serverless included.
What is your AWS monthly bill?
So you have something like a $500k to $700k monthly bill for that setup?