Hugging Face’s Post


Julien Chaumond

CTO at Hugging Face

We need more knowledge sharing about running ML infrastructure at scale! Here's the mix of AWS instances we currently run our serverless Inference API on. For context, the Inference API is the infra service that powers the widgets on Hugging Face Hub model pages; PRO users and Enterprise orgs can also use it programmatically.

64 g4dn.2xlarge
48 g5.12xlarge
48 g5.2xlarge
10 p4de.24xlarge
42 r6id.2xlarge
9 r7i.2xlarge
6 m6a.2xlarge (control plane and monitoring)

Total = 227 instances

This is a thread for AI Infra aficionados 🤓 What mix of instances do you run?
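For anyone tallying accelerators rather than instances, here's a minimal sketch, assuming the GPU-per-instance counts from AWS's public instance specs (g4dn.2xlarge: 1× T4; g5.2xlarge: 1× A10G; g5.12xlarge: 4× A10G; p4de.24xlarge: 8× A100 80GB; the r6id/r7i/m6a families are CPU-only):

```python
# Back-of-envelope accelerator tally for the fleet listed above.
# GPU-per-instance figures come from AWS's public instance specs.
fleet = {
    "g4dn.2xlarge": 64,
    "g5.12xlarge": 48,
    "g5.2xlarge": 48,
    "p4de.24xlarge": 10,
    "r6id.2xlarge": 42,   # CPU-only
    "r7i.2xlarge": 9,     # CPU-only
    "m6a.2xlarge": 6,     # CPU-only (control plane and monitoring)
}
gpus_per_instance = {
    "g4dn.2xlarge": ("T4", 1),
    "g5.12xlarge": ("A10G", 4),
    "g5.2xlarge": ("A10G", 1),
    "p4de.24xlarge": ("A100-80GB", 8),
}

gpu_totals: dict[str, int] = {}
for itype, count in fleet.items():
    if itype in gpus_per_instance:
        model, per = gpus_per_instance[itype]
        gpu_totals[model] = gpu_totals.get(model, 0) + count * per

print("instances:", sum(fleet.values()))   # 227
for model, n in sorted(gpu_totals.items()):
    print(f"{model}: {n}")                 # A100-80GB: 80, A10G: 240, T4: 64
```

So the fleet works out to roughly 64 T4s, 240 A10Gs, and 80 A100 80GB GPUs, plus 51 CPU-only r6id/r7i instances and the m6a control-plane and monitoring nodes.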

Kingsley Uyi Idehen

Founder & CEO at OpenLink Software | Advancing Data Connectivity, Multi-Model Data Management, and AI Smart Agents | Unifying Disparate Data Silos via Open Standards (SQL, SPARQL, RDF, ODBC, JDBC, HTTP, GraphQL)

1w

So you have something like a $500k–$700k monthly bill for that setup?
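For a rough sanity check on that range, here's a back-of-envelope sketch. The hourly rates are approximate us-east-1 on-demand list prices and purely illustrative assumptions; real pricing varies by region and over time, and reserved, savings-plan, or private pricing would be substantially lower:

```python
# Illustrative on-demand cost estimate for the fleet in the post.
# Hourly rates are APPROXIMATE us-east-1 list prices (assumptions,
# not quoted figures); check current AWS pricing before reuse.
fleet_hourly = {
    "g4dn.2xlarge":  (64, 0.752),
    "g5.12xlarge":   (48, 5.672),
    "g5.2xlarge":    (48, 1.212),
    "p4de.24xlarge": (10, 40.97),
    "r6id.2xlarge":  (42, 0.605),
    "r7i.2xlarge":   (9,  0.529),
    "m6a.2xlarge":   (6,  0.346),
}

hourly = sum(n * rate for n, rate in fleet_hourly.values())
monthly = hourly * 730  # ~730 hours in a month
print(f"~${hourly:,.0f}/hour -> ~${monthly:,.0f}/month")  # ~$821/hour -> ~$599,000/month
```

At list prices that lands right around $600k/month, in the middle of the quoted range, though committed-use discounts would likely pull the real bill well below it.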

Mark Moyou, PhD

Sr. Data Scientist @NVIDIA | Host @ AI Portfolio Podcast + Caribbean Tech Pioneers Podcast | Director @Southern Data Science Conference

1w

Really nice of you to post this. It would be cool to see how many model instances you get out of this deployment, along with throughputs and latencies.

Ivan Dukic

Riding the AI wave with Localmind.ai | Co-Founder at Morgendigital | Passionate about crafting digital solutions with true value and impact

1w

Due to data privacy concerns and high AWS costs, we self-host our instances. We use GPU servers in 4-GPU or 8-GPU configurations and host them in our datacenter in Innsbruck. The ROI is much better, but with the caveat that scaling is not as convenient - the classic dilemma of cloud vs. on-prem :)

Mano Thanabalan

Driving a new paradigm in business systems' design with distributed systems

1w

For my own use case, an investment finance super app with a feature that uses transformers to generate company constitutions and resolutions (a combination of Mistral and Mixtral models), I run 4× 32-core Threadripper-based machines in a home lab, each with an NVIDIA RTX 4000 GPU, all connected via a 10G SFP+ network with a Docker Swarm overlay. Best part: it only cost us an initial investment equal to about 6 months of the equivalent AWS spend, and it's almost fire-and-forget.
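As a sketch of how requests might be spread across a small cluster like this (the node URLs and the /generate route are hypothetical placeholders, not details from the comment; Docker Swarm's routing mesh could equally load-balance behind a single published port):

```python
# Minimal round-robin client for a few self-hosted inference nodes.
# Node URLs and the /generate route are hypothetical placeholders.
import itertools
import requests

NODES = itertools.cycle([
    "http://node1:8080",
    "http://node2:8080",
    "http://node3:8080",
    "http://node4:8080",
])

def generate(prompt: str, timeout: float = 60.0) -> str:
    """Send a prompt to the next node in the rotation."""
    node = next(NODES)
    resp = requests.post(f"{node}/generate", json={"prompt": prompt}, timeout=timeout)
    resp.raise_for_status()
    return resp.json()["text"]
```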

Darek Gajewski

Always learning new things

1w

What are you running on what? This is a nice, expensive list of servers. The first thing I'd look at is your overall resource footprint: for example, how long you actually run things for versus which operational resources need to be up all the time. Even your m6a control-plane VMs seem excessively large. How many experiments do you run? How many notebooks are you managing? How often are customer models updated? How many customers are you servicing with this hardware? What's your cost to serve a model?

Heidy Daumas

Neuroscience based HMI Designer - Founder at V.RTU

1w

I’m trying to deploy my first one and so far it’s been a terrible experience haha

Daniel Sautot

Chief Data Scientist @ AIris

1w

Elektra will soon be open source on Hugging Face: DevSecOps and MLOps AI for free, serverless included.

Kai Spriestersbach

Applied AI Researcher, Web Scientist (M.Sc.), Entrepreneur, Speaker, Consultant & SEO Veteran

1w

What is your AWS monthly bill?
