Hugging Face’s Post


Julien Chaumond

CTO at Hugging Face

We need more knowledge sharing about running ML infrastructure at scale! Here's the mix of AWS instances we currently run our serverless Inference API on. For context, the Inference API is the infra service that powers the widgets on Hugging Face Hub model pages; PRO users and Enterprise orgs can also use it programmatically.

64 g4dn.2xlarge
48 g5.12xlarge
48 g5.2xlarge
10 p4de.24xlarge
42 r6id.2xlarge
9 r7i.2xlarge
6 m6a.2xlarge (control plane and monitoring)

Total = 227 instances

This is a thread for AI Infra aficionados 🤓 What mix of instances do you run?
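For anyone tallying accelerators rather than instances, here's a minimal sketch, assuming the GPU-per-instance counts from AWS's public instance specs (g4dn.2xlarge: 1× T4; g5.2xlarge: 1× A10G; g5.12xlarge: 4× A10G; p4de.24xlarge: 8× A100 80GB; the r6id/r7i/m6a families are CPU-only):

```python
# Back-of-envelope accelerator tally for the fleet listed above.
# GPU-per-instance figures come from AWS's public instance specs.
fleet = {
    "g4dn.2xlarge": 64,
    "g5.12xlarge": 48,
    "g5.2xlarge": 48,
    "p4de.24xlarge": 10,
    "r6id.2xlarge": 42,   # CPU-only
    "r7i.2xlarge": 9,     # CPU-only
    "m6a.2xlarge": 6,     # CPU-only (control plane and monitoring)
}
gpus_per_instance = {
    "g4dn.2xlarge": ("T4", 1),
    "g5.12xlarge": ("A10G", 4),
    "g5.2xlarge": ("A10G", 1),
    "p4de.24xlarge": ("A100-80GB", 8),
}

gpu_totals: dict[str, int] = {}
for itype, count in fleet.items():
    if itype in gpus_per_instance:
        model, per = gpus_per_instance[itype]
        gpu_totals[model] = gpu_totals.get(model, 0) + count * per

print("instances:", sum(fleet.values()))   # 227
for model, n in sorted(gpu_totals.items()):
    print(f"{model}: {n}")                 # A100-80GB: 80, A10G: 240, T4: 64
```

So the fleet works out to roughly 64 T4s, 240 A10Gs, and 80 A100 80GB GPUs, plus 51 CPU-only r6id/r7i instances and the m6a control-plane and monitoring nodes.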

Kingsley Uyi Idehen

Founder & CEO at OpenLink Software | Advancing Data Connectivity, Multi-Model Data Management, and AI Smart Agents | Unifying Disparate Data Silos via Open Standards (SQL, SPARQL, RDF, ODBC, JDBC, HTTP, GraphQL)

1w

So you have something like a $500k–$700k monthly bill for that setup?
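For a rough sanity check on that range, here's a back-of-envelope sketch. The hourly rates are approximate us-east-1 on-demand list prices and purely illustrative assumptions; real pricing varies by region and over time, and reserved, savings-plan, or private pricing would be substantially lower:

```python
# Illustrative on-demand cost estimate for the fleet in the post.
# Hourly rates are APPROXIMATE us-east-1 list prices (assumptions,
# not quoted figures); check current AWS pricing before reuse.
fleet_hourly = {
    "g4dn.2xlarge":  (64, 0.752),
    "g5.12xlarge":   (48, 5.672),
    "g5.2xlarge":    (48, 1.212),
    "p4de.24xlarge": (10, 40.97),
    "r6id.2xlarge":  (42, 0.605),
    "r7i.2xlarge":   (9,  0.529),
    "m6a.2xlarge":   (6,  0.346),
}

hourly = sum(n * rate for n, rate in fleet_hourly.values())
monthly = hourly * 730  # ~730 hours in a month
print(f"~${hourly:,.0f}/hour -> ~${monthly:,.0f}/month")  # ~$821/hour -> ~$599,000/month
```

At list prices that lands right around $600k/month, in the middle of the quoted range, though committed-use discounts would likely pull the real bill well below it.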

Mark Moyou, PhD

Sr. Data Scientist @NVIDIA | Host @ AI Portfolio Podcast + Caribbean Tech Pioneers Podcast | Director @Southern Data Science Conference

1w

Really nice of you to post this. It would be cool to see how many model instances you get out of this deployment, along with throughputs and latencies.

Ivan Dukic

Riding the AI wave with Localmind.ai | Co-Founder at Morgendigital | Passionate about crafting digital solutions with true value and impact

1w

Due to data privacy concerns and high AWS costs, we self-host our instances. We use GPU servers in 4-GPU or 8-GPU configurations and host them in our datacenter in Innsbruck. The ROI is much better, but with the caveat that scaling is not as convenient - the classic dilemma of cloud vs. on-prem :)

Mano Thanabalan

Driving a new paradigm in business systems' design with distributed systems

1w

For my own use case, an investment finance super app with a feature that uses transformers to generate company constitutions and resolutions (a combination of Mistral and Mixtral models), I run 4× 32-core Threadripper-based machines in a home lab, each with an NVIDIA RTX 4000 GPU, all connected via a 10G SFP+ network with a Docker Swarm overlay. Best part: it only cost us an initial investment equal to about 6 months of the equivalent AWS spend, and it's almost fire-and-forget.
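As a sketch of how requests might be spread across a small cluster like this (the node URLs and the /generate route are hypothetical placeholders, not details from the comment; Docker Swarm's routing mesh could equally load-balance behind a single published port):

```python
# Minimal round-robin client for a few self-hosted inference nodes.
# Node URLs and the /generate route are hypothetical placeholders.
import itertools
import requests

NODES = itertools.cycle([
    "http://node1:8080",
    "http://node2:8080",
    "http://node3:8080",
    "http://node4:8080",
])

def generate(prompt: str, timeout: float = 60.0) -> str:
    """Send a prompt to the next node in the rotation."""
    node = next(NODES)
    resp = requests.post(f"{node}/generate", json={"prompt": prompt}, timeout=timeout)
    resp.raise_for_status()
    return resp.json()["text"]
```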

Darek Gajewski

Always learning new things

1w

What are you running on what? This is a nice, expensive list of servers. The first thing I'd look at is your overall resource footprint: for example, how long you actually run things for versus which operational resources need to be up all the time. Even your m6a control-plane VMs seem excessively large. How many experiments do you run? How many notebooks are you managing? How often are customer models updated? How many customers are you servicing with this hardware? What's your cost to serve a model?

Heidy Daumas

Neuroscience based HMI Designer - Founder at V.RTU

1w

I’m trying to deploy my first one and so far it’s been a terrible experience haha

Daniel Sautot

Chief Data Scientist @ AIris

1w

Elektra will soon be open source on Hugging Face: DevSecOps and MLOps AI for free, serverless included.

Kai Spriestersbach

Applied AI Researcher, Web Scientist (M.Sc.), Entrepreneur, Speaker, Consultant & SEO Veteran

1w

What is your AWS monthly bill?
