What's next for PyTorch? We've made our roadmap public for 2024 🎉 Check it out here and let us know what you think: https://hubs.la/Q02G4Jnn0
Llama-3.1???? 😵
PolyBlocks-compiled PyTorch is now nearly 3x as fast as PyTorch (eager execution) and 2x as fast as even Torch Inductor on several transformer-based workloads! Seven such models from HuggingFace have been benchmarked here. PolyBlocks-generated code yields fewer, faster GPU kernels (with better fusion and tiling) and involves less overhead; it also doesn't use CUTLASS, cuBLAS, flash attention, or any cu* kernels or their variants. Everything is generated via compiler passes on MLIR. Academic/non-commercial users can try PolyBlocks out on its playground, while others can sign up for a license: https://lnkd.in/gzbrUAHC Let us know if you'd like us to benchmark workloads you are interested in! More info: https://lnkd.in/gPHqzSfa
PyTorch Releases torchtune for Easily Fine-Tuning LLMs https://lnkd.in/djTsnC86
Published a new article on "How to Train PyTorch Models on Your Local GPU and Deploy to Hugging Face". The article is a beginner-friendly guide to setting up PyTorch with CUDA support, and it addresses common issues when deploying the trained model to Hugging Face.
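The core of any local-GPU setup is the device check described in the article: ask PyTorch whether CUDA is visible, then keep the model and its input tensors on the same device. A minimal sketch (the toy model here is hypothetical, just a stand-in for your own architecture):

```python
import torch
import torch.nn as nn

# Fall back to CPU gracefully if no CUDA-capable GPU is visible.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model; substitute your own architecture.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2)).to(device)

# Inputs must live on the same device as the model's parameters.
x = torch.randn(4, 8, device=device)
logits = model(x)
print(logits.shape)  # torch.Size([4, 2])
```

The same script then runs unchanged on a CUDA machine or a CPU-only laptop, which is exactly the property you want before pushing a checkpoint to Hugging Face.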
I wrote an article on PyTorch memory tuning - reducing GPU memory usage during inference and training. Along with some fundamental topics like mixed precision and inference mode, I also dig into some less well-known features: activation checkpointing and the use of replacement optimizers from bitsandbytes. https://lnkd.in/dxtHSTJP
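Two of the techniques mentioned can be shown in a few lines: `torch.inference_mode()` drops autograd bookkeeping entirely for inference, and activation checkpointing trades compute for memory by recomputing activations during the backward pass. A minimal sketch (the tiny model is hypothetical):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Hypothetical small model for illustration.
model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))

# Inference: inference_mode() disables autograd tracking, so no
# intermediate activations are kept for a backward pass.
with torch.inference_mode():
    out = model(torch.randn(2, 32))  # out.requires_grad is False

# Training: checkpoint() does not store intermediate activations;
# it re-runs the wrapped module during backward() to recompute them.
x = torch.randn(2, 32, requires_grad=True)
y = checkpoint(model, x, use_reentrant=False)
y.sum().backward()  # gradients still flow back to x
```

On a toy model this saves nothing measurable, but on deep transformer stacks checkpointing each block is one of the standard ways to fit larger batches into a fixed GPU memory budget.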
The community has adopted TensorFlow Lite as the key framework for On-Device Machine Learning. But did you know that PyTorch has its 'Mobile' version as well? I wanted to compare them from a software engineer's POV, so I implemented the same use case using 'TensorFlow Lite' and 'PyTorch Mobile' and compared them side by side. Find the results in this Medium post 😁 : https://lnkd.in/diiTUfaV
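The entry point to PyTorch Mobile is converting an eager model into TorchScript, which is the serialized form the mobile runtime loads. A minimal sketch of the tracing path (toy model hypothetical; in a real mobile pipeline you would typically also run `torch.utils.mobile_optimizer.optimize_for_mobile` before saving):

```python
import torch
import torch.nn as nn

# Hypothetical toy model; put it in eval mode before export.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU()).eval()

# Tracing runs the model once on example input and records the ops.
example = torch.randn(1, 4)
scripted = torch.jit.trace(model, example)

# The traced module behaves like the original...
out = scripted(example)

# ...and is what you'd save for the mobile runtime to load:
# scripted.save("model.pt")
```

Note that tracing records one concrete execution path, so models with data-dependent control flow need `torch.jit.script` instead.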
I wrote a tutorial on how to set up a Kubernetes cluster on EKS with GPU nodes in order to deploy a Large Language Model like Llama-2. https://lnkd.in/dSf8EjMD
‘…fine-tuning a model is a complex, expensive process. It takes a lot of time, effort, and GPU computing. Finding the optimal hyperparameters and dealing with underfitting and overfitting models is hard. It's also difficult to find experienced people who know how to do it.’ Here is a platform that offers no-code fine-tuning of open-source models.
cuML: Unleashing the Power of GPU Acceleration for standard machine learning algorithms! 🚀💡 We've recently seen cuDF, a library that serves as a GPU accelerator for pandas. Let's now take a closer look at cuML, a library developed by the same team. cuML is a suite of GPU-accelerated algorithms designed by the brilliant minds at NVIDIA RAPIDS! cuML transforms traditional tabular ML tasks by harnessing the speed and efficiency of GPU acceleration. Mirroring scikit-learn's familiar API, cuML provides a seamless fit-predict-transform paradigm, eliminating the need for GPU programming. As datasets grow larger, cuML keeps performance up by running compute tasks directly on the GPU. On large datasets, cuML's GPU-based implementations can complete 10-50x faster than their CPU counterparts. Multi-GPU and multi-node-multi-GPU operations, powered by Dask, further expand cuML's capabilities across a diverse set of algorithms.
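The "mirroring scikit-learn's API" point is the whole pitch: the estimator code is identical, only the import changes. The sketch below is written against scikit-learn so it runs anywhere; on a RAPIDS install, swapping the first import for `from cuml.cluster import KMeans` gives the GPU-accelerated version of the same fit-predict call (data here is random, for illustration only):

```python
# On RAPIDS, this line becomes: from cuml.cluster import KMeans
from sklearn.cluster import KMeans
import numpy as np

# Synthetic data standing in for a real tabular dataset.
X = np.random.RandomState(0).rand(100, 2)

# Same fit-predict paradigm in both libraries; no GPU programming needed.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels.shape)  # (100,)
```

The one-line import swap is also what makes the 10-50x claims cheap to verify on your own workloads.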
XGBoost & PyTorch releases 🎉🎉🎉 XGBoost 2.0 introduces a novel feature under development, focusing on vector-leaf tree models for multi-target regression, multi-label classification, and multi-class classification. Release notes 👉 https://lnkd.in/dCZ9Vuat Commentary 👉 https://lnkd.in/dGM3_DXN The PyTorch 2.1 update offers automatic dynamic shape support in torch.compile, distributed checkpointing for saving and loading distributed training jobs in parallel across multiple ranks, and support for the NumPy API. In addition, it ships beta updates to the PyTorch domain libraries TorchAudio and TorchVision. Lastly, the community has added support for training and inference of Llama 2 models powered by AWS Inferentia2. This will make running Llama 2 models on PyTorch quicker, cheaper, and more efficient. Release notes 👉 https://lnkd.in/dyU53yQe Commentary 👉 https://lnkd.in/dHM8vb83 #machinelearning #opensource
In the latest Engineering Blog post by Prodigy Education, Staff Site Reliability Engineer Erik Krieg takes a look at how to run a GPU-accelerated open-source Large Language Model (LLM) inference workload using Elastic Kubernetes Service (EKS). https://lnkd.in/geSPErxG #insideprodigy #largelanguagemodels #engineeringblog
Love this! Looking forward to meeting new people in the PR