Today is a huge day for open source AI: Argilla is joining Hugging Face 🤗 🚀 It's time to double down on community, good data for AI, product features, and open collaboration. We're thrilled to continue our path with the wonderful Argilla team and a broader team and vision, with shared values and culture! Thanks to our investors Zetta Venture Partners (James Alcorn), Criteria Venture Tech (Roma Jelinskaite, Albert Morro, Aleix Pérez), Eniac Ventures (Hadley Harris, Dan Jaeck, Monica Lim), and many others, so lucky to have worked with you! https://lnkd.in/dfxvgpsT
Argilla
Software development
Madrid, MADRID · 9,026 followers
The Platform where experts improve AI models
About us
Build robust NLP products through faster data labeling and curation. Argilla empowers teams with the easiest-to-use human-in-the-loop and programmatic labeling features.
- Website
- https://www.argilla.io
- Industry
- Software development
- Company size
- 11-50 employees
- Headquarters
- Madrid, MADRID
- Type
- Privately held
- Founded
- 2017
- Specialties
- NLP, artificial intelligence, Data science, and Open Source
Products
Argilla
Data labeling platforms
The feedback layer for enterprise LLMs. Build robust language models with human and machine feedback. Argilla empowers data teams, from fine-tuning and RLHF to continuous model improvement.
Locations
-
Primary
Calle de Vandergoten, 1
Madrid, MADRID 28005, ES
-
Moli Canyars, 7
Carpesa, Valencia 46132, ES
Employees at Argilla
Updates
-
ZenML has a new integration with Argilla 🤟 Synthetic data lovers, we will also discuss distilabel ⚗️ Alex S. will show you how to use it in our upcoming community meetup. Sara Han Díaz Lorenzo is putting the final touches on the PR for Argilla 2.0 support in the integration 😎
We finally have a ZenML and Argilla collab 😍 Building a data flywheel for your RAG applications is critical for the successful deployment of your LLM. In the latest Argilla community meetup, Alex S. will showcase how you can use synthetic data generated by distilabel to bootstrap embedding model fine-tuning, and then use human feedback in Argilla to iteratively and continuously improve model performance. Thank you Daniel Vila Suero and David Berenstein for the invite <3! You don't want to miss this! The event is on Thursday, August 8, 5 PM-6 PM GMT+2. Sign up for free here. 👉 👉👉https://lu.ma/4b5ick1e
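The flywheel idea above (bootstrap with synthetic data, then improve with human feedback) can be sketched as a loop. All function names here are hypothetical stand-ins, not the actual ZenML, distilabel, or Argilla APIs:

```python
# Illustrative sketch of a "data flywheel": synthetic data bootstraps the
# first fine-tune, human feedback grows the dataset on every iteration.
from typing import Callable, List


def data_flywheel(
    generate_synthetic: Callable[[], List[dict]],      # distilabel-style bootstrap
    fine_tune: Callable[[List[dict]], object],         # e.g. embedding fine-tuning
    collect_feedback: Callable[[object], List[dict]],  # Argilla-style human review
    iterations: int = 3,
) -> object:
    """Fine-tune repeatedly, folding human corrections back into the data."""
    dataset = generate_synthetic()
    model = None
    for _ in range(iterations):
        model = fine_tune(dataset)
        dataset = dataset + collect_feedback(model)  # grow the training set
    return model
```

With stub functions, each iteration trains on a strictly larger dataset, which is the whole point of the flywheel.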
-
Argilla shared this
Dropping magpie-ultra-v0.1, the first open synthetic dataset built with Llama 3.1 405B. Created with distilabel, it's our most advanced and compute-intensive pipeline to date. https://lnkd.in/ecXn_Gbi Almost two months ago, Magpie by the University of Washington and Ai2 was released. It described a simple mechanism to generate instruction-response pairs with no system prompt or seed data, taking advantage of the autoregressive capabilities of LLMs and the SFT fine-tuning done with a chat template. They released two new datasets: Magpie-Air, generated with Llama 3 8B Instruct, and Magpie-Pro, generated with Llama 3 70B Instruct. As mentioned, no system prompt or seed data is needed to generate the instruction-response pairs: Magpie is essentially a hack that extracts instruction-response pairs similar to the ones used during the SFT phase of an LLM. As you may know, Argilla joined Hugging Face, and a few weeks later the new Llama 3.1 family of models by AI at Meta was released! It came with a big, big model: Llama 3.1 405B. We saw this as an opportunity and decided to replicate the Magpie recipe with the chunky boy to create Magpie Ultra v0.1, the first public synthetic dataset created with Llama 3.1 405B. The dataset contains 50K unfiltered rows of instruction-response pairs across different categories: Information seeking, Reasoning, Planning, Editing, Coding & Debugging, Math, Data analysis, Creative writing, Advice seeking, Brainstorming, or Others. It contains all the columns needed for proper filtering, ensuring a leaner final dataset with more difficult, high-quality, safe, and diverse instructions. We will be working on a filtered version in the coming days. The dataset can be used for SFT, but it can also be used for RLAIF, as we generated two responses: one with the instruct model and one with the base model.
As described in the Llama 3 paper, the models that will probably get the most out of fine-tuning on it are, of course, small models! You can explore the dataset in Argilla: https://lnkd.in/eK3dahJE I'm very excited about this dataset, as I was able to make the GPUs of the science-cluster go brrrrrr. It also helped me a lot to test the upcoming features of distilabel 1.3.0, which will be released next Tuesday. Only thing I can say is that scaling synthetic dataset
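The Magpie trick described above can be sketched as two prompt builders, assuming a Llama-3-style chat template (this is illustrative, not the distilabel pipeline itself): stop the prompt right after the user header, and the instruct-tuned model autoregressively "completes" a plausible instruction, since that is what followed this prefix during SFT.

```python
# Minimal sketch of the Magpie pre-query prompt, assuming the Llama 3
# chat template's special tokens.

def magpie_instruction_prompt() -> str:
    # Prompt ends where a user message would begin, so the model
    # generates an instruction instead of answering one.
    return "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"


def magpie_response_prompt(instruction: str) -> str:
    # Feed the extracted instruction back to get the paired response.
    return (
        magpie_instruction_prompt()
        + instruction
        + "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```

Sampling a completion from the first prompt yields the instruction; the second prompt then yields the matching response, producing instruction-response pairs with no seed data.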
argilla/magpie-ultra-v0.1 · Datasets at Hugging Face
huggingface.co
-
Argilla shared this
Play with Argilla 2.0 UI on Hugging Face 🚀 https://lnkd.in/eY-83EBe The new demo is full of nice datasets, thanks to the awesome David Berenstein!
-
Argilla shared this
🦙 Mixture of Llamas 🦙 Sharing a new synthetic data generation example on Colab with Llama 3.1 and distilabel, implementing the recent Mixture of Agents method. What's Mixture of Agents for LLMs? A new approach that leverages the collective strengths of multiple LLMs to produce better outputs, as follows: 👩🎓 Several proposer LLMs generate outputs for a given input multiple times, improving responses by including previous outputs in the system prompt (70B and CodeLlama in the example) 👩🏫 An aggregator LLM combines these outputs into a high-quality final response (the 405B model in the example) Free Colab we made with Gabriel Martín Blázquez: https://lnkd.in/dmdVfegM
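The proposer/aggregator flow above can be sketched in a few lines. The signatures here are hypothetical (in the actual example the models are Llama 3.1 models run through distilabel), but the control flow matches the method's description:

```python
# Minimal sketch of the Mixture of Agents pattern: proposers answer in
# rounds, seeing the previous round's outputs in their system prompt;
# an aggregator merges the final round into one response.
from typing import Callable, List

LLM = Callable[[str, str], str]  # (system_prompt, user_prompt) -> response


def mixture_of_agents(proposers: List[LLM], aggregator: LLM,
                      prompt: str, rounds: int = 2) -> str:
    previous: List[str] = []
    for _ in range(rounds):
        # Each round, proposers try to improve on earlier candidates.
        system = ("Previous candidate responses:\n" + "\n---\n".join(previous)
                  if previous else "")
        previous = [llm(system, prompt) for llm in proposers]
    # The aggregator combines the last round into a final answer.
    agg_system = ("Synthesize these candidates into one high-quality "
                  "answer:\n" + "\n---\n".join(previous))
    return aggregator(agg_system, prompt)
```

Swapping the callables for real model clients gives the same topology: several smaller proposers, one larger aggregator.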
-
High quality data makes models go from their model to your model Read the full post announcing Argilla 2.0: https://lnkd.in/dGJJefHU
-
Argilla shared this
Argilla 2.0 is out! 🥳 Over the last months, the Argilla team has been working to create this new version, which unifies what were known as the "old datasets" (TextClassificationDataset, TokenClassificationDataset, etc.) and the `FeedbackDataset`s into a new class called `Dataset`, along with a new Python SDK that is super nice and super easy to use! The `Dataset` class comes with all the ingredients required for a good annotation job: - Highly configurable, allowing multiple fields and multiple questions to be displayed to annotators. - Easy to filter using metadata, semantic search, or text search. - And... this is 🆕... task distribution! This version ships with the task distribution feature, which lets you define how many annotations (annotator overlap) are required per record! This is only the first strategy; we will be adding more soon 🤗 If you don't know Argilla or want to start working with it, you can start today by creating a 🆓 space on Hugging Face: https://lnkd.in/d5K6weJF
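The task distribution idea (a minimum annotator overlap per record) can be sketched in plain Python. This is illustrative only, not Argilla's implementation:

```python
# Sketch of overlap-based task distribution: each record is assigned
# round-robin until `min_submitted` distinct annotators have it.
from typing import Dict, List


def distribute_tasks(record_ids: List[str], annotators: List[str],
                     min_submitted: int = 2) -> Dict[str, List[str]]:
    """Assign each record to `min_submitted` distinct annotators."""
    assert min_submitted <= len(annotators)
    assignments: Dict[str, List[str]] = {a: [] for a in annotators}
    i = 0
    for rid in record_ids:
        for _ in range(min_submitted):
            # Consecutive indices mod len(annotators) are distinct
            # as long as min_submitted <= len(annotators).
            assignments[annotators[i % len(annotators)]].append(rid)
            i += 1
    return assignments
```

Round-robin keeps the workload balanced while still guaranteeing the required overlap on every record.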
-
Argilla shared this
🌐 Contribute to building a truly multilingual benchmark for LLMs in collaboration with Cohere For AI. Help review some MMLU translations in your language! 🔥 The progress - More than 22K contributions, ~150 contributors - 8 languages are completed (Russian, Hindi, Telugu, Arabic, Spanish, Korean, French, Ukrainian) If you have 10 min, join this Hugging Face Argilla Space and start reviewing translations in your language: https://lnkd.in/dTyHaPEF Vietnamese, Portuguese, Amharic, German, and Indonesian are almost complete; help finish them so they can be included in the benchmark. Many other languages need contributions too. Watch the progress for your language: https://lnkd.in/dqzg6KQq
-
Argilla shared this
Since Argilla joined Hugging Face, I've mainly worked on making it crazy easy to improve datasets with feedback. This before-and-after highlights how the SDK contributes to this by natively integrating with packages like datasets, making it easy to get feedback on changing data. 👯♀️ Changes from before to now: - You don't need to define records all the time. Just align your dataset and feedback task, and Argilla will match fields to fields and questions to questions. - Argilla can use the identifiers in your dataset, so you don't need to define external ids. Later on, you can use these to edit, delete, or update records. - You don't need to create new datasets for each version. Argilla can update only the records with actual changes. - Log is back! Seasoned users of Argilla will remember the log function from the early days. Because Argilla supports updating records, it now makes sense again to use a dynamic log method to create or update records based on their IDs. 🛣 Why should I care? (Re-)sharing datasets with a team of experts is a crucial step in the ML lifecycle. Good engineers will aim to do this as much as possible, so a good feedback SDK should make it super simple. 🐉 A lot has changed in the background to support this abstraction, so it's refreshing to share the development succinctly like this. If you want details, check out this how-to guide on managing records: https://lnkd.in/er-dfRuM #ai #llm #datasets #opensourceai
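The log-style upsert behavior described above can be sketched in plain Python (illustrative, not the Argilla SDK): records are created or updated based on their ids, and identical records are left untouched, so you never need a new dataset per version.

```python
# Sketch of id-based upsert semantics for a record store.
from typing import Dict, List


def log_records(store: Dict[str, dict], records: List[dict]) -> Dict[str, int]:
    """Create new records, update changed ones, skip identical ones."""
    stats = {"created": 0, "updated": 0, "unchanged": 0}
    for rec in records:
        rid = rec["id"]
        if rid not in store:
            store[rid] = rec              # new id -> create
            stats["created"] += 1
        elif store[rid] != rec:
            store[rid] = rec              # same id, new content -> update
            stats["updated"] += 1
        else:
            stats["unchanged"] += 1       # identical -> nothing to do
    return stats
```

Calling it twice with the same data is a no-op on the second pass, which is exactly what makes a dynamic log method safe to re-run.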
-
Argilla shared this
I couldn't be more excited to announce Argilla 2.0 👻 What makes it even more special? We're doing it with the Hugging Face team, which means: ✌️ More talent, more feedback, more learning, and this is awesome for the future of Argilla! 👉 A real opportunity to say hello to the rest of the OSS AI community who haven't heard about Argilla before! So basically, to keep introductions short: Argilla is designed for anyone who values data and wants high-quality AI projects. Big 💗 for the entire Argilla team, who always focus on quality, listening to the community's needs, and both thinking about and building great things! https://lnkd.in/dGw54zKK
Funding
Last round
Seed: US$5,500,000.00