Argilla

Software Development

Madrid, Madrid · 9,026 followers

The Platform where experts improve AI models

About us

Build robust NLP products through faster data labeling and curation. Argilla empowers teams with the easiest-to-use human-in-the-loop and programmatic labeling features.

Website
https://www.argilla.io
Industry
Software Development
Company size
11-50 employees
Headquarters
Madrid, Madrid
Type
Self-owned
Founded
2017
Specialties
NLP, artificial intelligence, data science, and open source

Updates

  • Argilla

    Today is a huge day for open source AI: Argilla is joining Hugging Face 🤗 🚀 It's time to double down on community, good data for AI, product features, and open collaboration. We're thrilled to continue our path with the wonderful Argilla team and a broader team and vision, with shared values and culture! Thanks to our investors Zetta Venture Partners (James Alcorn), Criteria Venture Tech (Roma Jelinskaite, Albert Morro, Aleix Pérez), Eniac Ventures (Hadley Harris, Dan Jaeck, Monica Lim), and many others, so lucky to have worked with you! https://lnkd.in/dfxvgpsT

    Argilla is joining Hugging Face 🤗

    argilla.io

  • Argilla

    ZenML has a new integration with Argilla 🤟 Synthetic data lovers, we will also discuss distilabel ⚗️ Alex S. will show you how to use it at our upcoming community meetup. Sara Han Díaz Lorenzo is putting the finishing touches on the PR for Argilla 2.0 support in the integration 😎

    Hamza Tahir

    Co-Founder @ ZenML

    We finally have a ZenML and Argilla collab 😍 Building a data flywheel for your RAG applications is critical for the successful deployment of your LLM. In the upcoming Argilla community meetup, Alex S. will showcase how you can use synthetic data generated by distilabel to bootstrap embedding model fine-tuning, and then use human feedback in Argilla to iteratively and continuously improve model performance. Thank you Daniel Vila Suero and David Berenstein for the invite <3! You don't want to miss this! The event is on Thursday, August 8, 5-6 PM GMT+2. Sign up for free here 👉 https://lu.ma/4b5ick1e

  • Argilla shared this

    Gabriel Martín Blázquez

    ML Engineer @ Hugging Face 🤗

    Dropping magpie-ultra-v0.1, the first open synthetic dataset built with Llama 3.1 405B. Created with distilabel, it's our most advanced and compute-intensive pipeline to date. https://lnkd.in/ecXn_Gbi

    Almost two months ago, Magpie by University of Washington and Ai2 was released. It described a simple mechanism to generate instruction-response pairs with no system prompt or seed data, taking advantage of the autoregressive capabilities of LLMs and the SFT fine-tuning done with a chat template. They released two new datasets: Magpie-Air, generated with Llama 3 8B Instruct, and Magpie-Pro, generated with Llama 3 70B Instruct. As mentioned, no system prompt or seed data is needed to generate the instruction-response pairs: Magpie is essentially a hack that extracts instruction-response pairs similar to those used during the SFT phase of an LLM (sketched below).

    As you may know, Argilla joined Hugging Face, and a few weeks later the new family of Llama 3.1 models by AI at Meta was released! It came with a big, big model: Llama 3.1 405B. We saw this as an opportunity and decided to replicate the Magpie recipe with the chunky boy to create Magpie Ultra v0.1, the first public synthetic dataset created with Llama 3.1 405B.

    The dataset contains 50K unfiltered rows of instruction-response pairs across different categories: Information seeking, Reasoning, Planning, Editing, Coding & Debugging, Math, Data analysis, Creative writing, Advice seeking, Brainstorming, and Others. It contains all the columns needed for proper filtering, ensuring a leaner final dataset with more difficult, high-quality, safe, and diverse instructions. We will work these days to bring out a filtered version. The dataset can be used for SFT, but it can also be used for RLAIF, as we generated two responses: one with the instruct model and one with the base model. Probably, as described in the Llama 3 paper, the models that will get the most out of fine-tuning on it will be small models, of course!

    You can explore the dataset in Argilla: https://lnkd.in/eK3dahJE

    I'm very excited about this dataset, as I was able to make the GPUs of the science cluster go brrrrrr. It also helped me a lot to test the upcoming features of distilabel 1.3.0, which will be released next Tuesday. Only thing I can say is that scaling synthetic dataset

    argilla/magpie-ultra-v0.1 · Datasets at Hugging Face

    huggingface.co
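
    The Magpie mechanism described above is simple enough to sketch. Below is a minimal, hedged illustration using transformers directly, not the actual distilabel pipeline behind magpie-ultra; the model ID, template prefix, and sampling settings are assumptions for a Llama-3-style instruct model:

    ```python
    # Minimal sketch of the Magpie trick (assumes a Llama-3-style chat template).
    # Feeding only the chat-template prefix of a user turn makes an SFT-ed model
    # complete it with a plausible instruction -- no system prompt or seed data.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative, not the 405B
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Step 1: synthesize an instruction from the bare user-turn header.
    prefix = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    inputs = tokenizer(prefix, return_tensors="pt", add_special_tokens=False).to(model.device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True)
    instruction = tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

    # Step 2: answer the synthesized instruction as a regular chat turn,
    # yielding one instruction-response pair for the dataset.
    prompt_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": instruction}],
        add_generation_prompt=True, return_tensors="pt",
    ).to(model.device)
    out = model.generate(prompt_ids, max_new_tokens=256, do_sample=True)
    response = tokenizer.decode(out[0][prompt_ids.shape[-1]:], skip_special_tokens=True)

    print({"instruction": instruction, "response": response})
    ```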

  • Argilla shared this

    Daniel Vila Suero

    Building Argilla @ Hugging Face 🤗

    🦙 Mixture of Llamas 🦙 I'm sharing a new synthetic data generation example on Colab, implementing the recent Mixture of Agents method with the new Llama 3.1 models and distilabel!

    What's Mixture of Agents for LLMs? A new approach that leverages the collective strengths of multiple LLMs to produce better outputs, as follows:

    👩‍🎓 Several proposer LLMs generate outputs for a given input multiple times, improving responses by including the previous outputs in the system prompt (70B and CodeLlama in the example)

    👩‍🏫 An aggregator LLM combines these outputs into a high-quality final response (the 405B model in the example; see the sketch below)

    Free Colab we made with Gabriel Martín Blázquez: https://lnkd.in/dmdVfegM

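    The proposer/aggregator loop described above can be sketched in a few lines. This is a hedged illustration using huggingface_hub's InferenceClient rather than the distilabel pipeline from the Colab; the model IDs, prompts, and round count are illustrative assumptions:

    ```python
    # Minimal Mixture of Agents sketch. Assumptions: hosted inference access to
    # the listed models; prompts and round count are illustrative.
    from huggingface_hub import InferenceClient

    PROPOSERS = [
        "meta-llama/Meta-Llama-3.1-70B-Instruct",
        "codellama/CodeLlama-34b-Instruct-hf",
    ]
    AGGREGATOR = "meta-llama/Meta-Llama-3.1-405B-Instruct"

    def chat(model: str, system: str, user: str) -> str:
        # One chat-completion call against a hosted model.
        out = InferenceClient(model).chat_completion(
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
            max_tokens=512,
        )
        return out.choices[0].message.content

    def mixture_of_agents(prompt: str, rounds: int = 2) -> str:
        previous: list[str] = []
        for _ in range(rounds):
            system = "You are a helpful assistant."
            if previous:
                # Proposers see the prior round's answers in their system prompt.
                refs = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(previous))
                system += "\nImprove upon these previous candidate answers:\n" + refs
            previous = [chat(m, system, prompt) for m in PROPOSERS]
        # The aggregator synthesizes the final round into one response.
        refs = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(previous))
        return chat(
            AGGREGATOR,
            "Synthesize the candidate answers into one high-quality response:\n" + refs,
            prompt,
        )

    print(mixture_of_agents("Write a Python function that merges two sorted lists."))
    ```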
  • Argilla shared this

    Gabriel Martín Blázquez

    ML Engineer @ Hugging Face 🤗

    Argilla 2.0 is out! 🥳 Over the last months, the Argilla team has been working to create this new version, which unifies what were known as the "old datasets" (TextClassificationDataset, TokenClassificationDataset, etc.) and the `FeedbackDataset`s into a new class called `Dataset`, along with a new Python SDK that is super nice and super easy to use!

    The `Dataset` class comes with all the ingredients required for a good annotation job:

    - Highly configurable, as it allows multiple fields and multiple questions to be displayed to the annotators.
    - Easy to filter using metadata, semantic search, or text search.
    - And... this is 🆕... task distribution! This version ships with the task distribution feature, which allows defining how many annotations (annotator overlap) are required per record. This is only the first strategy; we will be adding more soon 🤗 (A sketch follows below.)

    If you don't know Argilla or want to start working with it, you can start today by creating a 🆓 Space on Hugging Face: https://lnkd.in/d5K6weJF

    Hugging Face – The AI community building the future.

    huggingface.co
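
    As a rough sketch of what configuring such a dataset looks like with the new SDK (the dataset name, field and question names, labels, and overlap value below are illustrative assumptions):

    ```python
    # Sketch of creating an Argilla 2.0 `Dataset` with task distribution.
    import argilla as rg

    client = rg.Argilla(api_url="https://<your-space>.hf.space", api_key="<your-api-key>")

    settings = rg.Settings(
        fields=[rg.TextField(name="text")],  # what annotators read
        questions=[  # what annotators answer
            rg.LabelQuestion(name="sentiment", labels=["positive", "negative", "neutral"]),
        ],
        # Task distribution: a record is complete once 2 annotators submit.
        distribution=rg.TaskDistribution(min_submitted=2),
    )

    dataset = rg.Dataset(name="product-reviews", settings=settings, client=client)
    dataset.create()
    ```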

  • Argilla shared this

    Daniel Vila Suero

    Building Argilla @ Hugging Face 🤗

    🌐 Contribute to building a truly multilingual benchmark for LLMs, in collaboration with Cohere For AI. Help review some MMLU translations in your language!

    🔥 The progress: more than 22K contributions from ~150 contributors, with 8 languages completed (Russian, Hindi, Telugu, Arabic, Spanish, Korean, French, Ukrainian).

    If you have 10 minutes, join this Hugging Face Argilla Space and start reviewing translations in your language: https://lnkd.in/dTyHaPEF

    Vietnamese, Portuguese, Amharic, German, and Indonesian are almost complete; help finish them so they can be included in the benchmark. Many other languages need contributions too. Watch the progress for your language: https://lnkd.in/dqzg6KQq

  • Argilla shared this

    Ben Burtenshaw

    Building Argilla @ 🤗 Hugging Face

    Since Argilla joined Hugging Face, I've mainly worked on making it crazy easy to improve datasets with feedback. This before-and-after highlights how the SDK contributes to this by natively integrating with packages like datasets, making it easy to get feedback on changing data.

    👯‍♀️ Changes from before to now:

    - You don't need to define records all the time. Just align your dataset and feedback task, and Argilla will match fields to fields and questions to questions.
    - Argilla can use the identifiers in your dataset, so you don't need to define external IDs. Later on, you can use these to edit, delete, or update records.
    - You don't need to create new datasets for each version. Argilla can update only the records with actual changes.
    - Log is back! Seasoned users of Argilla will remember the log function from the early days. Because Argilla supports updating records, it now makes sense again to use a dynamic log method to create or update records based on their IDs (see the sketch below).

    🛣 Why should I care? (Re-)sharing datasets with a team of experts is a crucial step in the ML lifecycle. Good engineers will aim to do this as much as possible, so a good feedback SDK should make it super simple.

    🐉 A lot has changed in the background to support this abstraction, so it's refreshing to share the development succinctly like this. If you want details, check out this how-to guide on managing records: https://lnkd.in/er-dfRuM

    #ai #llm #datasets #opensourceai

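    A hedged sketch of the log-based flow described in the post (the dataset name, record IDs, columns, and mapping are illustrative assumptions, reusing the dataset configured in the earlier sketch):

    ```python
    # Sketch of logging and updating records with the Argilla 2.0 SDK.
    import argilla as rg

    client = rg.Argilla(api_url="https://<your-space>.hf.space", api_key="<your-api-key>")
    dataset = client.datasets(name="product-reviews")

    rows = [
        {"id": "rec-1", "text": "Great battery life.", "label": "positive"},
        {"id": "rec-2", "text": "Stopped working after a week.", "label": "negative"},
    ]

    # Columns matching field/question names are mapped automatically; `mapping`
    # covers the rest, and the `id` column becomes the record identifier.
    dataset.records.log(records=rows, mapping={"label": "sentiment"})

    # Same IDs, changed data: log() updates the existing records in place
    # rather than creating a new dataset or duplicating records.
    rows[1]["text"] = "Stopped working after a week of daily use."
    dataset.records.log(records=rows, mapping={"label": "sentiment"})
    ```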
  • Argilla shared this

    Amélie Viallet

    NLP Product design

    I couldn't be more excited to announce Argilla 2.0 👻 What makes it even more special? We're doing it with the Hugging Face team, which means:

    ✌️ More talent, more feedback, more learning, and this is awesome for the future of Argilla!

    👉 A real opportunity to say hello to the rest of the OSS AI community who haven't heard about Argilla before!

    So, to keep the introduction short: Argilla is designed for everyone; it's for anyone who values data and wants high-quality AI projects.

    Big 💗 for the entire Argilla team, who always focus on quality, listen to the community's needs, and both think up and build great things! https://lnkd.in/dGw54zKK

    Argilla 2.0 is out

    argilla.io


Funding

Argilla: 3 rounds in total

Last round

Seed

US$5,500,000

See more information on Crunchbase