F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

SWivid/F5-TTS 9 Oct 2024

This sampling strategy for flow step can be easily applied to existing flow matching based models without retraining.

Denoising Text to Speech

4,064
12.80 stars / hour

LightRAG: Simple and Fast Retrieval-Augmented Generation

hkuds/lightrag 8 Oct 2024

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs.

Information Retrieval RAG

3,035
6.66 stars / hour

Diffusion for World Modeling: Visual Details Matter in Atari

eloialonso/diamond 20 May 2024

Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model.

Image Generation reinforcement-learning

1,190
4.64 stars / hour

Pyramidal Flow Matching for Efficient Video Generative Modeling

jy0205/Pyramid-Flow 8 Oct 2024

Video generation requires modeling a vast spatiotemporal space, which demands significant computational resources and data usage.

Text-to-Video Generation Video Generation

1,770
3.78 stars / hour

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

mit-han-lab/duo-attention 14 Oct 2024

Based on this insight, we introduce DuoAttention, a framework that only applies a full KV cache to retrieval heads while using a light-weight, constant-length KV cache for streaming heads, which reduces both LLM's decoding and pre-filling memory and latency without compromising its long-context abilities.

Quantization Retrieval

183
3.00 stars / hour
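
The DuoAttention summary above contrasts a full KV cache for retrieval heads with a constant-length cache for streaming heads. The following is a minimal sketch of that contrast, not the project's implementation: the `HeadCache` class, the `num_sink` and `num_recent` parameters, and the choice to keep a few initial tokens plus a sliding window of recent tokens are all assumptions made for illustration. Integers stand in for key/value tensors.

```python
from collections import deque


class HeadCache:
    """Toy per-head KV cache (hypothetical; integers stand in for KV tensors)."""

    def __init__(self, is_retrieval, num_sink=2, num_recent=4):
        self.is_retrieval = is_retrieval
        self.num_sink = num_sink
        self.full = []                          # retrieval head: keeps every token
        self.sink = []                          # streaming head: first few tokens (assumed)
        self.recent = deque(maxlen=num_recent)  # streaming head: sliding window (assumed)

    def append(self, kv):
        if self.is_retrieval:
            self.full.append(kv)
        elif len(self.sink) < self.num_sink:
            self.sink.append(kv)
        else:
            # deque with maxlen silently drops the oldest entry when full
            self.recent.append(kv)

    def entries(self):
        if self.is_retrieval:
            return list(self.full)
        return self.sink + list(self.recent)


def simulate(num_tokens):
    """Feed the same token stream to one retrieval head and one streaming head."""
    retrieval = HeadCache(is_retrieval=True)
    streaming = HeadCache(is_retrieval=False)
    for t in range(num_tokens):
        retrieval.append(t)
        streaming.append(t)
    return retrieval.entries(), streaming.entries()


full, bounded = simulate(10)
print(len(full), bounded)  # prints: 10 [0, 1, 6, 7, 8, 9]
```

In this sketch the retrieval head's cache grows linearly with the sequence, while the streaming head's cache never exceeds `num_sink + num_recent` entries, which is the source of the memory saving the summary describes.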

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

sihyun-yu/REPA 9 Oct 2024

Recent studies have shown that the denoising process in (generative) diffusion models can induce meaningful (discriminative) representations inside the model, though the quality of these representations still lags behind those learned through recent self-supervised learning methods.

Denoising Image Generation

402
2.71 stars / hour

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

mit-han-lab/hart 14 Oct 2024

To address these challenges, we present the hybrid tokenizer, which decomposes the continuous latents from the autoencoder into two components: discrete tokens representing the big picture and continuous tokens representing the residual components that cannot be represented by the discrete tokens.

Image Generation Image Reconstruction

179
2.62 stars / hour
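
The HART summary above describes splitting a continuous latent into a discrete token plus a continuous residual. A minimal sketch of that decomposition follows, assuming a nearest-neighbor lookup in a small fixed codebook; the real tokenizer is learned, and the function names, the toy codebook, and the 2-dimensional latent here are hypothetical.

```python
def nearest_code(latent, codebook):
    """Return the index of the codebook vector closest to latent (squared L2)."""
    def dist(code):
        return sum((a - b) ** 2 for a, b in zip(latent, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def decompose(latent, codebook):
    """Split a latent into a discrete token index and a continuous residual."""
    idx = nearest_code(latent, codebook)
    residual = [a - b for a, b in zip(latent, codebook[idx])]
    return idx, residual


def reconstruct(idx, residual, codebook):
    """Recombine the discrete and continuous parts into the original latent."""
    return [c + r for c, r in zip(codebook[idx], residual)]


# Toy codebook and latent, chosen only for illustration.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
latent = [1.25, 0.75]

idx, residual = decompose(latent, codebook)
print(idx, residual)                          # prints: 1 [0.25, -0.25]
print(reconstruct(idx, residual, codebook))   # prints: [1.25, 0.75]
```

The discrete index carries the coarse structure, and the residual holds whatever the codebook entry misses, so adding the two back together recovers the latent in this toy setting.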

Baichuan-Omni Technical Report

westlake-baichuan-mllm/bc-omni 11 Oct 2024

The salient multimodal capabilities and interactive experience of GPT-4o highlight its critical role in practical applications, yet it lacks a high-performing open-source counterpart.

Language Modelling Large Language Model

163
2.01 stars / hour

LoLCATs: On Low-Rank Linearizing of Large Language Models

hazyresearch/lolcats 14 Oct 2024

When compared with prior approaches under the same compute budgets, LoLCATs significantly improves linearizing quality, closing the gap between linearized and original Llama 3.1 70B and 405B LLMs by 77.8% and 78.1% on 5-shot MMLU.

MMLU

139
1.79 stars / hour

Generalizable and Animatable Gaussian Head Avatar

xg-chu/gagavatar 10 Oct 2024

In this paper, we propose Generalizable and Animatable Gaussian head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction.

193
1.31 stars / hour