📣 New Text-to-Video research released: VADER
- Adapts large-scale T2V models to specific downstream tasks
- Uses pre-trained reward models instead of supervised fine-tuning
- Enables aesthetic video generation, better alignment between text & videos, and videos 3x longer than the lengths seen in training 🤯
✨ VADER: Video Diffusion Alignment via Reward Gradients
Model on @huggingface Hub: https://lnkd.in/gfs_E9FX
Code: https://lnkd.in/gRW_vBFV
We welcome community Gradio implementations for VADER-VideoCrafter, VADER-Open-Sora, and VADER-ModelScope. You can request GPU grants for your demo submissions. Start today: Gradio.dev
Gradio’s Post
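The core idea in the post — steering a generator by ascending gradients from a differentiable reward model, rather than by supervised fine-tuning — can be sketched with a toy numpy example. This is purely illustrative: the one-parameter "generator", the quadratic reward, and the learning rate are invented stand-ins, not VADER's actual models or training loop.

```python
import numpy as np

# Toy sketch of reward-gradient alignment (NOT VADER's real code):
# a trivial "generator" g(theta) = theta, and a differentiable reward
# R(x) = -(x - target)^2 that peaks at the preferred output.
target = 3.0

def reward(x):
    return -(x - target) ** 2

def reward_grad(x):
    return -2.0 * (x - target)   # analytic dR/dx

theta = 0.0                       # generator parameter, far from the optimum
lr = 0.1
for _ in range(100):
    x = theta                     # "generated sample"
    theta += lr * reward_grad(x)  # ascend the reward gradient through the sample

print(round(theta, 3))  # → 3.0: the generator drifts to the reward-preferred output
```

In the real setting the reward model scores decoded video frames and the gradient flows back through the diffusion model's parameters; the toy above only shows why no supervised target data is needed.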
-
We have round #2 of the swag giveaway + fun challenge from Intel! The grand prize is an Intel NUC + A770m! Here is what you need to do:
1. Visit the site below.
2. Try our segmentation notebooks (advanced users: try SAM too).
3. Create your own with OpenVINO.
4. Submit your work on GitHub!
Our grand-prize NUC has a pretty maxed-out spec, and you can use it for AI workloads, including LLMs, too!
Link: https://lnkd.in/gTz2XDXm
Thanks Kevin Hartman <3 #iamintel
-
Award-Winning Marketer, Podcast Host, Event MC, TV Host, Online Moderator, Virtual Emcee, Keynote Speaker, Meeting Facilitator and Journalist. My motto is ABC Always Be Connecting ⭐️
For our next talk at Ultralytics #YV23 #YoloVision, Soumik Rakshit, Machine Learning Engineer at Weights & Biases, gets bonus points for rocking a SpongeBob SquarePants background 🧽 Soumik holds the esteemed title of Google Developer Expert in #JAX and his expertise extends to open-source computer vision projects, with a focus on generative computing, image restoration, and computer graphics!
-
Check out this week's episode of #ElixirMix with Jonatan Kłosko #𝗘𝗠𝘅: Elixir, LiveBook, and NX: Innovations in Machine Learning Training and GPU Integration https://lnkd.in/gtzWNdk5
-
Computer Vision Engineer @Ultralytics | Solving Real-World Challenges🔎| Python | Published Research | Open Source Contributor | GitHub 🌟 | Daily Computer Vision LinkedIn Content 🚀 | Technical Writer VisionAI @Medium📝
YOLOv10 vs Ultralytics YOLOv8 🔥⚽
🦊 YOLOv10 offers NMS-free object detection. Is this true? I explored it and ran tests on the visual data shown in the demo, highlighting each algorithm's ability to handle the objects.
Experiment findings 😍
🦁 The postprocessing time of YOLOv10 is better than YOLOv8's, at approximately 0.0x seconds, because of its NMS-free architecture.
🦁 The inference and preprocessing times of YOLOv8 are better than YOLOv10's, which means YOLOv8 still leads in real-time processing.
🦁 Overall, YOLOv8's inference speed remains state of the art, while YOLOv10 becomes the leader in postprocessing speed. So, ideally, this can be considered a minor release rather than a major one. 🚀
Regarding the visual experiments, I noticed YOLOv10 suffers a lot, especially on zoomed-out and zoomed-in views and when objects are far from the camera. Some outcomes are shared in the comments below 👇
Learn more ➡ https://lnkd.in/dsp8i4Mr
#computervision #objectdetection #yolov10 #experimentation #researchanddevelopment
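For context on why an NMS-free head cuts postprocessing time: non-maximum suppression is a greedy per-image loop over candidate boxes that YOLOv10 skips entirely. A minimal pure-Python sketch of classic greedy NMS (box format, scores, and the 0.5 IoU threshold are illustrative choices, not Ultralytics' implementation):

```python
def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: keep the highest-scoring box, drop boxes that overlap
    # it too much, then repeat with the remainder.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the near-duplicate box 1 is suppressed
```

This pairwise-overlap loop runs after inference on every frame, which is the cost an NMS-free architecture removes from the postprocessing budget.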
-
Nvidia released BigVGAN v2! 🎧
> Custom CUDA kernel for inference: fused upsampling + activation kernel, up to 3x faster inference on A100
> Improved discriminator and loss: a multi-scale sub-band CQT discriminator and a multi-scale mel spectrogram loss
> Larger training data: trained on datasets containing diverse audio types, including speech in multiple languages, environmental sounds, and instruments
> Permissive pre-trained model checkpoints supporting up to 44 kHz sampling rate and a 512x upsampling ratio
> Models & Demo on the Hub 🤗
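To give a feel for the "multi-scale" loss idea: compare reference and generated audio in the spectral domain at several FFT resolutions and average the distances. The numpy sketch below uses plain magnitude STFTs rather than mel spectrograms, and its window sizes and hop are illustrative assumptions — it is a toy stand-in for BigVGAN's loss, not its implementation:

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    # magnitude STFT via a Hann-windowed sliding FFT
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def multiscale_spec_loss(ref, gen, sizes=(256, 512, 1024)):
    # L1 distance between magnitude spectrograms at several FFT sizes,
    # averaged over scales; small windows catch transients, large ones pitch.
    return float(np.mean([
        np.mean(np.abs(stft_mag(ref, n, n // 4) - stft_mag(gen, n, n // 4)))
        for n in sizes
    ]))

t = np.linspace(0, 1, 4096, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.1 * np.random.default_rng(0).normal(size=t.size)
print(multiscale_spec_loss(clean, clean))        # 0.0 for identical audio
print(multiscale_spec_loss(clean, noisy) > 0.0)  # True: noise raises the loss
```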
-
https://lnkd.in/dJ53_RYC Interesting presentation this week by DeepLearning.AI on how to reduce the memory footprint needed to fine-tune a 7-billion-parameter Llama LLM so that the fine-tuning process fits on a single 16 GB GPU. Techniques explored: quantization, LoRA, QLoRA, and gradient accumulation. Includes a hands-on lab with an example Jupyter notebook.
Efficient Fine-Tuning for Llama-v2-7b on a Single GPU
https://www.youtube.com/
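Back-of-envelope arithmetic shows why these techniques matter. The figures below are rough illustrative assumptions (the adapter size in particular is hypothetical, not a number from the talk): full fine-tuning with Adam needs far more than 16 GB, while a 4-bit base model plus a small trainable LoRA adapter comes in well under it.

```python
# Rough memory arithmetic for fine-tuning a 7B-parameter model
# (illustrative assumptions, not exact figures from the presentation).
params = 7e9

fp16_weights_gb = params * 2 / 1e9           # 2 bytes per parameter
adam_fp32_states_gb = params * 8 / 1e9       # 2 optimizer moments, 4 bytes each
full_ft_gb = fp16_weights_gb + adam_fp32_states_gb  # gradients/activations extra

int4_weights_gb = params * 0.5 / 1e9         # 4-bit quantized base model (frozen)
lora_params = 40e6                           # hypothetical adapter size
lora_train_gb = lora_params * (2 + 8) / 1e9  # adapter weights + their optimizer

qlora_gb = int4_weights_gb + lora_train_gb

print(f"full fine-tune: >= {full_ft_gb:.0f} GB")  # ~70 GB before activations
print(f"QLoRA-style:    ~ {qlora_gb:.1f} GB")     # leaves headroom on a 16 GB GPU
```

Gradient accumulation then keeps activation memory down by trading batch size for extra forward/backward passes.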
-
🚀 LLMs & NLP Innovator | AI & Big Data Engineering Leader | Python Back-end Expert | 15+ Years in Tech | Speaker & Mentor
A look at #Nvidia's #BigVGAN v2, highlighting its enhanced features for audio processing. It details improvements such as a custom #CUDA kernel for faster inference on A100 #GPUs, an improved discriminator and loss function for better audio quality, and the inclusion of diverse audio types in the training data. It also mentions the availability of permissive pre-trained model checkpoints supporting higher sampling rates and upsampling ratios. #mldk #mldktech #mldkgenai https://mldk.tech