forked from deep-diver/hf-daily-paper-newsletter
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent b4faf6f, commit ac38967
Showing 12 changed files with 128 additions and 0 deletions.
11 changes: 11 additions & 0 deletions
current/2024-01-18 Asynchronous Local-SGD Training for Language Modeling.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
date: "2024-01-18" | ||
author: Bo Liu | ||
title: Asynchronous Local-SGD Training for Language Modeling | ||
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/wV15FTUGHbZ62WNxqwYgF.png | ||
link: https://huggingface.co/papers/2401.09135 | ||
summary: This paper presents an empirical study of asynchronous Local SGD for training language models. The study finds that asynchronous Local-SGD takes more iterations to converge than its synchronous counterpart, but proposes a novel method using a delayed Nesterov momentum update and adjusting worker training steps based on computation speed to match the performance of synchronous Local-SGD in terms of perplexity per update step and surpass it in terms of wall clock time.... | ||
opinion: placeholder | ||
tags: | ||
- Supervised Learning | ||
- Deep Learning | ||
- Natural Language Processing |
12 changes: 12 additions & 0 deletions
...01-18 Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
date: "2024-01-18" | ||
author: Jonghyun Lee | ||
title: 'Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis' | ||
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/-yf5DxmvTrVDfbGeLFsZd.png | ||
link: https://huggingface.co/papers/2401.09048 | ||
summary: This paper introduces a conditional diffusion model that integrates control over three-dimensional object placement with disentangled representations of global stylistic semantics from multiple exemplar images. The model uses depth disentanglement training to identify the absolute positions of unseen objects and soft guidance to impose global semantics onto targeted regions. The integrated framework, Compose and Conquer, allows for the localization of multiple conditions in a disentangled manner... | ||
opinion: placeholder | ||
tags: | ||
- Computer Vision | ||
- Deep Learning | ||
- Explainable AI and Interpretability | ||
- Natural Language Processing |
10 changes: 10 additions & 0 deletions
...ed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
date: "2024-01-18" | ||
author: Connor Holmes | ||
title: 'DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference' | ||
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/kHRcfLyUO_kQLUihKFoGL.png | ||
link: https://huggingface.co/papers/2401.08671 | ||
summary: This paper proposes DeepSpeed-FastGen, a system that uses dynamic prompt and generation composition to improve throughput and reduce latency for large language model deployment and scaling. DeepSpeed-FastGen leverages DeepSpeed-MII and DeepSpeed-Inference technology and supports non-persistent and persistent deployment options for a variety of models. The evaluations show significant improvements in throughput and latency across various models and hardware configurations, and the code is availab... | ||
opinion: placeholder | ||
tags: | ||
- Deep Learning | ||
- Natural Language Processing |
9 changes: 9 additions & 0 deletions
current/2024-01-18 GARField: Group Anything with Radiance Fields.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-01-18" | ||
author: Chung Min Kim | ||
title: 'GARField: Group Anything with Radiance Fields' | ||
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/v-0P4LwFSz1TA8CKoElRZ.webm | ||
link: https://huggingface.co/papers/2401.09419 | ||
summary: This paper introduces the Group Anything with Radiance Fields (GARField) approach for decomposing 3D scenes into semantically meaningful groups. GARField uses a scale-conditioned 3D affinity feature field to embrace group ambiguity and derive a hierarchy of possible groupings. The method produces higher fidelity groups than input SAM masks and has potential downstream applications, such as 3D asset extraction or dynamic scene understanding. A project website is available at <https://www.garfield... | ||
opinion: placeholder | ||
tags: | ||
- Computer Vision |
10 changes: 10 additions & 0 deletions
...24-01-18 ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
date: "2024-01-18" | ||
author: Weiyao Wang | ||
title: 'ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization' | ||
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/h9IUGJ-Cp4Beq_v5JaZDt.png | ||
link: https://huggingface.co/papers/2401.08937 | ||
summary: This paper introduces ICON, an optimization procedure for training NeRFs from 2D video frames without using prior pose initialization. ICON estimates initial poses based on smooth camera motion and uses an adaptive measure of model quality called "confidence" to reweight gradients. It learns NeRF using high-confidence poses and 3D structure, leading to improved performance on CO3D and HO3D datasets compared to methods using SfM pose.... | ||
opinion: placeholder | ||
tags: | ||
- Computer Vision | ||
- Optimization and Learning Algorithms |
12 changes: 12 additions & 0 deletions
current/2024-01-18 ReFT: Reasoning with Reinforced Fine-Tuning.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
date: "2024-01-18" | ||
author: Trung Quoc Luong | ||
title: 'ReFT: Reasoning with Reinforced Fine-Tuning' | ||
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/bPp1x6IrywnkFJUhY7QLH.png | ||
link: https://huggingface.co/papers/2401.08967 | ||
summary: The paper proposes ReFT, a way to train LLMs for reasoning that uses supervised fine-tuning and reinforcement learning with a PPO algorithm. ReFT outperforms supervised fine-tuning on GSM8K, MathQA, and SVAMP datasets, and can further be boosted with inference-time strategies such as majority voting and re-ranking. It is effective in learning from the same training questions as supervised fine-tuning, indicating a superior generalization ability.... | ||
opinion: placeholder | ||
tags: | ||
- Supervised Learning | ||
- Reinforcement Learning | ||
- Deep Learning | ||
- Natural Language Processing |
10 changes: 10 additions & 0 deletions
...-18 SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
date: "2024-01-18" | ||
author: Baoxiong Jia | ||
title: 'SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding' | ||
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/SpPYrZ75Uee17OWAQrm1q.png | ||
link: https://huggingface.co/papers/2401.09340 | ||
summary: This paper proposes a large-scale dataset, SceneVerse, and a unified learning framework, GPS, for grounding language in 3D scenes. SceneVerse contains 68K indoor scenes and 2.5M vision-language pairs, which allows for pre-training GPS. Extensive experiments demonstrate state-of-the-art performance on 3D visual grounding benchmarks and the potential of GPS for 3D vision-language tasks.... | ||
opinion: placeholder | ||
tags: | ||
- Computer Vision | ||
- Natural Language Processing |
10 changes: 10 additions & 0 deletions
...ng Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
date: "2024-01-18" | ||
author: Nanye Ma | ||
title: 'SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers' | ||
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/hdPDnPoXo91-ivR-5d52c.png | ||
link: https://huggingface.co/papers/2401.08740 | ||
summary: This paper presents Scalable Interpolant Transformers (SiT), a family of generative models based on Diffusion Transformers (DiT), that allows for a more flexible way to connect two distributions. SiT surpasses DiT on the conditional ImageNet 256x256 benchmark and has an FID-50K score of 2.06 by exploring various diffusion coefficients that can be tuned separately from learning, making the model more efficient and flexible.... | ||
opinion: placeholder | ||
tags: | ||
- Unsupervised Learning | ||
- Deep Learning |
11 changes: 11 additions & 0 deletions
...1-18 TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
date: "2024-01-18" | ||
author: Yu-Ying Yeh | ||
title: 'TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion' | ||
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/h7DGdKotisn9wcJciHHNX.mp4 | ||
link: https://huggingface.co/papers/2401.09416 | ||
summary: TextureDreamer is a novel image-guided texture synthesis method that uses geometry-aware diffusion models to transfer relightable textures from a few input images to target 3D shapes across different categories. Unlike traditional and learning-based methods, TextureDreamer can transfer complex textures from real-world environments to arbitrary objects, potentially democratizing texture creation. Its main technology is Personalized Geometry-aware Score Distillation (PGSD), which combines personal... | ||
opinion: placeholder | ||
tags: | ||
- Computer Vision | ||
- Deep Learning | ||
- Emerging Applications of Machine Learning |
13 changes: 13 additions & 0 deletions
current/2024-01-18 UniVG: Towards UNIfied-modal Video Generation.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
date: "2024-01-18" | ||
author: Ludan Ruan | ||
title: 'UniVG: Towards UNIfied-modal Video Generation' | ||
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/4FyqtWZf5J8vgDRF6xFK7.png | ||
link: https://huggingface.co/papers/2401.09084 | ||
summary: This paper proposes a unified video generation system called UniVG that can handle multiple video generation tasks across text and image modalities. The system employs Multi-condition Cross Attention for high-freedom video generation and introduces Biased Gaussian Noise for low-freedom video generation. UniVG achieves state-of-the-art results on the MSR-VTT benchmark and outperforms open-source methods in human evaluations.... | ||
opinion: placeholder | ||
tags: | ||
- Computer Vision | ||
- Deep Learning | ||
- Natural Language Processing | ||
- Generative Modeling | ||
- Video Generation |
10 changes: 10 additions & 0 deletions
...8 VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
date: "2024-01-18" | ||
author: Haoxin Chen | ||
title: 'VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models' | ||
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/jgPqbzAZBFM7Anlxsjxpi.png | ||
link: https://huggingface.co/papers/2401.09047 | ||
summary: This paper proposes a method to overcome the limitations of large-scale, high-quality video datasets required for training text-to-video generation models by leveraging low-quality videos and synthesized high-quality images. The authors analyze the connection between spatial and temporal modules in video models and conduct experiments to demonstrate the superiority of their proposed method in picture quality, motion, and concept composition.... | ||
opinion: placeholder | ||
tags: | ||
- Deep Learning | ||
- Computer Vision |
10 changes: 10 additions & 0 deletions
...Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
date: "2024-01-18" | ||
author: Lianghui Zhu | ||
title: 'Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model' | ||
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/3CR9QVEIjbOr4I8mOvDy1.png | ||
link: https://huggingface.co/papers/2401.09417 | ||
summary: This paper proposes a new generic vision backbone called Vim that uses bidirectional Mamba blocks for efficient visual representation learning. Unlike existing vision transformers, Vim is capable of marking image sequences with position embeddings and compressing visual representation with bidirectional state space models. Vim is shown to achieve higher performance on ImageNet classification, COCO object detection, and ADE20k semantic segmentation tasks with significant improvement in computatio... | ||
opinion: placeholder | ||
tags: | ||
- Computer Vision | ||
- Deep Learning |