Automated report

codingpot · Jan 20, 2024 · ac38967
1 parent b4faf6f
commit ac38967
Showing 12 changed files with 128 additions and 0 deletions.
diff --git a/current/2024-01-18 Asynchronous Local-SGD Training for Language Modeling.yaml b/current/2024-01-18 Asynchronous Local-SGD Training for Language Modeling.yaml
@@ -0,0 +1,11 @@
+date: "2024-01-18"
+author: Bo Liu
+title: Asynchronous Local-SGD Training for Language Modeling
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/wV15FTUGHbZ62WNxqwYgF.png
+link: https://huggingface.co/papers/2401.09135
+summary: This paper presents an empirical study of asynchronous Local SGD for training language models. The study finds that asynchronous Local-SGD takes more iterations to converge than its synchronous counterpart, but proposes a novel method using a delayed Nesterov momentum update and adjusting worker training steps based on computation speed to match the performance of synchronous Local-SGD in terms of perplexity per update step and surpass it in terms of wall clock time....
+opinion: placeholder
+tags:
+    - Supervised Learning
+    - Deep Learning
+    - Natural Language Processing
diff --git a/...01-18 Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis.yaml b/...01-18 Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis.yaml
@@ -0,0 +1,12 @@
+date: "2024-01-18"
+author: Jonghyun Lee
+title: 'Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis'
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/-yf5DxmvTrVDfbGeLFsZd.png
+link: https://huggingface.co/papers/2401.09048
+summary: This paper introduces a conditional diffusion model that integrates control over three-dimensional object placement with disentangled representations of global stylistic semantics from multiple exemplar images. The model uses depth disentanglement training to identify the absolute positions of unseen objects and soft guidance to impose global semantics onto targeted regions. The integrated framework, Compose and Conquer, allows for the localization of multiple conditions in a disentangled manner...
+opinion: placeholder
+tags:
+    - Computer Vision
+    - Deep Learning
+    - Explainable AI and Interpretability
+    - Natural Language Processing
diff --git a/...ed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference.yaml b/...ed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference.yaml
@@ -0,0 +1,10 @@
+date: "2024-01-18"
+author: Connor Holmes
+title: 'DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference'
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/kHRcfLyUO_kQLUihKFoGL.png
+link: https://huggingface.co/papers/2401.08671
+summary: This paper proposes DeepSpeed-FastGen, a system that uses dynamic prompt and generation composition to improve throughput and reduce latency for large language model deployment and scaling. DeepSpeed-FastGen leverages DeepSpeed-MII and DeepSpeed-Inference technology and supports non-persistent and persistent deployment options for a variety of models. The evaluations show significant improvements in throughput and latency across various models and hardware configurations, and the code is availab...
+opinion: placeholder
+tags:
+    - Deep Learning
+    - Natural Language Processing
diff --git a/current/2024-01-18 GARField: Group Anything with Radiance Fields.yaml b/current/2024-01-18 GARField: Group Anything with Radiance Fields.yaml
@@ -0,0 +1,9 @@
+date: "2024-01-18"
+author: Chung Min Kim
+title: 'GARField: Group Anything with Radiance Fields'
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/v-0P4LwFSz1TA8CKoElRZ.webm
+link: https://huggingface.co/papers/2401.09419
+summary: This paper introduces the Group Anything with Radiance Fields (GARField) approach for decomposing 3D scenes into semantically meaningful groups. GARField uses a scale-conditioned 3D affinity feature field to embrace group ambiguity and derive a hierarchy of possible groupings. The method produces higher fidelity groups than input SAM masks and has potential downstream applications, such as 3D asset extraction or dynamic scene understanding. A project website is available at <https://www.garfield...
+opinion: placeholder
+tags:
+    - Computer Vision
diff --git a/...24-01-18 ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization.yaml b/...24-01-18 ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization.yaml
@@ -0,0 +1,10 @@
+date: "2024-01-18"
+author: Weiyao Wang
+title: 'ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization'
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/h9IUGJ-Cp4Beq_v5JaZDt.png
+link: https://huggingface.co/papers/2401.08937
+summary: This paper introduces ICON, an optimization procedure for training NeRFs from 2D video frames without using prior pose initialization. ICON estimates initial poses based on smooth camera motion and uses an adaptive measure of model quality called "confidence" to reweight gradients. It learns NeRF using high-confidence poses and 3D structure, leading to improved performance on CO3D and HO3D datasets compared to methods using SfM pose....
+opinion: placeholder
+tags:
+    - Computer Vision
+    - Optimization and Learning Algorithms
diff --git a/current/2024-01-18 ReFT: Reasoning with Reinforced Fine-Tuning.yaml b/current/2024-01-18 ReFT: Reasoning with Reinforced Fine-Tuning.yaml
@@ -0,0 +1,12 @@
+date: "2024-01-18"
+author: Trung Quoc Luong
+title: 'ReFT: Reasoning with Reinforced Fine-Tuning'
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/bPp1x6IrywnkFJUhY7QLH.png
+link: https://huggingface.co/papers/2401.08967
+summary: The paper proposes ReFT, a way to train LLMs for reasoning that uses supervised fine-tuning and reinforcement learning with a PPO algorithm. ReFT outperforms supervised fine-tuning on GSM8K, MathQA, and SVAMP datasets, and can further be boosted with inference-time strategies such as majority voting and re-ranking. It is effective in learning from the same training questions as supervised fine-tuning, indicating a superior generalization ability....
+opinion: placeholder
+tags:
+    - Supervised Learning
+    - Reinforcement Learning
+    - Deep Learning
+    - Natural Language Processing
diff --git a/...-18 SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding.yaml b/...-18 SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding.yaml
@@ -0,0 +1,10 @@
+date: "2024-01-18"
+author: Baoxiong Jia
+title: 'SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding'
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/SpPYrZ75Uee17OWAQrm1q.png
+link: https://huggingface.co/papers/2401.09340
+summary: This paper proposes a large-scale dataset, SceneVerse, and a unified learning framework, GPS, for grounding language in 3D scenes. SceneVerse contains 68K indoor scenes and 2.5M vision-language pairs, which allows for pre-training GPS. Extensive experiments demonstrate state-of-the-art performance on 3D visual grounding benchmarks and the potential of GPS for 3D vision-language tasks....
+opinion: placeholder
+tags:
+    - Computer Vision
+    - Natural Language Processing
diff --git a/...ng Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers.yaml b/...ng Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers.yaml
@@ -0,0 +1,10 @@
+date: "2024-01-18"
+author: Nanye Ma
+title: 'SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers'
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/hdPDnPoXo91-ivR-5d52c.png
+link: https://huggingface.co/papers/2401.08740
+summary: This paper presents Scalable Interpolant Transformers (SiT), a family of generative models based on Diffusion Transformers (DiT), that allows for a more flexible way to connect two distributions. SiT surpasses DiT on the conditional ImageNet 256x256 benchmark and has an FID-50K score of 2.06 by exploring various diffusion coefficients that can be tuned separately from learning, making the model more efficient and flexible....
+opinion: placeholder
+tags:
+    - Unsupervised Learning
+    - Deep Learning
diff --git a/...1-18 TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion.yaml b/...1-18 TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion.yaml
@@ -0,0 +1,11 @@
+date: "2024-01-18"
+author: Yu-Ying Yeh
+title: 'TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion'
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/h7DGdKotisn9wcJciHHNX.mp4
+link: https://huggingface.co/papers/2401.09416
+summary: TextureDreamer is a novel image-guided texture synthesis method that uses geometry-aware diffusion models to transfer relightable textures from a few input images to target 3D shapes across different categories. Unlike traditional and learning-based methods, TextureDreamer can transfer complex textures from real-world environments to arbitrary objects, potentially democratizing texture creation. Its main technology is Personalized Geometry-aware Score Distillation (PGSD), which combines personal...
+opinion: placeholder
+tags:
+    - Computer Vision
+    - Deep Learning
+    - Emerging Applications of Machine Learning
diff --git a/current/2024-01-18 UniVG: Towards UNIfied-modal Video Generation.yaml b/current/2024-01-18 UniVG: Towards UNIfied-modal Video Generation.yaml
@@ -0,0 +1,13 @@
+date: "2024-01-18"
+author: Ludan Ruan
+title: 'UniVG: Towards UNIfied-modal Video Generation'
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/4FyqtWZf5J8vgDRF6xFK7.png
+link: https://huggingface.co/papers/2401.09084
+summary: This paper proposes a unified video generation system called UniVG that can handle multiple video generation tasks across text and image modalities. The system employs Multi-condition Cross Attention for high-freedom video generation and introduces Biased Gaussian Noise for low-freedom video generation. UniVG achieves state-of-the-art results on the MSR-VTT benchmark and outperforms open-source methods in human evaluations....
+opinion: placeholder
+tags:
+    - Computer Vision
+    - Deep Learning
+    - Natural Language Processing
+    - Generative Modeling
+    - Video Generation
diff --git a/...8 VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models.yaml b/...8 VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models.yaml
@@ -0,0 +1,10 @@
+date: "2024-01-18"
+author: Haoxin Chen
+title: 'VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models'
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/jgPqbzAZBFM7Anlxsjxpi.png
+link: https://huggingface.co/papers/2401.09047
+summary: This paper proposes a method to overcome the limitations of large-scale, high-quality video datasets required for training text-to-video generation models by leveraging low-quality videos and synthesized high-quality images. The authors analyze the connection between spatial and temporal modules in video models and conduct experiments to demonstrate the superiority of their proposed method in picture quality, motion, and concept composition....
+opinion: placeholder
+tags:
+    - Deep Learning
+    - Computer Vision
diff --git a/...Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model.yaml b/...Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model.yaml
@@ -0,0 +1,10 @@
+date: "2024-01-18"
+author: Lianghui Zhu
+title: 'Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model'
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/3CR9QVEIjbOr4I8mOvDy1.png
+link: https://huggingface.co/papers/2401.09417
+summary: This paper proposes a new generic vision backbone called Vim that uses bidirectional Mamba blocks for efficient visual representation learning. Unlike existing vision transformers, Vim is capable of marking image sequences with position embeddings and compressing visual representation with bidirectional state space models. Vim is shown to achieve higher performance on ImageNet classification, COCO object detection, and ADE20k semantic segmentation tasks with significant improvement in computatio...
+opinion: placeholder
+tags:
+    - Computer Vision
+    - Deep Learning