Automated report
deep-diver committed Jan 20, 2024
1 parent b4faf6f commit ac38967
Showing 12 changed files with 128 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
date: "2024-01-18"
author: Bo Liu
title: Asynchronous Local-SGD Training for Language Modeling
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/wV15FTUGHbZ62WNxqwYgF.png
link: https://huggingface.co/papers/2401.09135
summary: This paper presents an empirical study of asynchronous Local-SGD for training language models. The study finds that asynchronous Local-SGD takes more iterations to converge than its synchronous counterpart. To close the gap, the authors propose a method that uses a delayed Nesterov momentum update and adjusts each worker's local training steps based on its computation speed, matching synchronous Local-SGD in perplexity per update step and surpassing it in wall clock time....
opinion: placeholder
tags:
- Supervised Learning
- Deep Learning
- Natural Language Processing
@@ -0,0 +1,12 @@
date: "2024-01-18"
author: Jonghyun Lee
title: 'Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis'
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/-yf5DxmvTrVDfbGeLFsZd.png
link: https://huggingface.co/papers/2401.09048
summary: This paper introduces a conditional diffusion model that integrates control over three-dimensional object placement with disentangled representations of global stylistic semantics from multiple exemplar images. The model uses depth disentanglement training to identify the absolute positions of unseen objects and soft guidance to impose global semantics onto targeted regions. The integrated framework, Compose and Conquer, allows for the localization of multiple conditions in a disentangled manner...
opinion: placeholder
tags:
- Computer Vision
- Deep Learning
- Explainable AI and Interpretability
- Natural Language Processing
@@ -0,0 +1,10 @@
date: "2024-01-18"
author: Connor Holmes
title: 'DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference'
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/kHRcfLyUO_kQLUihKFoGL.png
link: https://huggingface.co/papers/2401.08671
summary: This paper proposes DeepSpeed-FastGen, a system that uses dynamic prompt and generation composition to improve throughput and reduce latency for large language model deployment and scaling. DeepSpeed-FastGen leverages DeepSpeed-MII and DeepSpeed-Inference technology and supports non-persistent and persistent deployment options for a variety of models. The evaluations show significant improvements in throughput and latency across various models and hardware configurations, and the code is availab...
opinion: placeholder
tags:
- Deep Learning
- Natural Language Processing
@@ -0,0 +1,9 @@
date: "2024-01-18"
author: Chung Min Kim
title: 'GARField: Group Anything with Radiance Fields'
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/v-0P4LwFSz1TA8CKoElRZ.webm
link: https://huggingface.co/papers/2401.09419
summary: This paper introduces the Group Anything with Radiance Fields (GARField) approach for decomposing 3D scenes into semantically meaningful groups. GARField uses a scale-conditioned 3D affinity feature field to embrace group ambiguity and derive a hierarchy of possible groupings. The method produces higher fidelity groups than input SAM masks and has potential downstream applications, such as 3D asset extraction or dynamic scene understanding. A project website is available at <https://www.garfield...
opinion: placeholder
tags:
- Computer Vision
@@ -0,0 +1,10 @@
date: "2024-01-18"
author: Weiyao Wang
title: 'ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization'
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/h9IUGJ-Cp4Beq_v5JaZDt.png
link: https://huggingface.co/papers/2401.08937
summary: This paper introduces ICON, an optimization procedure for training NeRFs from 2D video frames without using prior pose initialization. ICON estimates initial poses based on smooth camera motion and uses an adaptive measure of model quality called "confidence" to reweight gradients. It learns NeRF using high-confidence poses and 3D structure, leading to improved performance on CO3D and HO3D datasets compared to methods using SfM pose....
opinion: placeholder
tags:
- Computer Vision
- Optimization and Learning Algorithms
@@ -0,0 +1,12 @@
date: "2024-01-18"
author: Trung Quoc Luong
title: 'ReFT: Reasoning with Reinforced Fine-Tuning'
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/bPp1x6IrywnkFJUhY7QLH.png
link: https://huggingface.co/papers/2401.08967
summary: The paper proposes ReFT, a way to train LLMs for reasoning that uses supervised fine-tuning and reinforcement learning with a PPO algorithm. ReFT outperforms supervised fine-tuning on GSM8K, MathQA, and SVAMP datasets, and can further be boosted with inference-time strategies such as majority voting and re-ranking. It is effective in learning from the same training questions as supervised fine-tuning, indicating a superior generalization ability....
opinion: placeholder
tags:
- Supervised Learning
- Reinforcement Learning
- Deep Learning
- Natural Language Processing
@@ -0,0 +1,10 @@
date: "2024-01-18"
author: Baoxiong Jia
title: 'SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding'
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/SpPYrZ75Uee17OWAQrm1q.png
link: https://huggingface.co/papers/2401.09340
summary: This paper proposes a large-scale dataset, SceneVerse, and a unified learning framework, GPS, for grounding language in 3D scenes. SceneVerse contains 68K indoor scenes and 2.5M vision-language pairs, which allows for pre-training GPS. Extensive experiments demonstrate state-of-the-art performance on 3D visual grounding benchmarks and the potential of GPS for 3D vision-language tasks....
opinion: placeholder
tags:
- Computer Vision
- Natural Language Processing
@@ -0,0 +1,10 @@
date: "2024-01-18"
author: Nanye Ma
title: 'SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers'
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/hdPDnPoXo91-ivR-5d52c.png
link: https://huggingface.co/papers/2401.08740
summary: This paper presents Scalable Interpolant Transformers (SiT), a family of generative models based on Diffusion Transformers (DiT), that allows for a more flexible way to connect two distributions. SiT surpasses DiT on the conditional ImageNet 256x256 benchmark and has an FID-50K score of 2.06 by exploring various diffusion coefficients that can be tuned separately from learning, making the model more efficient and flexible....
opinion: placeholder
tags:
- Unsupervised Learning
- Deep Learning
@@ -0,0 +1,11 @@
date: "2024-01-18"
author: Yu-Ying Yeh
title: 'TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion'
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/h7DGdKotisn9wcJciHHNX.mp4
link: https://huggingface.co/papers/2401.09416
summary: TextureDreamer is a novel image-guided texture synthesis method that uses geometry-aware diffusion models to transfer relightable textures from a few input images to target 3D shapes across different categories. Unlike traditional and learning-based methods, TextureDreamer can transfer complex textures from real-world environments to arbitrary objects, potentially democratizing texture creation. Its main technology is Personalized Geometry-aware Score Distillation (PGSD), which combines personal...
opinion: placeholder
tags:
- Computer Vision
- Deep Learning
- Emerging Applications of Machine Learning
@@ -0,0 +1,13 @@
date: "2024-01-18"
author: Ludan Ruan
title: 'UniVG: Towards UNIfied-modal Video Generation'
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/4FyqtWZf5J8vgDRF6xFK7.png
link: https://huggingface.co/papers/2401.09084
summary: This paper proposes a unified video generation system called UniVG that can handle multiple video generation tasks across text and image modalities. The system employs Multi-condition Cross Attention for high-freedom video generation and introduces Biased Gaussian Noise for low-freedom video generation. UniVG achieves state-of-the-art results on the MSR-VTT benchmark and outperforms open-source methods in human evaluations....
opinion: placeholder
tags:
- Computer Vision
- Deep Learning
- Natural Language Processing
- Generative Modeling
- Video Generation
@@ -0,0 +1,10 @@
date: "2024-01-18"
author: Haoxin Chen
title: 'VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models'
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/jgPqbzAZBFM7Anlxsjxpi.png
link: https://huggingface.co/papers/2401.09047
summary: This paper proposes a method for training text-to-video generation models without large-scale, high-quality video datasets, leveraging low-quality videos and synthesized high-quality images instead. The authors analyze the connection between spatial and temporal modules in video models and conduct experiments to demonstrate the superiority of their proposed method in picture quality, motion, and concept composition....
opinion: placeholder
tags:
- Deep Learning
- Computer Vision
@@ -0,0 +1,10 @@
date: "2024-01-18"
author: Lianghui Zhu
title: 'Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model'
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/60f1abe7544c2adfd699860c/3CR9QVEIjbOr4I8mOvDy1.png
link: https://huggingface.co/papers/2401.09417
summary: This paper proposes a new generic vision backbone called Vim that uses bidirectional Mamba blocks for efficient visual representation learning. Unlike existing vision transformers, Vim is capable of marking image sequences with position embeddings and compressing visual representation with bidirectional state space models. Vim is shown to achieve higher performance on ImageNet classification, COCO object detection, and ADE20k semantic segmentation tasks with significant improvement in computatio...
opinion: placeholder
tags:
- Computer Vision
- Deep Learning
