DAVID-Hown
Starred repositories

A collection of resources and papers on Diffusion Models

HTML 11,350 953 Updated Aug 1, 2024

👀 Visual Instruction Inversion: Image Editing via Visual Prompting (NeurIPS 2023)

Python 88 2 Updated Dec 19, 2023

A collection of resources on controllable generation with text-to-image diffusion models.

972 27 Updated Dec 31, 2024

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

2,250 196 Updated Jan 6, 2025

animatediff prompt travel

Python 1,193 104 Updated Jan 13, 2024

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.

Jupyter Notebook 5,571 354 Updated Jun 28, 2024

Create images of a given character in different poses

Python 627 67 Updated Jun 5, 2024

Focus on prompting and generating

Python 42,788 6,278 Updated Jan 24, 2025

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)

Python 2 Updated Jun 23, 2023

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Python 6,006 855 Updated May 13, 2024

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377

Python 7,513 1,236 Updated Jul 23, 2024

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Python 353 15 Updated Jan 14, 2025
33 Updated Jan 10, 2025

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 6,631 581 Updated Jan 11, 2025

Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)

Python 121 4 Updated Nov 13, 2023

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.

Python 6,872 526 Updated Dec 25, 2024

Official implementation for "Automatic Chain of Thought Prompting in Large Language Models" (stay tuned & more will be updated)

Jupyter Notebook 1,674 150 Updated Mar 13, 2024

[AAAI 2024 Oral] AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

Python 858 107 Updated Dec 20, 2023

The Codes and Data of The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

Python 49 3 Updated Jan 8, 2025

③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.

Python 342 24 Updated Aug 12, 2024

Generative Models by Stability AI

Python 25,143 2,785 Updated Sep 4, 2024

Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Python 25,533 2,926 Updated Sep 2, 2024

Accepted as [NeurIPS 2024] Spotlight Presentation Paper

Jupyter Notebook 6,140 616 Updated Sep 26, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 4,319 268 Updated Jan 21, 2025

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 37,576 4,600 Updated Jan 23, 2025

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 5,341 406 Updated Aug 7, 2024

An Open-source Toolkit for LLM Development

Python 2,747 175 Updated Jan 13, 2025

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"

Python 788 109 Updated Jun 30, 2021