Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference.
A machine learning development toolkit built on Transformer encoder architectures, tailored to high-energy physics and particle-collision event analysis.
Performance of the C++ interfaces of FlashAttention and FlashAttention v2 in large language model (LLM) inference scenarios.
[VLDB'22] Anomaly Detection using Transformers, self-conditioning and adversarial training.
Self-Supervised Vision Transformers for multiplexed imaging datasets
The original Transformer implemented from scratch, with informative comments on each block.
A simple character-level Transformer.
Several types of attention modules written in PyTorch.
A complete implementation of the original Transformer.
This repository contains code implementing the Vision Transformer (ViT) model for image classification.
A PyTorch implementation of HydraViT, an adaptive multi-branch transformer for multi-label disease classification from chest X-ray images. The repository provides the code needed to train and evaluate the model on the NIH Chest X-ray dataset.
Exploring attention weights in transformer-based models with linguistic knowledge.
A basic multi-layered neural network with attention-masking features.
The Transformer model implemented from scratch using PyTorch. The model uses weight sharing between the embedding layers and the pre-softmax linear layer. Training on the Multi30k machine translation task is shown.
Multi^2OIE: Multilingual Open Information Extraction Based on Multi-Head Attention with BERT (Findings of ACL: EMNLP 2020)
Transformer translator website with multithreaded web server in Rust
This project aims to implement the scaled dot-product attention layer and the multi-head attention layer using various positional encoding methods (a minimal sketch of the technique follows this listing).
A faster PyTorch implementation of multi-head self-attention.
Image captioning with an EfficientNet encoder and a Transformer decoder, combined with the attention mechanism.
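Several of the entries above implement multi-head self-attention in PyTorch. For orientation, here is a minimal sketch of scaled dot-product attention wrapped in a multi-head layer; the class name, the fused qkv_proj projection, and the usage parameters are illustrative assumptions and are not taken from any of the listed repositories.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    """Scaled dot-product attention split across several heads (illustrative sketch)."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        # Fused projection producing queries, keys, and values in a single matmul.
        self.qkv_proj = nn.Linear(d_model, 3 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        qkv = self.qkv_proj(x).view(batch, seq_len, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (batch, heads, seq_len, head_dim)

        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
        scores = q @ k.transpose(-2, -1) / (self.head_dim ** 0.5)
        weights = F.softmax(scores, dim=-1)
        context = weights @ v  # (batch, heads, seq_len, head_dim)

        # Merge the heads back into d_model and apply the output projection.
        context = context.transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.out_proj(context)

# Illustrative usage: 2 sequences of 16 tokens, model width 64, 8 heads.
x = torch.randn(2, 16, 64)
attn = MultiHeadSelfAttention(d_model=64, num_heads=8)
print(attn(x).shape)  # torch.Size([2, 16, 64])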