<!DOCTYPE html>
<html lang="en">
<head>
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-L64K6S2GBM"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-L64K6S2GBM');
</script>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta name="description" content="MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data">
<meta name="keywords" content="fMRI,Stable Diffusion">
<title>MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="website_files/bootstrap.css">
<script src="website_files/jquery.js"></script>
<script src="website_files/bootstrap.js"></script>
<style>
/* Remove the navbar's default margin-bottom and rounded borders */
.navbar {
margin-bottom: 0;
border-radius: 0;
}
/* Add a gray background color and some padding to the footer */
footer {
background-color: #f2f2f2;
padding: 25px;
}
.long-image{
width: 100%;
height: 1000px;
overflow: auto;
margin: auto;
}
</style>
<link rel="stylesheet" href="website_files/font.css">
<link rel="stylesheet" href="website_files/main.css">
</head>
<body>
<div class="jumbotron">
<div class="container text-center">
<h1 style="color:white;margin-bottom:0;">MindEye2:</h1>
<h3 style="color:white;margin-top:0;">Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data</h3>
<br>
<p style="color:white"><a href="https://paulscotti.github.io">Paul S. Scotti</a><sup>1,2</sup>, Mihir Tripathy<sup>†2</sup>, Cesar Kadir Torrico Villanueva<sup>†2</sup>, Reese Kneeland<sup>†3</sup>, Tong Chen<sup>4,2</sup>,
Ashutosh Narang<sup>2</sup>, Charan Santhirasegaran<sup>2</sup>, Jonathan Xu<sup>5,2</sup>, Thomas Naselaris<sup>3</sup>, Kenneth A. Norman<sup>6</sup>,
<a href="https://tanishq.ai">Tanishq Mathew Abraham</a><sup>1,2</sup>
<br>
<br>
<br><sup>1</sup><a href="https://stability.ai">Stability AI</a>, <sup>2</sup><a href="https://medarc.ai">Medical AI Research Center (MedARC)</a>,
<sup>3</sup>University of Minnesota, <sup>4</sup>The University of Sydney, <sup>5</sup>University of Waterloo, <sup>6</sup>Princeton Neuroscience Institute.
<br> († indicates core contribution)
</p>
</div>
</div>
<div class="container bg-3">
<div class="row text-center">
<div class="col-sm-2"></div>
<div class="col-sm-2 col-sm-offset-1">
<a href="https://arxiv.org/abs/2403.11207"><img height="100" width="78" src="website_files/imgs/paper_page.png" data-nothumb="" style="border: 1px solid"><br>arXiv<br>Preprint<br></a>
</div>
<div class="col-sm-2">
<a href="https://github.com/MedARC-AI/MindEyeV2"><img height="100" width="78" src="website_files/imgs/source_code.png" data-nothumb="" style="border: 1px solid"><br>Source Code<br>GitHub</a>
</div>
<div class="col-sm-2">
<a href="#results"><img height="100" width="78" src="website_files/imgs/website_teaser.png" data-nothumb="" style="border: 1px solid"><br>Results</a>
</div>
<div class="col-sm-2 col-sm-offset-1"></div>
</div>
</div><br>
<div class="container bg-3">
<div class="row">
<p><b>Announcement</b>📢: We are excited to share that MindEye2 has been accepted to ICML 2024! An updated camera-ready version of our manuscript is now up on arXiv. See you in Vienna!</p>
</div>
</div><br>
<div class="container bg-3">
<div class="row">
<h2 class="text-center">Abstract</h2>
<hr>
<p> Reconstructions of visual perception from brain activity have improved tremendously, but the practical utility of such methods has been limited. This is because such models are trained independently per subject, and each subject requires dozens of hours of expensive fMRI training data to attain high-quality results. The present work showcases high-quality reconstructions using only 1 hour of fMRI training data. We pretrain our model across 7 subjects and then fine-tune on minimal data from a new subject. Our novel functional alignment procedure linearly maps all brain data to a shared-subject latent space, followed by a shared non-linear mapping to CLIP image space. We then map from CLIP space to pixel space by fine-tuning Stable Diffusion XL to accept CLIP latents as inputs instead of text. This approach improves out-of-subject generalization with limited training data and also attains state-of-the-art image retrieval and reconstruction metrics compared to single-subject approaches. MindEye2 demonstrates how accurate reconstructions of perception are possible from a single visit to the MRI facility. All code is available on GitHub. </p>
<div class="col-sm-12 text-center">
<img src="website_files/imgs/figure1.png" class="img-fluid" alt="glide" style="width:88%">
</div>
</div>
</div><br>
<br><br>
<div class="container bg-3">
<div class="row">
<h2 class="text-center">Methods</h2>
<hr>
<p>
MindEye2 is first trained using data from 7 subjects in the Natural Scenes Dataset and then fine-tuned on a held-out target subject who may have scarce training data. The fMRI activity is initially mapped to a shared-subject latent space via subject-specific ridge regression. The rest of the model is subject-agnostic: an MLP backbone and a diffusion prior output predicted CLIP image embeddings, which are then reconstructed into images using our SDXL unCLIP model. The diffusion prior helps bridge the modality gap between the fMRI latents produced by the MLP backbone and CLIP latent space. The alignment to shared-subject space is not trained independently; the whole pipeline is trained end-to-end, with each pretraining batch containing brain inputs from all subjects.
</p>
<div class="col-sm-12 text-center">
<img src="website_files/imgs/pipeline.png" class="img-fluid" alt="glide" style="width:88%">
</div>
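<p>
To make the pipeline concrete, below is a minimal PyTorch-style sketch of the shared-subject design described above. It is an illustrative sketch under our own assumptions, not the released code: the class names, layer sizes, and the plain linear layer standing in for the ridge-regression alignment are hypothetical, and the diffusion prior step is omitted. See the GitHub repository for the actual implementation.
</p>
<pre><code># Illustrative sketch only; names and sizes are assumptions, not MindEye2's code.
import torch
import torch.nn as nn

class SubjectAligner(nn.Module):
    """Per-subject linear (ridge-style) map from flattened fMRI voxels
    to a shared-subject latent space."""
    def __init__(self, num_voxels_per_subject, shared_dim=4096):
        super().__init__()
        self.align = nn.ModuleDict({
            str(subj): nn.Linear(num_voxels, shared_dim)
            for subj, num_voxels in num_voxels_per_subject.items()
        })

    def forward(self, voxels, subject_id):
        return self.align[str(subject_id)](voxels)

class SharedBackbone(nn.Module):
    """Subject-agnostic MLP mapping shared latents toward CLIP image token
    embeddings (a diffusion prior would further refine these in MindEye2)."""
    def __init__(self, shared_dim=4096, clip_tokens=256, clip_dim=1664):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(shared_dim, 4096), nn.GELU(),
            nn.Linear(4096, clip_tokens * clip_dim),
        )
        self.clip_tokens, self.clip_dim = clip_tokens, clip_dim

    def forward(self, shared_latents):
        out = self.mlp(shared_latents)
        return out.view(-1, self.clip_tokens, self.clip_dim)

# Pretraining mixes all 7 subjects within every batch; fine-tuning on a new
# subject adds a new aligner entry while reusing the shared backbone.
</code></pre>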
<p>
UnCLIP (or “image variations”) models have previously been used for the creative application of generating variations of a given reference image. In contrast, our goal was to train a model that returns images as close as possible to the reference image in both low-level structure and high-level semantics. For this use case, we observed that existing unCLIP models do not accurately reconstruct images from their ground truth CLIP image embeddings (see the figure below). We therefore fine-tuned our own unCLIP model from Stable Diffusion XL (using the 256 x 1664 dimensional image embeddings from OpenCLIP ViT-bigG/14) to support this goal, yielding much higher fidelity reconstructions from ground truth CLIP embeddings. For MindEye2, we can then feed in OpenCLIP embeddings predicted from the brain instead of the ground truth embeddings to reconstruct images. This raises the ceiling on possible fMRI-to-image reconstruction performance, as it is no longer limited by the performance of a frozen, pre-trained unCLIP model.
</p>
<div class="col-sm-12 text-center">
<img src="website_files/imgs/unCLIP_comparison_titled.png" class="img-fluid" alt="glide" style="width:88%">
</div>
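<p>
Conceptually, this fine-tuning keeps the standard latent-diffusion noise-prediction objective but conditions the denoiser on the 256 x 1664 CLIP image token embeddings instead of text embeddings. The schematic sketch below illustrates that idea only; the toy denoiser, projection width, and noise schedule are stand-in assumptions and not the SDXL fine-tuning code.
</p>
<pre><code># Schematic stand-in for unCLIP fine-tuning: noise-prediction training
# conditioned on CLIP image tokens (256 x 1664) rather than text tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyConditionalDenoiser(nn.Module):
    """Tiny stand-in for a latent-diffusion UNet with cross-attention."""
    def __init__(self, latent_dim=4, cond_dim=1664, width=2048):
        super().__init__()
        self.cond_proj = nn.Linear(cond_dim, width)  # project CLIP image tokens
        self.in_proj = nn.Conv2d(latent_dim, width, 1)
        self.attn = nn.MultiheadAttention(width, num_heads=8, batch_first=True)
        self.out_proj = nn.Conv2d(width, latent_dim, 1)

    def forward(self, noisy_latents, timesteps, image_cond):
        # timesteps unused in this toy model; a real UNet embeds them.
        b, c, h, w = noisy_latents.shape
        x = self.in_proj(noisy_latents).flatten(2).transpose(1, 2)  # (b, h*w, width)
        cond = self.cond_proj(image_cond)                           # (b, 256, width)
        x, _ = self.attn(x, cond, cond)                             # cross-attention
        return self.out_proj(x.transpose(1, 2).reshape(b, -1, h, w))

def unclip_training_loss(denoiser, latents, clip_image_tokens, num_timesteps=1000):
    """Predict the added noise from a noised latent, given CLIP image tokens."""
    noise = torch.randn_like(latents)
    t = torch.randint(0, num_timesteps, (latents.shape[0],), device=latents.device)
    alpha = 1.0 - t.float().div(num_timesteps).view(-1, 1, 1, 1)  # toy schedule
    noisy = alpha.sqrt() * latents + (1 - alpha).sqrt() * noise
    return F.mse_loss(denoiser(noisy, t, clip_image_tokens), noise)
</code></pre>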
</div>
</div><br>
<div class="container bg-3">
<div class="row">
<h2 class="text-center" id="results">Results</h2>
<hr>
<p>
MindEye2 achieves state-of-the-art image retrieval, image reconstruction, and caption generation metrics across multiple subjects, and enables high-quality image generation with just 2.5% of the previously required data (i.e., 1 hour of training data instead of 40 hours). That is, given a sample of fMRI activity from a participant viewing an image, MindEye2 can either identify which image out of a pool of candidates was the one originally seen (retrieval), or recreate the image that was seen (reconstruction) along with its text caption. These results can be obtained by fine-tuning the pre-trained model with just one hour of fMRI data from a new subject.
</p>
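<p>
As a concrete illustration of the retrieval task, the brain-predicted CLIP embedding can be compared against the CLIP embeddings of every image in a candidate pool and the candidates ranked by cosine similarity. The sketch below is illustrative only and is not the paper's evaluation code; the function name and pool handling are assumptions.
</p>
<pre><code># Illustrative sketch of CLIP-space retrieval; not the paper's evaluation code.
import torch
import torch.nn.functional as F

def rank_candidates(predicted_clip, candidate_clip):
    """predicted_clip: brain-predicted CLIP embedding for one fMRI sample.
    candidate_clip: stacked CLIP embeddings of the n candidate images.
    Returns candidate indices sorted from best to worst match."""
    pred = F.normalize(predicted_clip.flatten(), dim=0)
    cands = F.normalize(candidate_clip.flatten(start_dim=1), dim=1)
    similarity = cands @ pred  # cosine similarity, one score per candidate
    return similarity.argsort(descending=True)

# Retrieval counts as correct when the originally seen image ranks first;
# reconstruction instead decodes the predicted embedding with the unCLIP model.
</code></pre>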
<p>
The qualitative comparison below confirms that, in the 1-hour training data setting, MindEye2 outperforms other methods, and that pre-training on other subjects' data is specifically what enables this performance. See the preprint for quantitative comparisons.
</p>
<p>
If we use the full 40 hours of training data instead of 1 hour, we also achieve state-of-the-art performance for reconstruction and retrieval. We found that the 1-hour setting offers a good balance between scan duration and reconstruction performance, with notable improvements from first pre-training on other subjects.
</p>
<div class="col-sm-12 text-center">
<img src="website_files/imgs/recon_comparison.png" class="img-fluid" alt="glide" style="width:88%">
</div>
</div>
</div><br><br>
<div class="container bg-3">
<div class="row">
<h2 class="text-center">Acknowledgements</h2>
<hr>
<p> Special thanks to Dustin Podell, Vikram Voleti, Andreas Blattmann, and Robin Rombach for technical assistance fine-tuning Stable Diffusion XL to support our unCLIP use case. Thanks to the MedARC Discord community for being the public forum from which this research was developed, with particular thanks to Connor Lane, Alex Nguyen, Atmadeep Bannerjee, Amir Refaee, and Mohammed Baharoon for their helpful discussions. Thanks to Alessandro Gifford and Connor Lane for providing useful feedback on drafts of the manuscript. Thank you to Richard Vencu for help navigating the Stability AI HPC. Thanks to Stability AI for their support of open neuroAI research and for providing the computational resources necessary to develop MindEye2. Collection of the Natural Scenes Dataset was supported by NSF IIS-1822683 and NSF IIS-1822929. This webpage template was recycled from <a href="https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/">here</a>.
</p>
</div>
</div><br><br>
<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
</body>
</html>