Commit 609b7ba

Author: swyx
manual vault backup: 2024-06-29 - 6 files
Affected files: Monthly Notes/Apr 2024 notes.md, Monthly Notes/Jun 2024 notes.md, Monthly Notes/May 2024 notes.md, Resources/Good AI Podcasts and Newsletters.md, Resources/Understanding Transformers.md, stub notes/OpenAI notes.md
Parent commit: 37cda03

6 files changed: +38 −4 lines

Monthly Notes/Apr 2024 notes.md

+7 −1
@@ -42,11 +42,13 @@
- Llama 3
- [top 5 in Lmsys, but also tied for first in English](https://x.com/lmsysorg/status/1782483701710061675?s=46&t=90xQ8sGy63D2OtiaoGJuww)
- [karpathy notes](https://x.com/karpathy/status/1781028605709234613), [HN](https://news.ycombinator.com/item?id=40077533)
- Cohere Command R+: [@cohere](https://twitter.com/cohere/status/1775878850699808928) released Command R+, a 104B parameter model with 128k context length, open weights for non-commercial use, and strong multilingual and RAG capabilities. It's available on the [Cohere playground](https://twitter.com/cohere/status/1775878883268509801) and [Hugging Face](https://twitter.com/osanseviero/status/1775882744792273209). [Aidan tweet](https://twitter.com/aidangomez/status/1775878606108979495)
- **Optimized for RAG workflows**: Command R+ is [optimized for RAG](https://twitter.com/aidangomez/status/1775878606108979495), with multi-hop capabilities to break down complex questions and strong tool use. It's integrated with [@LangChainAI](https://twitter.com/cohere/status/1775931339361149230) for building RAG applications.
- **Multilingual support**: Command R+ has [strong performance](https://twitter.com/seb_ruder/status/1775882934542533021) across 10 languages including English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese. Its tokenizer is [efficient for non-English text](https://twitter.com/JayAlammar/status/1775928159784915229).
- [became the #6 model on lmsys](https://twitter.com/lmsysorg/status/1777630133798772766?t=90xQ8sGy63D2OtiaoGJuww) and top open model, beating mistral large and qwen, but behind claude sonnet, gemini pro, and gpt4t
- Mistral 8x22B
- https://news.ycombinator.com/item?id=40064736
- Phi-3 ([HN, Technical Report](https://news.ycombinator.com/item?id=40127806), [sebastian bubeck short video](https://twitter.com/SebastienBubeck/status/1782627991874678809?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1782627991874678809%7Ctwgr%5E507304ee4fbb7b0a8a9c60b9bb5711109bde1d41%7Ctwcon%5Es1_&ref_url=https%3A%2F%2Fwww.emergentmind.com%2Fpapers%2F2404.14219))
- phi-3-mini: 3.8B model trained on 3.3T tokens rivals Mixtral 8x7B and GPT-3.5
- phi-3-medium: 14B model trained on 4.8T tokens w/ 78% on MMLU and 8.9 on MT-bench
@@ -89,7 +91,8 @@
- udio music https://twitter.com/udiomusic/status/1778045322654003448?t=6FDPaNxZcbSsELal6Sv7Ug
- [comedy dialogue, sports analysis, commercials, radio broadcasts, asmr, nature sounds](https://x.com/mckaywrigley/status/1778867824217542766?s=46&t=6FDPaNxZcbSsELal6Sv7Ug)
- [sonauto as well](https://news.ycombinator.com/item?id=39992817): A more controllable AI music creator
- "Others do music generation by training a Vector Quantized Variational Autoencoder like Descript Audio Codec (https://github.com/descriptinc/descript-audio-codec) to turn music into tokens, then training an LLM on those tokens. Instead, we ripped the tokenization part off and replaced it with a normal variational autoencoder bottleneck (along with some other important changes to enable insane compression ratios). This gave us a nice, normally distributed latent space on which to train a diffusion transformer (like Sora). Our diffusion model is also particularly interesting because it is the first audio diffusion model to generate coherent lyrics!" (from the Sonauto HN thread; a minimal sketch of the two approaches follows after this list)
- [Reka Core/Flash/Edge](https://publications.reka.ai/reka-core-tech-report.pdf)
- [Infinity AI: first AI-generated YC demo day](https://x.com/snowmaker/status/1775598317399060687)

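The Sonauto quote above contrasts two recipes: discrete codec tokens modeled by an autoregressive LM, versus continuous VAE latents modeled by a diffusion transformer. A minimal sketch of the two shapes involved, with toy dimensions and module choices that are my own assumptions (not Sonauto's actual architecture; timestep/text conditioning omitted):

```python
import torch
import torch.nn as nn

def block(d):
    # tiny transformer trunk shared by both sketches
    return nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)

class TokenLM(nn.Module):
    """Recipe A ("what others do"): a codec/VQ-VAE turns audio into discrete tokens,
    and an autoregressive LM predicts the next token."""
    def __init__(self, vocab=1024, d=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.body = block(d)
        self.head = nn.Linear(d, vocab)

    def forward(self, tokens):                 # tokens: (batch, time) integer codes
        return self.head(self.body(self.embed(tokens)))

class LatentDenoiser(nn.Module):
    """Recipe B (Sonauto-style, per the note): keep a continuous VAE latent space and
    train a diffusion transformer to denoise latents instead of predicting tokens."""
    def __init__(self, latent=64, d=256):
        super().__init__()
        self.inp = nn.Linear(latent, d)
        self.body = block(d)
        self.out = nn.Linear(d, latent)        # predicts the noise added to the latents

    def forward(self, noisy_latents):          # (batch, time, latent) continuous values
        return self.out(self.body(self.inp(noisy_latents)))

print(TokenLM()(torch.randint(0, 1024, (2, 16))).shape)   # torch.Size([2, 16, 1024])
print(LatentDenoiser()(torch.randn(2, 16, 64)).shape)     # torch.Size([2, 16, 64])
```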
@@ -100,9 +103,12 @@
- [Augment - 252m seed](https://techcrunch.com/2024/04/24/eric-schmidt-backed-augment-a-github-copilot-rival-launches-out-of-stealth-with-252m/)
- [XAI seeking 4b](https://www.bloomberg.com/news/articles/2024-04-11/elon-musk-s-xai-seeks-up-to-4-billion-to-compete-with-openai)
- [Nvidia acquires RunAI for ~$700m](https://news.ycombinator.com/item?id=40144235)
## Learning
- llm.c release - [karpathy explanation](https://x.com/karpathy/status/1778153659106533806)
- Thom Wolf - [how to train LLMs in 2024](https://youtu.be/2-SPH9hIKT8?si=wqYrDbhvgJUT2zHP)
- [Building A GPU from scratch](https://x.com/MajmudarAdam/status/1783304235909877846)
## discussion
- soumith v fchollet https://x.com/fchollet/status/1776319511807115589

Monthly Notes/Jun 2024 notes.md

+10 −1
@@ -23,20 +23,26 @@
## open models
- [Mamba-2 release](https://goombalab.github.io/blog/2024/mamba2-part1-model/)
- https://arxiv.org/abs/2405.21060
- https://x.com/_albertgu/status/1797651223035904355
- https://x.com/tri_dao/status/1797650443218436165
- [stable diffusion 3 medium](https://stability.ai/news/stable-diffusion-3-medium)
## open tooling
- [plandex](https://github.com/plandex-ai/plandex) - a reliable and developer-friendly AI coding agent in your terminal. It can plan out and complete large tasks that span many files and steps.
- [R2R - rag2riches](https://github.com/SciPhi-AI/R2R) - a prod-ready RAG (Retrieval-Augmented Generation) engine with a RESTful API. R2R includes hybrid search, knowledge graphs, and more.
## other launches
- etched launch: https://www.etched.com/announcing-etched
- Sohu is the world’s first transformer ASIC. One 8xSohu server replaces 160 H100 GPUs.
- By specializing, Sohu gets unprecedented performance. One 8xSohu server can serve over 500,000 Llama 70B tokens per second.
- luma ai dream machine https://news.ycombinator.com/item?id=40670096
- arc-agi benchmark, [arc prize](https://news.ycombinator.com/item?id=40648960)
- one solution got 71% or 50% https://x.com/bshlgrs/status/1802766374961553887
- [huggingface open llm leaderboard v2](https://huggingface.co/spaces/open-llm-leaderboard/blog)
- gsm8k, truthfulqa are contaminated in instruction datasets
@@ -58,6 +64,7 @@
- Assuming 100% utilization, a self-hosted Llama-3 8B-Instruct model on EKS costs about $17 per 1M tokens, vs about $1 per 1M tokens for ChatGPT on the same workload.
- Choosing to self-host on your own hardware can bring the cost under $0.01 per 1M tokens, but takes ~5.5 years to break even.
- [A Picture is Worth 170 Tokens: How Does GPT-4o Encode Images?](https://www.oranlooney.com/post/gpt-cnn/)
- Here’s a [fact](https://openai.com/api/pricing/): GPT-4o charges 170 tokens to process each `512x512` tile used in high-res mode. At ~0.75 tokens/word, this suggests a picture is worth about 227 words—only a factor of four off from the traditional saying. (quick arithmetic check after this list)
- forcing AI onto us
- msft recall default https://news.ycombinator.com/item?id=40610435
- and delay https://news.ycombinator.com/item?id=40683210
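A quick arithmetic check of the picture-is-worth-N-words bullet above. The 170 tokens per `512x512` tile and ~0.75 tokens/word figures come from that bullet; the 85-token per-image base cost and the naive tiling rule are assumptions based on OpenAI's published high-detail pricing, so treat the helper as a rough sketch rather than the exact formula:

```python
import math

TOKENS_PER_TILE = 170    # per the bullet / linked pricing note
TOKENS_PER_WORD = 0.75   # rough conversion used in the bullet
BASE_TOKENS = 85         # assumed fixed per-image overhead in high-detail mode

# "a picture is worth ~227 words": words represented by one 512x512 tile
print(TOKENS_PER_TILE / TOKENS_PER_WORD)        # ~226.7

def highres_image_tokens(width, height):
    # naive estimate: split into 512x512 tiles, ignoring OpenAI's resize rules
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles

print(highres_image_tokens(1024, 1024))         # 4 tiles -> 765 tokens
```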
@@ -66,3 +73,5 @@
- [talaria tool](https://buttondown.email/ainews/archive/ainews-talaria-apples-new-mlops-superweapon-4066/)
- perplexity - forbes attribution issue
- [books4 dataset](https://web.archive.org/web/20240519104217/https://old.reddit.com/r/datasets/comments/1cvi151/ai_books4_dataset_for_training_llms_further/)
- [together ai mixture of agents](https://www.together.ai/blog/together-moa)
- Our reference implementation, Together MoA, significantly surpasses GPT-4o (57.5%) on AlpacaEval 2.0 with a score of 65.1%, using only open-source models. While Together MoA achieves higher accuracy, it does come at the cost of a slower time to first token; reducing this latency is an exciting future direction for this research.
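The mixture-of-agents recipe is straightforward to sketch: several "proposer" models each draft an answer, then an "aggregator" model synthesizes the drafts into the final response (Together's version stacks multiple such layers of specific open models). A minimal single-layer sketch; `call_llm` and the model names are hypothetical placeholders, not Together's API:

```python
# Minimal single-layer mixture-of-agents sketch. `call_llm` is a hypothetical
# stand-in for whatever chat-completion client you use; model names are examples.
from typing import Callable, List

def mixture_of_agents(prompt: str,
                      call_llm: Callable[[str, str], str],
                      proposers: List[str],
                      aggregator: str) -> str:
    # 1. each proposer model drafts an independent answer
    drafts = [call_llm(model, prompt) for model in proposers]

    # 2. the aggregator sees the prompt plus all drafts and synthesizes a final answer
    numbered = "\n\n".join(f"[Response {i+1}]\n{d}" for i, d in enumerate(drafts))
    aggregate_prompt = (
        "You are given a user request and several candidate responses from other models.\n"
        "Write a single, higher-quality response that synthesizes their strengths and "
        "fixes their errors.\n\n"
        f"User request:\n{prompt}\n\nCandidate responses:\n{numbered}"
    )
    return call_llm(aggregator, aggregate_prompt)

if __name__ == "__main__":
    # stubbed client so the sketch runs without any API
    fake = lambda model, prompt: f"<{model} answer to: {prompt[:30]}...>"
    print(mixture_of_agents("Explain MoA briefly.", fake,
                            proposers=["open-model-a", "open-model-b", "open-model-c"],
                            aggregator="open-model-aggregator"))
```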

Monthly Notes/May 2024 notes.md

+15 −1
@@ -36,13 +36,25 @@
- [Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x](https://hao-ai-lab.github.io/blogs/cllm/)
## launches
- [ChatGPT UI for rabbit holes with reader pane - a9.io](https://delve.a9.io/)
## Funding
- scale ai: **Scale AI raises $1B at $13.8B valuation in round led by Accel**: [@alexandr_wang](https://twitter.com/alexandr_wang/status/1792905417065914858)
- suno ai: **Suno raises $125M to enable anyone to make music with AI**: [@suno_ai_](https://twitter.com/suno_ai_/status/1792922276683297162)
## Discussions
- [LLM evaluation tools/vendors that you really like that are useful in domain-specific contexts?](https://x.com/HamelHusain/status/1787944186714759419)
I'm looking for vendors that have ALL of the below:
1. Observability
2. Allow you to write your own assertions and LLM judges
3. Bootstrap the creation of #2 automatically with LLMs
4. Measure the alignment between humans (through an annotation queue) and capture critiques, also helping to generate more kinds of tests like #2
5. Prompt, data and model versioning
- [The future of foundation models is closed source](https://x.com/absoluttig/status/1793001830110380313)
- centralized vs decentralized
- Despite recent progress and endless cheerleading, open-source AI will become a financial drain for model builders, an inferior option for developers and consumers, and a risk to national security. Closed-source models will create far more economic and consumer value over the next decade.
@@ -51,9 +63,11 @@
- https://twitter.com/DrJimFan/status/1786054643568517261
- [Consistency LLM - parallel decoders accelerates inference 3.5x](https://news.ycombinator.com/item?id=40302201)
- [Google Gemini's impending Context Caching](https://news.ycombinator.com/item?id=40364220)
- [Consistency Large Language Models: A Family of Efficient Parallel Decoders](https://hao-ai-lab.github.io/blogs/cllm/): converting LLMs to parallel decoders accelerates inference 3.5x (toy sketch at the end of this section)
- [shunyu yao phd defense](https://twitter.com/ShunyuYao12/status/1789058769982550031)
- "Language Agents: From Next-Token Prediction to Digital Automation" https://ysymyth.github.io/papers/Dissertation-finalized.pdf
- Talk (WebShop, SWE-bench, ReAct, ToT, CoALA, and on the future of agents):
- https://ysymyth.github.io/papers/Dissertation-finalized.pdf
- [fineweb dataset](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1): a new, large-scale (15-trillion tokens, 44TB disk space) dataset for LLM pretraining. FineWeb is derived from 96 CommonCrawl snapshots and produces better-performing LLMs than other open pretraining datasets.
- learning
- [Llama 3 implemented in pure NumPy](https://docs.likejazz.com/llama3.np/)
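The Consistency LLM items above rest on one idea worth spelling out: greedy decoding is the fixed point of "token i = f(tokens before i)", so you can initialize all n future tokens at once and refresh them in parallel (Jacobi iteration) until nothing changes; CLLM's actual contribution is fine-tuning the model so this converges in very few refreshes. A toy sketch with a deterministic stand-in for the model, not the paper's code:

```python
# Toy Jacobi-style parallel decoding. `next_token` is a hypothetical deterministic
# stand-in for greedy LM decoding; a real CLLM trains the model so the parallel
# guesses converge in only a handful of iterations.

def next_token(prefix):
    # stand-in "model": next token is a simple function of the prefix
    return (sum(prefix) * 31 + len(prefix)) % 100

def autoregressive(prompt, n):
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))          # n strictly sequential calls
    return seq[len(prompt):]

def jacobi(prompt, n, max_iters=50):
    guess = [0] * n                          # guess all n future tokens at once
    for it in range(max_iters):
        # refresh every position in parallel from the current guess
        new = [next_token(list(prompt) + guess[:i]) for i in range(n)]
        if new == guess:                     # fixed point = same output as greedy decoding
            return new, it + 1
        guess = new
    return guess, max_iters

if __name__ == "__main__":
    prompt, n = [1, 2, 3], 8
    print(autoregressive(prompt, n))
    print(jacobi(prompt, n))                 # same tokens, converges in <= n+1 refreshes
```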

Resources/Good AI Podcasts and Newsletters.md

+1 −1
@@ -24,7 +24,7 @@ some of these are on youtube too, i dont really bother separating them. ⭐ rep
- [Eye on AI](https://open.spotify.com/show/5aFnCGDhpL5bGr2uHy4bB5) - Weekly analysis at the intersection of artificial intelligence and industry. (less technical but great guest backlog)
- [AI Jason](https://youtu.be/pJwR5pv0_gs?si=BdXjIX1mEik-Lbpz) - new frequent ai project breakdowns channel
- [Matthew Berman](https://www.youtube.com/@matthew_berman) - short explainer videos of AI Engineering related projects and news
- smaller creators worth noting: [Algorithmic Simplicity](https://www.youtube.com/watch?v=N6Piou4oYx8) (explanations of archs), [Umar Jamil (standard concept teaching channel, very technical)](https://www.youtube.com/@umarjamilai?app=desktop) and [Daniel Bourke (livestream paper replication)](https://www.youtube.com/@danielbourkearxiv2821?app=desktop), [Efficient NLP (good short paper/technique explainers)](https://www.youtube.com/@EfficientNLP), [Trelis Research](https://www.youtube.com/watch?v=ae2lbmtTY5A)
- [/r/LocalLlama list has 23 recommendations](https://www.reddit.com/r/LocalLLaMA/comments/1atycgd/which_localllama_focused_yt_channels_do_you_follow/)
- Companies
- ⭐ [The Cognitive Revolution](https://www.cognitiverevolution.ai/) - Nathan Labenz - great new pod

Resources/Understanding Transformers.md

+4
@@ -57,6 +57,7 @@ The Illustrated Transformer - [https://jalammar.github.io/illustrated-transform
- Reminder that my deep learning course [@unige_en](https://twitter.com/unige_en) is entirely available on-line. 1000+ slides, ~20h of screen-casts. [https://fleuret.org/dlc/](https://t.co/6OVyjPdwrC)
- https://e2eml.school/transformers.html Transformers from Scratch

https://www.jvoderho.com/blog.html?blogid=Transformer%20as%20a%20general%20purpose%20computer

https://news.ycombinator.com/item?id=35712334
The Illustrated Transformer is fantastic, but I would suggest that those going into it really should read the previous articles in the series to get a foundation to understand it more, plus later articles that go into GPT and BERT, here's the list:
@@ -76,6 +77,9 @@ Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With
[3] [https://arxiv.org/abs/2301.05062](https://arxiv.org/abs/2301.05062)

[3Blue1Brown visualizing attention](https://www.3blue1brown.com/lessons/attention) https://news.ycombinator.com/item?id=40035514
The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) - [https://jalammar.github.io/illustrated-bert/](https://jalammar.github.io/illustrated-bert/)

The Illustrated GPT-2 (Visualizing Transformer Language Models) - [https://jalammar.github.io/illustrated-gpt2/](https://jalammar.github.io/illustrated-gpt2/)

stub notes/OpenAI notes.md

+1
@@ -56,6 +56,7 @@
## sama
- intentionally did not give chatgpt a human name - want to distance from “person-ness” but accept that others will do it https://overcast.fm/+SusunrACk/12:51
- [not fired by YC](https://news.ycombinator.com/item?id=40521657)
## mainstream puff pieces
