
Commit 84d24f8

Author: swyx
vault backup: 2024-06-29 - 4 files
Affected files: Monthly Notes/Apr 2024 notes.md, Monthly Notes/Jun 2024 notes.md, Monthly Notes/Mar 2024 notes.md, Monthly Notes/May 2024 notes.md

1 parent 609b7ba · commit 84d24f8

4 files changed: +15 −3 lines

Monthly Notes/Apr 2024 notes.md (+2 −2)

@@ -86,7 +86,7 @@
  - https://github.com/GregorD1A1/TinderGPT
  - https://github.com/princeton-nlp/SWE-agent
  - https://github.com/Dhravya/supermemory - it's a ChatGPT for your bookmarks. Import tweets or save websites and content using the Chrome extension.
+ - [Dify, a visual workflow to build/test LLM applications](https://github.com/langgenius/dify)

  ## other launches

  - udio music https://twitter.com/udiomusic/status/1778045322654003448?t=6FDPaNxZcbSsELal6Sv7Ug

@@ -120,7 +120,7 @@
  - papers
    - Our 12 scaling laws (for LLM knowledge capacity)
      - prefix [low quality data with junk tokens](https://twitter.com/ZeyuanAllenZhu/status/1777513028466188404) - "when pre-training good data (e.g., Wiki) together with "junks" (e.g., Common Crawl), LLM's capacity on good data may decrease by 20x times! A simple fix: add domain tokens to your data; LLMs can auto-detect domains rich in knowledge and prioritize." (see the sketch after this section)
+ - [Mixture of Depths](https://x.com/PiotrPadlewski/status/1775865549802598800)

  ## memes

  - suno memes
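The domain-token fix quoted above is simple to picture in a data pipeline. A minimal sketch, assuming hypothetical `(domain, text)` pretraining records; the `<|wiki|>`-style tag format is illustrative, not prescribed by the paper:

```python
# Minimal sketch of domain-token prefixing for pretraining data.
# The tag format and the sample records are invented for illustration;
# the paper's prescription is only "add domain tokens to your data".

records = [
    ("wikipedia", "The Eiffel Tower is a wrought-iron lattice tower in Paris."),
    ("commoncrawl", "CLICK HERE for the best deals!!! limited time offer..."),
]

def tag_domain(domain: str, text: str) -> str:
    """Prepend a special domain token so the model can learn to
    associate knowledge density with the source domain."""
    return f"<|{domain}|> {text}"

corpus = [tag_domain(domain, text) for domain, text in records]
for line in corpus:
    print(line)
```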

Monthly Notes/Jun 2024 notes.md (+1)

@@ -60,6 +60,7 @@
  ## discussions and good reads

+ - [leopold aschenbrenner's Trillion Dollar Cluster essay](https://situational-awareness.ai/)
  - [cost of self hosting Llama 3](https://blog.lytix.co/posts/self-hosting-llama-3)
    - Assuming 100% utilization, a self-hosted Llama-3 8B-Instruct model on EKS costs about $17 per 1M tokens, while ChatGPT can serve the same workload for about $1 per 1M tokens.
    - Buying the hardware yourself can bring the marginal cost below $0.01 per 1M tokens, but it takes ~5.5 years to break even (rough math sketched below).
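For intuition on where a ~5.5-year figure can come from, here is a back-of-envelope break-even sketch. The hardware cost, marginal cost, and daily token volume are assumptions chosen for illustration, not the post's exact inputs:

```python
# Back-of-envelope break-even math for self-hosting vs. an API,
# mirroring the blog post's framing. All inputs are assumptions.

hardware_cost_usd = 10_000        # assumed: one-time GPU server purchase
api_cost_per_1m = 1.00            # API price per 1M tokens (from the post)
self_host_per_1m = 0.01           # assumed: power/ops per 1M tokens on owned hardware
tokens_per_day = 5_000_000        # assumed: steady daily workload

savings_per_day = (api_cost_per_1m - self_host_per_1m) * tokens_per_day / 1e6
break_even_years = hardware_cost_usd / savings_per_day / 365

print(f"savings ${savings_per_day:.2f}/day -> break even in {break_even_years:.1f} years")
# With these assumptions: ~$4.95/day -> ~5.5 years, matching the post's order of magnitude.
```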

Monthly Notes/Mar 2024 notes.md (+2)

@@ -82,6 +82,8 @@
  - (beat llama, with fewer tokens, on a new architecture)
  - [Cohere Command R](https://x.com/aidangomez/status/1767264315550163024?s=46&t=6FDPaNxZcbSsELal6Sv7Ug) - a model focused on scalability, RAG, and Tool Use. We've also released the weights for research use, we hope they're useful to the community!
  - [Together/Hazy Research Based](https://www.together.ai/blog/based) - solving the **recall-memory tradeoff** of convolutional models like Hyena/H3 in linear attention models
+ - [Qwen1.5-MoE](https://qwenlm.github.io/blog/qwen-moe/):
+   - Qwen1.5-MoE-A2.7B, a small MoE model with only 2.7 billion activated parameters, yet matching the performance of state-of-the-art 7B models like Mistral 7B and Qwen1.5-7B (see the routing sketch after this section).
  - [Moondream2](https://x.com/vikhyatk/status/1764793494311444599?s=20) - a small, open-source vision language model designed to run efficiently on edge devices. Clocking in at 1.8B parameters, moondream requires less than 5GB of memory to run in 16-bit precision. This version was initialized using Phi-1.5 and SigLIP, and trained primarily on synthetic data generated by Mixtral. Code and weights are released under the Apache 2.0 license, which permits commercial use.
  - OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on: https://github.com/levihsu/OOTDiffusion
  - [Yi: Open Foundation Models by 01.AI](https://news.ycombinator.com/item?id=39659781) paper covering Yi-34B and variants
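"Activated parameters" means only the top-k routed experts run for each token, so per-token compute tracks k/num_experts of the expert weights rather than the full parameter count. A toy NumPy sketch of that routing; the sizes here are arbitrary, not Qwen1.5-MoE's real configuration:

```python
import numpy as np

# Toy illustration of "activated parameters" in a mixture-of-experts layer:
# a router picks the top-k experts per token, so only k of the num_experts
# weight matrices do any work for that token.
rng = np.random.default_rng(0)
d_model, num_experts, top_k = 64, 8, 2
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router                       # one routing score per expert
    idx = np.argsort(logits)[-top_k:]         # indices of the top-k experts
    gate = np.exp(logits[idx])
    gate /= gate.sum()                        # softmax over the chosen experts only
    # only the selected experts' parameters are "activated" for this token
    return sum(g * (x @ experts[i]) for g, i in zip(gate, idx))

y = moe_forward(rng.standard_normal(d_model))
print("total expert params:    ", num_experts * d_model * d_model)
print("activated per token:    ", top_k * d_model * d_model)
```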

Monthly Notes/May 2024 notes.md (+10 −1)

@@ -14,10 +14,14 @@
  - **OpenAI introduces interactive tables, charts, and file integration**: In a [tweet](https://x.com/OpenAI/status/1791227287569932368), OpenAI announced that ChatGPT Plus, Team, and Enterprise users can now upload files from Google Drive and Microsoft OneDrive, and interact with tables and charts within the AI model.
  - [reddit partnership](https://openai.com/index/openai-and-reddit-partnership/)
  - [stackoverflow partnership](https://stackoverflow.co/company/press/archive/openai-partnership/)
+ - [openai model spec](https://news.ycombinator.com/item?id=40300482)
  - nontechnical stuff
    - Sky/Scarlett Johansson drama
    - Ilya + Jan Leike resignations
    - [Leaked OpenAI documents reveal aggressive tactics toward former employees](https://www.vox.com/future-perfect/351132/openai-vested-equity-nda-sam-altman-documents-employees)
+   - [bought chatgpt.com](https://news.ycombinator.com/item?id=40259100)
+ - Rumors
+   - [web search](https://news.ycombinator.com/item?id=40235206)

  ## frontier models

@@ -38,7 +42,10 @@
  ## launches

+ - [Apple M4 chip](https://news.ycombinator.com/item?id=40286029&p=2)
  - [ChatGPT UI for rabbit holes with reader pane - a9.io](https://delve.a9.io/)
+ - [Nonlinear chatgpt ui](https://news.ycombinator.com/item?id=40300126)
+ - [ellipsis.dev - automated pr reviews/fixes](https://news.ycombinator.com/item?id=40309719)

  ## Funding

@@ -63,11 +70,13 @@
  - https://twitter.com/DrJimFan/status/1786054643568517261
  - [Consistency LLM - parallel decoders accelerate inference 3.5x](https://news.ycombinator.com/item?id=40302201)
  - [Google Gemini's impending Context Caching](https://news.ycombinator.com/item?id=40364220)
+ - [KAN: Kolmogorov-Arnold Networks](https://arxiv.org/abs/2404.19756) - [breakdown](https://x.com/aidev_isaak/status/1785771093824839914) vs [MLP](https://x.com/bozavlado/status/1787376558484709691)
  - [Consistency Large Language Models: A Family of Efficient Parallel Decoders](https://hao-ai-lab.github.io/blogs/cllm/): converting LLMs to parallel decoders accelerates inference 3.5x (see the decoding sketch after this section)
  - [shunyu yao phd defense](https://twitter.com/ShunyuYao12/status/1789058769982550031)
    - "Language Agents: From Next-Token Prediction to Digital Automation" https://ysymyth.github.io/papers/Dissertation-finalized.pdf
    - Talk (WebShop, SWE-bench, ReAct, ToT, CoALA, and on the future of agents): https://ysymyth.github.io/papers/Dissertation-finalized.pdf
  - [fineweb dataset](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1) - a new, large-scale (15-trillion tokens, 44TB disk space) dataset for LLM pretraining. FineWeb is derived from 96 CommonCrawl snapshots and produces better-performing LLMs than other open pretraining datasets.
  - learning
-   - [Llama 3 implemented in pure NumPy](https://docs.likejazz.com/llama3.np/)
+   - [Llama 3 implemented in pure NumPy](https://docs.likejazz.com/llama3.np/)
+   - [Exploring HN by mapping and analyzing 40M posts and comments for fun](https://news.ycombinator.com/item?id=40307519) (blog.wilsonl.in)
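Both Consistency-LLM links above describe the same mechanism: decode an n-token block by Jacobi (fixed-point) iteration instead of one token at a time, after fine-tuning the model so blocks converge in few sweeps. A toy sketch with a stand-in next-token rule; the rule and all constants are invented for illustration, and each sweep's n calls are independent, so in a real model they would run as one batched forward pass:

```python
# Toy sketch of Jacobi (fixed-point) parallel decoding, the mechanism
# Consistency LLMs are fine-tuned to converge quickly under. The dummy
# next_token rule stands in for a greedy LLM step; it is only weakly
# context-dependent, so several positions lock in per sweep.

def next_token(prefix: tuple[int, ...]) -> int:
    # invented deterministic rule over a 100-token vocabulary
    return (len(prefix) * 31 + prefix[-1] // 50) % 100

def jacobi_decode(prompt: list[int], n: int) -> tuple[list[int], int]:
    guess = [0] * n                        # arbitrary initial draft of n tokens
    for sweep in range(1, n + 1):          # worst case: n sweeps = sequential decoding
        seq = prompt + guess
        # refresh every position in parallel from the current draft
        new = [next_token(tuple(seq[: len(prompt) + i])) for i in range(n)]
        if new == guess:                   # fixed point == greedy autoregressive output
            return guess, sweep
        guess = new
    return guess, n

tokens, sweeps = jacobi_decode([42], n=8)
print(tokens, f"-> converged in {sweeps} parallel sweeps vs 8 sequential steps")
```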
