
Commit 84d24f8

Author: swyx
vault backup: 2024-06-29 - 4 files
Affected files: Monthly Notes/Apr 2024 notes.md, Monthly Notes/Jun 2024 notes.md, Monthly Notes/Mar 2024 notes.md, Monthly Notes/May 2024 notes.md

1 parent 609b7ba · commit 84d24f8

4 files changed: +15 −3 lines

Monthly Notes/Apr 2024 notes.md (+2 −2)

@@ -86,7 +86,7 @@
  - https://github.com/GregorD1A1/TinderGPT
  - https://github.com/princeton-nlp/SWE-agent
  - https://github.com/Dhravya/supermemory - it's a ChatGPT for your bookmarks. Import tweets or save websites and content using the Chrome extension.
+ - [Dify, a visual workflow to build/test LLM applications](https://github.com/langgenius/dify)

  ## other launches

  - udio music https://twitter.com/udiomusic/status/1778045322654003448?t=6FDPaNxZcbSsELal6Sv7Ug

@@ -120,7 +120,7 @@
  - papers
    - Our 12 scaling laws (for LLM knowledge capacity)
      - prefix [low quality data with junk tokens](https://twitter.com/ZeyuanAllenZhu/status/1777513028466188404) - "when pre-training good data (e.g., Wiki) together with "junks" (e.g., Common Crawl), LLM's capacity on good data may decrease by 20x times! A simple fix: add domain tokens to your data; LLMs can auto-detect domains rich in knowledge and prioritize." (see the sketch after this section)
+ - [Mixture of Depths](https://x.com/PiotrPadlewski/status/1775865549802598800)

  ## memes

  - suno memes
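The domain-token fix quoted above is simple to picture in a data pipeline. A minimal sketch, assuming hypothetical `(domain, text)` pretraining records; the `<|wiki|>`-style tag format is illustrative, not prescribed by the paper:

```python
# Minimal sketch of domain-token prefixing for pretraining data.
# The tag format and the sample records are invented for illustration;
# the paper's prescription is only "add domain tokens to your data".

records = [
    ("wikipedia", "The Eiffel Tower is a wrought-iron lattice tower in Paris."),
    ("commoncrawl", "CLICK HERE for the best deals!!! limited time offer..."),
]

def tag_domain(domain: str, text: str) -> str:
    """Prepend a special domain token so the model can learn to
    associate knowledge density with the source domain."""
    return f"<|{domain}|> {text}"

corpus = [tag_domain(domain, text) for domain, text in records]
for line in corpus:
    print(line)
```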

Monthly Notes/Jun 2024 notes.md (+1)

@@ -60,6 +60,7 @@
  ## discussions and good reads

+ - [leopold aschenbrenner's Trillion Dollar Cluster essay](https://situational-awareness.ai/)
  - [cost of self hosting Llama 3](https://blog.lytix.co/posts/self-hosting-llama-3)
    - Assuming 100% utilization, a self-hosted Llama-3 8B-Instruct model on EKS costs about $17 per 1M tokens, while ChatGPT can serve the same workload for about $1 per 1M tokens.
    - Buying the hardware yourself can bring the marginal cost below $0.01 per 1M tokens, but it takes ~5.5 years to break even (rough math sketched below).
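For intuition on where a ~5.5-year figure can come from, here is a back-of-envelope break-even sketch. The hardware cost, marginal cost, and daily token volume are assumptions chosen for illustration, not the post's exact inputs:

```python
# Back-of-envelope break-even math for self-hosting vs. an API,
# mirroring the blog post's framing. All inputs are assumptions.

hardware_cost_usd = 10_000        # assumed: one-time GPU server purchase
api_cost_per_1m = 1.00            # API price per 1M tokens (from the post)
self_host_per_1m = 0.01           # assumed: power/ops per 1M tokens on owned hardware
tokens_per_day = 5_000_000        # assumed: steady daily workload

savings_per_day = (api_cost_per_1m - self_host_per_1m) * tokens_per_day / 1e6
break_even_years = hardware_cost_usd / savings_per_day / 365

print(f"savings ${savings_per_day:.2f}/day -> break even in {break_even_years:.1f} years")
# With these assumptions: ~$4.95/day -> ~5.5 years, matching the post's order of magnitude.
```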

Monthly Notes/Mar 2024 notes.md (+2)

@@ -82,6 +82,8 @@
  - (beat llama, with fewer tokens, on a new architecture)
  - [Cohere Command R](https://x.com/aidangomez/status/1767264315550163024?s=46&t=6FDPaNxZcbSsELal6Sv7Ug) - a model focused on scalability, RAG, and Tool Use. We've also released the weights for research use, we hope they're useful to the community!
  - [Together/Hazy Research Based](https://www.together.ai/blog/based) - solving the **recall-memory tradeoff** of convolutional models like Hyena/H3 in linear attention models
+ - [Qwen1.5-MoE](https://qwenlm.github.io/blog/qwen-moe/):
+   - Qwen1.5-MoE-A2.7B, a small MoE model with only 2.7 billion activated parameters, yet matching the performance of state-of-the-art 7B models like Mistral 7B and Qwen1.5-7B (see the routing sketch after this section).
  - [Moondream2](https://x.com/vikhyatk/status/1764793494311444599?s=20) - a small, open-source vision language model designed to run efficiently on edge devices. Clocking in at 1.8B parameters, moondream requires less than 5GB of memory to run in 16-bit precision. This version was initialized using Phi-1.5 and SigLIP, and trained primarily on synthetic data generated by Mixtral. Code and weights are released under the Apache 2.0 license, which permits commercial use.
  - OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on: https://github.com/levihsu/OOTDiffusion
  - [Yi: Open Foundation Models by 01.AI](https://news.ycombinator.com/item?id=39659781) paper covering Yi-34B and variants
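"Activated parameters" means only the top-k routed experts run for each token, so per-token compute tracks k/num_experts of the expert weights rather than the full parameter count. A toy NumPy sketch of that routing; the sizes here are arbitrary, not Qwen1.5-MoE's real configuration:

```python
import numpy as np

# Toy illustration of "activated parameters" in a mixture-of-experts layer:
# a router picks the top-k experts per token, so only k of the num_experts
# weight matrices do any work for that token.
rng = np.random.default_rng(0)
d_model, num_experts, top_k = 64, 8, 2
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router                       # one routing score per expert
    idx = np.argsort(logits)[-top_k:]         # indices of the top-k experts
    gate = np.exp(logits[idx])
    gate /= gate.sum()                        # softmax over the chosen experts only
    # only the selected experts' parameters are "activated" for this token
    return sum(g * (x @ experts[i]) for g, i in zip(gate, idx))

y = moe_forward(rng.standard_normal(d_model))
print("total expert params:    ", num_experts * d_model * d_model)
print("activated per token:    ", top_k * d_model * d_model)
```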

Monthly Notes/May 2024 notes.md (+10 −1)

@@ -14,10 +14,14 @@
  - **OpenAI introduces interactive tables, charts, and file integration**: In a [tweet](https://x.com/OpenAI/status/1791227287569932368), OpenAI announced that ChatGPT Plus, Team, and Enterprise users can now upload files from Google Drive and Microsoft OneDrive, and interact with tables and charts within the AI model.
  - [reddit partnership](https://openai.com/index/openai-and-reddit-partnership/)
  - [stackoverflow partnership](https://stackoverflow.co/company/press/archive/openai-partnership/)
+ - [openai model spec](https://news.ycombinator.com/item?id=40300482)
  - nontechnical stuff
    - Sky/Scarlett Johansson drama
    - Ilya + Jan Leike resignations
    - [Leaked OpenAI documents reveal aggressive tactics toward former employees](https://www.vox.com/future-perfect/351132/openai-vested-equity-nda-sam-altman-documents-employees)
+   - [bought chatgpt.com](https://news.ycombinator.com/item?id=40259100)
+ - Rumors
+   - [web search](https://news.ycombinator.com/item?id=40235206)

  ## frontier models

@@ -38,7 +42,10 @@
  ## launches

+ - [Apple M4 chip](https://news.ycombinator.com/item?id=40286029&p=2)
  - [ChatGPT UI for rabbit holes with reader pane - a9.io](https://delve.a9.io/)
+ - [Nonlinear chatgpt ui](https://news.ycombinator.com/item?id=40300126)
+ - [ellipsis.dev - automated pr reviews/fixes](https://news.ycombinator.com/item?id=40309719)

  ## Funding

@@ -63,11 +70,13 @@
  - https://twitter.com/DrJimFan/status/1786054643568517261
  - [Consistency LLM - parallel decoders accelerate inference 3.5x](https://news.ycombinator.com/item?id=40302201)
  - [Google Gemini's impending Context Caching](https://news.ycombinator.com/item?id=40364220)
+ - [KAN: Kolmogorov-Arnold Networks](https://arxiv.org/abs/2404.19756) - [breakdown](https://x.com/aidev_isaak/status/1785771093824839914) vs [MLP](https://x.com/bozavlado/status/1787376558484709691)
  - [Consistency Large Language Models: A Family of Efficient Parallel Decoders](https://hao-ai-lab.github.io/blogs/cllm/): converting LLMs to parallel decoders accelerates inference 3.5x (see the decoding sketch after this section)
  - [shunyu yao phd defense](https://twitter.com/ShunyuYao12/status/1789058769982550031)
    - "Language Agents: From Next-Token Prediction to Digital Automation" https://ysymyth.github.io/papers/Dissertation-finalized.pdf
    - Talk (WebShop, SWE-bench, ReAct, ToT, CoALA, and on the future of agents): https://ysymyth.github.io/papers/Dissertation-finalized.pdf
  - [fineweb dataset](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1) - a new, large-scale (15-trillion tokens, 44TB disk space) dataset for LLM pretraining. FineWeb is derived from 96 CommonCrawl snapshots and produces better-performing LLMs than other open pretraining datasets.
  - learning
-   - [Llama 3 implemented in pure NumPy](https://docs.likejazz.com/llama3.np/)
+   - [Llama 3 implemented in pure NumPy](https://docs.likejazz.com/llama3.np/)
+   - [Exploring HN by mapping and analyzing 40M posts and comments for fun](https://news.ycombinator.com/item?id=40307519) (blog.wilsonl.in)
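Both Consistency-LLM links above describe the same mechanism: decode an n-token block by Jacobi (fixed-point) iteration instead of one token at a time, after fine-tuning the model so blocks converge in few sweeps. A toy sketch with a stand-in next-token rule; the rule and all constants are invented for illustration, and each sweep's n calls are independent, so in a real model they would run as one batched forward pass:

```python
# Toy sketch of Jacobi (fixed-point) parallel decoding, the mechanism
# Consistency LLMs are fine-tuned to converge quickly under. The dummy
# next_token rule stands in for a greedy LLM step; it is only weakly
# context-dependent, so several positions lock in per sweep.

def next_token(prefix: tuple[int, ...]) -> int:
    # invented deterministic rule over a 100-token vocabulary
    return (len(prefix) * 31 + prefix[-1] // 50) % 100

def jacobi_decode(prompt: list[int], n: int) -> tuple[list[int], int]:
    guess = [0] * n                        # arbitrary initial draft of n tokens
    for sweep in range(1, n + 1):          # worst case: n sweeps = sequential decoding
        seq = prompt + guess
        # refresh every position in parallel from the current draft
        new = [next_token(tuple(seq[: len(prompt) + i])) for i in range(n)]
        if new == guess:                   # fixed point == greedy autoregressive output
            return guess, sweep
        guess = new
    return guess, n

tokens, sweeps = jacobi_decode([42], n=8)
print(tokens, f"-> converged in {sweeps} parallel sweeps vs 8 sequential steps")
```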
