feat: switch to GPT3.5 for cost savings #28

Merged
merged 3 commits into from
Apr 29, 2023

Conversation

barakplasma
Contributor

According to https://openai.com/pricing, GPT3 costs $0.02 per 1k tokens, while GPT3.5 costs $0.002 per 1k tokens (10x cheaper).
This also bumps the version of the library used, which is needed to access GPT3.5.
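
To make the saving concrete, here is a rough sketch of the arithmetic using the two per-1k-token prices quoted above. The workload (15 articles at about 1,000 tokens each) and the function name are illustrative assumptions, not measurements from this project, and the prices are the April 2023 figures cited in this PR, which may have changed since.

```python
# Prices per 1k tokens as quoted in this PR (April 2023); treat them as a snapshot.
PRICE_PER_1K_USD = {
    "GPT3": 0.02,
    "GPT3.5": 0.002,
}


def summary_cost_usd(model: str, total_tokens: int) -> float:
    """Estimate the cost in USD of processing total_tokens with the given model."""
    return PRICE_PER_1K_USD[model] * total_tokens / 1000


# Hypothetical workload: 15 articles at roughly 1,000 tokens each (prompt plus summary).
tokens = 15 * 1000
gpt3_cost = summary_cost_usd("GPT3", tokens)
gpt35_cost = summary_cost_usd("GPT3.5", tokens)

print(f"GPT3:   ${gpt3_cost:.2f}")
print(f"GPT3.5: ${gpt35_cost:.2f}")
print(f"ratio:  {gpt3_cost / gpt35_cost:.0f}x")
```

Because both prices scale linearly with token count, the 10x ratio holds for any workload size; only the absolute amounts change.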

@barakplasma barakplasma marked this pull request as draft April 27, 2023 14:46
@barakplasma
Contributor Author

Example output given this config.yaml, on a feed with a very short description and no summary:

markdown_dir_path:
feeds:
  - https://motherduck.com/rss.xml 1
weather_latitude: 32.068
weather_longitude: 34.789
terminal_mode: false
opml_file_path:
markdown_file_prefix:
markdown_file_suffix:
reading_time: false
openai_api_key: sk-1...
summary_feeds:
  - https://motherduck.com/rss.xml

24°C 🌤️

🍵 Blog posts | RSS Feed

DuckDB Ecosystem Newsletter: April 2023
The article is a monthly update on the latest developments in the DuckDB ecosystem, including the use of DuckDB by various companies, the release of extensions and tools, and upcoming events such as webinars and debates about the future of big data. The article also profiles Josh Wills, a well-known figure in the data analytics space.

DuckDB Ecosystem Newsletter: March 2023
The author Marcos introduces themselves as a data engineer for Riot Games and a creator of newsletters about data digs and AWS graviton. They have partnered with the MotherDuck team to share recent resources and developments in the DuckDB ecosystem. The newsletter includes links to blogs and events related to DuckDB, as well as a list of upcoming online and in-person events where DuckDB will be featured. The newsletter is available for subscription.

Why does everybody hate databases?
Hannes Mühleisen, a researcher at the Dutch research institute for computer science and mathematics, created DuckDB as a solution to the limitations and slow performance of hand-rolled dataframe engines used by some data practitioners in the R community. He wanted to build a database that would be easy to install and manage, and draw inspiration from SQLite, which does not have a server, but is in-process with a simple library. Hannes aims to have an impact as a researcher by creating something that will see widespread use in the area of data systems.

DuckDB Ecosystem Newsletter: 0.7.0 Released and More
The text is an introduction to the DuckDB ecosystem newsletter, created by Marcos, a data engineer. The newsletter features updates on the DuckDB database, with highlights from the DuckCon 2023 talks and articles published in January and February. The newsletter also includes information on the latest DuckDB release (0.7.0) and upcoming events related to data management and analytics. Additionally, readers can subscribe to the newsletter or join the mailing list for related news. The text also introduces Pedro Holanda, a Post-Doc and Chief of Operations at DuckDB Labs.

Solving Advent of Code with DuckDB and dbt
Advent of Code is an annual coding challenge that runs from December 1-25, consisting of small programming puzzles of varying difficulty that can be solved using any programming language. Each problem has two parts, with completing both earning a gold star and completing just one earning a silver star. The author of this article decided to use SQL, specifically DuckDB with dbt-duckdb, for the challenge and found that character and list manipulation functions such as string_split, unnest, and string_agg were useful for the problems. A specific solution for day three, which involves working with items in a rucksack, is also provided.

Big Data is Dead
The author argues that the era of Big Data is over and that data size was never the main problem preventing people from gaining insights. Data sizes have not increased as predicted, and hardware has gotten bigger, making it easier to handle data. The author suggests that most people don't have that much data, and traditional data management systems are still growing strongly. The author analyzes query logs, deal post-mortems, benchmark results, customer support tickets, customer conversations, service logs, and published blog posts to support their argument.

Python Faker for DuckDB Fake Data Generation
The text explains why generating data can be useful, especially when working with public data that needs cleaning. It describes using Python Faker, a package for generating fake data that includes a variety of providers for different types of data. It also gives an example of using Faker to generate a person record and explains how to insert generated data into DuckDB using Pandas DataFrames, CSV files, or Parquet files.

How to analyze SQLite databases in DuckDB
DuckDB is a lightweight, self-contained, embeddable analytics database that is sometimes referred to as the "SQLite for analytics." While SQLite is focused on transactions and row-based storage, the column-based storage of DuckDB makes it more suitable for analytics workloads. The DuckDB team has also added support for querying SQLite databases directly from DuckDB, using the sqlitescanner extension. An example of using the SQLite Sakila Sample Database to demonstrate this functionality is provided in the text. There are some differences between SQLite and DuckDB in how they store data and enforce data types.

This Month in the DuckDB Ecosystem: January 2023
The newsletter is from Marcos, a data engineer who creates newsletters about finding data gigs and AWS graviton by night. He partnered with the MotherDuck team to share information about the DuckDB ecosystem in the first issue of 2023. The newsletter includes links to articles and tutorials about DuckDB, as well as upcoming events related to data engineering.

How We're Making Analytics Ducking Awesome
MotherDuck is a data analytics startup that believes big data is dead, and easy data is the future. The company has received funding from top venture capitalists, including Andreessen Horowitz and Redpoint. Jordan Tigani, Chief Duck Herder, has shared his thoughts on the topic in various publications, including a podcast with Joe Reis and Matt Housley. The company's product, DuckDB, is a special analytical database designed for ease of use and fast querying. The company aims to build a cloud-based DuckDB service to complement the existing laptop version. The team plans to attend several industry conferences in the coming months, including Data Day Texas, DuckCon 2023, Data Council Austin, Modern Data Stack Conference, and Data + AI Summit.

This Month in the DuckDB Ecosystem
The text is about Marcos, who works as a data engineer by day and creates newsletters about topics he's passionate about by night. He partners with the MotherDuck team to share news and developments in the DuckDB ecosystem. The newsletter also features two members of the DuckDB community, Mark and Alex. The text provides a list of the top 10 DuckDB links of the month, including a video series, a tutorial on building a data lake using DuckDB, and an article on using DuckDB with Rust. The text also mentions an upcoming event, DuckCon 2023 User Group, which will take place in Brussels.

MotherDuck Raises $47.5 Million to Make Analytics Fun, Frictionless and Ducking Awesome
MotherDuck, a team of engineers, designers, and leaders from top data companies such as Google and AWS, has raised $47.5 million to deliver a serverless data analytics platform for data of any size. They have also partnered with DuckDB, a highly performant analytics database with a vibrant open-source community. MotherDuck aims to challenge the current status quo by providing a simple and easy-to-use data analytics platform that combines the elegance and speed of DuckDB with the scalability of the cloud. They want users to have easy access to query their data quickly with a personalized, delightful user experience.

Why Use DuckDB for Analytics?
DuckDB is an open source in-process SQL OLAP database management system that enables users to perform fast analyses using plain SQL. It is portable and can be run in virtually any environment with minimal complexity, making it universally useful for data scientists, analysts, data engineers, and application developers. It simplifies data access and enables direct querying of diverse data sources, including Arrow tables and relational databases, with a standard SQL interface. DuckDB has excellent SQL support and is optimized for read operations and large-scale aggregations, with a vectorized query engine that utilizes CPU caches for faster processing. The platform has a rapidly expanding open-source community on Discord supported by a growing DuckDB foundation.

Hello, World! Quack. Quack.
MotherDuck is a software company founded by experienced data geeks who formerly worked for top data companies. They believe that scaling up is better than scaling out, big data is dead and easy data is the future. They think that laptops can process data faster than the cloud and they've teamed up with DuckDB to build the Next New Thing in data. They encourage people to stay in touch and hang out with them on the DuckDB Discord server.

🍵 Blog posts | RSS Feed

DuckDB Ecosystem Newsletter: April 2023

@barakplasma barakplasma marked this pull request as ready for review April 27, 2023 14:55
@piqoni
Owner

piqoni commented Apr 29, 2023

Thanks a lot! I did this three weeks ago in dd6fd1b but was not able to test it at the time, so I did not merge it. This looks really good, and it's a reasonable default.

@piqoni piqoni merged commit dfe2691 into piqoni:main Apr 29, 2023
@barakplasma barakplasma deleted the feat--switch-to-GPT3.5-for-cost-savings branch April 30, 2023 08:59