ws4duck? #7

mskyttner · 2022-04-29T08:33:56Z

There is a 🦆 figuring here so I wonder if this nice web service would need considerable refactoring to work with the sqlite-like database duckdb?

There is a go binding and also a C++ REST server component... and if I understand correctly, SQLite databases can also be attached.

Would be interesting to hear your thoughts on this.

proofrock · 2022-04-29T09:03:52Z

Haha 😅 the duck is there because my first name in my language also means a mallard, so it's sort of a trademark. I am not familiar with duckdb, admittedly; it could be interesting. I'll take a look. Thanks!

proofrock · 2022-04-29T09:22:34Z

First thing, the go interface needs CGO; it's very doable of course, but it limits the cross-compilation possibilities. For example, I would need a macos/arm64 and a macos/intel to be able to provide binaries for both... and I don't have a macos/arm64... yet. I just got rid of CGO last week for ws4sqlite, to run in the same thorn bush is something I wouldn't want to do. But in general it seems feasible.

A word: how do in your opinion SQLite and DuckDB compare? Different use cases, or "just" two solutions for the same problem? Sorry for the stupid question, but if DuckDB is tailor-made for a specific use case, then it could be possible that a REST interface isn't useful for that.

mskyttner · 2022-04-29T10:23:03Z

For reading and analytical queries, duckdb is fast. And often there is more reading than writing at least in the use cases I mostly work with, such as data analytics. But also I think in general.

There are examples of duckdb being able to be quite resource effective, such as requiring just one single machine to do the equivalence of what needs 32 spark worker nodes using Apache Spark (source: this talk here).

And I like how it can work smoothly with new and old data formats, CSV-files as well as large parquet files on S3 or Arrow data. Also the R and Python integrations are great.

So, I prefer using duckdb for most of my use cases, but for use cases with a lot write-heavy applications (OLTP) with many concurrent writes all the time, I think sqlite (for in-process) or postgres is probably more suitable. Even so, when looking at that data though, I'd personally prefer to use duckdb for fast queries against it. The write-intensive transactional use case might even be the more specialized use case for most applications, perhaps?

I experimented with putting some HTTP/JSON web services on top of duckdb databases here and even cheated with a http bash server directly calling the duckdb CLI to return JSON (so beware if you look at that!). When I found ws4sqlite, I thought it looked very nice and lightweight for using when "dockerizing" a database file to expose it as a msa web service.

Just quacking my few cents 🦆! :)

proofrock · 2022-04-29T10:30:29Z

I am quite satisfied of the on the air protocol (so to speak) and the security provisions aren't too dependent on the database after all. So it shouldn't be too difficult to switch engine... and clients would be already there. When I'll have some time I'll try. Thanks again!

proofrock · 2022-11-28T07:06:53Z

I will not do this, at least for now. The problem is, the build of marcboeker/go-duckdb uses CGO, and ws4sqlite migrated away from that (to improve cross-compilation). I confirm the possibility of forking ws4sqlite and integrating duckdb without many problems; I will leave this issue open if someone is interested. Thanks!

proofrock · 2023-10-09T17:48:03Z

@mskyttner After a loooong time (but it never left my mind for some reason) here it is:

https://github.com/proofrock/duckrg

It took a rewrite to rust (sqliterg) and then a port to duckdb, much easier because the drivers for it are modeled on the ones for sqlite. Lucky.

Let me know if you like it ;-)

mskyttner · 2023-10-10T20:54:31Z

@proofrock Wow, so very nice!

I got excited about this and tried to fiddle with getting a container build to work in a fork, in order to test it, and ran into some quirks with the alpine / cross-build / static binary builds ... but I was able to get a (quite fat) container image in the end that seemed to be able to use duckdb extensions too (nice!); details are here... and I tried using this GitHub Action to push the container to GHCR:

Testing it a little bit I was thinking of the storedStatement which is nice, and of something similar, like a storedCommand which could take parameters something along the lines of this (broken quotes, think "^C1")...

{
    "transaction": [
        {
            "statement": "install 'httpfs';"
	},
        {
            "statement": "load 'httpfs';"
	},
	{
            "query": "from read_json_auto('https://api.github.com/users/proofrock/events?per_page=100') select type, count(*) as event_count group by 1 order by 2 desc limit 10"
        },
	{
	    "command": "curl -sL 'https://api.github.com/users/##_username_##/events?per_page=100' \
    | duckdb -s 'COPY (SELECT type, count(*) AS event_count FROM read_json_auto('/dev/stdin') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)' \
    | uplot bar -d, -H -t 'GitHub Events for @##_username_##'",
	    "values": { "username": "proofrock" }
	}
    ]
}

if one knows that the server side have these commands available that can be chained together, it could be nice to be able to make use of them... perhaps a crazy idea :) ... for getting "text/plain" result like the one below from the API...

docker run --rm -v $(pwd)/gh_youplot.sh:/tm
p/gh_youplot.sh ghcr.io/mskyttner/duckrg /tmp/gh_youplot.sh
                           GitHub Events for @dproofrock
                     ┌                                        ┐ 
           PushEvent ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 51.0   
         CreateEvent ┤■■■■■■■■■■■■■■■ 22.0                      
         DeleteEvent ┤■■■■■■■■■■■ 17.0                          
        ReleaseEvent ┤■■■ 4.0                                   
         IssuesEvent ┤■ 2.0                                     
          WatchEvent ┤■ 2.0                                     
    PullRequestEvent ┤■ 1.0                                     
   IssueCommentEvent ┤■ 1.0                                     
                     └                                        ┘

proofrock · 2023-10-12T10:37:39Z

@mskyttner thanks for all this. Do you have time/interest to move your last message to duckrg's discussion section? Also, if you want and can, would you mind to contribute a PR for a "sane" dockerfile (ubuntu or debian-based is ok, maybe without the extra stuff)? If not, I can adapt yours and give proper attribution, of course, but a PR would give a better visibility. Thanks for taking the time to tinker with build scripts, I just ported them from sqliterg but didn't have time to properly check them.

For the rest, cool ideas. I'll give you answers over at the discussion, or if you can't open there I'll follow up here.

Thanks again!

mskyttner · 2023-10-12T14:20:22Z

Sure, a pleasure, I moved that idea to proofrock/duckrg#1 and will attempt a PR for a Dockerfile (starting with a x64-amd variant).

proofrock · 2024-02-15T07:29:48Z

Hi Markus! Please take a look at #44 and contribute if you want.

proofrock · 2025-01-14T12:50:55Z

Oh well, it took a while but proper duckdb integration is here.

See https://github.com/proofrock/ws4sqlite/releases/tag/v0.17dev3

proofrock self-assigned this Apr 29, 2022

proofrock added the suggestion label Apr 29, 2022

proofrock added the open to consideration label Apr 29, 2022

proofrock added the frozen label Nov 28, 2022

proofrock removed the frozen label Oct 9, 2023

proofrock added done and removed open to consideration labels Oct 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ws4duck? #7

ws4duck? #7

mskyttner commented Apr 29, 2022

proofrock commented Apr 29, 2022 •

edited

Loading

proofrock commented Apr 29, 2022 •

edited

Loading

mskyttner commented Apr 29, 2022

proofrock commented Apr 29, 2022

proofrock commented Nov 28, 2022

proofrock commented Oct 9, 2023

mskyttner commented Oct 10, 2023

proofrock commented Oct 12, 2023

mskyttner commented Oct 12, 2023

proofrock commented Feb 15, 2024

proofrock commented Jan 14, 2025

ws4duck? #7

ws4duck? #7

Comments

mskyttner commented Apr 29, 2022

proofrock commented Apr 29, 2022 • edited Loading

proofrock commented Apr 29, 2022 • edited Loading

mskyttner commented Apr 29, 2022

proofrock commented Apr 29, 2022

proofrock commented Nov 28, 2022

proofrock commented Oct 9, 2023

mskyttner commented Oct 10, 2023

proofrock commented Oct 12, 2023

mskyttner commented Oct 12, 2023

proofrock commented Feb 15, 2024

proofrock commented Jan 14, 2025

proofrock commented Apr 29, 2022 •

edited

Loading

proofrock commented Apr 29, 2022 •

edited

Loading