Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ws4duck? #7

Open
mskyttner opened this issue Apr 29, 2022 · 11 comments
Open

ws4duck? #7

mskyttner opened this issue Apr 29, 2022 · 11 comments
Assignees

Comments

@mskyttner
Copy link

There is a 🦆 figuring here so I wonder if this nice web service would need considerable refactoring to work with the sqlite-like database duckdb?

There is a go binding and also a C++ REST server component... and if I understand correctly, SQLite databases can also be attached.

Would be interesting to hear your thoughts on this.

@proofrock proofrock self-assigned this Apr 29, 2022
@proofrock
Copy link
Owner

proofrock commented Apr 29, 2022

Haha 😅 the duck is there because my first name in my language also means a mallard, so it's sort of a trademark. I am not familiar with duckdb, admittedly; it could be interesting. I'll take a look. Thanks!

@proofrock
Copy link
Owner

proofrock commented Apr 29, 2022

First thing, the go interface needs CGO; it's very doable of course, but it limits the cross-compilation possibilities. For example, I would need a macos/arm64 and a macos/intel to be able to provide binaries for both... and I don't have a macos/arm64... yet. I just got rid of CGO last week for ws4sqlite, to run in the same thorn bush is something I wouldn't want to do. But in general it seems feasible.

A word: how do in your opinion SQLite and DuckDB compare? Different use cases, or "just" two solutions for the same problem? Sorry for the stupid question, but if DuckDB is tailor-made for a specific use case, then it could be possible that a REST interface isn't useful for that.

@mskyttner
Copy link
Author

For reading and analytical queries, duckdb is fast. And often there is more reading than writing at least in the use cases I mostly work with, such as data analytics. But also I think in general.

There are examples of duckdb being able to be quite resource effective, such as requiring just one single machine to do the equivalence of what needs 32 spark worker nodes using Apache Spark (source: this talk here).

And I like how it can work smoothly with new and old data formats, CSV-files as well as large parquet files on S3 or Arrow data. Also the R and Python integrations are great.

So, I prefer using duckdb for most of my use cases, but for use cases with a lot write-heavy applications (OLTP) with many concurrent writes all the time, I think sqlite (for in-process) or postgres is probably more suitable. Even so, when looking at that data though, I'd personally prefer to use duckdb for fast queries against it. The write-intensive transactional use case might even be the more specialized use case for most applications, perhaps?

I experimented with putting some HTTP/JSON web services on top of duckdb databases here and even cheated with a http bash server directly calling the duckdb CLI to return JSON (so beware if you look at that!). When I found ws4sqlite, I thought it looked very nice and lightweight for using when "dockerizing" a database file to expose it as a msa web service.

Just quacking my few cents 🦆! :)

@proofrock
Copy link
Owner

I am quite satisfied of the on the air protocol (so to speak) and the security provisions aren't too dependent on the database after all. So it shouldn't be too difficult to switch engine... and clients would be already there. When I'll have some time I'll try. Thanks again!

@proofrock
Copy link
Owner

I will not do this, at least for now. The problem is, the build of marcboeker/go-duckdb uses CGO, and ws4sqlite migrated away from that (to improve cross-compilation). I confirm the possibility of forking ws4sqlite and integrating duckdb without many problems; I will leave this issue open if someone is interested. Thanks!

@proofrock
Copy link
Owner

@mskyttner After a loooong time (but it never left my mind for some reason) here it is:

https://github.com/proofrock/duckrg

It took a rewrite to rust (sqliterg) and then a port to duckdb, much easier because the drivers for it are modeled on the ones for sqlite. Lucky.

Let me know if you like it ;-)

@mskyttner
Copy link
Author

@proofrock Wow, so very nice!

I got excited about this and tried to fiddle with getting a container build to work in a fork, in order to test it, and ran into some quirks with the alpine / cross-build / static binary builds ... but I was able to get a (quite fat) container image in the end that seemed to be able to use duckdb extensions too (nice!); details are here... and I tried using this GitHub Action to push the container to GHCR:

Testing it a little bit I was thinking of the storedStatement which is nice, and of something similar, like a storedCommand which could take parameters something along the lines of this (broken quotes, think "^C1")...

{
    "transaction": [
        {
            "statement": "install 'httpfs';"
	},
        {
            "statement": "load 'httpfs';"
	},
	{
            "query": "from read_json_auto('https://api.github.com/users/proofrock/events?per_page=100') select type, count(*) as event_count group by 1 order by 2 desc limit 10"
        },
	{
	    "command": "curl -sL 'https://api.github.com/users/##_username_##/events?per_page=100' \
    | duckdb -s 'COPY (SELECT type, count(*) AS event_count FROM read_json_auto('/dev/stdin') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)' \
    | uplot bar -d, -H -t 'GitHub Events for @##_username_##'",
	    "values": { "username": "proofrock" }
	}
    ]
}

if one knows that the server side have these commands available that can be chained together, it could be nice to be able to make use of them... perhaps a crazy idea :) ... for getting "text/plain" result like the one below from the API...

docker run --rm -v $(pwd)/gh_youplot.sh:/tm
p/gh_youplot.sh ghcr.io/mskyttner/duckrg /tmp/gh_youplot.sh
                           GitHub Events for @dproofrock
                     ┌                                        ┐ 
           PushEvent ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 51.0   
         CreateEvent ┤■■■■■■■■■■■■■■■ 22.0                      
         DeleteEvent ┤■■■■■■■■■■■ 17.0                          
        ReleaseEvent ┤■■■ 4.0                                   
         IssuesEvent ┤■ 2.0                                     
          WatchEvent ┤■ 2.0                                     
    PullRequestEvent ┤■ 1.0                                     
   IssueCommentEvent ┤■ 1.0                                     
                     └                                        ┘ 

@proofrock
Copy link
Owner

@mskyttner thanks for all this. Do you have time/interest to move your last message to duckrg's discussion section? Also, if you want and can, would you mind to contribute a PR for a "sane" dockerfile (ubuntu or debian-based is ok, maybe without the extra stuff)? If not, I can adapt yours and give proper attribution, of course, but a PR would give a better visibility. Thanks for taking the time to tinker with build scripts, I just ported them from sqliterg but didn't have time to properly check them.

For the rest, cool ideas. I'll give you answers over at the discussion, or if you can't open there I'll follow up here.

Thanks again!

@mskyttner
Copy link
Author

Sure, a pleasure, I moved that idea to proofrock/duckrg#1 and will attempt a PR for a Dockerfile (starting with a x64-amd variant).

@proofrock
Copy link
Owner

Hi Markus! Please take a look at #44 and contribute if you want.

@proofrock
Copy link
Owner

Oh well, it took a while but proper duckdb integration is here.

See https://github.com/proofrock/ws4sqlite/releases/tag/v0.17dev3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants