-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ws4duck? #7
Comments
Haha 😅 the duck is there because my first name in my language also means a mallard, so it's sort of a trademark. I am not familiar with duckdb, admittedly; it could be interesting. I'll take a look. Thanks! |
First thing, the go interface needs CGO; it's very doable of course, but it limits the cross-compilation possibilities. For example, I would need a macos/arm64 and a macos/intel to be able to provide binaries for both... and I don't have a macos/arm64... yet. I just got rid of CGO last week for ws4sqlite, to run in the same thorn bush is something I wouldn't want to do. But in general it seems feasible. A word: how do in your opinion SQLite and DuckDB compare? Different use cases, or "just" two solutions for the same problem? Sorry for the stupid question, but if DuckDB is tailor-made for a specific use case, then it could be possible that a REST interface isn't useful for that. |
For reading and analytical queries, duckdb is fast. And often there is more reading than writing at least in the use cases I mostly work with, such as data analytics. But also I think in general. There are examples of duckdb being able to be quite resource effective, such as requiring just one single machine to do the equivalence of what needs 32 spark worker nodes using Apache Spark (source: this talk here). And I like how it can work smoothly with new and old data formats, CSV-files as well as large parquet files on S3 or Arrow data. Also the R and Python integrations are great. So, I prefer using duckdb for most of my use cases, but for use cases with a lot write-heavy applications (OLTP) with many concurrent writes all the time, I think sqlite (for in-process) or postgres is probably more suitable. Even so, when looking at that data though, I'd personally prefer to use duckdb for fast queries against it. The write-intensive transactional use case might even be the more specialized use case for most applications, perhaps? I experimented with putting some HTTP/JSON web services on top of duckdb databases here and even cheated with a http bash server directly calling the duckdb CLI to return JSON (so beware if you look at that!). When I found ws4sqlite, I thought it looked very nice and lightweight for using when "dockerizing" a database file to expose it as a msa web service. Just quacking my few cents 🦆! :) |
I am quite satisfied of the on the air protocol (so to speak) and the security provisions aren't too dependent on the database after all. So it shouldn't be too difficult to switch engine... and clients would be already there. When I'll have some time I'll try. Thanks again! |
I will not do this, at least for now. The problem is, the build of |
@mskyttner After a loooong time (but it never left my mind for some reason) here it is: https://github.com/proofrock/duckrg It took a rewrite to rust ( Let me know if you like it ;-) |
@proofrock Wow, so very nice! I got excited about this and tried to fiddle with getting a container build to work in a fork, in order to test it, and ran into some quirks with the alpine / cross-build / static binary builds ... but I was able to get a (quite fat) container image in the end that seemed to be able to use duckdb extensions too (nice!); details are here... and I tried using this GitHub Action to push the container to GHCR: Testing it a little bit I was thinking of the storedStatement which is nice, and of something similar, like a storedCommand which could take parameters something along the lines of this (broken quotes, think "^C1")... {
"transaction": [
{
"statement": "install 'httpfs';"
},
{
"statement": "load 'httpfs';"
},
{
"query": "from read_json_auto('https://api.github.com/users/proofrock/events?per_page=100') select type, count(*) as event_count group by 1 order by 2 desc limit 10"
},
{
"command": "curl -sL 'https://api.github.com/users/##_username_##/events?per_page=100' \
| duckdb -s 'COPY (SELECT type, count(*) AS event_count FROM read_json_auto('/dev/stdin') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)' \
| uplot bar -d, -H -t 'GitHub Events for @##_username_##'",
"values": { "username": "proofrock" }
}
]
} if one knows that the server side have these commands available that can be chained together, it could be nice to be able to make use of them... perhaps a crazy idea :) ... for getting "text/plain" result like the one below from the API... docker run --rm -v $(pwd)/gh_youplot.sh:/tm
p/gh_youplot.sh ghcr.io/mskyttner/duckrg /tmp/gh_youplot.sh
GitHub Events for @dproofrock
┌ ┐
PushEvent ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 51.0
CreateEvent ┤■■■■■■■■■■■■■■■ 22.0
DeleteEvent ┤■■■■■■■■■■■ 17.0
ReleaseEvent ┤■■■ 4.0
IssuesEvent ┤■ 2.0
WatchEvent ┤■ 2.0
PullRequestEvent ┤■ 1.0
IssueCommentEvent ┤■ 1.0
└ ┘ |
@mskyttner thanks for all this. Do you have time/interest to move your last message to duckrg's discussion section? Also, if you want and can, would you mind to contribute a PR for a "sane" dockerfile (ubuntu or debian-based is ok, maybe without the extra stuff)? If not, I can adapt yours and give proper attribution, of course, but a PR would give a better visibility. Thanks for taking the time to tinker with build scripts, I just ported them from sqliterg but didn't have time to properly check them. For the rest, cool ideas. I'll give you answers over at the discussion, or if you can't open there I'll follow up here. Thanks again! |
Sure, a pleasure, I moved that idea to proofrock/duckrg#1 and will attempt a PR for a Dockerfile (starting with a x64-amd variant). |
Hi Markus! Please take a look at #44 and contribute if you want. |
Oh well, it took a while but proper See https://github.com/proofrock/ws4sqlite/releases/tag/v0.17dev3 |
There is a 🦆 figuring here so I wonder if this nice web service would need considerable refactoring to work with the sqlite-like database duckdb?
There is a go binding and also a C++ REST server component... and if I understand correctly, SQLite databases can also be attached.
Would be interesting to hear your thoughts on this.
The text was updated successfully, but these errors were encountered: