Performance testing triplestores for Islandora community recommendations #30

Open
ruebot opened this issue Mar 24, 2015 · 20 comments
Labels: help wanted, Subject: Linked Data, Type: question

Comments

@ruebot
Member

ruebot commented Mar 24, 2015

  1. Identify triplestores
  2. Identify benchmark tests
@DiegoPino
Contributor

Nice @ruebot

I would also like to add some requirements for the candidates, if possible:

  • 100% SPARQL 1.1 compliant
  • Open source =)
  • Good/active community+developers
  • Nice if capable of making distributed/cross queries (my future needs)
  • Horizontal Scaling and Clustering
  • Multiple storage choices

Quick list from Google

  1. Identify triplestores
    • Apache Jena Fuseki (2.0?)
    • BigOWLIM
    • D2R Server
    • Sesame
    • OpenLink Virtuoso
    • 4store
    • AllegroGraph (not open source)
    • BlazeGraph (Thanks @daniel-dgi )

Some existing work on benchmarks

@ruebot
Member Author

ruebot commented Mar 24, 2015

@DiegoPino++

@daniel-dgi
Contributor

Wanna throw BlazeGraph into the mix: http://www.blazegraph.com/. It's what Wikipedia is using.

@DiegoPino
Contributor

Nice addition, @daniel-dgi. BlazeGraph looks really good. ++ for testing that one first.

@ruebot
Member Author

ruebot commented Apr 23, 2015

Shall we identify benchmarks and datasets from this RdfStoreBenchmarking list? Maybe we can coordinate with the Fedora community and get some input there as well?

looks at @awoods

@awoods

awoods commented Apr 23, 2015

It would be good to identify usage characteristics and expectations of the community in order to ensure that we are looking at the right metrics. As a side note, I believe @no-reply at DPLA is also planning on such an analysis. Maybe we can extend the coordination.
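
One way to make "the right metrics" concrete is a small timing harness that runs the same SPARQL query repeatedly and reports latency figures. The following is only a minimal sketch: the helper names `make_sparql_runner` and `time_runs` are made up for illustration, the choice of min/median/max as metrics is an assumption rather than anything agreed in this thread, and the endpoint URL in the usage comment is hypothetical. It relies on the SPARQL 1.1 Protocol, which accepts a form-encoded `query` parameter over POST.

```python
import json
import statistics
import time
import urllib.parse
import urllib.request


def make_sparql_runner(endpoint, query):
    """Return a zero-argument callable that sends `query` to `endpoint` via POST."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")

    def run():
        # urllib sends form-encoded POST data when `data` is supplied.
        request = urllib.request.Request(
            endpoint,
            data=body,
            headers={"Accept": "application/sparql-results+json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return run


def time_runs(run, repeats=10):
    """Call `run` `repeats` times and return latency statistics in seconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        latencies.append(time.perf_counter() - start)
    return {
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


# Usage against a store under test (hypothetical URL, adjust per triplestore):
# run = make_sparql_runner("http://localhost:3030/ds/sparql",
#                          "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }")
# print(time_runs(run, repeats=5))
```

Keeping the runner separate from the timer means the same `time_runs` loop could be pointed at each candidate store, or at a local stub, without changes.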

@DiegoPino
Contributor

Hi, do we have any stats on how many triples we will get for every FF object?

@awoods

awoods commented Apr 23, 2015

No, but that should be easy to determine. My guess is 20.

@DiegoPino
Contributor

OK, that's less than what we get now in Fedora 3. A simple object with RELS-EXT plus a full DC document gives me about 30.

@awoods

awoods commented Apr 23, 2015

You will want to check what the F4 triples look like from your specific data, of course. I was just throwing out a guess. 30 may be closer to the truth.
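
A rough way to check this against real data is to export some objects as N-Triples and count triples per subject. The sketch below assumes each object maps to a single subject IRI, which may not hold for objects with child resources, and the sample data is invented purely for illustration.

```python
from collections import Counter
from statistics import mean


def triples_per_subject(ntriples_text):
    """Count triples per subject in N-Triples text (one triple per line)."""
    counts = Counter()
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # In N-Triples the subject is the first whitespace-delimited token.
        subject = line.split(None, 1)[0]
        counts[subject] += 1
    return counts


# Invented sample data, not taken from any real repository.
sample = """\
<http://example.org/obj/1> <http://purl.org/dc/elements/1.1/title> "First" .
<http://example.org/obj/1> <http://purl.org/dc/elements/1.1/creator> "Someone" .
<http://example.org/obj/2> <http://purl.org/dc/elements/1.1/title> "Second" .
"""

counts = triples_per_subject(sample)
print(mean(counts.values()))  # 1.5
```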

@DiegoPino
Contributor

Thanks @awoods! I just want to try to infer what the reality will be for the largest (and ever-growing) Islandora implementations we have in the community. @ruebot, do you think we could run a quick and dirty poll about this on the Google group? Something like "How many objects are you handling right now, and how fast are you growing every year?" I have read in the group about repos with over 250,000 objects. At about 30 triples per object, that's 7,500,000 triples. This would give us a basis for identifying the usage "characteristics and expectations", as @awoods correctly stated.
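
The estimate above is simple multiplication, and it extends easily to growth scenarios once poll answers come in. This is a sketch only: the function name `projected_triples` is invented here, and the 20% yearly growth in the second call is an assumed figure, not something reported in the thread.

```python
def projected_triples(objects, triples_per_object, annual_growth=0.0, years=0):
    """Estimate the triple count after `years` of compound growth in object count."""
    grown_objects = objects * (1 + annual_growth) ** years
    return round(grown_objects * triples_per_object)


# Figures from the thread: 250,000 objects at roughly 30 triples each.
print(projected_triples(250_000, 30))  # 7500000

# Hypothetical scenario: 20% yearly growth over 5 years (assumed values).
print(projected_triples(250_000, 30, annual_growth=0.20, years=5))
```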

@DiegoPino
Contributor

Looks like LUBM (http://swat.cse.lehigh.edu/projects/lubm/) is a standard set of tests and tools for benchmarking triplestores. At least Oracle thinks so!
http://download.oracle.com/otndocs/tech/semantic_web/pdf/OracleSpatialGraph_RDFgraph_1_trillion_Benchmark.pdf

@dmoses
Contributor

dmoses commented Apr 24, 2015

FYI: OpenLink Virtuoso is (I believe) also used by the OSF for Drupal project.

@DiegoPino
Contributor

Nice, Donald! OSF for Drupal looks like a nice addition. Reading quickly through the documentation, I see there are a lot of things we could do without having to write custom code, even importing whole ontologies. Also, version 3.2 does not require Virtuoso anymore; you can use any triplestore, which is even better. Thanks a lot, this could be the bridge that brings Linked Data to Drupal.

@whikloj mentioned this issue May 27, 2015
@ruebot added the help wanted and question labels Apr 7, 2016
@ruebot
Member Author

ruebot commented Apr 23, 2016

This could be done as Fedora community Performance Scaling & Testing; relevant agenda item from this meeting.

@ruebot
Member Author

ruebot commented Jun 29, 2016

Because sometimes we have a conversation on Twitter a year or so later:
https://twitter.com/ruebot/status/747955866385539072

...and a document now thanks to @cmh2166
https://docs.google.com/document/d/1EoD-JD4OxF9M-pfifQxF_0U7CLGThd8cMzjFH0DwKgU/edit#heading=h.84vdault4l0g

@no-reply

For Ruby users, I've done some initial work on a benchmark suite for ruby-rdf at: https://github.com/ruby-rdf/rdf-benchmark

My hope is that this will become a general purpose benchmark for RDF.rb, using the Berlin Benchmark data generator. It's early days, still, but the work might have more general usefulness.

@bradspry

bradspry commented Jul 13, 2016

Blazegraph GPU on AWS EC2 G2 Family :-)

@ruebot
Member Author

ruebot commented Jul 18, 2016

Add Stardog to the list. h/t @ajs6f

http://sparqlscore.com/ too

@ajs6f

ajs6f commented Jul 18, 2016

Stardog is not open source, although in my experience @kendall at @Complexible is approachable and very willing to have discussions about favorable licensing terms. I had that experience in the context of work I did for @ddavis at @Smithsonian, so YMMV.

@kstapelfeldt added the Subject: Linked Data label Sep 30, 2021
9 participants