-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A general "refresh" to the Accumulo binding #947
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems fine from an Accumulo perspective. A few minor comments that don't change my opinion.
// Couldn't spit out the mutations we wanted. | ||
// Ignore this for now. | ||
System.err.println("MutationsRejectedException: " + e.getMessage()); | ||
public BatchWriter getWriter(String t) throws TableNotFoundException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just call it table? Vestigial?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, good catch. This was originally because of the table
member (checkstyle rule about a local variable masking a class member). Will fix.
if (null != numThreadsValue) { | ||
numThreads = Integer.parseInt(numThreadsValue); | ||
} | ||
System.out.println("Using " + numThreads + " threads to write data"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter if this println remains? Given what the harness does I'm inclined not to really care if this is here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be on stderr of it's going to be manually printed. Otherwise it'll mess up folks using stdout to read results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise it'll mess up folks using stdout to read results.
Will do!
new ByteArrayByteIterator(buf)); | ||
} | ||
} catch (Exception e) { | ||
System.err.println("Error trying to reading Accumulo table" + key + e); | ||
System.err.println("Error trying to reading Accumulo table " + t + " " + key); | ||
e.printStackTrace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^same?
Iterator<BatchWriter> iterator = writers.values().iterator(); | ||
while (iterator.hasNext()) { | ||
BatchWriter writer = iterator.next(); | ||
writer.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this throw MutationsRejectedException? Should we safeguard this and remove it from the writers anyway? Will it matter at this point in the process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the cleanup point, there isn't much else we can do but throw an exception. The cleanup here is best-effort, but probably unnecessary (I don't have a use-case for how YCSB would reasonably recover from this state)
// a concurrent data structure is overkill (especially in such a hot code path). | ||
// However, the impact seems to be relatively negligible in trivial local tests and it's | ||
// "more correct" WRT to the API. | ||
BatchWriter writer = writers.get(t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you look into using MultiTableBatchWriter.getBatchWriter(String)? It does what you are doing here, but its performance may be worse.
I looked at the MTBW implementation, it may do more reads to main memory than what you have in this patch. The MTBW is trying to handle the case of a table being renamed, which causes more reads to main memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the MTBW implementation, it may do more reads to main memory than what you have in this patch. The MTBW is trying to handle the case of a table being renamed, which causes more reads to main memory.
I did forget that MTBW was a thing. It would be nice if YCSB ever actually provided more than one table to write to. It would be nice to explore this as a follow-on.
Would switching to a logging api still be in scope for the amount of effort you're looking to do here? |
Please start the commit message with "[accumulo]". |
Could be. I wasn't sure if there was a reason a logger wasn't used in the first place :)
Will do. Thanks for the review, folks. |
accumulo/README.md
Outdated
Ruby script, based on the HBase README, can generate adequate split-point. 10's of Tablets per | ||
TabletServer is a good starting point. | ||
|
||
$ echo 'num_splits = 20; puts (1..num_splits).map {|i| "user#{1000+i*(9999-1000)/num_splits}"}' | ruby > /tmp/splits.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
accumulo/README.md
Outdated
|
||
On repeated data loads, the following commands may be helpful to re-set the state of the table quickly. | ||
|
||
accumulo> createtable tmp -cs usertable -cc usertable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember if these flags have long versions that more clearly self-document setting things up. If they do, those would be preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I've switched the non-obvious ones over (everything but -t
and -s
).
@@ -36,7 +36,38 @@ Git clone YCSB and compile: | |||
cd YCSB | |||
mvn -pl com.yahoo.ycsb:aerospike-binding -am clean package | |||
|
|||
### 3. Load Data and Run Tests | |||
### 3. Create the Accumulo table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you spell out what versions these commands work with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like >=1.7 would work with what's in the readme, with table durability and the server-side mutation buffer lacking prior. I've tried to clarify this.
c16977f
to
144e9f6
Compare
the travis failure is a checkstyle violation for line length, fyi. |
Ah, thanks. I was unintentionally ignoring that check :) I'll fix that up and then squash/rebase these together |
* Expand on the README, covering table creation and best-practices for table config * Avoid unnecessary Text object creations (in loops and instead of byte[] usage) * Use a ConcurrentHashMap to better match the DB API * Fix error messages and always call printStackTrace() on exceptions * Use BATCHED_OK instead of OK in insert() (more correct)
5026a87
to
94358fd
Compare
@busbey are you able to merge this for me? :) |
@joshelser looks like all comments were addressed. I can merge it. PS - Guess we keep crossing internet paths today :P |
Thanks @risdenk!
I like it :) |
…per#947) * Expand on the README, covering table creation and best-practices for table config * Avoid unnecessary Text object creations (in loops and instead of byte[] usage) * Use a ConcurrentHashMap to better match the DB API * Fix error messages and always call printStackTrace() on exceptions * Use BATCHED_OK instead of OK in insert() (more correct)
* master: [core] Fixing squid:S1319 - Declarations should use Java collection interfaces such as "List" rather than specific implementation classes such as "LinkedList". (manolama - updated bindings added since the PR) [core] Use longs instead of ints to support larger key spaces. Changed int to long in Measurements code to support large scale workloads. (manolama - fixed checkstyle errors) [core] Export totalHistogram for HdrHistogram measurement [core] Add an operation enum to the Workload class. This can eventually be used to replace the strings. [core] Add a Fisher-Yates array shuffle to the Utils class. [core] Fix an issue where the threadid and threadCount were not passed to the workload client threads. Had to use setters to get around the checkstyle complaint of having too many parameters. Upgrading googlebigtable to the latest version. The API used by googlebigtable has had quite a bit of churn. This is the minimal set of changes required for the upgrade. [geode] Update to apache-geode 1.2.0 release [core] Update to use newer version of Google Cloud Spanner client and associated required change [core] Add a reset() method to the ByteIterator abstract and implementations for each of the children. This lets us re-use byte iterators if we need to access the values again (when applicable). [hbase12] Add HBase 1.2+ specific client that relies on the shaded client artifact provided by those versions. (brianfrankcooper#970) [distro] Refresh Apache licence text (brianfrankcooper#969) [memcached] support binary protocol (brianfrankcooper#965) [accumulo] A general "refresh" to the Accumulo binding (brianfrankcooper#947) [cloudspanner] Add binding for Google's Cloud Spanner. (brianfrankcooper#939) [aerospike] Change the write policy to REPLACE_ONLY (brianfrankcooper#937)
Text
object creations (in loops and instead ofbyte[]
usage)ConcurrentHashMap
to better match the DB APIprintStackTrace()
on exceptionsBATCHED_OK
instead ofOK
ininsert()
(more correct)Tagging some potentially interested folks.. @busbey @madrob @keith-turner