Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cursor & SelectMapper Refactor (WIP) #4180

Merged
merged 9 commits into from
Sep 22, 2015

Conversation

benbjohnson
Copy link
Contributor

Overview

This pull request refactors the SelectMapper into separate raw and aggregate mappers and it refactors the Cursor implementation to return int64 keys and interface{} values.

This pull request is largely complete but still has test failures

@otoolep
Copy link
Contributor

otoolep commented Sep 21, 2015

Can you explain why this will help with the upcoming engine changes? We started months ago with a single Mapper type, the first round of DQ changes split it into Raw and Aggregate, and @dgnorton merged it back to a single type. Now this change splits it again.

I'd like to make sure I have the full context.

@dgnorton
Copy link
Contributor

I didn't merge raw and agg mappers. At least not that I remember. :)

func (tx *Tx) Cursor(key string, direction tsdb.Direction) tsdb.Cursor {
// Retrieve key bucket.
b := tx.Bucket([]byte(key))
// Cursor returns an iterator for a key over a single field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be reading this wrong, but we say single field here, do we mean single point? Later, in the cursor.read method we check to see if we have a single field or multiple fields, which implies we can cursor over multiple fields right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Cursor() was to do a single field originally. It was too hard to split that out right now so we'll have to wrap the pd storage with a wrapping cursor to join individual field cursors.

@otoolep
Copy link
Contributor

otoolep commented Sep 21, 2015

@dgnorton -- yeah, I stated that incorrectly. What you did do is embed the Raw Mapper inside a LocalMapper i.e. set a Local into Raw mode. In other words, we're churning the implementation of this, and I wanted to understand why we're reverting to an earlier form. Perhaps it's for the same reasons I first separated the two Mappers.

@benbjohnson
Copy link
Contributor Author

@otoolep I resolved some more test failures but it's really slow going. I'm getting a type assertion error now. It's difficult to keep track of which fields are getting processed where. I pushed my latest code (98328ab).

@otoolep
Copy link
Contributor

otoolep commented Sep 21, 2015

OK @benbjohnson - I will take a look.

I was mistaken about work @dgnorton had done, I forgot that I merged the mappers.

@@ -7,7 +7,7 @@ if [ $fmtcount -gt 0 ]; then
fi

# Due to the way composites work, vet will fail for some of our tests so we ignore it
vetcount=`go tool vet --composites=false ./ 2>&1 | wc -l`
vetcount=`go tool vet --composites=false --methods=false ./ 2>&1 | wc -l`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you have a better method name than Seek().

It probably makes sense to move to iterators (with just a Next() method) instead of cursors but that's a refactor for another time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hit this before and just used the name SeekTo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see you also used SeekTo, according to a comment below. So can this go vet change be pulled?

// Read items until we have a key that doesn't match the previously read one.
// This is to perform deduplication when there's multiple items with the same key.
// The highest priority cursor will be read first and then remaining keys will be dropped.
for {
// Return nil if there are no more items left.
// Return EOF marker if there are no more items left.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nil isn't allowed now that we're using int64 so I wanted to declare the marker more explicitly than simply doing -1.

@otoolep
Copy link
Contributor

otoolep commented Sep 22, 2015

I took a look at this, and it generally looks fine.

Can you give us a couple of lines on your thinking, about why you wanted this change?

@benbjohnson benbjohnson force-pushed the refactor-select-mapper branch from b99da42 to 56cb2fa Compare September 22, 2015 19:10
@benbjohnson benbjohnson force-pushed the refactor-select-mapper branch from 639db30 to 96715d7 Compare September 22, 2015 19:23
benbjohnson added a commit that referenced this pull request Sep 22, 2015
@benbjohnson benbjohnson merged commit 522f9ea into influxdata:master Sep 22, 2015
@benbjohnson benbjohnson deleted the refactor-select-mapper branch September 22, 2015 20:07
@benbjohnson
Copy link
Contributor Author

@otoolep Sure, the main goal for this change was to make it so that Tx.Cursor() could accept a field name instead of returning an encoded byte slice. That's because @pauldix's new engine is columnar-based and it handles encoding/decoding inside the engine. Unfortunately, breaking it into a single field was really difficult so it accepts a list of fields ([]string) instead. We'll have to wrap Paul's cursors in a parent cursor to handle multiple fields being grouped for the short term.

There were a few issues with passing fields into the cursor:

  1. Decoding had to move from tsdb down into the individual engines. This is how it should have been from the beginning so it's good that we made this change. It's ultimately the engine's responsibility to determine how to handle encoding.
  2. Cursor and mapper code was very, very tightly coupled in many spots. This caused me to split out the aggregate, raw, and remote mappers out separately. These could still use some refactoring still but they're much cleaner than they were.

@otoolep
Copy link
Contributor

otoolep commented Sep 22, 2015

Thanks @benbjohnson -- both of those points are good ones, with which I agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants