Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add comments #18

Merged
merged 1 commit into from
Jul 22, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,7 @@ private PixelsCacheIdx search(ByteBuffer keyBuffer)
int bytesMatchedInNodeFound = 0;

// get root
// TODO: does root node have an edge?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. root node can also have an edge.
The first entry in the tree is stored in the root node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. root node can also have an edge.
The first entry in the tree is stored in the root node.

Where is the root edge processed?

int currentNodeHeader = indexFile.getInt(currentNodeOffset);
dramAccessCounter++;
int currentNodeChildrenNum = currentNodeHeader & 0x000001FF;
Expand Down Expand Up @@ -179,10 +180,13 @@ private PixelsCacheIdx search(ByteBuffer keyBuffer)
dramAccessCounter++;
currentNodeChildrenNum = currentNodeHeader & 0x000001FF;
currentNodeEdgeSize = (currentNodeHeader & 0x7FFFFE00) >>> 9;
// TODO: does max length of edge = 12? can we move currentNodeEdge allocation out before this loop?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The max length of edge is far larger than 12. It takes 22 bits for the size of edge in the node.

Node Header: isKey(1 bit) + edgeSize(22 bits) + childrenSize(9 bits).

I think we can move currentNodeEdge out of the loop and make it a buffer only allocated once, if we make it large enough.

Copy link
Contributor Author

@bianhq bianhq Jul 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The max length of edge is far larger than 12. It takes 22 bits for the size of edge in the node.

Node Header: isKey(1 bit) + edgeSize(22 bits) + childrenSize(9 bits).

I think we can move currentNodeEdge out of the loop and make it a buffer only allocated once, if we make it large enough.

What is an edge? is it the key part to match on a node? 22bit -> 4MB edge size, it is so large.
Currently, we have search key = 8 byte block id + 2 byte row group id + 2 byte column id.

byte[] currentNodeEdge = new byte[currentNodeEdgeSize];
// TODO: can we get header, edge and children of a node in one memory access?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, no, we can't do this.
Since the size of edge and children are all stored in header.
We cannot access edge and children without accessing the header first.

Copy link
Contributor Author

@bianhq bianhq Jul 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, no, we can't do this.
Since the size of edge and children are all stored in header.
We cannot access edge and children without accessing the header first.

I see. but can we merge the children and edge into one memory access?

indexFile.getBytes(currentNodeOffset + 4 + currentNodeChildrenNum * 8,
currentNodeEdge, 0, currentNodeEdgeSize);
dramAccessCounter++;
// TODO: numEdgeBytes seems redundant.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. It can be replaced by currentNodeEdgeSize.

for (int i = 0, numEdgeBytes = currentNodeEdgeSize; i < numEdgeBytes && bytesMatched < keyLen; i++)
{
if (currentNodeEdge[i] != keyBuffer.get(bytesMatched))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -47,30 +47,32 @@ public void putIfAbsent(long blockId, short rowGroupId, short columnId, PixelsCa
* |root| -> |te|
* -> |am|
* -> |st|
* <p>
* 1. If we put <test, new_value> into the original tree, it's a EXACT_MATCH with the |st| node.
* Then we add the new_value into the node or replace the node.
* <p>
* </p><p>
* 2. If we put <tea, new_value> into the original tree, it's a KEY_ENDS_AT_MID_EDGE with the |am| node.
* Then we split the |am| node into two.
* |root| -> |te|
* -> |a|
* -> |m|
* -> |st|
* <p>
* </p><p>
* 3. If we put <teak, new_value> into the original tree, it's a MATCH_END_AT_MID_EDGE with the |am| node.
* Then we split the |am| node into three.
* |root| -> |te|
* -> |a|
* -> |m|
* -> |k|
* -> |m|
* -> |k|
* -> |st|
* <p>
* </p><p>
* 4. If we put <teamster, new_value> into the original tree, it's a MATCH_END_AT_END_EDGE with the |am| node.
* Then we add a new node containing the trailing bytes from the key, and append it to the |am| node.
* |root| -> |te|
* -> |am|
* -> |ster|
* -> |ster|
* -> |st|
* </p>
*/
private void putInternal(long blockId, short rowGroupId, short columnId, PixelsCacheIdx cacheIdx, boolean overwrite)
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -354,7 +354,7 @@ private List<Layout> getLayouts(SchemaTableName st)
}

/**
*
* TODO: reload input paths so that LOCACTION in a hive table can be empty or any path.
* @param job
*/
protected void init(JobConf job) throws IOException
Expand Down