Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ctc edit distance operator #5300

Merged
merged 16 commits into from
Jan 10, 2018
Merged

Conversation

kuke
Copy link
Contributor

@kuke kuke commented Nov 2, 2017

Resolve #4744

Copy link
Contributor

@Xreki Xreki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • About the op's name, I'd like to name it to edit_distance_op, because it is not limited to ctc.
  • Need to support batch_size, the inputs should be LoDTensor.
  • Need to call path2String to remove the uncared symbols link blank.

"hypothesis string");
AddInput("X2",
"(2-D tensor with shape [N x 1]) The indices "
"for reference string.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • As I know, the inputs of this evaluator are predicted result and ground truth, please use more meaningful names.
  • What is the meaning of M and N? Please give more explanation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"(bool, default false) Indicated whether "
"normalize the Output(Out) by the length of reference "
"string (X2).")
.SetDefault(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ctc, the blank should be removed from the predicted result. In attention, the eos and sos should be removed. Maybe here, we need an attribute named uncare of type std::vector<int>, to set the elements that need to be removed.
It is not implemented in CTCErrorEvaluator.cpp, but is a real need from ocr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that the operator needs to accept some uncared input tokens. But it would make the code dirty allowing for the CUDA kernel. How about implementing an independent op to remove the uncared tokens?

.SetDefault(false);
AddOutput("Out",
"(2-D tensor with shape [1 x 1]) "
"The output distance of CTCEditDistance operator.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

batch_size should be supported in this op, the output's shape should be [batch_size, 1].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

} else {
framework::Tensor dist_t;
dist_t.Resize({m + 1, n + 1});
dist_t.mutable_data<T>(ctx.GetPlace());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessary because filling the first row and column of this matrix is enough.


auto m = x1_t->numel();
auto n = x2_t->numel();
T distance = 0.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old codes count the substitution, deletion, insertion error, why not implement it in the op?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually these counts seem useless.

void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X1"), "Input(X1) shouldn't be null.");
PADDLE_ENFORCE(ctx->HasInput("X2"), "Input(X2) shouldn't be null.");
PADDLE_ENFORCE(ctx->HasOutput("Out"), "Output(Out) shouldn't be null.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check the shape of inputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@kuke kuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update by following most the comments. Thanks!

void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X1"), "Input(X1) shouldn't be null.");
PADDLE_ENFORCE(ctx->HasInput("X2"), "Input(X2) shouldn't be null.");
PADDLE_ENFORCE(ctx->HasOutput("Out"), "Output(Out) shouldn't be null.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"hypothesis string");
AddInput("X2",
"(2-D tensor with shape [N x 1]) The indices "
"for reference string.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"(bool, default false) Indicated whether "
"normalize the Output(Out) by the length of reference "
"string (X2).")
.SetDefault(false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that the operator needs to accept some uncared input tokens. But it would make the code dirty allowing for the CUDA kernel. How about implementing an independent op to remove the uncared tokens?

.SetDefault(false);
AddOutput("Out",
"(2-D tensor with shape [1 x 1]) "
"The output distance of CTCEditDistance operator.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


auto m = x1_t->numel();
auto n = x2_t->numel();
T distance = 0.0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually these counts seem useless.

} else {
framework::Tensor dist_t;
dist_t.Resize({m + 1, n + 1});
dist_t.mutable_data<T>(ctx.GetPlace());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessary because filling the first row and column of this matrix is enough.

"The indices for hypothesis strings.");
AddInput("Refs",
"(2-D LoDTensor, 2nd dim. equal to 1) "
"The indices for reference strings.");
Copy link
Contributor

@qingqing01 qingqing01 Jan 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should tell the users the type of Hyps and Refs is int.

2-D  LoDTensor <int> 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

n);
distance[num] = distance[num] / n;
}
out[num] = distance[num];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems, distance can be variable in T type, not a std::vector<T>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
SetOutput<T><<<1, 1, 0, stream>>>(out + num, dist, m, n, normalized);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GPU implementation may be less efficient, it may be slower than CPU implementation. The for loop in line 97 also can be paralleled. But you can not change it in this PR. We can optimize it in the future when necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. There should be a lot efficiency improvement for the batch input

Copy link
Contributor Author

@kuke kuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Thanks

"The indices for hypothesis strings.");
AddInput("Refs",
"(2-D LoDTensor, 2nd dim. equal to 1) "
"The indices for reference strings.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
SetOutput<T><<<1, 1, 0, stream>>>(out + num, dist, m, n, normalized);
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. There should be a lot efficiency improvement for the batch input

n);
distance[num] = distance[num] / n;
}
out[num] = distance[num];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@kuke kuke merged commit 861b84f into PaddlePaddle:develop Jan 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants