-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Major refactoring of package structure #189
Conversation
A bit more explanation of the new revised package structure: For all scripts, the two most important imports are:
Think of the first as "base" AUT, and then all the UDFs in matchbox.
|
Okay here is the revised test script I have used so far. It works so far with example, but I have some other warcs I can use locally as well.
and the initial results (which seem in kind with previous tests conducted at #121):
|
Same code, different warcs (a collection of files this time):
|
Okay after contending with some Java Heap Space Errors, I got a really good sized set of warcs to run fine. From my perspective, this is ready to merge.
|
As I work on the docs - when does nevermind - just when it was called previously? |
Should I fold it so it gets auto-imported in This is a tradeoff between convenience/magic and "knowing exactly what you're doing". If there's too much magic, it become difficult to understand certain behaviors... |
My vote would be to fold it in so it gets auto-imported in
and not have to worry about explaining when to use x and/or y, etc. |
I have this branch of AUT hooked up to AUK, btw, so will run some tests overnight. So far on smaller collections all looking good. |
I think StringUtils is a good utility for most aut jobs but JsonUtils would be better to keep out unless someone specifically wants Json-y output. Same for ImageUtils if that idea becomes part of the refactoring (an alternative might be to do Image-related work [not image urls] as an aut plugin). |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks fine to me. Just a couple doc comment, and license header issues to sort out.
Once @ianmilligan1 and @greebie are satisfied with their testing, and I get some time to do a smoke test, I'll merge this.
After that, I'll update https://github.com/archivesunleashed/aut/tree/issue-184 and @helgeho will need to update #186.
Sound good?
import scala.xml.Utility._ | ||
|
||
/** | ||
* Created by jimmylin on 4/4/18. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we can an actual description here? Created by isn't needed since it'll show up in the commit history as authored by you.
@@ -1,34 +1,31 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you leave the license headers in?
|
||
import scala.reflect.ClassTag | ||
import scala.util.matching.Regex | ||
|
||
/** | ||
* RDD wrappers for working with Records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc comment?
Addressed comments. |
Hrm... something is very screwy. All I did was add documentation. How can it possibly have affected test coverage? |
It's total lines. I haven't fine-tuned the codecov configuration. If it's around 0.5% I usually ignore it. |
Looking good! I'm running some trials overnight using AUK, and should hopefully be able to give my thumbs up tomorrow morning. 👍 |
Some failures on
Have you tested that function @greebie? This is while running some variation of
|
should work. |
Oh, I see - ok, let me try again. 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been running this continuously using AUK (and some spot checks with spark shell) and it is all working now. It's good to merge as far as I'm concerned.
Just thought I'd add a check to confirm I'm happy with the current PR. |
Smoke test is good. Merging now. |
This is a major refactoring of package structure to simplify and rationalize the class and import structure.
GitHub issue(s):
If you are responding to an issue, please mention their numbers below.
What does this Pull Request do?
This big patch simplifies the package import structure, such that a script now looks like this:
Note that the import statements make sense now: we import AUT, and also the UDFs in matchbox.
How should this be tested?
Our tests should be a start, but we won't know issues until we deploy w/ AUK.