Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor extract methods on submit command #775

Merged
merged 36 commits into from
Jan 6, 2019

Conversation

jdsutherland
Copy link
Contributor

@jdsutherland jdsutherland commented Dec 8, 2018

The submit command seems more difficult to work with than it has to be.

These changes are an attempt at improvement by extracting the core components into methods.

@jdsutherland jdsutherland changed the title Extract methods of submit command Refacotr extract methods on submit command Dec 8, 2018
@jdsutherland jdsutherland changed the title Refacotr extract methods on submit command Refactor extract methods on submit command Dec 8, 2018
@jdsutherland jdsutherland force-pushed the refactor-submit branch 2 times, most recently from 6618581 to 8a65a19 Compare December 8, 2018 12:23
@jdsutherland jdsutherland force-pushed the refactor-submit branch 6 times, most recently from 0119930 to 556f8a5 Compare December 11, 2018 08:17
@jdsutherland jdsutherland force-pushed the refactor-submit branch 8 times, most recently from ff7c7d1 to f0a5755 Compare December 17, 2018 01:38
Copy link
Member

@kytrinyx kytrinyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken a first pass look at this, and I'm not quite sure what I think about the change overall. I've got a couple of minor questions, and I also went ahead and pulled out the piece that I was sure about, so that we could get it into master.

Would you mind rebasing this onto the upstream master?

cmd/submit.go Outdated
@@ -75,10 +123,10 @@ func runSubmit(cfg config.Config, flags *pflag.FlagSet, args []string) error {

%s

`
return fmt.Errorf(msg, arg)
`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reasoning behind changing the indentation on these messages?

If it's a thing we should do, I'd like to do it consistently across the entire package, whereas if it's an unintentional change, I'd like to make sure it stays the same.

Copy link
Contributor Author

@jdsutherland jdsutherland Jan 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, there are inconsistencies with formatting related to template string newlines starting with tabs vs spaces throughout the codebase. I'm guessing in many of the template strings, newlines start with a space as an intentional style of output. In my editor (using vim-go) gofmt runs on save and gofmt uses tabs for indentation so any time one of the existing template strings gets yanked and pasted, the newlines that started with a space become a tab. Making it identical requires manually replacing the tab with the space.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also use vim-go and run gofmt on save. I didn't think that gofmt changed literal strings, though, for exactly this reason.

I do know that I was intentional about the indentation that I put in, but I don't know if others made other choices.

I'd like to make template string changes in a different PR (if at all).

Copy link
Contributor Author

@jdsutherland jdsutherland Jan 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me double check that there is actually something to this that isn't dependent on my editor config. At the time, I remember that gofmt seemed to be editing the string literals but it's possible that I was confused. It could be that the act of pasting was what was replacing spaces with tabs.

IIRC, the reason that this specific line was changed is that I used a substitution command to replace string literal lines starting with tabs (with spaces) and this line matched.

I'll report back.

Edit: AFAICT, it seems like I was confused about gofmt causing this - it doesn't seem to. I notice when I yank/paste lines involving template strings and newlines starting with spaces, the pasted result replaces the spaces with tabs. I'm not sure what is causing this but it's probably my own editor config. When I use system yank/paste I don't have this problem.

I think there are probably some inconsistent areas in the codebase (a few show up as a diff in this PR) involving tabs vs spaces with template strings. If you want to make these changes as part of a separate PR, let me know if you want me to remove the tab/space changes that are part of this PR.

cmd/submit.go Outdated

exercise.Documents = make([]workspace.Document, 0, len(args))
for _, file := range args {
func (s *submitCmdContext) _documents(filepaths []string, exercise workspace.Exercise) ([]workspace.Document, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leading underscore is a Go idiom that I'm unfamiliar with. Would you mind pointing me towards some documentation that explains what it means?

Copy link
Contributor Author

@jdsutherland jdsutherland Dec 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a workaround to avoid errors due to naming clashes (there is a documents() method that is saved as a documents field). IIRC, there was previous mention of avoiding "getter" type naming for methods so using an underscore seemed like a reasonable alternative. If there is a idiomatic choice for this situation, I'm not aware of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that if there's a naming clash, there's likely a problem with the abstraction, not really with the naming convention.

cmd/cmd.go Outdated
@@ -33,3 +40,18 @@ const msgMissingMetadata = `
Please see https://exercism.io/cli-v1-to-v2 for instructions on how to fix it.

`

// validateUserConfig validates the presense of required user config values
func validateUserConfig(usrCfg *viper.Viper) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this change. I've extracted it from this PR into #786 and applied it where I can.

Copy link
Member

@kytrinyx kytrinyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think my first thought is that the runSubmit() function should read a bit more like a story and less like a summary. Right now it more or less says:

run submit makes sure the client is configured, then submits the exercise.

Whereas the story is probably more like:

run submit checks that:

  • the client is configured
  • the submitted paths all exist
  • the submitted paths are all files, not directories
  • the files all belong to the same exercise.

Then it migrates any legacy metadata, if necessary.

Once that is done, it will ensure that none of the submitted files are empty.
Then it checks that none of the files are too big.
If there isn't anything to submit at this point, it will bail.

Next, it verifies that the metadata refers to the exercise they intend to submit, and that the person submitting it is the one that is listed as the author in the metadata.

Finally, it will submit the files.

I think there's a chance that extracting methods for each of the steps in the story would help with readability, whereas having all the steps buried in a handful of methods is more likely to obscure than to clarify.

cmd/submit.go Outdated
return err
}

ctx, err := newSubmitCmdContext(cfg.UserViperConfig, flags, args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hides a lot of stuff behind it. While I think the submitCmdContext might be an idea worth exploring, I find it confusing to move all the logic into the new** function for the context.

cmd/submit.go Outdated
}

// sanitizeArgs validates args and swaps with evaluated symlink paths.
func (s *submitCmdContext) sanitizeArgs(args []string) ([]string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name sanitizeArgs is a bit confusing to me. I suspect that the function is doing two different things, both of which should be named. E.g. ensurePathsExist and ensurePathsAreFiles or something like that.

These changes attempt to make submit easier to work with by composing
methods for submit.
`submitContext` isn't doing much but it avoids namespace conflicts for methods
like `metadata`
Already have the exercise dir
This makes submit testable in isolation.
Distinguish that this is a context for the submit cmd itself.
printing really depends on the submitCmd rather than the submission
itself
Size can get large if submitting many docs
Remove submit type and move methods to submitCmdContext
@jdsutherland
Copy link
Contributor Author

jdsutherland commented Jan 2, 2019

Thanks for the feedback. Having spent some time away and now looking at the current state of this PR with fresh eyes, it seems like it's a still a rough draft. I may have a different view (or misunderstanding) about the utility of reading like a summary vs a story. Given that you have much more experience than me, this might be a good opportunity for me to improve the way I think about design. I'll try to explain where I'm coming from:

It seems that the main detail we care about here is that submit submits documents. The logic is essentially a series of validations on the args (submit paths) in the process of building up documents to be submitted (though a few other things happen: migrate metadata, submit, and print). There are a few types that are created along the way: Exercise, Metadata, and Documents. Given that we're dealing with an dependent chain of steps in the service of building the documents to be submitted, it seems appropriate to have methods that are basically containers for these steps. Unless we place high value on making each validation rule explicit, it might be more pragmatic to do the validations in the same method that the relevant type is initialized. This makes it easy to infer where each rule lives and avoids the duplication of having multiple methods that involve the same loops to do the validation. It might be useful to drill down further on these methods and make each rule explicit but I'm not sure how much value this adds. To my eye, increasing the amount of detail directly in runSubmit may be more noise than utility.

Overall, I think my first thought is that the runSubmit() function should read a bit more like a story and less like a summary.
Whereas the story is probably more like:

run submit checks that:

the client is configured
the submitted paths all exist
the submitted paths are all files, not directories
the files all belong to the same exercise.
Then it migrates any legacy metadata, if necessary.

Once that is done, it will ensure that none of the submitted files are empty.
Then it checks that none of the files are too big.
If there isn't anything to submit at this point, it will bail.

Next, it verifies that the metadata refers to the exercise they intend to submit, and that the person submitting it is the one that is listed as the author in the metadata.

Finally, it will submit the files.

I take your point and agree that it currently reads as quite blunt and not very story-like. But it seems like submit is dull and narrowly focused enough that I don't see why this is an issue. As of 3491328 run submit:

  • checks the client is configured
  • gets a list of submitted filepaths by evaluating symlinks (and performs essential fail-fast validations)
  • creates an exercise (and performs exercise related validations)
  • migrates legacy metadata (if necessary)
  • creates documents (and peforms document related validations)
  • creates metadata (and peforms metadata related validations)
  • submits the documents
  • prints the result to the console

In my view, this provides a high level description, separating the "what" from the "how". Each part is delineated and available for closer inspection if more detail is needed. This provides boundaries, makes dependencies explicit, makes it clear where each change should likely live, but most importantly makes it easy to understand what the submit command does without getting into the weeds. If a change involves one of the parts, say, metadata, it is obvious that any changes will likely be there.

I think there's a chance that extracting methods for each of the steps in the story would help with readability, whereas having all the steps buried in a handful of methods is more likely to obscure than to clarify.

I see how this seems like the steps are buried in a handful of methods but I would argue that at present we're buried in complexity and that these changes (while still rough and surely require improvement) have the benefit of providing a map. I don't see why extracting the main parts of a long method may lead to increased obscurity. Even if we decided to extract further from the handful of methods in the PR to increase the clarity, having a method that can be read without scrolling compared to one that requires scanning 200 lines to understand is usually significantly easier to work with. To me this seems like a clear win - I'm curious why you'd be hesitant here.

If extracting methods in a very specific story-like way leaves us tepid on the prospects of improving readability then it seems like the sparse methods might be a middle way between that and leaving runSubmit as is. I'll make the case for not leaving it as is (on master):

Currently cmd/submit consists of one long method where everything happens. It seems hard to work with in the way that long methods are usually hard to work with. It's not easy to tell what submit does quickly without expending effort. There's no high level description of what submit is actually doing and the programmer is burdened with every detail while trying to distinguish where the boundaries are between the different conceptual pieces.

Imagine we're a new contributer trying to implement 717 submit enormous files. First we need to find out what submit is actually submitting and this isn't obvious and requires some sleuthing. The experience might be something like:

  1. What exactly are we submitting? Whatever the answer, it must surely a involve a request so let's check that.
  2. We find where the request occurs and see that body is submitted. What is body?
  3. After some reading I discern that body gets built from exercise.Documents.
  4. It seems safe to assume that our file check needs to occur near where exercise.Documents is built. Where does that happen?
  5. We search above and find the loop involving exercise.Documents and are able to determine where the change should occur, though we're still a bit unsure because the method is so large it's not obvious how far the scope of the variable reaches so we read more of the code than is really necessary.

In my mind, this seems more difficult than it really has to be. The steps above might reduced to:

  1. I see that a submit method depends on metadata and documents.
  2. I see there is a documents method, so the change probably goes here.

Perhaps I hid too much. Putting everything in newSubmitCmdContext looks much stranger now than when I first wrote it. I made things more explicit in 3491328 - maybe this helps?

And sorry if this is longwinded. If you're busy and this ultimately seems not worth pursuing, I understand if we want to just drop it.

@kytrinyx
Copy link
Member

kytrinyx commented Jan 4, 2019

Overall I think our goals are the same, and largely I think we agree on how to get there. This is where I think the biggest difference is:

In my view, this provides a high level description, separating the "what" from the "how". Each part is delineated and available for closer inspection if more detail is needed. This provides boundaries, makes dependencies explicit, makes it clear where each change should likely live, but most importantly makes it easy to understand what the submit command does without getting into the weeds. If a change involves one of the parts, say, metadata, it is obvious that any changes will likely be there.

My problem with the abstractions that are in place in this PR is that it's not possible to know what validations even exist by looking at the run method, and there's no obvious place to look for them. They're all hidden, and they're all in different places.

When I refactor I tend to tackle it by making the smallest number of abstractions first, and then see what bugs me the most. I do that in several layers. I don't think that my top-level story the way it's written is how I'd end up, but it's probably the first step that I'd take so that I could see what next step might make sense.

@jdsutherland jdsutherland force-pushed the refactor-submit branch 5 times, most recently from ddc8a05 to 6ee01fe Compare January 5, 2019 05:17
@jdsutherland
Copy link
Contributor Author

jdsutherland commented Jan 5, 2019

I moved all the validations to a new validator type. Having an explicit type seems useful as we can easily see all the rules and any future rules will have a clear home. Before, I wasn't quite convinced that validations were obscured but these changes make it much more apparent.

@jdsutherland jdsutherland force-pushed the refactor-submit branch 6 times, most recently from af2647c to b1e0140 Compare January 5, 2019 07:11
This separates validations from the getter-like methods for a better
separations of concerns. Each rule is explicitly defined under
submitValidator.
exercise() uses ws.ExerciseDir() to find the root dir of an exercise.
Since ws.ExerciseDir requires a single path, this tacitly assumes that
the submitted paths are all part of the same exercise. There is
a validation for this called `filesBelongToSameExercise`. This isn't an
issue from the point of view of `runSubmit` but it might be an issue if
`exercise()` gets moved around in the future as this isn't explicitly
stated. These changes attempt to make it more clear by stating it in the
godoc.
@kytrinyx
Copy link
Member

kytrinyx commented Jan 6, 2019

Having an explicit type seems useful as we can easily see all the rules and any future rules will have a clear home.

I think this is a really good decision! I'll go through the changes in detail today. Thank you.

Copy link
Member

@kytrinyx kytrinyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like where this went. At first glance I wanted to suggest that we don't print a warning down in the guts of the logic, but rather store off the files that got skipped in the empty context, but it's super minor and I'm not even sure I feel that strongly about it.

I'd like to pull this in and live with it for a bit, and then we can have another go at iterating if we feel the need to.

Thanks so much for tackling this, @jdsutherland, it's a massive improvement.

@kytrinyx kytrinyx merged commit a86f829 into exercism:master Jan 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants