Refactor extract methods on submit command #775

jdsutherland · 2018-12-08T10:00:18Z

The submit command seems more difficult to work with than it has to be.

These changes are an attempt at improvement by extracting the core components into methods.

kytrinyx

I've taken a first pass look at this, and I'm not quite sure what I think about the change overall. I've got a couple of minor questions, and I also went ahead and pulled out the piece that I was sure about, so that we could get it into master.

Would you mind rebasing this onto the upstream master?

kytrinyx · 2018-12-26T17:35:04Z

cmd/submit.go

@@ -75,10 +123,10 @@ func runSubmit(cfg config.Config, flags *pflag.FlagSet, args []string) error {

        %s

-		`
-				return fmt.Errorf(msg, arg)
+        `


What is the reasoning behind changing the indentation on these messages?

If it's a thing we should do, I'd like to do it consistently across the entire package, whereas if it's an unintentional change, I'd like to make sure it stays the same.

IIRC, there are inconsistencies with formatting related to template string newlines starting with tabs vs spaces throughout the codebase. I'm guessing in many of the template strings, newlines start with a space as an intentional style of output. In my editor (using vim-go) gofmt runs on save and gofmt uses tabs for indentation so any time one of the existing template strings gets yanked and pasted, the newlines that started with a space become a tab. Making it identical requires manually replacing the tab with the space.

I also use vim-go and run gofmt on save. I didn't think that gofmt changed literal strings, though, for exactly this reason.

I do know that I was intentional about the indentation that I put in, but I don't know if others made other choices.

I'd like to make template string changes in a different PR (if at all).

Let me double check that there is actually something to this that isn't dependent on my editor config. At the time, I remember that gofmt seemed to be editing the string literals but it's possible that I was confused. It could be that the act of pasting was what was replacing spaces with tabs.

IIRC, the reason that this specific line was changed is that I used a substitution command to replace string literal lines starting with tabs (with spaces) and this line matched.

I'll report back.

Edit: AFAICT, it seems like I was confused about gofmt causing this - it doesn't seem to. I notice when I yank/paste lines involving template strings and newlines starting with spaces, the pasted result replaces the spaces with tabs. I'm not sure what is causing this but it's probably my own editor config. When I use system yank/paste I don't have this problem.

I think there are probably some inconsistent areas in the codebase (a few show up as a diff in this PR) involving tabs vs spaces with template strings. If you want to make these changes as part of a separate PR, let me know if you want me to remove the tab/space changes that are part of this PR.

kytrinyx · 2018-12-26T17:35:12Z

cmd/submit.go


-	exercise.Documents = make([]workspace.Document, 0, len(args))
-	for _, file := range args {
+func (s *submitCmdContext) _documents(filepaths []string, exercise workspace.Exercise) ([]workspace.Document, error) {


The leading underscore is a Go idiom that I'm unfamiliar with. Would you mind pointing me towards some documentation that explains what it means?

It's just a workaround to avoid errors due to naming clashes (there is a documents() method that is saved as a documents field). IIRC, there was previous mention of avoiding "getter" type naming for methods so using an underscore seemed like a reasonable alternative. If there is a idiomatic choice for this situation, I'm not aware of it.

I suspect that if there's a naming clash, there's likely a problem with the abstraction, not really with the naming convention.

kytrinyx · 2018-12-26T21:22:03Z

cmd/cmd.go

@@ -33,3 +40,18 @@ const msgMissingMetadata = `
    Please see https://exercism.io/cli-v1-to-v2 for instructions on how to fix it.

 `
+
+// validateUserConfig validates the presense of required user config values
+func validateUserConfig(usrCfg *viper.Viper) error {


I really like this change. I've extracted it from this PR into #786 and applied it where I can.

kytrinyx

Overall, I think my first thought is that the runSubmit() function should read a bit more like a story and less like a summary. Right now it more or less says:

run submit makes sure the client is configured, then submits the exercise.

Whereas the story is probably more like:

run submit checks that:

the client is configured

the submitted paths all exist

the submitted paths are all files, not directories

the files all belong to the same exercise.

Then it migrates any legacy metadata, if necessary.

Once that is done, it will ensure that none of the submitted files are empty.
Then it checks that none of the files are too big.
If there isn't anything to submit at this point, it will bail.

Next, it verifies that the metadata refers to the exercise they intend to submit, and that the person submitting it is the one that is listed as the author in the metadata.

Finally, it will submit the files.

I think there's a chance that extracting methods for each of the steps in the story would help with readability, whereas having all the steps buried in a handful of methods is more likely to obscure than to clarify.

kytrinyx · 2018-12-26T21:34:07Z

cmd/submit.go

+		return err
+	}
+
+	ctx, err := newSubmitCmdContext(cfg.UserViperConfig, flags, args)


This hides a lot of stuff behind it. While I think the submitCmdContext might be an idea worth exploring, I find it confusing to move all the logic into the new** function for the context.

kytrinyx · 2018-12-26T21:36:32Z

cmd/submit.go

+}
+
+// sanitizeArgs validates args and swaps with evaluated symlink paths.
+func (s *submitCmdContext) sanitizeArgs(args []string) ([]string, error) {


The name sanitizeArgs is a bit confusing to me. I suspect that the function is doing two different things, both of which should be named. E.g. ensurePathsExist and ensurePathsAreFiles or something like that.

These changes attempt to make submit easier to work with by composing methods for submit.

`submitContext` isn't doing much but it avoids namespace conflicts for methods like `metadata`

Already have the exercise dir

This makes submit testable in isolation.

Distinguish that this is a context for the submit cmd itself.

printing really depends on the submitCmd rather than the submission itself

Size can get large if submitting many docs

Remove submit type and move methods to submitCmdContext

jdsutherland · 2019-01-02T12:59:52Z

Thanks for the feedback. Having spent some time away and now looking at the current state of this PR with fresh eyes, it seems like it's a still a rough draft. I may have a different view (or misunderstanding) about the utility of reading like a summary vs a story. Given that you have much more experience than me, this might be a good opportunity for me to improve the way I think about design. I'll try to explain where I'm coming from:

It seems that the main detail we care about here is that submit submits documents. The logic is essentially a series of validations on the args (submit paths) in the process of building up documents to be submitted (though a few other things happen: migrate metadata, submit, and print). There are a few types that are created along the way: Exercise, Metadata, and Documents. Given that we're dealing with an dependent chain of steps in the service of building the documents to be submitted, it seems appropriate to have methods that are basically containers for these steps. Unless we place high value on making each validation rule explicit, it might be more pragmatic to do the validations in the same method that the relevant type is initialized. This makes it easy to infer where each rule lives and avoids the duplication of having multiple methods that involve the same loops to do the validation. It might be useful to drill down further on these methods and make each rule explicit but I'm not sure how much value this adds. To my eye, increasing the amount of detail directly in runSubmit may be more noise than utility.

Overall, I think my first thought is that the runSubmit() function should read a bit more like a story and less like a summary.
Whereas the story is probably more like:

run submit checks that:

the client is configured
the submitted paths all exist
the submitted paths are all files, not directories
the files all belong to the same exercise.
Then it migrates any legacy metadata, if necessary.

Once that is done, it will ensure that none of the submitted files are empty.
Then it checks that none of the files are too big.
If there isn't anything to submit at this point, it will bail.

Next, it verifies that the metadata refers to the exercise they intend to submit, and that the person submitting it is the one that is listed as the author in the metadata.

Finally, it will submit the files.

I take your point and agree that it currently reads as quite blunt and not very story-like. But it seems like submit is dull and narrowly focused enough that I don't see why this is an issue. As of 3491328 run submit:

checks the client is configured
gets a list of submitted filepaths by evaluating symlinks (and performs essential fail-fast validations)
creates an exercise (and performs exercise related validations)
migrates legacy metadata (if necessary)
creates documents (and peforms document related validations)
creates metadata (and peforms metadata related validations)
submits the documents
prints the result to the console

In my view, this provides a high level description, separating the "what" from the "how". Each part is delineated and available for closer inspection if more detail is needed. This provides boundaries, makes dependencies explicit, makes it clear where each change should likely live, but most importantly makes it easy to understand what the submit command does without getting into the weeds. If a change involves one of the parts, say, metadata, it is obvious that any changes will likely be there.

I think there's a chance that extracting methods for each of the steps in the story would help with readability, whereas having all the steps buried in a handful of methods is more likely to obscure than to clarify.

I see how this seems like the steps are buried in a handful of methods but I would argue that at present we're buried in complexity and that these changes (while still rough and surely require improvement) have the benefit of providing a map. I don't see why extracting the main parts of a long method may lead to increased obscurity. Even if we decided to extract further from the handful of methods in the PR to increase the clarity, having a method that can be read without scrolling compared to one that requires scanning 200 lines to understand is usually significantly easier to work with. To me this seems like a clear win - I'm curious why you'd be hesitant here.

If extracting methods in a very specific story-like way leaves us tepid on the prospects of improving readability then it seems like the sparse methods might be a middle way between that and leaving runSubmit as is. I'll make the case for not leaving it as is (on master):

Currently cmd/submit consists of one long method where everything happens. It seems hard to work with in the way that long methods are usually hard to work with. It's not easy to tell what submit does quickly without expending effort. There's no high level description of what submit is actually doing and the programmer is burdened with every detail while trying to distinguish where the boundaries are between the different conceptual pieces.

Imagine we're a new contributer trying to implement 717 submit enormous files. First we need to find out what submit is actually submitting and this isn't obvious and requires some sleuthing. The experience might be something like:

What exactly are we submitting? Whatever the answer, it must surely a involve a request so let's check that.
We find where the request occurs and see that body is submitted. What is body?
After some reading I discern that body gets built from exercise.Documents.
It seems safe to assume that our file check needs to occur near where exercise.Documents is built. Where does that happen?
We search above and find the loop involving exercise.Documents and are able to determine where the change should occur, though we're still a bit unsure because the method is so large it's not obvious how far the scope of the variable reaches so we read more of the code than is really necessary.

In my mind, this seems more difficult than it really has to be. The steps above might reduced to:

I see that a submit method depends on metadata and documents.
I see there is a documents method, so the change probably goes here.

Perhaps I hid too much. Putting everything in newSubmitCmdContext looks much stranger now than when I first wrote it. I made things more explicit in 3491328 - maybe this helps?

And sorry if this is longwinded. If you're busy and this ultimately seems not worth pursuing, I understand if we want to just drop it.

kytrinyx · 2019-01-04T15:47:09Z

Overall I think our goals are the same, and largely I think we agree on how to get there. This is where I think the biggest difference is:

In my view, this provides a high level description, separating the "what" from the "how". Each part is delineated and available for closer inspection if more detail is needed. This provides boundaries, makes dependencies explicit, makes it clear where each change should likely live, but most importantly makes it easy to understand what the submit command does without getting into the weeds. If a change involves one of the parts, say, metadata, it is obvious that any changes will likely be there.

My problem with the abstractions that are in place in this PR is that it's not possible to know what validations even exist by looking at the run method, and there's no obvious place to look for them. They're all hidden, and they're all in different places.

When I refactor I tend to tackle it by making the smallest number of abstractions first, and then see what bugs me the most. I do that in several layers. I don't think that my top-level story the way it's written is how I'd end up, but it's probably the first step that I'd take so that I could see what next step might make sense.

jdsutherland · 2019-01-05T05:23:46Z

I moved all the validations to a new validator type. Having an explicit type seems useful as we can easily see all the rules and any future rules will have a clear home. Before, I wasn't quite convinced that validations were obscured but these changes make it much more apparent.

This separates validations from the getter-like methods for a better separations of concerns. Each rule is explicitly defined under submitValidator.

exercise() uses ws.ExerciseDir() to find the root dir of an exercise. Since ws.ExerciseDir requires a single path, this tacitly assumes that the submitted paths are all part of the same exercise. There is a validation for this called `filesBelongToSameExercise`. This isn't an issue from the point of view of `runSubmit` but it might be an issue if `exercise()` gets moved around in the future as this isn't explicitly stated. These changes attempt to make it more clear by stating it in the godoc.

kytrinyx · 2019-01-06T15:33:03Z

Having an explicit type seems useful as we can easily see all the rules and any future rules will have a clear home.

I think this is a really good decision! I'll go through the changes in detail today. Thank you.

kytrinyx

I really like where this went. At first glance I wanted to suggest that we don't print a warning down in the guts of the logic, but rather store off the files that got skipped in the empty context, but it's super minor and I'm not even sure I feel that strongly about it.

I'd like to pull this in and live with it for a bit, and then we can have another go at iterating if we feel the need to.

Thanks so much for tackling this, @jdsutherland, it's a massive improvement.

jdsutherland changed the title ~~Extract methods of submit command~~ Refacotr extract methods on submit command Dec 8, 2018

jdsutherland changed the title ~~Refacotr extract methods on submit command~~ Refactor extract methods on submit command Dec 8, 2018

jdsutherland force-pushed the refactor-submit branch 2 times, most recently from 6618581 to 8a65a19 Compare December 8, 2018 12:23

nywilken added cmd/submit labels Dec 9, 2018

jdsutherland force-pushed the refactor-submit branch 6 times, most recently from 0119930 to 556f8a5 Compare December 11, 2018 08:17

jdsutherland mentioned this pull request Dec 11, 2018

Refactor download cmd - compose & extract shared behavior for new doctor cmd #756

Closed

jdsutherland force-pushed the refactor-submit branch 8 times, most recently from ff7c7d1 to f0a5755 Compare December 17, 2018 01:38

kytrinyx mentioned this pull request Dec 26, 2018

Extract user config validation helper #786

Merged

kytrinyx requested changes Dec 26, 2018

View reviewed changes

jdsutherland added 6 commits January 2, 2019 17:18

Extract methods of submit cmd

61d0998

These changes attempt to make submit easier to work with by composing methods for submit.

Refactor functions to methods

09a3349

`submitContext` isn't doing much but it avoids namespace conflicts for methods like `metadata`

Replace param with query

78add72

Prefer field params rather than entire object

0051d74

Simplify documents param

e6f67ef

Already have the exercise dir

Add error checking to critical params

9e94e35

jdsutherland added 7 commits January 2, 2019 20:39

Make dependencies explicit

f021f71

Move methods to submit type

65848da

This makes submit testable in isolation.

Rename submitContext => submitCmdContext

d0afa9f

Distinguish that this is a context for the submit cmd itself.

Move printResult to submitCmdContext

9dd0499

printing really depends on the submitCmd rather than the submission itself

Submission uses pointer receivers

d296636

Size can get large if submitting many docs

godoc

23afc48

Make runSubmit more explicit

3491328

Remove submit type and move methods to submitCmdContext

jdsutherland force-pushed the refactor-submit branch from c815a2e to 3491328 Compare January 2, 2019 12:41

jdsutherland force-pushed the refactor-submit branch 5 times, most recently from ddc8a05 to 6ee01fe Compare January 5, 2019 05:17

jdsutherland force-pushed the refactor-submit branch 6 times, most recently from af2647c to b1e0140 Compare January 5, 2019 07:11

Extract validations to new submitValidator type

3023109

This separates validations from the getter-like methods for a better separations of concerns. Each rule is explicitly defined under submitValidator.

jdsutherland force-pushed the refactor-submit branch from b1e0140 to 3023109 Compare January 5, 2019 07:14

jdsutherland force-pushed the refactor-submit branch from eebe7df to e98d457 Compare January 6, 2019 12:06

godoc

e98d457

kytrinyx approved these changes Jan 6, 2019

View reviewed changes

kytrinyx merged commit a86f829 into exercism:master Jan 6, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor extract methods on submit command #775

Refactor extract methods on submit command #775

jdsutherland commented Dec 8, 2018 •

edited

Loading

kytrinyx left a comment

kytrinyx Dec 26, 2018

jdsutherland Jan 2, 2019 •

edited

Loading

kytrinyx Jan 4, 2019

jdsutherland Jan 5, 2019 •

edited

Loading

kytrinyx Dec 26, 2018

jdsutherland Dec 31, 2018 •

edited

Loading

kytrinyx Jan 1, 2019

kytrinyx Dec 26, 2018

kytrinyx left a comment

kytrinyx Dec 26, 2018

kytrinyx Dec 26, 2018

jdsutherland commented Jan 2, 2019 •

edited

Loading

kytrinyx commented Jan 4, 2019

jdsutherland commented Jan 5, 2019 •

edited

Loading

kytrinyx commented Jan 6, 2019

kytrinyx left a comment

Refactor extract methods on submit command #775

Refactor extract methods on submit command #775

Conversation

jdsutherland commented Dec 8, 2018 • edited Loading

kytrinyx left a comment

Choose a reason for hiding this comment

kytrinyx Dec 26, 2018

Choose a reason for hiding this comment

jdsutherland Jan 2, 2019 • edited Loading

Choose a reason for hiding this comment

kytrinyx Jan 4, 2019

Choose a reason for hiding this comment

jdsutherland Jan 5, 2019 • edited Loading

Choose a reason for hiding this comment

kytrinyx Dec 26, 2018

Choose a reason for hiding this comment

jdsutherland Dec 31, 2018 • edited Loading

Choose a reason for hiding this comment

kytrinyx Jan 1, 2019

Choose a reason for hiding this comment

kytrinyx Dec 26, 2018

Choose a reason for hiding this comment

kytrinyx left a comment

Choose a reason for hiding this comment

kytrinyx Dec 26, 2018

Choose a reason for hiding this comment

kytrinyx Dec 26, 2018

Choose a reason for hiding this comment

jdsutherland commented Jan 2, 2019 • edited Loading

kytrinyx commented Jan 4, 2019

jdsutherland commented Jan 5, 2019 • edited Loading

kytrinyx commented Jan 6, 2019

kytrinyx left a comment

Choose a reason for hiding this comment

jdsutherland commented Dec 8, 2018 •

edited

Loading

jdsutherland Jan 2, 2019 •

edited

Loading

jdsutherland Jan 5, 2019 •

edited

Loading

jdsutherland Dec 31, 2018 •

edited

Loading

jdsutherland commented Jan 2, 2019 •

edited

Loading

jdsutherland commented Jan 5, 2019 •

edited

Loading