Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update S001156.yaml #1402

Merged
merged 1 commit into from
Jan 16, 2015
Merged

Update S001156.yaml #1402

merged 1 commit into from
Jan 16, 2015

Conversation

crdunwel
Copy link
Contributor

Fix in same vein as one raised in this issue: #1397

Fix in same vein as one raised in this issue: unitedstates#1397
j-ro added a commit that referenced this pull request Jan 16, 2015
@j-ro j-ro merged commit 452e3ae into unitedstates:master Jan 16, 2015
@j-ro
Copy link
Contributor

j-ro commented Jan 16, 2015

So, here's a problem -- this breaks my implementation.

@j-ro
Copy link
Contributor

j-ro commented Jan 16, 2015

Here's the error message, btw:

"status": "error",
"error": "Unable to find css \"#sbutton\"\n[\"/var/www/apps/congress-forms/app/models/congress_member.rb:51:in `rescue in fill_out_form'\", \"/var/www/apps/congress-forms/app/models/congress_member.rb:39:in `fill_out_form'\", \"/var/www/apps/congress-forms/app/helpers/form_fill_handler.rb:15:in `block in create_thread'\"]",
"run_at": "2015-01-16T20:42:35+00:00",
"screenshot": "https://can2-dev.s3.amazonaws.com/development/screenshots/050116111a393e5935c35e1088.png"

So, with javascript we don't see that extra submit button. Maybe it only shows when submitting without javascript? Unsure.

@j-ro
Copy link
Contributor

j-ro commented Jan 16, 2015

Thinking more about this, I'm not sure what the solution is if our systems need two different yamls to function. Selfishly, I'd probably propose going to javascript versions as the standard, given that I think this is the most common use case (us, EFF, NGP, etc...). And also given that not using javascript is probably untenable long term -- I've hit forms at the state level that use javascript-based form elements, so no javascript isn't even an option.

But, thoughts on how to handle this? I'm going to revert these commits as we discuss.

j-ro added a commit that referenced this pull request Jan 16, 2015
This reverts commit 452e3ae, reversing
changes made to c68215b.
@Hainish
Copy link
Contributor

Hainish commented Jan 16, 2015

I'd agree - I think as time goes on it will increasingly be the case that forms cannot be completed without javascript. I'd suggest composing the yaml files under the assumption that javascript is enabled. I currently think that is the case for many files in the repo - our volunteers were testing with phantomjs, js enabled, with the congress-forms implementation.

@crdunwel
Copy link
Contributor Author

Yeah, the javascript redirects you before finding the element. I thought that this might be an issue and I'm actually quite surprised that the issue of javascript was never addressed in the original drafting of the schema. Without revisions to the schema to allow for no-javascript steps then this might be difficult to reconcile. Alternatively, we could construct fallback no javascript files of, for instance, the form "bioguide_no-js.yaml" for those which may require different steps. Let me know if either of these are a possibility. It wouldn't be the worst thing in the world to create our own fork but would prefer not to if possible.

@j-ro
Copy link
Contributor

j-ro commented Jan 16, 2015

I'd prefer the no-js yaml versions -- keeps everyone's yamls clear of extraneous code. It's not quite DRY, but I think for our purposes it might be best.

@crdunwel
Copy link
Contributor Author

@j-ro it would be the fastest solution but as you say, not quite DRY. I think the optimum solution would be to alter the schema to allow for no-javascript step clauses. This would be flexible enough to allow for minor granular differences in submission steps like the one from this pull request or complete rewrites if we happen to encounter that case.

We might also consider adding some optional meta-tags at the top such as a javascript-required and captcha-required boolean.

If this isn't a possibility then a no-js fallback version wouldn't be the worst plan.

@j-ro
Copy link
Contributor

j-ro commented Jan 16, 2015

I think the issue with altering the scheme is that it means everyone must update their apps to work with the new schema before it works. With the no-js version, we can just ignore these files and you can update your app to work with them, which you'd be doing anyway if we went the other way, just with us having to do the same.

@crdunwel
Copy link
Contributor Author

Understood - that may cause a lot of complications for others. Alright, if we can include no-js fallback files then that'll work fine for now.

@j-ro
Copy link
Contributor

j-ro commented Jan 16, 2015

Cool -- and, you know, I'd encourage you to perhaps start using congress-forms too! As @Hainish and I mentioned, I think the no js thing is going to quickly become untenable, given how js is the runtime for the web these days. And more people using the same software means more power for everyone.

@crdunwel
Copy link
Contributor Author

Thank you. Yeah, I looked into that and the problem is how deeply tied our current implementation of sending letters is with our app. Decoupling them and plugging in congress-forms would take a lot of time that I don't have. Longer term that would be ideal but for now just need to keep the wheels turning while we focus on the redesign. Perhaps after the release I will go back and hack at it though.

@j-ro
Copy link
Contributor

j-ro commented Jan 16, 2015

totally, makes sense. Well then, for now I think the no-js solution minimizes dev time on all sides.

@Hainish
Copy link
Contributor

Hainish commented Jan 16, 2015

For testing purposes, I can provide an authenticated endpoint to both of you out-of-band which gives me a listing of which forms are working with congress-forms and which aren't. I don't believe I have your PGP keys but if you want to provide them I can email you.

@j-ro
Copy link
Contributor

j-ro commented Jan 16, 2015

I'm pretty good, we've got our own endpoint for that. But thanks!

@plantfansam
Copy link

@j-ro, I'd like to push back on this. Consulting the README: "This project defines an open data format to describe the contact forms of members of Congress." I think it's dangerous to reject the idea that a variety of people can build separate systems using the same YAML files.

To me, making JS and non-JS files is a reasonable, systemic accommodation for to broadly different implementation types (JS and non-JS), while a hard-coded double click in individual YAMLs is a specific fix related to a specific implementation. So I think this situation is different than the JS/non-JS one (and that there should be a single click in the YAML file).

That said, if you want to chat about debugging your double-click issue, I'm more than happy to lend eyes!

Edited to include link to 'this situation.'

@j-ro
Copy link
Contributor

j-ro commented Jan 20, 2015

I think the readme is out of step with the reality of the project, at this point. It's a nice idea, but in practice, it doesn't work.

Already, these yamls are full of hacks to get our systems to fill out forms correctly. Any wait commands you see aren't describing pages, they're helping systems fill out forms. Same with find commands before success messages. Extraneous click commands. All kinds of things.

Is the double click solution stopping other implementations from working? I'm not sold on the double click as the solution -- if you can find a way that congress-forms will submit these yamls without it, then we should definitely use that.

But more generally, these yamls are really instructions for systems to fill out the form. And, because they're instructions, they are, at some level, system specific. We aim to describe them as generally as possible, but when we need to get them working, we get them working.

@drinks
Copy link
Member

drinks commented Jan 21, 2015

It seems to me that this sort of problem--while it's infuriating that this even happens!--can be worked around pretty amicably for all systems. I'd suggest augmenting find with a timeout, after which a failing system can be instructed to proceed or do something else (like click on a button). Non-JS systems can recognize that timeouts don't apply to them and proceed to the failure state immediately, JS systems can get location-redirected and confirm success. Anyone see any glaring flaws in such a course of action?

@plantfansam
Copy link

So I should mention that I was referencing this commit about double clicks in my comment: 12fe982.

@drinks
Copy link
Member

drinks commented Jan 21, 2015

Got it. Yeah I'll reply there too, though all I've got to offer is whatever perspective I can give about what I was thinking when I started the project, and I'm not sure what that's worth w/r/t mature systems implementing the schema. Hopefully something can be figured out that plays nice all around.

@j-ro
Copy link
Contributor

j-ro commented Jan 21, 2015

So, I think we should make sure we're talking about the same concepts.

One issue is if a yaml for one system breaks another. That's the case in changes like #1402

This change is necessary for no-js systems, but not for js systems. So, that's a place where we need to figure out a way to have two yamls, one for each type of system. I haven't heard whether these double clicks actually break other systems, they may not. If they don't, maybe this is a wontfix.

However, there's a larger concept here, which is the idea that these yamls describe the forms as they are vs. the idea that these yamls are a set of instructions for how a particular system would fill out these forms.

While this repo was created with the former in mind, I think it's actually the latter. These yamls are useless without a system to fill them out, and indeed, without testing against a system to see if they're valid it's hard to say they describe the form in the former sense. You always need something to validate against.

So, we can decide that one type of system is the reference system and that all yamls get validated to that system. Or we can decide we're going to support more than one system as a reference and then we have some kind of logic (perhaps as @drinks described above, or the no-js yaml file type, though perhaps with a more system-specific name) to distinguish between what works in one system and what works in another. But I don't think we can say "these yamls just describe forms, irrespective of systems" because without something to test and validate against I'm not sure how you could say any of these yamls are "complete" or "accurate." And I'm not sure basing our categories on functionality is the way to go either -- maybe these issues are caused by javascript vs. no javascript, but maybe they're not. Or in the future, maybe they're not for new issues we come accross. As we've seen, each system has its own quirks. If we're building to systems as references we should use that as our categorization technique instead of looking for other categories to break things down into. That and I'm generally not too keen about adding more waits into the system, seeing as it takes enough time to fill out forms as it is, and waits don't always equate with nonworking, given network connectivity on some of these sites.

If we're going to pick one reference system, then I think it should be congress-forms, seeing as that's what the majority of users of this repo are using (us, EFF, NGP VAN). If we're going to pick a few reference systems, then I think the alternative yaml scheme is the best one for now -- no changes are needed to all of our congress-forms systems to make it work, and the other supported systems can develop whatever yaml naming logic they want to implement their versions of nonworking yamls for them. Or if you don't want to implement that logic you could fork and we can share yaml data between forks but keep our own versions where we need to. Though I think forking is more of a last resort.

But generally, I think we can't ignore the reality that these yamls have to be built towards certain systems, at least some of the time. Most yamls work fine across them, which is great. But a lot of yamls already have hacks for the way systems use them, and that's only going to continue. I think we should be ok with that. No reason this can't be resolved, and resolved well, but it helps to recognize this I think to get us to an amicable resolution.

Editing to add: I think we need to be on the same page about this issue, then we can approach how to solve it. Without it, I think we're going down some paths we'll regret, such as the no-js functional differences.

@konklone
Copy link
Member

This seems resolvable without a fork, and without making congress-forms the only supported reference client.

It sounds like all parties agree that JS is inevitably the future of form submission, development priorities notwithstanding. In that case, making some separate YAMLs for no-JS, that only the people who stick to no-JS need to reference in their code, is probably the path that wastes the least time overall.

As for double-clicking -- I'm very sympathetic to the idea that implementation concerns are not fully separable from the data spec, but I wonder if we can't be more creative about how we describe the forms.

If a reasonable system, pre-12fe982, walks through the steps in the YAML and doesn't work, the YAML has to change. But right now the change is to just bluntly add in double-clicks, and the discussion indicates that the root cause of the problem is not understood. I think that if the problem were understood, the YAML files could be changed in a different way that allowed paths that don't need the change (like Sunlight's) to either ignore the changes, or at worst do extraneous work.

If that's really not true, and the only solution is to make a change that makes @j-ro's implementation work, and Sunlight's break, then let's find a way to express that so that one of the implementations can choose to ignore that instruction. Perhaps you could add clicks: 2, or retry: true, that one implementation uses and one does not.

In any case, I really think we can arrive at YAMLs that multiple implementations can use. It sounds like it's been a while since the teams talked in any in depth way about implementation differences. =) So this seems healthy to me.

@j-ro
Copy link
Contributor

j-ro commented Jan 22, 2015

I'd still like to know if the double clicks break Sunlight's implementation -- that's certainly a different class of issue. Right now, the objection is that double clicks don't really describe the form, which is fine, but academic.

Re: the no-js fork, I'm mostly saying that we should not call it the no-js fork, we should call it the [sunlight system] fork. (formageddon, is that what it's called?) Because we're recognizing that yamls are attached to systems.

Overall, I think my main concerns are:

  1. We should recognize that yamls are sometimes system-specific for reasons that aren't easily rooted in the exact HTML of a form. Sometimes waits help. Sometimes we need to use webkit and not firefox. It's a bit mysterious around the edges, but that's what we need to do to make it work. Given that understanding, we can come up with a solution that works. We should share as much code as we can, of course, but sometimes we'll have to have our own yaml versions and that's fine.
  2. We should avoid making breaking changes to the yamls as they stand. Right now, it would be trivial for me to swap out this repo for a fork. And I'm assuming the same is true for you. It is not trivial for me to make my modified congress-forms system respond to new yaml commands.

Or, I guess in other words, I'd rather maintain two files/forks than add in logic to our yamls, because one requires dev time I don't have and another doesn't.

And, of course, if someone can get these double-click yamls to work some other way with congress-forms, that's great. I couldn't after trying for a few hours. But a fresh set of eyes is always useful.

Edited to add:

I'm also good with the previous no-js solution (though maybe we want to call it something else, more system-specific), where you modify your system to look for some custom file extension on yamls and prefer those versions over similarly named yamls without the extension. No breaking changes on my end, so I'm good. And no forking.

@konklone
Copy link
Member

Re: the no-js fork, I'm mostly saying that we should not call it the no-js fork, we should call it the [sunlight system] fork. (formageddon, is that what it's called?) Because we're recognizing that yamls are attached to systems.

Recognizing that the YAMLs are not totally separable from systems is different than how to handle JS/no-JS approaches. If the future is JS, Sunlight's no-JS implementation can do the work of maintaining some divergent data to support no-JS work.

The disagreement is over whether coding in a double-click is appropriate in the form, or whether that's a hack. It really seems like the need for that should be understood and justified (and not just tossed in as a quick way to get through the problem) before users of congress-forms insist that other users deal with potential breaking changes. Maybe the burden is on Sunlight to handle non-JS-specific paths while they find development time to adapt to JS -- but the burden is on the person making an implementation-specific change to at least show the root cause and why it's necessary before proceeding forward.

@j-ro
Copy link
Contributor

j-ro commented Jan 22, 2015

That's fair, but does double-click break sunlight's implementation? We've got plenty of hacks in our yamls -- every wait is a hack -- but they're not breaking so they stay. If this is breaking then sure, we should find another way if we can before we fork. But is it? That's not been said so far as I can tell.

@Hainish
Copy link
Contributor

Hainish commented Jan 22, 2015

I was always working with the assumption that the YAML files should be as descriptive of the form state as possible, while doing what needs to be done to get the form working in actual implementation. I would hope that introducing a change in the YAML to fix one implementation would not negatively affect the other, but if that is the case we should find a way to modify our implementations so they better fit the description. For instance, instead of a click action we could add an additional instruction such as "run_js" to submit a specific form. We're already implementing hacky solutions, right?

If we acknowledge the differences from one implementation to another, we should also acknowledge the differences in how within one implementation the actual filling process can differ, given different system states: amt of memory, phantomjs versions, etc. Should we then create different YAML files for every one of these system states? Of course this suggestion is in jest, but if we're in the business of forking the YAML whenever anything doesn't work, this is where it can lead.

Maybe a better way of thinking about the YAML files is not a description of how the form behaves in the abstract, but a generalization of existing implementations. Thinking of it this way allows for us to have hacky solutions, given that we're not breaking other systems implementations. Of course, how would we know, without some kind of CI for all existing implementations?

I think diverging the YAML into implementation-specific descriptions would be a wasted effort and introduce more problems, such as keeping these files in sync with one another and avoiding duplication of efforts.

@j-ro
Copy link
Contributor

j-ro commented Jan 22, 2015

I don't think we disagree -- we should have implementations that work for all systems where we can. The question is mostly what do we do when we can't?

I also like the run_js command, to be interpreted as running arbitrary js, which is useful sometimes. But that's long term.

I'd still like to know if these double clicks break Sunlight's implementation. If the answer is no, then I think we leave it as is -- a necessary, non-breaking hack.

For the forms that need js to submit (and thus Sunlight needs breaking changes to submit), a duplicate file should be made for Sunlight to use, via fork or another naming convention based method or something.

This means essentially that congress-forms is the reference system, because it sees the widest use right now. All efforts should be made to make one yaml that works for everyone, but in situations where that's not possible, the congress-forms implementation is the one that's canonical.

Editing to add:

We can make another system the reference system if that's what folks decide. We just need to pick one reference system so we know what to build against and where the copies need to be. Bottom line is, I think we need to figure out how we deal with situations where a yaml works for one system but not another that doesn't involve breaking yaml schema changes. That basically means two copies, either in the same repo or in a fork. Either way is fine for me, as long as I get a repo of files that I can use successfully without having to implement new yaml commands at this moment. Might be something we want to do in the future, but I think it's a longer term change.

@crdunwel
Copy link
Contributor Author

We should recognize that yamls are sometimes system-specific for reasons that aren't easily rooted in the exact HTML of a form.

Incorrect IMO. The YAMLs are only system specific if one writes them to work on a particular system as it currently exists (i.e. one writes ruby for a ruby interpreter). Perhaps @Hainish put it best:

I was always working with the assumption that the YAML files should be as descriptive of the form state as possible, while doing what needs to be done to get the form working in actual implementation.

In the spirit of the communal unitedstates repository, I believe the YAMLs should capture the form state and the implementation using them should handle the correct interpretation. It seems that you're treating the YAMLs as if they were psuedo-code for your particular system to ingest when it should be your system that adapts and handles the various cases of failures. In this particular case, if a single click fails then perhaps your system should try the steps again with a double click? Perhaps also implement a wait between every action by default between steps since your system must mimic an actual user's input and the headless browser may not be able to keep up with near instantaneous typing or input element switching. These are the type of things that I feel a particular implementation should be responsible for handling.

I'd still like to know if these double clicks break Sunlight's implementation. If the answer is no, then I think we leave it as is -- a necessary, non-breaking hack.

It did break our implementation and I had to create a workaround as a result. The issue isn't whether this particular case can be handled now - it's the precedent that this case will set for the future of the repository. Will the YAMLs be specific instructions for EFF's congress-form project, or will they be generalized descriptions of how to submit a form on a member of congress' website so anybody can create their own system that ingests and interprets them?

For the forms that need js to submit (and thus Sunlight needs breaking changes to submit), a duplicate file should be made for Sunlight to use, via fork or another naming convention based method or something.

There are relatively few forms that actually need javascript to submit (Reid is a good example - remind me to petition him). I feel most of the javascript issues can be resolved if we can add an attribute to a step such as "javascript: boolean". If true, only javascript systems should perform the step. If false, only no-javascript systems should perform the step. If omitted, both system types execute the step. Presumably congress-form project ignores attributes that it doesn't understand so this shouldn't even break that implementation. This would give each of us the power to designate if a step should be for us only, but perhaps more importantly it will also accurately describe the generalized process of submitting the form for both a javascript and non-javascript browser. Maintaining two versions when only one or two instructions differ invites all sorts of problems as @Hainish points out.

Final note and sort of obvious, but perhaps the integration tests should include testing from Sunlight's system. This would make determining whether a YAML breaks another's system easier than going back and forth like this with pull requests and reverting. If we agree to adding the javascript attribute to steps then in most cases it may be as simple as adding that option on a step or two to make the YAML file work for both of us.

it sounds like it's been a while since the teams talked in any in depth way about implementation differences. =) So this seems healthy to me.

Perhaps some coffee is in order...

@j-ro
Copy link
Contributor

j-ro commented Jan 22, 2015

I guess the place where I disagree is the idea that there is such thing as a generalized description of a form:

Will the YAMLs be specific instructions for EFF's congress-form project, or will they be generalized descriptions of how to submit a form on a member of congress' website so anybody can create their own system that ingests and interprets them?

If you don't pick a reference system as something you test yamls against, how will you know when the yaml you've made describes the form?

In my mind, we need to:

  1. Pick a reference system so we know what baseline we're writing to, and what standard we're using to judge whether a yaml is passing or not.
  2. Decide on a way to deal with situations where the yaml passes the reference test but breaks someone else's implementation.

For one, I think congress-forms should be the reference system, because it has more users. That's just my opinion.

For two, I think forking/duplicates are the best option now, because they require the least dev time all around.

Of course, our goal should be to reduce the number of duplicates/forks that we maintain. Those that use the reference system shouldn't just throw out hacks in yamls to make things work. Rather, they should endeavor to find yaml structures that work for all, and amicably entertain ideas and suggestions from others to make that happen.

I'm certainly in that space with the double clicks, especially because it's breaking for Sunlight. I've spent more than a few hours with it, trying to find other implementations. I'm happy to take suggestions and test them, now that I'm out of ideas, at least for the moment. If we can find a way to make it work for all of us, that would be ideal.

But at the end of the day, we might need to resort to whatever solution we choose for need two.

Do folks have a response to my two needs above, pick a reference system and decide on a way to deal with difference from the reference? If we can at least agree that those are the issues we need to solve, then we can debate those issues.

@j-ro
Copy link
Contributor

j-ro commented Jan 22, 2015

good news! the double click issue I think can be bypassed with webkit!

But, that doesn't really mean we don't have to think about these two issues, IMO. I'd still like for us to decide on a reference system and a way of handling yaml instructions that have to deviate from the reference for certain implementations. We have a problem now with the js-based submissions, and there will likely be more in the future, even if double clicking isn't one of them. So worth deciding.

If folks agree that these are indeed the two issues to decide, we can make new issues for each of them and debate them separately, perhaps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants