Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add spec_url data for JavaScript features #2983

Merged
merged 1 commit into from
Jan 8, 2019
Merged

Conversation

sideshowbarker
Copy link
Collaborator

@sideshowbarker sideshowbarker commented Oct 14, 2018

This change adds spec URLs in the *.json sources for all JavaScript features that have an mdn_url for an MDN article with a Specification(s) table — modulo the following exception:

  • no spec data is added for any cases where a URL found in an MDN Specification(s) table has no fragment-ID part

The new field that holds the spec-URL data is named spec_url, and its value is either a single URL (in the case where the feature is only associated with a single specification references), or any array of URLs (in the case where the feature is associated with multiple specification references).

The change also:

  • Includes an add-specs.py script that can be used to (re)generate all the spec data and update all the *.json sources. (The script works by scraping MDN Specification(s) tables.)

  • Updates schemas/compat-data.schema.json to allow the additional spec data and to specify the structure it must conform to.

@Elchi3
Copy link
Member

Elchi3 commented Oct 17, 2018

Thanks for doing this work! It might take a while to get back to you and to agree how to exactly add this to the schema or the repo, but this is an excellent start.

Some initial thoughts:

  • My plan would have been to add this alongside with mdn_url (called spec_url), without the "name" property. Why do you think the spec name is needed?
  • I'm talking to TC39 representatives and other spec folk and they all seem to say that MDN should stop referencing old specs and just link to the latest drafts. So, I'm not sure if we really want to add arrays of specs and historical data when it is of no real use and actually bad when people end up reading old or snapshotted specs.
  • "no spec data is added for "status": "deprecated" features" What's the reasoning behind this?
  • "no spec data is added for any cases where a URL found in an MDN Specification(s) table has no fragment-ID part" This makes sense to me.

@sideshowbarker
Copy link
Collaborator Author

  • My plan would have been to add this alongside with mdn_url (called spec_url), without the "name" property. Why do you think the spec name is needed?

I don’t think the spec name is needed and don’t feel strongly at all about it being included. I had thought it could be useful mostly just for the purpose of making the JSON source more human-readable — so that somebody reading or editing the JSON source can quickly see what spec is being referenced. But on balance I guess that’s not worth having the extra data in there, given it’s not something for applications to consume, but instead just for people.

So for now I’ll regenerate the patch without the "name" part. If we end up deciding later that it’s useful to have the "name" part in there, I could always easily change the code back later to pull the part back in again.

  • I'm talking to TC39 representatives and other spec folk and they all seem to say that MDN should stop referencing old specs and just link to the latest drafts.

I strongly agree with that. So I’m glad somebody else does too :)

So, I'm not sure if we really want to add arrays of specs and historical data when it is of no real use and actually bad when people end up reading old or snapshotted specs.

I agree that it’s bad for people to be reading old or snapshotted specs. So let’s figure out how to not migrate those old spec links from MDN into BCD.

But the immediate problem I have is that so far my code that generates the changes in this patch just simplemindedly pulls in every link it finds in a Specifications table in an MDN article. I don’t know of a simple way for the code to programatically discern which spec link from a Specifications table is to a current spec, and which spec link is to and old or snapshotted spec.

So, short of going in and removing all the links to old/snapshotted specs from all the MDN articles that have them, the only way I can see to handle it is for me to hardcode a list of old/snapshotted spec into my Python script in this patch, and to make it just drop those old ones on the floor.

That’s totally do-able and the level of effort for it isn’t high — at least in most cases, I suspect I’m familiar enough with most specs to know which are the old/outdated ones — so I’m happy to add that to the patch (in the next few days when I can make time to do it).

  • "no spec data is added for "status": "deprecated" features" What's the reasoning behind this?

The same reasoning that the TC39 representatives and other spec folks have for not wanting to reference old specs: If a feature is deprecated, we don’t want to be pointing readers to the whatever place it’s specified — or that it used to be specified. Because its outdated information.

In the case of the HTML spec and other WHATWG specs, if a feature really is deprecated, then the spec text for it has actually already been removed from the spec — so the only thing available to link to in that case would be an old/snapshotted previous version of the spec.

In fact, in a number of MDN articles I’ve edited that are for now-deprecated features, I removed the HTML spec link (because it’s now 404, since the fragment ID it linked to is no longer in the spec). So the only spec links left in the Specifications table in that case are to any other old/outdated spec versions (e.g., to some link to the W3C copy/fork of HTML — some W3C HTML 5.x version).

And IMHO in that case, those links to other outdated specs/copies should rightly be removed from the Specifications table too, which I have done in some cases but not in all (part of the reason being that IMHO, MDN rightly shouldn’t be linking to those W3C 5.x copies/forks of the real HTML spec to begin with — if I had my way, none of those links would be in MDN articles to begin with…)

@Elchi3
Copy link
Member

Elchi3 commented Oct 18, 2018

So for now I’ll regenerate the patch without the "name" part. If we end up deciding later that it’s useful to have the "name" part in there, I could always easily change the code back later to pull the part back in again.

Awesome. so then I would expect this to be like this:

"__compat": {
"mdn_url": "https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Symbol",
"spec_url": "https://tc39.github.io/ecma262/#sec-symbol-objects",
"support": {
 "chrome": {
  "version_added": "38"
},

I agree that it’s bad for people to be reading old or snapshotted specs. So let’s figure out how to not migrate those old spec links from MDN into BCD.

Great! So, I think instead of having a PR for all page trees at the same time, it would be nice to split up this work a little bit so it becomes more manageable and reviewable. For instance, for JS I got the word that we should only point to https://tc39.github.io/ecma262/ in most if not all cases. So, we could start with the javascript/ folder and see how it could work.

One schema question that I think we still need to answer, is whether it is really possible to come up with a single spec for all features, or if there are cases where we actually want more than one spec to be listed. If that's the case. then we would rather have spec_urls and type array instead of spec_url and type string. My hope is that spec_url is possible, though.

In the case of the HTML spec and other WHATWG specs, if a feature really is deprecated, then the spec text for it has actually already been removed from the spec

I see, this makes sense in such cases. Not sure it's true for all deprecated features, as our definition for deprecated seems to be more like "not recommended" as you've found out in the other issue.

I'm leaving #1531 here so this PR and the original issue are linked.

@sideshowbarker
Copy link
Collaborator Author

Awesome. so then I would expect this to be like this:

"__compat": {
"mdn_url": "https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Symbol",
"spec_url": "https://tc39.github.io/ecma262/#sec-symbol-objects",
"support": {
 "chrome": {
  "version_added": "38"
},

One schema question that I think we still need to answer, is whether it is really possible to come up with a single spec for all features, or if there are cases where we actually want more than one spec to be listed. If that's the case. then we would rather have spec_urls and type array instead of spec_url and type string. My hope is that spec_url is possible, though.

I think there are cases where a feature legitimately has multiple (non-old/snapshotted/obsolete) specs associated with it. So I do think we’re going to need end up allowing for an array value.

A common case where I think a feature can have multiple spec URLs is when the feature is an interface for which the base interface (IDL) is defined in one spec (e.g., in the HTML spec) but there are one or more other specs that have a partial interface/IDL definition for that interface.

See https://developer.mozilla.org/en-US/docs/Web/API/Performance#Specifications for example. But that’s just one example and there are a lot of other interfaces like that — as well as other kinds of cases where a feature might legitimately have multiple spec URLs associated with it.

I guess another alternative is that we could allow the field value to be either a single string or an array. But if we did it that way, then what would we name the field? spec_url or spec_urls?

I agree that it’s bad for people to be reading old or snapshotted specs. So let’s figure out how to not migrate those old spec links from MDN into BCD.

Great! So, I think instead of having a PR for all page trees at the same time, it would be nice to split up this work a little bit so it becomes more manageable and reviewable. For instance, for JS I got the word that we should only point to https://tc39.github.io/ecma262/ in most if not all cases.

That makes me really happy to hear. I hope we can end up using that as a precedent for other cases (e.g., for HTML features, only pointing to https://html.spec.whatwg.org/).

So, we could start with the javascript/ folder and see how it could work.

Yep, sounds good — I can update the patch, and after that I can update the title and description of this PR (so we don’t need to close it and reopen a new one).

That said, if we start by limiting this to just the javascript/ folder at first, I think we need be careful to not over-constrain the requirements (and schema) to just what we find we end up needing for that case — because, for example, if we’re able to decide we only want to point to https://tc39.github.io/ecma262/ then of course we’ll only need one spec URL per feature, and we’d end up just naming the field spec_url.

In the case of the HTML spec and other WHATWG specs, if a feature really is deprecated, then the spec text for it has actually already been removed from the spec

I see, this makes sense in such cases. Not sure it's true for all deprecated features, as our definition for deprecated seems to be more like "not recommended" as you've found out in the other issue.

Right. But even in the case where something’s just "not recommended" rather than deprecated formally (as far as the normative spec language) — even in that case IMHO we still don’t want to be pointing readers to the spec about it. I think for the most part we want readers to act going forward as if the feature doesn’t exist — and not, by providing readers with easy access to the spec for it, contribute to further proliferation of the feature continuing to be used in web content (and risk having readers do things like citing the existence of the spec link in MDN to justify their continued use of the feature in spite of it not being recommended).

@Elchi3
Copy link
Member

Elchi3 commented Oct 18, 2018

See https://developer.mozilla.org/en-US/docs/Web/API/Performance#Specifications for example. But that’s just one example and there are a lot of other interfaces like that — as well as other kinds of cases where a feature might legitimately have multiple spec URLs associated with it.

Well true, but maybe you could also go differently about this. The Performance API is structured like this and has sub features:

Now, on https://developer.mozilla.org/en-US/docs/Web/API/Performance#Specifications, we probably want to list all of the specs that do extend the Performance interface, but we can get this information from the sub features, right? So, say, we generate spec tables from this repo in the future, and so the interface spec table could walk the whole interface and list sub features' specs automatically. From the data point of view, the individual features have one spec_url and we aggregate them if needed in the rendered spec table views on interface pages.
I don't know if this makes sense from a data point of view, but generally I'd like the data to be as good as it can be, and worry about the MDN rendering secondarily. Let me know what you think.

If you agree that the above case could also be dealt with by just spec_url (as opposed to spec_urls), I think it would make sense to investigate further if there other cases where spec_urls would be needed still or if we could really just use spec_url.

@sideshowbarker
Copy link
Collaborator Author

See https://developer.mozilla.org/en-US/docs/Web/API/Performance#Specifications for example. But that’s just one example and there are a lot of other interfaces like that — as well as other kinds of cases where a feature might legitimately have multiple spec URLs associated with it.

Well true, but maybe you could also go differently about this. The Performance API is structured like this and has sub features:

  • Performance

    • __compat (this is the main interface, and it seems https://w3c.github.io/hr-time/#sec-performance is the spec that doesn't define a partial interface or "Extensions to the Performance interface", so this is what I would refer to as the spec for the interface itself.

Right, but as far as making that determination programatically, then I don’t think that given as input the set of specs in the Specifications that’s in the https://developer.mozilla.org/en-US/docs/Web/API/Performance#Specifications article, there’s any way to programatically determine which of those specs is the one that doesn’t define a partial interface.

So the only way I could imagine it could be done is to

  1. Have the script look at the MDN articles for each of the subfeatures that has an mdn_url of its own
  2. Get the spec URL(s) from the Specifications table of the MDN article table for that subfeature
  3. Remove that mdn_url from the set of URLs we collected from the MDN article for the interface itself — and hope in the end we wind up with only one spec URL left in that set (in which case we assume that’s the spec URL for the interface itself).

Those steps would fail to produce a single URL in the end for any cases of a subfeature that lacks an mdn_url of its own (which there are real cases I know of) or that lack an MDN article that actually exists at whatever the specified mdn_url is (which there are also cases I know of).

But doing it that way might get us most of the way there, with only some (relatively) minimal manual editing needed to make the output have only spec URL per feature. So I can make some time to update the Python script in this patch to try to do that.

Otherwise, identifying the one spec among the set of specs in a Specifications that should be the one to associate with the interface itself (rather than with one of its subfeatures) would require a human to do some relatively extensive manual editing of the output of the current Python script in this patch.

Which would certainly be doable — but it seems like there’s going to be a significant level of effort needed to do that N different times for the N different cases where an MDN article has multiple specs in its Specifications table. It will take a significant amount of time.

That’s not to say I don’t think it should be done — I agree that rightly each interface should probably have only one spec URL associated with it. But I’m just not volunteering to be the one who does all the manual editing that would be required to do it — because I’m not sure I can make the time to do it, nor do I even have a good idea at all at this point about how much time it would take.

It seems like the work/time required would be equivalent to what’d be needed to review all the current MDN articles that have multiple specs in the Specifications tables, and to reduce them one-by-one to having just a single spec each.

Now, on https://developer.mozilla.org/en-US/docs/Web/API/Performance#Specifications, we probably want to list all of the specs that do extend the Performance interface, but we can get this information from the sub features, right?

Sure. The question is how well that could be done programatically.

So, say, we generate spec tables from this repo in the future, and so the interface spec table could walk the whole interface and list sub features' specs automatically. From the data point of view, the individual features have one spec_url and we aggregate them if needed in the rendered spec table views on interface pages.

Sure

I don't know if this makes sense from a data point of view, but generally I'd like the data to be as good as it can be, and worry about the MDN rendering secondarily. Let me know what you think.

I agree completely on all counts. I’m just not sure yet if/how well I can programatically generate the data with those characteristics without doing (a lot of) manual editing of the output. But I’ll give it a try and see how close I can get.

If you agree that the above case could also be dealt with by just spec_url (as opposed tospec_urls), I think it would make sense to investigate further if there other cases where spec_urls would be needed still or if we could really just use spec_url.

Yes, agreed. But going in and doing that investigation is something else that’s going to take additional time, so I’m not sure yet how soon I might be able to get around to doing it.

I still have a vague sense that we’re eventually going to find that there really are legitimate cases where we’ll find we need to have multiple spec URLs for a feature — but I’ll be very happy if I end up being proven wrong

@Elchi3
Copy link
Member

Elchi3 commented Oct 18, 2018

Thanks @sideshowbarker, I don't want to load all the work on you here, it's just that we should come to some agreement on how we go about this. I'm happy to help with this project, as I believe it will make this dataset a lot more powerful (after this you have a mapping between compat-data, specs and mdn docs, which I think doesn't exist at all on the internet and could be powerful).

So, to reduce the workload we could split this up to work in chunks and you don't have to do all the parts here. I think we should give spec_url (no "s") a try for JS and attempt to get proper URLs into the repo. Hopefully, we'll learn a lot while doing that and maybe we'll then adjust our plans if needed. After that, we'll expand this out to other areas like HTTP or CSS and see how these go. If all goes well, we can then tackle the whole large api folder and will then discuss how to deal with some special cases.

@sideshowbarker
Copy link
Collaborator Author

Thanks @sideshowbarker, I don't want to load all the work on you here, it's just that we should come to some agreement on how we go about this. I'm happy to help with this project, as I believe it will make this dataset a lot more powerful (after this you have a mapping between compat-data, specs and mdn docs, which I think doesn't exist at all on the internet and could be powerful).

Yes — I think what we’ll eventually have from this is something that’ll end up being useful in ways we’re not even anticipating yet

So, to reduce the workload we could split this up to work in chunks and you don't have to do all the parts here. I think we should give spec_url (no "s") a try for JS and attempt to get proper URLs into the repo.

OK, yeah agreed — that definitely gives a manageable scope to start out from

Hopefully, we'll learn a lot while doing that and maybe we'll then adjust our plans if needed. After that, we'll expand this out to other areas like HTTP or CSS and see how these go. If all goes well, we can then tackle the whole large api folder and will then discuss how to deal with some special cases.

Sounds good to me!

So for now I’ll update the the patch in this PR soon to reduce it to just the JS features, with a single spec URL for each

@sideshowbarker
Copy link
Collaborator Author

I think it would make sense to investigate further if there other cases where spec_urls would be needed still or if we could really just use spec_url.

One case I’ve found so far: https://developer.mozilla.org/en-US/docs/Web/API/AudioNode/connect

That essentially links to two different signatures of the AudioNode.prototype.connect method. (The docs/spec are a bit confusing in that it looks like one of those connect methods is a property of the AudioParam object, but that’s actually not the case — this is no AudioParam.prototype.connect.)

Anyway, it occurs to me that a general case where it seems we need spec_urls, to associate multiple spec URLs with a single feature, is when the feature is a method that has multiple signatures.

@Elchi3
Copy link
Member

Elchi3 commented Oct 23, 2018

Thanks, interesting case!

Sometimes in bcd we nest a level deeper and add sub features, like it is done here for "Promise-based syntax" https://developer.mozilla.org/en-US/docs/Web/API/BaseAudioContext/decodeAudioData#Browser_compatibility

Now, a sub feature like "Promise-based syntax" can again come with an mdn_url or a spec_url if needed. I wonder if that would solve this case as we likely want to surface both signatures in BCD anyway?

@Elchi3 Elchi3 added the not ready ⛔ This is not yet ready to be merged. It's pending a decision, other PR, or a prerequisite action. label Oct 29, 2018
@sideshowbarker
Copy link
Collaborator Author

sideshowbarker commented Oct 31, 2018

Sometimes in bcd we nest a level deeper and add sub features, like it is done here for "Promise-based syntax" https://developer.mozilla.org/en-US/docs/Web/API/BaseAudioContext/decodeAudioData#Browser_compatibility

I think in cases where that’s really necessary, it’s a useful solution.

But for this case it’s not clear to me it’s really necessary. It seems like it could just be adding additional complexity without clear benefit. IMHO for this case, simply having multiple spec URLs would hit the sweet spot as far as providing what’s needed without introducing more complexity than needed.

Now, a sub feature like "Promise-based syntax" can again come with an mdn_url or a spec_url if needed. I wonder if that would solve this case as we likely want to surface both signatures in BCD anyway?

It’s not clear to me that for BCD (as opposed to the actual prose of the docs) it’s necessary to surface compat data for all possible signatures in cases where there are multiple signatures. The prose content of the docs themselves can provide the information about the different signatures and their parameters. I really wonder if there are actually any cases where method has multiple signatures and a browser supports one particular signature of the method by not any others.

If there aren’t any actual cases of that, then it seems like it would be overkill to bake in additional complexity to handle such cases.

@sideshowbarker
Copy link
Collaborator Author

sideshowbarker commented Nov 4, 2018

@Elchi3 looking at just the JavaScript features, it seems to me there is another type of case where it appears necessary to allow multiple spec URLs. So I’d like to get your decision about how to handle it.

See https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array/toLocaleString

The Specifications table in that article has links to two current specs:

The text of the ECMA 402 spec section there says this:

This definition supersedes the definition provided in ES2019, 22.1.3.27.

…which on the face of it would seem to suggest we should then just drop the ECMA 262 spec link.

However, reading the text at https://tc39.github.io/ecma262/#sec-array.prototype.tolocalestring, I see:

An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the Array.prototype.toLocaleString method as specified in the ECMA-402 specification.

OK so far, because that’s restating what the ECMA 402 spec also says: ECMA 402 supersedes this.

But the next sentence is this:

If an ECMAScript implementation does not include the ECMA-402 API the following specification of the toLocaleString method is used.

That seems to be an important statement. It makes this part of ECMA 262 non-redundant with 402.

Further, it says:

The meanings of the optional parameters to this method are defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.

So, given there are in fact implementations that do not include ECMA-402 support, then it seems necessary to continue to cite both spec URLs — because the information in ECMA-262 is necessary for anyone wanting to know what the required behavior is for any implementation that doesn’t support ECMA-402. (And of course the information in ECMA-402 is necessary for any anyone wanting to know what the required behavior is for any implementation that does support ECMA-402.)

So it seems to me that for this case, there’s no single spec URL that we can choose to link in BCD.

And in addition to toLocaleString, the above also seems to be true of all the following:

@Elchi3
Copy link
Member

Elchi3 commented Nov 5, 2018

Thanks for your continued work on this, @sideshowbarker.

This is another really interesting case and maybe there it would make sense for toLocaleString to have both specs in the data. If I would be forced on a single url per feature, I would probably have the data like this: (the main feature has the ecma262 link, and the two subfeatures have the ecma402 link)

{
  "javascript": {
    "builtins": {
      "Array": {
        "__compat": {
          "mdn_url": "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/toLocaleString",
          "spec_url": "https://tc39.github.io/ecma262/#sec-array.prototype.tolocalestring",
          "support": {
            "chrome": {
              "version_added": "38"
            },
          "locales": {
            "__compat": {
              "description": "Optional <code>locales</code> parameter",
              "spec_url": "https://tc39.github.io/ecma402/#sup-array.prototype.tolocalestring",
              "support": {
                "chrome": {
                  "version_added": null
                },
            }
          },
          "options": {
            "__compat": {
              "description": "Optional <code>options</code> parameter",
              "spec_url": "https://tc39.github.io/ecma402/#sup-array.prototype.tolocalestring",
              "support": {
                "chrome": {
                  "version_added": null
                },
          }
},

However, I can see that it would also make sense to list both for the main feature. So lets assume we would introduce spec_urls then:

{
  "javascript": {
    "builtins": {
      "Array": {
        "__compat": {
          "mdn_url": "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/toLocaleString",
          "spec_urls": [
            "https://tc39.github.io/ecma262/#sec-array.prototype.tolocalestring",
            "https://tc39.github.io/ecma402/#sup-array.prototype.tolocalestring"
          ],
          "support": {
            "chrome": {
              "version_added": "38"
            },
          "locales": {
            "__compat": {
              "description": "Optional <code>locales</code> parameter",
              "spec_urls": [
                "https://tc39.github.io/ecma402/#sup-array.prototype.tolocalestring", 
              ],
              "support": {
                "chrome": {
                  "version_added": null
                },
            }
          },
          "options": {
            "__compat": {
              "description": "Optional <code>options</code> parameter",
              "spec_urls": [
                "https://tc39.github.io/ecma402/#sup-array.prototype.tolocalestring",
              ],
              "support": {
                "chrome": {
                  "version_added": null
                },
          }
},

So, spec_urls could be an array of urls. Does that make sense or do we need an object structure? Which URL would come first?

I still believe multiple specs would be an edge case, so maybe we should also allow spec_urls to be just string? Should we then call it spec_url (singular) and allow single URL as a string and multiple URLs as array at the same time?

I still kind of wish we could have spec_url and only a single URL, because then the complexity would be a lot less, but if that's absolutely not possible (which it looks like it isn't?) then we need to answer how we want to represent multiple spec URLs exactly.

@sideshowbarker
Copy link
Collaborator Author

So, spec_urls could be an array of urls. Does that make sense or do we need an object structure?

I think an array makes sense. I don’t think we need an object structure.

Which URL would come first?

I think it doesn’t matter. But if we decide it does matter later, I guess we could adopt a convention where the first URL is whatever’s seems to be the primary spec reference for the feature.

But as far as programmatically seeding the BCD data from what’s in MDN now, the order will just end up being whatever the order is currently in the Specifications table it’s pulled from.

I still believe multiple specs would be an edge case,

Yes, agreed. It will be.

so maybe we should also allow spec_urls to be just string?

Yes. I can update the JSON schema to make it allow both a string or array for the value.

Should we then call it spec_url (singular) and allow single URL as a string and multiple URLs as array at the same time?

Yes, I think spec_url would make the most sense — especially since it’ll only be a single URL in the vast majority of cases. (As far as I can see, in all of BCD we have at most 686 features that might end with multiple spec URLS. I think that’s about 6.5% of the total number of features.

I still kind of wish we could have spec_url and only a single URL, because then the complexity would be a lot less, but if that's absolutely not possible (which it looks like it isn't?) then we need to answer how we want to represent multiple spec URLs exactly.

Summary I get from our discussion is that it seems to make the most sense to have a spec_url key with string and array as the possible data types for the value.

I’ll go ahead and generate a patch with it that way — just for the JavaScript features for now (as we’d discussed here previously) — and then we can take a look at the result of that.

@Elchi3
Copy link
Member

Elchi3 commented Nov 5, 2018

Summary I get from our discussion is that it seems to make the most sense to have a spec_url key with string and array as the possible data types for the value.

Sounds good to me. Lets try this out. Thank you again for all your work!

@sideshowbarker sideshowbarker changed the title Add spec data for features w/ Specifications table Add spec_url data for JavaScript features Nov 5, 2018
@sideshowbarker
Copy link
Collaborator Author

sideshowbarker commented Nov 5, 2018

Sounds good to me. Lets try this out.

OK, (force) pushed 840e878 to update this PR branch, so the patch here is now restricted to just updating the JavaScript features — with the spec_url data, as we discussed here.

Thank you again for all your work!

Thanks for your guidance and feedback on making the patch what it should be

@sideshowbarker
Copy link
Collaborator Author

Just now force pushed fbeec5b as an update, to get pulls some URLs from MDN for a few cases, after I updated the spec URLs in the corresponding MDN articles.

Note that if you’re interested in doublechecking yourself the behavior of the Python script I made to generate the patch, you can run it like this:

./add-specs.py fullupdate javascript

@sideshowbarker sideshowbarker force-pushed the add-spec-data branch 3 times, most recently from ec534d6 to eca786d Compare November 6, 2018 14:32
@Elchi3
Copy link
Member

Elchi3 commented Nov 7, 2018

I did some reviewing today, will attempt to do more soon.
Here's is a gist of all URLs this adds: https://gist.github.com/Elchi3/7bd702e7b89692afa709de76903b1b7f

One thing that I like to see changed is that the spec_url should be below the mdn_url and not at the end of the object. I think it is just nicer when these two are together.

@Elchi3 Elchi3 added schema ⚙️ Isses or pull requests regarding the JSON schema files used in this project. data:js 📟 Compat data for JS/ECMAScript features. https://developer.mozilla.org/docs/Web/JavaScript labels Nov 7, 2018
@sideshowbarker
Copy link
Collaborator Author

One thing that I like to see changed is that the spec_url should be below the mdn_url and not at the end of the object. I think it is just nicer when these two are together.

Ah yeah, agreed — that’s where I should have thought to put it to begin with

Anyway, I’ve made it so now — thanks!

@Elchi3
Copy link
Member

Elchi3 commented Nov 9, 2018

I've reviewed the builtins folder today.

Overall I'm really impressed, the scraper works really well and I'm more than happy to see the relevant specs added to the files. I think I found three things that are worth to be discussed or to be fixed, though.

(1) I think I disagree that the script should skip pages when the data marks them as deprecated: true. For example, I think the following pages should have spec_url added to their data:

(2) Some APIs are nested deeper and your script seems to not expect that. There are missing spec_url properties for these pages:

(3) This is probably not really fixable automatically, but for https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Function/toString it would be really nice if "toString_revision" in the data had the second spec_url.

sideshowbarker added a commit to w3c/browser-compat-data that referenced this pull request Nov 10, 2018
Per mdn#2983 (comment)
review comments. This change should be squashed before this is merged to master.
@sideshowbarker
Copy link
Collaborator Author

sideshowbarker commented Nov 10, 2018

(1) I think I disagree that the script should skip pages when the data marks them as deprecated: true.

OK, updated the script to no longer drop those. See 6b35508

For example, I think the following pages should have spec_url added to their data:

After running the updated script, all those do now have spec_url

(2) Some APIs are nested deeper and your script seems to not expect that.

OK, updated the script to go one level deeper (5 levels).

There are missing spec_url properties for these pages:

After running the updated script, all those do now have spec_url

(3) This is probably not really fixable automatically, but for https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Function/toString it would be really nice if "toString_revision" in the data had the second spec_url.

Yeah, I think that case (and any others like it that we might come across) would need to be done with a manual edit to the BCD source. That object doesn’t have a mdn_url, and the script is only able to do something with objects that have an mdn_url — because without an mdn_url, there’s nowhere to look for the Specifications data to scrape.

Although it seems like that could be addressed by adding an mdn_url to that object, I think doing that’d be a kind of a “now you have two problems” case — because then we have situation where a feature and one of its subfeatures have exactly the same mdn_url. That’s not intrinsically absolutely bad — but the thing is, if the data did that consistently, then every single subfeature would repeat the mdn_url of its parent feature. That pretty clearly seems undesirable.

On that note, 6b35508 also includes another change which is: It ensures that spec_url doesn’t get added to objects that have an mdn_url with a fragment ID — because in those cases they are subfeatures who necessarily currently have exactly the same spec URL as their parent feature (given that the spec URLs are pulled from the same Specifications table in the same MDN article). And it makes no more sense to have N number mdn_url-has-hash subfeatures with the same spec URL as their parent feature than it does for any other subfeatures to have exactly the same spec URL as their parent feature.

So if we want those subfeatures to have their own spec URLs, it’s another case where we’d need to make manual edits to the BCD source to add them — because unless they’re each actually separate articles in MDN (rather than separate list items in the same article), then there’s no way in MDN to associate separate Specifications information with each of them.

@sideshowbarker
Copy link
Collaborator Author

@Elchi3 Is this waiting on some action from me…?

@Elchi3
Copy link
Member

Elchi3 commented Nov 28, 2018

No, it is waiting on me. I'm trying to get back to this, but I'm swamped with work. Thanks for your patience.

Copy link
Member

@Elchi3 Elchi3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this!

  • I have one minor schema change
  • I guess for finally committing this, we don't want to commit the add-specs.py file?

I'm not yet 100% sure I understand the interdiff: 6b35508

It does remove some things that look strange to me. Like the removals in javascript/operators/logical.json. Or am I not reading the interdiff and total diff right?

To speed up review could you create something like this gist https://gist.github.com/Elchi3/7bd702e7b89692afa709de76903b1b7f but also with the bcd "query strings".

So, like this:

javascript.builtins.array.pop: https://tc39.github.io/ecma262/#sec-array.prototype.pop
javascript.builtins.array.push: https://tc39.github.io/ecma262/#sec-array.prototype.push
javascript.builtins.array.push.something: <no-spec-url>
javascript.builtins.something: <no-spec-url>

That way, we could quickly see which item gets which spec_url and also when we provide no spec-url.

Thanks again, this is so cool, and I will try to get it merged before I go into holidays later in December.

schemas/compat-data.schema.json Show resolved Hide resolved
@sideshowbarker
Copy link
Collaborator Author

I'm not yet 100% sure I understand the interdiff: 6b35508

It does remove some things that look strange to me. Like the removals in javascript/operators/logical.json.

Those are due to the fact I updated the script to ignore any mdn_url that has a fragment ID, for the reason explained in #2983 (comment):

On that note, 6b35508 also includes another change which is: It ensures that spec_url doesn’t get added to objects that have an mdn_url with a fragment ID — because in those cases they are subfeatures who necessarily currently have exactly the same spec URL as their parent feature (given that the spec URLs are pulled from the same Specifications table in the same MDN article). And it makes no more sense to have N number mdn_url-has-hash subfeatures with the same spec URL as their parent feature than it does for any other subfeatures to have exactly the same spec URL as their parent feature.

But I see that javascript/operators/logical.json doesn’t actually have a single parent feature; instead it has just those three subfeatures, all of which are for documented in the same MDN article but just in separate sections.

That’s… atypical. It seems to me to expose some problems with how the information is structured in MDN and how the data is structured in BCD — and suggests that maybe the MDN article should be split into separate articles, and/or the BCD data should be structured different.y

Otherwise I’m not sure how to address that case. But if you prefer, I can just remove the logic from the script that causes any mdn_url to be ignored if it contains a fragment ID — but then that’s going to cause the general problem described in #2983 (comment).

@sideshowbarker
Copy link
Collaborator Author

To speed up review could you create something like this gist https://gist.github.com/Elchi3/7bd702e7b89692afa709de76903b1b7f but also with the bcd "query strings".

So, like this:

javascript.builtins.array.pop: https://tc39.github.io/ecma262/#sec-array.prototype.pop
javascript.builtins.array.push: https://tc39.github.io/ecma262/#sec-array.prototype.push
javascript.builtins.array.push.something: <no-spec-url>
javascript.builtins.something: <no-spec-url>

Yup — will get that generated some time this weeken

@sideshowbarker
Copy link
Collaborator Author

To speed up review could you create something like this gist https://gist.github.com/Elchi3/7bd702e7b89692afa709de76903b1b7f but also with the bcd "query strings".

So, like this:

javascript.builtins.array.pop: https://tc39.github.io/ecma262/#sec-array.prototype.pop
javascript.builtins.array.push: https://tc39.github.io/ecma262/#sec-array.prototype.push
javascript.builtins.array.push.something: <no-spec-url>
javascript.builtins.something: <no-spec-url>

Created at https://github.com/w3c/browser-compat-data/wiki/JavaScript-spec_url-list

sideshowbarker added a commit to w3c/browser-compat-data that referenced this pull request Dec 27, 2018
Per mdn#2983 (comment)
review comments. This change should be squashed before this is merged to master.
@sideshowbarker
Copy link
Collaborator Author

  • I guess for finally committing this, we don't want to commit the add-specs.py file?

Yeah so in 878552d I’ve gone ahead and removed it

This change adds spec URLs in the `*.json` sources for all JavaScript
features that have an `mdn_url` for an MDN article with a
**Specification(s)** table — with the exception that no spec data is
added for any cases where a URL found in an MDN **Specification(s)**
table has no fragment-ID part.

The new field that holds the spec-URL data is named `spec_url`, and its
value is either a single URL (in the case where the feature is only
associated with a single specification references), or any array of URLs
(in the case where the feature is associated with multiple
specification references).

The change also updates `schemas/compat-data.schema.json` to allow the
additional spec data and to specify the structure it must conform to.
@Elchi3 Elchi3 removed the not ready ⛔ This is not yet ready to be merged. It's pending a decision, other PR, or a prerequisite action. label Jan 8, 2019
Copy link
Member

@Elchi3 Elchi3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this initial landing. Some things might need manual adding, but that's fine I think.
Happy New Year to you @sideshowbarker and thanks for your patience with this. 👍

@sideshowbarker
Copy link
Collaborator Author

@Elchi3 🎉 thanks much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data:js 📟 Compat data for JS/ECMAScript features. https://developer.mozilla.org/docs/Web/JavaScript schema ⚙️ Isses or pull requests regarding the JSON schema files used in this project.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants