Linguist should understand JSX in .js files #3677
Do you want the JSX portions of If you're trying to recognize the JSX portions, that is not currently supported by Linguist and would require major changes. Currently, Linguist assigns a single language per file. If you want to fix the syntax highlighting, then you should submit a pull request to atom/language-javascript, which github.com uses to highlight JavaScript code. |
Uh, there's nothing wrong with the syntax highlighting. At all. JSX is not JavaScript. Users are advised to use the correct file extension ( |
Hm, I read this a bit too quickly :/ @buzinas So your point is that many JSX files have a
@@ -1975,6 +1975,7 @@ JSX:
type: programming
group: JavaScript
extensions:
- ".js"
Is .js really the primary extension for JSX? If not, please move this below .jsx, as the first extension is considered the primary extension.
Currently, it's kind of impossible to do that research on GitHub because search ignores some important characters, but to my knowledge many more people use the .js extension for writing JSX code than .jsx.
If you go to React's official repo examples, you'll see that all of them use JSX in .js files (and the entire ecosystem does: Redux etc). Based on that, I assume that if the creators of JSX use the .js extension, it's the primary extension for the language. But I can't be sure of that.
👍 Thanks for the confirmation.
Btw, another useful info: facebook/create-react-app#87
Should we add a heuristic to distinguish JS and JSX files? I'm afraid the classifier alone might not do a very good job.
@pchaigno yeah, that sounds reasonable. Can you help me by giving the directions on how to do that?
You can follow the examples in The heuristic itself doesn't need to be perfect. It should be correct when it identifies a file as being JSX (or JS), but it doesn't need to identify all JSX (or JS) files. |
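To make the "correct when it answers, but allowed to stay silent" idea concrete, here is a minimal sketch in Python. It is not Linguist's actual heuristics format (Linguist is written in Ruby), and the patterns and the name `guess_js_or_jsx` are assumptions picked for illustration. The rule only says "JSX" on a strong signal and otherwise returns `None`, leaving the decision to a later stage. Note that it can still be fooled by tags inside string literals or comments.

```python
import re
from typing import Optional

# Hypothetical high-precision signals; illustrative only, not Linguist's real rules.
JSX_SIGNALS = [
    re.compile(r"\breturn\s*\(?\s*<[A-Za-z]"),  # "return <div" or "return (<App"
    re.compile(r"</[A-Za-z][\w.]*\s*>"),        # closing tag such as </div> or </Foo.Bar>
]

def guess_js_or_jsx(source: str) -> Optional[str]:
    """Return "JSX" on a strong signal, otherwise None (meaning: undecided)."""
    for pattern in JSX_SIGNALS:
        if pattern.search(source):
            return "JSX"
    return None  # no claim either way; a later stage has to decide

print(guess_js_or_jsx("function A() { return <div>hi</div>; }"))
print(guess_js_or_jsx('export const foo = "Valid ES6";'))
```

The function never answers "JavaScript": the absence of tags does not prove a file is plain JS, so staying undecided is the safer choice for a rule that must not be wrong when it does answer.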
Question has been addressed, but final overall approval is still required.
Just a friendly nudge.
Can anyone jump in and help me with adding the heuristic? I'm having a hard time figuring out how to identify if a file is JSX or JS.
I don't know anything about JSX but I'm pretty good at pattern matching 😉. From a cursory glance at https://reactjs.org/docs/jsx-in-depth.html and files already on GitHub, it appears JSX files will almost always include I've also noticed that some form of |
My two cents: in a JavaScript file, you could consider JSX anywhere you find a For example in JavaScript I think it’s virtually impossible to determine JSX based on its location (whether inside a component's
@exalted that's what I was trying, but the problem is: there can be a |
Maybe it's worth looking at how JSX is detected to be then converted into regular JavaScript in tools like babel? |
@buzinas I don’t think you can look for |
That's another thing that I tried, but since Babel parses the JavaScript code as an AST, that would be too complex for a heuristic. |
I am not sure what you mean by heuristic in this context; however, I don’t think a RegEx approach would suffice, sadly. I haven’t worked with Linguist, so I shouldn’t judge, but if an AST is not a viable option here, I think what we are left with is a best-guess approach. I am sure that won't hold for long though, and you’d have to add many edge cases to guess better and better…
Nudge. |
Hey, just a friendly bump. I currently have PRs into private projects with Just a suggestion: maybe detecting either an |
Jumping in here. I believe Linguist can already highlight JSX files correctly with no problems. As you know, everything inside a JSX file is valid JavaScript except for the HTML-like tags, so... would it be too crazy an idea to use the already-working JSX grammar to parse JS files by default and call it a day? I think it would have no real downsides; correct me if I'm wrong. In fact, this is what a lot of devs are doing manually now, like @zacanger just stated, not only for Linguist, but for Atom too (see https://discuss.atom.io/t/how-do-i-make-atom-recognize-a-file-with-extension-x-as-language-y/26539?u=wliu). By the way, I'm coming from here: atom/language-javascript#220 (comment), in which I made a similar argument, just in case anyone wants to check that out.
I don't think so, there is however one teeeeny weeeny heeeawg fat problem... #3044 In short, the version of language-babel we currently use is pinned to a really old version with a known issue because the maintainer changed some of the regexes in the grammar to those only supported by the Oniguruma engine used in Atom, but not GitHub.com, and isn't prepared to entertain GitHub.com compatibility because "It works in atom..." If you know of a method of automatically converting the Oniguruma-only regexes to be PCRE-friendly or know another PCRE-friendly grammar that is actively maintained (we looked into switching to https://github.com/babel/babel-sublime but found it lacking - see #3775) we'll definitely entertain the solution to our current problem so we can consider your suggestion. |
github-linguist/linguist#3677 My understanding is that GitHub uses Linguist to identify the type of each source file, and there are two problems:
- The syntax highlighter used by Linguist when it gets the wrong file type doesn't work with GitHub's regex engine.
- Linguist doesn't correctly identify JSX files as JSX.
I saw this gitattributes fix mentioned, so I'm giving it a try.
I think we can all agree that it's at least common, if not standard, for projects to have jsx code in One reason why projects haven't adopted Can we plz merge this for the benefit of people that love both JSX and Github? Arguing over whether |
@JonAbrams Have you read the above discussion? We are not arguing over whether As discussed above, we could use a heuristic rule (regular expression) as a first filter before the Bayesian classifier. There doesn't seem to be an obvious way to distinguish the two languages, but if anyone has an idea that may work, we'd welcome a pull request! In my opinion, the best option would be the one mentioned by @reyronald: we would use a single grammar for both JS and JSX. JSX files wouldn't be detected as such, but they would at least be correctly highlighted. Distinguishing between JS and JSX files inside the grammar is easier because we're not limited by the computational power of regular expressions. @Alhadis mentioned in #4030 (Comment) that he's working on such a JS/JSX grammar, but that's obviously a huge task and I don't know if he found time to make progress.
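For readers unfamiliar with the two-stage idea (a regex heuristic as a first filter, a statistical classifier as the fallback), here is a rough sketch in Python. It is not Linguist's Ruby implementation: `heuristic`, `stub_classifier`, and `detect` are hypothetical names, the regex is only one possible signal, and the classifier is a stub standing in for the Bayesian classifier.

```python
import re
from typing import Callable, List, Optional

Candidates = List[str]

def heuristic(source: str, candidates: Candidates) -> Optional[str]:
    """Hypothetical regex filter: answers only on a strong JSX signal."""
    if "JSX" in candidates and re.search(r"</[A-Za-z][\w.]*\s*>", source):
        return "JSX"
    return None  # undecided; hand over to the next stage

def stub_classifier(source: str, candidates: Candidates) -> str:
    """Stand-in for a statistical classifier; it simply picks the first candidate."""
    return candidates[0]

def detect(source: str, candidates: Candidates,
           classifier: Callable[[str, Candidates], str] = stub_classifier) -> str:
    """Try the heuristic first and fall back to the classifier when it is undecided."""
    decided = heuristic(source, candidates)
    return decided if decided is not None else classifier(source, candidates)

print(detect("const a = <b>x</b>;", ["JavaScript", "JSX"]))
print(detect("export const foo = 1;", ["JavaScript", "JSX"]))
```

The heuristic only needs to be trustworthy when it answers; everything it leaves undecided still goes through the classifier, which is why it does not have to catch every JSX file.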
Not just a huge task, but slow as well. Much of this has to do with the fact that Atom doesn't live reload grammars as they're being worked on†, reducing me to reloading the entire workspace to see changes. This cumbersome workflow, in addition to the predicted complexity of the grammar code, has motivated me to start work on a compiler for an intermediate grammar format optimised for reading/writing regexp-based grammars. The long-term goal is to have a maintainable format which can be kept updated by the Babel maintainers, the TC39 committee, or whoever else is guaranteed to keep it updated. The compiler won't be limited to TextMate-style grammars alone, meaning it'll generate output for CodeMirror and Pygments as well. If I were to continue the way I started, with frequent repetition of CSON blocks and hacky idioms, I 100% guarantee the grammar would become unmaintainable in future. († A less easily-justified reason is that for half of this year I've been limited to working on an antique laptop running an OS which Atom doesn't support, and I only recently scabbed a MacBook from a friend to resume Atom-related projects. But enough of that)
This pull request has been automatically marked as stale because it has not had recent activity, and will be closed if no further activity occurs. If this pull request was overlooked, forgotten, or should remain open for any other reason, please reply here to call attention to it and remove the stale status. Thank you for your contributions. |
This pull request has been automatically closed because it has not had activity in a long time. Please feel free to reopen it or create a new issue. |
For what it's worth, Facebook (more specifically, Dan Abramov) decrees that the facebook/create-react-app#87 (comment) This should happen as I'd even suggest that this PR is merged to allow syntax highlighting and a new ticket is opened about fixing the statistics in a new PR. This would get the broken highlighting fixed sooner. |
I'm not sure why we're even bothering to differentiate between JS and JSX anymore, actually. Since it falls under the usage statistics of the former, we may as well roll it into the Since JSX is a superset of JavaScript, it should be possible for one grammar to accommodate both JS and sugary extensions like embedded XML and type annotations. Moreover, Linguist has no way of knowing if something like this is a JS or JSX file: export const foo = "Valid ES6"; So in a large codebase, many files are likely to be flagged as ordinary JS because they don't use tag syntax, meaning users end up with a jarring classification like "20% JavaScript, 80% JSX" or what-have-you. And Linguist has no way of identifying files based on the presence of another language in the repository (which would be helpful, as it would vastly improve its interpretation of C++/C). |
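To illustrate the "jarring classification" point, here is a small Python sketch. It assumes a per-file decision based on the presence of tag syntax and a breakdown weighted by file size in bytes; the `TAG` pattern, the `breakdown` function, and the sample repository are all made up for illustration and are not Linguist's code.

```python
import re
from collections import Counter
from typing import Dict

TAG = re.compile(r"</[A-Za-z][\w.]*\s*>")  # hypothetical JSX signal

def breakdown(files: Dict[str, str]) -> Dict[str, float]:
    """Classify each file on its own, then report percentages by byte size."""
    sizes: Counter = Counter()
    for source in files.values():
        language = "JSX" if TAG.search(source) else "JavaScript"
        sizes[language] += len(source.encode("utf-8"))
    total = sum(sizes.values())
    if total == 0:
        return {}
    return {lang: round(100 * size / total, 1) for lang, size in sizes.items()}

# A made-up React project: both files belong to the same codebase,
# but only one of them happens to contain tag syntax.
repo = {
    "constants.js": 'export const foo = "Valid ES6";\n',
    "App.js": "export const App = () => <div>Hello</div>;\n",
}
print(breakdown(repo))
```

Because each file is judged in isolation, a single project ends up split across two languages, even though `constants.js` is just as much part of the JSX codebase as `App.js`.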
This pull request has been automatically marked as stale because it has not had recent activity, and will be closed if no further activity occurs. If this pull request was overlooked, forgotten, or should remain open for any other reason, please reply here to call attention to it and remove the stale status. Thank you for your contributions. |
Not stale, my god I hate these bots...
Self-assigning this to shut Stalebot up. 👍 See #4358
Thank you @Alhadis :)
This is now resolved by moving to the tree-sitter JavaScript grammar GitHub uses, in #5133. GitHub.com will reflect this after the next release, due some time this month. Closing.
Search for .js / Search for .jsx
Search for .js with "return <div>" (the < characters are ignored, but you can see that many people use JSX in .js files by opening a few pages).
Please let me know what I can do to help get this PR merged at some point.
Refs #3144