Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[READY] C/C++ semantic highlighting. #291

Closed
wants to merge 15 commits into from
Closed

Conversation

davits
Copy link
Contributor

@davits davits commented Jan 2, 2016

Hi Guys and Happy New Year,
I've decided to turn on Christmas lights in the Vim for the occasion :)

This is a proof of a concept and lot more polishing needs to be done.
I just wanted to share the initial results, and see your concerns and remarks about the current implementation and the direction it is going. The process of extraction and usage of semantic tokens is pretty much similar to the one with diagnostics, so I implemented semantics by changing return value of file reparse function to be pair of diagnostics and semantics.
For now it supports/returns tokens of the following types: namespace, class, struct, union, typedef, enum, enum constant, macro, function, function argument. And all these token types can be linked to the different highlighting groups.

PS
I will create another pull request for YouCompleteMe, so you can test this.

PPS
Here are some screenshots to please your eyes. (Note that bold version of Type used to highlight user defined types, and bold version of Normal to highlight function arguments).
Image
Image
Image

Review on Reviewable

@davits davits changed the title C/C++ semantic higlighting. C/C++ semantic highlighting. Jan 2, 2016
@vheon
Copy link
Contributor

vheon commented Jan 2, 2016

Do you know https://github.com/jeaye/color_coded? It does the thing you want. True, you would have to pay for two parsing time since you would not reuse the parsed AST. We already had this kind of proposal ycm-core/YouCompleteMe#933 and apparently we turned that down, I don't know if @Valloric changed his mind around this. @jeaye you might be interested.

@davits
Copy link
Contributor Author

davits commented Jan 2, 2016

Yes I came across color_coded while researching libclang interface for the syntax highlighting.
It's not only the parsing time, you also have to configure second plugin for the compilation flags.
Didn't know we had such proposal and it was rejected, maybe for good, wouldn't put effort into this otherwise. @Valloric what do you think about this now.

BTW: @jeaye taking this opportunity to ask; you are adding matches only for the visible lines, and updating them on viewport change. Are you doing this for performance reasons?

@Valloric
Copy link
Member

Valloric commented Jan 2, 2016

@davits Thank your for the PR and your hard work, but I still think syntax highlighting is out-of-scope for YCM. I don't think it provides enough value to be worth the extra complexity.

OTOH, this PR is pretty convincing by demonstrating that this wouldn't really be that much extra code in ycmd. So I'm somewhat on the fence about this now. I'm curious what the necessary impact in YCM itself would be. Also, I'd like to hear what others think about this: @puremourning @micbou

And as @vheon pointed out, color_coded already exists so adding this feature wouldn't necessarily move the needle much.

@puremourning
Copy link
Member

Honestly, I was pretty stoked when I looked at this earlier. I think it is a neat feature, and I think the simplicity of the implementation is pretty compelling. It's really nice work. I'm also, as I think you know, in favour of more IDE-like features in Vim and only having one thing to configure for all my code-comprehension features (particularly for C/C++), which in fairness, is a goal of YCM. (FWIW, YCM's .ycm_extra_conf.py is the only such system I have found which allows me to write the config once and submit it to version control and have it work no matter who/where the code is being edited, and where the third-party libs and dependencies happen to be!)

I would like to give it a test drive though - I've not tried color_coded and TBH I've never used any semantic highlighting, but my understanding is that these features are becoming more commonplace in the IDE market, and many folk are coming to expect them elsewhere.

On the actual implementation, the server-side stuff looks pretty neat - it provides the annotated tokens in the FileReadyToParse response. I can even see this working for other completers. I think there is a non-backward compatible change in API (response shape) here, which we'd need to either design around, or boldly signpost for other ycmd clients (emacs, etc.).

On the Vim client side, though, I have more pressing concerns; I understand the caveats around application of the highlighting as an "overlay" in Vim are quite stark - Vim syntax highlighting really isn't currently designed for this type of thing - I read a thread where someone complained about design choices in matchaddpos() and that the main purpose was something else (squiggles? error highlighting? something like that..). I'm concerned that the fixed offsets of the highlight regions and the delayed nature of our parse requests will lead to effectively indeterminate highlighting during typing, which is actually a pretty large part of the time. Perhaps only applying them in Normal and Visual modes would work (i.e. use standard highlighting in Insert mode ?).

Also, as always, client performance is a of a worry - i particular if it requires Vim to do lots of redrawing (i don't know tbh). I very often work with exported X display over a somewhat annoyingly slow network, which means that redraw performance (strictly, I guess, frequency) is actually pretty important. I'm somewhat less worried about server performance, but testing it out would certainly be on the TODO list.

@vheon
Copy link
Contributor

vheon commented Jan 2, 2016

I was also puzzled by the little code that was necessary to implement this feature; I don't know if it is because ycmd already has the code to make this work or if it is something else. What I'm not convinced is how to handle it client side. I'm not buying the fixed offsets; maybe you should see how https://github.com/jeaye/color_coded works? Does it do it differently? If I remember correctly it play with the colorscheme directly and add tokens to a after/syntax/something.vim. I believe that until this aspect is sorted out we could not consider this feature to be merged (but I'd love to see it, specially if it is so easy to add). Good work anyway 👍

@davits
Copy link
Contributor Author

davits commented Jan 2, 2016

@Valloric I can see why this is out-of-scope for YCM, but on the other hand with YCM becoming de-facto code completion standard for the Linux systems and editors, it is just a matter of time when necessity of this will rise again.

@puremourning There will be some lag between typing and highlighting updates. I checked the semantic highlighting in Visual C++ few days ago, it reparses file after cursor idle and updates the highlighting, so you can literally see the lag between type and highlight. Beside semantic highlighting is mostly useful when you are exploring an alien source code.
I wrote to vim-dev about matchaddpos() using absolute positions and performance concerns, still no reply. @jeaye did a workaround (add only viewport tokens) in his color_coded plugin as I believe for performance reasons, and I would hate to add the another workaround for the YCM.

I don't know much about eclim, but there was a goal to add semantic highlighting support to neovim. @tarruda are you planning to add new mechanism for the semantic highlighting, or reusing/improving matchaddpos() ?

@vheon
Copy link
Contributor

vheon commented Jan 2, 2016

Looking at the code of color_coded more carefully it actually use matchaddpos and as @davits pointed out it updates only the view of the buffer for performance reason I believe. Since here we are in ycmd-land we could consider this feature for some clients where this is easy to use, maybe our friends of emacs-ycmd? @abingham

@jeaye
Copy link

jeaye commented Jan 3, 2016

BTW: @jeaye taking this opportunity to ask; you are adding matches only for the visible lines, and updating them on viewport change. Are you doing this for performance reasons?

Oh yeah. It's unimaginably slow, scrolling through a highlighted 2K line buffer on a powerful machine. This is entirely due to vim's rendering, not extra work being done in color_coded. I keep track of what's visible and update only when needed; the performance becomes much better.

A common complaint is having two plugins, which have two separate configs, for clang-based work. There's a simple solution to that: https://github.com/rdnetto/YCM-Generator which can generate both YCM and color_coded configs.

As far as the duplicate parsing goes, the option to which I was looking forward requires a vim-based (or vim-compatible) generic libclang plugin which just exposes all of the annotated tokens for others to use. YCM could do this, or another could, and color_coded could be reworked to just pull the tokens from there. The benefit, as Valloric has been saying since I originally proposed this feature for YCM, is that we keep highlighting logic separated.

@Valloric
Copy link
Member

Valloric commented Jan 3, 2016

@jeaye You make an excellent point with separating the highlighting logic. I could see us merging the necessary changes in ycmd to expose tokens to clients. YCM could then expose an external function that color_coded could use to pull tokens out and do whatever it wants.

I just really don't want to see more vimscript nastiness in YCM. If we could decouple this so that we mention in the YCM readme "hey if you want semantic syntax highlighting, go install this other plugin," that would just be dandy.

I'll openly admit that my apprehension about this feature lies entirely with the necessary Vim client layer. I can't see how it would be fast or anything but annoying to implement such that the whole thing works satisfiably enough. Not with Vim the way it is. Maybe with Neovim in the future (@tarruda :)).

But "the Vim client would be nasty" is not a good reason to not implement support for this in ycmd. Emacs, Sublime Text, Atom etc clients should not suffer because of Vim. So begrudgingly, I'll admit we should have it in ycmd.

Exposing a nice hook in YCM so that other plugins that want to deal with the syntax highlighting hassle can go do it makes sense as well.

So I'm curious to hear what others (especially you @jeaye) think about such a design. How do you envision a good hook in YCM for this data to look like?

@jeaye
Copy link

jeaye commented Jan 3, 2016

I don't think ycmd needs to do more than provide an API which dumps a JSON version of the annotated cursors. This will allow it to remain editor-agnostic, so the Vim, Emacs, etc folks can all do their own thing outside of ycmd. If I can get my hands on that JSON, I'd rework color_coded to use it and we can ride on the existing efforts put into color_coded on the vim/highlighting front, as well as its existing user base.

One thing that worries me is the potential dichotomy between YCM's highlighting and color_coded's highlighting, as well as the expanded feature set of YCM. If we have several hundred/some thousand people using color_coded, adding this functionality into YCM will lead to duplicate efforts with little gain.

I'd like to push to keep both projects going and I think color_coded can only be strengthened by the induction of a ycmd backend.

@jeaye
Copy link

jeaye commented Jan 3, 2016

To expand on my point a bit, I think that addressing the duplicate compilation issue transcends the highlighting discussion to encompass other possible uses of libclang within editors. Refactoring, searching, documenting, etc all can make use of ycmd sharing the results of the heavy lifting it's already doing.

If we bring in highlighting, as part of the C-style editing suite of YCM, what's next? Will the refactoring and other IDE-like behaviors also make it in? Nay, it'd be prudent to generalize the problem by providing the JSON API and let plugin writers have at it.

@davits
Copy link
Contributor Author

davits commented Jan 3, 2016

When I started working on this, I implemented very first version as a new handler in ycmd/handlers.py which calls completer's GetSemanticTokens function. But I failed to integrate syntax highlighting into YCM in a way I envisioned it, so I switched to this approach. For the Vim it isn't obvious when or where to call the handler, or what to do when file is reparsing at the moment, of course this isn't problem for editors with async support.
I like the idea with handler better, because it fits in existing ycmd infrastructure and more easily extendable to other completers.

Another thing is that I'm currently extracting tokens from the whole translation unit, but libclang has interface to extract tokens only from certain source range. We can expose this in the ycmd API, and let user/editor to decide whether to extract all the tokens once, or by portions. This will allow us to have ultra-fast token extraction for the visible source range.

Conclusion: How I see it.
ycmd: Provide handler which will allow to extract tokens from specific range and return them as JSON.
YCM: Provide Blocking function ycm#GetTokensFromRange(buffer, start_line, end_line, timeout). Syntax highlighting plugin will call this function for visible range with ~10-20ms timeout on cursor hold when viewport is changed. And also will be responsible for caching results and invalidating token cache on buffer changes.

@davits
Copy link
Contributor Author

davits commented Jan 3, 2016

Follow up:
I've encountered some weird bug in libclang while testing token extraction from source range.
For example when extracting tokens from TranslationUnit.cpp file from line 497 to 544, all tokens on lines 497-530 are marked as Invalid and skipped.
But when extracting tokens from the whole translation unit, everything is extracted normally, no anomalies detected. So we need to do token extraction all at once.

I also did some performance testing for token extraction (CPU AMD-FX8350 4Ghz, test is single threaded, time is measured with clock() function):

TranslationUnit.cpp 546 lines, 2105 Tokens, 580 Identifier tokens,
    Compile time: ~0.59s
    Token extraction time: ~0.0021s
TranslationUnitStore.cpp 140 lines, 524 Tokens, 134 Identifier tokens,
    Compile time: ~0.8s (more include files?)
    Token extraction time: ~0.0013s

Compared with the compilation time, token extraction does not have impact on performance.

Passing the tokens of the whole translation unit also shouldn't be a problem, currently we are passing the whole unsaved buffer from client to server. @Valloric have you ever measured how long it takes to pass the buffer? It should be a RAM to RAM copy, but it is interesting anyway.

@vheon
Copy link
Contributor

vheon commented Jan 4, 2016

I'll write this here because is related to the whole change. I don't think I'm comfortable with the bare name "semantics".


Reviewed 1 of 12 files at r1.
Review status: 1 of 12 files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from the review on Reviewable.io

@davits
Copy link
Contributor Author

davits commented Jan 4, 2016

Me neither :) I will change it to "SemanticTokens".
I will start polishing and testing this change tomorrow (It's midnight here).


Comments from the review on Reviewable.io

@Valloric
Copy link
Member

Valloric commented Jan 5, 2016

@jeaye

One thing that worries me is the potential dichotomy between YCM's highlighting and color_coded's highlighting

My point is that YCM itself would not be doing any highlighting, it would only provide an API for color_coded to get a token stream.

I think that addressing the duplicate compilation issue transcends the highlighting discussion to encompass other possible uses of libclang within editors. Refactoring, searching, documenting, etc all can make use of ycmd sharing the results of the heavy lifting it's already doing.

Fair point, but I'd like to see YCM gain all those features to the extent that separate plugins for this aren't necessary... with the exception of semantic syntax highlighting.

If we bring in highlighting, as part of the C-style editing suite of YCM, what's next? Will the refactoring and other IDE-like behaviors also make it in?

Yes! We already do GoTo, FixIts, error diagnostics and now the ever-awesome @puremourning is working on rename support (for JS at least).

it'd be prudent to generalize the problem by providing the JSON API and let plugin writers have at it.

I agree that it would be ideal to provide some general way to interact with a YCM-managed instance of ycmd, but this PR is probably not where we should be tackling it. A problem worth solving though.

@davits

I agree that we should have a ycmd handler that accepts a range of locations (which are (line, column) pairs; this type of location notation is already present in ycmd) for which tokenization should happen. Let's start with that; in the future, we might want to return a full-file token stream like you already implemented along with diagnostics on every file parse.

@davits
Copy link
Contributor Author

davits commented Jan 7, 2016

@Valloric I've changed the handler to be range specific, but the old approach when tokens are extracted for the whole file after parsing. GetSemanticTokens returns correct sub-vector for the specified range.
I did this because I've encountered some weird bugs related to the range token extraction in my test application, which seems to be absent when done in the ycmd. I will do some more testing with ycmd and switch to the clang's range extraction if no issues found.

@tarruda
Copy link

tarruda commented Jan 13, 2016

@tarruda are you planning to add new mechanism for the semantic highlighting, or reusing/improving matchaddpos() ?

I didn't think much about this subject yet, but ideas are welcome(PRs are even better :))

@bfredl
Copy link

bfredl commented Jan 14, 2016

Well there is neovim/neovim#1817 which kinda is an incremental improvement of matchaddpos (async support, faster implementation for large amounts of matches (I hope), buffer rather than window associated, adapts to basic changes like added/deleted blank lines), which reminds me to update that branch and write some documentation...

@davits
Copy link
Contributor Author

davits commented Jan 20, 2016

@bfredl Great Job! You addressed all issues/concerns we raised here.
Another think you might want to consider is adding mechanism for users to get regions where matches need to be updated or added (i.e. for added/updated lines).

Because in my tests for 7000 lines source file (NOT counting include files):

  • tokenization and values return from ycmd takes ~130ms for whole file,
  • the same takes ~3ms for the 50 line viewport from that 7000 line file.

@bfredl
Copy link

bfredl commented Jan 20, 2016

That is something we definitely want as well (notifictations for changed buffer contents). Some disscussion in neovim/neovim#2224

@bfredl
Copy link

bfredl commented Jan 24, 2016

FWIW, I tried to change the compile diagnostics of YCM to use the new buffer highlight api here. Obviously we should define a python interface for this rather than using the raw session requests, but it works pretty well (responds better to line linenumberings and buffer switches in the same window).
@davits depending on the status of this PR I would be happy to try test it using neovim/neovim#1817

@puremourning
Copy link
Member

@bfredl that's quite cool, but unless i'm missing something it only works in neovim, so we can't really support it (yet). Certainly, I don't think we have any plans to drop support for Vim. Though, I must confess, it's pretty cool what is going on in the nevim world.

Next up: a proper API for auto-completion, rather than the hacks YCM client has for auto triggering...

@bfredl
Copy link

bfredl commented Jan 24, 2016

True, the current branch was just for testing, if YCM wants to add it it would either be in a has('nvim') branch or in a separate add-on plugin.

@puremourning
Copy link
Member

@davits

Follow up:
I've encountered some weird bug in libclang while testing token extraction from source range.
For example when extracting tokens from TranslationUnit.cpp file from line 497 to 544, all tokens on lines 497-530 are marked as Invalid and skipped.

I did this because I've encountered some weird bugs related to the range token extraction in my test application, which seems to be absent when done in the ycmd.

Interesting. I don't know much about this API. Do you have a reduced test case for it (maybe using c-index-test)? We can then raise it with the awesome combined knowledge of the cfe-dev mainline list.

@puremourning
Copy link
Member

This is solid work, so thanks a ton!!

I've added some comments here and there, and there are a couple of overall things below:

  • The PR description needs to include a section detailing the ycmd JSON API for this. This is currently the only way we document it, and it's becoming increasingly important that we do so.
  • The PR needs tests. As Val would put it: 'moar tests, nom nom nom' etc.
  • Should we update the example client to show how to use this new API ?

Reviewed 1 of 12 files at r1, 1 of 12 files at r2, 9 of 10 files at r3, 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 20 unresolved discussions.


cpp/ycm/ClangCompleter/ClangCompleter.h, line 113 [r4] (raw file):
while I agree that negative line numbers are not sane, IIRC there is some case where libclang returns them. Moreover, using uint here isn't consistent with the rest of the API to this class. I feel we should use int unless there's a good reason not to.


cpp/ycm/ClangCompleter/Token.cpp, line 1 [r4] (raw file):
The copyright isn't owned by Google. It's (actually) owned by you.

Typically, we mark these with Copyright (C) 2016 ycmd contributors , though you're more than welcome to put your own name there.


cpp/ycm/ClangCompleter/Token.cpp, line 3 [r4] (raw file):
It's part of ycmd, actually. I think these headers are copy/pasted from another file (which is fine), but that file hasn't been loved enough since the YCM/ycmd split. Your file is new, so it should have new copyright headers .


cpp/ycm/ClangCompleter/Token.cpp, line 26 [r4] (raw file):
IMO it is worth writing a comment here that says this method is reentrant (recursive). It helps remind people to not do heinous things with static or global state


cpp/ycm/ClangCompleter/Token.cpp, line 89 [r4] (raw file):
Are there any CXCursor kinds are we not explicitly listing here? I'm in favour of skipping the default block and letting the compiler warn me if i'm missing a case.

This will help in the future if any new kinds are added. It also helps reviewers to check that we're covering everything we need, like objective-c stuff, etc.


cpp/ycm/ClangCompleter/Token.cpp, line 118 [r4] (raw file):
Looking at the docs for clang_getExpansionLocation, I think that you can just pass NULL as the file argument, rather than having an unused local variable.

file [out] if non-NULL, will be set to the file to which the given source location points.


cpp/ycm/ClangCompleter/Token.cpp, line 148 [r4] (raw file):
... and thus this is a partial ordering?


cpp/ycm/ClangCompleter/Token.h, line 21 [r4] (raw file):
Forgive my ignorance; i haven't come across this header before - where does it come from, and what is it used for within this header?


cpp/ycm/ClangCompleter/Token.h, line 29 [r4] (raw file):
I think it is worth adding doxygen comments for this class


cpp/ycm/ClangCompleter/Token.h, line 31 [r4] (raw file):
I think it is worth adding doxygen comments for these types


cpp/ycm/ClangCompleter/TranslationUnit.cpp, line 402 [r4] (raw file):
i suspect there are better names for these 2 locks, unless the numbering is specifically to avoid deadlock (i.e. you must acquire these locks in this specific order). That would make sense, but worth a comment IMO


cpp/ycm/ClangCompleter/TranslationUnit.cpp, line 409 [r4] (raw file):
it's good style to use NULL (or, one day, nullptr) to initialise pointers.


cpp/ycm/ClangCompleter/TranslationUnit.cpp, line 418 [r4] (raw file):
I think this is basically saying that we ignore punctuation like ( and :: etc and language keywords, literals etc. (because they are handled by classic syntax highlighters well enough) and . ?

if so, that's cool, but it would help to make a comment here to say that.


cpp/ycm/ClangCompleter/TranslationUnit.cpp, line 543 [r4] (raw file):
I might be missing something, but I think the implication here is that latest_semantic_tokens_ is a sorted vector. This isn't made completely obvious by UpdateLatestSemanticTokens. It seems to rely on the ordering defined by Token::operator< to be the same as the ordering defined by the return from clang_tokenize. Is that accurate? I think it is a safe assumption. There's nothing in the docs which defines the sequence of the tokens returned, thought the docs do imply they have some sane ordering. It's not clear if overlapping tokens is possible (i can't see exactly how it would be, given the last statement in the doc for clang_tokenize, but i haven't investigated).

The docs on clang_tokenize might explain why you were seeing seemingly spurious results in your tests?


cpp/ycm/ClangCompleter/TranslationUnit.cpp, line 550 [r4] (raw file):
is there an optimisation here to start searching from range_start ? If the start/end are backwardised, then sane results should not be expected...


cpp/ycm/ycm_core.cpp, line 201 [r4] (raw file):
Should we consider exporting these without the trailing _ ? @mispencer is quite strongly against exporting these ClangCompleter internals further than the ClangCompleter python class. I tend to agree.


ycmd/completers/completer.py, line 292 [r4] (raw file):
The docs at the top of the file need updating to go with this change.


ycmd/handlers.py, line 133 [r4] (raw file):
Is it possible to receive this request and for GetFiletypeCompleter to return None ? I think (from an API perspective) that is possible, and we should raise an appropriate error, rather than just a python error? Or does GetFiletypeCompleter already do this?


ycmd/handlers.py, line 138 [r4] (raw file):
there's an extra blank line here.


ycmd/responses.py, line 137 [r4] (raw file):
when would it not have a name attribute?


Comments from the review on Reviewable.io

@homu
Copy link
Contributor

homu commented Jan 24, 2016

☔ The latest upstream changes (presumably #311) made this pull request unmergeable. Please resolve the merge conflicts.

Davit Samvelyan and others added 3 commits October 5, 2016 06:53
Added new Variable and TemplateNonTypeParameter tokens, updated some
token names. Added semantic_token.py file representing single semantic
token, and providing complete list of all supported tokens.
Added utility functions for building token and skipped range response.
Needed to cover semantic_token.py file. Renamed Struct -> Structure.
@davits
Copy link
Contributor Author

davits commented Oct 11, 2016

@micbou @puremourning @Valloric @vheon
Could you please give this another look.
It's ready and waiting for your approval.

@aeremin
Copy link

aeremin commented Oct 13, 2016

Hi @davits, thanks for this amazing effort! I really hope that it would be merged to mainstream ycmd soon.
Some question: are there any special reasons to report both free functions and class method as Token::FUNCTION? I can imagine some situations where users want to know this detail. Is it possible to use separate tokens instead?

@davits
Copy link
Contributor Author

davits commented Oct 13, 2016

Hi @aeremin,
Thanks for pointing that out. It must have slipped my attention.
It is very easy to fix, I'll do it before this PR is merged.

@micbou
Copy link
Collaborator

micbou commented Oct 14, 2016

I didn't realize that the requests for the /semantic_tokens and /skipped_ranges handlers do not contain the contents of the buffers but only the current filepath. This works for the Clang completer because it relies on the FileReadyToParse events to parse the current translation unit then extract the tokens from it using the filepath. However, other completers will not work that way: they will need the buffers contents. I think that these requests should have the same base request structure as other requests (filepath, line_num, col_num, and file_data ) so that clients don't have to think on how to build them: they are like other requests except some extra fields.

About the extra fields in the /semantic_tokens request (start_line, start_column, end_line, and end_column), I would regroup them as follows:

'range': {
  'start': {
    'line_num'
    'column_num'
  },
  'end': {
    'line_num'
    'column_num'
  }
}

similarly to a Range response. Also, these fields should be optional: a client may want the tokens from the whole file.


Reviewed 1 of 13 files at r17, 1 of 8 files at r24, 3 of 5 files at r25, 6 of 6 files at r26, 1 of 1 files at r27, 4 of 4 files at r28, 6 of 6 files at r29.
Review status: all files reviewed at latest revision, 8 unresolved discussions.


ycmd/tests/clang/get_semantic_tokens_test.py, line 27 at r29 (raw file):

from nose.tools import eq_
from hamcrest import ( assert_that, contains, has_items )

Parentheses are not needed.


ycmd/tests/clang/get_semantic_tokens_test.py, line 80 at r29 (raw file):

def BuildSemanticTokenData_unit_test():

This is a generic test, independent to the Clang completer so it should be moved to the ycmd/tests folder probably in a new file. We could even use the DummyCompleter class for this test like it's done in the GetCompletions_ForceSemantic_Works test but I'm fine with a unit test.


ycmd/tests/clang/get_skipped_ranges_test.py, line 27 at r29 (raw file):

from nose.tools import eq_
from hamcrest import ( assert_that, contains, empty )

Parentheses are not needed.


ycmd/tests/clang/get_skipped_ranges_test.py, line 31 at r29 (raw file):

from ycmd.tests.clang import PathToTestFile, SharedYcmd
from ycmd.tests.test_utils import BuildRequest
from ycmd.responses import ( BuildRangeData, Range, Location )

Parentheses are not needed.


ycmd/tests/clang/testdata/token_test_data/.ycm_extra_conf.py, line 3 at r29 (raw file):

def FlagsForFile( filename, **kwargs ):

  flags = [ '-x', 'c++', '-I', '.' ]

The include flag does not seem useful for the tests.


Comments from Reviewable

@davits
Copy link
Contributor Author

davits commented Oct 14, 2016

(sighs) It's been implemented this way from the beginning (9 months)...
Other completers may pass buffer content to the /semantic_tokens handler if they need to, no one is prohibiting that. In fact it can be passed to the clang completer too, but it will be ignored. I would not recommend passing buffer content to the clang completer at all, because extra copy time, no point reparsing file. Reason, speed!

If you want this PR to reach the Ideal state and then merge it: 1. ideal is subjective, 2. sorry I'm not the guy.
Please don't get me wrong on the last part, I'm willing and will work on this feature and improve it further on, I just think that it should be done in incremental steps, nothing is perfect from the beginning.

I wanted to write this 2 times already, but I said to myself "you are near the end, it's the last push".
Sorry guys I can't take this bikeshedding anymore.

The feature is implemented, it's working (using in everyday work for over two months now, with other ~10 people from my organization), it's covered. Yes there are some minor polishing to be done, but it doesn't affect the functionality.

Nothing personal, I understand that you want the best for the project. I want the best for the project too, otherwise I wouldn't be pushing this for over a 9 months.


Review status: all files reviewed at latest revision, 8 unresolved discussions.


Comments from Reviewable

@Valloric
Copy link
Member

Valloric commented Oct 15, 2016

@davits Sorry for this PR taking this long, there's been delays on both sides. We could have done better here, absolutely.

But when it comes to the HTTP API, we can't just release it in a "good enough" state. We have a policy of never breaking backwards compatibility with ycmd. We also have a policy of "every commit is a release." So once we submit this, it becomes part of our API going forward and can't change.

This is also the reason why we're extremely careful about PRs that add things to the API: we'll be maintaining it forever. You're implementing a very large feature that comes with new handlers and we really need to get it right the first time; there's parts of our current API that could be better but that we can't change now without breaking lots of clients.

I think this PR is pretty close to being done, so stay strong! :)

I think that these requests should have the same base request structure as other requests (filepath, line_num, col_num, and file_data ) so that clients don't have to think on how to build them: they are like other requests except some extra fields.

I have to agree with @micbou here; requests should ideally be similar to other requests clients make. I've reviewed this PR several times but it has also changed several times and I can't constantly keep it all in my head so I'm glad @micbou found an issue I agree should be fixed. With big PRs like this one, it's often hard to see the forest from the trees so an issue like this can easily be missed through multiple rounds of reviews.

With regards to copies on the wire, it's not really a problem for a localhost server. I did benchmarks on this 3 years ago and it has stayed true since then. If it ever became a problem, the "standard" request also includes a filepath param in addition to the buffer so we could always just not populate the buffer in the request and have the server read it. But we really want to have that buffer to support unsaved changes in the editor (so the buffer contents we send are not the same as the one on disk); it's the way YCM currently works and it's not at all slow. For libclang, initial parsing is slow, but subsequent parsing is fast, even if we provide the full buffer.

It is now is more JSONish
Seperated tokens for static member variables and static member
functions.
@davits
Copy link
Contributor Author

davits commented Oct 15, 2016

requests should ideally be similar to other requests clients make

Request is similar, but line_num, col_num and file_data will be ignored by clang completer, because they are not needed. Other completers can use them if necessary.
I like the @micbou 's proposal about range structure, it is included in the update. I thought we passed server API finalization long ago.

Also, these fields should be optional: a client may want the tokens from the whole file.

Clients can get tokens from the whole file by passing (1, 1), (file_end_line, file_end_line_end_col), or just (1, 1), (1000000, 1) with a large value as end line, clang supports it. I agree with your proposal, but lets stick with what we have for a while.

With regards to copies on the wire, it's not really a problem for a localhost server. I did benchmarks on this 3 years ago and it has stayed true since then.

So my approximation of the buffer copy time was close :) . When I was testing token extraction for 50 lines on YouCompleteMe side it was taking 35ms, from which the Token extraction itself in C++ was taking <1ms, so I assumed that with buffer copy time around 12ms added to the client-server overhead, it should take 36ms to send buffer content (not related to the discussion).

If it ever became a problem, the "standard" request also includes a filepath param in addition to the buffer so we could always just not populate the buffer in the request and have the server read it. But we really want to have that buffer to support unsaved changes in the editor (so the buffer contents we send are not the same as the one on disk);

I understand why the buffer content is needed for the FileReadyToParse or completion request.
So given the number of 35ms for the token request I've decided to put a barrier of 10ms on each token request, which should have been enough, even if user is using bigger screen (more visible lines), should have provided time space in case of non ideal conditions on the machine, and be comfortable/unnoticeable by user. (Though 2 weeks ago in our multiserver/multiuser environment one of machines got overloaded and I started getting Timeouts for the token requests on it, and noticeable lag when typing, changed timout to 20 to get it work).
So if I add token time and buffer copy time, tested in ideal conditions I will pass 10ms barrier.

For libclang, initial parsing is slow, but subsequent parsing is fast, even if we provide the full buffer.
Subsequent parsing is fast because of the precompiled header. But when you change/add/delete include it's same as initial parse time. Also subsequent is fast compared to the relative, it takes >100ms to complete.

No buffer content copy, No reparsing - for the token extraction, it should be blazingly fast. Clients should take care of sending FileReadyToParse events (as they are doing now) and get tokens when file is parsed and ready.
When you just open file and scroll around, then initial file parse request got everything up and ready.
When you are editing file file reparse requests are sent anyway, for the completion.
In both cases no need to do the same work twice and burden token extraction.


Review status: 9 of 20 files reviewed at latest revision, 8 unresolved discussions.


Comments from Reviewable

@davits
Copy link
Contributor Author

davits commented Oct 15, 2016

Please read penultimate paragraph as:

For libclang, initial parsing is slow, but subsequent parsing is fast, even if we provide the full buffer.

Subsequent parsing is fast because of the precompiled header. But when you change/add/delete include it's same as initial parse time. Also subsequent is fast compared to the relative, it takes >100ms to complete.


Review status: 9 of 20 files reviewed at latest revision, 8 unresolved discussions.


Comments from Reviewable

@davits
Copy link
Contributor Author

davits commented Oct 15, 2016

Review status: 9 of 20 files reviewed at latest revision, 8 unresolved discussions.


ycmd/tests/clang/get_semantic_tokens_test.py, line 27 at r29 (raw file):

Previously, micbou wrote…

Parentheses are not needed.

Done.

ycmd/tests/clang/get_semantic_tokens_test.py, line 80 at r29 (raw file):

Previously, micbou wrote…

This is a generic test, independent to the Clang completer so it should be moved to the ycmd/tests folder probably in a new file. We could even use the DummyCompleter class for this test like it's done in the GetCompletions_ForceSemantic_Works test but I'm fine with a unit test.

Removed, because it is not necessary anymore.

ycmd/tests/clang/get_skipped_ranges_test.py, line 27 at r29 (raw file):

Previously, micbou wrote…

Parentheses are not needed.

Done.

ycmd/tests/clang/get_skipped_ranges_test.py, line 31 at r29 (raw file):

Previously, micbou wrote…

Parentheses are not needed.

Done.

ycmd/tests/clang/testdata/token_test_data/.ycm_extra_conf.py, line 3 at r29 (raw file):

Previously, micbou wrote…

The include flag does not seem useful for the tests.

Done.

Comments from Reviewable

@puremourning
Copy link
Member

So still :lgtm:

Regarding the buffer contents, I'm inclined to agree with @davits - it's YAGNI right now. We're not currently using it, and in practice we can always add it later if/when we need it for other completers. We just make it API-mandatory for those completers. And in fairness, the API isn't really totally stable right now anyway :)


Reviewed 1 of 13 files at r17, 3 of 8 files at r24, 11 of 11 files at r30.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


Comments from Reviewable

@Valloric
Copy link
Member

I think we now need @micbou to share his thoughts. :)

i.e. variables declared in namespace or translation unit scope.
@davits
Copy link
Contributor Author

davits commented Dec 2, 2016

Closing due to lack of interest.

Sorry everyone who was waiting for this to be merged. I've been in a state of anticipation for so long that it started getting on my nerves. I don't want to wait/worry for this PR anymore. I want to break free :)

I will continue to maintain my fork, and it will be easier to keep it up to date with this PR closed.

@aeremin
Copy link

aeremin commented Dec 2, 2016

@davits It's really sad that mainline ycmd is not interested enough in this.
But I am using your fork for months now, so no need to be sorry for me at least :)

BTW, are you interested in further contributions? E.g. I wanted to add ability to differentiate const/mutable entities.

@teto
Copy link

teto commented Dec 2, 2016

@davits I commend you for this PR. I've started following the issue for quite some time now hoping for some merge.
I will try your fork hoping you don't break things too much :)

@davits
Copy link
Contributor Author

davits commented Dec 3, 2016

@aeremin Sure, you are welcome to contribute, though you should wait a little bit. I want to separate type modifiers/qualifiers from token names, otherwise soon it will be impossible to deal with all combinations.
@teto It will be hard to break things, since this project has a very good foundation ;)

@Valloric
Copy link
Member

Valloric commented Dec 4, 2016

Closing due to lack of interest.

That's perhaps for the best. I don't think we, the ycmd maintainers, really handled this PR as well as we could have. I think the root cause of this was that none of us are really excited about this functionality. "Lack of interest" is really the best term here; it's not that we're against it (we're not), it's obviously something that makes sense in ycmd, but eh... it doesn't solve a problem any of us really have.

Couple that with lots of back-and-forth discussion on the design that really tired everyone out (including @davits), and you get the current unfortunate situation.

@davits Even with this being closed, thank you for sending your PR! It has made us really think through about how this feature should look like and do we really want to have it/maintain it.

Thanks again and no hard feelings! :)

@davits
Copy link
Contributor Author

davits commented Dec 5, 2016

no hard feelings!

Absolutely not :)

Huge thanks for the review, it made this PR much better then it was.
And I'm hoping you can help me with in the future as well.

@mellery451
Copy link

Sorry to comment on an ancient/closed issue. The killer feature here is the ability to visualize active blocks of code. For code-bases that extensively use macros to activate/deactivate significant bits of code, the ability for the editor to help me visually ignore large tracts of code is quite helpful. That said, I could also see myself wanting to turn this macro-aware hightlight/lowlight behavior off at times when I just want to treat all blocks as equal.

I'll also note briefly that there have been times in the past when I've misconfigured my build system and ended-up with macros defined (or not defined) incorrectly, thus resulting in unexpected code being included/excluded. In these cases, an editor feature to highlight active #ifdefs might have helped in identifying that situation sooner.

@Innokentiy-Alaytsev
Copy link
Contributor

@mellery451, you could try using ccls or cquery which support semantic highlighting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.