Add dual-pass capability #80

Open
vhda opened this issue Oct 2, 2014 · 15 comments · May be fixed by #1495

@vhda
Contributor

vhda commented Oct 2, 2014

Looking at some omni-completion functions in vim I understood that in order to identify the class of a certain object those functions need to parse the file until the object declaration is found. Since ctags already does this type of work, why not implement this parsing in it?

This is how I see it:

  1. Parse all files as usual.
  2. From the list of tags, identify the class-like [1] tags and populate the keyword hash table with them.
  3. Re-start parsing of all files.
  4. The parser skips initialization phase and starts parsing the files directly.
  5. Each new entry would need a special string identifying its class (similarly to signature).

There are more details that need to be looked into, but by using something similar to this we should end up with a complete tag list of the files including object declarations.
This should be especially useful when we have a libctags!

[1] Verilog, for example, has multiple tag types that can be used similarly to object declarations.
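
The five steps above can be sketched roughly as follows. This is a toy model in Python, not ctags code: the regexes, the helper names `collect_class_names` and `tag_declarations`, and the tuple layout are all assumptions made for illustration, and a real parser would tokenize its input rather than match whole lines.

```python
import re

# Toy "language": `class Name {` defines a class, `Type name;` declares an object.
CLASS_RE = re.compile(r"^\s*class\s+(\w+)")
DECL_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*;")


def collect_class_names(sources):
    """First parse: scan every file and gather class-like names (hypothetical helper)."""
    names = set()
    for text in sources.values():
        for line in text.splitlines():
            m = CLASS_RE.match(line)
            if m:
                names.add(m.group(1))
    return names


def tag_declarations(sources, class_names):
    """Second parse: re-scan every file; a declaration whose type is a known
    class becomes a tag that carries the class name."""
    tags = []
    for path, text in sources.items():
        for line in text.splitlines():
            m = CLASS_RE.match(line)
            if m:
                tags.append((m.group(1), path, "class", None))
                continue
            m = DECL_RE.match(line)
            if m and m.group(1) in class_names:
                tags.append((m.group(2), path, "object", m.group(1)))
    return tags


# The class lives in one file and the declaration in another.
sources = {
    "foo.h": "class Foo {\n};\n",
    "main.cc": "Foo bar;\nint count;\n",
}
class_names = collect_class_names(sources)
tags = tag_declarations(sources, class_names)
```

The point of the split is that `bar` in `main.cc` can only be recognized as an object of `Foo` after every file has been seen once, which is why a single pass over each file is not enough.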

@vhda vhda added the Core part label Oct 2, 2014
@masatake
Member

masatake commented Oct 3, 2014

You have entered yet another interesting area.

Do you mean running ctags twice or more over a set of files?
Would a tags file generated by the 1st run hold enough information for the 2nd stage?

If the target is a single input file, ctags already has facilities for multi-pass parsing.
(They were introduced for the objc parser.)

@vhda
Contributor Author

vhda commented Oct 3, 2014

I was thinking more of implementing this functionality in the core, because it would be something commonly used by all supporting parsers. Nevertheless, let me take a look at what the objc parser is doing.

@masatake
Member

masatake commented Oct 3, 2014

Again, do you mean running ctags twice on the same input file?
If yes, look at createTagsWithFallback() in parse.c.

@vhda
Contributor Author

vhda commented Oct 3, 2014

The idea is to run ctags twice on a set of files.
Typically there is one class definition per file and we have to know all classes before being able to identify object declarations of those classes.

@masatake
Member

masatake commented Oct 3, 2014

So in the 2nd pass, the cross-reference data generated in the 1st pass can be used.
So the facility for multi-pass parsing of a single file is not enough. I think I understand your intent.

It looks like a big challenge to me.

The biggest question is how the cross-reference data generated by the 1st
pass is handed to the 2nd pass. If you have ideas, could you show me a pseudo command
line?

Something like this?

(1st pass)$ ./ctags -o tags-1st-pass input-files....
(2nd pass)$ ./ctags -i tags-1st-pass -o final-tags input-files....
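
To make the hand-off concrete, here is a minimal sketch of how a second run could read class-like names back from a first-pass tags file. It assumes the extended tags format (`name<TAB>file<TAB>address;"<TAB>fields`); the function name `class_names_from_tags` and the default set of class-like kinds are hypothetical, and nothing here implies that ctags has an `-i` option.

```python
def class_names_from_tags(lines, class_like_kinds=("class", "c")):
    """Return the names of tags whose kind is class-like.

    Assumes the extended tags format: name<TAB>file<TAB>address;"<TAB>fields...
    """
    names = set()
    for line in lines:
        if line.startswith("!_TAG_"):
            continue  # pseudo-tag (metadata) lines carry no real tag
        head, sep, ext = line.rstrip("\n").partition(';"\t')
        if not sep:
            continue  # no extension fields on this line
        name = head.split("\t", 1)[0]
        for field in ext.split("\t"):
            # The kind is either a bare field ("c") or an explicit "kind:class".
            kind = field[5:] if field.startswith("kind:") else field
            if ":" not in kind and kind in class_like_kinds:
                names.add(name)
    return names


first_pass = [
    "!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted, 2=foldcase/\n",
    'Foo\tinput.cc\t/^class Foo {$/;"\tkind:class\n',
    'bar\tinput.cc\t/^int bar;$/;"\tkind:variable\n',
    'Baz\tother.cc\t/^class Baz {$/;"\tc\n',
]
seed = class_names_from_tags(first_pass)
```

The resulting set is what the second pass would load before parsing, so the intermediate file only needs to carry the name and kind of each tag.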

@vhda
Contributor Author

vhda commented Oct 3, 2014

To be honest, up until now I've been focused mainly on the Verilog parser, so I really do not have any ideas on how to implement this. As such, I was looking for some feedback from the community here :)

I was looking more for something like:

$ ./ctags -R --enable-object-detection input-files

The option name is a bit too long, but it is just for demonstration purposes.

Internally I was thinking about having the parser register a list of kinds that can be used to declare variables. In most languages it should be something like "class", "typedef", etc. After the first parse, ctags replaces its keyword hash table with the class-like tags and runs a new parse using the new table.

From a parser point of view, it would only be necessary to add a new list of kinds in the parserDefinition, such that any parser that does not have that definition would not support the 2nd pass and exit cleanly. This way each parser maintainer could gradually implement the support of this feature for any corner-case situations and add relevant test cases.

Update: the parser would also need to define which kind is assigned to a declaration of each class-like kind. For example, it could be something like:

  • class -> o
  • typedef -> v

or even merge everything in the "variable" kind like:

  • class, typedef -> v
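
As a rough illustration of the per-parser table described above, the sketch below models it in Python. `ParserDefinition` here is only a stand-in for the C `parserDefinition` structure, and the `declaration_kinds` member and both helper functions are hypothetical names, not existing ctags fields.

```python
from dataclasses import dataclass, field


@dataclass
class ParserDefinition:
    """Stand-in for a parser definition (hypothetical Python model)."""
    name: str
    # Maps a kind found in the first pass to the kind given to declarations
    # that use it. An empty table means "no second pass for this parser".
    declaration_kinds: dict = field(default_factory=dict)


def supports_second_pass(parser):
    """A parser opts in simply by providing a non-empty table."""
    return bool(parser.declaration_kinds)


def declaration_kind(parser, defining_kind):
    """Kind for a declaration whose type was tagged as `defining_kind`, or None."""
    return parser.declaration_kinds.get(defining_kind)


separate = ParserDefinition("Separate", {"class": "o", "typedef": "v"})
merged = ParserDefinition("Merged", {"class": "v", "typedef": "v"})
legacy = ParserDefinition("Legacy")  # no table: the second pass is skipped
```

Keeping the table optional matches the goal of letting each parser maintainer add support gradually, since a parser without a table behaves exactly as it does today.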

@masatake masatake added this to the Feature plan milestone Oct 4, 2014
@masatake
Member

masatake commented Oct 5, 2014

I would like to use the wiki to research this area with you.
Please wait until the preload feature is implemented first.

@vhda
Contributor Author

vhda commented Oct 6, 2014

Don't worry. I'm using my free time to improve the Verilog parser in order to later include SystemVerilog support. I'm also working on an omni completion script in vim.
It will be several weeks before I can look at this issue in detail.

@masatake
Member

masatake commented Oct 8, 2014

I compiled all the documents I wrote into a hacking guide. I will write about the internals of ctags next.
It will be the base of this discussion.

@masatake
Member

... I would like to hear more about your idea, with an example.

input:

class Foo {
};
Foo bar;

In the first pass, Foo is captured as a tag of class kind.
If I dump the state as a tags file, it will be:

Foo input /^class Foo {$/;" kind:class

In the second pass, bar is captured as a tag of ...what?
Do you mean the kind of the tag is "Foo"?

Foo input /^class Foo {$/;" kind:class
bar input /^Foo bar;$/;" kind:class:Foo

I'm sorry, but I need an example. Pairs of input and expected tags are very helpful for me.

If we introduced a reference field, the tags file for the input would be...

Foo input /^class Foo {$/;" kind:class
Foo input /^Foo bar;$/;" ref:???
bar input /^Foo bar;$/;" kind:class:Foo

Multiple passes over multiple files are very powerful, like the linker for the C language.
But that needs too much work and may change the definition of the ctags program itself.
How about multiple passes over a single file? Even for a single file, it is still very interesting.

@vhda
Contributor Author

vhda commented Oct 22, 2015

Let me try to pass along my ideas.

  1. The first pass only identifies a subset of the supported kinds. In an object-oriented language this subset would typically be "class" and/or "typedef".
  2. In the second pass we identify all kinds, including the kinds identified in the first pass.
    • Each parser will define a conversion table for the special subset. E.g.: class->object; typedef->variable, which would require the existence of "object" and "variable" kinds.
    • The conversion can be done to an existing kind.

So, referring to your example, I would expect something like:

Foo input /^class Foo {$/;" kind:class
bar input /^Foo bar;$/;" kind:object type:class:Foo

Where "type" would be a special extended attribute. Don't know if we can reuse any existing attribute for that purpose.

Typically classes are defined in different files, so I'm not sure this is really useful in a single file. But we should definitely support that possibility, because many (most?) languages do not enforce the requirement of having a single class defined per file.

@masatake
Member

I think we can introduce "reference tag" and the multiple-input-file multi-pass (mm) parser separately.
As @shigio shows, the concept of a "reference tag" can be introduced without introducing the mm parser.
The mm parser is useful for improving the quality of captured reference tags. However, it is also useful for
ordinary definition tags. Actually, bar in your example is a definition tag.

We don't have enough knowledge about how to capture reference tags. However, we can start by extending the tags format: introducing a ref: field. A single-input-file multi-pass (sm) parser may be
useful for improving the quality of captured reference tags. Only a few parsers use the sm facility of ctags.
As we expand the area where sm parsers are used, we will learn what kind of features are needed in the cork.
The mm parser will come next.

@masatake
Member

% cat /tmp/foo.c
struct foo bar;
% ./ctags --fields=+t -o - /tmp/foo.c
bar /tmp/foo.c  /^struct foo bar;$/;"   v   typeref:struct:foo

The typeref field is already available. An mm parser can be used as a facility for improving the quality of typeref fields across languages.
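
For a consumer such as an omni-completion script, the typeref field in the output above can be read back with a few lines of code. This is a minimal sketch: the helper name `typeref_of` is hypothetical, and it splits the extension fields on any whitespace because the output pasted here shows spaces where a real tags file uses tabs.

```python
def typeref_of(tag_line):
    """Return (type_kind, type_name) from a tag line's typeref field, or None."""
    # Everything after the `;"` marker is the list of extension fields.
    _, sep, ext = tag_line.rstrip("\n").partition(';"')
    if not sep:
        return None
    for field in ext.split():
        if field.startswith("typeref:"):
            type_kind, _, type_name = field[len("typeref:"):].partition(":")
            return type_kind, type_name
    return None


line = 'bar /tmp/foo.c  /^struct foo bar;$/;"   v   typeref:struct:foo'
result = typeref_of(line)
```

With this, a completion script that sees `bar` can look up the tag for `foo` directly instead of re-parsing the file to find the declaration.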

@vhda
Contributor Author

vhda commented Jun 28, 2017

Replying to #1488 (comment) :

This is what would be the ideal implementation, IMHO:

  • ctags parses a.sv and b.sv, and adds container types found as keywords to language's keywordAssoc. This is pass0.
  • ctags parses a.sv and b.sv, and emits tags file. This is pass1.

@masatake masatake self-assigned this Jun 28, 2017
@masatake
Member

I found a good way to implement an infrastructure for the multiple-input-file multi-pass (mm) parser WITHOUT an intermediate file. The newly designed barrel API, inspired by the cork API, is a part of the mm API. Surprisingly, it is not difficult to implement.

ctags parses a.sv and b.sv, and adds container types found as keywords to language's keywordAssoc. This is pass0.
ctags parses a.sv and b.sv, and emits tags file. This is pass1.

I see. In pass0, I would like you to make tags for the container types found by the parser and mark them as "put into the barrel".

In pass1, you can access the tags in the barrel. The barrel of tags is shared among parsers. However, for SystemVerilog, only tags of container types (class kind, for example) are in the barrel. Therefore you can
build the keyword table at the start of pass1.
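
The pass0/pass1 split around a barrel could look conceptually like the sketch below. The thread does not show the barrel API itself, so `Barrel`, `put`, `names`, and the naive line-splitting "parser" are all assumptions, written in Python only to show the data flow.

```python
class Barrel:
    """Hypothetical in-memory store of tags kept between passes, shared by parsers."""

    def __init__(self):
        self._tags = []

    def put(self, language, name, kind):
        self._tags.append((language, name, kind))

    def names(self, language, kind):
        return {n for (lang, n, k) in self._tags if lang == language and k == kind}


def pass0(barrel, language, lines):
    """pass0: only container types are tagged, and each one goes into the barrel."""
    for line in lines:
        words = line.split()
        if len(words) >= 2 and words[0] == "class":
            barrel.put(language, words[1].rstrip(";"), "class")


def pass1(barrel, language, lines):
    """pass1: build the keyword table from the barrel first, then parse as usual."""
    keywords = barrel.names(language, "class")
    tags = []
    for line in lines:
        words = line.rstrip(";").split()
        if len(words) == 2 and words[0] in keywords:
            tags.append((words[1], words[0]))  # (object name, its class)
    return tags


a_sv = ["class Packet;", "endclass"]
b_sv = ["Packet p;", "int n;"]

barrel = Barrel()
barrel.put("C++", "Widget", "class")  # another parser's tag, ignored below
for source in (a_sv, b_sv):
    pass0(barrel, "SystemVerilog", source)
found = [tag for source in (a_sv, b_sv) for tag in pass1(barrel, "SystemVerilog", source)]
```

Because the barrel lives in memory for the whole run, no intermediate tags file is needed between the two passes.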

This will be quite a powerful API... There will be many applications. But I myself will just provide the API till 6.0.

I will not make you wait a long time.

@masatake masatake linked a pull request Jun 29, 2017 that will close this issue