Add dual-pass capability #80

vhda · 2014-10-02T18:28:10Z

Looking at some omni-completion functions in vim I understood that in order to identify the class of a certain object those functions need to parse the file until the object declaration is found. Since ctags already does this type of work, why not implement this parsing in it?

This is how I see it:

1. Parse all files as usual.
2. From the list of tags, identify the class-like [1] tags and populate the keyword hash table with them.
3. Restart parsing of all files.
4. The parser skips the initialization phase and starts parsing the files directly.
5. Each new entry would need a special string identifying its class (similar to signature).
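
As a rough illustration of the steps above, here is a minimal sketch in Python rather than in ctags' own C. It assumes a toy language in which `class Name {` defines a class and `Type name;` declares an object; the function names (`first_pass`, `second_pass`, `dual_pass`) and the `class` entry key are made up for this example and are not part of ctags.

```python
import re

# Toy grammar, assumed for illustration only.
CLASS_RE = re.compile(r"^\s*class\s+(\w+)")
DECL_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*;")


def first_pass(files):
    """Parse every file as usual and collect the class-like tags."""
    tags = []
    for path, text in files.items():
        for line in text.splitlines():
            m = CLASS_RE.match(line)
            if m:
                tags.append({"name": m.group(1), "file": path, "kind": "class"})
    return tags


def second_pass(files, class_names):
    """Re-parse every file, treating the known class names as type keywords."""
    tags = []
    for path, text in files.items():
        for line in text.splitlines():
            m = DECL_RE.match(line)
            if m and m.group(1) in class_names:
                # The extra "class" entry plays the role of the special
                # string that identifies the class of the object.
                tags.append({"name": m.group(2), "file": path,
                             "kind": "object", "class": m.group(1)})
    return tags


def dual_pass(files):
    class_tags = first_pass(files)
    keywords = {tag["name"] for tag in class_tags}  # stand-in for the keyword hash table
    return class_tags + second_pass(files, keywords)


files = {
    "foo.sv": "class Foo {\n};\n",
    "top.sv": "Foo bar;\nint count;\n",
}
tags = dual_pass(files)
for tag in tags:
    print(tag)
```

The class is defined in one file and used in another, which is why the first pass has to finish over all files before the second one starts.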

There are more details that need to be looked into, but by using something similar to this we should end up with a complete tag list of the files including object declarations.
This should be especially useful when we have a libctags!

[1] Verilog, for example, has multiple tag types that can be used similarly to object declarations.

masatake · 2014-10-03T03:14:01Z

You have just entered another interesting area.

Do you mean running ctags twice or more over a set of files?
Would the tags file generated by the 1st run hold enough information for the 2nd stage?

If the target is a single input file, ctags already has facilities for multi-pass parsing.
(They were introduced for the objc parser.)

vhda · 2014-10-03T07:59:26Z

I was thinking more of implementing this functionality in the core, because it would be something commonly used by all supporting parsers. Nevertheless, let me take a look at what the objc parser is doing.

masatake · 2014-10-03T08:14:00Z

Again, are you thinking of running ctags twice on the same input file?
If yes, look at createTagsWithFallback() in parse.c.

vhda · 2014-10-03T08:24:12Z

The idea is to run ctags twice on a set of files.
Typically there is one class definition per file and we have to know all classes before being able to identify object declarations of those classes.

masatake · 2014-10-03T09:16:24Z

So in the 2nd pass, the cross-reference data generated in the 1st pass can be used.
So the facility for multi-pass parsing of a single file is not enough. I think I understand your intent.

It looks like a big challenge to me.

The biggest question is how the cross-reference data generated by the 1st
pass is handed to the 2nd pass. If you have ideas, could you show me a pseudo
command line?

Something like this?

(1st pass)$ ./ctags -o tags-1st-pass input-files....
(2nd pass)$ ./ctags -i tags-1st-pass -o final-tags input-files....

vhda · 2014-10-03T09:28:31Z

To be honest, up until now I've been focused mostly on the Verilog parser, so I really do not have any ideas on how to implement this. As such, I was looking for some feedback from the community here :)

I was looking more for something like:

$ ./ctags -R --enable-object-detection input-files

The option name is a bit too long, but it is just for demonstration purposes.

Internally I was thinking about having the parser register a list of kinds that can be used to declare variables. In most languages it should be something like "class", "typedef", etc. After the first parse, ctags replaces its keyword hash table with the class-like tags and runs a new parse using the new table.

From a parser point of view, it would only be necessary to add a new list of kinds in the parserDefinition, such that any parser that does not have that definition would not support the 2nd pass and exit cleanly. This way each parser maintainer could gradually implement the support of this feature for any corner-case situations and add relevant test cases.

Update: the parser would need to define which kind is assigned to a tag declared with each class-like kind. For example, it could be something like:

class -> o
typedef -> v

or even merge everything in the "variable" kind like:

class, typedef -> v
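
To make the conversion table idea more concrete, here is a minimal sketch in Python (not ctags' C). `ParserDefinition` below is a simplified stand-in for the real parserDefinition, and the `declaration_kinds` field and both helper functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ParserDefinition:
    """Simplified stand-in for a parser definition (not the real ctags struct)."""
    name: str
    # Hypothetical field: maps a class-like kind to the kind given to
    # declarations of that type. An empty table means "no second pass".
    declaration_kinds: dict = field(default_factory=dict)


def supports_second_pass(parser):
    """Parsers without a conversion table simply skip the second pass."""
    return bool(parser.declaration_kinds)


def declared_kind(parser, type_kind):
    """Return the kind for a declaration whose type has kind `type_kind`."""
    return parser.declaration_kinds.get(type_kind)


legacy = ParserDefinition("Legacy")                                     # no table defined
separate = ParserDefinition("Separate", {"class": "o", "typedef": "v"})
merged = ParserDefinition("Merged", {"class": "v", "typedef": "v"})

for parser in (legacy, separate, merged):
    print(parser.name, supports_second_pass(parser), declared_kind(parser, "class"))
```

Keeping the table inside the parser definition means a parser opts in just by filling it, and everything else keeps working unchanged.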

masatake · 2014-10-05T04:55:12Z

I would like to use the wiki to research this area with you.
Please wait until the preload feature is implemented first.

vhda · 2014-10-06T09:33:24Z

Don't worry. I'm using my free time to improve the Verilog parser in order to later include SystemVerilog support. I'm also working on an omni completion script in vim.
It will be several weeks before I can look at this issue in detail.

masatake · 2014-10-08T18:18:38Z

I compiled all the documents I wrote into a hacking guide. I will write about the internals of ctags next.
That will be the base of this discussion.

masatake · 2015-10-22T14:59:11Z

... I would like to hear more about your idea, with an example.

input:

class Foo {
};
Foo bar;

In the first pass Foo is captured as a tag of class kind.
If I dump the state as a tags file it will be:

Foo input /^class Foo {$/;" kind:class

In the second pass bar is captured as a tag of ...what?
Do you mean the kind of the tag is "Foo"?

Foo input /^class Foo {$/;" kind:class
bar input /^Foo bar;$/;" kind:class:Foo

I'm sorry, but I need an example. Pairs of input and expected tags are very helpful for me.

If we introduced a reference field, the tags file for the input would be...

Foo input /^class Foo {$/;" kind:class
Foo input /^Foo bar;$/;" ref:???
bar input /^Foo bar;$/;" kind:class:Foo

Multiple passes over multiple files are very powerful, like the linker of the C language.
But it needs too much work and may change the definition of the ctags program itself.
How about multiple passes over a single file? Even with a single file, it is still very interesting.

vhda · 2015-10-22T16:47:28Z

Let me try to pass along my ideas.

The first pass only identifies a subset of the supported kinds. In an object-oriented language this subset would typically be "class" and/or "typedef".
In the second pass we identify all kinds, including the kinds identified in the first pass.
- Each parser will define a conversion table for the special subset. E.g.: class->object; typedef->variable, which would require the existence of "object" and "variable" kinds.
- The conversion can also target an existing kind.

So, referring to your example, I would expect something like:

Foo input /^class Foo {$/;" kind:class
bar input /^Foo bar;$/;" kind:object type:class:Foo

Where "type" would be a special extended attribute. Don't know if we can reuse any existing attribute for that purpose.
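
For illustration, here is a minimal Python sketch of how such entries could be written out as tab-separated tag lines. The `format_tag` helper is a made-up name, and the `type` field is only the attribute proposed above; it is not an existing ctags field.

```python
def format_tag(name, path, source_line, **fields):
    """Build one tag line: name, file, search pattern, then key:value fields."""
    # Escape backslashes and slashes so the source line is a valid /pattern/.
    escaped = source_line.replace("\\", "\\\\").replace("/", "\\/")
    pattern = "/^" + escaped + "$/"
    extras = "".join(f"\t{key}:{value}" for key, value in fields.items())
    return f'{name}\t{path}\t{pattern};"{extras}'


class_line = format_tag("Foo", "input", "class Foo {", kind="class")
# "type" is the hypothetical extended attribute proposed in this comment.
object_line = format_tag("bar", "input", "Foo bar;", kind="object", type="class:Foo")

print(class_line)
print(object_line)
```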

Typically classes are defined in different files, so I'm not sure this is really useful in a single file. But we should definitely support that possibility, because many (most?) languages do not enforce the requirement of having a single class defined per file.

masatake · 2015-10-27T05:46:57Z

I think we can introduce the "reference tag" and the multiple-input-file multi-pass (mm) parser separately.
As @shigio shows, the concept of a "reference tag" can be introduced without introducing an mm parser.
An mm parser is useful for improving the quality of captured reference tags. However, it is also useful for
ordinary definition tags. Actually, bar in your example is a definition tag.

We don't have enough knowledge about how to capture reference tags. However, we can start by extending the tags format: introducing a ref: field. A single-input-file multi-pass (sm) parser may be
useful to improve the quality of captured reference tags. Only a few parsers use the sm facility of ctags.
While expanding the area that uses sm parsers, we will learn what kinds of features are needed in the cork.
The mm parser will come next.

masatake · 2015-10-27T05:58:36Z

% cat /tmp/foo.c
struct foo bar;
% ./ctags --fields=+t -o - /tmp/foo.c
bar /tmp/foo.c  /^struct foo bar;$/;"   v   typeref:struct:foo

The typeref field is already available. An mm parser can be used as a facility for improving the quality of typeref fields in each language.
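
As a small illustration of consuming that output, here is a minimal Python sketch that splits a tag line like the one above into its parts and pulls out the typeref field. The `parse_tag_line` helper is a made-up name, and the sketch assumes tab-separated columns and a search pattern that does not itself contain `;"` followed by a tab.

```python
def parse_tag_line(line):
    """Split one tags-file line into name, file, address, kind, and fields."""
    head, _, tail = line.rstrip("\n").partition(';"\t')
    name, path, address = head.split("\t", 2)
    entry = {"name": name, "file": path, "address": address,
             "kind": None, "fields": {}}
    items = tail.split("\t") if tail else []
    for item in items:
        key, sep, value = item.partition(":")
        if sep:
            entry["fields"][key] = value   # e.g. typeref:struct:foo
        else:
            entry["kind"] = item           # a bare kind letter such as "v"
    return entry


line = 'bar\t/tmp/foo.c\t/^struct foo bar;$/;"\tv\ttyperef:struct:foo'
entry = parse_tag_line(line)
# The typeref value has the form <kind-of-type>:<type-name>.
type_kind, type_name = entry["fields"]["typeref"].split(":", 1)
print(entry["name"], entry["kind"], type_kind, type_name)
```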

vhda · 2017-06-28T14:39:51Z

Replying to #1488 (comment) :

This is what would be the ideal implementation, IMHO:

ctags parses a.sv and b.sv, and adds container types found as keywords to language's keywordAssoc. This is pass0.
ctags parses a.sv and b.sv, and emits tags file. This is pass1.

masatake · 2017-06-28T18:17:10Z

I found a good way to implement an infrastructure for a multiple-input-file multi-pass (mm) parser WITHOUT an intermediate file. A newly designed barrel API, inspired by the cork API, is part of the mm API. Surprisingly, it is not difficult to implement.

> ctags parses a.sv and b.sv, and adds container types found as keywords to language's keywordAssoc. This is pass0.
> ctags parses a.sv and b.sv, and emits tags file. This is pass1.

I see. In pass0, I would like you to make tags for the container types found by the parser and mark them as "put in the barrel".

In pass1, you can access the tags in the barrel. The barrel of tags is shared among parsers. However, for SystemVerilog, only tags of container types (class kind, for example) are in the barrel. Therefore you can
build the keyword table at the start of pass1.
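
To picture the flow, here is a minimal sketch in Python of a shared in-memory barrel. It is not the actual barrel API, which is not shown in this thread: the `Barrel` class, its `put` and `names` methods, and the two helper functions are all hypothetical names chosen for this example.

```python
class Barrel:
    """Hypothetical in-memory store of tags, shared among parsers and passes."""

    def __init__(self):
        self._tags = []

    def put(self, name, kind, language):
        self._tags.append((name, kind, language))

    def names(self, language, kinds):
        """Names of the tags stored for one language with one of the given kinds."""
        return {name for name, kind, lang in self._tags
                if lang == language and kind in kinds}


def pass0(barrel, language, container_tags):
    """pass0: put only the container-type tags in the barrel."""
    for name, kind in container_tags:
        barrel.put(name, kind, language)


def build_keyword_table(barrel, language, builtin_keywords):
    """Start of pass1: extend the builtin keywords with the types in the barrel."""
    table = {keyword: "keyword" for keyword in builtin_keywords}
    for name in barrel.names(language, {"class"}):
        table.setdefault(name, "type")
    return table


barrel = Barrel()
pass0(barrel, "SystemVerilog", [("Foo", "class")])   # e.g. found in a.sv
pass0(barrel, "C", [("Bar", "struct")])              # another parser's tags
table = build_keyword_table(barrel, "SystemVerilog", ["module", "class"])
print(table)
```

Because everything stays in memory, no intermediate tags file has to be written between pass0 and pass1.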

This will be quite a powerful API... There will be many applications. But I myself will just provide the API till 6.0.

I will not make you wait long.


vhda added the Core part label Oct 2, 2014

masatake added this to the Feature plan milestone Oct 4, 2014

masatake mentioned this issue Oct 16, 2014

introduce sub parser #64

Closed

masatake mentioned this issue Feb 3, 2015

Does ctags have multiline regex support? #219

Closed

vhda mentioned this issue Jul 11, 2015

RFC: Daemon mode #423

Open

vhda mentioned this issue Sep 24, 2015

matlab: Parse variables in Matlab cod #571

Merged

vhda mentioned this issue Oct 22, 2015

reftag in scheme (Was: adding reference tag function) #569

Open

vhda mentioned this issue Nov 20, 2015

Add reference tag #680

Merged

vhda mentioned this issue Apr 5, 2016

Porting memory I/O from Geany #863

Closed

vhda mentioned this issue Apr 15, 2016

C++ Declarations/References? #651

Open

masatake mentioned this issue Sep 2, 2016

Python: detect class attributes #1108

Open

masatake mentioned this issue Sep 12, 2016

cxx: incorrect huge typeref values #1120

Open

vhda mentioned this issue Jun 28, 2017

SystemVerilog: variables/class members of custom type not captured #1488

Closed

masatake self-assigned this Jun 28, 2017

masatake linked a pull request Jun 29, 2017 that will close this issue

[RFC] Multi pass parsing over multi source files #1495

Open

masatake mentioned this issue Mar 18, 2018

Elm: Add support for Elm-Lang via optlib #1260

Merged

idodeclare mentioned this issue Aug 13, 2019

Improve verilog analyzer oracle/opengrok#2892

Open

masatake pushed a commit to masatake/ctags that referenced this issue Mar 12, 2020

Merge pull request universal-ctags#80 from imasahiro/ignore

1805c8f

Ignore /sample/scan.

masatake mentioned this issue Apr 5, 2020

[systemverilog] parser mistakes typedef for a module's port #2413

Closed

masatake removed the Main part label Nov 29, 2022
