Elm: Add support for Elm-Lang via optlib #1260

Merged
bitterjug merged 9 commits into universal-ctags:master from bitterjug:elm
Jan 31, 2017

Conversation

bitterjug
Contributor

With support and encouragement from @masatake I've added my Elm optlib and the C parser generated from it. Our discussion is on my other repo.

I have added what looks like the right things in the win32 directory but I don't think I can test that.

@coveralls

Coverage Status

Coverage increased (+0.008%) to 84.971% when pulling 5755dbe on bitterjug:elm into df33542 on universal-ctags:master.

@masatake
Member

masatake commented Jan 3, 2017

Thank you for making this pull request. This is the first optlib pull request made by someone other than me!

As AppVeyor CI shows, building and testing on win32 work fine.

Just a question: you made the optlib "public domain". Is this o.k.? As I wrote in bitterjug/vim-tagbar-ctags-elm#1, I can add a --copyright= option.

The next point is important.

First of all, could you read http://docs.ctags.io/en/latest/news.html#reference-tags ?

The Elm parser captures x in import x.
However, I think it should not. x is not defined here; it is defined somewhere else.
import x just refers to it, so x should be captured as a reference tag. The bad news is that
I have not yet implemented reference tag capturing in regex-based (optlib-based) parsers.

The Elm parser captures a as b in import a as b.
However, I think it should capture only b, because b is the name newly defined in this file.
On the other hand, a is not defined here. As I wrote above, it is defined somewhere else, so a should be
captured as a reference tag. The Python parser can do so:

[yamato@master]~/var/ctags-github% cat /tmp/foo.py 
import x
import a as b
[yamato@master]~/var/ctags-github% u-ctags -o - --extra=r --fields=r /tmp/foo.py 
a	/tmp/foo.py	/^import a as b$/;"	role:indirectly-imported
b	/tmp/foo.py	/^import a as b$/
x	/tmp/foo.py	/^import x$/;"	role:imported
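
As a rough illustration of how a client tool might use the role field in output like the above, here is a minimal Python sketch. It assumes tab-separated tag lines in the format shown and treats any line carrying a role: field as a reference tag. The helper name split_tags is hypothetical and not part of ctags.

```python
def split_tags(lines):
    """Split tag lines into (definitions, references) by the presence of a role field."""
    definitions, references = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        name = fields[0]
        # Extension fields follow the pattern field; look for "role:<name>".
        role = next(
            (f[len("role:"):] for f in fields[3:] if f.startswith("role:")),
            None,
        )
        if role is None:
            definitions.append(name)
        else:
            references.append((name, role))
    return definitions, references


# Sample lines copied from the u-ctags output above.
sample = [
    'a\t/tmp/foo.py\t/^import a as b$/;"\trole:indirectly-imported',
    'b\t/tmp/foo.py\t/^import a as b$/',
    'x\t/tmp/foo.py\t/^import x$/;"\trole:imported',
]
definitions, references = split_tags(sample)
print(definitions)  # ['b']
print(references)   # [('a', 'indirectly-imported'), ('x', 'imported')]
```

Only b comes out as a definition, which matches the point being made: a and x are merely referred to in this file.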

I don't know about ports, but I think some parts of my comment here may apply to them as well.

@bitterjug
Contributor Author

Hi @masatake

  • I made the optlib public domain in vim-tagbar-ctags-elm because I didn't know where it would end up. But my plan was always that it would find its way either here or elm-vim. So, my motive was that there should be no licence clashes and I thought public domain would solve that. However, I'm more concerned that the upstream integration should be easy than that it must have a permissive licence. I'd be delighted for it to be released under GPL as the others here. Let's do that!

  • I take your point about reference tags. And, I'm not sure what action I should take. I added the import tags because I found it useful, in combination with vim-tagbar, to comprehend my source files. Using tagbar leads me to think about one source file at a time, not the whole project. In a single source file it is useful to understand how names came into scope whether defined or not. But that is not the intention of ctags.

    Since I cannot currently implement the import patterns as reference tags, I guess I should remove them from this PR. (I can, optionally, add them in for my own pleasure as local optlib).

  • Your last point about capturing b in import a as b makes sense only if reference tags are available. So I propose to remove the import matching options (and, maybe, add them back in as optlib in vim-tagbar-ctags-elm).

  • I think ports are okay. They are the way Elm interoperates with Javascript. The name is defined in Elm and referred to in Javascript, whether input or output ports.

bitterjug added a commit to bitterjug/ctags that referenced this pull request Jan 3, 2017
As discussed [here](universal-ctags#1260 (comment))
we ought not be capturing imports (until optlib supports reference
tags).
@coveralls

Coverage Status

Coverage increased (+0.008%) to 84.971% when pulling 65baebb on bitterjug:elm into df33542 on universal-ctags:master.

@masatake
Member

masatake commented Jan 4, 2017

> I made the optlib public domain in vim-tagbar-ctags-elm because I didn't know where it would end up. But my plan was always that it would find its way either here or elm-vim. So, my motive was that there should be no licence clashes and I thought public domain would solve that. However, I'm more concerned that the upstream integration should be easy than that it must have a permissive licence. I'd be delighted for it to be released under GPL as the others here. Let's do that!

I see. For ctags main/ and parsers/ I would like to keep the GPL. However, in other areas, especially optlib, another license is o.k. Anyway, I will add a --copyright-<LANG>=... option if you want.
My goal here is not just to integrate your Elm parser. I would like to improve and extend ctags's infrastructure for optlib developers.

> I take your point about reference tags. And, I'm not sure what action I should take. I added the import tags because I found it useful, in combination with vim-tagbar, to comprehend my source files. Using tagbar leads me to think about one source file at a time, not the whole project. In a single source file it is useful to understand how names came into scope whether defined or not. But that is not the intention of ctags.

> Since I cannot currently implement the import patterns as reference tags, I guess I should remove them from this PR. (I can, optionally, add them in for my own pleasure as local optlib).

I see. Thank you for understanding.
I made a pull request against your branch. It would be nice if vim-tagbar could handle reference tags in the future.

> Your last point about capturing b in import a as b makes sense only if reference tags are available. So I propose to remove the import matching options (and, maybe, add them back in as optlib in vim-tagbar-ctags-elm).

Tagging Y in import X as Y is still meaningful. Please see the commit log in my pull request.

> I think ports are okay. They are the way Elm interoperates with Javascript. The name is defined in Elm and referred to in Javascript, whether input or output ports.

I see.

@bitterjug
Contributor Author

Thanks @masatake. I've merged your suggestions for import and given the kind a somewhat more meaningful name.

@coveralls

Coverage Status

Coverage increased (+0.1%) to 85.061% when pulling c9cc975 on bitterjug:elm into df33542 on universal-ctags:master.

@coveralls

Coverage Status

Coverage increased (+0.1%) to 85.061% when pulling 2f721d1 on bitterjug:elm into df33542 on universal-ctags:master.

@masatake
Member

masatake commented Jan 6, 2017

@bitterjug, thank you for merging and pushing the changes.

I'm not sure whether the following comment is important. People may think it is too detailed.
However, designing kinds is the most important stage in designing a parser.

I will write about the kind name in the last commit. My choice was unknown. As far as I can remember, this naming was never supported by the other ctags developers. However, I considered it again and again
when choosing this name.

To explain the idea behind unknown, I would like to use a simple example. My native language is C, so I would like
to use a simple C program:

int main (void)
{
    ...
}

In a definition tag, ctags captures a newly introduced name that specifies a language object.
The kind should represent the kind of the language object, not the name.
This is the fundamental rule. Of course, there are many exceptions.
In addition there is a technical limitation: ctags doesn't consider files other than the current input file.
Therefore in some cases, ctags cannot resolve the language object for a given name.

Here "main" is a newly introduced name, and the language object it specifies is a function.

Let's think about the kind: what is "main"? Based on the fundamental rule, main has the "function" kind.

(
One may say "main" is a name of a function.
Another may say "main" is a function.

I think in ctags we should say "main" is a function. Everything ctags deals with is a name, so
saying "main" is a name of a function is verbose.)

import a.b as c

"c" is a newly introduced name. "a" may be defined somewhere else; it is just referred to here. "b" is defined in "a"; it is just referred to here, too. "b" may specify a language object, and "c" specifies the language object specified by "b". However, the details of that language object are unknown to ctags because it is not defined in the input file. Therefore my choice was "unknown".

For this reason I introduced the "unknown" kind in the Python parser like this:

	{true, 'x', "unknown",   "name referring a class/variable/function/module defined in other module",
	 .referenceOnly = false, ATTACH_ROLES(PythonUnknownRoles)},

My English is not good, so I wonder whether "unknown" is a good name or not. However, the idea is taken from
bonobo. https://developer.gnome.org/libbonobo/stable/libbonobo-bonobo-object.html#bonobo-unknown-img

Your choice is "rename". "c" is a name renamed from a.b. This naming is not about the language object specified by "c". If Elm were a macro language like the C preprocessor or m4, it would be meaningful.
In such a macro language, a name itself is a language object. Though I haven't learned Elm, I guess it is not a macro language.

Given the above, I think unknown is the better name.

However, "rename" is still acceptable if either of the following two criteria holds.

If it is really good for vim users. Many editors can read tags files. However, I guess vim is the primary client tool of ctags. I monitor who puts a "star" on ctags, and it seems that most of them are vim users.
Currently the u-ctags project has 3 active developers. As far as I know none of them uses vim. Two of them are writing their own IDEs :-P. Being editor-neutral may be good for ctags. However, serving vim users is not a bad thing, of course.

If you, as the parser maintainer, are happy with the kind name. ctags was a mostly dying project. (https://sourceforge.net/p/ctags/mailman/ctags-devel/?viewmonth=201310&viewday=15)
As you can see in the u-ctags project description, I think that keeping u-ctags maintained for a long time is more important than consistent tags output. (This does not apply to other projects; it is specific to ctags.)
In other words, the u-ctags project needs more parser maintainers.

@bitterjug
Contributor Author

I see where you're coming from. And I appreciate your appetite for careful design!

I will try and rephrase your point to check I understand properly: you chose the name "Unknown" because it is not possible to know what type of language object has been imported and renamed.

When I saw this, I thought you meant that you didn't know what it was. But you're saying that Ctags doesn't know what it is. However, I still feel uncomfortable with "Unknown", it puts the emphasis on the wrong thing. I've had a look at the Elm parser and read some docs and notes on imports. The other word I have seen used to describe the name that follows as is alias, but this is a much better name for type aliases.

But it turns out that in Elm the thing that comes before as is always a module. If you want to import the objects defined in that module, you use an extra bit of syntax:

    import Module.Submodule exposing (foo, bar, Bas)

You can also say:

    import Module as Alias exposing (foo) 

Which means both Alias and foo are introduced into scope in the current module. And you can use Alias to refer to Alias.bar, Alias.Bas, etc.

So, we do know something about the language object being referenced by a local name: it is a module. What kind of tags should this create? Something like:

import Module as Alias    
    Module = (kind: module, role:indirectly-imported),
    Alias  = (kind:module, role:namespace)

And how close to this can we get with a regexp parser?

@masatake
Member

masatake commented Jan 7, 2017

> I will try and rephrase your point to check I understand properly: you chose the name "Unknown" because it is not possible to know what type of language object has been imported and renamed.

Yes, you understand my intent properly!

import Module.Submodule exposing (foo, bar, Bas)

As you may know foo, bar and Bas cannot be captured well with regex parser.
So I will skip this.

import Module as Alias    
    Module = (kind: module, role:indirectly-imported),
    Alias  = (kind:module, role:namespace)

From what you have told me about Elm, the kind for Alias should be module. Your choice looks good to me.
The role for Alias may not be needed.
My understanding is that Alias is a newly introduced name, a name newly added to the current namespace. It means Alias should be captured as a definition tag, not a reference tag.
A role is attached to a tag only if it is a reference tag. A role represents how the name is referenced.
So Alias doesn't need a role.

I designed the roles and kinds for import in Python as follows (taken from python.c):

/* Roles related to `import'
 * ==========================
 * import X              X = (kind:module, role:imported)
 *
 * import X as Y         X = (kind:module, role:indirectly-imported),
 *                       Y = (kind:namespace, [nameref:X])
 *                       ------------------------------------------------
 *                       Don't confuse with namespace role of module kind.
 *
 * from X import *       X = (kind:module,  role:namespace)
 *
 * from X import Y       X = (kind:module,  role:namespace),
 *                       Y = (kind:unknown, role:imported, [scope:X])
 *
 * from X import Y as Z  X = (kind:module,  role:namespace),
 *                       Y = (kind:unknown, role:indirectly-imported, [scope:X])
 *                       Z = (kind:unknown, [nameref:X.Y]) */

For import X as Y in Python I chose "namespace" for Y's kind, because X can be a module or a dictionary. "namespace" is a generalized name for them.

In Elm, X in import X as Y is a module, so Y specifies the module.
The nameref field in the comment is not implemented yet.
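
To make the table from python.c concrete, here is a minimal Python sketch that maps simple import statements to (name, kind, role) triples following that design. It is only an illustration: the function name tags_for_import is hypothetical, the regular expressions handle single-name forms only, and the not-yet-implemented nameref and scope fields are left out.

```python
import re


def tags_for_import(stmt):
    """Return (name, kind, role) triples for one simple import statement.

    role is None for definition tags. Only single-name forms are handled.
    """
    stmt = stmt.strip()
    m = re.fullmatch(r"from\s+([\w.]+)\s+import\s+(\*|\w+)(?:\s+as\s+(\w+))?", stmt)
    if m:
        module, name, alias = m.groups()
        tags = [(module, "module", "namespace")]
        if name == "*":
            return tags
        if alias:
            tags.append((name, "unknown", "indirectly-imported"))
            tags.append((alias, "unknown", None))  # newly defined name
        else:
            tags.append((name, "unknown", "imported"))
        return tags
    m = re.fullmatch(r"import\s+([\w.]+)(?:\s+as\s+(\w+))?", stmt)
    if m:
        module, alias = m.groups()
        if alias:
            return [
                (module, "module", "indirectly-imported"),
                (alias, "namespace", None),  # newly defined name
            ]
        return [(module, "module", "imported")]
    return []


print(tags_for_import("import X as Y"))
# [('X', 'module', 'indirectly-imported'), ('Y', 'namespace', None)]
```

Entries whose role is None are definition tags; everything else is a reference tag, which is why only Y and Z carry no role in the table.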

Updated kind/role design is:

import Module as Alias    
    Module = (kind: module, role:indirectly-imported),
    Alias  = (kind:module)

Let's think further ahead.

 import X exposing (Y)
    X = (kind:module, role:namespace)
    Y = (kind:unknown,role:imported)

This is similar to "from A import B" in Python. With this statement, can you use X?
If you can, the role of X should be indirectly-imported or something similar. I guess X cannot be used; it is referred to just for importing Y. Therefore I chose namespace for the role.

This is a bit complicated even for me :-P. However, if we carefully design the kinds and roles, someone can write a very interesting and powerful client tool for code navigation. Helping that person write such a tool is my dream :-) The bad dream is that that person's GitHub account turns out to be @masatake :-(

@bitterjug
Contributor Author

So, are you saying that the kind for Y in

import X as Y

should be module the same as for Z in

module Z

which introduces a module definition at the top of a file (each file defines a module in Elm)? Perhaps the only difference between them would be that ideally module Z would introduce a scope, but import X as Y would not?

  1. Within a module that contains import X as Y, you can use the name Y as a namespace to access the objects exposed by module X: foo = Y.z. Which is why I suggested it had namespace role. ( I looked at python.c source for examples). Do you still think it should be just kind:module?

  2. You also asked about import X exposing (y, z). As I said above this introduces y and z as names in the current module referring to X.y and X.z respectively. And it also introduces the name X into the current module referring to the module called X so you can refer to its exposed objects indirectly: X.foo, etc.

  3. So, in a sense, import B and import A as B have the same effect from ctags's perspective: they both create a local name B in the current module that refers to another module and may be used to access names within that other module with dot notation. The only difference is that in import A as B the B might be a new name, whereas in import B we assume there is another module which begins module B. Perhaps in the future we should capture all forms of import and mark their modules as modules?

Returning to the present. What shall I do with the Elm optlib regex for import X as Y? Are you suggesting the following?:

--regex-elm=/^(port[[:blank:]]+)?module[[:blank:]]+([[:upper:]][[:alnum:]_.]*)/\2/m,module,Module/{scope=push}{exclusive}
--regex-elm=/^import[[:blank:]]+[[:alnum:]_.]+[[:blank:]]+as[[:blank:]]+([[:alnum:]]+)/\1/m,module,Renamed imported module/{scope=clear}{exclusive}

@masatake
Member

masatake commented Jan 8, 2017

Thank you for the explanation. I understand that Z and Y are very different in that respect.

Your naming, "Renamed imported module", conveys the intent well. However, it is a bit long.
Here is my idea (just mixing your ideas):

n,namespace,Renamed imported module

(In this case namespace is not a role. It is kind.)

BTW

As far as I can remember, ctags rejects the following pattern definitions:

--regex-elm=/^(port[[:blank:]]+)?module[[:blank:]]+([[:upper:]][[:alnum:]_.]*)/\2/m,module,Module/{scope=push}{exclusive}
--regex-elm=/^import[[:blank:]]+[[:alnum:]_.]+[[:blank:]]+as[[:blank:]]+([[:alnum:]]+)/\1/m,module,Renamed imported module/{scope=clear}{exclusive}

The descriptions for m differ between the two patterns, and u-ctags doesn't allow that. You may want to define something like sub-kinds by giving different descriptions, but u-ctags has no such concept. In such a case, you should define a new kind. e-ctags allows this technique, and it looks popular; I have found it in use. But I think it is a bad idea. See #1228 .
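
The conflict described here can be checked mechanically before handing an option file to ctags. The following Python sketch is only an illustration of that idea, not how u-ctags itself validates options: the function name find_kind_conflicts is hypothetical, and the parsing assumes the /pattern/name/kind-spec/flags layout with no "/" inside the last three parts.

```python
import re

# Assumed layout: --regex-<LANG>=/pattern/name/letter,longname,description/flags
OPTION_RE = re.compile(r"^--regex-([^=]+)=/(.+)/([^/]*)/([^/]*)/([^/]*)$")


def find_kind_conflicts(options):
    """Return {(lang, letter): [descriptions]} for letters given differing descriptions."""
    seen = {}
    for opt in options:
        m = OPTION_RE.match(opt)
        if not m:
            continue
        lang, _pattern, _name, kind_spec, _flags = m.groups()
        parts = kind_spec.split(",", 2)
        if len(parts) < 3:
            continue  # no description given, so nothing to compare
        letter, _long_name, description = parts
        descriptions = seen.setdefault((lang, letter), [])
        if description not in descriptions:
            descriptions.append(description)
    return {key: descs for key, descs in seen.items() if len(descs) > 1}


options = [
    r"--regex-elm=/^(port[[:blank:]]+)?module[[:blank:]]+([[:upper:]][[:alnum:]_.]*)/\2/m,module,Module/{scope=push}{exclusive}",
    r"--regex-elm=/^import[[:blank:]]+[[:alnum:]_.]+[[:blank:]]+as[[:blank:]]+([[:alnum:]]+)/\1/m,module,Renamed imported module/{scope=clear}{exclusive}",
]
conflicts = find_kind_conflicts(options)
print(conflicts)  # {('elm', 'm'): ['Module', 'Renamed imported module']}
```

Giving the second pattern its own kind letter, as suggested with n,namespace, makes the reported conflict disappear.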

For a long time I have wondered why we have to write the same KIND LETTER,KIND LONG NAME,KIND DESCRIPTION for the
`--regex-<LANG>=' option repeatedly.

In the planned `--regex2-<LANG>=' I would like to improve this area.

--langdef=Foo
--kinddef-Foo=m,module,Module
--roledef-Foo=m:imported,imported module
--kinddef-Foo=n,namespace, Renamed imported module
--regex2-Foo=/^module[[:blank:]]+([[:alnum:]_.]*)/{name=\1,type=def,kind=m}
--regex2-Foo=/^import[[:blank:]]+([[:alnum:]_.]*)$/{name=\1,type=ref,kind=m,role=imported}
--regex2-Foo=/^import[[:blank:]]+([[:alnum:]_.]*) as ([[:alnum:]_.]*)/{name=\1,type=ref,kind=m,role=imported},{name=\2,type=def,kind=n,modref=\1}

@masatake
Member

masatake commented Jan 8, 2017

(I have to summarize this discussion somewhere under docs/. [image: kinds-roles svg])

bitterjug and others added 8 commits January 22, 2017 18:54
Add --sort=no to command line when running the test case.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
As discussed [here](universal-ctags#1260 (comment))
we ought not be capturing imports (until optlib supports reference
tags).
Though X is not captured as a reference tag, tagging Y is still
useful. A user of an editor can jump to "import X as Y" line
from where Y is referred. The user can know Y is alternative
name of X.
@coveralls

Coverage Status

Coverage increased (+0.007%) to 85.129% when pulling db6adc6 on bitterjug:elm into 9ea78ad on universal-ctags:master.

Following [discussion](universal-ctags#1260)
with @masatake we treat imported renamed modules as namespaces
with a suitable description.
@coveralls

Coverage Status

Coverage increased (+0.007%) to 85.129% when pulling 02eb917 on bitterjug:elm into 9ea78ad on universal-ctags:master.

@masatake
Member

Thank you for updating.
So the last question is: are you interested in maintaining the Elm parser as a member of the Universal-ctags organization?

As I wrote before, I'm looking for people who want to improve the optlib and regex facilities.

If you are interested, I will send an invitation. Could you accept it and merge this pull request yourself?
When you improve the Elm parser in the future, make a pull request and merge it yourself.

If not, I will merge this pull request. When you improve the Elm parser in the future, make a pull request; I or someone else in this organization may merge it.

@bitterjug
Contributor Author

Yes please. I'd be delighted to maintain the elm optlib.
And if there are changes in the world of optlib and regex, maybe we can incorporate them into the Elm optlib. I already feel like I have learned a lot about the bigger picture of ctags from this conversation.

@masatake
Member

@bitterjug, welcome to the Universal-ctags project.

@bitterjug bitterjug merged commit e2a1691 into universal-ctags:master Jan 31, 2017
@bitterjug
Contributor Author

@masatake thanks for all your help and encouragement.

@masatake
Member

masatake commented Mar 7, 2018

I am testing my prototype of a feature for recording reference tags from a parser defined with --regex-<LANG>. The first target is the Elm parser.

$ ./ctags -o - --extras=+r --fields=+rE \
    --roledef-elm=m:imported,"imported module" \
     --regex-elm='/^import *([a-zA-Z0-9]+)/\1/m/{_role=imported}' \
     --kinds-elm=m Units/simple-elm.d/input.elm
Json	Units/simple-elm.d/input.elm	/^import Json.Encode as Je$/;"	m	role:imported	extras:reference
List	Units/simple-elm.d/input.elm	/^import List$/;"	m	role:imported	extras:reference
Main	Units/simple-elm.d/input.elm	/^port module Main exposing (..)$/;"	m
Maybe	Units/simple-elm.d/input.elm	/^import Maybe exposing (withDefault)$/;"	m	role:imported	extras:reference

What I did was add a new option, --roledef-<LANG>, and a new regex flag, _role.
In the above example, "List" in "import List" is captured as a module with the "imported" role.

With this new feature, puppetParser can capture an included manifest as a reference tag.
rodjek/vim-puppet#84 (comment)

@masatake
Member

More work may be needed, but I will soon be ready to improve the Elm parser.
At the least, before working on the Elm parser I have to write the web documentation and improve the optlib2c translator to support the new option, --_roledef-<LANG>.

@bitterjug
Contributor Author

Hi @masatake
I see you've been busy here. I have looked over your notes and tried to understand what's going on. I think you added support for reference tags in regexp parsers which we talked about back in January 2017. So we're going to be able to treat importing a module as a reference to that module with a role of import or something similar.

Can you explain how ctags identifies the file that a language feature is defined in? If there is more than one thing with the same name in different files, does it somehow automatically scope it to the module file it was defined in, or would we have to use scoping rules on the module definition?

@masatake
Member

> I see you've been busy here. I have looked over your notes and tried to understand what's going on. I think you added support for reference tags in regexp parsers which we talked about back in January 2017. So we're going to be able to treat importing a module as a reference to that module with a role of import or something similar.

Yes. I think it will be called "imported".

> Can you explain how ctags identifies the file that a language feature is defined in? If there is more than one thing with the same name in different files, does it somehow automatically scope it to the module file it was defined in, or would we have to use scoping rules on the module definition?

I'm not sure I understand what you wrote correctly. If I misunderstand, could you fix my example or show an alternative one?

The answer may be no. Let's think about an imaginary language X that uses .x as its file extension.

a0.x

definePackage A:
...

a1.x

definePackage A:

A is defined in both a0.x and a1.x.

b.x

use package A;

My understanding of your question, applied to ctags parsing b.x, is "can ctags know in which
file, a0.x or a1.x, A is defined?" and/or "can ctags report a0.x or a1.x as the file where A
is defined, in the tags file format?".

ctags cannot do it.
If the processor of language X is implemented as a compiler, inspecting a build script to find out
which source files are linked together is one approach. Parsing all source files twice, and using the result of the first pass in the second pass, is another approach (as discussed in #80).

Even if we take either approach, ctags cannot produce perfect results.

@bitterjug
Contributor Author

Apologies; I was not clear enough. Your example deals with the situation where >1 file define modules with the same name. That won't actually happen in a well set up elm project. I wanted to ask a slightly different question that arises from me starting to think more about ctags working on a whole project not just on a single file.

In the context of a project I should be able to navigate to the place where a language element is defined, no matter which file it was in. So I'm thinking about the situation where file One.elm defines module One and a function foo within that module

module One exposing (..)

foo : Int
foo = 
    1

Now I import One into Two.

module Two exposing (..)

import One

bar : Int
bar = 
    One.foo

foo : Int
foo = 
    2

Using my current implementation, ctags captures:

One	One.elm	/^module One exposing (..)$/;"	m
Two	Two.elm	/^module Two exposing (..)$/;"	m
bar	Two.elm	/^bar =$/;"	f
foo	One.elm	/^foo =$/;"	f
foo	Two.elm	/^foo =$/;"	f

Which is to say it does not distinguish One.foo and Two.foo, although it knows which files they were defined in. Actually I think this answers the question I asked above, but reveals that I did not ask the correct question. Can I improve my current regex parser so that the tag names include the module? Something like:

One	One.elm	/^module One exposing (..)$/;"	m
Two	Two.elm	/^module Two exposing (..)$/;"	m
Two.bar	Two.elm	/^bar =$/;"	f
One.foo	One.elm	/^foo =$/;"	f
Two.foo	Two.elm	/^foo =$/;"	f

Should the module regexps have {scope=push}? Would that be sufficient?

@bitterjug
Contributor Author

bitterjug commented Mar 18, 2018

We do have {scope=push} on module:

--regex-elm=/^(port[[:blank:]]+)?module[[:blank:]]+([[:upper:]][[:alnum:]_.]*)/\2/m/{scope=push}{exclusive}

But it gets cleared on import, etc.

--regex-elm=/^import[[:blank:]]+[[:alnum:]_.]+[[:blank:]]+as[[:blank:]]+([[:alnum:]]+)/\1/n/{scope=clear}{exclusive}

Wow! working with tagbar gave me completely the wrong mindset. Okay, this really needs rethinking.

Mind you, one day I expect this stuff will be done by a custom parser based on the language grammar, like happens for Haskell.

@masatake
Member

From ctags's point of view, the ideal output may be:

One	One.elm	/^module One exposing (..)$/;"	m
Two	Two.elm	/^module Two exposing (..)$/;"	m
bar	Two.elm	/^bar =$/;"	f	module:Two
foo	One.elm	/^foo =$/;"	f	module:One
foo	Two.elm	/^foo =$/;"	f	module:Two

One.foo, Two.bar, and Two.foo can be derived from the above output.
I think we don't need to clear the scope at "import".
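
As a rough sketch of that derivation, the following Python snippet builds qualified names from the scope field of each tag line. It assumes tab-separated lines carrying a module:<Name> field, and that bar is scoped to module Two because it is defined in Two.elm. The helper name qualified_names is hypothetical.

```python
def qualified_names(lines, scope_kind="module"):
    """Return 'Scope.name' for each tag line carrying a '<scope_kind>:<Scope>' field."""
    prefix = scope_kind + ":"
    result = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        for field in fields[3:]:
            if field.startswith(prefix):
                result.append(field[len(prefix):] + "." + fields[0])
    return result


tag_lines = [
    'One\tOne.elm\t/^module One exposing (..)$/;"\tm',
    'Two\tTwo.elm\t/^module Two exposing (..)$/;"\tm',
    'bar\tTwo.elm\t/^bar =$/;"\tf\tmodule:Two',
    'foo\tOne.elm\t/^foo =$/;"\tf\tmodule:One',
    'foo\tTwo.elm\t/^foo =$/;"\tf\tmodule:Two',
]
names = qualified_names(tag_lines)
print(names)  # ['Two.bar', 'One.foo', 'Two.foo']
```

The module tags themselves carry no scope field, so they produce no qualified name.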

@masatake
Member

[yamato@master]~/var/ctags-github% cat input.elm
module Two exposing (..)

import One

bar : Int
bar =
    One.foo

foo : Int
foo =
    2

[yamato@master]~/var/ctags-github% ./ctags --extras=+r --fields=+sr -o - ~/var/ctags-github/input.elm
One	/home/yamato/var/ctags-github/input.elm	/^import One$/;"	m	roles:imported
Two	/home/yamato/var/ctags-github/input.elm	/^module Two exposing (..)$/;"	m	roles:def
bar	/home/yamato/var/ctags-github/input.elm	/^bar =$/;"	f	module:Two	roles:def
foo	/home/yamato/var/ctags-github/input.elm	/^foo =$/;"	f	function:Two.bar	roles:def

foo's scope is broken. bar must be popped at the end of bar's definition. However, there is no way to detect that in a line-oriented parser. See #1577.

Anyway, a role like "imported" can now be defined in an optlib parser. I will push an updated version of the optlib2c translator soon.

@masatake
Member

masatake commented Mar 20, 2018

If we can assume function definitions are never nested, an imaginary scope flag "popIfTopIs,function" may give us what you want.

Consider the following input:

module Two exposing (..)

import One

bar : Int
bar = 
    One.foo

foo : Int
foo = 
    2

With the following imaginary option,

--regex-elm=/([[:lower:]_][[:alnum:]_]*)[^=]*=$/\1/f/{scope=popIfTopIs,function}{scope=push}

ctags may work as follows:

Push "Two".
Push "bar".
Push "foo" after popping "bar".
So "bar" and "foo" can both have "Two" as their scope. (As far as I can remember, "scope=push"
implies "scope=ref".)
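
The steps above can be simulated outside ctags. The following Python sketch models a scope stack with the imaginary popIfTopIs,function behaviour. It is not ctags code: the function name tag_with_scope is hypothetical, the regular expressions are simplified Python versions of the optlib patterns, and function definitions are assumed to start in column 0.

```python
import re

MODULE_RE = re.compile(r"^(?:port\s+)?module\s+([A-Z][\w.]*)")
FUNCTION_RE = re.compile(r"^([a-z_]\w*)[^=]*=$")


def tag_with_scope(source):
    """Return (name, kind, scope) triples; scope is None at the top of the stack."""
    stack = []  # entries are (kind, name)
    tags = []
    for line in source.splitlines():
        line = line.rstrip()
        m = MODULE_RE.match(line)
        if m:
            tags.append((m.group(1), "module", None))
            stack.append(("module", m.group(1)))  # {scope=push}
            continue
        m = FUNCTION_RE.match(line)
        if m:
            # Imaginary {scope=popIfTopIs,function}: drop the previous function.
            if stack and stack[-1][0] == "function":
                stack.pop()
            # Assumes scope=push implies scope=ref: read the scope before pushing.
            scope = ".".join(name for _, name in stack) or None
            tags.append((m.group(1), "function", scope))
            stack.append(("function", m.group(1)))  # {scope=push}
    return tags


SAMPLE = """module Two exposing (..)

import One

bar : Int
bar =
    One.foo

foo : Int
foo =
    2
"""
tags = tag_with_scope(SAMPLE)
print(tags)
# [('Two', 'module', None), ('bar', 'function', 'Two'), ('foo', 'function', 'Two')]
```

Without the conditional pop, foo would be reported with scope Two.bar, which is the broken result shown earlier in this thread.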

popIfTopIs makes sense only if the target language doesn't allow defining nested functions.
Elm allows defining nested functions using let.

So I'm not sure popIfTopIs is useful for Elm.
If you request it, I will make a prototype of the popIfTopIs flag.
Even if popIfTopIs is not useful for Elm, in my experience it can be useful for writing parsers for simpler languages.

@masatake
Member

masatake commented Mar 21, 2018

I unintentionally merged the following changes without your ack.

5c9eec2
2625d89#diff-f02fe52f631042b9ac741d9c5a4d1c0f (elm.ctags)

The change to elm.ctags in the latter one should have gone into the former one. I made a mistake.

If the Elm parser in the latest HEAD doesn't work as you expect, please let me know. (Please open a new issue.)

@bitterjug
Contributor Author

I'm just catching up on this thread.
Yes Elm allows nested functions with let. But only inside a let -- they are not visible at module level so no module can export them.

  1. There's an argument for not capturing functions defined in let blocks, but it's a weak one. In a large function it would be useful to use tags to locate functions defined in the enclosing let block
  2. A function need not push a scope unless it has a let block since only let blocks introduce named entities -- oh that's just not true! Lambda functions have named parameters. Neither let blocks nor lambdas have names. A function may contain many let blocks too, including nested ones. How would we tag those?

Here's a pathological example:

foo =
    (\bar bas -> bar + bas)
        (let
            n1 = 1
         in
         n1
        ) -- bar
        (let
            n1 = 1 -- are not named
            n2 = 2
            n3 =
                (let
                    n2 = 6 -- shadows n2
                 in
                 n1 + n2 -- 1 + 6
                )
         in
         n2 + n3 -- 2 + 7
        ) -- bas

Does it even make sense to try and tag the two n1 bindings here (in unnamed sibling let blocks) or the n2 ones that shadow one another? I think not. We can't detect the difference between the let blocks which are siblings at the same level and the ones that are nested. That is a matter of white space which we're never going to adequately track using optlib. So I think the Elm optlib should aim at being less ambitious. I think it can still be useful. At the moment, however, I'm not yet sure what to aim for.
Because we're not seeing those indentation levels, we don't distinguish nested and sibling function definitions either (except at the top level where we can match exactly no leading white space) so I don't think popIfTopIs is going to be useful for Elm.
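
To illustrate the "exactly no leading white space" distinction, here is a small Python sketch that separates top-level definitions from indented ones. It is a line-oriented heuristic with the same blind spots as the optlib approach (it cannot tell sibling let blocks from nested ones, and operators such as <= in an expression would confuse it). The name classify_definitions and the keyword list are assumptions made for this example.

```python
import re

# A lowercase name, then anything without '=', then a single '=' (not '==').
DEF_RE = re.compile(r"^([ \t]*)([a-z_]\w*)\b[^=]*=(?!=)")
KEYWORDS = {"type", "if", "then", "else", "case", "of", "let", "in",
            "module", "import", "port"}


def classify_definitions(source):
    """Return (top_level, indented) lists of names that look like definitions."""
    top_level, indented = [], []
    for line in source.splitlines():
        m = DEF_RE.match(line)
        if not m or m.group(2) in KEYWORDS:
            continue
        # Leading whitespace is the only information we have about nesting.
        (indented if m.group(1) else top_level).append(m.group(2))
    return top_level, indented


SAMPLE = """module Two exposing (..)

foo : Int
foo =
    let
        n1 = 1
    in
    n1
"""
top_level, indented = classify_definitions(SAMPLE)
print(top_level)  # ['foo']
print(indented)   # ['n1']
```

Only the top-level list is reliable enough to tag with a module scope; the indented names are known to be nested somewhere, but not where.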

@masatake
Member

I read your comment. For now I would like to ignore nested functions and anonymous functions.

I would like to focus on the following input you showed:

module Two exposing (..)

import One

bar : Int
bar = 
    One.foo

foo : Int
foo = 
    2

When thinking about scope field, what we want to get for the input is (with --fields=+K option):

Two	input.elm	/^module Two exposing (..)$/;"	module
bar	input.elm	/^bar =$/;"	function	module:Two
foo	input.elm	/^foo =$/;"	function	module:Two

Am I correct?
(Capturing Two.bar and/or Two.foo would be nice, but it is an advanced topic.
I think I can ignore "import One" here. It is about reference tags, whereas here I am focusing on
the scope field of tag entries.
)

Currently, what we get is:

Two	input.elm	/^module Two exposing (..)$/;"	module
bar	input.elm	/^bar =$/;"	function
foo	input.elm	/^foo =$/;"	function

No scope information is attached to "bar" and "foo".
It looks like a bug to me.

@bitterjug
Contributor Author

Hi! Thanks for this.
Since writing my last comment, I have been thinking about the same approach: focusing on (and properly setting the scope of) language elements that can be exported and used in other modules. Because that's the big win for ctags. Capturing things that can only be used in the same module is a nice-to-have.

In the case of Elm functions that means only the ones defined at the top level of the module (no indentation). And we could get the correct scope for those using popIfTopIs. Let's fix this before we look into ref tags.

@masatake
Member

If we only have to think about a module and the functions under that module, I think popIfTopIs is not needed.
(popIfTopIs is not implemented yet.) What we need is only scope=set (when finding a module) and
scope=ref (when finding a function). Am I wrong?
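
As a rough illustration of that idea, here is a sketch in Python rather than in ctags. The regular expressions are approximations, and tag_with_scope, MODULE, FUNCTION and SOURCE are hypothetical names. It models scope=set as "remember the module name" and scope=ref as "attach the remembered name to the tag":

```python
import re

# Assumption: simplified Python versions of a module pattern and a
# column-0 function pattern; they are not the real optlib regexes.
MODULE = re.compile(r"^(?:port\s+)?module\s+([A-Z][A-Za-z0-9_.]*)")
FUNCTION = re.compile(r"^([a-z_][A-Za-z0-9_]*)[^=]*=\s*$")


def tag_with_scope(source):
    """Return (name, kind, scope) tuples; scope is None when none is set."""
    current_scope = None  # what a scope=set match would install
    tags = []
    for line in source.splitlines():
        module = MODULE.match(line)
        if module:
            tags.append((module.group(1), "module", None))
            current_scope = module.group(1)  # behaves like {scope=set}
            continue
        function = FUNCTION.match(line)
        if function:
            # behaves like {scope=ref}: read the current scope, do not change it
            tags.append((function.group(1), "function", current_scope))
    return tags


SOURCE = """\
module Two exposing (..)

import One

bar : Int
bar =
    One.foo

foo : Int
foo =
    2
"""

for name, kind, scope in tag_with_scope(SOURCE):
    print(name, kind, scope)
```

No stack is involved: with exactly one module per file, a single remembered name is enough, so nothing ever needs to be popped.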

@bitterjug
Contributor Author

bitterjug commented Mar 31, 2018 via email

@masatake
Member

O.k.

I have one more question.

Do you want full querified tag output like:

Two	input.elm	/^module Two exposing (..)$/;"	module
bar	input.elm	/^bar =$/;"	function	module:Two
Two.bar	input.elm	/^bar =$/;"	function	module:Two
foo	input.elm	/^foo =$/;"	function	module:Two
Two.foo	input.elm	/^foo =$/;"	function	module:Two

?

I have code for generating automatic full querified tags. I wonder whether I should add the code to the repository or not.

@bitterjug
Contributor Author

Now that ctags supports elm, I get the following errors when trying to use my development version of elm.ctags:

[:~/tmp] $ ctags --options=elm.ctags -f - Source.elm
ctags: Warning: Language "elm" already defined
ctags: Warning: the kind for letter `m' specified in "--kinddef-elm" option is already defined.
ctags: Warning: the kind for letter `n' specified in "--kinddef-elm" option is already defined.
ctags: Warning: the kind for letter `p' specified in "--kinddef-elm" option is already defined.
ctags: Warning: the kind for letter `t' specified in "--kinddef-elm" option is already defined.
ctags: Warning: the kind for letter `c' specified in "--kinddef-elm" option is already defined.
ctags: Warning: the kind for letter `a' specified in "--kinddef-elm" option is already defined.
ctags: Warning: the kind for letter `f' specified in "--kinddef-elm" option is already defined.
ctags: Warning: the role for name `imported' specified in "--_roledef-elm" option is already defined.

Is there a way to override built-in language settings, or should I be building my changes with make each time?

@bitterjug
Contributor Author

Do you want full querified tag output like:

Two	input.elm	/^module Two exposing (..)$/;"	module
bar	input.elm	/^bar =$/;"	function	module:Two
Two.bar	input.elm	/^bar =$/;"	function	module:Two
foo	input.elm	/^foo =$/;"	function	module:Two
Two.foo	input.elm	/^foo =$/;"	function	module:Two

I'm sorry @masatake, I don't understand your question. I guess you mean "fully qualified" output. And from the example it looks like we're getting two matches for each function, one with just the function name and the other with Module.function.

I guess the question is which do I need in order to be able to navigate from a tag use to its definition if multiple modules are allowed to define top level functions with the same name?

Using the source from above:

module Two exposing (..)

import One

bar : Int
bar = 
    One.foo

foo : Int
foo = 
    2

This config:

--langdef=elm
--map-elm=+.elm

--kinddef-elm=m,module,Module
--kinddef-elm=n,namespace,Renamed Imported Module
--kinddef-elm=p,port,Port
--kinddef-elm=t,type,Type Definition
--kinddef-elm=c,constructor,Type Constructor
--kinddef-elm=a,alias,Type Alias
--kinddef-elm=f,function,Functions
--_roledef-elm=m.imported,imported module

--regex-elm=/^(port[[:blank:]]+)?module[[:blank:]]+([[:upper:]][[:alnum:]_.]*)/\2/m/{scope=set}{exclusive}
--regex-elm=/^import[[:blank:]]+[[:alnum:]_.]+[[:blank:]]+as[[:blank:]]+([[:alnum:]]+)/\1/n/{exclusive}
{_role=imported}
--regex-elm=/^type +([[:upper:]][[:alnum:]_]*.*)/\1/t/{scope=ref}{exclusive}
--regex-elm=/^type[[:blank:]]+alias[[:blank:]]+([[:upper:]][[:alnum:]_]*[[:blank:][:alnum:]_]*)/\1/a/{scope=ref}{exclusive}
--regex-elm=/^([[:lower:]_][[:alnum:]_]*)[^=]*=$/\1/f/{scope=ref}

Produces the following:

Two     Test.elm        /^module Two exposing (..)$/;"  m
bar     Test.elm        /^bar =$/;"     f       module:Two
foo     Test.elm        /^foo =$/;"     f       module:Two

Trying to navigate with that I see that it doesn't distinguish One.foo from Two.foo. So maybe I do want fully qualified tags. What do other languages do?

@masatake
Member

masatake commented Apr 1, 2018

If you can rebuild the ctags binary quickly, the cycle "edit optlib/elm.ctags and run make" is the best.
Translating elm.ctags to elm.c and compiling elm.c does not take long.

If you don't want to do so for some reason, disable the built-in elm parser and use a temporary name for your new code. Let me show an example.

Here we will use elmX as the temporary name.

elmX.ctags:

# Disable elm parser.
--languages=-elm

# Define elmX parser.

--langdef=elmX
--map-elmX=+.elm

--kinddef-elmX=m,module,Module
--kinddef-elmX=n,namespace,Renamed Imported Module
--kinddef-elmX=p,port,Port
...

@masatake
Member

masatake commented Apr 1, 2018

I'm sorry @masatake, I don't understand your question. I guess you mean "fully qualified" output.
I'm very sorry. It was a typo. You are correct. What I meant to say is "fully qualified."
I will call it FQ.

You understood well what I wanted to say. Thank you.

...

Trying to navigate with that I see that it doesn't distinguish One.foo from Two.foo. So maybe I do want fully qualified tags. What do other languages do?

Some of the crafted parsers (those written manually in C) emit fully qualified tags by themselves.
The Cxx parser and the JavaScript parser do so.

Some of the crafted parsers use the AutomaticFQTag feature I developed to emit FQ output.
Such parsers are responsible for filling in the scope field of each tag properly.
The AutomaticFQTag feature generates FQ output from the information put in the scope fields.
The Python parser does so. I wrote about this feature in http://docs.ctags.io/en/latest/internal.html?highlight=cork#output-tag-stream .

No optlib parser emits FQ tags because there is no way to do so.
I'm thinking about providing the AutomaticFQTag feature to optlib parsers.
The notation will be quite simple: --langdef=FOO{autoFQTag}. That's all.

However, the feature has a weakness: you cannot tune its behavior.
For example, you cannot choose the separator between two names.
As far as I can remember, php uses '\' as the separator for combining namespaces.
On the other hand, it uses . for combining other things.
At the C language level, the AutomaticFQTag feature provides a way to choose a separator.
However, it is a bit complicated, so I don't want to implement it for optlib now.

Another example is about kinds of language objects.
From this discussion, I learned that elm uses Module.function notation.
How about Module.type? The AutomaticFQTag feature refers only to scope information.
If a tag has scope information, the AutomaticFQTag feature uses it regardless of the kind.

--regex-elm=/^type +([[:upper:]][[:alnum:]_]*.*)/\1/t/{scope=ref}{exclusive}

This pattern causes the AutomaticFQTag feature to emit Module.type FQ tags.

If you can accept this weakness, or don't care about it, I will work on the autoFQtag lang flag.
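
The idea of deriving FQ tags purely from scope fields can be sketched as follows. This is plain Python, not the ctags implementation; add_fq_tags, TAGS and the separator parameter are hypothetical names used only for illustration:

```python
def add_fq_tags(tags, separator="."):
    """Return tags plus one extra FQ entry for every tag that has a scope.

    tags is a list of (name, kind, scope) tuples. Only the scope is
    consulted; the kind of the tag plays no part in the decision.
    """
    result = []
    for name, kind, scope in tags:
        result.append((name, kind, scope))
        if scope is not None:
            # Combine scope and name; the separator is the tuning knob
            # that an optlib flag without arguments would not expose.
            result.append((scope + separator + name, kind, scope))
    return result


# Hypothetical input mirroring the Two/bar/foo example in this thread.
TAGS = [
    ("Two", "module", None),
    ("bar", "function", "Two"),
    ("foo", "function", "Two"),
]

for name, kind, scope in add_fq_tags(TAGS):
    print(name, kind, scope)
```

Because only the scope is checked, a type tag carrying scope Two would receive a Two.Name entry in exactly the same way as a function, which is the behavior described above.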

@bitterjug
Contributor Author

Thanks again @masatake. I think I understand AutoFQTag now.

Before I answer this I want to sketch out some context. When I first wrote ctags options for Elm I wanted to use tagbar, which shows tags in the current file, ordered by scope. I was used to using it with Python to get an overview of a file's contents. And Elm files are sometimes large. These days I use Elm at work and, increasingly, we're splitting our work into smaller files with fewer types and functions in them. As a result, I'm becoming more interested in being able to navigate to the file where a thing is defined. Tagbar gives an intra-file overview. But tags also offer useful inter-file navigation.

Inter-file tags in Elm

In Elm you can import a module (and optionally alias it) and then refer to Module.function or Module.type (or Alias.function, etc.). You can also import selected names into scope from a module and use them unqualified. And for all these -- where we're tracking things between modules/files -- it would make sense for the tag to be Module.name.

It is common for Elm modules to define things with the same name. Many modules will contain a type called Model and functions called view and update. So it would be very useful if these were tagged as Module.Model, Module.view, Module.update, etc. When using tags to navigate to the definition of a view function under the cursor, the Vim user is going to have to resolve the ambiguity somehow. Vim itself recommends using completion, and fuzzy find commands also exist.

Intra-file tags in Elm

Each file contains exactly one module; and that may contain ports, types and functions. Function definitions can contain other function definitions inside anonymous let blocks, which limit their scope. Nested functions can be called only from within their scope; there's no way to export them, or to refer to them using dot notation.

There is another kind of language artefact that I sometimes think it would be nice to see in Tagbar. Frequently functions contain large case statements that branch on the different cases of a (tagged) union type. I experimented with capturing those as tags but the result was never satisfactory. Although it is common to have one such case statement in a function, sometimes there may be many and even nested ones. And since the case statements themselves do not have names they do not really define useful scopes.

So perhaps it's not really very useful to try and capture these with tags. I took a quick look at what happens in Haskell which has a very similar language structure. There are several solutions including one built in to GHC itself. Lushtags targets Tagbar specifically. According to its tagbar config it does not use functions as scopes. I can't tell without installing it if nested function definitions get tagged at all.

I think Automatic FQ tagging would be a useful addition to an Elm optlib. (And one day Elm might gain native tagging similar to those available for Haskell)

@masatake
Member

masatake commented Apr 1, 2018

Thank you. Impressive. Distinguishing inter-file tags from intra-file tags is vital in designing a parser. I had recognized the distinction implicitly, but I should recognize it explicitly and write it down in ctags-optlib.7.rst.in. I will borrow your sentences from this comment when updating ctags-optlib.7.rst.in. I would like you to review the man page once I have updated it.

Inter-file tags

Capturing inter-file tags has higher priority. While reading your comment, I found Module.Module. I wonder whether ctags can handle nested modules well. I assumed using "scope=set" for capturing a module. However, if a module has another module in its scope, "scope=push" may be needed. In such a case, I have to use "scope=pop" somewhere.

Intra-file tags

I would like to see an example of "case statements".
What I found in a result of web search is:

   case number of
    1 -> "one"
    2 -> "two"
    3 -> "three"
    4 -> "four"
    _ -> "many"

I wonder what kind of tags people want.

number foo.elm /^case number of$/;" kind:variable role:caseSelector
1 foo.elm /^  1 -> "one"$/;" kind:constant role:case
2 ...
_ foo.elm /^  _ -> "many"$/;" kind:??? role:caseDefault

If --extras=+q is given, ctags adds number.1, number.2, ... number._ as fully qualified tags to the tags output. In my experience, these tags may help people walk(?) through the source code.

I think Automatic FQ tagging would be a useful addition to an Elm optlib. (And one day Elm might gain native tagging similar to those available for Haskell)

I see. I will add the autoFQtag flag to the --langdef option.

Oh, sorry, we should focus on inter-file tags.
I will (1) implement autoFQtag and (2) update the man page.
I am not sure that ctags can handle nested modules well.
Currently, I'm working hard on solving #1577. I will work on autoFQtag after solving #1577.
BTW, have you ever considered using the mtable regex meta parser?
http://docs.ctags.io/en/latest/optlib.html#byte-oriented-pattern-matching-with-multiple-regex-tables
It is the last resort for people who do not want to write a parser in C.
