Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kotlin psi structure for FIM. #829

Merged
merged 4 commits into from
Jan 29, 2025

Conversation

aaudin90
Copy link
Contributor

I present the concept of collecting a class structure for dependencies. The structure analysis is currently implemented only for Kotlin and works using PSI tree analysis, without using the Kotlin compiler.
This approach increases the accuracy of prediction for auto-completion of the code.

It will also be useful to use the dependency structure collection when adding code to the chat, for more accurate answers.

For example, a simple structure of 3 classes is used.

package org.example.package1

import org.example.package2.ClassInPackage2

class ClassInPackage1 {

    fun someMethod1(classInPackage2: ClassInPackage2): String = classInPackage2.someMethod2()
}
package org.example.package2

class ClassInPackage2 {

    fun someMethod2(): String = "Hello from $this"
}

package org.example

import org.example.package1.ClassInPackage1
import org.example.package2.ClassInPackage2

class Main(
    private val classInPackage1:ClassInPackage1,
    private val classInPackage2:ClassInPackage2,
){

    fun main() {
        // autocomplete here
        println(classInPackage1.someMethod1(classInPackage2))
    }
}

As a result of the code analysis, the FIM request is transformed into the following format:


<|repo_name|>untitled

<|file_sep|>org.example.Main
package org.example

class Main(private val classInPackage1: org.example.package1.ClassInPackage1, private val classInPackage2: org.example.package2.ClassInPackage2) {
    fun main(): TypeUnknown
}

<|file_sep|>org.example.package1.ClassInPackage1
package org.example.package1

class ClassInPackage1 {
    fun someMethod1( classInPackage2: org.example.package2.ClassInPackage2): String
}

<|file_sep|>org.example.package2.ClassInPackage2
package org.example.package2

class ClassInPackage2 {
    fun someMethod2(): String
}
null<|fim_prefix|> package org.example

import org.example.package1.ClassInPackage1
import org.example.package2.ClassInPackage2

class Main(
    private val classInPackage1:ClassInPackage1,
    private val classInPackage2:ClassInPackage2,
){

    fun main() {
         <|fim_suffix|>
        println(classInPackage1.someMethod1(classInPackage2))
    }
} <|fim_middle|>

@carlrobertoh
Copy link
Owner

This is cool, many thanks! I'll take a closer look sometime this week.

@aaudin90
Copy link
Contributor Author

This is cool, many thanks! I'll take a closer look sometime this week.

Haven't been able to watch the Merge request yet?

@carlrobertoh
Copy link
Owner

Sorry, not yet. I'll release the pending stuff first and then take a look.

Copy link
Owner

@carlrobertoh carlrobertoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I like the fact that we're taking advantage of IntelliJ's PSI tree to extract important information. However, as you can see, it can get quite ugly quickly because you have to analyze the elements/modifiers separately and then put them all back together.

I think treesitter would provide much help in scenarios where you need to query for methods, constructors, imports, package declarations, etc.

A while ago, I did a POC for a similar feature but didn't proceed with it (can't remember why), but it can give you an idea of what I'm talking about: 584a1cc#diff-cc18dd78aeb55a5c77b6baed61b6ee1e560828ab872c7e4dbcca94936a8204ecR75

@@ -142,6 +142,8 @@ configurationConfigurable.section.codeCompletion.multiLineCompletions.descriptio
configurationConfigurable.section.codeCompletion.postProcess.title=Enable tree-sitter post-processing
configurationConfigurable.section.codeCompletion.postProcess.description=If checked, the completion will be post-processed using the tree-sitter parser.
configurationConfigurable.section.codeCompletion.gitDiff.title=Enable git diff context
configurationConfigurable.section.codeCompletion.collectDependencyStructure.title=Collect the dependencies of the analyzed file.
configurationConfigurable.section.codeCompletion.collectDependencyStructure.description=Enabling the setting allows the plugin to collect the dependency structure, which increases the accuracy of the proposed data, but consumes more tokens per request.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a small note somewhere that this logic only applies to Java and Kotlin sources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, of course! I will point out that only Kotlin is currently supported.


class ClassStructureSerializer {

fun serialize(classStructure: ClassStructure): String =
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a short JavaDoc what we're serializing here, or what's the end result?

import ee.carlrobert.codegpt.codecompletions.psi.structure.models.*
import org.jetbrains.kotlin.utils.addToStdlib.ifNotEmpty

class ClassStructureSerializer {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be singleton - object ClassStructureSerializer {}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, yes, but when multiple languages are supported, it won't make much sense to keep all the parsers in memory.
It seems that we will have to make a parser provider for each language that is included in the structure analysis.

@aaudin90
Copy link
Contributor Author

In general, I like the fact that we're taking advantage of IntelliJ's PSI tree to extract important information. However, as you can see, it can get quite ugly quickly because you have to analyze the elements/modifiers separately and then put them all back together.

I think treesitter would provide much help in scenarios where you need to query for methods, constructors, imports, package declarations, etc.

A while ago, I did a POC for a similar feature but didn't proceed with it (can't remember why), but it can give you an idea of what I'm talking about: 584a1cc#diff-cc18dd78aeb55a5c77b6baed61b6ee1e560828ab872c7e4dbcca94936a8204ecR75

I looked at the possibility of analyzing using this library. Ultimately, we will have the same limitations as in the current parsing implementation.
We can say that static analysis is currently being used, which cannot give an accurate code structure for such a complex language as Kotlin, there is no information about the field type in this code:

    companion object {
        val myField = "123"
    }

If you like this approach, I plan to develop Kotlin analysis by connecting a compiler, which will make it possible to output types of complex constructions and generics.
As well as support Java, a cross-project with Java-Kotin, Kotlin-Java, and Python dependencies.

Looking ahead, I have colleagues who can advise me on working with Go, TypeScript, PhP, and JS languages.

@aaudin90
Copy link
Contributor Author

One more question, I wanted to add an analysis of the file structure when working in a chat. Should I do this through the setting or enable it by default?

In the next pull request.

@carlrobertoh
Copy link
Owner

I looked at the possibility of analyzing using this library. Ultimately, we will have the same limitations as in the current parsing implementation. We can say that static analysis is currently being used, which cannot give an accurate code structure for such a complex language as Kotlin, there is no information about the field type in this code:

    companion object {
        val myField = "123"
    }

Not sure I follow. With proper Kotlin grammar rules, I believe it's possible to extract the necessary types and declarations. However, to correctly resolve declarations for specific references, you need to index the entire codebase first. I believe PSI trees provide an advantage over that, as IntelliJ handles this indexing automatically.

For example, you can explore Kotlin's AST structure using the Tree-sitter playground: https://fwcd.github.io/tree-sitter-kotlin/

If you like this approach, I plan to develop Kotlin analysis by connecting a compiler, which will make it possible to output types of complex constructions and generics. As well as support Java, a cross-project with Java-Kotin, Kotlin-Java, and Python dependencies.

Looking ahead, I have colleagues who can advise me on working with Go, TypeScript, PhP, and JS languages.

Yes, it looks good in general. If we have a clean interface for code analysis and post-processing/serialization, then we can easily replace the underlying logic with something else in later phases (if needed).

@carlrobertoh
Copy link
Owner

One more question, I wanted to add an analysis of the file structure when working in a chat. Should I do this through the setting or enable it by default?

In the next pull request.

Hmm, how about creating a new chat action tag, similar to how "Include Open Files" is done?

@aaudin90
Copy link
Contributor Author

One more question, I wanted to add an analysis of the file structure when working in a chat. Should I do this through the setting or enable it by default?
In the next pull request.

Hmm, how about creating a new chat action tag, similar to how "Include Open Files" is done?

I think it's a good idea.

Do I need any more actions for merge pull request?

@carlrobertoh
Copy link
Owner

Do I need any more actions for merge pull request?

Are you able to merge the PR? If not, then I might need to configure something.

@aaudin90
Copy link
Contributor Author

Do I need any more actions for merge pull request?

Are you able to merge the PR? If not, then I might need to configure something.

I can't

@carlrobertoh carlrobertoh merged commit 72f31bf into carlrobertoh:master Jan 29, 2025
2 checks passed
@carlrobertoh
Copy link
Owner

Do I need any more actions for merge pull request?

Are you able to merge the PR? If not, then I might need to configure something.

I can't

I'll take a look at it later.

carlrobertoh pushed a commit that referenced this pull request Feb 3, 2025
* Add Kotlin dependency and configuration file

* Add collect dependency structure option to code completion settings

* Add Kotlin PSI structure analysis and serialization components

* Add repository name and dependencies structure to InfillRequest and update prompt template for QWEN_2_5

---------

Co-authored-by: a.iudin <a.iudin@vk.team>
carlrobertoh pushed a commit that referenced this pull request Feb 3, 2025
* Add Kotlin dependency and configuration file

* Add collect dependency structure option to code completion settings

* Add Kotlin PSI structure analysis and serialization components

* Add repository name and dependencies structure to InfillRequest and update prompt template for QWEN_2_5

---------

Co-authored-by: a.iudin <a.iudin@vk.team>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants