Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interceptors open issues for C# 13 #7835

Merged
merged 24 commits into from
May 14, 2024
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
152 changes: 152 additions & 0 deletions proposals/interceptors-issues-2024-01.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,152 @@
# Open issues for interceptors

jaredpar marked this conversation as resolved.
Show resolved Hide resolved
The following issues were identified during the interceptors preview, and should be addressed before the feature is transitioned to stable.

Further down there is a "Future" section containing issues which do not block the move to stable, but could be addressed in the longer term.

## File path portability

One of our design goals with interceptors was for them to conform to the "portability" requirement of source generators. That means we should be able to run the source generators for a project containing interceptors, commit the results to disk, then remove the source generators from the project and be able to build in a variety of environments. Note that this is different from *determinism*--we don't mind if the resulting assembly differs in things like layout, etc. in each build environment, but want it to be the case that if an interceptor declaration functions in one environment, then it also functions in a different environment, assuming all the sources and dependencies are fully available in each environment.

However, we currently require absolute paths for `InterceptsLocationAttribute`, with pathmap substitution performed if present. This doesn't work between local builds on different paths, or across local builds and CI, because only CI will actually pass `/pathmap` during the build. The local build prefers to preserve local paths (e.g. in the PDB) so that the local versions of the sources can be discovered easily during debugging. The different environments will have different answers for what they expect the paths to be, and the InterceptsLocation will have errors in environments which differ from the "generation-time" environment.

Having looked at several strategies to address here, including adjusting when the build system passes `/pathmap` to the compiler, it seems the most straightforward way is to permit relative paths. This is a strategy which is already proven out by `#line` directives.
RikkiGibson marked this conversation as resolved.
Show resolved Hide resolved

When a relative path appears in an `[InterceptsLocation]`, we will resolve it relative to the path of the containing file, just as we would for `#line`. We recommend that relative paths appearing in source use `/` directory separators for portability.

```cs
// /project/src/file1.cs
class Original
{
static void Interceptable() { }

void M()
{
Interceptable();
}
}

// /project/obj/generated/file2.g.cs
class Interceptors
{
[InterceptsLocation("../../src/file1.cs", ...)] // resolves to "/project/src/file1.cs"
public static void Interceptor() { }
}
```

Since we want generated files to refer relatively to "user" files, we now need to define the paths of generated files. Also, for source generators to be able to insert the correct relative paths into the generated source, a source generator needs to know what the file path will be before the file is actually added to the compilation. We may want to add new public API like the following for this:

```diff
namespace Microsoft.CodeAnalysis;

public readonly struct SourceProductionContext
{
public void AddSource(string hintName, string source);
+ public string GetFilePath(string hintName);
jaredpar marked this conversation as resolved.
Show resolved Hide resolved
+ public string GetInterceptsFilePath(SyntaxNode intercepted, string hintName);
RikkiGibson marked this conversation as resolved.
Show resolved Hide resolved
}
RikkiGibson marked this conversation as resolved.
Show resolved Hide resolved
```

`GetFilePath()` will return the absolute file path which the generated file *would be written to* if `$(EmitCompilerGeneratedFiles)` is true in the project. Additionally, the syntax tree for the file added by a subsequent call to `AddSource` using the same `hintName` will have a `FilePath` which exactly matches the result of `GetFilePath()`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is assuming that as a generator author my AddSource calls look like this:

context.AddSource(context.GetFilePath("GeneratedRouteHandlers.g.cs"), mySourceText)

If so, can we make it implicit so that I can just write:

context.AddSource("GeneratedRouteHandlers.g.cs", mySourceText)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes my intent here matches what you're saying. AddSource will always take a hintName, not an absolute path. Also, it's feeling unclear to me now if GetFilePath() API is worth it, if we also have to add GetInterceptsFilePath() or similar API anyway.


`GetInterceptsFilePath(intercepted, hintName)` will return a relative file path equivalent to the path returned by `GetFilePath(hintName)` relative to `intercepted.SyntaxTree.FilePath`. We think this API will be helpful for generator authors, in significant part, because no built-in API exists on netstandard2.0 to get a relative path.

This solution will not work when the intercepted call and the interceptor method are in completely "disjunct" parts of the file system, for example, if one of the source files is included through a NuGet package which is outside the project folder, and the other is within the project folder. This limitation is similar in nature to what already exists for `#line` directives.

## Need for interceptor semantic info in IDE

Interceptors have a problematic interaction with the ILLinker analyzer, which is responsible for determining if code may be Native AOT-incompatible.

What happens is, the user calls an "original" method which is marked `[RequiresUnreferencedCode]` (i.e., incompatible with NativeAOT). A source generator adds an interceptor for the call, which *is* compatible with NativeAOT. But this is not reflected in the symbol info for the call in the Roslyn APIs.

We ended up working around this in .NET 8 by having the interceptor author also ship a *diagnostic suppressor* for the warnings produced by the ILLinker analyzer, when it knows the call will be intercepted.

```cs
routes.MapGet("/products/", () => { ... }); // ASP.NET suppressor is suppressing warning: MapGet is not NativeAOT-compatible
```

This is not ideal as it essentially requires interceptor authors to know the meaning of all diagnostics reported by analyzers whose results would be influenced by knowing the interceptor method in use at a call site. The interceptor authors must also know whether suppression is appropriate based on what the interceptor and the analyzer are each doing. For example, if the interceptor were inserting a method which *also* `[RequiresUnreferencedCode]`, it would be bad to suppress the ILLinker analyzer warning.

Ideally if an analyzer needs to know if an interceptor is being used, it could simply make a public API call to ask. This adds an extra decision that component authors need to make. For some use cases, such as rename-refactoring, it doesn't make sense to rename the interceptor method--instead the original method is what needs to be renamed. We think that in most cases, the original method is what should be analyzed, but it should still be noted as a pit that people may fall into.

That public API could look something like the following:

```diff
namespace Microsoft.CodeAnalysis;

public static class CSharpExtensions
{
+ public static IMethodSymbol? GetInterceptorMethod(this SemanticModel model, InvocationExpressionSyntax invocation, CancellationToken cancellationToken = default);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some feedback I got includes:

  1. What about IOperation? Do we have to get the .Syntax out of the operation node and cast it? Or can we just add a property to IInvocationOperation?
  2. What about future additional member kinds? Will we add additional public API for additional kinds?

}
```

Note that if we do something like this, there will be a deep need to discover interceptors, and incrementally update compilations containing interceptors, with as little binding work as possible. It will be non-tenable perf-wise if, for example, we had to bind all the attributes in a compilation in order to decide what the interceptor for a given invocation is. We may need to store some additional information in the declaration tree, which can be incrementally updated, in order to address this.

## Future (possibly post-C# 13)

### Brittleness of line/column approaches

While we want interceptor info to be readily available to components in the IDE, we also want the IDE to be able to provide the best possible function while not running source generators on every "keystroke plus delay". Even after our work on incremental source generators, they are still quite expensive to run, and we will need a strategy such as "regenerate on save" or "regenerate on build" in order to claw back IDE performance.

The desire for less-frequent source generation is at odds with the desire for interceptor info in IDE components, as interceptors are currently designed.

For example, we want the ILLinker analyzer's results to be as useful as possible, and ideally to not include *spurious transient warnings* while the user is making edits. For example, if a user inserts some white spaces in a method body which contains intercepted calls, we don't want the ILLinker analyzer to report that some non-intercepted call is using an unsupported method. But this requires source generators to run. This *strongly* pushes the ILLinker analyzer to run *only as frequently as interceptor source generators do*.

This would be an undesirable degradation in the user experience. Instead of being able to give users confidence that their mistakes "Native AOT-wise" will be caught very quickly by the tooling and shown to them as they type, they instead are looking at longer delays, or even not being told something is wrong until they save or hit build.
RikkiGibson marked this conversation as resolved.
Show resolved Hide resolved

#### Opaque identifier and "syntax path" approach

We could consider replacing the line/column approach of identifying a location in a source text, with one slightly more amenable to modification of the text: a "key path" of sorts which indicates the traversal to be made through the syntax tree to arrive at the desired location. e.g. for

```cs
class C
{
void M0() { }

void M()
{
Call();
}
}
```

e.g. in the above, we might identify the invocation `Call` by saying:
- it's in the first member of the compilation unit (`class C`)
- it's in the second member of `class C`
- it's in the first statement of method `M()`
- it's in the expression of expression statement `Call();`
- it's in the receiver of invocation expression `Call()`.

This sequence could be encoded somehow and stuffed into the attribute. This would make it so that edits in other method bodies etc. will not necessarily require regenerating the interceptor. However, it's not clear how source generators, or the generator driver, could make use of this information in order to reduce the frequency of rerunning generators. Also, public API would likely be needed for this in order for generators to create the expected InterceptsLocationAttribute argument.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another potential is we could both

  1. Largely take the recommendations here about using relative file paths
  2. Declare that the identifier passed to InterceptsLocation is opaque

Essentially tell customers

The value passed into InterceptsLocation is opaque. The current implementation is an encoding of relative paths + line column but that could change at any time. Conforming generators should not depend on this encoding. Instead all arguments to InterceptsLocation should be generated by calling GetInterceptsLocation(syntaxNode, hintName).

That would require using the API I suggested above but taking a SyntaxNode (thus has line / column + path) instead of SyntaxTree.

I'm not saying we should do this but it is an item to consider.


#### "Handle all"

Some SGs might want to mitigate the need to rerun by simply "blanket handling" all calls to a certain set of methods in a certain context (e.g. an entire source file) at design time. Then during command line build we can actually generate granular calls.

```cs
public static void Usage()
{
Original(42); // intercepted
Original(42, 43); // NOT intercepted, no diagnostic reported
}

public static void Original(int item) { }
public static void Original(int item, int item2) { }

[InterceptsLocation("path/to/file.cs", memberName: "Original")]
public static void Interceptor(int item) { }
```

There is a risk with this approach that somehow a method would end up being intercepted with the "handle all" approach but not with the granular interceptors which are actually emitted.
TODO: how to find misuse and help generator authors fix? maybe if *nothing* is intercepted by it?
TODO: when we f12 on SG'd code, do we go to implementation code, even if SG is publishing "definition-only" versions of the sources?

### Intercepting more member kinds

Currently the interceptors feature only supports intercepting ordinary methods. Much like with partial methods and the upcoming partial properties, users want to be able to intercept usages of more kinds of members, such as properties and constructors.

For all the member kinds we want to support, we will need to decide how usages of each member kind are denoted syntactically. Where possible, we should try to be consistent with [UnsafeAccessorAttribute](https://learn.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.unsafeaccessorattribute?view=net-8.0), which also works by referencing members in attribute arguments.

For method invocations like `receiver.M(args)`, we use the location of the `M` token to intercept the call. Perhaps for property accesses like `receiver.P` we can use location of the `P` token. For constructors should use the location of the `new` token.

jaredpar marked this conversation as resolved.
Show resolved Hide resolved
For member usages such as user-defined implicit conversions which do not have any specific corresponding syntax, we expect that interception will never be possible.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should a section on dotnet/roslyn#68218 be included somewhere in the doc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and it will also be useful to get a sense of how urgent it is to get it done in C# 13.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, it feels like the real kicker here is not just that we want generics support (through something more sophisticated like a unification approach), but that we actually want the interceptor to be able to have a more specific signature than the original method. e.g. in this case where in order for type parameters to propagate through the system, the interceptor parameter needs to be Func<TypeParam> even though the original parameter type is Delegate.

32 changes: 32 additions & 0 deletions proposals/interceptors-mvp.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
Interceptors MVP:
RikkiGibson marked this conversation as resolved.
Show resolved Hide resolved

- generated SyntaxTree.FilePath
RikkiGibson marked this conversation as resolved.
Show resolved Hide resolved
- `InterceptsLocation(string location)` accepts a *location specifier* which denotes the source file and location of the interceptee. Format is *implementation defined*.
- public API:
- `string SourceProductionContext.GetInterceptableLocation(InvocationExpressionSyntax intercepted, string interceptorFileHintName)`
- Returns a string which is the "latest version" of the implementation-defined location format
- We expect the compiler to support *consuming* previous versions of the location format, but we don't need a point of control for *producing* previous versions of that format
- `bool CSharpExtensions.TryGetInterceptor(this SemanticModel model, InvocationExpressionSyntax intercepted, [NotNullWhen(true)] out IMethodSymbol? interceptor)`

- Why is the location format implementation-defined?
- We want to reserve the ability to improve it in the future. Because we expect generators to obtain the attribute argument by calling a public API, we think it's useful for the location specifier to be "opaque" to the generator itself.
- We want to continue to optimize for the source generator scenario, at least for now. The current format that we have is already not amenable to hand-authoring.
- We intend that producing and consuming format-string is a "conversation" between generator and compiler, where compiler is parsing out the location specifier from a fragment of source code which the compiler itself produced in `GetInterceptableLocation` on behalf of a generator.

- Why are these APIs specialized to intercepting methods?
- If we had `TryGetInterceptableLocation(SyntaxNode node)` we would still be leaving in a usability/clarity gap for generators. Which syntax node is the generator supposed to pass in? Should it be the `NameSyntax` whose location is currently used to denote the location of the intercepted call? Or should it be the containing `InvocationExpressionSyntax`? What is the failure mode supposed to be if the syntax node is of a kind which doesn't support interceptors? Do we just return null?
- Many or all of the above questions are obviated by using specialized types in the public API.
- When we add support for more member kinds, we should add more public APIs corresponding to that. e.g. `MemberAccessExpression`/`ElementAccessExpression` overload for a property/indexer access or `ObjectCreationExpression` for a constructor call.
- The commitment we are making when we support intercepting a particular member kind, is equal in significance to the commitment we make when introducing a public API. It doesn't serve generator authors to try and optimize for fewest number of public APIs in this case.

-----

- Variance (later on in C# 13)
- https://github.com/dotnet/aspnetcore/issues/47338
- ASP.NET has indicated this can wait till later in cycle
- Argument type is `Func<TCaptured>`. Interceptee parameter is `System.Delegate`. Interceptor parameter wants to be `Func<T>` and to have call site pass the `TCaptured` as a type argument.

- Properties (post C# 13)
- UnsafeAccessor does not support properties directly. Instead, a given usage of UnsafeAccessor decorates a method which calls through to a single property accessor.
- Consider: What happens when we want to intercept `obj.Prop += a`? We can't choose a single method to replace usage of `Prop`.
- Conclusion: We should probably favor intercepting a property with an extension property. We are likely blocked here until we ship extensions.