Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code generation validation does not support separate SDKs #3309

Closed
heaths opened this issue May 10, 2022 · 31 comments
Closed

Code generation validation does not support separate SDKs #3309

heaths opened this issue May 10, 2022 · 31 comments
Assignees
Labels
SDK Automation Issues related to sdk automation Spec PR Tools Tooling that runs in azure-rest-api-specs repo.

Comments

@heaths
Copy link
Member

heaths commented May 10, 2022

Re: Azure/azure-rest-api-specs#19012

For Cognitive Services, as well as other services like Key Vault, we produce a number of SDKs with different swaggers. The validation may fail because different SDKs - different sets of swaggers - may refer to a common class that then gets duplicated or seems like it. In practice, however, this wouldn't happen because they ship in separate SDKs.

Could we have some way to either suppress or handle these situations so we don't have validation failures like this?

@heaths heaths added the Spec PR Tools Tooling that runs in azure-rest-api-specs repo. label May 10, 2022
@akning-ms akning-ms added the SDK Automation Issues related to sdk automation label May 12, 2022
@dw511214992 dw511214992 linked a pull request May 26, 2022 that will close this issue
@dw511214992 dw511214992 removed a link to a pull request May 26, 2022
@dw511214992
Copy link
Member

Hi @heaths
Are the examples you list dataplane sdk? If yes, I think the latest sdk automation pipeline shouldn't be triggered. (.Net is an exception, and it will generates sdk if there is autorest.md in sdk repository)

About your question:

different SDKs - different sets of swaggers - may refer to a common class that then gets duplicated or seems like it

I don't think it will cause any issues. if it's generating multi sdks in a swagger PR, sdk automation script will run autorest command for multi times, and there is no duplication issue. If it generates a sdk and the swaggers refer to the same common class, there will be no error reported.

The reason of getting duplication issue in Azure/azure-rest-api-specs#19012 is that it actually contains the duplicated schema name in swaggers. For example, TaskState is defined twice:
image
url: https://github.com/Azure/azure-rest-api-specs/blob/c0012f1d2624a32dc1b80875cee423311074c55e/specification/cognitiveservices/data-plane/Language/preview/2022-05-15-preview/common.json#L176-L196

@heaths
Copy link
Member Author

heaths commented Jun 15, 2022

Shouldn't the lenient deduplication take care of that and fold them into a single class if the definitions are the same?

@dw511214992
Copy link
Member

No. If sdk owner don't alow deduplicated model in generating sdk, the pipeline will report error. Based on my knowledge, python sdk owner don't allow it.

@heaths
Copy link
Member Author

heaths commented Jun 23, 2022

Ultimately, the problem here is that all those swaggers in the Language RP folder do not comprise a single SDK. What might be good for the automation is to have a way - perhaps in readme.md - to better separate those. Do we, for example, define different tags for different sets of swaggers? /cc @weshaggard @johanste with whom I've been talking about this topic.

This isn't the only data plane RP in this situation. Key Vault also has numerous swaggers and SDKs, though those swaggers aren't necessarily meant to stand alone and could be aggregated - though, the generated SDK is still nothing like what we ship so I still think trying to generate one is of dubious value. Perhaps better is, for data plane RPs in the specs repo that have SDKs in the language repo, is to update those autorest.md files and regenerate. It's the only way to be sure something wasn't broken. Blindly generating SDKs purely from swaggers provides some level of correctness, but not enough, IMO.

@dw511214992
Copy link
Member

Hi @heaths I think you are focused on dataplane now. Actually, the integration of DPG with sdk automation pipeline is on the way. The way of integration will use the autorest.md in sdk repository, and it should meet your requirement (service team can add model deplication related configuration in the autorest.md).
thanks

@heaths
Copy link
Member Author

heaths commented Jun 30, 2022

@weshaggard does this mean any affected autorest.md files are going to have to add the "feature" tags (or whatever we called them) we talked about, such that we have separate permutations of versions + versions for package tags?

@dw511214992 a lot of the Azure SDK teams works on data plane services, so I am focused on them yes. This is where we're seeing a lot of validation problems, as well as procedural problems.

@dw511214992
Copy link
Member

dw511214992 commented Jul 4, 2022

@heaths I deleted the useless sdk automation pipelines in Azure/azure-rest-api-specs#19000, so there shouldn't be many validation problems now. Besides, when the support for DPG is GA, I think we can enable more services on DPG automation pipeline.

@heaths
Copy link
Member Author

heaths commented Jul 5, 2022

DPG won't solve this problem. These swaggers contain conflicting models and these SDKs will generate models eventually, so the problem will return.

You merged that PR May 10th and problem still happens: https://github.com/Azure/azure-rest-api-specs/pull/19190/checks?check_run_id=7189271735

@heaths
Copy link
Member Author

heaths commented Jul 6, 2022

@weshaggard
Copy link
Member

@dw511214992 @raych1 could you try to help understand how we make our tools work if there are multiple sdk's in the same directory.

@dw511214992
Copy link
Member

@heaths The two PRs you mentioned is another issue, which is not related to this issue. azure-sdk-for-net is not supported, and there is already a PR to disable it. Service team should change it too azure-sdk-for-net-track2 in https://github.com/Azure/azure-rest-api-specs/blob/main/dev/cognitiveservices/data-plane/Language/readme.md
image

@raych1
Copy link
Contributor

raych1 commented Jul 12, 2022

@dw511214992 @raych1 could you try to help understand how we make our tools work if there are multiple sdk's in the same directory.

@weshaggard , it could be supported by separated readme.md.

@heaths
Copy link
Member Author

heaths commented Jul 12, 2022

The two PRs you mentioned is another issue, which is not related to this issue.

I opened this issue. ;) Regardless of the pipeline used, the assumption that validation can build a single SDK from the entirety of the directory contents is incorrect. Same is true for several SDKs including Key Vault. We build multiple data plane SDKs from different files in the RP directory, and swaggers are often written with that in mind; ergo, you can't simply merge them all together to generate an SDK.

@raych1
Copy link
Contributor

raych1 commented Jul 13, 2022

@heaths , SDK generation base on the readme.md. It needs to use different readme to support multiple SDK configuration. With that, it needs to add subfolders to organize swaggers per SDK under data-plane folder then place readme under these subfolders. This is actually what cognitive service do right now.

@heaths
Copy link
Member Author

heaths commented Jul 13, 2022

The folders under the data-plane directory are RPs (resource providers). The Language folder is a single RP that ships multiple SDKs. Same for Key Vault. It's a single RP. Those folders aren't merely for organization of SDKs. Validation needs to support this. /cc @maririos

@weshaggard
Copy link
Member

@heaths perhaps I'm missing something but if we were able to split the Key Vault folder into multiple folders couldn't we achieve the goal of having the files split by folder? For each sdk generation we will want to have one readme configuration in one folder.

@heaths
Copy link
Member Author

heaths commented Jul 13, 2022

Those folders comprise a single RP and should be together per conversations with @johanste. IMO, the swagger repo should be a more git-friendly process for authoring by humans instead of trying to make it work for tools. Why can't a readme just have separate sections with different input-file arrays to generate different SDKs like we do in the SDK language repos?

/cc @markweitzel

@weshaggard
Copy link
Member

Adding @josefree who is working on documentation for the layout of the specs repo but the one thing we did try to align on is trying to keep one SDK limited to one folder and one readme.

Part of the reason for that decision is to not have to teach all our tools how to read and understand the readme file as sort of a project file to group the various specs together. The readme should hopefully be limited to the code generation and not all the other tools we need to run.

@johanste
Copy link
Member

Those folders comprise a single RP and should be together per conversations with @johanste. IMO, the swagger repo should be a more git-friendly process for authoring by humans instead of trying to make it work for tools. Why can't a readme just have separate sections with different input-file arrays to generate different SDKs like we do in the SDK language repos?

/cc @markweitzel

I think what I was trying to say that there is not a 1:1 mapping between RP and client library - even more so for data planes than for control planes. I would say that we want one readme.md per client library (because it's purpose in life is 90+% to drive client code gen), but that is up for debate as well since we want to have some way of describing what endpoints constitutes a single "service".

@heaths
Copy link
Member Author

heaths commented Jul 14, 2022

Then we should talk again about having subdirectories under the RP folder. This was one idea we discussed as an alternative to this class of issue.

@raych1
Copy link
Contributor

raych1 commented Jul 14, 2022

The folders under the data-plane directory are RPs (resource providers). The Language folder is a single RP that ships multiple SDKs. Same for Key Vault. It's a single RP. Those folders aren't merely for organization of SDKs. Validation needs to support this. /cc @maririos

@heaths , can you point me the location of Language folder? I didn't find it.

@weshaggard
Copy link
Member

@raych1 I believe https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/Language is what is being referred to.

Then we should talk again about having subdirectories under the RP folder

@heaths isn't that what cognitive is doing with the Language folder or are you thinking about splitting at a different level?

@heaths
Copy link
Member Author

heaths commented Jul 14, 2022

We don't have subfolders under the Language folder. We've discussed it briefly in the past, but is that even supported by validation and other tools?

@weshaggard
Copy link
Member

Language itself is a subfolder and as such we should support other folder next to it but I would also assume we would support other subfolders under that but I will leave that for @raych1 to confirm.

@heaths
Copy link
Member Author

heaths commented Jul 14, 2022

It's the RP folder. All services I've looked at (many) are organized the same way. As @johanste said above, these swaggers comprise a single RP.

@raych1
Copy link
Contributor

raych1 commented Jul 18, 2022

@heaths , are you saying for this RP folder https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/Language, service team needs to build multiple SDK package base on different swaggers and you would like the SDK automation tools support one readme file under this language folder to have multiple tags with different package definition?

@heaths
Copy link
Member Author

heaths commented Jul 18, 2022

Yes, that set of swaggers produces different SDKs. How it's implemented - whether as separate tags in a readme or whatever - isn't as important. Just let me know what to change.

@weshaggard
Copy link
Member

@raych1 the primary question is can we have subdirectories under there with one readme in each subdirectory? If so I think that will allow us to hold to one readme for each library rule. If not we might need to reconsider some other options.

@raych1
Copy link
Contributor

raych1 commented Jul 19, 2022

@weshaggard @heaths , yes, it's supported to have one readme in each subdirectory of Language folder. Please take this approach to make it consistent with the rule of one readme for each library.

@josefree
Copy link
Member

The bottom line is we should have one and only one autorest.md in each subdirectory, and each autorest.md generates one SDK package. must not allow an autorest.md to generate multiple SDK packages, which means using 'tag' to differentiate SDK package must be avoid. This will lead to a mess of version line from API to SDK.

@raych1
Copy link
Contributor

raych1 commented Jul 20, 2022

@heaths , I close this issue as we aligned on the approach to take.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SDK Automation Issues related to sdk automation Spec PR Tools Tooling that runs in azure-rest-api-specs repo.
Projects
Status: 🎊 Closed
Development

No branches or pull requests

7 participants