Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression 2.1.13: AuthenticationException in Centos 7 using WCF #31110

Open
Cronan opened this issue Oct 8, 2019 · 43 comments
Open

Regression 2.1.13: AuthenticationException in Centos 7 using WCF #31110

Cronan opened this issue Oct 8, 2019 · 43 comments

Comments

@Cronan
Copy link
Contributor

Cronan commented Oct 8, 2019

Describe the bug
I use the WCF client from .NET Core to access Windows WCF services from Linux.
Everything works correctly using .NET Core 2.1.3

Upgrading to 2.1.13 results in the exception below when calling the WCF client.
I also see the same problem with .NET Core 2.2 latest or 3.0 latest.

Expected behavior
I use the WCF client from .NET Core to access Windows WCF services from Linux.
Everything works correctly using .NET Core 2.1.3

I expected this to continue working in newer versions of .NET Core.

Additional context
I use kinit before making this call to ensure that I'm authenticated correctly.
Am I missing some dependencies, or could this be something else?

Linux Version

cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

.NET Core Version

dotnet --info
.NET Core SDK (reflecting any global.json):
 Version:   2.1.802
 Commit:    177d0b2525

Runtime Environment:
 OS Name:     centos
 OS Version:  7
 OS Platform: Linux
 RID:         centos.7-x64
 Base Path:   /usr/share/dotnet/sdk/2.1.802/

Host (useful for support):
  Version: 2.1.13
  Commit:  1a165a1588

.NET Core SDKs installed:
  2.1.802 [/usr/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.13 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.13 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.1.13 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

Full Stack trace

System.ComponentModel.Win32Exception: GSSAPI operation failed with error - Unspecified GSS failure.  Minor code may provide more information (Server not found in Kerberos database).

The above exception was the direct cause of the following exception:

System.Security.Authentication.AuthenticationException: Authentication failed, see inner exception.
   at System.Net.Security.NegoState.StartSendAuthResetSignal(LazyAsyncResult lazyResult, Byte[] message, Exception exception)
   at System.Net.Security.NegoState.StartSendBlob(Byte[] message, LazyAsyncResult lazyResult)
   at System.Net.Security.NegoState.ProcessAuthentication(LazyAsyncResult lazyResult)
   at System.Net.Security.NegotiateStream.BeginAuthenticateAsClient(NetworkCredential credential, ChannelBinding binding, String targetName, ProtectionLevel requiredProtectionLevel, TokenImpersonationLevel allowedImpersonationLevel, AsyncCallback asyncCallback, Object asyncState)
   at System.Net.Security.NegotiateStream.BeginAuthenticateAsClient(NetworkCredential credential, String targetName, ProtectionLevel requiredProtectionLevel, TokenImpersonationLevel allowedImpersonationLevel, AsyncCallback asyncCallback, Object asyncState)
   at System.Net.Security.NegotiateStream.<>c__DisplayClass27_0.<AuthenticateAsClientAsync>b__0(AsyncCallback callback, Object state)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncImpl(Func`3 beginMethod, Func`2 endFunction, Action`1 endAction, Object state, TaskCreationOptions creationOptions)
   at System.Net.Security.NegotiateStream.AuthenticateAsClientAsync(NetworkCredential credential, String targetName, ProtectionLevel requiredProtectionLevel, TokenImpersonationLevel allowedImpersonationLevel)
   at System.ServiceModel.Channels.WindowsStreamSecurityUpgradeProvider.WindowsStreamSecurityUpgradeInitiator.OnInitiateUpgradeAsync(Stream stream, OutWrapper`1 remoteSecurity)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
System.ServiceModel.Security.SecurityNegotiationException: Authentication failed, see inner exception.
   at System.ServiceModel.Channels.WindowsStreamSecurityUpgradeProvider.WindowsStreamSecurityUpgradeInitiator.OnInitiateUpgradeAsync(Stream stream, OutWrapper`1 remoteSecurity)
   at System.ServiceModel.Channels.StreamSecurityUpgradeInitiatorBase.InitiateUpgradeAsync(Stream stream)
   at System.ServiceModel.Channels.ConnectionUpgradeHelper.InitiateUpgradeAsync(StreamUpgradeInitiator upgradeInitiator, OutWrapper`1 connectionWrapper, ClientFramingDecoder decoder, IDefaultCommunicationTimeouts defaultTimeouts, TimeSpan timeout)
   at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.SendPreambleAsync(IConnection connection, ArraySegment`1 preamble, TimeSpan timeout)
   at System.ServiceModel.Channels.ConnectionPoolHelper.EstablishConnectionAsync(TimeSpan timeout)
   at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.OnOpenAsync(TimeSpan timeout)
   at System.ServiceModel.Channels.CommunicationObject.OnOpenAsyncInternal(TimeSpan timeout)
   at System.ServiceModel.Channels.CommunicationObject.System.ServiceModel.IAsyncCommunicationObject.OpenAsync(TimeSpan timeout)
   at System.ServiceModel.Channels.CommunicationObject.OpenAsyncInternal(TimeSpan timeout)
   at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
   at System.Runtime.TaskHelpers.CallActionAsync[TArg](Action`1 action, TArg argument)
   at System.ServiceModel.Channels.CommunicationObject.OpenOtherAsync(ICommunicationObject other, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannel.OnOpenAsync(TimeSpan timeout)
   at System.ServiceModel.Channels.CommunicationObject.OnOpenAsyncInternal(TimeSpan timeout)
   at System.ServiceModel.Channels.CommunicationObject.System.ServiceModel.IAsyncCommunicationObject.OpenAsync(TimeSpan timeout)
   at System.ServiceModel.Channels.CommunicationObject.OpenAsyncInternal(TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannel.CallOpenOnce.System.ServiceModel.Channels.ServiceChannel.ICallOnce.Call(ServiceChannel channel, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannel.CallOnceManager.CallOnce(TimeSpan timeout, CallOnceManager cascade)
   at System.ServiceModel.Channels.ServiceChannel.EnsureOpened(TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(MethodCall methodCall, ProxyOperationRuntime operation)
   at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(MethodInfo targetMethod, Object[] args)
--- End of stack trace from previous location where exception was thrown ---
   at System.Reflection.DispatchProxyGenerator.Invoke(Object[] args)
   at generatedProxy_1.GetById(Int32 )
   at MyApp.Client.Connections.BaseConnection`1.ExecuteOnSingleUseChannel[T](Func`2 activity)
@Cronan
Copy link
Contributor Author

Cronan commented Oct 8, 2019

Crossposted from here

@mconnew mconnew transferred this issue from dotnet/wcf Oct 8, 2019
@mconnew
Copy link
Member

mconnew commented Oct 8, 2019

Moved to corefx repo as this isn't a bug/change introduced in WCF code. @davidsh, are you aware of any changes in NegotiateStream between 2.1.3 and 2.1.13 which would have caused this?
@Cronan, if this works with 2.1.3 on the same host, then that should rule out any problems with kinit. I also wouldn't expect an error saying it couldn't find the server in the database if there was a problem with kinit as that implies it's communicating with the Kerberos server successfully.

@Cronan
Copy link
Contributor Author

Cronan commented Oct 8, 2019 via email

@mconnew
Copy link
Member

mconnew commented Oct 8, 2019

@Cronan, it might be useful to find the exact version which caused the break. I wasn't going to suggest that as it can be a lot of work and it's possible that the appropriate devs might know which changes may have broken this, but if you have it working in docker, it might not be a lot of work. Especially if you do a binary search through the versions.
Something quicker that would likely be useful is to capture a Kerberos trace for when it's working and when it's not working. You can do this by setting the environment variable KRB5_TRACE before you run your code to capture Kerberos tracing. You can do one of the following:

Output log to stdout

export KRB5_TRACE = /dev/stdout

Output log to file

export KRB5_TRACE = ~/krb5_trace.log

@davidsh
Copy link
Contributor

davidsh commented Oct 8, 2019

@davidsh, are you aware of any changes in NegotiateStream between 2.1.3 and 2.1.13 which would have caused this?

Thanks for bringing this issue to my attention. Yes, there were changes between 2.1.3 and 2.1.13. We ported some fixes from .NET Core 3.0 into the latest servicing releases for 2.1 and 2.2. These were high priority Linux Kerberos related fixes.

See: dotnet/corefx#40109

I'll need to dig into your particular scenario to see why it appears to have been regressed by these changes.

@davidsh davidsh self-assigned this Oct 8, 2019
@davidsh
Copy link
Contributor

davidsh commented Oct 8, 2019

Can you please provide a repro for this? Or describe in great detail the scenario, networking environment and especially how Kerberos is being used. What OS is the server? What OS is the client?

For the Linux client, please include the krb5.conf file on the machine and a KRB5_TRACE.LOG as described above. But, please don't post any confidential information to this GitHub issue. If necessary, you can mask off any information in the traces that is confidential.

@Cronan
Copy link
Contributor Author

Cronan commented Oct 8, 2019 via email

@mconnew
Copy link
Member

mconnew commented Oct 8, 2019

@Cronan, I wouldn't try changing your Kerberos stack at this point, it will just add noise. Things were working with .NET Core 2.1.3 so unless we find a reason to believe there's an issue with the version of Kerberos you are running with, just leave everything which isn't .NET Core in the same state as when it's working with .NET Core 2.1.3.

@mconnew
Copy link
Member

mconnew commented Oct 8, 2019

@davidsh, He's provided details of the client in the initial bug description. He also explained the server is a Windows WCF service. Does the specific Windows version matter?

@Cronan, for your WCF client, are you explicitly specifying a service identity? You would have a reference to the one of the classes SpnEndpointIdentity, UpnEndpointIdentity or DnsEndpointIdentity. If you don't have a reference to any of these in your code, we default to using a targetName of host/hostname where hostname is extracted from the uri of the service you are connecting to. So depending on your the address you provided to WCF it might not be a FQDN.

@davidsh
Copy link
Contributor

davidsh commented Oct 8, 2019

@davidsh, He's provided details of the client in the initial bug description. He also explained the server is a Windows WCF service. Does the specific Windows version matter?

No. But we need a repro. At the very least, snippets of WCF code for the server and client and how the service is configured.

Also, can you please get KRB5_TRACES for both the working scenario (2.1.3) and broken scenario (2.1.13)?

As a workaround can you try setting this environment variable on the Linux client? It will enable the 'legacy' HTTP stack (uses libcurl on Linux). I'm curious if it will help with this problem at least as a workaround for now:

export DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER=0

@mconnew
Copy link
Member

mconnew commented Oct 8, 2019

@davidsh, this is not using HTTP, this is NegotiateStream.

@davidsh
Copy link
Contributor

davidsh commented Oct 8, 2019

@davidsh, this is not using HTTP, this is NegotiateStream.

Oh. Good point. Yes, setting that environment variable won't change anything.

@Cronan
Copy link
Contributor Author

Cronan commented Oct 9, 2019

@Cronan, I wouldn't try changing your Kerberos stack at this point, it will just add noise. Things were working with .NET Core 2.1.3 so unless we find a reason to believe there's an issue with the version of Kerberos you are running with, just leave everything which isn't .NET Core in the same state as when it's working with .NET Core 2.1.3.

Sorry, I actually meant Centos rpms, I was typing that on a phone ...

@Cronan
Copy link
Contributor Author

Cronan commented Oct 9, 2019

cat /etc/krb5.conf
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = MY.DOMAIN.COM
 dns_lookup_realm = false
 dns_lookup_kdc = true
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 default_tkt_enctypes = rc4-hmac
 default_tgs_enctypes = rc4-hmac

[realms]
 MY.DOMAIN.COM = {
  kdc = uk-ldap.MY.DOMAIN.COM:88
  admin_server = MY.DOMAIN.COM:749
  default_domain = MY.DOMAIN.COM
 }

[domain_realm]
 .myotherdomain.com = MY.DOMAIN.COM
 MY.DOMAIN.COM = MY.DOMAIN.COM
 .MY.DOMAIN.COM = MY.DOMAIN.COM
 myotherdomain.com = MY.DOMAIN.COM

@Cronan
Copy link
Contributor Author

Cronan commented Oct 9, 2019

KRB5_TRACE.LOG for 2.1.3 (working scenario):

cat krb5_trace.log
[9407] 1570605553.815700: ccselect module realm chose cache FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with client principal icronyn@MY.DOMAIN.COM for server principal tcp/mywindowsservice-prod.MY.DOMAIN.COM:8184@MY.DOMAIN.COM
[9407] 1570605553.815701: Getting credentials icronyn@MY.DOMAIN.COM -> tcp/mywindowsservice-prod.MY.DOMAIN.COM:8184@MY.DOMAIN.COM using ccache FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc
[9407] 1570605553.815702: Retrieving icronyn@MY.DOMAIN.COM -> tcp/mywindowsservice-prod.MY.DOMAIN.COM:8184@MY.DOMAIN.COM from FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with result: 0/Success
[9407] 1570605553.815704: Creating authenticator for icronyn@MY.DOMAIN.COM -> tcp/mywindowsservice-prod.MY.DOMAIN.COM:8184@MY.DOMAIN.COM, seqnum 823849255, subkey rc4-hmac/23EC, session key rc4-hmac/B264
[9407] 1570605553.815709: Read AP-REP, time 1570605553.815705, subkey rc4-hmac/23EC, seqnum 1116169193

@Cronan
Copy link
Contributor Author

Cronan commented Oct 9, 2019

KRB5_TRACE.LOG for 2.1.13 (failing scenario):

cat krb5_trace.log 
[16826] 1570607597.439200: ccselect module realm chose cache FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with client principal icronyn@MY.DOMAIN.COM for server principal tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM
[16826] 1570607597.439201: Getting credentials icronyn@MY.DOMAIN.COM -> tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM using ccache FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc
[16826] 1570607597.439202: Retrieving icronyn@MY.DOMAIN.COM -> tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM from FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with result: -1765328243/Matching credential not found (filename: /var/opt/thinlinc/sessions/icronyn/10/krb5cc)
[16826] 1570607597.439203: Retrieving icronyn@MY.DOMAIN.COM -> krbtgt/MY.DOMAIN.COM@MY.DOMAIN.COM from FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with result: 0/Success
[16826] 1570607597.439204: Starting with TGT for client realm: icronyn@MY.DOMAIN.COM -> krbtgt/MY.DOMAIN.COM@MY.DOMAIN.COM
[16826] 1570607597.439205: Requesting tickets for tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM, referrals on
[16826] 1570607597.439206: Generated subkey for TGS request: rc4-hmac/1EBF
[16826] 1570607597.439207: etypes requested in TGS request: rc4-hmac
[16826] 1570607597.439209: Encoding request body and padata into FAST request
[16826] 1570607597.439210: Sending request (5328 bytes) to MY.DOMAIN.COM
[16826] 1570607597.439211: Resolving hostname uk-ldap.MY.DOMAIN.COM
[16826] 1570607597.439212: Initiating TCP connection to stream 10.194.10.221:88
[16826] 1570607597.439213: Sending TCP request to stream 10.194.10.221:88
[16826] 1570607597.439214: Received answer (406 bytes) from stream 10.194.10.221:88
[16826] 1570607597.439215: Terminating TCP connection to stream 10.194.10.221:88
[16826] 1570607597.439216: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[16826] 1570607597.439217: No URI records found
[16826] 1570607597.439218: Sending DNS SRV query for _kerberos-master._udp.MY.DOMAIN.COM.
[16826] 1570607597.439219: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[16826] 1570607597.439220: No SRV records found
[16826] 1570607597.439221: Response was not from master KDC
[16826] 1570607597.439222: Decoding FAST response
[16826] 1570607597.439223: TGS request result: -1765328377/Server not found in Kerberos database
[16826] 1570607597.439224: Requesting tickets for tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM, referrals off
[16826] 1570607597.439225: Generated subkey for TGS request: rc4-hmac/D1CF
[16826] 1570607597.439226: etypes requested in TGS request: rc4-hmac
[16826] 1570607597.439228: Encoding request body and padata into FAST request
[16826] 1570607597.439229: Sending request (5328 bytes) to MY.DOMAIN.COM
[16826] 1570607597.439230: Resolving hostname uk-ldap.MY.DOMAIN.COM
[16826] 1570607597.439231: Initiating TCP connection to stream 10.192.10.221:88
[16826] 1570607597.439232: Sending TCP request to stream 10.192.10.221:88
[16826] 1570607597.439233: Received answer (406 bytes) from stream 10.192.10.221:88
[16826] 1570607597.439234: Terminating TCP connection to stream 10.192.10.221:88
[16826] 1570607597.439235: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[16826] 1570607597.439236: No URI records found
[16826] 1570607597.439237: Sending DNS SRV query for _kerberos-master._udp.MY.DOMAIN.COM.
[16826] 1570607597.439238: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[16826] 1570607597.439239: No SRV records found
[16826] 1570607597.439240: Response was not from master KDC
[16826] 1570607597.439241: Decoding FAST response
[16826] 1570607597.439242: TGS request result: -1765328377/Server not found in Kerberos database
[17052] 1570607597.686051: Getting initial credentials for icronyn@MY.DOMAIN.COM
[17052] 1570607597.686053: Sending unauthenticated request
[17052] 1570607597.686054: Sending request (199 bytes) to MY.DOMAIN.COM
[17052] 1570607597.686055: Resolving hostname uk-ldap.MY.DOMAIN.COM
[17052] 1570607597.686056: Sending initial UDP request to dgram 10.192.10.222:88
[17052] 1570607597.686057: Received answer (207 bytes) from dgram 10.192.10.222:88
[17052] 1570607597.686058: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[17052] 1570607597.686059: No URI records found
[17052] 1570607597.686060: Sending DNS SRV query for _kerberos-master._udp.MY.DOMAIN.COM.
[17052] 1570607597.686061: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[17052] 1570607597.686062: No SRV records found
[17052] 1570607597.686063: Response was not from master KDC
[17052] 1570607597.686064: Received error from KDC: -1765328359/Additional pre-authentication required
[17052] 1570607597.686067: Preauthenticating using KDC method data
[17052] 1570607597.686068: Processing preauth types: 16, 15, 11, 19, 2
[17052] 1570607597.686069: Selected etype info: etype rc4-hmac, salt "", params ""
[17052] 1570607597.686070: PKINIT client has no configured identity; giving up
[17052] 1570607597.686071: PKINIT client has no configured identity; giving up
[17052] 1570607597.686072: Preauth module pkinit (16) (real) returned: 22/Invalid argument
[17052] 1570607597.686073: PKINIT client ignoring draft 9 offer from RFC 4556 KDC
[17052] 1570607597.686074: Preauth module pkinit (15) (real) returned: -1765328360/Preauthentication failed
[17052] 1570607597.686075: AS key obtained for encrypted timestamp: rc4-hmac/E3F4
[17052] 1570607597.686077: Encrypted timestamp (for 1570607597.820889): plain 301AA011180F32303139313030393037353331375AA10502030C8699, encrypted 92D1E0DCC1611B46145C9DA2A5AB09A6D2BBF2F7F7195A523D12DF00A7F030DFA688A310727421687A2D3DADC17969D0D7A665BF
[17052] 1570607597.686078: Preauth module encrypted_timestamp (2) (real) returned: 0/Success
[17052] 1570607597.686079: Produced preauth for next request: 2
[17052] 1570607597.686080: Sending request (275 bytes) to MY.DOMAIN.COM
[17052] 1570607597.686081: Resolving hostname uk-ldap.MY.DOMAIN.COM
[17052] 1570607597.686082: Sending initial UDP request to dgram 10.194.0.87:88
[17052] 1570607597.686083: Received answer (122 bytes) from dgram 10.194.0.87:88
[17052] 1570607597.686084: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[17052] 1570607597.686085: No URI records found
[17052] 1570607597.686086: Sending DNS SRV query for _kerberos-master._udp.MY.DOMAIN.COM.
[17052] 1570607597.686087: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[17052] 1570607597.686088: No SRV records found
[17052] 1570607597.686089: Response was not from master KDC
[17052] 1570607597.686090: Received error from KDC: -1765328332/Response too big for UDP, retry with TCP
[17052] 1570607597.686091: Request or response is too big for UDP; retrying with TCP
[17052] 1570607597.686092: Sending request (275 bytes) to MY.DOMAIN.COM (tcp only)
[17052] 1570607597.686093: Resolving hostname uk-ldap.MY.DOMAIN.COM
[17052] 1570607597.686094: Initiating TCP connection to stream 10.192.10.222:88
[17052] 1570607597.686095: Sending TCP request to stream 10.192.10.222:88
[17052] 1570607597.686096: Received answer (5100 bytes) from stream 10.192.10.222:88
[17052] 1570607597.686097: Terminating TCP connection to stream 10.192.10.222:88
[17052] 1570607597.686098: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[17052] 1570607597.686099: No URI records found
[17052] 1570607597.686100: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[17052] 1570607597.686101: No SRV records found
[17052] 1570607597.686102: Response was not from master KDC
[17052] 1570607597.686103: Salt derived from principal: MY.DOMAIN.COMicronyn
[17052] 1570607597.686104: AS key determined by preauth: rc4-hmac/E3F4
[17052] 1570607597.686105: Decrypted AS reply; session key is: rc4-hmac/5E7D
[17052] 1570607597.686106: FAST negotiation: unavailable
[17052] 1570607597.686107: Initializing FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with default princ icronyn@MY.DOMAIN.COM
[17052] 1570607597.686108: Storing icronyn@MY.DOMAIN.COM -> krbtgt/MY.DOMAIN.COM@MY.DOMAIN.COM in FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc
[17052] 1570607597.686109: Storing config in FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc for krbtgt/MY.DOMAIN.COM@MY.DOMAIN.COM: pa_type: 2
[17052] 1570607597.686110: Storing icronyn@MY.DOMAIN.COM -> krb5_ccache_conf_data/pa_type/krbtgt\/MY.DOMAIN.COM\@MY.DOMAIN.COM@X-CACHECONF: in FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc
[16826] 1570607597.439250: ccselect module realm chose cache FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with client principal icronyn@MY.DOMAIN.COM for server principal tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM
[16826] 1570607597.439251: Getting credentials icronyn@MY.DOMAIN.COM -> tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM using ccache FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc
[16826] 1570607597.439252: Retrieving icronyn@MY.DOMAIN.COM -> tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM from FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with result: -1765328243/Matching credential not found (filename: /var/opt/thinlinc/sessions/icronyn/10/krb5cc)
[16826] 1570607597.439253: Retrieving icronyn@MY.DOMAIN.COM -> krbtgt/MY.DOMAIN.COM@MY.DOMAIN.COM from FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with result: 0/Success
[16826] 1570607597.439254: Starting with TGT for client realm: icronyn@MY.DOMAIN.COM -> krbtgt/MY.DOMAIN.COM@MY.DOMAIN.COM
[16826] 1570607597.439255: Requesting tickets for tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM, referrals on
[16826] 1570607597.439256: Generated subkey for TGS request: rc4-hmac/3343
[16826] 1570607597.439257: etypes requested in TGS request: rc4-hmac
[16826] 1570607597.439259: Encoding request body and padata into FAST request
[16826] 1570607597.439260: Sending request (5328 bytes) to MY.DOMAIN.COM
[16826] 1570607597.439261: Resolving hostname uk-ldap.MY.DOMAIN.COM
[16826] 1570607597.439262: Initiating TCP connection to stream 10.192.10.222:88
[16826] 1570607597.439263: Sending TCP request to stream 10.192.10.222:88
[16826] 1570607597.439264: Received answer (406 bytes) from stream 10.192.10.222:88
[16826] 1570607597.439265: Terminating TCP connection to stream 10.192.10.222:88
[16826] 1570607597.439266: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[16826] 1570607597.439267: No URI records found
[16826] 1570607597.439268: Sending DNS SRV query for _kerberos-master._udp.MY.DOMAIN.COM.
[16826] 1570607597.439269: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[16826] 1570607597.439270: No SRV records found
[16826] 1570607597.439271: Response was not from master KDC
[16826] 1570607597.439272: Decoding FAST response
[16826] 1570607597.439273: TGS request result: -1765328377/Server not found in Kerberos database
[16826] 1570607597.439274: Requesting tickets for tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM, referrals off
[16826] 1570607597.439275: Generated subkey for TGS request: rc4-hmac/FCED
[16826] 1570607597.439276: etypes requested in TGS request: rc4-hmac
[16826] 1570607597.439278: Encoding request body and padata into FAST request
[16826] 1570607597.439279: Sending request (5328 bytes) to MY.DOMAIN.COM
[16826] 1570607597.439280: Resolving hostname uk-ldap.MY.DOMAIN.COM
[16826] 1570607597.439281: Initiating TCP connection to stream 10.194.0.87:88
[16826] 1570607597.439282: Sending TCP request to stream 10.194.0.87:88
[16826] 1570607597.439283: Received answer (406 bytes) from stream 10.194.0.87:88
[16826] 1570607597.439284: Terminating TCP connection to stream 10.194.0.87:88
[16826] 1570607597.439285: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[16826] 1570607597.439286: No URI records found
[16826] 1570607597.439287: Sending DNS SRV query for _kerberos-master._udp.MY.DOMAIN.COM.
[16826] 1570607597.439288: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[16826] 1570607597.439289: No SRV records found
[16826] 1570607597.439290: Response was not from master KDC
[16826] 1570607597.439291: Decoding FAST response
[16826] 1570607597.439292: TGS request result: -1765328377/Server not found in Kerberos database

@Cronan
Copy link
Contributor Author

Cronan commented Oct 9, 2019

So, the first thing that jumps out to me is that in the working version the server principal used is mywindowsservice-prod, which is the alias used in the SPN and URI by the client to connect.

In the failing version, the server principal used is the server name, not the alias used as above.

SPN = "tcp/mywindowsservice-prod.my.domain.com:8184"
URI = "net.tcp://mywindowsservice-prod.my.domain.com:8184/MyDataService"
nslookup mywindowsservice-prod
Server:  machine123.myotherdomain.com
Address:  10.192.207.245

Non-authoritative answer:
Name:    mywindowsserviceothername.my.domain.com
Address:  10.192.226.115
Aliases:  mywindowsservice-prod.my.domain.com

@Cronan
Copy link
Contributor Author

Cronan commented Oct 9, 2019

@davidsh, He's provided details of the client in the initial bug description. He also explained the server is a Windows WCF service. Does the specific Windows version matter?

@Cronan, for your WCF client, are you explicitly specifying a service identity? You would have a reference to the one of the classes SpnEndpointIdentity, UpnEndpointIdentity or DnsEndpointIdentity. If you don't have a reference to any of these in your code, we default to using a targetName of host/hostname where hostname is extracted from the uri of the service you are connecting to. So depending on your the address you provided to WCF it might not be a FQDN.

Sample code below:

var identity = new SpnEndpointIdentity("tcp/mywindowsservice-prod.my.domain.com:8184");
TokenImpersonationLevel tokenImpersonationLevel = TokenImpersonationLevel.Identification;
var endpointAddress = new EndpointAddress(new Uri(net.tcp://mywindowsservice-prod.my.domain.com:8184/MyDataService/SomeData), identity);

@Cronan
Copy link
Contributor Author

Cronan commented Oct 9, 2019

@davidsh, He's provided details of the client in the initial bug description. He also explained the server is a Windows WCF service. Does the specific Windows version matter?

No. But we need a repro. At the very least, snippets of WCF code for the server and client and how the service is configured.

Also, can you please get KRB5_TRACES for both the working scenario (2.1.3) and broken scenario (2.1.13)?

As a workaround can you try setting this environment variable on the Linux client? It will enable the 'legacy' HTTP stack (uses libcurl on Linux). I'm curious if it will help with this problem at least as a workaround for now:

export DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER=0

Service config:

<system.serviceModel>
<services>
      <service behaviorConfiguration="MyBehaviour" name="MyDataServiceCode.Facades.SomeData">
        <endpoint address="net.tcp://localhost:8184/MyDataService/SomeData"
				  binding="netTcpBinding" bindingConfiguration="UnreliableTCP"
				  name="MyDataServiceSomeData" contract="MyDataServiceCode.Interfaces.ISomeData" />

      </service>
</services>
<behaviours>
<serviceBehaviors>
<behavior name="MyBehaviour">
          <serviceDebug includeExceptionDetailInFaults="true" />
          <serviceThrottling maxConcurrentCalls="500" maxConcurrentInstances="500" maxConcurrentSessions="500" />
          <dataContractSerializer maxItemsInObjectGraph="2147483647" />
          <serviceAuthorization impersonateCallerForAllOperations="true"></serviceAuthorization>
        </behavior>
</serviceBehaviors>
</behaviours>
    <bindings>
      <netTcpBinding>
        <binding name="UnreliableTCP" receiveTimeout="23:59:59" transferMode="Buffered" maxBufferPoolSize="0" maxBufferSize="2147483647"
						 maxConnections="100" maxReceivedMessageSize="2147483647">
          <reliableSession inactivityTimeout="23:59:59" enabled="false" />
        </binding>
      </netTcpBinding>
   </bindings>
</system.serviceModel>

@Cronan
Copy link
Contributor Author

Cronan commented Oct 9, 2019

@Cronan, it might be useful to find the exact version which caused the break. I wasn't going to suggest that as it can be a lot of work and it's possible that the appropriate devs might know which changes may have broken this, but if you have it working in docker, it might not be a lot of work. Especially if you do a binary search through the versions.
Something quicker that would likely be useful is to capture a Kerberos trace for when it's working and when it's not working. You can do this by setting the environment variable KRB5_TRACE before you run your code to capture Kerberos tracing. You can do one of the following:

Output log to stdout

export KRB5_TRACE = /dev/stdout

Output log to file

export KRB5_TRACE = ~/krb5_trace.log

I did the binary search and 2.1.12 is the most recent version that works, and 2.1.13 is the version that starts failing.

@davidsh
Copy link
Contributor

davidsh commented Oct 9, 2019

So, the first thing that jumps out to me is that in the working version the server principal used is mywindowsservice-prod, which is the alias used in the SPN and URI by the client to connect.
In the failing version, the server principal used is the server name, not the alias used as above.

As an experiment, please try adding this to your krb5.conf file in the [libdefaults] section

dns_canonicalize_hostname = false

Normally, the best-practice is to define SPNs that use the FQDN of the server. And that means using the A record DNS name. But it seems that your environment has defined the SPN against the CNAME (alias) DNS record. Setting the above in your krb5.conf file should prevent forward lookups of the hostname.

I understand that this worked before upgrading to 2.1.13. But I suspect other changes we made in that servicing release has now impacted using CNAMEs as SPNs. Please try this out and also send us updated KRB5_TRACE logs.

@mconnew
Copy link
Member

mconnew commented Oct 9, 2019

@Cronan, you are including the TCP port number your service is running on as part of the SPN name. Is that intentional? Is the port number part of the SPN as set up in your DC?

@davidsh, only a process running as the SYSTEM user or one of it's equivalents such as Network Service are able to use the hostname or FQDN SPN. If you are running with any other user, you normally need to create a different SPN to use.

@davidsh, I could understand the CNAME/FQDN thing when using HttpClient, but this is NegotiateStream. WCF is explicitly proving the targetName from the provided SpnEndpointIdentity. Are you saying that NegotitateStream will use something different than what was provided to the targetName argument?

@davidsh
Copy link
Contributor

davidsh commented Oct 9, 2019

Are you saying that NegotitateStream will use something different than what was provided to the targetName argument?

NegotiateStream, HttpClient, and SqlClient all use a common native library, System.Net.Security.Native on Linux. That library does the various GSS-API calls to support Negotiate/Kerberos/NTLM protocol. GSS-API uses the Linux krb5.conf file for various settings.

The transformation of the passed in 'targetName' from NegotiateStream is still present at the API call to the GSS-API. But in the implementation of the GSS-API call, SPNEGO protocol uses the settings in the krb5.conf file and apparently is transforming the 'targetName' and normalizing it (i.e CNAME -> A record) before doing a Kerberos principal lookup. That transformation is not something controllable at the .NET layer. That is why I suggested to try the dns_canonicalize_hostname = false in the krb5.conf file as an experiment to see it is helps. It will help narrow down what is happening in this network environment.

@mconnew
Copy link
Member

mconnew commented Oct 9, 2019

I found the documentation for this behavior. This seems an odd design decision as the app can do that dance if needed but as you said, nothing that can be done about it. Based on those docs, it looks like this was caused by a change to use GSS_C_NT_HOSTBASED_SERVICE.

@davidsh
Copy link
Contributor

davidsh commented Oct 9, 2019

Based on those docs, it looks like this was caused by a change to use GSS_C_NT_HOSTBASED_SERVICE.

We made the change in .NET Core 3.0 and servicing for .NET Core 2.1/2.2 because a lot of scenarios were broken until we start using GSS_C_NT_HOSTBASED_SERVICE, which appears to be the preferred format for the SPNEGO protocol plugin for GSS-API.

But this issue here raises some interesting test areas that we probably want to address in a better way long-term in .NET Core. However, short-term, adjusting krb5.conf might be the best way to unblock things for now.

@Cronan
Copy link
Contributor Author

Cronan commented Oct 10, 2019

@Cronan, you are including the TCP port number your service is running on as part of the SPN name. Is that intentional? Is the port number part of the SPN as set up in your DC?

Yes, the port number is is part of the SPN. We tend to use the same name but different ports for TCP vs REST, for example.

@Cronan
Copy link
Contributor Author

Cronan commented Oct 10, 2019

Based on those docs, it looks like this was caused by a change to use GSS_C_NT_HOSTBASED_SERVICE.

We made the change in .NET Core 3.0 and servicing for .NET Core 2.1/2.2 because a lot of scenarios were broken until we start using GSS_C_NT_HOSTBASED_SERVICE, which appears to be the preferred format for the SPNEGO protocol plugin for GSS-API.

But this issue here raises some interesting test areas that we probably want to address in a better way long-term in .NET Core. However, short-term, adjusting krb5.conf might be the best way to unblock things for now.

I'm definitely going to try this today, but it's obviously not a fix for an entire environment - for example, it would prevent short names (like myservice-prod) from being canonicalised to myservice-prod.mydomain.com, if I understand the option correctly:

dns_canonicalize_hostname
Indicate whether name lookups will be used to canonicalize hostnames for use in service principal names. Setting this flag to false can improve security by reducing reliance on DNS, but means that short hostnames will not be canonicalized to fully-qualified hostnames. The default value is true.

It also leaves my systems in a weird position, where most Linux calls that need authentication (e.g. curl <hostname> --anyauth -u : hitting a WCF REST API, also using kerberos) seem totally happy with cnames or a records, but .NET Core is super-strict on requiring a records across the board.

Or am I misunderstanding the situation?

@davidsh
Copy link
Contributor

davidsh commented Oct 12, 2019

It also leaves my systems in a weird position, where most Linux calls that need authentication (e.g. curl --anyauth -u : hitting a WCF REST API, also using kerberos) seem totally happy with cnames or a records, but .NET Core is super-strict on requiring a records across the board.
Or am I misunderstanding the situation?

Are all your SPNs defined only against FQDN CNAMES? Or do you have a mixture of services defined with SPNs using either A or CNAME?

@davidsh
Copy link
Contributor

davidsh commented Oct 17, 2019

Interesting article talking about changes to MIT Kerberos libraries that will allow for trying different behavior (CNAME first, then A if CNAME fails) when trying to find proper SPN.

This new version of MIT Kerberos (Release 1.18) will have a new option for dns_canonicalize_hostname option in the krb5.conf file:

dns_canonicalize_hostname = fallback

See: http://k5wiki.kerberos.org/wiki/Projects/Server_Hostname_Canonicalization

@Cronan
Copy link
Contributor Author

Cronan commented Oct 18, 2019

It also leaves my systems in a weird position, where most Linux calls that need authentication (e.g. curl --anyauth -u : hitting a WCF REST API, also using kerberos) seem totally happy with cnames or a records, but .NET Core is super-strict on requiring a records across the board.
Or am I misunderstanding the situation?

Are all your SPNs defined only against FQDN CNAMES? Or do you have a mixture of services defined with SPNs using either A or CNAME?

There are a mixture of services defined with SPNs using either A or CNAME.
Either seem to work OK in Windows, some are tricky from Linux.

@Cronan
Copy link
Contributor Author

Cronan commented Oct 18, 2019

Interesting article talking about changes to MIT Kerberos libraries that will allow for trying different behavior (CNAME first, then A if CNAME fails) when trying to find proper SPN.

This new version of MIT Kerberos (Release 1.18) will have a new option for dns_canonicalize_hostname option in the krb5.conf file:

dns_canonicalize_hostname = fallback

See: http://k5wiki.kerberos.org/wiki/Projects/Server_Hostname_Canonicalization

It looks to me like Centos 7 is currently on 1.15

sudo yum list installed | grep kr
krb5-devel.x86_64          1.15.1-37.el7_6  @lgb-7.4-centos-base-updates-2019-07
krb5-libs.x86_64           1.15.1-37.el7_6  @lgb-7.4-centos-base-updates-2019-07
krb5-workstation.x86_64    1.15.1-37.el7_6  @lgb-7.4-centos-base-updates-2019-07

Yes, definitely 1.15.1:

klist -V
Kerberos 5 version 1.15.1

But this looks interesting - do you think this will work with GSS_C_NT_HOSTBASED_SERVICE?

@davidsh
Copy link
Contributor

davidsh commented Oct 18, 2019

do you think this will work with GSS_C_NT_HOSTBASED_SERVICE?

I haven't tried it out yet. But those krb5.conf options do affect GSS_C_NT_HOSTBASED_SERVICE name types. So, I do expect it to work with the new dns_canonicalize_hostname = fallback setting.

@mconnew
Copy link
Member

mconnew commented Oct 18, 2019

@davidsh, it would seem that the switch to using GSS_C_NT_HOSTBASED_SERVICE is a breaking change. Would it be possible to retroactively add an AppSetting to control this on a per-app basis? Requiring a system wide krb5.conf configuration which affects all Kerberos usage (and could break a different scenario when setting things so .NET Core works) seems a little too coarse to control behavior.

@davidsh
Copy link
Contributor

davidsh commented Oct 19, 2019

Requiring a system wide krb5.conf configuration which affects all Kerberos usage (and could break a different scenario when setting things so .NET Core works) seems a little too coarse to control behavior.

Apps can use a custom krb5.conf file by setting an environment variable:

http://web.mit.edu/kerberos/krb5-1.13/doc/admin/conf_files/krb5_conf.html

krb5.conf
The krb5.conf file contains Kerberos configuration information, including the locations of KDCs and
admin servers for the Kerberos realms of interest, defaults for the current realm and for Kerberos
applications, and mappings of hostnames onto Kerberos realms. Normally, you should install your
krb5.conf file in the directory /etc. You can override the default location by setting the environment
variable KRB5_CONFIG. Multiple colon-separated filenames may be specified in KRB5_CONFIG; all
files which are present will be read.

@Cronan

There are a mixture of services defined with SPNs using either A or CNAME.
Either seem to work OK in Windows, some are tricky from Linux.

Are all your channels/services using the same type of WCF bindings? Are any using HTTP related WCF bindings? I Would like to understand if any of them use .NET Core classes besides System.Net.Security.NegotiateStream.

@Cronan
Copy link
Contributor Author

Cronan commented Oct 21, 2019 via email

@davidsh
Copy link
Contributor

davidsh commented Oct 31, 2019

All the services I’m currently using are WCF TCP - I’ll try the custom Conf file and report back.

Have you been able to test using the custom .conf file?

Also, do you have NTLM installed on the Linux machines (gss-ntlmssp)? And what credential are you using in the .NET layer? Is it CredentialCache.DefaultCredentials?

The reason for the questions is that when Negotiate (SPNEGO) protocol is used but has errors with Kerberos, NTLM fallback would be possible assuming you are using explicit credentials and have installed the NTLM package on the Linux machine.

@Cronan
Copy link
Contributor Author

Cronan commented Nov 12, 2019

All the services I’m currently using are WCF TCP - I’ll try the custom Conf file and report back.

Have you been able to test using the custom .conf file?

Yes, this worked! 🎉
Thank you @davidsh and @mconnew

Also, do you have NTLM installed on the Linux machines (gss-ntlmssp)?
Yes, I believe so ...

sudo yum list installed | grep ntlm
libntlm.x86_64             1.3-6.el7        @lgb-7.4-centos-base

And what credential are you using in the .NET layer? Is it CredentialCache.DefaultCredentials?

I don't see anything like this call in my WCF client code, where would it be?

The reason for the questions is that when Negotiate (SPNEGO) protocol is used but has errors with Kerberos, NTLM fallback would be possible assuming you are using explicit credentials and have installed the NTLM package on the Linux machine.

davidsh referenced this issue in davidsh/runtime Dec 3, 2019
This is the first of several PRs that add Enterprise Scenarios Testing capability to
the repo. This PR focusses on Linux which allows for docker containers to be used
in an enterprise network configuration.

I focussed on 2 workflows: 1) The 'dev' workflow, and 2) The PR/CI workflow. The dev
workflow works well since it's using containers in a docker-compose environment along
with volume mounting your current dev's repo enlistment. The PR/CI workflow gives us
an Azure DevOps pipeline to automate verification.

I still need to work with the infra team to add a real pipeline that will run. I can't
do that until this is merged. In the meantime, I have my own DevOps pipeline that verified this PR.

See: https://dev.azure.com/systemnetncl/Enterprise%20Testing/_build/results?buildId=141

I will be linking a follow-up GitHub issue describing the roadmap for building on this system
including adding Windows environments, NTLM protocol, proxies, and other libraries such as
System.Net.Mail and System.Data.SqlClient. Those libraries also use Negotiate/Kerberos/NTLM
enterprise-oriented protocols.

Contributes to:
https://github.com/dotnet/corefx/issues/41652
https://github.com/dotnet/corefx/issues/41489
https://github.com/dotnet/corefx/issues/36896
https://github.com/dotnet/corefx/issues/30150
https://github.com/dotnet/corefx/issues/24707
https://github.com/dotnet/corefx/issues/10041
https://github.com/dotnet/corefx/issues/6606
https://github.com/dotnet/corefx/issues/6161
@karelz karelz changed the title AuthenticationException in Centos 7 using WCF from .NET Core 2.1.13 Regression: AuthenticationException in Centos 7 using WCF from .NET Core 2.1.13 Dec 5, 2019
@karelz karelz changed the title Regression: AuthenticationException in Centos 7 using WCF from .NET Core 2.1.13 Regression 2.1.13: AuthenticationException in Centos 7 using WCF Dec 5, 2019
davidsh referenced this issue Dec 5, 2019
This is the first of several PRs that add Enterprise Scenarios Testing capability to
the repo. This PR focusses on Linux which allows for docker containers to be used
in an enterprise network configuration.

I focussed on 2 workflows: 1) The 'dev' workflow, and 2) The PR/CI workflow. The dev
workflow works well since it's using containers in a docker-compose environment along
with volume mounting your current dev's repo enlistment. The PR/CI workflow gives us
an Azure DevOps pipeline to automate verification.

I still need to work with the infra team to add a real pipeline that will run. I can't
do that until this is merged. In the meantime, I have my own DevOps pipeline that verified this PR.

See: https://dev.azure.com/systemnetncl/Enterprise%20Testing/_build/results?buildId=141

I will be linking a follow-up GitHub issue describing the roadmap for building on this system
including adding Windows environments, NTLM protocol, proxies, and other libraries such as
System.Net.Mail and System.Data.SqlClient. Those libraries also use Negotiate/Kerberos/NTLM
enterprise-oriented protocols.

Contributes to:
https://github.com/dotnet/corefx/issues/41652
https://github.com/dotnet/corefx/issues/41489
https://github.com/dotnet/corefx/issues/36896
https://github.com/dotnet/corefx/issues/30150
https://github.com/dotnet/corefx/issues/24707
https://github.com/dotnet/corefx/issues/10041
https://github.com/dotnet/corefx/issues/6606
https://github.com/dotnet/corefx/issues/6161

* Address PR feedback

* Change pipeline *.yml to only run on selected filepaths for PRs
* Change kdc container Dockerfile to be based on ubuntu:18.04
* Fix typo in README.md

* Update .yml file

* Link (instead of copy) apache kerb module to the right place
@Cronan
Copy link
Contributor Author

Cronan commented Jan 22, 2020

@karelz What more info do you need from me?

@karelz
Copy link
Member

karelz commented Jan 22, 2020

@Cronan it was part of our triage, so I assume @davidsh didn't feel like he has all the info to make it actionable yet.
There is lots of back and forth on this issue. What is IMO needed is clear repro environment setup that will allow us to reproduce it locally and dig deeper. Do you think you can help us identify the key environment setup knobs to reproduce it?

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 5.0 milestone Feb 1, 2020
@davidsh
Copy link
Contributor

davidsh commented Feb 18, 2020

@Cronan it was part of our triage, so I assume @davidsh didn't feel like he has all the info to make it actionable yet.

There is lots of back and forth on this issue. What is IMO needed is clear repro environment setup that will allow us to reproduce it locally and dig deeper. Do you think you can help us identify the key environment setup knobs to reproduce it?

Actually, I do understand the root cause here. This issue is caused by a regression in behavior due to a PR fixing related Linux Kerberos issues as described above. See: #31110 (comment)

I know how to reproduce this issue.

The fix for this issue is complex because it will require some changes to the PAL layer to fix both this issue and preserve the other fixes previously made. I have been thinking about the proper fix. We can also test the fix in the 'runtime-libraries enterprise-linux' DevOps pipline that I previously created.

@karelz
Copy link
Member

karelz commented Jun 18, 2020

@davidsh do you think we can address this realistically in 5.0?

@davidsh
Copy link
Contributor

davidsh commented Jun 19, 2020

It's a non-trivial fix. The fix has to be done in the Linux PAL layer. It has to be able to try two different forms of Kerberos principal names. I.e. try one form and if it doesn't match to an SPN in Kerberos, then try the alternate format.

Based on current priorities and staffing I don't see this as achievable for 5.0.

@Cronan
Copy link
Contributor Author

Cronan commented Jun 19, 2020

That’s disappointing but understandable. If someone can point me to the relevant place in the code to start looking I’m prepared to try to take this on.

@karelz karelz modified the milestones: 5.0.0, 6.0.0 Jul 28, 2020
@karelz karelz modified the milestones: 6.0.0, Future May 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants