
Conversation

dimon222
Collaborator

Fix #105

@kevin-bates I would appreciate some assistance with testing, as I don't have such a cluster configuration available.

@kevin-bates
Member

Yes, I'll work with the person who raised the issue in EG. Currently, my Hadoop cluster is unavailable (not sure why), so I can't do much. I'll post back here.

@abzymeinsjtu

abzymeinsjtu commented Jul 19, 2021

@dimon222

Hi Dmitry, this change doesn't seem to work.

It only sends the user.name param when preparing the request, while there is no change when I call a method such as ResourceManager.cluster_information.

BTW, I ran a test with the change below and it works, but I don't know whether there are side effects.

import requests


class SimpleAuth(requests.auth.AuthBase):
    def __init__(self, username):
        self.username = username

    def __call__(self, request):
        # Append ?user.name=<username> to the query string of every outgoing request.
        request.prepare_url(request.url, params={'user.name': self.username})
        return request
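
For context, here's a sketch of how such an auth object would be wired into the client (the ResourceManager arguments below match the ones used elsewhere in this thread; the endpoint URL is a placeholder):

from yarn_api_client.resource_manager import ResourceManager

# Placeholder endpoint; substitute your RM's webapp address.
rm = ResourceManager(service_endpoints=['http://rm-host:8088/ws/v1'],
                     auth=SimpleAuth('alice'))

# With the auth object attached, each call should carry ?user.name=alice.
# .data holds the parsed JSON (assuming the usual yarn_api_client response wrapper).
info = rm.cluster_information()
print(info.data)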

@dimon222
Collaborator Author

dimon222 commented Jul 19, 2021

Hm, I wonder why it says 'in first browser interaction' then. Based on the suggested change it would be sent all the time.

(screenshot)

Either way, can you confirm if this works for some other endpoints too?

UPD: perhaps it's because we don't save some kind of session header. I wonder which header it is. Could you also print the keys in the response?

# Dump the response header names and the cookie names from the cookie jar.
print(r.headers.keys())
cookiejar = r.cookies
for cookie in cookiejar:
    print(cookie.name)

@kevin-bates
Member

I deployed both this package and a version of EnterpriseGateway (EG) that creates a SimpleAuth for its access to the YARN RM.

The code attempts to honor the KERNEL_USERNAME conveyed in the EG kernel start request. If no value is provided, the current username is used (which will be the user under which the EG server process is running). In my case, EG is running as user gateway.

If I convey a KERNEL_USERNAME of alice, I see the following debug statement:

[D 2021-07-19 09:51:40.403 EnterpriseGatewayApp] Using SimpleAuth with 'alice' against endpoints: ['http://yarn-eg-node-1.fyre.ibm.com:8088/ws/v1']

and the launch succeeds. However, when looking at the status of the job in YARN, I see the job is under user gateway...
(screenshot: YARN application list showing the job running under user gateway)

There are a couple of mentions of impersonation in jupyter-server/enterprise_gateway#979, and that clearly is not happening. Is this approach actually supposed to impersonate the referenced user? In my case, user 'alice' is not an actual user, just a name.

If this is working as intended, I really don't see what value it provides over not specifying an auth instance at all.

I just realized I need to set hadoop.http.authentication.simple.anonymous.allowed to false. Looking into that now...
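
(For reference, that's a core-site.xml property; a minimal sketch of the setting, with the rest of the file omitted:)

<property>
  <name>hadoop.http.authentication.simple.anonymous.allowed</name>
  <value>false</value>
</property>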

@kevin-bates
Member

After disabling anonymous access, I get 401's when using either the process user (gateway) or a plain user name (alice):

[D 2021-07-19 11:31:51.840 EnterpriseGatewayApp] Using SimpleAuth with 'gateway' against endpoints: ['http://yarn-eg-node-1.fyre.ibm.com:8088/ws/v1']
[W 210719 11:31:51 hadoop_conf:61] Failed to access RM 'http://yarn-eg-node-1.fyre.ibm.com:8088/ws/v1' - HTTP Code '401', continuing...

@dimon222
Collaborator Author

dimon222 commented Jul 19, 2021

@kevin-bates Indeed, that makes sense: session headers/cookies are not shared between requests (since I didn't add logic for that, assuming Hadoop would do the work for me), and I think that's the root cause. I would appreciate it if you could collect the debugging information I mentioned above: the list of header/cookie keys that have to be saved in the Auth object and passed along.

@kevin-bates
Member

kevin-bates commented Jul 19, 2021

Here's the updated code in the exception handler in hadoop_conf.py:

    if response.status_code != 200:
        log.warning("Failed to access RM '{url}' - HTTP Code '{status}', continuing...".format(url=url, status=response.status_code))
        log.warning(response.headers.keys())
        cookiejar = response.cookies
        for cookie in cookiejar:
            log.warning(cookie.name)
        return False

and corresponding output...

[D 2021-07-19 12:37:18.143 EnterpriseGatewayApp] Using SimpleAuth with 'gateway' against endpoints: ['http://yarn-eg-node-1.fyre.ibm.com:8088/ws/v1']
[W 210719 12:37:18 hadoop_conf:61] Failed to access RM 'http://yarn-eg-node-1.fyre.ibm.com:8088/ws/v1' - HTTP Code '401', continuing...
[W 210719 12:37:18 hadoop_conf:62] KeysView({'Cache-Control': 'must-revalidate,no-cache,no-store', 'Date': 'Mon, 19 Jul 2021 19:37:18 GMT, Mon, 19 Jul 2021 19:37:18 GMT', 'Pragma': 'no-cache, no-cache', 'Content-Type': 'text/html; charset=iso-8859-1', 'WWW-Authenticate': 'PseudoAuth', 'Set-Cookie': 'hadoop.auth=; Path=/; HttpOnly', 'Content-Length': '1406', 'Server': 'Jetty(6.1.26.hwx)'})
[W 210719 12:37:18 hadoop_conf:65] hadoop.auth

@dimon222
Collaborator Author

@kevin-bates please give the updated code another try; see my commit.

@kevin-bates
Member

Thanks Dmitry. Looks like we get about the same thing. However, it dawned on me that the "response" you wanted to debug is the response you just updated. Previously, I was logging the response that is returned in check_is_active_rm() in hadoop_conf.py.

I went ahead and moved the debug statements into the SimpleAuth __call__ method and see the following. These are produced after you've set the token...

[D 2021-07-19 13:46:43.861 EnterpriseGatewayApp] Using SimpleAuth with 'gateway' against endpoints: ['http://yarn-eg-node-1.fyre.ibm.com:8088/ws/v1']
[W 210719 13:46:43 auth:20] KeysView({'Cache-Control': 'no-cache', 'Expires': 'Mon, 19 Jul 2021 20:46:43 GMT, Mon, 19 Jul 2021 20:46:43 GMT', 'Date': 'Mon, 19 Jul 2021 20:46:43 GMT, Mon, 19 Jul 2021 20:46:43 GMT', 'Pragma': 'no-cache, no-cache', 'Content-Type': 'application/json', 'Set-Cookie': 'hadoop.auth="u=gateway&p=gateway&t=simple&e=1626763603865&s=n/awXcd9CMoAMBXRlTjXk08VK9U="; Path=/; HttpOnly', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Encoding': 'gzip', 'Transfer-Encoding': 'chunked', 'Server': 'Jetty(6.1.26.hwx)'})
[W 210719 13:46:43 auth:23] hadoop.auth
[W 210719 13:46:43 hadoop_conf:61] Failed to access RM 'http://yarn-eg-node-1.fyre.ibm.com:8088/ws/v1' - HTTP Code '401', continuing...

Here's how the method looks...

    def __call__(self, request):
        if not self.auth_done:
            _session = requests.Session()
            r = _session.get(request.url, params={"user.name": self.username})
            r.raise_for_status()
            self.auth_token = _session.cookies.get_dict()['hadoop.auth']
            self.auth_done = True
            log.warning(r.headers.keys())
            cookiejar = r.cookies
            for cookie in cookiejar:
                log.warning(cookie.name)
        else:
            r.cookies.set("hadoop.auth", self.auth_token)
        return request

FYI, I will be unavailable for the rest of Monday (July 19).

@dimon222
Collaborator Author

Might have figured out the problem; pushed another commit...

@kevin-bates
Member

I removed the other debug logging. Looks like we don't always have a cookies attribute:

[D 2021-07-19 14:10:26.378 EnterpriseGatewayApp] Using SimpleAuth with 'gateway' against endpoints: ['http://yarn-eg-node-1.fyre.ibm.com:8088/ws/v1']
[E 210719 14:10:26 web:1793] Uncaught exception POST /api/kernels (9.211.77.69)
    HTTPServerRequest(protocol='http', host='yarn-eg-node-1.fyre.ibm.com:8888', method='POST', uri='/api/kernels', version='HTTP/1.1', remote_ip='9.211.77.69')
    Traceback (most recent call last):
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/tornado/web.py", line 1704, in _execute
        result = await result
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/enterprise_gateway/services/kernels/handlers.py", line 90, in post
        await super(MainKernelHandler, self).post()
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/jupyter_server/services/kernels/handlers.py", line 47, in post
        path=model.get('path'))
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/enterprise_gateway/services/kernels/remotemanager.py", line 156, in start_kernel
        kernel_id = await super(RemoteMappingKernelManager, self).start_kernel(*args, **kwargs)
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 186, in start_kernel
        kernel_id = await ensure_async(self.pinned_superclass.start_kernel(self, **kwargs))
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/jupyter_server/utils.py", line 176, in ensure_async
        result = await obj
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/jupyter_client/multikernelmanager.py", line 217, in _async_start_kernel
        await fut
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/jupyter_client/multikernelmanager.py", line 195, in _add_kernel_when_ready
        await kernel_awaitable
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/jupyter_client/utils.py", line 33, in ensure_async
        return await obj
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/enterprise_gateway/services/kernels/remotemanager.py", line 365, in start_kernel
        await super(RemoteKernelManager, self).start_kernel(**kwargs)
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/jupyter_client/manager.py", line 376, in _async_start_kernel
        self.kernel = await ensure_async(self._launch_kernel(kernel_cmd, **kw))
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/jupyter_client/utils.py", line 33, in ensure_async
        return await obj
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/enterprise_gateway/services/kernels/remotemanager.py", line 425, in _launch_kernel
        proxy = await self.process_proxy.launch_process(kernel_cmd, **kwargs)
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/enterprise_gateway/services/processproxies/yarn.py", line 104, in launch_process
        self._initialize_resource_manager(**kwargs)
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/enterprise_gateway/services/processproxies/yarn.py", line 95, in _initialize_resource_manager
        self.resource_mgr = ResourceManager(service_endpoints=endpoints, auth=auth, verify=cert_path)
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/yarn_api_client/resource_manager.py", line 92, in __init__
        if check_is_active_rm(endpoint, timeout, auth, verify):
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/yarn_api_client/hadoop_conf.py", line 55, in check_is_active_rm
        response = requests.get(url + "/cluster", timeout=timeout, auth=auth, verify=verify)
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/requests/api.py", line 76, in get
        return request('get', url, params=params, **kwargs)
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/requests/api.py", line 61, in request
        return session.request(method=method, url=url, **kwargs)
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/requests/sessions.py", line 516, in request
        prep = self.prepare_request(req)
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/requests/sessions.py", line 459, in prepare_request
        hooks=merge_hooks(request.hooks, self.hooks),
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/requests/models.py", line 318, in prepare
        self.prepare_auth(auth, url)
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/requests/models.py", line 549, in prepare_auth
        r = auth(self)
      File "/opt/anaconda2/envs/py3/lib/python3.6/site-packages/yarn_api_client/auth.py", line 16, in __call__
        request.cookies.set("hadoop.auth", self.auth_token)
    AttributeError: 'PreparedRequest' object has no attribute 'cookies'

@dimon222
Collaborator Author

dimon222 commented Jul 19, 2021

Didn't realize it was a PreparedRequest; that use case should now be covered as well.
Anything else I missed?
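
For anyone following along, here is a minimal sketch of the overall idea (not the actual commit): do the user.name handshake once, cache the hadoop.auth cookie seen in the debug output above, and replay it on each outgoing request, keeping in mind that __call__ receives a PreparedRequest, which has no cookies attribute:

import requests


class SimpleAuth(requests.auth.AuthBase):
    """Sketch: pseudo/simple auth via user.name plus a cached hadoop.auth cookie."""

    def __init__(self, username):
        self.username = username
        self.auth_token = None  # hadoop.auth cookie value after the handshake

    def __call__(self, request):
        if self.auth_token is None:
            # One-time handshake: the RM answers a user.name request with
            # a Set-Cookie: hadoop.auth=... header (see the debug output above).
            handshake = requests.get(request.url, params={'user.name': self.username})
            handshake.raise_for_status()
            self.auth_token = handshake.cookies.get('hadoop.auth')
        if self.auth_token:
            # request is a PreparedRequest (no .cookies attribute), so attach
            # the cookie via prepare_cookies(), which writes the Cookie header.
            request.prepare_cookies({'hadoop.auth': self.auth_token})
        else:
            # No cookie issued (e.g. anonymous access allowed): fall back to user.name.
            request.prepare_url(request.url, params={'user.name': self.username})
        return request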

@abzymeinsjtu

I tested the change on EG 2.5.0 with an Aliyun EMR cluster and it worked.

Both anonymous access and Kerberos are disabled.

(screenshot: 2021-07-20 1:40:40 PM)

(screenshot: 2021-07-20 1:40:14 PM)

(egtest) [hadoop@emr-header-1 ~]$ jupyter enterprisegateway --ip=0.0.0.0 --port_retries=10 --EnterpriseGatewayApp.list_kernels=true --EnterpriseGatewayApp.remote_hosts=emr-worker-1 --EnterpriseGatewayApp.remote_hosts=emr-worker-2
[I 2021-07-20 13:34:59.370 EnterpriseGatewayApp] The port 8888 is already in use, trying another port.
[I 2021-07-20 13:34:59.370 EnterpriseGatewayApp] Jupyter Enterprise Gateway 3.0.0.dev0 is available at http://0.0.0.0:8889
[I 210720 13:35:21 web:2239] 200 GET /api/kernelspecs (140.205.147.85) 154.68ms
[I 210720 13:35:23 web:2239] 200 GET /api/kernelspecs (140.205.147.85) 2.15ms
[I 210720 13:35:23 web:2239] 200 GET /api/kernels (140.205.147.85) 0.45ms

Starting IPython kernel for Spark in Yarn Cluster mode on behalf of user hadoop

+ eval exec /usr/lib/spark-current/bin/spark-submit '--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/home/hadoop/.conda/envs/egtest --conf spark.yarn.appMasterEnv.PATH=/home/hadoop/.conda/envs/egtest/bin:/home/hadoop/.conda/envs/egtest/bin:/usr/lib/miniconda3/condabin:/usr/lib/sqoop-current/bin:/usr/lib/hudi-current/bin:/usr/lib/hive-current/hcatalog/bin:/usr/lib/hive-current/bin:/usr/lib/datafactory-current/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/lib/b2monitor-current/bin:/usr/lib/b2smartdata-current/bin:/usr/lib/b2jindosdk-current/bin:/usr/lib/flow-agent-current/bin:/usr/lib/hbase-current/bin:/usr/lib/hadoop-current/bin:/usr/lib/hadoop-current/sbin:/usr/lib/spark-current/bin:/usr/lib/hadoop-current/bin:/usr/lib/hadoop-current/sbin:/home/hadoop/.local/bin:/home/hadoop/bin --conf spark.yarn.appMasterEnv.PYTHONPATH=/home/hadoop/.conda/envs/egtest/lib/python3.8/site-packages:/usr/lib/spark-current/python:/usr/lib/spark-current/python/lib/py4j-0.10.9-src.zip --conf spark.yarn.appMasterEnv.HIVE_CONF_DIR=/etc/ecm/hive-conf --conf spark.sql.catalogImplementation=hive --conf spark.yarn.submit.waitAppCompletion=false' '' /home/hadoop/.conda/envs/egtest/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py '' --RemoteProcessProxy.kernel-id e0fab152-b6a9-46ae-8576-9c805f4282c6 --RemoteProcessProxy.response-address 192.168.0.237:8877 --RemoteProcessProxy.spark-context-initialization-mode lazy
++ exec /usr/lib/spark-current/bin/spark-submit --master yarn --deploy-mode cluster --name e0fab152-b6a9-46ae-8576-9c805f4282c6 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/home/hadoop/.conda/envs/egtest --conf spark.yarn.appMasterEnv.PATH=/home/hadoop/.conda/envs/egtest/bin:/home/hadoop/.conda/envs/egtest/bin:/usr/lib/miniconda3/condabin:/usr/lib/sqoop-current/bin:/usr/lib/hudi-current/bin:/usr/lib/hive-current/hcatalog/bin:/usr/lib/hive-current/bin:/usr/lib/datafactory-current/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/lib/b2monitor-current/bin:/usr/lib/b2smartdata-current/bin:/usr/lib/b2jindosdk-current/bin:/usr/lib/flow-agent-current/bin:/usr/lib/hbase-current/bin:/usr/lib/hadoop-current/bin:/usr/lib/hadoop-current/sbin:/usr/lib/spark-current/bin:/usr/lib/hadoop-current/bin:/usr/lib/hadoop-current/sbin:/home/hadoop/.local/bin:/home/hadoop/bin --conf spark.yarn.appMasterEnv.PYTHONPATH=/home/hadoop/.conda/envs/egtest/lib/python3.8/site-packages:/usr/lib/spark-current/python:/usr/lib/spark-current/python/lib/py4j-0.10.9-src.zip --conf spark.yarn.appMasterEnv.HIVE_CONF_DIR=/etc/ecm/hive-conf --conf spark.sql.catalogImplementation=hive --conf spark.yarn.submit.waitAppCompletion=false /home/hadoop/.conda/envs/egtest/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py --RemoteProcessProxy.kernel-id e0fab152-b6a9-46ae-8576-9c805f4282c6 --RemoteProcessProxy.response-address 192.168.0.237:8877 --RemoteProcessProxy.spark-context-initialization-mode lazy
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
64   [main] INFO  org.apache.hadoop.yarn.client.RMProxy  - Connecting to ResourceManager at emr-header-1.cluster-235523/192.168.0.237:8032
217  [main] INFO  org.apache.hadoop.yarn.client.AHSProxy  - Connecting to Application History server at emr-header-1.cluster-235523/192.168.0.237:10200
279  [main] INFO  org.apache.spark.deploy.yarn.Client  - Requesting a new application from cluster with 2 NodeManagers
584  [main] WARN  org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory  - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
653  [main] INFO  org.apache.hadoop.conf.Configuration  - resource-types.xml not found
653  [main] INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils  - Unable to find 'resource-types.xml'.
670  [main] INFO  org.apache.spark.deploy.yarn.Client  - Verifying our application has not requested more than the maximum memory capability of the cluster (35328 MB per container)
671  [main] INFO  org.apache.spark.deploy.yarn.Client  - Will allocate AM container, with 1408 MB memory including 384 MB overhead
671  [main] INFO  org.apache.spark.deploy.yarn.Client  - Setting up container launch context for our AM
674  [main] INFO  org.apache.spark.deploy.yarn.Client  - Setting up the launch environment for our AM container
682  [main] INFO  org.apache.spark.deploy.yarn.Client  - Preparing resources for our AM container
709  [main] WARN  org.apache.spark.deploy.yarn.Client  - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
3378 [main] INFO  org.apache.spark.deploy.yarn.Client  - Uploading resource file:/tmp/spark-10406244-3cbb-48a3-9cd3-31d8bc31b1fc/__spark_libs__219937958661567882.zip -> hdfs://emr-header-1.cluster-235523:9000/user/hadoop/.sparkStaging/application_1626268989286_0013/__spark_libs__219937958661567882.zip
3444 [Thread-7] INFO  org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient  - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
3692 [DataStreamer for file /user/hadoop/.sparkStaging/application_1626268989286_0013/__spark_libs__219937958661567882.zip] INFO  org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient  - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
3886 [DataStreamer for file /user/hadoop/.sparkStaging/application_1626268989286_0013/__spark_libs__219937958661567882.zip] INFO  org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient  - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
4022 [main] INFO  org.apache.spark.deploy.yarn.Client  - Uploading resource file:/home/hadoop/.conda/envs/egtest/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py -> hdfs://emr-header-1.cluster-235523:9000/user/hadoop/.sparkStaging/application_1626268989286_0013/launch_ipykernel.py
4027 [Thread-12] INFO  org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient  - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
4035 [main] INFO  org.apache.spark.deploy.yarn.Client  - Uploading resource file:/usr/lib/spark-current/python/lib/pyspark.zip -> hdfs://emr-header-1.cluster-235523:9000/user/hadoop/.sparkStaging/application_1626268989286_0013/pyspark.zip
4039 [Thread-14] INFO  org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient  - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
4048 [main] INFO  org.apache.spark.deploy.yarn.Client  - Uploading resource file:/usr/lib/spark-current/python/lib/py4j-0.10.9-src.zip -> hdfs://emr-header-1.cluster-235523:9000/user/hadoop/.sparkStaging/application_1626268989286_0013/py4j-0.10.9-src.zip
4052 [Thread-16] INFO  org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient  - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[I 210720 13:35:33 web:2239] 200 GET /api/kernels (140.205.147.85) 6.25ms
4157 [main] INFO  org.apache.spark.deploy.yarn.Client  - Uploading resource file:/tmp/spark-10406244-3cbb-48a3-9cd3-31d8bc31b1fc/__spark_conf__6509133064178639836.zip -> hdfs://emr-header-1.cluster-235523:9000/user/hadoop/.sparkStaging/application_1626268989286_0013/__spark_conf__.zip
4162 [Thread-18] INFO  org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient  - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
4195 [main] INFO  org.apache.spark.SecurityManager  - Changing view acls to: hadoop
4196 [main] INFO  org.apache.spark.SecurityManager  - Changing modify acls to: hadoop
4196 [main] INFO  org.apache.spark.SecurityManager  - Changing view acls groups to:
4197 [main] INFO  org.apache.spark.SecurityManager  - Changing modify acls groups to:
4197 [main] INFO  org.apache.spark.SecurityManager  - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
4253 [main] INFO  org.apache.spark.deploy.yarn.Client  - Submitting application application_1626268989286_0013 to ResourceManager
4285 [main] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl  - Submitted application application_1626268989286_0013
4287 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1626268989286_0013 (state: ACCEPTED)
4289 [main] INFO  org.apache.spark.deploy.yarn.Client  -
	 client token: N/A
	 diagnostics: [Tue Jul 20 13:35:34 +0800 2021] Application is Activated, waiting for resources to be assigned for AM.  Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:70656, vCores:64> ; Queue's Absolute capacity = 100.0 % ; Queue's Absolute used capacity = 13.224638 % ; Queue's Absolute max capacity = 100.0 % ; Queue's capacity (absolute resource) = <memory:70656, vCores:64> ; Queue's used capacity (absolute resource) = <memory:9344, vCores:7> ; Queue's max capacity (absolute resource) = <memory:70656, vCores:64> ;
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1626759334016
	 final status: UNDEFINED
	 tracking URL: http://emr-header-1.cluster-235523:20888/proxy/application_1626268989286_0013/
	 user: hadoop
4292 [shutdown-hook-0] INFO  org.apache.spark.util.ShutdownHookManager  - Shutdown hook called
4292 [shutdown-hook-0] INFO  org.apache.spark.util.ShutdownHookManager  - Deleting directory /tmp/spark-d8792d9f-1d95-40fe-a893-5279695a6fc7
4294 [shutdown-hook-0] INFO  org.apache.spark.util.ShutdownHookManager  - Deleting directory /tmp/spark-10406244-3cbb-48a3-9cd3-31d8bc31b1fc
[I 2021-07-20 13:35:34.318 EnterpriseGatewayApp] ApplicationID: 'application_1626268989286_0013' assigned for KernelID: 'e0fab152-b6a9-46ae-8576-9c805f4282c6', state: ACCEPTED, 6.0 seconds after starting.
[W 2021-07-20 13:35:37.617 EnterpriseGatewayApp] WARNING!!!! Legacy kernel response received for kernel_id 'e0fab152-b6a9-46ae-8576-9c805f4282c6'! Update kernel launchers to current version!
[I 2021-07-20 13:35:38.006 EnterpriseGatewayApp] Kernel started: e0fab152-b6a9-46ae-8576-9c805f4282c6
[I 210720 13:35:38 web:2239] 201 POST /api/kernels (140.205.147.85) 9994.99ms
[I 210720 13:35:38 web:2239] 200 GET /api/kernels/e0fab152-b6a9-46ae-8576-9c805f4282c6 (140.205.147.85) 5.65ms
[I 210720 13:35:38 web:2239] 200 GET /api/kernels/e0fab152-b6a9-46ae-8576-9c805f4282c6 (140.205.147.85) 5.58ms

@dimon222
Collaborator Author

dimon222 commented Jul 20, 2021

I have added a default user of yarn, as that is what I've seen in the majority of distributions. @kevin-bates, leaving this PR for your review before merging this in.

@dimon222 dimon222 requested a review from kevin-bates July 20, 2021 16:53
@kevin-bates
Member

The latest set of changes appears to work - thank you! I'm still not clear on how the user identity is actually used, as it all seems to work even when providing a non-existent user name. Is it more about just including a value, primarily for auditing purposes?

Since SimpleAuth appears to work even when anonymous is allowed, I'm inclined to make this the default behavior in EG.

Member

@kevin-bates kevin-bates left a comment

Thank you Dmitry!

@dimon222
Collaborator Author

You're right on this. I think it was meant to be called SimpleNoAuth :)
This kind of impersonation sounds... strange.

Either way, merging this in.

@dimon222 dimon222 merged commit 2e6ab9d into gateway-experiments:master Jul 20, 2021
@dimon222 dimon222 deleted the feature/simple_auth branch July 20, 2021 17:35
@dimon222
Collaborator Author

Reminder to include a usage example in the README before we release a new version of this package.

Successfully merging this pull request may close these issues.

Simple authentication is not supported