PSI (private set intersection) fails in P2P mode #197

Closed
zx007nice opened this issue Jan 14, 2025 · 17 comments

@zx007nice

Issue Type

Install/Deploy

Have you searched for existing documents and issues?

Yes

OS Platform and Distribution

linux centos7

All_in_one Version

Latest version

Kuscia Version

0.13.0b0

What happened and what you expected to happen.

I deployed SecretPad All In One for two institutions on the same machine.
Running PSI fails.

Log output.

Which logs do I need to look at?
@Chrisdehe
Member

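First, check whether the run logs contain anything that pinpoints the error (click the failed component; the platform logs are in the lower-left corner).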

@zx007nice
Author

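> First, check whether the run logs contain anything that pinpoints the error (click the failed component; the platform logs are in the lower-left corner).

[screenshot]
There is nothing there.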

@Chrisdehe
Member

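1. Please post a screenshot of your configuration parameters.
2. Go into the kuscia container and find the task execution log by task ID. The path is home/kuscia/var/stdout/<task-id>/pod/secretflow/0.log.
If anything is unclear, refer to the engine log documentation.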
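A minimal Python sketch of that lookup, to run inside the kuscia container. The root `/home/kuscia/var/stdout` (with a leading `/`) and the helper name `find_task_logs` are assumptions; because the search is recursive, it still finds `0.log` files if the directory nesting differs from the path quoted above:

```python
from pathlib import Path


def find_task_logs(task_id: str, root: str = "/home/kuscia/var/stdout") -> list:
    """Hypothetical helper: list every 0.log under `root` whose path mentions `task_id`."""
    base = Path(root)
    if not base.is_dir():
        return []
    # rglob searches all subdirectories, so the exact nesting below `root`
    # (pods/, pod/, secretflow/, ...) does not need to be known in advance.
    return sorted(
        p for p in base.rglob("0.log") if task_id in str(p.relative_to(base))
    )


if __name__ == "__main__":
    for log in find_task_logs("onwh-efxqkvcy-node-3"):
        print(log)
        print(log.read_text()[-2000:])  # last 2000 characters of the log
```

An empty result means no log was written for that task, which is itself a hint that the Pod never started.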

@zx007nice
Author

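> 1. Please post a screenshot of your configuration parameters. 2. Go into the kuscia container and find the task execution log by task ID. The path is home/kuscia/var/stdout/<task-id>/pod/secretflow/0.log. If anything is unclear, refer to the engine log documentation.

[screenshot]

[screenshot]
There is nothing under pods.
[screenshot]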

@Chrisdehe
Member

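It looks like the task never actually started running.
Please follow the "job run failure" troubleshooting guide to check the logs, then post them.
Specifically:

  1. Job information
  2. Task Pod details
  3. kuscia docker logs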
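The three checks above can be gathered into one list of commands. A sketch under these assumptions: `diagnostic_commands` is a hypothetical helper; `kj` is taken to be the KusciaJob short name, by analogy with the `kt` used for KusciaTask in this thread; the task Pod is named `<task-id>-0`, as in the outputs posted here; and the Kuscia container name (`root-kuscia-autonomy-alice` below) is a guess to verify with `docker ps`. The `kubectl` commands run inside the kuscia container, `docker logs` on the host:

```python
import shlex


def diagnostic_commands(job_id: str, task_id: str, party: str, container: str) -> list:
    """Hypothetical helper: argv lists for the three checks (nothing is executed)."""
    pod = f"{task_id}-0"  # task Pods in this thread are named <task-id>-0
    return [
        # 1. Job information (KusciaJob) and the task itself (KusciaTask)
        ["kubectl", "get", "kj", job_id, "-n", "cross-domain", "-o", "yaml"],
        ["kubectl", "get", "kt", task_id, "-n", "cross-domain", "-o", "yaml"],
        # 2. Task Pod details, in the party's own namespace
        ["kubectl", "get", "pod", pod, "-n", party, "-o", "yaml"],
        # 3. Logs of the Kuscia container itself (run on the host)
        ["docker", "logs", container],
    ]


if __name__ == "__main__":
    for argv in diagnostic_commands("hcnd", "hcnd-uzpvyqbu-node-3", "alice",
                                    "root-kuscia-autonomy-alice"):
        print(shlex.join(argv))
```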

@zx007nice
Author

alice:

```
bash-5.2# cat 0.log
2025-01-14T17:36:12.538339155+08:00 stdout F 17:36:12.528 [main] INFO o.s.d.s.DataProxyServerApplication - Starting DataProxyFlightServer
2025-01-14T17:36:12.573297775+08:00 stdout F 17:36:12.571 [main] INFO o.s.d.p.o.c.EnvironmentOdpsFlightConfigLoader - Load odps flight config from system env.
2025-01-14T17:36:12.577563824+08:00 stdout F 17:36:12.573 [main] INFO o.s.d.c.config.FlightServerContext - load config: SERVICE_HOST=10.88.0.2
2025-01-14T17:36:12.577597413+08:00 stdout F 17:36:12.575 [main] INFO o.s.d.c.config.FlightServerContext - load config: FLIGHT_ENDPOINT_ODPS_MAX=1
2025-01-14T17:36:12.577603996+08:00 stdout F 17:36:12.575 [main] INFO o.s.d.c.config.FlightServerContext - load config: FLIGHT_ENDPOINT_ODPS_UPGRADE_THRESHOLD=1000000
2025-01-14T17:36:12.577608992+08:00 stdout F 17:36:12.575 [main] INFO o.s.d.c.config.FlightServerContext - load config: SERVICE_PORT=8023
2025-01-14T17:36:12.598356613+08:00 stdout F 17:36:12.596 [main] INFO o.apache.arrow.memory.BaseAllocator - Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true.
2025-01-14T17:36:12.601246515+08:00 stdout F 17:36:12.599 [main] INFO o.a.a.m.DefaultAllocationManagerOption - allocation manager type not specified, using netty as the default type
2025-01-14T17:36:12.603288535+08:00 stdout F 17:36:12.602 [main] INFO o.apache.arrow.memory.CheckAllocator - Using DefaultAllocationManager at memory-netty-18.0.0.jar!/org/apache/arrow/memory/netty/DefaultAllocationManagerFactory.class
2025-01-14T17:36:12.940475938+08:00 stdout F 17:36:12.939 [main] INFO o.s.d.server.DataProxyFlightServer - ProducerRegistry register: odps
2025-01-14T17:36:13.396524662+08:00 stdout F 17:36:13.394 [main] INFO o.s.d.server.DataProxyFlightServer - 10.88.0.2
2025-01-14T17:36:13.396551398+08:00 stdout F 17:36:13.395 [main] INFO o.s.d.s.DataProxyServerApplication - Data proxy flight server start at 10.88.0.2:8023
```

bob:

```
bash-5.2# cat 0.log
2025-01-14T17:36:14.318307041+08:00 stdout F 17:36:14.305 [main] INFO o.s.d.s.DataProxyServerApplication - Starting DataProxyFlightServer
2025-01-14T17:36:14.368193718+08:00 stdout F 17:36:14.362 [main] INFO o.s.d.p.o.c.EnvironmentOdpsFlightConfigLoader - Load odps flight config from system env.
2025-01-14T17:36:14.378244978+08:00 stdout F 17:36:14.364 [main] INFO o.s.d.c.config.FlightServerContext - load config: SERVICE_HOST=10.88.0.2
2025-01-14T17:36:14.378265936+08:00 stdout F 17:36:14.373 [main] INFO o.s.d.c.config.FlightServerContext - load config: FLIGHT_ENDPOINT_ODPS_MAX=1
2025-01-14T17:36:14.37827173+08:00 stdout F 17:36:14.374 [main] INFO o.s.d.c.config.FlightServerContext - load config: FLIGHT_ENDPOINT_ODPS_UPGRADE_THRESHOLD=1000000
2025-01-14T17:36:14.378276855+08:00 stdout F 17:36:14.374 [main] INFO o.s.d.c.config.FlightServerContext - load config: SERVICE_PORT=8023
2025-01-14T17:36:14.385295581+08:00 stdout F 17:36:14.384 [main] INFO o.apache.arrow.memory.BaseAllocator - Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true.
2025-01-14T17:36:14.389268478+08:00 stdout F 17:36:14.387 [main] INFO o.a.a.m.DefaultAllocationManagerOption - allocation manager type not specified, using netty as the default type
2025-01-14T17:36:14.410287123+08:00 stdout F 17:36:14.408 [main] INFO o.apache.arrow.memory.CheckAllocator - Using DefaultAllocationManager at memory-netty-18.0.0.jar!/org/apache/arrow/memory/netty/DefaultAllocationManagerFactory.class
2025-01-14T17:36:15.023372643+08:00 stdout F 17:36:15.022 [main] INFO o.s.d.server.DataProxyFlightServer - ProducerRegistry register: odps
2025-01-14T17:36:16.666140764+08:00 stdout F 17:36:16.665 [main] INFO o.s.d.server.DataProxyFlightServer - 10.88.0.2
2025-01-14T17:36:16.66734297+08:00 stdout F 17:36:16.665 [main] INFO o.s.d.s.DataProxyServerApplication - Data proxy flight server start at 10.88.0.2:8023
```
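Both logs print their startup settings as `load config: KEY=VALUE` lines, so comparing the two parties side by side is easy to automate. A minimal sketch with hypothetical helper names; on the two logs above it would report `SERVICE_HOST=10.88.0.2` and `SERVICE_PORT=8023` as identical for alice and bob:

```python
import re

# Matches DataProxy startup lines such as
# "... FlightServerContext - load config: SERVICE_PORT=8023"
CONFIG_RE = re.compile(r"load config: (\w+)=(\S+)")


def load_config(log_text: str) -> dict:
    """Collect the KEY=VALUE pairs printed by 'load config:' lines."""
    return dict(CONFIG_RE.findall(log_text))


def compare(alice: dict, bob: dict) -> dict:
    """Map each key to (alice value, bob value, identical?)."""
    return {
        key: (alice.get(key), bob.get(key), alice.get(key) == bob.get(key))
        for key in sorted(alice.keys() | bob.keys())
    }
```

Usage: `compare(load_config(open("alice_0.log").read()), load_config(open("bob_0.log").read()))`, with the two `0.log` files copied out under hypothetical names.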

@Chrisdehe
Member

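It looks like both nodes are using the same port number, "8023"?
Please note that the two parties' ports must not be duplicated or already in use.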
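A quick way to see whether a port is already taken is to try binding it. A minimal sketch; `port_is_free` is a hypothetical helper, and it only sees the network namespace it runs in, so run it on the host or inside the container whose ports you are checking:

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Hypothetical helper: True if a TCP socket can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:  # typically EADDRINUSE when another process holds the port
            return False
    return True


if __name__ == "__main__":
    for port in (8023, 8083):
        print(port, "free" if port_is_free(port) else "in use")
```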

@zx007nice
Author

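> It looks like both nodes are using the same port number, "8023"? Please note that the two parties' ports must not be duplicated or already in use.

[screenshot]
Looking inside the containers, both should be 8083.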

@zx007nice
Author

```
bash-5.2# kubectl get kt onwh-efxqkvcy-node-3 -n cross-domain -o yaml
apiVersion: kuscia.secretflow/v1alpha1
kind: KusciaTask
metadata:
  annotations:
    kuscia.secretflow/initiator: alice
    kuscia.secretflow/interconn-bfia-parties: ""
    kuscia.secretflow/interconn-kuscia-parties: bob
    kuscia.secretflow/interconn-self-parties: bob
    kuscia.secretflow/job-id: onwh
    kuscia.secretflow/party-master-domain: bob
    kuscia.secretflow/self-cluster-as-initiator: "false"
    kuscia.secretflow/self-cluster-as-participant: "true"
    kuscia.secretflow/task-alias: onwh-efxqkvcy-node-3
  creationTimestamp: "2025-01-14T09:48:38Z"
  generation: 1
  labels:
    kuscia.secretflow/controller: kuscia-job
    kuscia.secretflow/job-uid: 578a3378-72b3-48c4-8d5c-3f27da5f8d4f
  name: onwh-efxqkvcy-node-3
  namespace: cross-domain
  ownerReferences:
  - apiVersion: kuscia.secretflow/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: KusciaJob
    name: onwh
    uid: 578a3378-72b3-48c4-8d5c-3f27da5f8d4f
  resourceVersion: "5429"
  uid: 8bb32f7e-abc7-4cb1-9609-e417ca873eb4
spec:
  initiator: alice
  parties:
  - appImageRef: secretflow-image
    domainID: bob
    template:
      spec: {}
  - appImageRef: secretflow-image
    domainID: alice
    template:
      spec: {}
  scheduleConfig: {}
  taskInputConfig: |-
    {
      "sf_datasource_config": {
        "bob": {
          "id": "default-data-source"
        },
        "alice": {
          "id": "default-data-source"
        }
      },
      "sf_cluster_desc": {
        "parties": ["bob", "alice"],
        "devices": [{
          "name": "spu",
          "type": "spu",
          "parties": ["bob", "alice"],
          "config": "{\"runtime_config\":{\"protocol\":\"SEMI2K\",\"field\":\"FM128\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"
        }, {
          "name": "heu",
          "type": "heu",
          "parties": ["bob", "alice"],
          "config": "{\"mode\": \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}"
        }],
        "ray_fed_config": {
          "cross_silo_comm_backend": "brpc_link"
        }
      },
      "sf_node_eval_param": {
        "domain": "data_prep",
        "name": "psi",
        "version": "1.0.0",
        "attr_paths": ["input/input_ds1/keys", "input/input_ds2/keys", "protocol", "sort_result", "receiver_parties", "allow_empty_result", "join_type", "input_ds1_keys_duplicated", "input_ds2_keys_duplicated"],
        "attrs": [{
          "is_na": false,
          "ss": ["id"]
        }, {
          "is_na": false,
          "ss": ["id"]
        }, {
          "is_na": false,
          "s": "PROTOCOL_RR22"
        }, {
          "b": true,
          "is_na": false
        }, {
          "is_na": false,
          "ss": ["alice", "bob"]
        }, {
          "b": true,
          "is_na": false
        }, {
          "is_na": false,
          "s": "inner_join"
        }, {
          "b": true,
          "is_na": false
        }, {
          "b": true,
          "is_na": false
        }],
        "inputs": [{
          "type": "sf.table.individual",
          "meta": {
            "@type": "type.googleapis.com/secretflow.spec.v1.IndividualTable",
            "line_count": "-1"
          },
          "data_refs": [{
            "uri": "alice-test01_300839225.csv",
            "party": "alice",
            "format": "csv"
          }]
        }, {
          "type": "sf.table.individual",
          "meta": {
            "@type": "type.googleapis.com/secretflow.spec.v1.IndividualTable",
            "line_count": "-1"
          },
          "data_refs": [{
            "uri": "bob-test01_205533564.csv",
            "party": "bob",
            "format": "csv"
          }]
        }],
        "checkpoint_uri": "ckonwh-efxqkvcy-node-3-output-0"
      },
      "sf_output_uris": ["onwh_efxqkvcy_node_3_output_0", "onwh_efxqkvcy_node_3_output_1"],
      "sf_input_ids": ["mbnubgix", "dgswxart"],
      "sf_input_partitions_spec": ["", ""],
      "sf_output_ids": ["onwh-efxqkvcy-node-3-output-0", "onwh-efxqkvcy-node-3-output-1"],
      "table_attrs": [{
        "table_id": "mbnubgix",
        "column_attrs": [{
          "col_name": "id",
          "col_type": "feature"
        }, {
          "col_name": "name",
          "col_type": "feature"
        }, {
          "col_name": "age",
          "col_type": "feature"
        }]
      }, {
        "table_id": "dgswxart",
        "column_attrs": [{
          "col_name": "id",
          "col_type": "feature"
        }, {
          "col_name": "name",
          "col_type": "feature"
        }, {
          "col_name": "age",
          "col_type": "feature"
        }]
      }]
    }
status:
  allocatedPorts:
  - domainID: bob
    namedPort:
      onwh-efxqkvcy-node-3-0/client-server: 31491
      onwh-efxqkvcy-node-3-0/fed: 31494
      onwh-efxqkvcy-node-3-0/global: 31488
      onwh-efxqkvcy-node-3-0/inference: 31492
      onwh-efxqkvcy-node-3-0/node-manager: 31489
      onwh-efxqkvcy-node-3-0/object-manager: 31490
      onwh-efxqkvcy-node-3-0/spu: 31493
  completionTime: "2025-01-14T09:53:39Z"
  conditions:
  - lastTransitionTime: "2025-01-14T09:48:38Z"
    status: "True"
    type: PortsAllocated
  - lastTransitionTime: "2025-01-14T09:48:38Z"
    status: "True"
    type: ResourceCreated
  lastReconcileTime: "2025-01-14T09:53:38Z"
  message: The remaining no-failed party task counts 1 are less than the task success
    threshold 2. failed party[alice]
  partyTaskStatus:
  - domainID: bob
    phase: Failed
  - domainID: alice
    phase: Failed
  phase: Failed
  podStatuses:
    bob/onwh-efxqkvcy-node-3-0:
      createTime: "2025-01-14T09:48:38Z"
      message: '0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure:
        }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.,
        reject the pod onwh-efxqkvcy-node-3-0 even after PostFilter.'
      namespace: bob
      podName: onwh-efxqkvcy-node-3-0
      podPhase: Failed
      reason: Unschedulable
  serviceStatuses:
    bob/onwh-efxqkvcy-node-3-0-fed:
      createTime: "2025-01-14T09:48:38Z"
      namespace: bob
      portName: fed
      portNumber: 31494
      scope: Cluster
      serviceName: onwh-efxqkvcy-node-3-0-fed
    bob/onwh-efxqkvcy-node-3-0-global:
      createTime: "2025-01-14T09:48:38Z"
      namespace: bob
      portName: global
      portNumber: 31488
      scope: Domain
      serviceName: onwh-efxqkvcy-node-3-0-global
    bob/onwh-efxqkvcy-node-3-0-inference:
      createTime: "2025-01-14T09:48:38Z"
      namespace: bob
      portName: inference
      portNumber: 31492
      scope: Cluster
      serviceName: onwh-efxqkvcy-node-3-0-inference
    bob/onwh-efxqkvcy-node-3-0-spu:
      createTime: "2025-01-14T09:48:38Z"
      namespace: bob
      portName: spu
      portNumber: 31493
      scope: Cluster
      serviceName: onwh-efxqkvcy-node-3-0-spu
  startTime: "2025-01-14T09:48:38Z"
```

Is this caused by insufficient resources?
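The `status` block above points at disk space: bob's Pod is `Unschedulable` because the node carries the `node.kubernetes.io/disk-pressure` taint, so it never started, and the task fails once fewer parties remain than the success threshold of 2 quoted in `message`. To pull those fields out without reading the whole YAML, a sketch that condenses `kubectl get kt <task> -n cross-domain -o json` output (`summarize_task` and `summarize_file` are hypothetical helpers; the field names are taken from the YAML above):

```python
import json


def summarize_task(kt: dict) -> list:
    """Condense a KusciaTask object into one line per party and per pod."""
    status = kt.get("status", {})
    lines = [f"task phase: {status.get('phase')}"]
    for party in status.get("partyTaskStatus", []):
        lines.append(f"party {party.get('domainID')}: {party.get('phase')}")
    for name, pod in status.get("podStatuses", {}).items():
        # kubectl wraps long messages; collapse the whitespace back to one line.
        message = " ".join(pod.get("message", "").split())
        parts = [pod.get("podPhase"), pod.get("reason"), message]
        lines.append(f"pod {name}: " + " ".join(p for p in parts if p))
    return lines


def summarize_file(path: str) -> list:
    """Read `kubectl get kt <task> -n cross-domain -o json` output saved to `path`."""
    with open(path, encoding="utf-8") as f:
        return summarize_task(json.load(f))
```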

@zx007nice
Author

```
[root@localhost secretflow-allinone-package]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 44K 16G 1% /dev/shm
tmpfs 16G 35M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/mapper/centos-root 94G 93G 1.5G 99% /
/dev/sda1 1014M 238M 777M 24% /boot
tmpfs 3.2G 0 3.2G 0% /run/user/1002
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/3051b216e2faaca213577486aaabc227458c5fcb5d85e4bd99389179d422337c/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/b81faf0a8861efb9076e86b10d0c5c6457f2deb6138a40e09b0ba3f2f7e41b98/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/b424024ba3a7693916a992204faff9d5c04c271f4a7dbc9ddbfba7458ed3de77/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/f64764ed659fa4e0c2cec182a8cddd2e500237195a1dbbb4f186261ba9e2789e/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/d4c0ef6ffc85f115cb0a1f0d28ae9712eb460e9f557c34f0c10d77e164df891f/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/9d9487f8f7f5ad5425bdfe046e172baca6e920e0dfe8abb0af396ae072168672/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/cfb161e196dc8b392d2b32fc145f43991431a7ed5d0cd9ff27066d537ba2b212/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/4c11dea249a3bb6bd9c13a063557ed286350252300a964d844cceb1d5c8e3cb5/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/a35d0bc37522d0b688e70423e2783dc019e19cbd9d3332e9e70250979a784ba2/merged
tmpfs 3.2G 0 3.2G 0% /run/user/0
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/1db86a2e0bfe973fba7ecdfc20e9c4b1708b72f2800fa4fd28a09677b8cd50ed/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/b3740621726f079e428219f1a8d0f0d634627a3dd1b7dd1d0a61c4b761449f21/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/0b317e4f27287752faef7e177e2ff7a01bf0ca12bdb488e9894748b1832f2547/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/c5e554e60c63a6818a4f605d063e2f07d14a8e2116193d023eb30057b9d6bd82/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/82d4abddb8c528fbe0ce2b3bb277d893dc7e2a208234349a1d0d933b84f3e164/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/cbea8dd5d37fc9a5adf761c2d29c1a0fc3974f985f57d18f0d5ede04a2598f11/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/f7a953e966361b577c95aa5a1f35bc6e3840fabc0f4feb37e9b83fcf4910e1e6/merged
overlay 94G 93G 1.5G 99% /var/lib/docker/overlay2/ddbe0b0a221d67f4429609cd2e632e3402030cd400134626bd65cde23f0ef6c5/merged
[root@localhost secretflow-allinone-package]#
```
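To flag the full filesystems in output like this, a minimal parser sketch (`full_filesystems` is a hypothetical helper). It expects the six-column `df -h` layout and skips header rows in any locale, because their fifth column is not a number. The overlay rows appear to be Docker container mounts backed by the same root filesystem, which would explain why they all repeat the 94G/93G/99% of `/`:

```python
def full_filesystems(df_output: str, threshold: int = 90) -> list:
    """Return (filesystem, use %, mount point) for rows at or above `threshold`.

    Expects `df -h` style rows: FILESYSTEM SIZE USED AVAIL USE% MOUNT.
    """
    hits = []
    for line in df_output.splitlines():
        cols = line.split()
        # Skip headers ("Use%", or a localized label) and anything malformed.
        if len(cols) < 6 or not cols[4].endswith("%") or not cols[4][:-1].isdigit():
            continue
        pct = int(cols[4][:-1])
        if pct >= threshold:
            hits.append((cols[0], pct, " ".join(cols[5:])))
    return hits
```

Feeding it the output above would flag `/dev/mapper/centos-root` at 99% plus every overlay row, while `/boot` (24%) and the tmpfs mounts stay below the threshold.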

@Chrisdehe
Member

Yes, you need to clean up the disk to bring usage below 90%.
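The 90% figure is presumably tied to the kubelet's default hard-eviction threshold `nodefs.available<10%`, which is what sets the `node.kubernetes.io/disk-pressure` taint seen in the Pod status. Kuscia may configure its own thresholds, so treat 90% as the maintainer's number rather than an exact cutoff. A sketch that checks a path against that limit with `shutil.disk_usage` (helper names are hypothetical):

```python
import shutil


def used_percent(used: float, avail: float) -> float:
    """Use% as df computes it: used / (used + available), before df rounds up."""
    return 100.0 * used / (used + avail)


def below_limit(path: str = "/", limit: float = 90.0) -> bool:
    """True if the filesystem holding `path` is under `limit` percent used."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return used_percent(usage.used, usage.free) < limit


if __name__ == "__main__":
    u = shutil.disk_usage("/")
    print(f"/ is {used_percent(u.used, u.free):.1f}% used; below 90%: {below_limit('/')}")
```

With the root filesystem above (93G used, 1.5G available) `used_percent` gives about 98.4%, which `df` rounds up to the 99% shown.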

@zx007nice
Author

> Yes, you need to clean up the disk to bring usage below 90%.

OK, thank you very much for your help.

@zx007nice
Author

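> Yes, you need to clean up the disk to bring usage below 90%.

Hi, I have cleaned the disk to below 90%, but the task still fails.
[screenshot]

Here are the task details from alice's side: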
```
bash-5.2# kubectl get kt -n cross-domain
NAME                   STARTTIME   COMPLETIONTIME   LASTRECONCILETIME   PHASE
iocb-xwruliuc-node-3   16h         16h              16h                 Failed
oqvo-xwruliuc-node-3   16h         16h              16h                 Failed
jxhz-uzpvyqbu-node-3   15h         15h              15h                 Failed
kznl-uzpvyqbu-node-3   15h         15h              15h                 Failed
hcnd-uzpvyqbu-node-3   21m         21m              21m                 Failed
bash-5.2# kubectl get kt hcnd-uzpvyqbu-node-3 -n cross-domain -o yaml
apiVersion: kuscia.secretflow/v1alpha1
kind: KusciaTask
metadata:
  annotations:
    kuscia.secretflow/initiator: bob
    kuscia.secretflow/interconn-bfia-parties: ""
    kuscia.secretflow/interconn-kuscia-parties: alice
    kuscia.secretflow/interconn-self-parties: alice
    kuscia.secretflow/job-id: hcnd
    kuscia.secretflow/party-master-domain: alice
    kuscia.secretflow/self-cluster-as-initiator: "false"
    kuscia.secretflow/self-cluster-as-participant: "true"
    kuscia.secretflow/task-alias: hcnd-uzpvyqbu-node-3
  creationTimestamp: "2025-01-16T01:31:49Z"
  generation: 1
  labels:
    kuscia.secretflow/controller: kuscia-job
    kuscia.secretflow/job-uid: 23a548c0-72dc-43ff-b54b-4addd8c680e0
  name: hcnd-uzpvyqbu-node-3
  namespace: cross-domain
  ownerReferences:
  - apiVersion: kuscia.secretflow/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: KusciaJob
    name: hcnd
    uid: 23a548c0-72dc-43ff-b54b-4addd8c680e0
  resourceVersion: "99833"
  uid: 7d5ed72a-7c9c-455f-9ef9-19def76ddcbf
spec:
  initiator: bob
  parties:
  - appImageRef: secretflow-image
    domainID: bob
    template:
      spec: {}
  - appImageRef: secretflow-image
    domainID: alice
    template:
      spec: {}
  scheduleConfig: {}
  taskInputConfig: |-
    {
      "sf_datasource_config": {
        "bob": {
          "id": "default-data-source"
        },
        "alice": {
          "id": "default-data-source"
        }
      },
      "sf_cluster_desc": {
        "parties": ["bob", "alice"],
        "devices": [{
          "name": "spu",
          "type": "spu",
          "parties": ["bob", "alice"],
          "config": "{\"runtime_config\":{\"protocol\":\"SEMI2K\",\"field\":\"FM128\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"
        }, {
          "name": "heu",
          "type": "heu",
          "parties": ["bob", "alice"],
          "config": "{\"mode\": \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}"
        }],
        "ray_fed_config": {
          "cross_silo_comm_backend": "brpc_link"
        }
      },
      "sf_node_eval_param": {
        "domain": "data_prep",
        "name": "psi",
        "version": "1.0.0",
        "attr_paths": ["input/input_ds1/keys", "input/input_ds2/keys", "protocol", "sort_result", "receiver_parties", "allow_empty_result", "join_type", "input_ds1_keys_duplicated", "input_ds2_keys_duplicated"],
        "attrs": [{
          "is_na": false,
          "ss": ["id"]
        }, {
          "is_na": false,
          "ss": ["id"]
        }, {
          "is_na": false,
          "s": "PROTOCOL_RR22"
        }, {
          "b": true,
          "is_na": false
        }, {
          "is_na": false,
          "ss": ["bob", "alice"]
        }, {
          "b": true,
          "is_na": false
        }, {
          "is_na": false,
          "s": "inner_join"
        }, {
          "b": true,
          "is_na": false
        }, {
          "b": true,
          "is_na": false
        }],
        "inputs": [{
          "type": "sf.table.individual",
          "meta": {
            "@type": "type.googleapis.com/secretflow.spec.v1.IndividualTable",
            "line_count": "-1"
          },
          "data_refs": [{
            "uri": "test1_2146716456.csv",
            "party": "alice",
            "format": "csv"
          }]
        }, {
          "type": "sf.table.individual",
          "meta": {
            "@type": "type.googleapis.com/secretflow.spec.v1.IndividualTable",
            "line_count": "-1"
          },
          "data_refs": [{
            "uri": "test2_2086794063.csv",
            "party": "bob",
            "format": "csv"
          }]
        }],
        "checkpoint_uri": "ckhcnd-uzpvyqbu-node-3-output-0"
      },
      "sf_output_uris": ["hcnd_uzpvyqbu_node_3_output_0", "hcnd_uzpvyqbu_node_3_output_1"],
      "sf_input_ids": ["fjkvkmle", "sadqangh"],
      "sf_input_partitions_spec": ["", ""],
      "sf_output_ids": ["hcnd-uzpvyqbu-node-3-output-0", "hcnd-uzpvyqbu-node-3-output-1"],
      "table_attrs": [{
        "table_id": "sadqangh",
        "column_attrs": [{
          "col_name": "id",
          "col_type": "feature"
        }, {
          "col_name": "studen_name",
          "col_type": "feature"
        }, {
          "col_name": "studen_age",
          "col_type": "feature"
        }, {
          "col_name": "age",
          "col_type": "feature"
        }]
      }, {
        "table_id": "fjkvkmle",
        "column_attrs": [{
          "col_name": "id",
          "col_type": "feature"
        }, {
          "col_name": "name",
          "col_type": "feature"
        }, {
          "col_name": "age",
          "col_type": "feature"
        }]
      }]
    }
status:
  allocatedPorts:
  - domainID: alice
    namedPort:
      hcnd-uzpvyqbu-node-3-0/client-server: 21859
      hcnd-uzpvyqbu-node-3-0/fed: 21862
      hcnd-uzpvyqbu-node-3-0/global: 21863
      hcnd-uzpvyqbu-node-3-0/inference: 21860
      hcnd-uzpvyqbu-node-3-0/node-manager: 21857
      hcnd-uzpvyqbu-node-3-0/object-manager: 21858
      hcnd-uzpvyqbu-node-3-0/spu: 21861
  completionTime: "2025-01-16T01:32:02Z"
  conditions:
  - lastTransitionTime: "2025-01-16T01:31:49Z"
    status: "True"
    type: PortsAllocated
  - lastTransitionTime: "2025-01-16T01:31:49Z"
    status: "True"
    type: ResourceCreated
  - lastTransitionTime: "2025-01-16T01:31:54Z"
    status: "True"
    type: Running
  - lastTransitionTime: "2025-01-16T01:32:01Z"
    status: "False"
    type: Success
  lastReconcileTime: "2025-01-16T01:32:01Z"
  message: The remaining no-failed party task counts 1 are less than the task success
    threshold 2. pending party[], running party[bob], successful party[], failed party[alice]
  partyTaskStatus:
  - domainID: bob
    phase: Failed
  - domainID: alice
    phase: Failed
  phase: Failed
  podStatuses:
    alice/hcnd-uzpvyqbu-node-3-0:
      createTime: "2025-01-16T01:31:49Z"
      namespace: alice
      nodeName: root-kuscia-autonomy-alice-localhost-localdomain
      podName: hcnd-uzpvyqbu-node-3-0
      podPhase: Failed
      readyTime: "2025-01-16T01:31:54Z"
      reason: Error
      startTime: "2025-01-16T01:31:52Z"
      terminationLog: 'container[secretflow] terminated state reason "Error", message:
        " config: "{\"runtime_config\":{\"protocol\":\"SEMI2K\",\"field\":\"FM128\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"\n }\n devices
        {\n name: "heu"\n type: "heu"\n parties: "bob"\n parties:
        "alice"\n config: "{\"mode\": \"PHEU\", \"schema\": \"paillier\",
        \"key_size\": 2048}"\n }\n ray_fed_config {\n cross_silo_comm_backend:
        "brpc_link"\n }\n}\npublic_config {\n ray_fed_config {\n parties: "bob"\n parties:
        "alice"\n addresses: "hcnd-uzpvyqbu-node-3-0-fed.bob.svc:80"\n addresses:
        "0.0.0.0:21862"\n }\n spu_configs {\n name: "spu"\n parties: "bob"\n parties:
        "alice"\n addresses: "http://hcnd-uzpvyqbu-node-3-0-spu.bob.svc:80\"\n addresses:
        "0.0.0.0:21861"\n }\n inference_config {\n parties: "bob"\n parties:
        "alice"\n addresses: "http://hcnd-uzpvyqbu-node-3-0-inference.bob.svc\"\n addresses:
        "0.0.0.0:21860"\n }\n webhook_config {\n progress_url: "http://reporter.master.svc/report/progress?task_id=hcnd-uzpvyqbu-node-3\"\n }\n}\nprivate_config
        {\n self_party: "alice"\n ray_head_addr: "hcnd-uzpvyqbu-node-3-0-global.alice.svc:21863"\n}\n\n--\n\n2025-01-16
        01:31:58,285|alice|INFO|secretflow|entry.py:comp_eval:56| \n--\nsystem_info
        \n\nuname_result(system=''Linux'', node=''hcnd-uzpvyqbu-node-3-0'', release=''3.10.0-1160.119.1.el7.x86_64'',
        version=''init repo #1 SMP Tue Jun 4 14:43:51 UTC 2024'', machine=''x86_64'')\n--\n\n2025-01-16
        01:31:58,285|alice|INFO|secretflow|driver.py:init:494| Try init sf in PRODUCTION
        mode\n2025-01-16 01:31:58,285|alice|INFO|secretflow|op_context.py:set_distribution_mode:73|
        set distribution mode to DISTRIBUTION_MODE.PRODUCTION\n2025-01-16 01:31:58.285
        INFO brpc_link.py:101 [alice] -- brpc options: {''proxy_max_restarts'': 3,
        ''timeout_in_ms'': 300000, ''recv_timeout_ms'': 604800000, ''connect_retry_times'':
        3600, ''connect_retry_interval_ms'': 1000, ''brpc_channel_protocol'': ''http'',
        ''brpc_channel_connection_type'': ''pooled''}\nI0116 01:31:58.308751 7 0
        external/com_github_brpc_brpc/src/brpc/server.cpp:1204] Server[yacl::link::transport::internal::ReceiverServiceImpl]
        is serving on port=21862.\nW0116 01:31:58.308864 7 0 external/com_github_brpc_brpc/src/brpc/server.cpp:1210]
        Builtin services are disabled according to ServerOptions.has_builtin_services\n2025-01-16
        01:31:59.305 INFO brpc_link.py:127 [alice] -- Succeeded to listen on 0.0.0.0:21862.\n2025-01-16
        01:31:59.367 [warning] [openssl_factory.cc:OpensslDrbg:83] Yacl has been configured
        to use Yacl''s entropy source, but unable to find one. Fallback to use openssl''s
        default entropy srouce\n2025-01-16 01:31:59.372 [warning] [openssl_factory.cc:OpensslDrbg:83]
        Yacl has been configured to use Yacl''s entropy source, but unable to find
        one. Fallback to use openssl''s default entropy srouce\n[667.154] perfetto.cc:45899
        Configured tracing session 1, #sources:1, duration:0 ms, #buffers:1, total
        buffer size:1024 KB, total sessions:1, uid:0 session name: ""\n[2025-01-16
        01:31:59.419] [info] [launch.cc:115] PSI config: {"protocol_config":{"protocol":"PROTOCOL_RR22","role":"ROLE_RECEIVER","broadcast_result":true},"input_config":{"type":"IO_TYPE_FILE_CSV","path":"/tmp/sf_hcnd-uzpvyqbu-node-3_alice/82b00b4d-7d3d-493f-a317-2e982d906026/test1_2146716456.csv"},"output_config":{"type":"IO_TYPE_FILE_CSV","path":"/tmp/sf_hcnd-uzpvyqbu-node-3_alice/82b00b4d-7d3d-493f-a317-2e982d906026/82b00b4d-7d3d-493f-a317-2e982d906026_output.csv"},"keys":["id"],"skip_duplicates_check":true,"advanced_join_type":"ADVANCED_JOIN_TYPE_INNER_JOIN","left_side":"ROLE_RECEIVER","input_attr":{},"output_attr":{"csv_null_rep":"82b00b4d-7d3d-493f-a317-2e982d906026"}}\n[2025-01-16
        01:31:59.420] [info] [receiver.cc:45] [Rr22PsiReceiver::Init] start\n[2025-01-16
        01:31:59.420] [info] [interface.cc:69] [AbstractPsiParty::Init] start\n[2025-01-16
        01:31:59.430] [warning] [openssl_factory.cc:83] Yacl has been configured to
        use Yacl''s entropy source, but unable to find one. Fallback to use openssl''s
        default entropy srouce\n[2025-01-16 01:31:59.437] [warning] [openssl_factory.cc:83]
        Yacl has been configured to use Yacl''s entropy source, but unable to find
        one. Fallback to use openssl''s default entropy srouce\n[2025-01-16 01:31:59.438]
        [info] [resource_manager.cc:24] create path: /tmp/DJATDeAbDMC2CaB-\n[2025-01-16
        01:31:59.438] [info] [interface.cc:83] [AbstractPsiParty::Init][Check csv
        pre-process] start\n[2025-01-16 01:31:59.438] [info] [table_utils.cc:206]
        Init table with file: /tmp/sf_hcnd-uzpvyqbu-node-3_alice/82b00b4d-7d3d-493f-a317-2e982d906026/test1_2146716456.csv,
        size: 36, format: csv\n[2025-01-16 01:31:59.443] [info] [table_utils.cc:209]
        table header: "id","name","age"\n[2025-01-16 01:31:59.444] [info] [key.cc:92]
        Executing sort scripts: tail -n +2 /tmp/sf_hcnd-uzpvyqbu-node-3_alice/82b00b4d-7d3d-493f-a317-2e982d906026/test1_2146716456.csv
        | LC_ALL=C sort --parallel=4 --buffer-size=1G --stable --field-separator=,
        --key=1,1 >>/tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input.csv\n[2025-01-16
        01:31:59.519] [info] [key.cc:94] Finished sort scripts: tail -n +2 /tmp/sf_hcnd-uzpvyqbu-node-3_alice/82b00b4d-7d3d-493f-a317-2e982d906026/test1_2146716456.csv
        | LC_ALL=C sort --parallel=4 --buffer-size=1G --stable --field-separator=,
        --key=1,1 >>/tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input.csv,
        ret=0\n[2025-01-16 01:31:59.520] [info] [table_utils.cc:206] Init table with
        file: /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input.csv,
        size: 36, format: csv\n[2025-01-16 01:31:59.521] [info] [table_utils.cc:209]
        table header: "id","name","age"\n[2025-01-16 01:31:59.521] [info] [arrow_csv_batch_provider.cc:76]
        Reach the end of csv file /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input.csv.\n[2025-01-16
        01:31:59.521] [info] [arrow_csv_batch_provider.cc:76] Reach the end of csv
        file /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input.csv.\n[2025-01-16
        01:31:59.522] [info] [table_utils.cc:206] Init table with file: /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input_key_info.csv,
        size: 67, format: csv\n[2025-01-16 01:31:59.523] [info] [table_utils.cc:209]
        table header: "psi_joined_key","psi_start_index","psi_dup_cnt"\n[2025-01-16
        01:31:59.523] [info] [interface.cc:92] [AbstractPsiParty::Init][Check csv
        pre-process] end\n[2025-01-16 01:31:59.524] [info] [interface.cc:142] [AbstractPsiParty::Init]
        end\n[2025-01-16 01:31:59.525] [info] [receiver.cc:49] [Rr22PsiReceiver::Init]
        end\n[2025-01-16 01:31:59.525] [info] [receiver.cc:54] [Rr22PsiReceiver::PreProcess]
        start\n[2025-01-16 01:31:59.536] [info] [bucket_psi.cc:492] psi protocol=3,
        rank=0 item_size=3\n[2025-01-16 01:31:59.536] [info] [bucket_psi.cc:492] psi
        protocol=3, rank=1 item_size=3\n[2025-01-16 01:31:59.536] [info] [hash_bucket_cache.cc:35]
        target dir=/tmp/DJATDeAbDMC2CaB-/input_bucket_store does not exists, create
        it\n[2025-01-16 01:31:59.548] [info] [multiplex_disk_cache.cc:61] MultiplexDiskCache:
        dir_prefix=/tmp/DJATDeAbDMC2CaB-/input_bucket_store/AADYDOCUDBBJAeBz/\n[2025-01-16
        01:31:59.549] [info] [table_utils.cc:91] reach end of stream of /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input_key_info.csv,
        3 lines\n[2025-01-16 01:31:59.549] [info] [table_utils.cc:120] read 3 lines
        from /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input_key_info.csv\n[2025-01-16
        01:31:59.549] [info] [table_utils.cc:91] reach end of stream of /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input_key_info.csv,
        0 lines\n[2025-01-16 01:31:59.549] [info] [table_utils.cc:120] read 0 lines
        from /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input_key_info.csv\n[2025-01-16
        01:31:59.549] [info] [receiver.cc:87] [Rr22PsiReceiver::PreProcess] end\n[2025-01-16
        01:31:59.558] [info] [receiver.cc:92] [Rr22PsiReceiver::Online] start\n[2025-01-16
        01:31:59.559] [info] [rr22_psi.cc:130] mask size: 6\n[2025-01-16 01:31:59.559]
        [info] [thread_pool.cc:30] Create a fixed thread pool with size 3\n[2025-01-16
        01:31:59.560] [info] [rr22_oprf.cc:413] baxos_size:28\n[2025-01-16 01:31:59.560]
        [info] [rr22_oprf.cc:421] begin vole recv\n"'
  serviceStatuses:
    alice/hcnd-uzpvyqbu-node-3-0-fed:
      createTime: "2025-01-16T01:31:50Z"
      namespace: alice
      portName: fed
      portNumber: 21862
      readyTime: "2025-01-16T01:31:54Z"
      scope: Cluster
      serviceName: hcnd-uzpvyqbu-node-3-0-fed
    alice/hcnd-uzpvyqbu-node-3-0-global:
      createTime: "2025-01-16T01:31:50Z"
      namespace: alice
      portName: global
      portNumber: 21863
      readyTime: "2025-01-16T01:31:54Z"
      scope: Domain
      serviceName: hcnd-uzpvyqbu-node-3-0-global
    alice/hcnd-uzpvyqbu-node-3-0-inference:
      createTime: "2025-01-16T01:31:49Z"
      namespace: alice
      portName: inference
      portNumber: 21860
      readyTime: "2025-01-16T01:31:54Z"
      scope: Cluster
      serviceName: hcnd-uzpvyqbu-node-3-0-inference
    alice/hcnd-uzpvyqbu-node-3-0-spu:
      createTime: "2025-01-16T01:31:50Z"
      namespace: alice
      portName: spu
      portNumber: 21861
      readyTime: "2025-01-16T01:31:54Z"
      scope: Cluster
      serviceName: hcnd-uzpvyqbu-node-3-0-spu
  startTime: "2025-01-16T01:31:49Z"
```
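For reference, the PSI config in alice's log asks for `PROTOCOL_RR22` with an inner join on the `id` key, `sort_result` enabled and both parties as receivers, so each party should end up with its own rows whose `id` is present in both tables. The plaintext sketch below shows only that expected result, for checking small test files by hand. It is not RR22, it reveals both key sets, and it assumes keys are unique within each table; `plaintext_inner_join` and any sample rows are hypothetical:

```python
import csv
import io


def plaintext_inner_join(left_csv: str, right_csv: str, key: str = "id"):
    """Plaintext stand-in for the PSI inner join: NOT private, illustration only.

    Each side keeps its own rows whose key appears on both sides,
    sorted by the key compared as a string.
    """
    left = list(csv.DictReader(io.StringIO(left_csv)))
    right = list(csv.DictReader(io.StringIO(right_csv)))
    common = {row[key] for row in left} & {row[key] for row in right}

    def keep(rows):
        return sorted((r for r in rows if r[key] in common), key=lambda r: r[key])

    return keep(left), keep(right)
```

Each side keeps its own columns (for example `name`/`age` for alice and `studen_name`/`studen_age`/`age` for bob), restricted to the shared `id` values.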

Pod details from alice's side:
`bash-5.2# kubectl get pod hcnd-uzpvyqbu-node-3-0 -o yaml -n alice
apiVersion: v1
kind: Pod
metadata:
annotations:
kuscia.secretflow/config-template-value-cm-name: hcnd-uzpvyqbu-node-3-kuscia-gen-conf
kuscia.secretflow/config-template-volumes: config-template
kuscia.secretflow/initiator: bob
kuscia.secretflow/task-id: hcnd-uzpvyqbu-node-3
kuscia.secretflow/task-resource: hcnd-uzpvyqbu-node-3-4cb23096b5d6
kuscia.secretflow/task-resource-group: hcnd-uzpvyqbu-node-3
kuscia.secretflow/taskresource-reserving-timestamp: "2025-01-16T09:31:50+08:00"
creationTimestamp: "2025-01-16T01:31:49Z"
labels:
kuscia.secretflow/communication-role-client: "true"
kuscia.secretflow/communication-role-server: "true"
kuscia.secretflow/controller: kusciatask
kuscia.secretflow/pod-identity: 7d5ed72a-7c9c-455f-9ef9-19def76ddcbf-0
kuscia.secretflow/pod-role: ""
kuscia.secretflow/task-resource-group-uid: a9e904d8-764d-4779-b400-a69d3af65e3b
kuscia.secretflow/task-resource-uid: d355d613-2237-4dea-81c3-0c518ade9e27
kuscia.secretflow/task-uid: 7d5ed72a-7c9c-455f-9ef9-19def76ddcbf
name: hcnd-uzpvyqbu-node-3-0
namespace: alice
resourceVersion: "99819"
uid: 90626dbd-269c-4233-90c4-d85eebdd1f7f
spec:
  automountServiceAccountToken: false
  containers:
  - args:
    - -c
    - python -m secretflow.kuscia.entry ./kuscia/task-config.conf
    command:
    - sh
    env:
    - name: KUSCIA_PORT_CLIENT_SERVER_NUMBER
      value: "21859"
    - name: KUSCIA_PORT_INFERENCE_NUMBER
      value: "21860"
    - name: KUSCIA_PORT_SPU_NUMBER
      value: "21861"
    - name: KUSCIA_PORT_FED_NUMBER
      value: "21862"
    - name: KUSCIA_PORT_GLOBAL_NUMBER
      value: "21863"
    - name: KUSCIA_PORT_NODE_MANAGER_NUMBER
      value: "21857"
    - name: KUSCIA_PORT_OBJECT_MANAGER_NUMBER
      value: "21858"
    image: secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/secretflow-lite-anolis8:1.11.0b1
    imagePullPolicy: IfNotPresent
    name: secretflow
    ports:
    - containerPort: 21861
      name: spu
      protocol: TCP
    - containerPort: 21862
      name: fed
      protocol: TCP
    - containerPort: 21863
      name: global
      protocol: TCP
    - containerPort: 21857
      name: node-manager
      protocol: TCP
    - containerPort: 21858
      name: object-manager
      protocol: TCP
    - containerPort: 21859
      name: client-server
      protocol: TCP
    - containerPort: 21860
      name: inference
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: ./kuscia/task-config.conf
      name: config-template
      subPath: task-config.conf
    workingDir: /app
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: root-kuscia-autonomy-alice-localhost-localdomain
  nodeSelector:
    kuscia.secretflow/namespace: alice
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: kuscia-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: kuscia.secretflow/agent
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: hcnd-uzpvyqbu-node-3-configtemplate
    name: config-template
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-01-16T01:31:52Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-01-16T01:32:01Z"
    reason: PodFailed
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-01-16T01:32:01Z"
    reason: PodFailed
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-01-16T01:31:52Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://bfbb08852b2b4dabd8674d51818c14916f11c13ad7e1b7e06b4b4d03d2b87238
    image: secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/secretflow-lite-anolis8:1.11.0b1
    imageID: sha256:9bcb2e7a4264fae2d3c07d9e8f53e0405941ede4483222372aa2f39d936c9fda
    lastState: {}
    name: secretflow
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://bfbb08852b2b4dabd8674d51818c14916f11c13ad7e1b7e06b4b4d03d2b87238
        exitCode: 132
        finishedAt: "2025-01-16T01:32:00Z"
        message: " config: "{\"runtime_config\":{\"protocol\":\"SEMI2K\",\"field\":\"FM128\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"\n
    \ }\n devices {\n name: "heu"\n type: "heu"\n parties: "bob"\n
    \ parties: "alice"\n config: "{\"mode\": \"PHEU\", \"schema\":
    \"paillier\", \"key_size\": 2048}"\n }\n ray_fed_config {\n cross_silo_comm_backend:
    "brpc_link"\n }\n}\npublic_config {\n ray_fed_config {\n parties:
    "bob"\n parties: "alice"\n addresses: "hcnd-uzpvyqbu-node-3-0-fed.bob.svc:80"\n
    \ addresses: "0.0.0.0:21862"\n }\n spu_configs {\n name: "spu"\n
    \ parties: "bob"\n parties: "alice"\n addresses: "http://hcnd-uzpvyqbu-node-3-0-spu.bob.svc:80\"\n
    \ addresses: "0.0.0.0:21861"\n }\n inference_config {\n parties:
    "bob"\n parties: "alice"\n addresses: "http://hcnd-uzpvyqbu-node-3-0-inference.bob.svc\"\n
    \ addresses: "0.0.0.0:21860"\n }\n webhook_config {\n progress_url:
    "http://reporter.master.svc/report/progress?task_id=hcnd-uzpvyqbu-node-3\"\n
    \ }\n}\nprivate_config {\n self_party: "alice"\n ray_head_addr: "hcnd-uzpvyqbu-node-3-0-global.alice.svc:21863"\n}\n\n--\n\n2025-01-16
    01:31:58,285|alice|INFO|secretflow|entry.py:comp_eval:56| \n--\nsystem_info
    \n\nuname_result(system='Linux', node='hcnd-uzpvyqbu-node-3-0', release='3.10.0-1160.119.1.el7.x86_64',
    version='init repo #1 SMP Tue Jun 4 14:43:51 UTC 2024', machine='x86_64')\n--\n\n2025-01-16
    01:31:58,285|alice|INFO|secretflow|driver.py:init:494| Try init sf in PRODUCTION
    mode\n2025-01-16 01:31:58,285|alice|INFO|secretflow|op_context.py:set_distribution_mode:73|
    set distribution mode to DISTRIBUTION_MODE.PRODUCTION\n2025-01-16 01:31:58.285
    INFO brpc_link.py:101 [alice] -- brpc options: {'proxy_max_restarts': 3,
    'timeout_in_ms': 300000, 'recv_timeout_ms': 604800000, 'connect_retry_times':
    3600, 'connect_retry_interval_ms': 1000, 'brpc_channel_protocol': 'http',
    'brpc_channel_connection_type': 'pooled'}\nI0116 01:31:58.308751 7 0
    external/com_github_brpc_brpc/src/brpc/server.cpp:1204] Server[yacl::link::transport::internal::ReceiverServiceImpl]
    is serving on port=21862.\nW0116 01:31:58.308864 7 0 external/com_github_brpc_brpc/src/brpc/server.cpp:1210]
    Builtin services are disabled according to ServerOptions.has_builtin_services\n2025-01-16
    01:31:59.305 INFO brpc_link.py:127 [alice] -- Succeeded to listen on 0.0.0.0:21862.\n2025-01-16
    01:31:59.367 [warning] [openssl_factory.cc:OpensslDrbg:83] Yacl has been
    configured to use Yacl's entropy source, but unable to find one. Fallback
    to use openssl's default entropy srouce\n2025-01-16 01:31:59.372 [warning]
    [openssl_factory.cc:OpensslDrbg:83] Yacl has been configured to use Yacl's
    entropy source, but unable to find one. Fallback to use openssl's default
    entropy srouce\n[667.154] perfetto.cc:45899 Configured tracing session
    1, #sources:1, duration:0 ms, #buffers:1, total buffer size:1024 KB, total
    sessions:1, uid:0 session name: ""\n[2025-01-16 01:31:59.419] [info] [launch.cc:115]
    PSI config: {"protocol_config":{"protocol":"PROTOCOL_RR22","role":"ROLE_RECEIVER","broadcast_result":true},"input_config":{"type":"IO_TYPE_FILE_CSV","path":"/tmp/sf_hcnd-uzpvyqbu-node-3_alice/82b00b4d-7d3d-493f-a317-2e982d906026/test1_2146716456.csv"},"output_config":{"type":"IO_TYPE_FILE_CSV","path":"/tmp/sf_hcnd-uzpvyqbu-node-3_alice/82b00b4d-7d3d-493f-a317-2e982d906026/82b00b4d-7d3d-493f-a317-2e982d906026_output.csv"},"keys":["id"],"skip_duplicates_check":true,"advanced_join_type":"ADVANCED_JOIN_TYPE_INNER_JOIN","left_side":"ROLE_RECEIVER","input_attr":{},"output_attr":{"csv_null_rep":"82b00b4d-7d3d-493f-a317-2e982d906026"}}\n[2025-01-16
    01:31:59.420] [info] [receiver.cc:45] [Rr22PsiReceiver::Init] start\n[2025-01-16
    01:31:59.420] [info] [interface.cc:69] [AbstractPsiParty::Init] start\n[2025-01-16
    01:31:59.430] [warning] [openssl_factory.cc:83] Yacl has been configured
    to use Yacl's entropy source, but unable to find one. Fallback to use openssl's
    default entropy srouce\n[2025-01-16 01:31:59.437] [warning] [openssl_factory.cc:83]
    Yacl has been configured to use Yacl's entropy source, but unable to find
    one. Fallback to use openssl's default entropy srouce\n[2025-01-16 01:31:59.438]
    [info] [resource_manager.cc:24] create path: /tmp/DJATDeAbDMC2CaB-\n[2025-01-16
    01:31:59.438] [info] [interface.cc:83] [AbstractPsiParty::Init][Check csv
    pre-process] start\n[2025-01-16 01:31:59.438] [info] [table_utils.cc:206]
    Init table with file: /tmp/sf_hcnd-uzpvyqbu-node-3_alice/82b00b4d-7d3d-493f-a317-2e982d906026/test1_2146716456.csv,
    size: 36, format: csv\n[2025-01-16 01:31:59.443] [info] [table_utils.cc:209]
    table header: "id","name","age"\n[2025-01-16 01:31:59.444] [info]
    [key.cc:92] Executing sort scripts: tail -n +2 /tmp/sf_hcnd-uzpvyqbu-node-3_alice/82b00b4d-7d3d-493f-a317-2e982d906026/test1_2146716456.csv
    | LC_ALL=C sort --parallel=4 --buffer-size=1G --stable --field-separator=,
    --key=1,1 >>/tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input.csv\n[2025-01-16
    01:31:59.519] [info] [key.cc:94] Finished sort scripts: tail -n +2 /tmp/sf_hcnd-uzpvyqbu-node-3_alice/82b00b4d-7d3d-493f-a317-2e982d906026/test1_2146716456.csv
    | LC_ALL=C sort --parallel=4 --buffer-size=1G --stable --field-separator=,
    --key=1,1 >>/tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input.csv,
    ret=0\n[2025-01-16 01:31:59.520] [info] [table_utils.cc:206] Init table
    with file: /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input.csv,
    size: 36, format: csv\n[2025-01-16 01:31:59.521] [info] [table_utils.cc:209]
    table header: "id","name","age"\n[2025-01-16 01:31:59.521] [info]
    [arrow_csv_batch_provider.cc:76] Reach the end of csv file /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input.csv.\n[2025-01-16
    01:31:59.521] [info] [arrow_csv_batch_provider.cc:76] Reach the end of csv
    file /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input.csv.\n[2025-01-16
    01:31:59.522] [info] [table_utils.cc:206] Init table with file: /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input_key_info.csv,
    size: 67, format: csv\n[2025-01-16 01:31:59.523] [info] [table_utils.cc:209]
    table header: "psi_joined_key","psi_start_index","psi_dup_cnt"\n[2025-01-16
    01:31:59.523] [info] [interface.cc:92] [AbstractPsiParty::Init][Check csv
    pre-process] end\n[2025-01-16 01:31:59.524] [info] [interface.cc:142] [AbstractPsiParty::Init]
    end\n[2025-01-16 01:31:59.525] [info] [receiver.cc:49] [Rr22PsiReceiver::Init]
    end\n[2025-01-16 01:31:59.525] [info] [receiver.cc:54] [Rr22PsiReceiver::PreProcess]
    start\n[2025-01-16 01:31:59.536] [info] [bucket_psi.cc:492] psi protocol=3,
    rank=0 item_size=3\n[2025-01-16 01:31:59.536] [info] [bucket_psi.cc:492]
    psi protocol=3, rank=1 item_size=3\n[2025-01-16 01:31:59.536] [info] [hash_bucket_cache.cc:35]
    target dir=/tmp/DJATDeAbDMC2CaB-/input_bucket_store does not exists, create
    it\n[2025-01-16 01:31:59.548] [info] [multiplex_disk_cache.cc:61] MultiplexDiskCache:
    dir_prefix=/tmp/DJATDeAbDMC2CaB-/input_bucket_store/AADYDOCUDBBJAeBz/\n[2025-01-16
    01:31:59.549] [info] [table_utils.cc:91] reach end of stream of /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input_key_info.csv,
    3 lines\n[2025-01-16 01:31:59.549] [info] [table_utils.cc:120] read 3 lines
    from /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input_key_info.csv\n[2025-01-16
    01:31:59.549] [info] [table_utils.cc:91] reach end of stream of /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input_key_info.csv,
    0 lines\n[2025-01-16 01:31:59.549] [info] [table_utils.cc:120] read 0 lines
    from /tmp/DJATDeAbDMC2CaB-/receiver_DAAzBxCbCTAmALBT_join_sorted_input_key_info.csv\n[2025-01-16
    01:31:59.549] [info] [receiver.cc:87] [Rr22PsiReceiver::PreProcess] end\n[2025-01-16
    01:31:59.558] [info] [receiver.cc:92] [Rr22PsiReceiver::Online] start\n[2025-01-16
    01:31:59.559] [info] [rr22_psi.cc:130] mask size: 6\n[2025-01-16 01:31:59.559]
    [info] [thread_pool.cc:30] Create a fixed thread pool with size 3\n[2025-01-16
    01:31:59.560] [info] [rr22_oprf.cc:413] baxos_size:28\n[2025-01-16 01:31:59.560]
    [info] [rr22_oprf.cc:421] begin vole recv\n"
        reason: Error
        startedAt: "2025-01-16T01:31:54Z"
  hostIP: 192.168.32.2
  phase: Failed
  startTime: "2025-01-16T01:31:52Z"
bash-5.2#
    `
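The terminated container above reports `exitCode: 132`. By the usual shell and container convention, an exit code greater than 128 means the process was killed by signal number (code - 128); here 132 - 128 = 4, which is SIGILL (illegal instruction) on Linux. That points to the engine executing a CPU instruction the host cannot run, which fits the AVX2 explanation given later in this thread. A minimal Python sketch for decoding such exit codes (the helper name `describe_exit_code` is illustrative, not part of Kuscia or SecretFlow):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # Above 128 but not a signal number this platform knows about.
            return f"killed by unknown signal {code - 128} (exit code {code})"
        return f"killed by {sig.name} (exit code {code})"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(132))  # killed by SIGILL (exit code 132)
```

SIGILL, unlike an out-of-memory kill (SIGKILL, exit code 137), usually means the binary was built for CPU features the host lacks rather than a resource problem.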

@zx007nice
Author

zx007nice commented Jan 16, 2025

Yes, the disk needs to be cleaned up to below 90% usage.
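To check how close a filesystem is to the 90% mark mentioned here, Python's standard `shutil.disk_usage` is enough. A small sketch, assuming the path to check is `/` (substitute whichever filesystem holds the Kuscia data directory on your host); the helper names are illustrative. It computes used / (used + available), which is how `df` derives its Use% column, so the result should roughly match `df -h` (df rounds up):

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use, computed like df's Use%."""
    usage = shutil.disk_usage(path)
    # df reports used / (used + space available to non-root users);
    # shutil's `free` field is that available space.
    return 100.0 * usage.used / (usage.used + usage.free)


def below_threshold(path: str = "/", threshold_percent: float = 90.0) -> bool:
    """True when the filesystem's usage is under the given threshold."""
    return disk_usage_percent(path) < threshold_percent


if __name__ == "__main__":
    pct = disk_usage_percent("/")
    print(f"/ is {pct:.1f}% used; below 90%: {below_threshold('/')}")
```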

I found that the task succeeds when I switch the protocol to ECDH; the other protocols fail.

@Chrisdehe
Member

@zx007nice
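You can run `lscpu | grep Flags` to check your CPU. Judging from the error, your CPU most likely does not support the AVX2 instruction set.
At present, when the CPU does not support AVX2, PSI runs correctly only with the ECDH protocol using one of the curves 25519, SM2, or SECP256K1.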
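The flag check suggested above can also be scripted. A minimal Python sketch, assuming a Linux host where `/proc/cpuinfo` is readable (its `flags` lines carry the same feature list that `lscpu` prints under `Flags`). The function names are illustrative, and the advice strings only restate the maintainer's comment; they are not output from Kuscia or SecretFlow:

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect feature flags from /proc/cpuinfo text or an lscpu 'Flags:' line."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "flags":
            flags.update(value.split())
    return flags


def psi_advice(flags: set[str]) -> str:
    """Restate the maintainer's guidance for hosts with and without AVX2."""
    if "avx2" in flags:
        return "AVX2 present: the AVX2 limitation does not apply"
    return "No AVX2: use the ECDH protocol with curve 25519, SM2 or SECP256K1"


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # Linux only
        print(psi_advice(cpu_flags(cpuinfo.read_text())))
```

Matching whole tokens in a set (rather than searching for the substring `avx2`) avoids false positives from unrelated flag names.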

@zx007nice
Author


Got it, thanks a lot!

@zx007nice
Author

Could you leave your CPU model here so we can keep a record of it?

[root@localhost ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2407 0 @ 2.20GHz
Stepping: 7
CPU MHz: 2194.711
BogoMIPS: 4389.42
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 10240K
NUMA node0 CPU(s): 0-3
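The field names in the original `lscpu` output were localized because `lscpu` follows the system locale, and the `Flags` line the maintainer asked about was not included. For reference, the Xeon E5-2407 (family 6, model 45) is a Sandy Bridge part that supports AVX but not AVX2, which arrived with Haswell, so this matches the diagnosis. Running `lscpu` with `LC_ALL=C` gives stable English field names that are easy to parse. A sketch in Python (the helper names are illustrative); the sample reuses two lines from the output above:

```python
import os
import subprocess


def parse_lscpu(text: str) -> dict[str, str]:
    """Map each 'Field: value' line of lscpu output to a dict entry (split at the first colon)."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def read_lscpu() -> dict[str, str]:
    """Run lscpu with the C locale so field names are English regardless of system language."""
    env = {**os.environ, "LC_ALL": "C"}
    result = subprocess.run(["lscpu"], env=env, capture_output=True, text=True, check=True)
    return parse_lscpu(result.stdout)


sample = "Model name: Intel(R) Xeon(R) CPU E5-2407 0 @ 2.20GHz\nHypervisor vendor: VMware"
info = parse_lscpu(sample)
print(info["Model name"])  # Intel(R) Xeon(R) CPU E5-2407 0 @ 2.20GHz
print("avx2" in info.get("Flags", "").split())  # False: the sample has no Flags line
```

On the affected host, `read_lscpu().get("Flags", "")` would give the full feature list to check for `avx2`.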

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants