control/discovery: add discovery controller #127
Hi @optimistyzy, please help review. I have left some questions that need to be discussed with you; I marked them with 'Q: ' in the comments. They are about the version and the IO command set.
Force-pushed from 75e1777 to a862a99.
Is the review of the discovery controller still ongoing?
I didn’t make it to the meeting this morning. I’ll ask next time, or see what I can find out via email.
-- Scott
From: Yin Congmin
Sent: Tuesday, May 23, 2023 8:31 AM
Subject: Re: [ceph/ceph-nvmeof] [RFC]control/discovery: add discovery controller (PR #127)
Overall it looks good, @CongMinYin, but I'm a bit concerned about the maintainability/debuggability of this, since in my opinion it isn't much more readable than a C implementation: it doesn't fully leverage Python syntactic sugar and libraries.
Additionally, networking code is usually the most vulnerable attack vector, as it involves lots of parsing and branching that can potentially lead to exploits (e.g., a DDoS against the underlying Ceph cluster by performing mass queries).
In that sense, we should perhaps add a local TTL cache for the OMAP object, so that not every discovery query results in a Ceph read.
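A minimal sketch of the suggested local TTL cache, assuming a hypothetical `read_omap` callable that stands in for the controller's actual Ceph OMAP read. The `TTLCache` name and the 5-second default are illustrative only; the injectable `clock` exists so the expiry logic can be tested without sleeping.

```python
import threading
import time
from typing import Callable, Dict, Optional


class TTLCache:
    """Serve OMAP contents from memory, re-reading Ceph at most once per ttl seconds."""

    def __init__(self, read_omap: Callable[[], Dict[str, str]], ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._read_omap = read_omap  # hypothetical stand-in for the real OMAP read
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()  # discovery connections may be served concurrently
        self._data: Optional[Dict[str, str]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, str]:
        with self._lock:
            now = self._clock()
            # Refresh on first use or once the cached copy is ttl seconds old.
            if self._data is None or now - self._fetched_at >= self._ttl:
                self._data = self._read_omap()
                self._fetched_at = now
            return self._data
```

With this, a burst of discovery queries inside one `ttl` window costs a single Ceph read; the trade-off is that clients may see gateway state up to `ttl` seconds old.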
Force-pushed from a862a99 to a18bb9c.
Hi @epuertat, thank you for your feedback and guidance. To be honest, I haven't written much Python code before, so I have made only some simple modifications. Many places may still need more detailed discussion before we reach a definitive decision. Perhaps we can move some feature enhancements, such as the local TTL cache, into subsequent patches. Regarding connection handling, there are some follow-up commits that enhance it; I will discuss them under the corresponding comments.
Caching the gateway group config state locally can definitely be deferred. It may turn out to be more trouble than it's worth, even in the long run.
Why don't we reuse the mechanism we already use to keep the GWs in a group up to date, i.e. watch/notify plus polling? That way, if there are no changes, we don't read the OMAP over and over again. I believe all of the update code could be reused with little modification.
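The watch/notify plus polling idea can be sketched as a cache that re-reads the OMAP only when its version changes. This is a hedged illustration, not the gateway's existing update code: `read_version` and `read_omap` are hypothetical callables standing in for the real OMAP helpers, and `on_notify` is what a watch callback would invoke.

```python
import threading
from typing import Callable, Dict


class VersionedOmapCache:
    """Re-read the OMAP only when its version changes (helper names are hypothetical)."""

    def __init__(self, read_version: Callable[[], int],
                 read_omap: Callable[[], Dict[str, str]],
                 poll_interval: float = 10.0) -> None:
        self._read_version = read_version
        self._read_omap = read_omap
        self._poll_interval = poll_interval
        self._lock = threading.Lock()
        self._version = -1
        self._data: Dict[str, str] = {}
        self._stop = threading.Event()

    def refresh(self) -> bool:
        """Fetch the full OMAP only if the version moved; return True if it did."""
        with self._lock:
            version = self._read_version()
            if version == self._version:
                return False  # unchanged: skip the expensive read
            # If the OMAP changes between these two reads, the stored version is
            # older than the data, so the next refresh simply re-reads (safe).
            self._data = self._read_omap()
            self._version = version
            return True

    def on_notify(self, *_args) -> None:
        # Invoked from the watch callback when another gateway updates the OMAP.
        self.refresh()

    def poll_forever(self) -> None:
        # Periodic polling covers notifications that were missed.
        while not self._stop.wait(self._poll_interval):
            self.refresh()

    def stop(self) -> None:
        self._stop.set()

    def snapshot(self) -> Dict[str, str]:
        with self._lock:
            return dict(self._data)
```

The discovery controller would then answer queries from `snapshot()`, so a steady-state cluster costs one cheap version read per poll interval instead of a full OMAP read per query.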
Force-pushed from a18bb9c to f37a278.
Hi all, sorry for the late reply. I have spent a lot of time making significant changes recently.
For some other things, I think it may be necessary to add some content for the
Force-pushed from f37a278 to f440704.
Sorry, yesterday's push was not the latest version. I am re-pushing now; this version includes the local cache.
Force-pushed from f440704 to f274bbf.
Added
Force-pushed from f274bbf to e596bde.
Made a small change. When trying to discover persistent connections in
Hi @epuertat, please take a look.
ping
Sorry @CongMinYin, I completely missed this one 😅. Will have a look now ;-)
Nice refactoring @CongMinYin. As long as this works as expected, I'm fine with this! 👍🏼
My comments just focused on the maintainability and debuggability of this code. ctypes might do a good job of hiding byte-alignment complexities...
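To illustrate what ctypes can take care of, here is a sketch of the discovery log page header and entry as little-endian structures, following the NVMe-oF discovery log page layout (1024-byte header, 1024-byte entries) as I understand it. The class and function names are illustrative, and the reserved ranges should be checked against the spec revision the controller targets, since newer revisions define some bytes inside them. All fields happen to be naturally aligned, so no `_pack_` is needed; the asserts guard the layout.

```python
import ctypes
from typing import List


class DiscoveryLogPageHeader(ctypes.LittleEndianStructure):
    _fields_ = [
        ("genctr", ctypes.c_uint64),          # generation counter
        ("numrec", ctypes.c_uint64),          # number of records that follow
        ("recfmt", ctypes.c_uint16),          # record format
        ("reserved", ctypes.c_uint8 * 1006),
    ]


class DiscoveryLogEntry(ctypes.LittleEndianStructure):
    _fields_ = [
        ("trtype", ctypes.c_uint8),           # transport type (3 = TCP)
        ("adrfam", ctypes.c_uint8),           # address family (1 = IPv4)
        ("subtype", ctypes.c_uint8),          # subsystem type
        ("treq", ctypes.c_uint8),             # transport requirements
        ("portid", ctypes.c_uint16),
        ("cntlid", ctypes.c_uint16),
        ("asqsz", ctypes.c_uint16),           # admin max SQ size
        ("reserved1", ctypes.c_uint8 * 22),
        ("trsvcid", ctypes.c_char * 32),      # e.g. b"4420"
        ("reserved2", ctypes.c_uint8 * 192),
        ("subnqn", ctypes.c_char * 256),
        ("traddr", ctypes.c_char * 256),
        ("tsas", ctypes.c_uint8 * 256),       # transport-specific address subtype
    ]


# Fail fast if the layout ever drifts from the 1024-byte records.
assert ctypes.sizeof(DiscoveryLogPageHeader) == 1024
assert ctypes.sizeof(DiscoveryLogEntry) == 1024


def encode_log_page(genctr: int, entries: List[DiscoveryLogEntry]) -> bytes:
    """Serialize the header followed by each 1024-byte entry."""
    header = DiscoveryLogPageHeader(genctr=genctr, numrec=len(entries))
    return bytes(header) + b"".join(bytes(e) for e in entries)
```

Offsets such as `DiscoveryLogEntry.subnqn.offset` (256) come straight from the declaration, which replaces hand-maintained padding arithmetic.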
I still think it'd be a good idea to use an off-the-shelf, higher-level Python TCP server.
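One way to follow that suggestion with only the standard library is `socketserver.ThreadingTCPServer`, leaving the controller code to deal with PDU framing and dispatch. This is a sketch under assumptions: it frames PDUs by the 8-byte NVMe/TCP common header (PDU type, flags, HLEN, PDO, then a 4-byte little-endian PLEN covering the whole PDU), while the `MAX_PDU` cap and the `handle_pdu` hook are hypothetical and the actual ICReq/capsule handling is omitted.

```python
import socketserver
import struct
from typing import Tuple

# NVMe/TCP common header: PDU type, flags, HLEN, PDO (1 byte each), PLEN (4 bytes, LE).
COMMON_HEADER = struct.Struct("<BBBBI")


def parse_common_header(buf: bytes) -> Tuple[int, int, int, int, int]:
    """Return (pdu_type, flags, hlen, pdo, plen), rejecting inconsistent lengths."""
    if len(buf) < COMMON_HEADER.size:
        raise ValueError("short common header")
    pdu_type, flags, hlen, pdo, plen = COMMON_HEADER.unpack_from(buf)
    if hlen < COMMON_HEADER.size or plen < hlen:
        raise ValueError(f"inconsistent lengths: hlen={hlen} plen={plen}")
    return pdu_type, flags, hlen, pdo, plen


class PduHandler(socketserver.StreamRequestHandler):
    """Reads whole PDUs off the stream and hands them to handle_pdu()."""

    MAX_PDU = 64 * 1024  # hypothetical cap: drop peers that announce huge PDUs

    def handle(self) -> None:
        while True:
            header = self.rfile.read(COMMON_HEADER.size)
            if len(header) < COMMON_HEADER.size:
                return  # peer closed the connection
            try:
                pdu_type, _flags, _hlen, _pdo, plen = parse_common_header(header)
            except ValueError:
                return  # malformed header: close instead of guessing
            if plen > self.MAX_PDU:
                return
            body = self.rfile.read(plen - COMMON_HEADER.size)
            if len(body) < plen - COMMON_HEADER.size:
                return  # truncated PDU
            self.handle_pdu(pdu_type, header + body)

    def handle_pdu(self, pdu_type: int, pdu: bytes) -> None:
        """Hypothetical hook: ICReq and command capsule handling would go here."""


class DiscoveryServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # don't block shutdown on open client connections
```

Usage would look like `DiscoveryServer(("0.0.0.0", 8009), PduHandler).serve_forever()`, 8009 being the well-known NVMe-oF discovery port; the server library then owns accept loops and threading, and the length checks sit in one small, testable function.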
@baum Thanks for these changes. Both 3e6a162 and 77d973b are OK. The use of tuples avoids type conversion errors, and the use of the
As I explained earlier, the solution in commit e60eb54 is incorrect. We must know the exact length before writing the response content; unknown variable lengths are dangerous. Perhaps I didn't explain it clearly. Please check which nvme version was in use when you hit the error. I tried
After solving this issue, I will make a
I could do that test. I am not sure, though, about
Thank you!
In #127 (comment) you said you encountered a length error, and that the requested log page length was neither of the known values, 16 or 1024. Please confirm the version of the nvme client that issued this request.
The length error in #127 (comment) was produced using SPDK/bdevperf (SPDK_VERSION="23.01.1") as the NVMe client. SPDK is a cool project initiated at Intel. It is implemented in user space and does not use the Linux nvme-cli or the kernel module. You could reproduce this test, following
Agreed. Added length validation according to the spec. See the amended patch (branch).
I went through the NVMe specs and checked the Linux kernel discovery implementation. The data length validation should check that the data length defined by the SGL equals the length given by NUMD; see these Linux kernel pointers:
Please share your thoughts.
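A sketch of that check: NUMDL (CDW10 bits 31:16) and NUMDU (CDW11 bits 15:0) together form a 0's based dword count, and the resulting byte length is compared with the SGL data length. The function names are hypothetical, and extracting `sgl_length` from the command capsule is left out.

```python
DISCOVERY_LOG_LID = 0x70  # log page identifier of the discovery log page


def requested_length(cdw10: int, cdw11: int) -> int:
    """Bytes requested by Get Log Page; NUMD is a 0's based dword count."""
    numdl = (cdw10 >> 16) & 0xFFFF  # CDW10 bits 31:16
    numdu = cdw11 & 0xFFFF          # CDW11 bits 15:0
    numd = (numdu << 16) | numdl
    return (numd + 1) * 4


def validate_get_log_page(cdw10: int, cdw11: int, sgl_length: int) -> int:
    """Return the transfer length, rejecting commands whose SGL and NUMD disagree."""
    lid = cdw10 & 0xFF  # CDW10 bits 7:0
    if lid != DISCOVERY_LOG_LID:
        raise ValueError(f"unsupported log page id 0x{lid:02x}")
    length = requested_length(cdw10, cdw11)
    if length != sgl_length:
        raise ValueError(f"SGL length {sgl_length} != NUMD length {length}")
    return length
```

For example, the 16- and 1024-byte requests mentioned earlier correspond to NUMDL=3 and NUMDL=255, and any other length is accepted as long as the SGL agrees with it.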
@CongMinYin Great job 👍
@baum Sorry for the late reply. I originally tried to reproduce the bdevperf scenario locally, but ran into some difficulties building the environment. That's fine; instead I reviewed SPDK's implementation, and I now think your changes have no issues. SPDK lays out its log page reply content in a fixed arrangement and truncates it to the length requested. Refer to:
My original worry was that the content of the response to nvme-cli requests differed between nvme-cli versions, but I was overly concerned. However, you misunderstood: I wanted you to check
I am preparing for
I believe these will be done soon.
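The fixed-layout-then-truncate behavior can be sketched as follows: the full log page is built once, and each request receives exactly the slice it asked for. LPOL (CDW12) and LPOU (CDW13) are treated as a 64-bit byte offset. Zero-padding a request that runs past the end of the page is a choice made in this sketch so the reply always matches the validated transfer length; it is not taken from SPDK, and the function names are hypothetical.

```python
def log_page_offset(cdw12: int, cdw13: int) -> int:
    """Combine LPOL (CDW12) and LPOU (CDW13) into a 64-bit byte offset."""
    return ((cdw13 & 0xFFFFFFFF) << 32) | (cdw12 & 0xFFFFFFFF)


def build_log_page_reply(log_page: bytes, offset: int, length: int) -> bytes:
    """Return exactly `length` bytes of the fixed log page starting at `offset`."""
    if offset > len(log_page):
        raise ValueError(f"offset {offset} beyond log page of {len(log_page)} bytes")
    chunk = log_page[offset:offset + length]
    # Pad so the reply is always the transfer length agreed with the SGL.
    return chunk + bytes(length - len(chunk))
```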
Force-pushed from 3b6b2c6 to 9d553f9.
I rebased onto the latest version and added a very simple README section. Do I need to add this in this PR, and what improvements are needed? This command
Rebased the branch.
Added commit 2ac7eaf with a short description of running discovery as a container.
Force-pushed from 9d553f9 to d986822.
The discovery controller implements the basic functionality. Use the command "python3 -m control.discovery" to start the discovery controller. Clients can use the command "nvme discover -t tcp -a ip -s port" to get log pages. The configuration is in the [discovery] section of ceph-nvmeof.conf. feature: ceph#108
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
Add the method of starting the discovery service
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
- container
- CI test
Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>
Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>
Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>
Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>
Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>
Force-pushed from d986822 to d0cd63b.
@CongMinYin 🖖 I rebased the branch and tests are green, so you can merge it. Thank you!
Force-pushed from d0cd63b to d65c6b2.
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
@baum Thank you! I made a minor change by modifying the README number.
Could we merge this PR?
@CongMinYin Yes, from my perspective.
At least in my environment, the container name is "ceph-nvmeof-nvmeof-1", not "ceph-nvmeof_nvmeof_1" (dashes rather than underscores). So when running "make demo" I get an error from the create_listener command because the gateway name is blank.
Also, notice that the default container name is based on the directory from which "docker-compose up" was run. When I use my forked repo, which is in a directory other than "ceph-nvmeof", I get a different name. This can be fixed by passing "--project-name ceph-nvmeof" on the "docker-compose up" line or, even better, by adding COMPOSE_PROJECT_NAME=ceph-nvmeof to the .env file. With that in place, the container name will be "ceph-nvmeof-nvmeof-1" no matter which directory is used.
One last issue: if scale=2 is used and we start two gateway instances, the CLI commands of "make demo" operate on the second gateway. So when we get the ID for "ceph-nvmeof-nvmeof-1" and pass it to the create_listener command, we get an error about the wrong gateway being used. In that case we should use the container name "ceph-nvmeof-nvmeof-2".
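The naming rules described in this review can be summarized in a small sketch: Compose V2 joins project, service, and replica index with dashes while the older V1 used underscores, and the project name comes from "--project-name", then COMPOSE_PROJECT_NAME, then the directory name. This is a simplified model (Compose also normalizes directory names and supports a top-level `name:` key), and the function names are hypothetical.

```python
import os
from typing import Mapping, Optional


def project_name(cwd: str, cli_project: Optional[str] = None,
                 env: Mapping[str, str] = os.environ) -> str:
    """Simplified precedence: --project-name, COMPOSE_PROJECT_NAME, directory name."""
    if cli_project:
        return cli_project
    if env.get("COMPOSE_PROJECT_NAME"):
        return env["COMPOSE_PROJECT_NAME"]
    return os.path.basename(os.path.abspath(cwd)).lower()


def container_name(project: str, service: str, index: int,
                   compose_v2: bool = True) -> str:
    """V2 uses dashes (project-service-N); V1 used underscores (project_service_N)."""
    sep = "-" if compose_v2 else "_"
    return sep.join([project, service, str(index)])
```

Pinning the project name removes the dependence on the checkout directory, and the replica index still has to match the gateway the CLI actually talks to.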
The discovery controller implements the basic functionality. Use the command "python3 -m control.discovery" to start the discovery controller. Clients can use the command "nvme discover -t tcp -a ip -s port" to get log pages.
The configuration is in the [discovery] section of ceph-nvmeof.conf.
Fixes: #108
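Reading the [discovery] section of ceph-nvmeof.conf can be sketched with the standard configparser module. The key names `addr` and `port` and their defaults are hypothetical (8009 is the well-known NVMe-oF discovery port); check the shipped ceph-nvmeof.conf for the real option names.

```python
import configparser
from typing import Tuple


def parse_discovery_config(text: str) -> Tuple[str, int]:
    """Read listen address and port from [discovery]; key names are hypothetical."""
    config = configparser.ConfigParser()
    config.read_string(text)
    if not config.has_section("discovery"):
        raise KeyError("missing [discovery] section")
    section = config["discovery"]
    addr = section.get("addr", fallback="0.0.0.0")  # hypothetical key
    port = section.getint("port", fallback=8009)    # hypothetical key
    return addr, port
```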