Unexpected host seen in list of hosts #5518

Closed
cakrit opened this issue Feb 27, 2019 · 22 comments

@cakrit
Contributor

cakrit commented Feb 27, 2019

On Wed, Feb 27, 2019 at 11:12 AM Clarence Lin mr.lin.clarence@gmail.com wrote:
Hi,

When I sign in with Google OAuth, I see http://ruhighload.com:19999/ in my list, but I can't delete it.
Why?
This is not my site.

[screenshot]

Component Name

web

@cakrit cakrit added area/web cloud Netdata hub/cloud related labels Feb 27, 2019
@cakrit
Contributor Author

cakrit commented Feb 27, 2019

What do you see when you click the little down arrow on the right? Doesn't it show a trash can? Clicking it will remove the entry. In some rare cases, you may need to reload, or click 'Synchronize', to see it removed.

The only ways I can imagine you'd see an unexpected host there are the following:

  • The PC and browser you use have been used by someone else in the past.
  • You are working on a clone of a system that was used to access that host in the past.

If you think that none of these are true, we can dig a bit further.

@clarencetw

clarencetw commented Feb 27, 2019

I clicked the trash can and it was removed.
But it appeared again after syncing.

@gmosx
Contributor

gmosx commented Feb 27, 2019

Can you try the following:

  • Delete the node by clicking the trash icon
  • Click the Synchronize with netdata.cloud menu item.

@clarencetw

clarencetw commented Feb 27, 2019

I did this, but I saw http://ruhighload.com:19999/ again.

@clarencetw

I think it is a version issue.
http://ruhighload.com:19999 version: 1.7.0-171-ga5a21f25_rolling

I tried removing my own website, and that worked properly.

@cakrit
Contributor Author

cakrit commented Feb 28, 2019

No, it has nothing to do with the version of that host. Of course, if you visit that host again, it will be added to the list again.

Do you know how to use the browser's console to investigate the requests sent and received? At the very least, we could look at the errors that appear in the console, but what's more interesting is the requests/responses when you delete it and when you press synchronize.

@clarencetw

OK! I found it.

[screenshot]

@cakrit
Contributor Author

cakrit commented Feb 28, 2019

That's actually not totally unexpected; there are cases when the delete can fail (usually when one switches registries). Regardless of the error, if you are signed in when that happens, there should be another call to netdata.cloud, showing in the network view as url?value=http..., with request headers like the following:

Request URL: https://netdata.cloud/api/v1/accounts/2f51cd5c-dd10-423f-87fc-53bf7d740e58/agents/679d0aa8-2bc8-11e9-9707-6cb311235172/url?value=http%3A%2F%2F95.216.43.14%3A19999%2F
Request Method: DELETE

If you don't see that, or it fails, then we have a bug.
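To make the shape of that request concrete, here is a minimal sketch that rebuilds the DELETE URL from an account ID, an agent ID and the dashboard URL to remove. The path layout is copied from the example request above and may have changed since; the helper name `cloud_delete_url` is hypothetical, and the sketch only builds the string, it does not send anything.

```python
from urllib.parse import quote

# Base path copied from the example request above (an assumption about
# the netdata.cloud API as it looked in early 2019).
CLOUD_API = "https://netdata.cloud/api/v1"


def cloud_delete_url(account_id: str, agent_id: str, dashboard_url: str) -> str:
    """Build the URL of the DELETE call that removes one dashboard URL
    from an agent in the signed-in account (hypothetical helper)."""
    # safe="" makes quote() escape ":" and "/" too, which gives the
    # %3A / %2F encoding visible in the browser's network view.
    value = quote(dashboard_url, safe="")
    return f"{CLOUD_API}/accounts/{account_id}/agents/{agent_id}/url?value={value}"


print(cloud_delete_url(
    "2f51cd5c-dd10-423f-87fc-53bf7d740e58",
    "679d0aa8-2bc8-11e9-9707-6cb311235172",
    "http://95.216.43.14:19999/",
))
```

Comparing the printed URL with the one in the network view is a quick way to confirm that the request really targeted the entry you meant to delete.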

@clarencetw

Is this it?

Request URL: https://netdata.cloud/api/v1/accounts/a8461ec2-43d2-4227-99a2-0ce0f0a524f6/agents/c9db2de6-88bb-11e7-84de-0020ca51271f/url?value=http%3A%2F%2Fruhighload.com%3A19999%2F
Request Method: DELETE
Status Code: 200
Response: {"result":{"count":1},"status":"ok"}

@cakrit
Contributor Author

cakrit commented Feb 28, 2019

Yes. So the delete request is sent properly, and a simple reload should remove that entry from your list (until you visit that server again, of course).

You should see an agents request (https://netdata.cloud/api/v1/accounts/.../agents) that returns JSON without that server in it anymore. Can you show the output of that request, hiding any sensitive info in it?

@clarencetw

Here is the JSON response.
I hid the IDs and some URLs.

{
  "result": {
    "agents": [
      {
        "id": "00000000-0000-0000-0000-000000000000",
        "name": "newyork.netdata.rocks",
        "urls": [
          "http://newyork.my-netdata.io/"
        ],
        "permissions": null,
        "properties": null
      },
      {
        "id": "00000000-0000-0000-0000-000000000000",
        "name": "ip-0-0-0-0",
        "urls": [
          "https://netdata.---.tw/"
        ],
        "permissions": null,
        "properties": null
      },
      {
        "id": "00000000-0000-0000-0000-000000000000",
        "name": "london3.my-netdata.io",
        "urls": [
          "https://london.my-netdata.io/"
        ],
        "permissions": null,
        "properties": null
      },
      {
        "id": "00000000-0000-0000-0000-000000000000",
        "name": "stackscale.my-netdata.io",
        "urls": [
          "https://stackscale.my-netdata.io/"
        ],
        "permissions": null,
        "properties": null
      },
      {
        "id": "00000000-0000-0000-0000-000000000000",
        "name": "ip-0-0-0-0",
        "urls": [],
        "permissions": null,
        "properties": null
      },
      {
        "id": "00000000-0000-0000-0000-000000000000",
        "name": "ruhighload",
        "urls": [],
        "permissions": null,
        "properties": null
      }
    ]
  },
  "status": "ok"
}
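You can check a response like this programmatically. The minimal sketch below loads the `agents` JSON and answers two questions: which agents still list a given URL, and which agents have been left with no URLs at all. The second case is what happened here with `ruhighload`. The field names come from the response above; the function names and the trimmed sample are illustrative.

```python
import json


def agents_with_url(response_text, url):
    """Names of agents whose "urls" list still contains url."""
    agents = json.loads(response_text)["result"]["agents"]
    return [a["name"] for a in agents if url in (a.get("urls") or [])]


def agents_without_urls(response_text):
    """Names of agents that have no URLs left."""
    agents = json.loads(response_text)["result"]["agents"]
    return [a["name"] for a in agents if not a.get("urls")]


# Trimmed-down copy of the response above.
sample = '''{"result": {"agents": [
  {"id": "0", "name": "newyork.netdata.rocks", "urls": ["http://newyork.my-netdata.io/"]},
  {"id": "0", "name": "ruhighload", "urls": []}
]}, "status": "ok"}'''

print(agents_with_url(sample, "http://ruhighload.com:19999/"))  # []
print(agents_without_urls(sample))  # ['ruhighload']
```

Read this way, the cloud copy no longer holds the ruhighload URL itself, but an agent entry with an empty URL list is still there.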

@gmosx gmosx self-assigned this Feb 28, 2019
@gmosx gmosx added this to the v1.13 milestone Feb 28, 2019
@gmosx
Contributor

gmosx commented Feb 28, 2019

Can you open the Console (f12) and write:

registryAgents

and press enter. Please expand the result and copy paste it here. Is the ruhighload server included?

Please do the same for:

cloudAgents

(press enter, copy-paste result here).

Btw, forget my previous advice to click synchronize after deleting the agent. I misunderstood your problem. If the agent cannot be removed from the global registry (that 412 error), clicking 'synchronize' will move it to the cloud again.
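A simplified model helps explain why the entry comes back: during a sync, anything the registry still returns but the cloud lacks gets uploaded to the cloud again. This is a hypothetical sketch of that merge rule, not the dashboard's actual sync code. The dictionaries loosely follow the `registryAgents` / `cloudAgents` shapes (`url`, `alternate_urls`).

```python
def all_urls(agents):
    """Collect every URL (primary or alternate) from a list of agent dicts."""
    urls = set()
    for agent in agents:
        urls.update(agent.get("alternate_urls") or [agent["url"]])
    return urls


def urls_to_push(registry_agents, cloud_agents):
    """URLs the registry knows but the cloud does not.

    In this simplified model a sync uploads these to the cloud, which is
    how an entry the registry refused to delete can reappear.
    """
    return all_urls(registry_agents) - all_urls(cloud_agents)


# The cloud delete succeeded, but the registry delete returned 412,
# so the registry still holds the entry.
registry_agents = [{"url": "http://ruhighload.com:19999/",
                    "alternate_urls": ["http://ruhighload.com:19999/"]}]
cloud_agents = [{"url": "https://london.my-netdata.io/",
                 "alternate_urls": ["https://london.my-netdata.io/"]}]

print(urls_to_push(registry_agents, cloud_agents))
# {'http://ruhighload.com:19999/'}
```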

@clarencetw

But how do I delete registry agents?

Can you open the Console (f12) and write:

registryAgents
[
  {
    "guid": "00000000-0000-0000-0000-000000000000",
    "url": "http://ruhighload.com:19999/",
    "last_t": 1551327866000,
    "accesses": 12,
    "name": "ruhighload",
    "alternate_urls": [
      "http://ruhighload.com:19999/"
    ]
  }
]

and press enter. Please expand the result and copy paste it here. Is the ruhighload server included?

Please do the same for:

cloudAgents

(press enter, copy-paste result here).

[
  {
    "guid": "00000000-0000-0000-0000-000000000000",
    "name": "ip-10-0-0-0",
    "url": "https://netdata.mysite.tw/",
    "alternate_urls": [
      "https://netdata.mysite.tw/"
    ]
  },
  {
    "guid": "00000000-0000-0000-0000-000000000000",
    "name": "london3.my-netdata.io",
    "url": "https://london.my-netdata.io/",
    "alternate_urls": [
      "https://london.my-netdata.io/"
    ]
  },
  {
    "guid": "00000000-0000-0000-0000-000000000000",
    "name": "newyork.netdata.rocks",
    "url": "http://newyork.my-netdata.io/",
    "alternate_urls": [
      "http://newyork.my-netdata.io/"
    ]
  },
  {
    "guid": "00000000-0000-0000-0000-000000000000",
    "name": "stackscale.my-netdata.io",
    "url": "https://stackscale.my-netdata.io/",
    "alternate_urls": [
      "https://stackscale.my-netdata.io/"
    ]
  }
]

@ktsaou
Member

ktsaou commented Mar 1, 2019

@gmosx I think this is a side effect of syncing the registry and netdata.cloud.

The only solution I can think of is to delete entries from netdata.cloud only when they are deleted from the registry. If the registry refuses to delete an entry, we may get this result.

The registry may refuse to delete a host for the following reasons:

  1. The registry is not enabled (not the case)
  2. The request is invalid for a number of reasons:
    • no person_guid given
    • no machine_guid given
    • no url given (of the currently viewed dashboard)
    • the person_guid is not a valid GUID
    • the machine_guid is not a valid GUID
    • the machine_guid is not a known machine
    • the person_guid is not a known person
    • the person identified by person_guid does not already know the url of the currently viewed node (not the one to be deleted, but the one currently viewed).
  3. The delete_url is the same as the URL currently viewed. This is a common error: you cannot delete the URL of the dashboard you are viewing. I think we should remove this check.
  4. The person identified by person_guid has never accessed the delete_url in the past.
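The list above reads like a series of guard checks that run before the delete. The Python sketch below models them in that order. It is hypothetical: the real registry lives in the Netdata agent's C code. The data structures are also invented for illustration: `persons` maps a person GUID to the set of URLs that person has accessed, and `machines` is the set of known machine GUIDs.

```python
import uuid


def is_guid(value):
    """True if value parses as a UUID string."""
    try:
        uuid.UUID(value)
        return True
    except (ValueError, TypeError, AttributeError):
        return False


def delete_refusal_reason(registry_enabled, person_guid, machine_guid, url,
                          delete_url, persons, machines):
    """Return the reason a delete would be refused, or None if allowed."""
    if not registry_enabled:
        return "registry not enabled"                        # reason 1
    if not (person_guid and machine_guid and url):
        return "missing person_guid, machine_guid or url"    # reason 2
    if not (is_guid(person_guid) and is_guid(machine_guid)):
        return "invalid GUID"                                 # reason 2
    if machine_guid not in machines:
        return "unknown machine"                              # reason 2
    if person_guid not in persons:
        return "unknown person"                               # reason 2
    seen = persons[person_guid]
    if url not in seen:
        return "person does not know the viewed url"          # reason 2
    if delete_url == url:
        return "cannot delete the url being viewed"           # reason 3
    if delete_url not in seen:
        return "person never accessed delete_url"             # reason 4
    return None


# Made-up GUIDs, for illustration only.
PERSON = "11111111-1111-1111-1111-111111111111"
MACHINE = "22222222-2222-2222-2222-222222222222"
persons = {PERSON: {"https://netdata.mysite.tw/", "http://ruhighload.com:19999/"}}
machines = {MACHINE}

# Deleting a host from its own dashboard hits reason 3.
print(delete_refusal_reason(True, PERSON, MACHINE, "http://ruhighload.com:19999/",
                            "http://ruhighload.com:19999/", persons, machines))
# Deleting it while viewing another dashboard passes every check.
print(delete_refusal_reason(True, PERSON, MACHINE, "https://netdata.mysite.tw/",
                            "http://ruhighload.com:19999/", persons, machines))
```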

I reviewed the registry's deletion code, and I think it is error-prone. The key problem is that the delete command does not take as an argument the machine_guid of the machine the delete_url belongs to. So, if you try to delete http://localhost:19999/ and you have 10 different nodes with the same URL, you cannot know which one will be deleted. @cakrit please write a test for this case. If the registry is faulty, we may have to fix it.
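A toy model shows the ambiguity. Here a person's history is a list of (machine_guid, url) pairs. This is a hypothetical illustration, not the registry's actual data structure. A URL-only key can match several machines, while adding the machine_guid, as suggested above, narrows the match to that machine's pairing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One (machine_guid, url) pair in a person's history (toy model)."""
    machine_guid: str
    url: str


def matches_by_url(entries, delete_url):
    """URL-only key: every machine that was reached through delete_url."""
    return [e.machine_guid for e in entries if e.url == delete_url]


def matches_by_machine_and_url(entries, machine_guid, delete_url):
    """(machine_guid, url) key: only that machine's pairing matches."""
    return [e for e in entries
            if e.machine_guid == machine_guid and e.url == delete_url]


history = [
    Entry("machine-a", "http://localhost:19999/"),
    Entry("machine-b", "http://localhost:19999/"),
    Entry("machine-c", "https://netdata.example.com/"),
]

print(matches_by_url(history, "http://localhost:19999/"))
# ['machine-a', 'machine-b']  (ambiguous)
print(matches_by_machine_and_url(history, "machine-b", "http://localhost:19999/"))
# [Entry(machine_guid='machine-b', url='http://localhost:19999/')]
```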

@cakrit
Contributor Author

cakrit commented Mar 2, 2019

I think this is a side effect of syncing the registry and netdata.cloud.
The only solution I can think of is to delete entries from netdata.cloud only when they are deleted from the registry. If the registry refuses to delete an entry, we may get this result.

This would make things significantly worse. The most common case we have encountered is when someone switches from one registry to another. It's not mentioned in the list, but the new registry actually doesn't have that host at all. The user would need to completely clear the browser local storage to get rid of that entry. So we deliberately ignore the registry errors.
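The policy described here could look roughly like the sketch below: the cloud delete goes ahead even when the registry refuses. The callables and the `RegistryError` exception are hypothetical stand-ins for the dashboard's actual network calls.

```python
import logging


class RegistryError(Exception):
    """Hypothetical error for a rejected registry request (e.g. HTTP 412)."""


def delete_host(url, registry_delete, cloud_delete, signed_in):
    """Delete url from the registry and, if signed in, from the cloud.

    Registry errors are logged and deliberately ignored: after switching
    registries, the new registry may not know the host at all, and the
    cloud entry should still be removable. Returns True if the registry
    accepted the delete.
    """
    registry_ok = True
    try:
        registry_delete(url)
    except RegistryError as err:
        registry_ok = False
        logging.warning("registry DELETE failed for %s: %s", url, err)
    if signed_in:
        cloud_delete(url)
    return registry_ok


def failing_registry(url):
    raise RegistryError("412")


removed_from_cloud = []
ok = delete_host("http://ruhighload.com:19999/", failing_registry,
                 removed_from_cloud.append, signed_in=True)
print(ok, removed_from_cloud)  # False ['http://ruhighload.com:19999/']
```

As this thread shows, the trade-off is that the registry still holds the entry, so a later sync can bring it back.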

The cases you mentioned are generally valid security precautions, so that person A doesn't get to delete person B's entries. But we definitely have a disconnect between the code that decides what to put in the list it returns (e.g. for the case in this issue) and the code that decides whether the browser that received that list is allowed to delete it. It sounds like the problem would indeed be the last thing you wrote:

The key problem is that the delete command does not take as an argument the machine_guid of the machine the delete_url belongs to. So, if you try to delete http://localhost:19999/ and you have 10 different nodes with the same URL, you cannot know which one will be deleted.

I'll check what happens in this case.

@cakrit cakrit added bug area/registry and removed cloud Netdata hub/cloud related labels Mar 2, 2019
@cakrit
Contributor Author

cakrit commented Mar 2, 2019

Replicated the 412 error for a machine_guid that appears with multiple URLs. Starting on the fix.

@cakrit cakrit removed the area/web label Mar 2, 2019
@cakrit cakrit assigned cakrit and unassigned gmosx Mar 2, 2019
@cakrit
Contributor Author

cakrit commented Mar 2, 2019

That wasn't it; it was a different failure, and my test wasn't actually covering the opposite scenario that @ktsaou mentioned: the same URL with multiple machine_guids.

@cakrit
Contributor Author

cakrit commented Mar 4, 2019

@clarencetw we checked the logs and saw that you were removing that host's entry while viewing that host's dashboard. That limitation was there by design, because a refresh would then just add that host to the registry again. We have removed that limitation, but if you remove the host the same way again, just close that browser tab afterwards. Even better, do the delete from the UI of a host you actually want to keep.

@clarencetw

clarencetw commented Mar 4, 2019

Sorry, but after deleting it with the trash can, I see ruhighload.com again.
This is what I tried:

  1. Closed the browser tab and opened it again.
  2. Clicked sync.
  3. Deleted it with the trash can.
  4. Saw ERROR 412: Netdata registry DELETE in my console.

registryAgents shows:

[
  {
    "guid": "00000000-0000-0000-0000-000000000000",
    "url": "http://ruhighload.com:19999/",
    "last_t": 1551327866000,
    "accesses": 12,
    "name": "ruhighload",
    "alternate_urls": [
      "http://ruhighload.com:19999/"
    ]
  }
]

@cakrit
Contributor Author

cakrit commented Mar 4, 2019

You're right; we updated the code in the global registry just a few minutes ago. If you try again now, you won't get a 412.

Again, please ensure you do NOT load http://ruhighload.com:19999 after the delete.

@clarencetw

clarencetw commented Mar 5, 2019

I tried again, and I still get a 412.

  • This is my console log:
GET https://registry.my-netdata.io/api/v1/registry?action=delete&machine=00000000-0000-0000-0000-000000000000&name=ip-0-0-0-0&url=https%3A%2F%2Fnetdata.mysite.tw%2F&delete_url=http%3A%2F%2Fruhighload.com%3A19999%2F&_=1551751054516 412
ERROR 412: Netdata registry DELETE failed: https://registry.my-netdata.io
Received error from registry null
ACCESS *** ***
Checking if sync is needed. 
Rendering my-netdata menu from netdata.cloud 

There is no http://ruhighload.com:19999 here.

  • Then I tried sync:
ACCESS *** ***
Checking if sync is needed.

I can find http://ruhighload.com:19999 here in the JSON.

Synchronizing with netdata.cloud.
Rendering my-netdata menu from netdata.cloud

http://ruhighload.com:19999 is here in the JSON.
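Python's standard library can decode the failing registry request from the console log to show exactly which parameters the dashboard sent. The URL below is copied verbatim from the log.

```python
from urllib.parse import parse_qs, urlsplit

logged = ("https://registry.my-netdata.io/api/v1/registry?action=delete"
          "&machine=00000000-0000-0000-0000-000000000000&name=ip-0-0-0-0"
          "&url=https%3A%2F%2Fnetdata.mysite.tw%2F"
          "&delete_url=http%3A%2F%2Fruhighload.com%3A19999%2F&_=1551751054516")

# parse_qs percent-decodes the values; each key maps to a list.
params = {k: v[0] for k, v in parse_qs(urlsplit(logged).query).items()}
print(params["action"])      # delete
print(params["url"])         # https://netdata.mysite.tw/
print(params["delete_url"])  # http://ruhighload.com:19999/
```

Compare this with the registry rules listed earlier in the thread, where `url` is the dashboard being viewed. Here `url` differs from `delete_url`, so the "same URL as the one viewed" check would not seem to explain this particular 412.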

@clarencetw

I deleted it successfully after a force refresh in Chrome.
