gluon-online-status concept #2228

Closed
mweinelt opened this issue Jun 8, 2021 · 23 comments · Fixed by #2297
Labels
0. type: enhancement The changeset is an enhancement

Comments

@mweinelt
Contributor

mweinelt commented Jun 8, 2021

Since we closed #1930 and #1684 today with a reference to an ongoing IRC discussion, I want to present the conclusion of that discussion.

We think that having all nodes ping into the world in regular intervals is not something we would like to see in a first implementation of this feature. Instead we would like to focus on information that we can cheaply derive from the local node and offer a multitude of flags to various packages, that they can easily test for to see if the node is or isn't in a required state.

For that we define a directory /var/gluon/online/ that carries empty marker files. With our initial proposal we think of two simple markers that we would like to see in the first version:

  • neighbors or mesh to reflect that the node has neighbors that it meshes with
  • route_default4/6 or default_gw4/6, to reflect that the respective network stack has a default route

In later versions these can be extended by a multitude of things, I could imagine exposing whether we have an active NTP sync for example. This would also be open to contributions of markers through community packages.

A script should run at regular intervals, call the canonical checks of the mesh protocol provider and, based on the results, touch marker files that do not yet exist or delete those that are no longer valid. I think it would be helpful not to touch already existing markers so that we can look at the mtime of these markers to know when they entered a certain state.
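
A minimal sketch of the marker handling described above, assuming a POSIX shell. The function name `update_marker` and the `STATE_DIR` variable are hypothetical, and the check command passed in stands for whatever the mesh protocol provider would supply. An existing marker is left untouched, so its mtime keeps recording when the state was entered.

```shell
#!/bin/sh
# Hypothetical sketch; names are illustrative, not part of Gluon.
STATE_DIR="${STATE_DIR:-/var/gluon/online}"

# update_marker NAME CHECK...: run CHECK; create the marker if the check
# succeeds and the marker is missing, remove the marker if the check fails.
update_marker() {
	name="$1"; shift
	mkdir -p "$STATE_DIR"
	if "$@" >/dev/null 2>&1; then
		# Leave an existing marker alone so its mtime keeps the entry time.
		[ -e "$STATE_DIR/$name" ] || : > "$STATE_DIR/$name"
	else
		rm -f "$STATE_DIR/$name"
	fi
}
```

A periodic job could then call something like `update_marker neighbors some_check_command`, where `some_check_command` is a placeholder for the real protocol-specific check.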

For batman-adv we think batctl n and batctl gwl are strong contenders, and for babeld something can be grepped from the dump command of its control socket.

N.B.: I'm not sure what state the babel setup is in, and I wouldn't want to make it a mandatory part of the implementation if, as I have recently heard, it hasn't built for multiple releases now and nobody noticed.

@NeoRaider @blocktrron @AiyionPrime @T-X I hope this summarises what we talked about; if not, feel free to add your clarification below.

@mweinelt
Contributor Author

mweinelt commented Jun 8, 2021

I think it would be helpful not to touch already existing markers so that we can look at the mtime of these markers to know when they entered a certain state.

On second thought: we should have ctime to read that information, so touching could be done unconditionally.

@lemoer
Member

lemoer commented Jun 8, 2021

Suggestion:

/var/gluon/state/has_neighbours
/var/gluon/state/has_default_gw4
/var/gluon/state/has_default_gw6

(React with: 👍 or 👎)
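
For a consuming package, testing these flags could look like the sketch below. The helper name `has_state` and the `GLUON_STATE_DIR` override are assumptions for illustration; only the three file names come from the suggestion above.

```shell
#!/bin/sh
# Hypothetical consumer-side helper; not an existing Gluon API.
GLUON_STATE_DIR="${GLUON_STATE_DIR:-/var/gluon/state}"

# has_state FLAG...: succeed only if every named flag file exists.
has_state() {
	for flag in "$@"; do
		[ -e "$GLUON_STATE_DIR/$flag" ] || return 1
	done
	return 0
}
```

A script that should only act on a meshing node with an IPv6 default gateway might start with `has_state has_neighbours has_default_gw6 || exit 0`.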

@AiyionPrime
Member

It's correct that batctl gwj and batctl nj are not soon available in gluon/tree/master right?

@rubo77
Contributor

rubo77 commented Jun 9, 2021

So how will this solution help creating an SSID-Changer? How can we be sure, that the node can reach the internet?

@mweinelt
Contributor Author

mweinelt commented Jun 9, 2021

It's correct that batctl gwj and batctl nj are not soon available in gluon/tree/master right?

As soon as we are on OpenWrt 21.02, but apparently we have not yet decided to migrate to that after 2021.1, which I honestly can't understand because 19.07 has a projected EOL in August.

@AiyionPrime
Member

AiyionPrime commented Jun 9, 2021

So how will this solution help creating an SSID-Changer? How can we be sure, that the node can reach the internet?

@rubo77 Not sure it will. For now I am aiming for the three paths above;
a resource check using ICMP and external servers might be added later, as soon as the current goal proves to be insufficient.

Nor have I forgotten that there was a huge discussion about what an online checker might check,
or that one of its results was that a ping check is expected to be most useful for the offline-ssid-changer.

It's just so controversial that I'd like to start with a feature set we can agree on.
There are other use cases for which the cheaper calls provided here might be enough.
We can start with them and discuss the implementation of the expensive ones later.

@Adorfer
Contributor

Adorfer commented Jun 9, 2021

So how will this solution help creating an SSID-Changer?

At the moment there is no way for a batman gateway to signal whether public IPv6/IPv4 connectivity is available through it.
As long as that is the case, the offline SSID package should still test "pinging public internet".

@mweinelt
Contributor Author

mweinelt commented Jun 9, 2021

So how will this solution help creating an SSID-Changer?

at the moment there is no way to let batman-gw signalize if it has public ipv6/ipv4 connectivity available through it.
as long as this is, the offline ssid package should still test "pinging public internet".

Why not

# untested, but hope you get the gist
IFNAME=bat0; DEST=192.0.0.2
(ping -c4 -I "$IFNAME" "$DEST" && batctl meshif "$IFNAME" gw server 100mbit/100mbit) || batctl meshif "$IFNAME" gw off

in a cronjob on your gateway, and then we have signaling based on the gateway mode and you even have some sort of failover condition.

There are smarter solutions than having hundreds of nodes ping into the internet all day long.

@AiyionPrime
Member

Hanover does this as well.

@Adorfer
Contributor

Adorfer commented Jun 9, 2021

Why not

This is what we do, in a slightly more complex form, and I consider it just a workaround.
Nevertheless, I assume that there are people arguing that "being a batman-gateway should not be the indicator for offering internet-peering".
Or to put it a different way: I wish there were a dedicated way for an offline-ssid-package to determine whether internet connectivity is available via the batman-network, for example via DNS (a DNSBL-like method).

@CodeFetch
Contributor

What is the goal you want to achieve with this gluon-online-status? Or rather: what do you want to collect this information for?

I think gluon-online-status is a bad name: it suggests that the package can actually check whether the internet is reachable, which only a global ping check could, and it does not make clear that the package also does neighbor checks etc.

My idea in freifunk-gluon/community-packages#9 is that it is up to the community to decide on which level they want to do the test. It is possible to define target groups, e.g. local ones (supernodes, nameservers, timeservers) and global ones (publicly pingable servers, servers from other Freifunk communities or something). The offline-ssid package can then be set to use either the global or the local targets, depending on whether the community thinks it is a bad idea to ping global targets.
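
The target-group idea could be sketched roughly as follows. This is not the code from the referenced pull request: the function name `group_reachable`, the variables `TARGETS_LOCAL`, `TARGETS_GLOBAL` and `PING_CMD`, and the addresses (taken from documentation and private ranges) are all placeholders.

```shell
#!/bin/sh
# Hypothetical sketch of per-group reachability checks; the group names,
# addresses and variable names are placeholders, not Gluon configuration.
TARGETS_LOCAL="${TARGETS_LOCAL:-10.0.0.1 10.0.0.2}"
TARGETS_GLOBAL="${TARGETS_GLOBAL:-192.0.2.1 198.51.100.1}"
PING_CMD="${PING_CMD:-ping -c1 -W2}"

# group_reachable GROUP: succeed as soon as one target of the group answers,
# fail with 1 if none answers and with 2 if the group is unknown.
group_reachable() {
	case "$1" in
		local) targets="$TARGETS_LOCAL" ;;
		global) targets="$TARGETS_GLOBAL" ;;
		*) return 2 ;;
	esac
	for target in $targets; do
		$PING_CMD "$target" >/dev/null 2>&1 && return 0
	done
	return 1
}
```

A community that does not want nodes to ping global targets would simply have its package call `group_reachable local` instead of `group_reachable global`.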

My approach could clearly be made more efficient, but it was intended as an RFC right from the beginning. For example, one could do a local respondd query to determine whether another node in the local mesh cloud has already completed a connection check successfully, so that only one of the nodes pings the global targets.

I just don't understand why you want to add this whole complexity. From my point of view ping checks are not costly and checking for neighbors and route_default4/6 would only add to complexity without a proper use-case as a ping would not cost anything if these are not given (because it would not reach the internet at all then).

@AiyionPrime
Member

The first third of checks we initially agreed on is now part of master.
The other two would (at least for batman-adv) follow in #2274.

Babel's implementation is still missing; if there's interest I can write those checks as well.

@CodeFetch
Contributor

@AiyionPrime What about adding a respondd provider for publishing the gluon-state results?

@AiyionPrime
Member

Personally I do not see a need for that.

I was made aware of the goal to fit respondd responses in a single packet.

[...], the data should fit in a single unfragmented packet.[...]

Originally posted by @NeoRaider in #2289 (comment)

That's something I have not focused on yet, but will in the future.
Other than that, such a provider is out of this issue's scope. If you need it, you might want to open another issue.

What's left to do here is #2274 as well as possibly a counterpart for babel. Maybe @mweinelt did come to a conclusion; the initial post questions this in the italic section.

@CodeFetch
Contributor

@AiyionPrime A respondd provider for this would be very useful for diagnostics in the future (maybe not the currently implemented one), and I think the overhead is negligible as these are one-byte values... Isn't there a possibility to include the requested fields in the query? And if not... why? E.g. the total of transferred bytes or the hostname etc. is something which doesn't need to be transferred each time, or only if it has changed. If respondd is at its size limit, respondd should be optimized...

@AiyionPrime
Member

I really think query-dependent respondd-responses are out of scope of this.

@AiyionPrime
Member

I'm not sure what check for has_default_gw4 in babel would make sense.

@neocturne
Member

@AiyionPrime I don't have access to a Babel-based mesh at the moment. Could you post the output of a babel dump?

@AiyionPrime
Member

I haven't either; will try to find a working net again.

@rotanid
Member

rotanid commented Aug 17, 2021

hm, @mweinelt was, as far as i remember, under the impression that recent Gluon releases don't build or at least don't work with babel, as no one has worked on or at least tested the babel support in the last ~2 years

@AiyionPrime
Member

I haven't found a working net with recent gluon-babel in the past months, so this is kind of stalled by #2353.

@rubo77
Contributor

rubo77 commented Mar 10, 2022

I think we need a solution here soon, because the current SSID changer doesn't work anymore with the latest gluon and needs to be adapted.

@AiyionPrime
Member

I think we need a solution here soon, because the current SSID changer doesn't work anymore with the latest gluon and needs to be adapted.

I think the solution was implemented a while ago.
For batman networks the work has been done since the merge of #2245 and #2274.
So in case your community uses batman, you're already good to go.

In case it does not and uses babel instead, you only have a v6 default route check for now.
You can however help get #2297 merged, by testing how well it works.

The code was written about a year ago, but we haven't encountered anybody with a recent babel network yet...

8 participants