Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add domain specific configs to gluon #1216

Closed
wants to merge 13 commits into from

Conversation

lemoer
Copy link
Member

@lemoer lemoer commented Aug 27, 2017

intro

As freifunk networks become larger the ground noise in the batman layer 2 network increases. At about ~1000 nodes, we reach a point, where we want to split the network in different domains (layer 2 segments). So need to serve different configs. With current implementation in gluon, we need to build one firmware per domain, as there is a possibility for one site.conf per image. The idea is to introduce multiple domain configs per image, which could be changed without reflashing the router.

about this pr

The gluon framework ensures, that the site.conf is valid before the image is build (at least in some limits). This is done per module (package/gluon-pkg-name/check_site.lua) with help of the check_site_lib.lua. This PR tries to extend the functionalities of the check_site_lib to verify domain specific configs also. This PR does not implement a functionality to read the domain values inside the firmware yet.

ideas

  • the domain config should be in site/domains/domainname.conf
  • the structure is the same as in the site/site.conf
  • the two configs (site.conf and selected domain.conf) should be merged.
  • not all config values make sense in a domain specific way, so we forbid some values in the domain specific configs. the other way is also a subject

state

  • domain specific configs are baked into the image (/lib/gluon/domains/domainname.json)
  • the functionality to forbid values has been implemented (e.g. forbid_in_domain('config_mode.geo_location.show_altitude'))
    • values, which do not make sense in a domain specific way are forbidden
    • some other values, which I think are not in first priority, are also forbidden for now
  • if there is an incorrect value while checking the site and domain configs, we print in which config the issue is
  • the need_table(...) function should be handled correctly
  • decide what should happen with packages, that do not know of domain specific values
    • fail at compiletime, since they could use values, which should be forbidden in domain specific config?
    • don't care?
  • what happens if site/domains/ directory is not existing?
  • switch from forbid_in_* to need_*, need_domain_*, need_site_*
  • mesh, ap, ... should be only allowed in domain config
  • site.conf does not need to be valid at all. -> having at least one domain.conf becomes necessary.
    • remove if domain ~= nil checks
    • stop verifying check_domain(site, nil)
  • test what happens, if no domain.conf exists
  • test one need_site_* value
  • test one need_domain_* value
  • test one need_* value
  • decide, how packages should ask for values of domain.conf?
    • generic approach, which is implemented in site.lua, which is transparent for the packages
    • implement the query in each package if domain.opkg or if site.opkg ...
  • upgrade behavior?
  • domain in uci value gluon.@System[0].domain
  • implement mergeing in lua lib site
  • test merging in lua lib
  • merge domain and site in libgluonutil
  • add domain_seed
  • use domain_seed for mesh_lan
  • ensure, that the default domain exists
  • announce domain via respondd
  • [BUG] if a domain (only) value is not given, the error says it's missing in site.conf
  • [IMPROVEMENT] if a domain and site value is not given, the error says it's missing in site.conf

@lemoer
Copy link
Member Author

lemoer commented Aug 27, 2017

I think, i need to change the api of need_table(varname, subcheck, required) a little. The subcheck parameter is a callback function with signature subcheck(key, value) for now.

The following example is not easy to solve with this api in my oppinion:

site = {
  arr_value = {
    somekey = {
      special_number = 42
    }
  }
}

domain = {
  arr_value = {
    somekey = {
      another_setting = 'magic'
    }
  }
}

We could now do the following to call the subcheck inside need_table('arr_value', subcheck):

subcheck('somekey', { special_number=42, another_setting='magic'})

In other words, we merge the tables. But then inside subcheck(...) it would be impossible to know where the value was from (domain or site). We could not drop a nice error message this way.

My idea would be to drop the pass of value to the subcheck(...) function.
All current modules in gluon do not use the value anyway. They instead reach the value by using the "json path" with the key. E.g:

local prefix = string.format('autoupdater.branches[%q].', key)
need_string(prefix .. 'name')

This should work for all purposes and already covers proper error messages.

The only thing would be:

  • change api from subcheck(key, value) to subcheck(key)
  • merge keys from domain and site before calling subcheck(key)

@rotanid rotanid added the 0. type: enhancement The changeset is an enhancement label Aug 27, 2017
@lemoer
Copy link
Member Author

lemoer commented Aug 29, 2017

allowed in both site and domain config

  • autoupdater.branches
  • autoupdater.branches.*.mirrors
  • wifi*
  • wifi*.channel
  • wifi*.basic_rate
  • wifi*.ap
  • wifi*.ap.disabled
  • wifi*.ibss
  • wifi*.ibss.mcast_rate
  • wifi*.ibss.vlan
  • wifi*.ibss.disabled
  • wifi*.mesh
  • wifi*.mcast_rate
  • wifi*.disabled
  • opkg
  • opkg.lede
  • opkg.extra
  • opkg.extra.*
  • ntp_servers
  • dns
  • dns.cacheentries
  • dns.servers
  • nextnode
  • mesh
  • mesh.batman_adv
  • mesh.batman_adv.gw_sel
  • mesh.batman_adv.routing_algo
  • mesh_vpn.enabled
  • mesh_vpn.mtu
  • mesh_vpn.fastd.methods
  • mesh_vpn.fastd.groups
  • mesh_vpn.fastd.groups.*.limit
  • mesh_vpn.fastd.groups.*.groups
  • mesh_vpn.fastd.groups.*.peers
  • mesh_vpn.fastd.groups.*.peers.remotes
  • mesh_vpn.tunneldigger.brokers

allowed in site only

  • authorized_keys
  • autoupdater.branch
  • autoupdater.branches.*.name
  • autoupdater.branches.*.good_signatures
  • autoupdater.branches.*.pubkeys
  • config_mode
  • config_mode.owner
  • config_mode.owner.obligatory
  • config_mode.remote_login
  • config_mode.geo_location
  • config_mode.geo_location.show_altitude
  • config_mode.remote_login.show_password_form
  • config_mode.remote_login.min_password_length
  • site_code
  • site_name
  • site_seed
  • hostname_prefix
  • timezone
  • regdom
  • wifi*.ibss.supported_basic_rates
  • poe_passthrough
  • wifi*.mesh.supported_basic_rates
  • mesh_on_wan
  • mesh_on_lan
  • single_as_lan
  • mesh_vpn.fastd.configurable
  • mesh_vpn.fastd.syslog_level
  • roles.default
  • roles.list
  • setup_mode.skip
  • mesh_vpn.bandwidth_limit (only applied at first boot)
  • mesh_vpn.bandwidth_limit.enabled
  • mesh_vpn.bandwidth_limit.ingress
  • mesh_vpn.bandwidth_limit.egress

allowed in domain only

  • next_node.mac
  • next_node.ip4
  • next_node.ip6
  • prefix4
  • prefix6
  • wifi*.ap.ssid (clients could roam between domains otherwise)
  • wifi*.ibss.ssid
  • wifi*.ibss.bssid
  • wifi*.mesh.id
  • extra_prefixes6
  • mesh_vpn.fastd.groups.*.peers.key (an attacker with control over the packet flow could make nodes connect to the wrong instance)

helpful commands:

grep -r need_ package/*/check_site.lua | cut -d: -f 2 | sed 's/need_/\nneed_/g' | grep need_ | grep need_site
grep -r need_ package/*/check_site.lua | cut -d: -f 2 | sed 's/need_/\nneed_/g' | grep need_ | grep need_domain
grep -r need_ package/*/check_site.lua | cut -d: -f 2 | sed 's/need_/\nneed_/g' | grep need_ | grep -v need_site | grep -v need_domain

@lemoer
Copy link
Member Author

lemoer commented Aug 29, 2017

Yeay, it works:

trying to put site_code in domain specific config:

/home/gluon/gluon/lede/staging_dir/hostpkg/bin/lua: stdin:50: domains/default.conf error: `site_code` is not allowed in domain specific config.
stack traceback:
        [C]: in function 'assert'
        stdin:50: in function 'with'
        stdin:186: in function 'need_site_string'
        stdin:209: in function 'check_domain'
        stdin:296: in main chunk
        [C]: in function 'dofile'
        (command line):1: in main chunk
        [C]: ?
postinst script ./usr/lib/opkg/info/gluon-core.postinst has failed with exit code 1
make[3]: *** [package/Makefile:67: package/install] Error 1
make[3]: Leaving directory '/home/gluon/gluon/lede'
make[2]: *** [package/Makefile:102: /home/gluon/gluon/lede/staging_dir/target-mips_24kc_musl-1.1.16/stamp/.package_install] Error 2
make[2]: Leaving directory '/home/gluon/gluon/lede'
make[1]: *** [/home/gluon/gluon/lede/include/toplevel.mk:200: world] Error 2
make[1]: Leaving directory '/home/gluon/gluon/lede'
make: *** [Makefile:133: all] Error 2

trying to put prefix6 in site config:

/home/gluon/gluon/lede/staging_dir/hostpkg/bin/lua: stdin:55: site.conf error: `prefix6` is not allowed in site config.
stack traceback:
        [C]: in function 'assert'
        stdin:55: in function 'with'
        stdin:186: in function 'need_domain_string_match'
        stdin:232: in function 'check_domain'
        stdin:296: in main chunk
        [C]: in function 'dofile'
        (command line):1: in main chunk
        [C]: ?
postinst script ./usr/lib/opkg/info/gluon-core.postinst has failed with exit code 1

no domain config:

GLUON_SITEDIR='/home/gluon/gluon/site' lua -e 'print(require("cjson").encode(assert(dofile("../../scripts/site_config.lua"))))' > /home/gluon/gluon/lede/build_dir/target-mips_24kc_musl-1.1.16/gluon-site/site.json
ls /home/gluon/gluon/site/domains/*.conf > /dev/null # at least one domain cfg has to exist
ls: cannot access '/home/gluon/gluon/site/domains/*.conf': No such file or directory
make[4]: *** [Makefile:67: /home/gluon/gluon/lede/build_dir/target-mips_24kc_musl-1.1.16/gluon-site/.built] Error 2
make[4]: Leaving directory '/home/gluon/gluon/package/gluon-site'
make[3]: *** [package/Makefile:106: package/feeds/gluon/gluon-site/compile] Error 2
make[3]: Leaving directory '/home/gluon/gluon/lede'
make[2]: *** [package/Makefile:101: /home/gluon/gluon/lede/staging_dir/target-mips_24kc_musl-1.1.16/stamp/.package_compile] Error 2
make[2]: Leaving directory '/home/gluon/gluon/lede'
make[1]: *** [/home/gluon/gluon/lede/include/toplevel.mk:200: world] Error 2
make[1]: Leaving directory '/home/gluon/gluon/lede'
make: *** [Makefile:133: all] Error 2

Yeay :)

@lemoer
Copy link
Member Author

lemoer commented Aug 29, 2017

@NeoRaider I think, I implemented your improvements. What do you say for now? Especially in regard to the chosen (allowed) places for the values?

@skorpy2009
Copy link
Member

Maybe the "hostname_prefix" should be in both, site and domain config, because we will start with the zip code and separate by the first two digits. 60xxx and 61xxx will be different meshes.

@lemoer
Copy link
Member Author

lemoer commented Aug 30, 2017

@skorpy2009 Hm. I think this will introduce another level of complexity. I think the hostname_prefix is only used during config mode and maybe the domain will be set in config mode (or maybe first in the post upgrade process, aka the scripts in /lib/gluon/upgrade). So the changed hostname_prefix only read, during the second time in config mode. Normally there is already a hostname set yet, so the hostname_prefix won't be used neither.

Besides: I don't get the advantages, why one wants to store such information in the node name? Wouldn't be storing the domain in another uci value be enough? Maybe add a postal uci value, which is mandatory?

@rotanid
Copy link
Member

rotanid commented Aug 31, 2017

i agree with @lemoer , bad idea to try to store additional data in the hostname

@lemoer lemoer mentioned this pull request Sep 4, 2017
@lemoer
Copy link
Member Author

lemoer commented Sep 4, 2017

I can first go on with my work, if I know, when the domain config should be changeable. We need a decision here!

  1. Changing should be possible when the router is online (already in the first version of domain configs).
  2. Changing should be only possible during post-upgrade procedure (forever)
  3. First we implement the second choice and merge this behavior into gluon. If somebody (maybe the hoodselector) wants to extend the behavior, so the domains can be changed online, he is free to add another PR.

@NeoRaider I think this is your decision. I would be happy if you could state your position, so I could go on.

@neocturne
Copy link
Member

I think 2. is fine for now. Allowing online config changes doesn't really have anthing to do with separating the config into site and domain sections, so I'd like to postpone that to a later PR.

@lemoer
Copy link
Member Author

lemoer commented Sep 10, 2017

  • path is broken /lib/gluon/default/default

@lemoer
Copy link
Member Author

lemoer commented Sep 11, 2017

  • look for uci_alloc_context(); in code and ensure free()

@lemoer
Copy link
Member Author

lemoer commented Sep 11, 2017

  • rename site.json to base.json?

@lemoer
Copy link
Member Author

lemoer commented Sep 11, 2017

  • libuci dependency in gluonutil

@lemoer
Copy link
Member Author

lemoer commented Sep 13, 2017

The binary for gluon-print-site is about 4K when the debug infos are stripped.

@lemoer
Copy link
Member Author

lemoer commented Sep 14, 2017

@NeoRaider I think, this is ready to review.

@lemoer lemoer changed the title [RFC] Add domain specific config verification to check_site framework Add domain specific configs to gluon Sep 15, 2017
@lemoer
Copy link
Member Author

lemoer commented Oct 6, 2017

Some more work to allow aliases:

  • symlink
  • add check_site for domain_name and domain_aliases
  • domain_name should be adapted?
  • rename aliases to domain_aliases
  • announce pretty name
  • show domain in statuspage

@rotanid rotanid added this to the 2017.2 milestone Oct 14, 2017
package/gluon.mk Outdated
end

local dir = os.getenv('IPKG_INSTROOT') .. '/lib/gluon/domains/'
local pfile = io.popen('find '..dir..' -iname \\*.json') -- TODO: not sure about the doublebackslash? why do we need it?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leaving a comment as a marker for the TODO

@rotanid
Copy link
Member

rotanid commented Oct 14, 2017

please check your code indention again. the current lua code seems to use 3 spaces while yours sometimes has 2 or even 5 spaces

Copy link
Member

@christf christf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this PR is merged domains must be defined. Is this intentional?


domain=$(uci get gluon.@system[0].domain)

ls /lib/gluon/domains/${domain}.json > /dev/null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a subtle way of printing an error message. How about
[[ -f /lib/gluon/domains/${domain}.json ]] || echo "file not found: /lib/gluon/domains/${domain}.json" >&2

*/
static struct json_object * merge_json(struct json_object *a, struct json_object *b) {
if (!json_object_is_type(a, json_type_object) || !json_object_is_type(b, json_type_object)) {
json_object_put(b);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the reference count for b decremented here? I do not think this should be inside this function.
also returning a if a is not a json_type_object is possible from l46-48

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, how this should be possible outside the function? merge_json() is called recursively.

@mweinelt
Copy link
Contributor

Honestly swapping between site and domain is confusing at times.

Can we call the shared configuration base, and the invidual configurations either site or domain? Would probably increase readability and prevent confusion alot.

@lemoer lemoer force-pushed the pr_check_site_domain branch 2 times, most recently from 09b5bdd to 8cc67ea Compare October 18, 2017 21:07
@@ -0,0 +1,10 @@
#!/bin/sh

domain=$(uci get gluon.@system[0].domain)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable is not populated by default, also gluon would be an entirely new section, while gluon-core already exists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, neoraider said, that he want to merge all the gluon-* sections in one gluon section in near future. And I already use this new section. But I'm currently not sure how to introduce sections. I'll add a todo for this.

But you pointed me to an other error. It should be uci get gluon.system.domain. Thanks.

@lemoer
Copy link
Member Author

lemoer commented Dec 31, 2017

Found a small bug:

ln: failed to create symbolic link '/home/gluon/gluon/lede/build_dir/target-i386_pentium4_musl-1.1.16/gluon-site/domains/suedstadt.json': File exists
ln: failed to create symbolic link '/home/gluon/gluon/lede/build_dir/target-i386_pentium4_musl-1.1.16/gluon-site/domains/nordstadt.json': File exists

GLUON_SITEDIR='$(call qstrip,$(CONFIG_GLUON_SITEDIR))' lua -e 'print(require("cjson").encode(assert(dofile("../../scripts/domain_config.lua")("'$$$${dc}'"))))' > $(PKG_BUILD_DIR)/domains/$$$${dc}.json; \
aliases=$$$$(GLUON_SITEDIR='$(call qstrip,$(CONFIG_GLUON_SITEDIR))' lua -e 'for alias_name, _ in pairs(dofile("../../scripts/domain_config.lua")("'$$$${dc}'")["domain_aliases"] or {}) do print(alias_name.." ") end'); \
for alias in $$$${aliases}; do \
ln -s $$$${dc}.json $(PKG_BUILD_DIR)/domains/$$$${alias}.json; \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace ln -s with ln -sf here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to use rm -r to clean the whole directory domains/ before entering the for loop, so the whole process should be a little more idempotent. Deleting a domain in site repo hat no effect before this additionional rm.

@lemoer
Copy link
Member Author

lemoer commented Jan 3, 2018

Pushed a fix.

@rotanid
Copy link
Member

rotanid commented Jan 17, 2018

@lemoer a small issue came up in #1310 , maybe you can fix this

@lemoer
Copy link
Member Author

lemoer commented Jan 17, 2018

Why should we continue executing upgrade scripts if one fails? I think this increases the chance to overlook the errors. E.g. the openWRT build scripts also fail if there is an error instead of trying to continue.

@neocturne
Copy link
Member

  • Third-party packages should not be able to easily break the whole system
  • Upgrade scripts can better be compared with /etc/uci-defaults scripts, which are also executed indepenently of each other
  • If there is an issue breaking the scripts, configuration must still continue so the autoupdater is likely to work, so the issue can be solved in the next release

@lemoer
Copy link
Member Author

lemoer commented Jan 17, 2018

As domain_changed.sh is only to be called directly by the user, when they manually changed the gluon.system.domain uci value, I think there is no need for continuing. Do you still stay in your position?

@neocturne
Copy link
Member

I think it makes sense to unify the logic handling initial configuration and changing domains (eventually), so this logic should not behave differently.

@lemoer
Copy link
Member Author

lemoer commented Jan 17, 2018

Do you only mean it should behave equally or do you also mean it should be the same piece of code executing the scripts?

@neocturne
Copy link
Member

In the long term, it should be the same piece of code, but this is not a requirement for getting this PR merged.

@lemoer
Copy link
Member Author

lemoer commented Jan 17, 2018

Updated.

@neocturne
Copy link
Member

I've just pushed 0b80f1b, which allows us to avoid some code duplication.

@neocturne
Copy link
Member

I have applied parts of this PR that also work without actual multi-domain support. In 50812b1, I have made the following changes:

  • gluon-client-bridge: allow AP SSID in domain config only (clients could roam between domains otherwise)
  • gluon-mesh-vpn-fastd: allow peer keys in domain config only (reusing the same keys for multiple fastd instances breaks certain security guarantees (an attacker with control over the packet flow could make nodes connect to the wrong instance)
  • gluon-mesh-vpn-core: allow bandwidth limit defaults in site config only (these default values are applied only in the initial configuration, so domain defaults would never have an effect)

@rotanid rotanid added the 2. status: merge conflict The merge has a conflict and needs rebasing label Jan 19, 2018
@neocturne
Copy link
Member

I've pulled the rest of the changes into a local branch and will take it from here.

The most important piece missing now is an updated site documentation.

@rotanid
Copy link
Member

rotanid commented Jan 19, 2018

@NeoRaider isnt #1261 the documentation update? if that is not complete, please comment over there

@lemoer
Copy link
Member Author

lemoer commented Jan 19, 2018

@NeoRaider Thanks for reworking. I like the idea to merge stub no-ops as a prepare step.

(I think we can leave this PR/issue open, until the feature is merged into the master branch)

@neocturne
Copy link
Member

Multidomain support is merged now. It is disabled by default and can be enabled by setting GLUON_MULTIDOMAIN=1 in site.mk.

@neocturne neocturne closed this Jan 26, 2018
@lemoer lemoer deleted the pr_check_site_domain branch February 17, 2023 00:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0. type: enhancement The changeset is an enhancement 2. status: merge conflict The merge has a conflict and needs rebasing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants