Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to Pydantic v2 #64

Merged
merged 9 commits into from
Aug 10, 2023
Merged

Migrate to Pydantic v2 #64

merged 9 commits into from
Aug 10, 2023

Conversation

pederhan
Copy link
Member

@pederhan pederhan commented Jul 31, 2023

Pydantic v2 brings with it much higher performance (5-50x times faster than Pydantic v1), as well as improved APIs for validation and (de)serialization.

This PR bumps the minimum Pydantic version of ZAC to 2.0.0. Code changes have been made to accommodate the new APIs introduced in 2.0.0. Most of the code changes were made using the migration tool Bump Pydantic as outlined in the Pydantic V2 Migration Guide.

The use of BaseSettings has been substituted with BaseModel since we didn't use any of the features offered by BaseSettings. Furthermore, the BaseSettings class has been moved to the pydantic-settings package, which we shouldn't include if we don't need to.

@pederhan
Copy link
Member Author

pederhan commented Aug 9, 2023

Just discovered an issue with merging hosts using Pydantic V2.

89ae85d (origin/master)

Done with merge in 9.91 seconds. Equal hosts: 10067, replaced hosts: 0
Done with merge in 9.62 seconds. Equal hosts: 10067, replaced hosts: 0
Done with merge in 9.06 seconds. Equal hosts: 10067, replaced hosts: 0
Done with merge in 10.11 seconds. Equal hosts: 10067, replaced hosts: 0
Done with merge in 10.75 seconds. Equal hosts: 10067, replaced hosts: 0
Done with merge in 10.67 seconds. Equal hosts: 10067, replaced hosts: 0

f653bc3 (pederhan/pydantic-v2)

Done with merge in 13.49 seconds. Equal hosts: 9742, replaced hosts: 324
Done with merge in 12.28 seconds. Equal hosts: 9737, replaced hosts: 330
Done with merge in 11.94 seconds. Equal hosts: 9744, replaced hosts: 324
Done with merge in 12.71 seconds. Equal hosts: 9744, replaced hosts: 324
Done with merge in 12.67 seconds. Equal hosts: 9744, replaced hosts: 324
Done with merge in 12.76 seconds. Equal hosts: 9744, replaced hosts: 324

Seems potentially like an issue with SourceMergerProcess and the comparison between the model created from the merged source model and the model in the hosts table:

if current_host:
if current_host == host:
# logging.debug(f"Host <{host['hostname']}> from source <{source}> is equal to current host")
return HostAction.NO_CHANGE
else:
# logging.debug(f"Replaced host <{host['hostname']}> from source <{source}>")
cursor.execute(
f"UPDATE {self.db_hosts_table} SET data = %s WHERE data->>'hostname' = %s",
[host.model_dump_json(), host.hostname],
)
return HostAction.UPDATE

Where in Pydantic V1 the models would match, but in V2 they don't for whatever reason. Need to attach a debugger and investigate which models are affected. The number of models affected each merge iteration is very similar, so we should expect to see the same models affected each time. What do they have in common?

@pederhan
Copy link
Member Author

pederhan commented Aug 9, 2023

This happens due to a host modifier assigning a new interface to the hosts in the form of a dict, not as an instance of models.Interface. In Pydantic V1, the equality check between the following two objects would be True:

from zabbix_auto_config.models import Host, Interface

host1 = Host(
    enabled=False,
    hostname='foo.example.com',
    interfaces=[],
    # ...
)
host1.append(
    {
        'details': {'version': 2, 'community': '{$SNMP_COMMUNITY}'},
        'endpoint': 'foo.example.com',
        'port': '161',
        'type': 2,
    }
)

host2 = Host(
    enabled=False,
    hostname='foo.example.com',
    interfaces=[
        {
            'details': {'version': 2, 'community': '{$SNMP_COMMUNITY}'},
            'endpoint': 'foo.example.com',
            'port': '161',
            'type': 2,
        }
    ],
)

assert host1 == host2

But in Pydantic V2, the list of dicts in host2 is not converted to a list of Interface objects during the comparison operation, thereby always returning False when comparing the two. Even calling Host.parse_obj(host2) will not yield an object where the list of dicts are converted to a list of Interface objects.

@pederhan
Copy link
Member Author

pederhan commented Aug 10, 2023

After some more testing, it seems the only way to rebuild a model where submodels are re-validated in Pydantic V2 is to add revalidate_instances="always" to the model config:

class Host(BaseModel):
    # ...
    model_config = ConfigDict(validate_assignment=True, revalidate_instances="always")

Source: https://docs.pydantic.dev/latest/usage/model_config/#revalidate-instances


While that is a decent workaround, it ignores the fact that appending the wrong data type to a field is marked as an error by type checkers such as mypy, so it shouldn't even happen in the first place:

image

Without modifying the code to accommodate this erroneous use of the library, this PR would introduce a breaking change, but largely in the form of being less lenient with existing erroneous code. We are not introducing some arcane set of requirements or changing the "best practice" way of interacting with the models.

We could have a transition period to Pydantic V2 where do in fact re-validate each modification, but announce that we will eventually stop doing it? I'll add re-validation of Hosts for each modifier for now. But seeing as this problem is caught by static type checkers and we have to enable optional functionality (revalidate_instances) to even get validation to pass, it's a signal that this behavior should NOT be supported by us in the near future. This will introduce N_hosts * N_modifiers new validation calls, which can be quite a few... Need to test the performance of this change so we don't waste away all the Pydantic V2 performance gains.

@pederhan
Copy link
Member Author

pederhan commented Aug 10, 2023

Need to test the performance of this chance so we don't waste away all the Pydantic V2 performance gains.

Statistic V1 (seconds) V2 Revalidation (seconds) V2 No Revalidation (seconds)
Mean 7.56 4.51 (↓40.34%) 4.22 (↓44.18%)
Median 7.55 4.54 (↓39.87%) 4.23 (↓43.97%)
95th Percentile 7.60 5.04 (↓33.68%) 4.90 (↓35.53%)
99th Percentile 7.75 5.17 (↓33.29%) 5.21 (↓32.77%)
Range 0.34 0.97 (↑185.29%) 1.50 (↑341.18%)
Data

V1 (See #63)

Statistic Value (seconds)
Mean 7.56
Median 7.55
95th Percentile 7.60
99th Percentile 7.75
Range 0.34
Done with merge in 7.75 seconds. Equal hosts: 10041
Done with merge in 7.60 seconds. Equal hosts: 10041
Done with merge in 7.51 seconds. Equal hosts: 10041
Done with merge in 7.41 seconds. Equal hosts: 10041

V2 Revalidation

Statistic Value (seconds)
Mean 4.51
Median 4.54
95th Percentile 5.04
99th Percentile 5.17
Range 0.97
Done with merge in 4.76 seconds. Equal hosts: 10071
Done with merge in 5.04 seconds. Equal hosts: 10077
Done with merge in 4.20 seconds. Equal hosts: 10077
Done with merge in 4.62 seconds. Equal hosts: 10077
Done with merge in 4.39 seconds. Equal hosts: 10077
Done with merge in 4.32 seconds. Equal hosts: 10077
Done with merge in 4.64 seconds. Equal hosts: 10077
Done with merge in 4.76 seconds. Equal hosts: 10077
Done with merge in 4.54 seconds. Equal hosts: 10077
Done with merge in 4.38 seconds. Equal hosts: 10075
Done with merge in 4.57 seconds. Equal hosts: 10060
Done with merge in 4.24 seconds. Equal hosts: 10077
Done with merge in 4.60 seconds. Equal hosts: 10077
Done with merge in 4.21 seconds. Equal hosts: 10077
Done with merge in 4.31 seconds. Equal hosts: 10077
Done with merge in 4.47 seconds. Equal hosts: 10077
Done with merge in 4.75 seconds. Equal hosts: 10077
Done with merge in 5.17 seconds. Equal hosts: 10077
Done with merge in 4.22 seconds. Equal hosts: 10077

V2 No Revalidation (pederhan/zabbix-auto-config@408a9c3)

Statistic Value (seconds)
Mean 4.22
Median 4.23
95th Percentile 4.90
99th Percentile 5.21
Range 1.50
Done with merge in 3.77 seconds. Equal hosts: 10069
Done with merge in 3.71 seconds. Equal hosts: 10074
Done with merge in 3.94 seconds. Equal hosts: 10074
Done with merge in 4.32 seconds. Equal hosts: 10074
Done with merge in 4.15 seconds. Equal hosts: 10074
Done with merge in 3.85 seconds. Equal hosts: 10074
Done with merge in 3.84 seconds. Equal hosts: 10074
Done with merge in 4.23 seconds. Equal hosts: 10074
Done with merge in 5.21 seconds. Equal hosts: 10074
Done with merge in 4.43 seconds. Equal hosts: 10069
Done with merge in 3.96 seconds. Equal hosts: 10075
Done with merge in 4.90 seconds. Equal hosts: 10075
Done with merge in 4.28 seconds. Equal hosts: 10075
Done with merge in 4.49 seconds. Equal hosts: 10075
Done with merge in 4.27 seconds. Equal hosts: 10075
Done with merge in 4.32 seconds. Equal hosts: 10075
Done with merge in 4.23 seconds. Equal hosts: 10075
Done with merge in 4.54 seconds. Equal hosts: 10075
Done with merge in 4.56 seconds. Equal hosts: 10074
Done with merge in 4.53 seconds. Equal hosts: 10063
Done with merge in 4.13 seconds. Equal hosts: 10078
Done with merge in 4.04 seconds. Equal hosts: 10078
Done with merge in 4.81 seconds. Equal hosts: 10078
Done with merge in 4.09 seconds. Equal hosts: 10078
Done with merge in 4.00 seconds. Equal hosts: 10078

Conclusion

We see a ~40% speedup when doing revalidation with Pydantic V2 compared to the existing V1 implementation, while removing the revalidation puts this number up at ~44%. There aren't a lot of data points used to benchmark, so these are just rough estimates, but it gives us a good impression of the overall performance increase that Pydantic V2 provides as well as the cost of performing revalidation of hosts after each modification. The increase in range is likely due to external factors such as DB latency combined with few data points for the V1 benchmark.

In conclusion, the overhead incurred from revalidation is trivial when compared to other actions such as DB reads/writes. The repeat validation lets us maintain backwards compatibility with "improper" host modifiers while giving us more granular diagnostics in case a host modifier modifies a Host to be invalid, which we did not previously have.

@pederhan pederhan merged commit 2650598 into unioslo:master Aug 10, 2023
3 checks passed
@pederhan pederhan deleted the pydantic-v2 branch August 10, 2023 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant