Steps to reproduce
data.sql (for some reason, I was unable to run Soda with a smaller number of rows):
create table Employee (
  id int primary key,
  name varchar(255)
);
insert into Employee (id, name) values (1, 'Alice');
insert into Employee (id, name) values (2, 'Bob');
insert into Employee (id, name) values (3, 'Alice');
insert into Employee (id, name) values (11, 'Alice');
insert into Employee (id, name) values (12, 'Bob');
insert into Employee (id, name) values (13, 'Alice');
insert into Employee (id, name) values (21, 'Alice');
insert into Employee (id, name) values (22, 'Bob');
insert into Employee (id, name) values (23, 'Alice');
insert into Employee (id, name) values (31, 'Alice');
insert into Employee (id, name) values (32, 'Bob');
insert into Employee (id, name) values (33, 'Alice');
insert into Employee (id, name) values (41, 'Alice');
insert into Employee (id, name) values (42, 'Bob');
insert into Employee (id, name) values (43, 'Alice');
insert into Employee (id, name) values (51, 'Alice');
insert into Employee (id, name) values (52, 'Bob');
insert into Employee (id, name) values (53, 'Alice');
checks for Employee:
  - row_count = 18
  - distribution_difference(name) < 0.05:
      method: chi_square
      distribution reference file: ./distribution.yaml
with distribution.yaml:
dataset: employee
column: name
distribution_type: categorical
distribution_reference:
  weights:
    - 0.7
    - 0.3
  bins:
    - Alice
    - Bob
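To double-check the observed distribution that the reference weights are compared against, here is a minimal sketch that loads the same rows into an in-memory SQLite database and counts them per name. It assumes SQLite as a stand-in for whatever data source the issue was run on, and `name_counts` is a hypothetical helper, not part of Soda.

```python
import sqlite3


def name_counts() -> dict[str, int]:
    """Load the 18 rows from data.sql into SQLite and count rows per name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Employee (id int primary key, name varchar(255))")
    # Same ids as data.sql: 1, 2, 3, 11, 12, 13, ..., 51, 52, 53.
    ids = [base + offset for base in range(0, 60, 10) for offset in (1, 2, 3)]
    # In data.sql every id ending in 2 is 'Bob'; all others are 'Alice'.
    rows = [(i, "Bob" if i % 10 == 2 else "Alice") for i in ids]
    conn.executemany("insert into Employee (id, name) values (?, ?)", rows)
    counts = dict(conn.execute("select name, count(*) from Employee group by name"))
    conn.close()
    return counts


counts = name_counts()
total = sum(counts.values())
# Observed proportions per bin, to compare with the weights in distribution.yaml.
observed = {name: n / total for name, n in counts.items()}
print(counts, observed)
```

The observed proportions come out at roughly 0.67 for Alice and 0.33 for Bob, which is close to the 0.7 and 0.3 reference weights when bins and weights are paired in the order written.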
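Expected behavior: the chi_square statistic is close to zero, since there are 12 Alice rows and 6 Bob rows (observed proportions of about 0.67 and 0.33 against reference weights of 0.7 and 0.3).
Actual behavior: the statistic value is high (~0.6).
Misc: when I change the order of the weights but not the bins, the statistic is OK.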
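For cross-checking, the Pearson chi-square statistic can be computed by hand for both pairings of weights and bins. This is a plain-Python sketch and makes no claim about how Soda computes or reports its value; `chi_square_statistic` and `p_value_one_dof` are hypothetical helpers, and the p-value formula used here only holds for one degree of freedom (two bins).

```python
import math


def chi_square_statistic(observed_counts, expected_weights):
    """Pearson chi-square: sum of (observed - expected)^2 / expected over bins."""
    total = sum(observed_counts)
    return sum(
        (obs - total * weight) ** 2 / (total * weight)
        for obs, weight in zip(observed_counts, expected_weights)
    )


def p_value_one_dof(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


observed_counts = [12, 6]  # Alice, Bob (same order as the bins)

# Weights paired with bins as written in distribution.yaml: Alice=0.7, Bob=0.3.
stat_as_written = chi_square_statistic(observed_counts, [0.7, 0.3])
# Weights swapped while the bins stay put: Alice=0.3, Bob=0.7.
stat_swapped = chi_square_statistic(observed_counts, [0.3, 0.7])

print(stat_as_written, p_value_one_dof(stat_as_written))  # statistic ≈ 0.095
print(stat_swapped, p_value_one_dof(stat_swapped))        # statistic ≈ 11.5
```

With the pairing as written, the statistic is small and the p-value is large; with the weights swapped, the statistic is large and the p-value is close to zero. Comparing these numbers with the value Soda reports may help show which pairing of weights and bins it is actually using.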
CLOUD-8980