Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1 Column missed in s_web_returns #22

Closed
eisenwang opened this issue Nov 2, 2016 · 6 comments
Closed

1 Column missed in s_web_returns #22

eisenwang opened this issue Nov 2, 2016 · 6 comments
Labels

Comments

@eisenwang
Copy link

Dear all,
When I'm preparing the refresh data, I created the source table with tpcds_source.sql. In this sql file, there's 1 table named s_web_returns. The statement is --
create table s_web_returns
(
wret_web_site_id char(16) ,
wret_order_id integer not null,
wret_line_number integer not null,
wret_item_id char(16) not null,
wret_return_customer_id char(16) ,
wret_refund_customer_id char(16) ,
wret_return_date char(10) ,
wret_return_time char(10) ,
wret_return_qty integer ,
wret_return_amt numeric(7,2) ,
wret_return_tax numeric(7,2) ,
wret_return_fee numeric(7,2) ,
wret_return_ship_cost numeric(7,2) ,
wret_refunded_cash numeric(7,2) ,
wret_reversed_charge numeric(7,2) ,
wret_account_credit numeric(7,2) ,
wret_reason_id char(16)
);

But while creating the fact view wrv on it, we can see an error -- Column 'wret_web_page_id' not found, and by check for wrv's statement --

CREATE VIEW wrv AS
SELECT
d_date_sk wr_return_date_sk
,t_time_sk wr_return_time_sk
,i_item_sk wr_item_sk
,c1.c_customer_sk wr_refunded_customer_sk
,c1.c_current_cdemo_sk wr_refunded_cdemo_sk
,c1.c_current_hdemo_sk wr_refunded_hdemo_sk
,c1.c_current_addr_sk wr_refunded_addr_sk
,c2.c_customer_sk wr_returning_customer_sk
,c2.c_current_cdemo_sk wr_returning_cdemo_sk
,c2.c_current_hdemo_sk wr_returning_hdemo_sk
,c2.c_current_addr_sk wr_returing_addr_sk
,wp_web_page_sk wr_web_page_sk
,r_reason_sk wr_reason_sk
,wret_order_id wr_order_number
,wret_return_qty wr_return_quantity
,wret_return_amt wr_return_amt
,wret_return_tax wr_return_tax
,wret_return_amt + wret_return_tax AS wr_return_amt_inc_tax
,wret_return_fee wr_fee
,wret_return_ship_cost wr_return_ship_cost
,wret_refunded_cash wr_refunded_cash
,wret_reversed_charge wr_reversed_charge
,wret_account_credit wr_account_credit
,wret_return_amt+wret_return_tax+wret_return_fee
-wret_refunded_cash-wret_reversed_charge-wret_account_credit wr_net_loss
FROM s_web_returns LEFT OUTER JOIN date_dim ON (cast(wret_return_date as date) = d_date)
LEFT OUTER JOIN time_dim ON ((CAST(SUBSTR(wret_return_time,1,2) AS integer)*3600
+CAST(SUBSTR(wret_return_time,4,2) AS integer)*60+CAST(SUBSTR(wret_return_time,7,2) AS integer))=t_time)
LEFT OUTER JOIN item ON (wret_item_id = i_item_id)
LEFT OUTER JOIN customer c1 ON (wret_return_customer_id = c1.c_customer_id)
LEFT OUTER JOIN customer c2 ON (wret_refund_customer_id = c2.c_customer_id)
LEFT OUTER JOIN reason ON (wret_reason_id = r_reason_id)
LEFT OUTER JOIN web_page ON (wret_web_page_id = WP_WEB_PAGE_id)
WHERE i_rec_end_date IS NULL AND wp_rec_end_date IS NULL;

But there's no wret_web_page_id column defined in s_web_returns. So would you please have a look and also modify the refresh data generator to add this column?

Thanks in advance for your help.

Regards
Eisen

@eisenwang
Copy link
Author

eisenwang commented Nov 2, 2016

And also puzzled on the difference between this refresh data list and the official document list --
In "TPC BENCHMARK ™ DS Standard Specification Version 2.3.0" Table 5-4, we can see -- only s_catalog_returns, s_catalog_sales, s_inventory, s_store_returns, s_purchase_lineitem, s_web_returns, s_web_order_lineitem -- 7 source tables. But in refresh data, there are 21 tables --
asiq160@bigDataIQ:/iq_data/dump/refresh> ls s_*.dat
s_call_center_1.dat s_customer_address_1.dat s_store_1.dat s_web_returns_1.dat
s_catalog_order_1.dat s_inventory_1.dat s_store_returns_1.dat s_web_site_1.dat
s_catalog_order_lineitem_1.dat s_item_1.dat s_warehouse_1.dat s_zip_to_gmt_1.dat
s_catalog_page_1.dat s_promotion_1.dat s_web_order_1.dat
s_catalog_returns_1.dat s_purchase_1.dat s_web_order_lineitem_1.dat
s_customer_1.dat s_purchase_lineitem_1.dat s_web_page_1.dat

And in the refresh data, there's no s_catalog_sales data file either...

Would you please tell me what use for the other source tables and how to fix s_catalog_sales data file loss? Thanks

Regards
Eisen

@gregrahn
Copy link
Owner

gregrahn commented Nov 2, 2016

Looks like the issue is that tpcds_source.sql has the first column of s_web_returns as wret_web_site_id, but the spec has it as wret_web_page_id. See page 97 Table A-18: Column definition s_web_returns.

@eisenwang
Copy link
Author

eisenwang commented Nov 2, 2016

Dear Gregrahn,

Thanks a lot. Now the wrv is right. But still no s_catalog_sales refresh data generated... Please kind help. Thanks

Regards
Eisen

@gregrahn
Copy link
Owner

gregrahn commented Nov 2, 2016

It looks like Table 5-4 in the spec has an error in the name s_catalog_sales -- this should be s_catalog_order since the view csv contains the table s_catalog_order and Table 5-4 is the only occurrence of s_catalog_sales in the entire spec. Also see Appendix A. noting the cord_* columns referenced in the csv view belong to table s_catalog_order.

@eisenwang
Copy link
Author

Oh... Got it. Thank you very much

@gregrahn
Copy link
Owner

Fixed via #42

@gregrahn gregrahn added the bug label Oct 25, 2018
gregrahn added a commit that referenced this issue Oct 25, 2018
Add changes from #22 and #42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants