Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Duplicate Balance Sheet and Transaction Sheet Records #128

Closed
2 of 4 tasks
christian-wendlandt-amerilux opened this issue Jul 31, 2024 · 20 comments
Closed
2 of 4 tasks
Assignees
Labels

Comments

@christian-wendlandt-amerilux

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

Multiple records with different md5 hashes in the int_netsuite2__tran_with_converted_amounts table are being joined to the same transactions.

Relevant error log or model output

Failure in test unique_netsuite2__balance_sheet_balance_sheet_id (models/netsuite2.yml)
Failure in test unique_netsuite2__transaction_details_transaction_details_id (models/netsuite2.yml)

Here are some example records from netsuite2__transaction_details. Sensitive columns have been replaces with ****.

TO_SUBSIDIARY_ID,TO_SUBSIDIARY_NAME,TO_SUBSIDIARY_CURRENCY_SYMBOL,TRANSACTION_LINE_ID,TRANSACTION_MEMO,IS_TRANSACTION_NON_POSTING,IS_MAIN_LINE,IS_TAX_LINE,IS_CLOSED,TRANSACTION_ID,TRANSACTION_STATUS,TRANSACTION_DATE,TRANSACTION_DUE_DATE,TRANSACTION_TYPE,_FIVETRAN_SYNCED_DATE,TRANSACTION_NUMBER,ENTITY_ID,IS_TRANSACTION_INTERCOMPANY_ADJUSTMENT,ACCOUNTING_PERIOD_ENDING,ACCOUNTING_PERIOD_NAME,ACCOUNTING_PERIOD_ID,IS_ACCOUNTING_PERIOD_ADJUSTMENT,IS_ACCOUNTING_PERIOD_CLOSED,ACCOUNT_NAME,ACCOUNT_TYPE_NAME,ACCOUNT_TYPE_ID,ACCOUNT_ID,ACCOUNT_NUMBER,IS_ACCOUNT_LEFTSIDE,IS_ACCOUNTS_PAYABLE,IS_ACCOUNTS_RECEIVABLE,IS_ACCOUNT_INTERCOMPANY,PARENT_ACCOUNT_NAME,IS_EXPENSE_ACCOUNT,IS_INCOME_ACCOUNT,COMPANY_NAME,CUSTOMER_CITY,CUSTOMER_STATE,CUSTOMER_ZIPCODE,CUSTOMER_COUNTRY,CUSTOMER_DATE_FIRST_ORDER,CUSTOMER_EXTERNAL_ID,CLASS_FULL_NAME,ITEM_ID,ITEM_NAME,ITEM_TYPE_NAME,SALES_DESCRIPTION,LOCATION_NAME,LOCATION_CITY,LOCATION_COUNTRY,VENDOR_NAME,VENDOR_CREATE_DATE,CURRENCY_NAME,CURRENCY_SYMBOL,DEPARTMENT_ID,DEPARTMENT_NAME,SUBSIDIARY_ID,SUBSIDIARY_NAME,CONVERTED_AMOUNT,TRANSACTION_AMOUNT,TRANSACTION_DETAILS_ID
1,****,USD,1,LOWES RMA REQUEST PO #266278322 - CORRECTED BOL,false,false,false,true,612140,A,2024-07-25 00:00:00.000 Z,2024-09-08 00:00:00.000 Z,VendBill,2024-07-30,VENDBILL9225,630,,2024-07-31 00:00:00.000 Z,Jul 2024,11,false,false,NON-MATERIAL COST OF GOODS SOLD : DIRECT OVERHEAD COST OF GOODS SOLD : FREIGHT OUT : LTL FREIGHT OUT,Cost of Goods Sold,COGS,921,47310,true,false,false,false,NON-MATERIAL COST OF GOODS SOLD : DIRECT OVERHEAD COST OF GOODS SOLD : FREIGHT OUT,false,false,,,,,,,,****,,,,,****,****,****,****,2024-02-29 11:26:01.000 Z,USD,USD,,,1,****,39.97,39.97,9deac63901cef072f2c5037ac122729b
1,****,USD,1,LOWES RMA REQUEST PO #266278322 - CORRECTED BOL,false,false,false,true,612140,A,2024-07-25 00:00:00.000 Z,2024-09-08 00:00:00.000 Z,VendBill,2024-07-30,VENDBILL9225,630,,2024-07-31 00:00:00.000 Z,Jul 2024,11,false,false,NON-MATERIAL COST OF GOODS SOLD : DIRECT OVERHEAD COST OF GOODS SOLD : FREIGHT OUT : LTL FREIGHT OUT,Cost of Goods Sold,COGS,921,47310,true,false,false,false,NON-MATERIAL COST OF GOODS SOLD : DIRECT OVERHEAD COST OF GOODS SOLD : FREIGHT OUT,false,false,,,,,,,,****,,,,,****,****,****,****,2024-02-29 11:26:01.000 Z,USD,USD,,,1,****,39.97,39.97,9deac63901cef072f2c5037ac122729b

Expected behavior

I expect to not have uniqueness errors.

dbt Project configurations

vars:
netsuite_data_model: netsuite2
netsuite_database: 'PROD_FIVETRAN'
netsuite_schema: netsuite_suiteanalytics
netsuite2__multibook_accounting_enabled: false # False by default. Disable accountingbooksubsidiary and accountingbook if you are not using the Multi-Book Accounting feature
netsuite2__using_to_subsidiary: true # False by default.
netsuite2__using_exchange_rate: true #True by default. Disable exchange_rate if you don't utilize exchange rates. If you set this variable to false, ensure it is scoped globally so that the netsuite_source package can access it as well.
netsuite2__using_vendor_categories: false # True by default. Disable vendorcategory if you don't categorize your vendors
netsuite2__using_jobs: false # True by default. Disable job if you don't use jobs

Package versions

packages:

  • package: dbt-labs/dbt_utils
    version: 1.1.0
  • package: calogica/dbt_expectations
    version: 0.8.5
  • package: fivetran/netsuite
    version: 0.13.0

What database are you using dbt with?

snowflake

dbt Version

  • installed: 1.5.9

Additional Context

I found no duplicate transactions in the source models.

Doing a full refresh only fixes the problem temporarily.

I've narrowed the problem down to this join logic that I found in "target\compiled\netsuite\models\netsuite2\netsuite2__transaction_details.sql".

transaction_details as (
  select
  ...
  left join transactions_with_converted_amounts
    on transactions_with_converted_amounts.transaction_line_id = transaction_lines.transaction_line_id
      and transactions_with_converted_amounts.transaction_id = transaction_lines.transaction_id
      and transactions_with_converted_amounts.transaction_accounting_period_id = transactions_with_converted_amounts.reporting_accounting_period_id
  ....
)

There are two records in int_netsuite2__tran_with_converted_amounts that are almost identical but have different md5 hashes. It appears that the ACCOUNT_ID of the record was changed, and now the package sees this as two different transactions for this model but not for the netsuite2__balance_sheet and netsuite2__transaction_details models.

TRANSACTION_ID,TRANSACTION_LINE_ID,SUBSIDIARY_ID,ACCOUNT_ID,TRANSACTION_ACCOUNTING_PERIOD_ID,UNCONVERTED_AMOUNT,_FIVETRAN_SYNCED_DATE,REPORTING_ACCOUNTING_PERIOD_ID,EXCHANGE_RATE_REPORTING_PERIOD,EXCHANGE_RATE_TRANSACTION_PERIOD,TO_SUBSIDIARY_ID,TO_SUBSIDIARY_NAME,TO_SUBSIDIARY_CURRENCY_SYMBOL,CONVERTED_AMOUNT_USING_TRANSACTION_ACCOUNTING_PERIOD,CONVERTED_AMOUNT_USING_REPORTING_MONTH,IS_INCOME_STATEMENT,ACCOUNT_CATEGORY,TRAN_WITH_CONVERTED_AMOUNTS_ID
612140,1,1,921,11,39.97,2024-07-30,11,1,1,1,AmeriLux International,USD,39.97,39.97,true,Expense,497b9917e06ffc917c8c78e024dbdb94
612140,1,1,941,11,39.97,2024-07-27,11,1,1,1,AmeriLux International,USD,39.97,39.97,true,Expense,3994190e8f3b4bd87bf12733cdbf77d7

Are you willing to open a PR to help address this issue?

  • Yes.
  • Yes, but I will need assistance and will schedule time during our office hours for guidance
  • No.
@fivetran-joemarkiewicz
Copy link
Contributor

Hi @christian-wendlandt-amerilux thanks for opening this issue and providing a thorough investigation and explanation of what seems to be occurring.

One item in your initial issue description that immediately jumped out to me was the following:

It appears that the ACCOUNT_ID of the record was changed

Would you be able to add some more details to how the account_id of the record changed? Did this happen within Netsuite itself, or does this seem to be a bug within the package code itself? Additionally, you mentioned these duplicates are not appearing in the BS or IS models. Are you able to see the similar account_id changing in those model results?

@fivetran-joemarkiewicz fivetran-joemarkiewicz added the status:scoping Currently being scoped label Jul 31, 2024
@christian-wendlandt-amerilux
Copy link
Author

Would you be able to add some more details to how the account_id of the record changed? Did this happen within Netsuite itself, or does this seem to be a bug within the package code itself? Additionally, you mentioned these duplicates are not appearing in the BS or IS models. Are you able to see the similar account_id changing in those model results?

It appears that the account was changed by an accountant in NetSuite on purpose.
chrome_5Kz0QJIYaK

When looking up the transaction in the source models, only the version with the new account number shows up.
It appears that (with my package configuration at least), account_id is only considered part of the primary key for int_netsuite2__tran_with_converted_amounts model, which is why another record was created. I've ran that model again as a query and can confirm that the old transaction doesn't exist. It's only still in the database table because that model is incremental, and the old record has a different hash.

@christian-wendlandt-amerilux
Copy link
Author

It appears that (with my package configuration at least), account_id is only considered part of the primary key for int_netsuite2__tran_with_converted_amounts model, which is why another record was created.

Not saying that the extra record is the issue. Maybe my configuration variables are wrong, or maybe the join isn't strict enough.
Here is how that join looks in the source file:

left join transactions_with_converted_amounts
    on transactions_with_converted_amounts.transaction_line_id = transaction_lines.transaction_line_id
      and transactions_with_converted_amounts.transaction_id = transaction_lines.transaction_id
      and transactions_with_converted_amounts.transaction_accounting_period_id = transactions_with_converted_amounts.reporting_accounting_period_id
      
      {% if var('netsuite2__multibook_accounting_enabled', false) %}
      and transactions_with_converted_amounts.accounting_book_id = transaction_lines.accounting_book_id
      {% endif %}

@fivetran-joemarkiewicz
Copy link
Contributor

@christian-wendlandt-amerilux thanks again for sharing the additional details here.

I am on the fence between if this is a result of some incremental logic tradeoffs as described within Issue #121 where we recommend periodic full refreshes to ensure data quality, or if this is a join update as you described. It's tricky because I was able to recreate the issue in my test environment and found both solutions to work. However, I'm a bit apprehensive to implement the code changes as I am not entirely certain at this moment if this could have other downstream implications not yet considered.

If you want, I would welcome you to try out the branch in your packages.yml with these changes and let me know if you are able to see this issue resolved. In the meantime we can continue the discussion if this makes sense to include in the package code. I welcome any other users of this package to join in the discussion in case there are more factors to consider.

packages:
  - git: https://github.com/fivetran/dbt_netsuite.git
    revision: bugfix/changing-transactions
    warn-unpinned: false 

Let me know your thoughts. Thanks!

@christian-wendlandt-amerilux
Copy link
Author

@christian-wendlandt-amerilux thanks again for sharing the additional details here.

I am on the fence between if this is a result of some incremental logic tradeoffs as described within Issue #121 where we recommend periodic full refreshes to ensure data quality, or if this is a join update as you described. It's tricky because I was able to recreate the issue in my test environment and found both solutions to work. However, I'm a bit apprehensive to implement the code changes as I am not entirely certain at this moment if this could have other downstream implications not yet considered.

If you want, I would welcome you to try out the branch in your packages.yml with these changes and let me know if you are able to see this issue resolved. In the meantime we can continue the discussion if this makes sense to include in the package code. I welcome any other users of this package to join in the discussion in case there are more factors to consider.

packages:
  - git: https://github.com/fivetran/dbt_netsuite.git
    revision: bugfix/changing-transactions
    warn-unpinned: false 

Let me know your thoughts. Thanks!

Thank you @fivetran-joemarkiewicz , I really appreciate the quick fix!
I just started using this package for my organization, so I'm also fearful of causing any downstream errors.
I'm happy to try it out and share the results.
I'll come back in a week or two.

If this is a legitimate fix, I don't think that I would classify the problem as a "data quality" issue, since the bug is originating from a partial key join. Requiring full refreshes seems like a Band-Aid solution if that's truly what the problem is.

I'm not asking you to try this, but perhaps the solution might be to remove account_id from the composite key.
I don't know enough about NetSuite to understand why it is being included.

Thanks again for the assistance.

@fivetran-joemarkiewicz
Copy link
Contributor

@christian-wendlandt-amerilux thanks so much for the information here. I'll keep this issue open for a while and will wait to hear back the results if the issue no longer persists. I will keep this branch locked unless otherwise mentioned here so you will not see any unexpected changes.

If this does seem to address the issue and is a legitimate fix then we will explore adding this to the package in the next release. Thanks for your collaboration here and look forward to hearing back if this is a legitimate long term fix for you!

@fivetran-joemarkiewicz fivetran-joemarkiewicz added the status:stale Issue was blocked or had no user response for more than 30 days label Nov 13, 2024
@christian-wendlandt-amerilux
Copy link
Author

Sorry for the very late status update.

My organization had to delay the implementation of the NetSuite pipeline and I forgot about this issue.

I've been running the revision for a few weeks and the uniqueness error never came back.

@fivetran-catfritz fivetran-catfritz mentioned this issue Nov 26, 2024
7 tasks
@fivetran-catfritz fivetran-catfritz linked a pull request Nov 26, 2024 that will close this issue
7 tasks
@fivetran-catfritz fivetran-catfritz removed a link to a pull request Dec 2, 2024
7 tasks
@fivetran-joemarkiewicz
Copy link
Contributor

@christian-wendlandt-amerilux thanks for the response that all looks to be well since using the branch we shared earlier! We were planning on folding these changes into the most recent release of the dbt_netsuite package; however, during our release review stage @fivetran-reneeli on our team called out that the joins we added in the branch likely would need to be added in the transaction_line sections of the queries as well as the transaction_detail.

This made me realize that the code updates I applied to the branch you're using were actually not enabled per your variable configs shared in the description of the ticket. The joins I added would only have been used if the income_statement_transaction_detail_columns or balance_sheet_transaction_detail_columns variables were populated due to the code here and here in the income statement and balance sheet models respectively.

As such, this revelation made it clear that I don't believe we have been testing this code on your data the last few weeks. Ultimately, we chose to remove this update as we were unsure of the consequences of adding the join condition for others in both the transaction_detail and transaction_line joins within the two end models.

Further, since you mentioned you haven't seen the issue in the last few weeks I do wonder if this was an incremental mismatch error that caused this duplicate record discrepancy. Fortunately, in the most recent release we also updated the incremental logic to be more robust. Ideally if the initially error was a result of the incremental logic, we very likely could have patched that in the recent release.

Let me know your thoughts. If you would like to explore the join condition in the transaction_line joins then we can modify the branch you're currently using. Also, please let me know if you see the initial error again. Thanks!

@fivetran-joemarkiewicz fivetran-joemarkiewicz removed the status:stale Issue was blocked or had no user response for more than 30 days label Dec 4, 2024
@christian-wendlandt-amerilux
Copy link
Author

christian-wendlandt-amerilux commented Dec 4, 2024

Hi @fivetran-joemarkiewicz

I actually pinned your original bugfix commit because I needed a stable version to continue development.
I've not been testing changes to the branch, only the bugfix.
This is how I've been importing the package:

  - git: https://github.com/fivetran/dbt_netsuite.git
    revision: c5487fea195d899b05b3bcf7f384602f146b6487
    warn-unpinned: false

I'm sorry if this has caused confusion in the development process.

@fivetran-joemarkiewicz
Copy link
Contributor

Hi @christian-wendlandt-amerilux no worries at all! That revision is actually the one where I noticed the changes were made to sections of code not being used by your setup. For example, we can see the below code is only run when the balance_sheet_transaction_detail_columns variable is populated. However, since your variable section doesn't leverage this variable, this entire code block is not compiled during model execution.

{% if var('balance_sheet_transaction_detail_columns') != []%}
left join transaction_details
on transaction_details.transaction_id = transactions_with_converted_amounts.transaction_id
and transaction_details.transaction_line_id = transactions_with_converted_amounts.transaction_line_id
and transaction_details.account_id = transactions_with_converted_amounts.account_id
{% if var('netsuite2__multibook_accounting_enabled', false) %}
and transaction_details.accounting_book_id = transactions_with_converted_amounts.accounting_book_id
{% endif %}
{% if var('netsuite2__using_to_subsidiary', false) and var('netsuite2__using_exchange_rate', true) %}
and transaction_details.to_subsidiary_id = transactions_with_converted_amounts.to_subsidiary_id
{% endif %}
{% endif %}

Therefore, I'm not sure if this is the fix we were expecting. Fortunately, as you mentioned there hasn't been a recurrence of this issue since using the pinned version. This makes me believe the issue was a result of the incremental strategy. Especially since the unique fields for these models are using the account_name (which is used in the incremental strategy) and if a new account_name is found then it's understood that we would see these duplicate records in the models.

I'll continue to look into this to see if there is a more optimal way to address this dynamically while also taking advantage of the incremental strategy. However, in the meantime if you imagine something like this name change could happen again in the future then you can always turn off the incremental strategy used in these models to leverage a full refresh approach which will completely rebuild the model contents each run. This likely will result in more compute being used and possibly longer runtimes, but this will ensure data accuracy going forward. On the flip side, you could keep the current approach and rely on the dbt tests to flag any future inconsistencies and then run the periodic full refreshes if you encounter any test failures.

@christian-wendlandt-amerilux
Copy link
Author

Hi @fivetran-joemarkiewicz

I don't believe that this is related to the incremental strategy.
I'm running the package in Snowflake, which seems to default to only use view and ephemeral materialization.
Without any hard materialization, I don't see how the incremental strategy would come into play.

In my project, I'm only creating tables from the netsuite2__transaction_details view.
So netsuite2__income_statement and netsuite2__balance_sheet are never queried or materialized.
I've confirmed this with Snowflake's "lineage" feature.

9OM16ijEzj

However, since your variable section doesn't leverage this variable, this entire code block is not compiled during model execution.

I'm failing to see how this correlates to netsuite2__transaction_details, since the details view doesn't use the same if-else conditional statement, and it's not an ancestor of those two other models.

Especially since the unique fields for these models are using the account_name (which is used in the incremental strategy) and if a new account_name is found then it's understood that we would see these duplicate records in the models.

I'm unfamiliar with how account_name becomes part of primary key for transaction details.
It seems to have something to do with converted amounts. Although it seems to use account_id and reporting_accounting_period_id instead.

surrogate_key as ( 
   -- add 'source_relation' when combining with union schema
  
  select 
    *,
    md5(cast(coalesce(cast(transaction_line_id as TEXT), '_dbt_utils_surrogate_key_null_') || '-' || coalesce(cast(transaction_id as TEXT), '_dbt_utils_surrogate_key_null_') || '-' || coalesce(cast(account_id as TEXT), '_dbt_utils_surrogate_key_null_') || '-' || coalesce(cast(reporting_accounting_period_id as TEXT), '_dbt_utils_surrogate_key_null_') as TEXT)) as tran_with_converted_amounts_id

  from transactions_with_converted_amounts
)

I agree that adding an extra condition to the JOIN condition feels hacky and there may be a more correct fix that involves how accounts are handled.
At least in my NetSuite configuration (according to my oa_fkeys table), transaction lines aren't ever associated with more than one account and they are already unique when given the transaction and transaction line (id) fields.

PKTABLE_QUALIFIER PKTABLE_OWNER PKTABLE_NAME PKCOLUMN_NAME FKTABLE_QUALIFIER FKTABLE_OWNER FKTABLE_NAME FKCOLUMN_NAME KEY_SEQ UPDATE_RULE DELETE_RULE FK_NAME PK_NAME
AmeriLux International, LLC AmeriLux - JDBC transactionLine transaction AmeriLux International, LLC AmeriLux - JDBC NextTransactionLineLink nextdoc 1     NextTransactionLineLink_nextdoc_transactionLine_transaction  
AmeriLux International, LLC AmeriLux - JDBC transactionLine transaction         1       transactionLine_PK
AmeriLux International, LLC AmeriLux - JDBC transactionLine id AmeriLux International, LLC AmeriLux - JDBC NextTransactionAccountingLineLink nextline 1     NextTransactionAccountingLineLink_nextline_transactionLine_id  
AmeriLux International, LLC AmeriLux - JDBC transactionLine id AmeriLux International, LLC AmeriLux - JDBC PreviousTransactionAccountingLineLink nextline 1     PreviousTransactionAccountingLineLink_nextline_transactionLine_id  
AmeriLux International, LLC AmeriLux - JDBC transactionLine id AmeriLux International, LLC AmeriLux - JDBC PreviousTransactionAccountingLineLink previousline 1     PreviousTransactionAccountingLineLink_previousline_transactionLine_id  
AmeriLux International, LLC AmeriLux - JDBC transactionLine id AmeriLux International, LLC AmeriLux - JDBC NextTransactionAccountingLineLink previousline 1     NextTransactionAccountingLineLink_previousline_transactionLine_id  
AmeriLux International, LLC AmeriLux - JDBC transactionLine transaction AmeriLux International, LLC AmeriLux - JDBC PreviousTransactionAccountingLineLink previousdoc 1     PreviousTransactionAccountingLineLink_previousdoc_transactionLine_transaction  
AmeriLux International, LLC AmeriLux - JDBC transactionLine transaction AmeriLux International, LLC AmeriLux - JDBC NextTransactionAccountingLineLink previousdoc 1     NextTransactionAccountingLineLink_previousdoc_transactionLine_transaction  
AmeriLux International, LLC AmeriLux - JDBC transactionLine uniquekey AmeriLux International, LLC AmeriLux - JDBC AppliedCreditTransactionLineLink nextline 1     AppliedCreditTransactionLineLink_nextline_transactionLine_uniquekey  
AmeriLux International, LLC AmeriLux - JDBC transactionLine id AmeriLux International, LLC AmeriLux - JDBC inventoryAssignment transactionline 1     inventoryAssignment_transactionline_transactionLine_id  
AmeriLux International, LLC AmeriLux - JDBC transactionLine uniquekey AmeriLux International, LLC AmeriLux - JDBC SystemNote linetransactionid 1     SystemNote_linetransactionid_transactionLine_uniquekey  
AmeriLux International, LLC AmeriLux - JDBC transactionLine id AmeriLux International, LLC AmeriLux - JDBC NextTransactionLineLink previousline 1     NextTransactionLineLink_previousline_transactionLine_id  
AmeriLux International, LLC AmeriLux - JDBC transactionLine transaction AmeriLux International, LLC AmeriLux - JDBC TransactionAccountingLine transaction 1     TransactionAccountingLine_transaction_transactionLine_transaction  
AmeriLux International, LLC AmeriLux - JDBC transactionLine id AmeriLux International, LLC AmeriLux - JDBC TransactionAccountingLine transactionline 1     TransactionAccountingLine_transactionline_transactionLine_id  
AmeriLux International, LLC AmeriLux - JDBC transactionLine transaction AmeriLux International, LLC AmeriLux - JDBC NextTransactionAccountingLineLink nextdoc 1     NextTransactionAccountingLineLink_nextdoc_transactionLine_transaction  
AmeriLux International, LLC AmeriLux - JDBC transactionLine id AmeriLux International, LLC AmeriLux - JDBC NextTransactionLineLink nextline 1     NextTransactionLineLink_nextline_transactionLine_id  
AmeriLux International, LLC AmeriLux - JDBC transactionLine id AmeriLux International, LLC AmeriLux - JDBC PreviousTransactionLineLink nextline 1     PreviousTransactionLineLink_nextline_transactionLine_id  
AmeriLux International, LLC AmeriLux - JDBC transactionLine transaction AmeriLux International, LLC AmeriLux - JDBC PreviousTransactionAccountingLineLink nextdoc 1     PreviousTransactionAccountingLineLink_nextdoc_transactionLine_transaction  
AmeriLux International, LLC AmeriLux - JDBC transactionLine transaction AmeriLux International, LLC AmeriLux - JDBC PreviousTransactionLineLink nextdoc 1     PreviousTransactionLineLink_nextdoc_transactionLine_transaction  
AmeriLux International, LLC AmeriLux - JDBC transactionLine uniquekey AmeriLux International, LLC AmeriLux - JDBC AppliedCreditTransactionLineLink previousline 1     AppliedCreditTransactionLineLink_previousline_transactionLine_uniquekey  
AmeriLux International, LLC AmeriLux - JDBC transactionLine transaction AmeriLux International, LLC AmeriLux - JDBC NextTransactionLineLink previousdoc 1     NextTransactionLineLink_previousdoc_transactionLine_transaction  
AmeriLux International, LLC AmeriLux - JDBC transactionLine transaction AmeriLux International, LLC AmeriLux - JDBC PreviousTransactionLineLink previousdoc 1     PreviousTransactionLineLink_previousdoc_transactionLine_transaction  
AmeriLux International, LLC AmeriLux - JDBC transactionLine id AmeriLux International, LLC AmeriLux - JDBC PreviousTransactionLineLink previousline 1     PreviousTransactionLineLink_previousline_transactionLine_id  
AmeriLux International, LLC AmeriLux - JDBC transactionLine id         2       transactionLine_PK
AmeriLux International, LLC AmeriLux - JDBC transactionLine id AmeriLux International, LLC AmeriLux - JDBC TransactionAccountingLineCostComponent transactionline 1     TransactionAccountingLineCostComponent_transactionline_transactionLine_id  
AmeriLux International, LLC AmeriLux - JDBC transactionLine transaction AmeriLux International, LLC AmeriLux - JDBC TransactionAccountingLineCostComponent transaction 1     TransactionAccountingLineCostComponent_transaction_transactionLine_transaction  
AmeriLux International, LLC AmeriLux - JDBC transactionLine transaction AmeriLux International, LLC AmeriLux - JDBC inventoryAssignment transaction 1     inventoryAssignment_transaction_transactionLine_transaction  

Whether the fix involves changing the JOIN logic or the something else, the issue still remains.
Does your dev process require a new issue to be opened if a different fix is required?

@fivetran-joemarkiewicz
Copy link
Contributor

Whether the fix involves changing the JOIN logic or the something else, the issue still remains.
Does your dev process require a new issue to be opened if a different fix is required?

Thanks for this additional context @christian-wendlandt-amerilux and thank you for confirming the issue does still remain on your end. Since this is still the same issue we can keep this ticket open and continue our investigation and ideal resolution here. Unfortunately, I will be out today but will review your most recent response in it's entirety when I'm back tomorrow and share any findings/observations.

I do want to callout that the netsuite2__transaction_details model is intended to be incremental for Snowflake destinations (see here) , but I do see that your models are being materialized as views. Are you making any model materialization changes on your end by chance? Regardless, since we do see that these are materialized as views on your end I agree that this likely is not the result of the incremental strategy as the incremental strategy wouldn't be applied to a materialized view. Apologies for not noticing that earlier.

@christian-wendlandt-amerilux
Copy link
Author

You're right!

Looks like I configured these to materialize as views.

models:
  netsuite:
    +materialized: view
    +schema: netsuite
    netsuite:
      intermediate:
        +materialized: ephemeral
    netsuite2:
      intermediate:
        +materialized: ephemeral

Oh no.
This change entered the main branch together with your bugfix.
So I can't tell if your bugfix was tested in isolation.
I'm going to revert the change to materialization to make sure that this is being tested correctly.

@fivetran-joemarkiewicz
Copy link
Contributor

Ahh okay this makes a lot of sense and makes me think the issue was initially caused by the incremental approach. The reason it seemed to be addressed on your end is because the incremental strategy has been removed with the view materialization. I still don't believe my bugfix will address the issue you encountered, I'm currently trying to recreate the error on my end in order to more effectively triage and see what other solutions we could apply as opposed to running a full refresh.

That all being said, you should be able to leverage the non incremental solution in the meantime to ensure you do not encounter duplicates in cases when the transaction components are updated in the short term until we're able to address this fix directly in the incremental logic.

@fivetran-joemarkiewicz
Copy link
Contributor

@christian-wendlandt-amerilux I've been able to recreate this issue locally! So far I've been able to come to the following understanding of this issue:

  • The root of this issue is in the incremental strategy within the int_netsuite2__tran_with_converted_amounts model
  • In this incremental logic we're leveraging the _fivetran_synced field in the transactions source table to identify "new" records.
  • However, to make sure we capture late arriving records we employ a lookback window of a few days. This means if the transaction in question was synced and then had the account_id updated in the last 3 days then it will still be included in the incremental load.
  • This causes an issue because we still have a valid record present in the model from the previous incremental load. However, the incremental strategy doesn't remove the old record because as you called out the logic thinks it to still be a unique record.
  • Due to this logic, we now confusingly have unique records (according to the tran_with_converted_amounts_id field because it factors in account_id), but in actuality we have duplicates which will persist in the downstream netsuite2__transaction_details model as a result of this join (does not include account_id in the join and therefore causes duplicates).
  • As such, we could add a join condition in the above mentioned netsuite2__transaction_details join to resolve this issue. However, that is not the right fix since we still technically have these duplicate transactions in the int_netsuite2__tran_with_converted_amounts model.

I have a few ideas for how we could possibly address this in a more robust and reliable way, but I'll want to sync with my team to make sure I'm covering all bases. I'll plan to reply back to this thread early next week with some ideas and can provide a new branch with a possible bugfix to address this issue.

That being said, as I mentioned in my previous comment, this issue should only be a result of the incremental strategy. I was able to see this issue be resolved after subsequent incremental runs or by taking the approach you've taken and materialize the models as ephemeral/view. As such, you shouldn't see this error in the meantime if you keep that model configuration. Ideally we will be able to address this directly so you don't need to override the model materializtions.

@fivetran-joemarkiewicz
Copy link
Contributor

I discussed this with the team and we found that the best course of action will be to remove the incremental strategy for the int_netsuite2__tran_with_converted_amounts model in the next release of the Netsuite dbt package. This will align the model with the same behavior for BigQuery and Databricks and also ensure issues like the one raised here do not persist regardless of subsequent incremental runs.

I will accept this into the upcoming sprint (starting this Thursday). The change will be fairly straightforward. We will simply update this logic in the model by replacing it with the below config to make it ephemeral for all destinations. We will then document the removal of the incremental strategy and make this a breaking change.

{{
  config(
    enabled=var('netsuite_data_model', 'netsuite') == var('netsuite_data_model_override','netsuite2'),
    materialized='ephemeral' 
  )
}}

I know you initially mentioned you would be open to contributing. If so, we would be happy to review a PR with the above changes and fold them into the upcoming release. If not, no worries and we will still address this in our upcoming sprint.

Thanks for working through this with myself and our team!

@fivetran-joemarkiewicz fivetran-joemarkiewicz added good first issue Good for newcomers and removed status:scoping Currently being scoped labels Dec 10, 2024
@christian-wendlandt-amerilux
Copy link
Author

That all sounds good.
What are the steps that I would need to perform for the PR request?
If defending the request is part of the PR process, then I probably don't have time for it.

@fivetran-joemarkiewicz
Copy link
Contributor

If you're interested in contributing the PR you can follow these steps. Taking on this PR is not required for this to be folded in. Our team is happy to take this on in the upcoming sprint (starting tomorrow). However, I wanted to present the opportunity in case you were still interested and had the time for it. If not, then no worries and we will handle the PR and release this coming sprint. Thanks!

@christian-wendlandt-amerilux
Copy link
Author

Alright. I don't wish to take on the pull request.

@fivetran-jamie
Copy link
Contributor

The fix for this is live in v0.17.0 of the package! I'll close this but feel free to re-open if the issue persists for you @christian-wendlandt-amerilux

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants