-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] Duplicate Balance Sheet and Transaction Sheet Records #128
Comments
Hi @christian-wendlandt-amerilux thanks for opening this issue and providing a thorough investigation and explanation of what seems to be occurring. One item in your initial issue description that immediately jumped out to me was the following:
Would you be able to add some more details to how the account_id of the record changed? Did this happen within Netsuite itself, or does this seem to be a bug within the package code itself? Additionally, you mentioned these duplicates are not appearing in the BS or IS models. Are you able to see the similar account_id changing in those model results? |
Not saying that the extra record is the issue. Maybe my configuration variables are wrong, or maybe the join isn't strict enough.
|
@christian-wendlandt-amerilux thanks again for sharing the additional details here. I am on the fence between if this is a result of some incremental logic tradeoffs as described within Issue #121 where we recommend periodic full refreshes to ensure data quality, or if this is a join update as you described. It's tricky because I was able to recreate the issue in my test environment and found both solutions to work. However, I'm a bit apprehensive to implement the code changes as I am not entirely certain at this moment if this could have other downstream implications not yet considered. If you want, I would welcome you to try out the branch in your packages:
- git: https://github.com/fivetran/dbt_netsuite.git
revision: bugfix/changing-transactions
warn-unpinned: false Let me know your thoughts. Thanks! |
Thank you @fivetran-joemarkiewicz , I really appreciate the quick fix! If this is a legitimate fix, I don't think that I would classify the problem as a "data quality" issue, since the bug is originating from a partial key join. Requiring full refreshes seems like a Band-Aid solution if that's truly what the problem is. I'm not asking you to try this, but perhaps the solution might be to remove account_id from the composite key. Thanks again for the assistance. |
@christian-wendlandt-amerilux thanks so much for the information here. I'll keep this issue open for a while and will wait to hear back the results if the issue no longer persists. I will keep this branch locked unless otherwise mentioned here so you will not see any unexpected changes. If this does seem to address the issue and is a legitimate fix then we will explore adding this to the package in the next release. Thanks for your collaboration here and look forward to hearing back if this is a legitimate long term fix for you! |
Sorry for the very late status update. My organization had to delay the implementation of the NetSuite pipeline and I forgot about this issue. I've been running the revision for a few weeks and the uniqueness error never came back. |
@christian-wendlandt-amerilux thanks for the response that all looks to be well since using the branch we shared earlier! We were planning on folding these changes into the most recent release of the dbt_netsuite package; however, during our release review stage @fivetran-reneeli on our team called out that the joins we added in the branch likely would need to be added in the This made me realize that the code updates I applied to the branch you're using were actually not enabled per your variable configs shared in the description of the ticket. The joins I added would only have been used if the As such, this revelation made it clear that I don't believe we have been testing this code on your data the last few weeks. Ultimately, we chose to remove this update as we were unsure of the consequences of adding the join condition for others in both the transaction_detail and transaction_line joins within the two end models. Further, since you mentioned you haven't seen the issue in the last few weeks I do wonder if this was an incremental mismatch error that caused this duplicate record discrepancy. Fortunately, in the most recent release we also updated the incremental logic to be more robust. Ideally if the initially error was a result of the incremental logic, we very likely could have patched that in the recent release. Let me know your thoughts. If you would like to explore the join condition in the transaction_line joins then we can modify the branch you're currently using. Also, please let me know if you see the initial error again. Thanks! |
I actually pinned your original bugfix commit because I needed a stable version to continue development.
I'm sorry if this has caused confusion in the development process. |
Hi @christian-wendlandt-amerilux no worries at all! That revision is actually the one where I noticed the changes were made to sections of code not being used by your setup. For example, we can see the below code is only run when the dbt_netsuite/models/netsuite2/netsuite2__balance_sheet.sql Lines 245 to 258 in c5487fe
Therefore, I'm not sure if this is the fix we were expecting. Fortunately, as you mentioned there hasn't been a recurrence of this issue since using the pinned version. This makes me believe the issue was a result of the incremental strategy. Especially since the unique fields for these models are using the I'll continue to look into this to see if there is a more optimal way to address this dynamically while also taking advantage of the incremental strategy. However, in the meantime if you imagine something like this name change could happen again in the future then you can always turn off the incremental strategy used in these models to leverage a full refresh approach which will completely rebuild the model contents each run. This likely will result in more compute being used and possibly longer runtimes, but this will ensure data accuracy going forward. On the flip side, you could keep the current approach and rely on the dbt tests to flag any future inconsistencies and then run the periodic full refreshes if you encounter any test failures. |
I don't believe that this is related to the incremental strategy. In my project, I'm only creating tables from the netsuite2__transaction_details view.
I'm failing to see how this correlates to netsuite2__transaction_details, since the details view doesn't use the same if-else conditional statement, and it's not an ancestor of those two other models.
I'm unfamiliar with how account_name becomes part of primary key for transaction details.
I agree that adding an extra condition to the JOIN condition feels hacky and there may be a more correct fix that involves how accounts are handled.
Whether the fix involves changing the JOIN logic or the something else, the issue still remains. |
Thanks for this additional context @christian-wendlandt-amerilux and thank you for confirming the issue does still remain on your end. Since this is still the same issue we can keep this ticket open and continue our investigation and ideal resolution here. Unfortunately, I will be out today but will review your most recent response in it's entirety when I'm back tomorrow and share any findings/observations. I do want to callout that the |
You're right! Looks like I configured these to materialize as views.
Oh no. |
Ahh okay this makes a lot of sense and makes me think the issue was initially caused by the incremental approach. The reason it seemed to be addressed on your end is because the incremental strategy has been removed with the That all being said, you should be able to leverage the non incremental solution in the meantime to ensure you do not encounter duplicates in cases when the transaction components are updated in the short term until we're able to address this fix directly in the incremental logic. |
@christian-wendlandt-amerilux I've been able to recreate this issue locally! So far I've been able to come to the following understanding of this issue:
I have a few ideas for how we could possibly address this in a more robust and reliable way, but I'll want to sync with my team to make sure I'm covering all bases. I'll plan to reply back to this thread early next week with some ideas and can provide a new branch with a possible bugfix to address this issue. That being said, as I mentioned in my previous comment, this issue should only be a result of the incremental strategy. I was able to see this issue be resolved after subsequent incremental runs or by taking the approach you've taken and materialize the models as ephemeral/view. As such, you shouldn't see this error in the meantime if you keep that model configuration. Ideally we will be able to address this directly so you don't need to override the model materializtions. |
I discussed this with the team and we found that the best course of action will be to remove the incremental strategy for the I will accept this into the upcoming sprint (starting this Thursday). The change will be fairly straightforward. We will simply update this logic in the model by replacing it with the below config to make it ephemeral for all destinations. We will then document the removal of the incremental strategy and make this a breaking change. {{
config(
enabled=var('netsuite_data_model', 'netsuite') == var('netsuite_data_model_override','netsuite2'),
materialized='ephemeral'
)
}} I know you initially mentioned you would be open to contributing. If so, we would be happy to review a PR with the above changes and fold them into the upcoming release. If not, no worries and we will still address this in our upcoming sprint. Thanks for working through this with myself and our team! |
That all sounds good. |
If you're interested in contributing the PR you can follow these steps. Taking on this PR is not required for this to be folded in. Our team is happy to take this on in the upcoming sprint (starting tomorrow). However, I wanted to present the opportunity in case you were still interested and had the time for it. If not, then no worries and we will handle the PR and release this coming sprint. Thanks! |
Alright. I don't wish to take on the pull request. |
The fix for this is live in v0.17.0 of the package! I'll close this but feel free to re-open if the issue persists for you @christian-wendlandt-amerilux |
Is there an existing issue for this?
Describe the issue
Multiple records with different md5 hashes in the int_netsuite2__tran_with_converted_amounts table are being joined to the same transactions.
Relevant error log or model output
Expected behavior
I expect to not have uniqueness errors.
dbt Project configurations
vars:
netsuite_data_model: netsuite2
netsuite_database: 'PROD_FIVETRAN'
netsuite_schema: netsuite_suiteanalytics
netsuite2__multibook_accounting_enabled: false # False by default. Disable
accountingbooksubsidiary
andaccountingbook
if you are not using the Multi-Book Accounting featurenetsuite2__using_to_subsidiary: true # False by default.
netsuite2__using_exchange_rate: true #True by default. Disable
exchange_rate
if you don't utilize exchange rates. If you set this variable to false, ensure it is scoped globally so that thenetsuite_source
package can access it as well.netsuite2__using_vendor_categories: false # True by default. Disable
vendorcategory
if you don't categorize your vendorsnetsuite2__using_jobs: false # True by default. Disable
job
if you don't use jobsPackage versions
packages:
version: 1.1.0
version: 0.8.5
version: 0.13.0
What database are you using dbt with?
snowflake
dbt Version
Additional Context
I found no duplicate transactions in the source models.
Doing a full refresh only fixes the problem temporarily.
I've narrowed the problem down to this join logic that I found in "target\compiled\netsuite\models\netsuite2\netsuite2__transaction_details.sql".
There are two records in int_netsuite2__tran_with_converted_amounts that are almost identical but have different md5 hashes. It appears that the ACCOUNT_ID of the record was changed, and now the package sees this as two different transactions for this model but not for the netsuite2__balance_sheet and netsuite2__transaction_details models.
TRANSACTION_ID,TRANSACTION_LINE_ID,SUBSIDIARY_ID,ACCOUNT_ID,TRANSACTION_ACCOUNTING_PERIOD_ID,UNCONVERTED_AMOUNT,_FIVETRAN_SYNCED_DATE,REPORTING_ACCOUNTING_PERIOD_ID,EXCHANGE_RATE_REPORTING_PERIOD,EXCHANGE_RATE_TRANSACTION_PERIOD,TO_SUBSIDIARY_ID,TO_SUBSIDIARY_NAME,TO_SUBSIDIARY_CURRENCY_SYMBOL,CONVERTED_AMOUNT_USING_TRANSACTION_ACCOUNTING_PERIOD,CONVERTED_AMOUNT_USING_REPORTING_MONTH,IS_INCOME_STATEMENT,ACCOUNT_CATEGORY,TRAN_WITH_CONVERTED_AMOUNTS_ID
612140,1,1,921,11,39.97,2024-07-30,11,1,1,1,AmeriLux International,USD,39.97,39.97,true,Expense,497b9917e06ffc917c8c78e024dbdb94
612140,1,1,941,11,39.97,2024-07-27,11,1,1,1,AmeriLux International,USD,39.97,39.97,true,Expense,3994190e8f3b4bd87bf12733cdbf77d7
Are you willing to open a PR to help address this issue?
The text was updated successfully, but these errors were encountered: