Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add codeql analysis #25

Merged
merged 59 commits into from
Dec 20, 2023
Merged

Add codeql analysis #25

merged 59 commits into from
Dec 20, 2023

Conversation

Will-Lo
Copy link
Owner

@Will-Lo Will-Lo commented Dec 20, 2023

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

umustafi and others added 30 commits October 3, 2023 14:50
* Add millisecond level precision to timestamp cols & proper timezone conversion

	- existing tests pass with minor modifications

* Handle reminder events properly

* Fix compilation errors & add isReminder flag

* Add unit tests

* Address review comments

* Add newline to address comment

* Include reminder/original tag in logging

* Clarify timezone issues in comment

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
* Set reminder event flag to true for reminders

* Update unit tests

* remove unused variable

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
* Add retention for lease arbiter table

* Replace blocking thread with scheduled thread pool executor

* Make Calendar instance thread-safe

* Rename variables, make values more clear

* Update timestamp related cols

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
* Fix Reminder Event Epsilon Comparison

* Add TODO comment

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
…3800)

* Improve Multi-active related logs and metrics

* Add more metrics and logs around forwarding dag action to DagManager

* Improve logs in response to review comments

* Replace flow execution id with trigger timestamp from multi-active

* Update flow action execution id within lease arbiter

* Fix test & make Lease Statuses more lean

* Update javadoc

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
[GOBBLIN-1929] add dataset root in some common type of datasets
…opicNameValidator (apache#3793)

* * Add generic topic validation support
* Add the first validator TopicNameValidator into the validator chain, as a refactor of existing codes

* Refine to address comments

* Refine

---------

Co-authored-by: Tao Qin <tqin@linkedin.com>
…omment (apache#3801)

* Refactor dag action updating method & add clarifying comment

* Log filtered out duplicate messages

* logs and metrics for missing messages from change monitor

* Only add gobblin.service prefix for dagActionStoreChangeMonitor

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
* Emit metrics to monitor high level consumer queue size

* Empty commit to trigger tests

* Use BlockingQueue.size() func instead of atomic integer array

* Remove unused import & add DagActionChangeMonitor prefix to metric

* Refactor to avoid repeating code

* Make protected variables private where possible

* Fix white space

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
…che#3807)

* Skip over null dag actions from malformed messages

* Add new metric for skipped messages

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
… filtered partitions (apache#3798)

* add function in Kafka Source to recompute workUnits for filtered partitions

* address comments

* set default min container value to 1

* add condition when create empty wu

* update the condition
* preserve x bit in manifest file based copy
* fix project structure preventing running unit tests from intellij
* fix unit test
…reusing code in Temporal-based execution (apache#3784)

* Simplify a few elements of MR-related job exec before reusing code in Temporal-based execution

* Add JSON-ification to several foundational config-state representations, plus encapsulated convience method `JobState.getJobIdFromProps`

* Update javadoc comments

* Encapsulate check for whether a path has the extension of a multi-work-unit
…n with Gobblin (apache#3809)

* Bump AWS version to use a compatible version of jackson with Gobblin

* use shared aws version
* Quantify Missed Work Completed by Reminders
   Also fix bug to filter out heartbeat events before extracting field

* Refactor changeMonitorUtils & add delimiter to metrics prefix

* Re-order params to group similar ones

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
…ti reference tier (apache#3806)

* address comments

* use connectionmanager when httpclient is not cloesable

* [GOBBLIN-1933] Change the logic in completeness verifier to support multi reference tier

* add uite test

* fix typo

* change the javadoc

* change the javadoc

---------

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
…r workflows of unbounded size through sub-workflow nesting (apache#3811)

* Define `Workload` abstraction for Temporal workflows of unbounded size through sub-workflow nesting

* Adjust Gobblin-Temporal configurability for consistency and abstraction

* Define `WorkerConfig`, to pass the `TemporalWorker`'s configuration to the workflows and activities it hosts

* Improve javadoc

* Javadoc fixup

* Minor changes

* Update per review suggestions

* Insert pause, to spread the load on the temporal server, before launch of each child workflow that may have direct leaves of its own

* Appease findbugs by having `SeqSliceBackedWorkSpan::next` throw `NoSuchElementException`

* Add comment
…ming super-workflow with a configurable number of activities nested beneath (apache#3815)

* Add gobblin-temporal load generator for a single subsuming super-workflow with a configurable number of activities nested beneath

* Update per findbugs advice

* Improve processing of int props
…emporal `WorkUnit` evaluation (apache#3816)

* Implement Distributed Data Movement (DDM) Gobblin-on-Temporal `WorkUnit` evaluation

* Adjust work unit processing tuning for start-to-close timeout and nested execution branching

* Rework `ProcessWorkUnitImpl` and fix `FileSystem` misuse; plus convenience abstractions to load `FileSystem`, `JobState`, and `StateStore<TaskState>`

* Fix `FileSystem` resource lifecycle, uniquely name each workflow, and drastically reduce worker concurrent task execution

* Heed findbugs advice

* prep before commit

* Improve processing of required props

* Update comment in response to PR feedback
…p MysqlDagActio… (apache#3812)

* Create MySQL util class for re-usable methods and setup MysqlDagActionStore retention

* Add a java doc

* Address review comments

* Close scheduled executors on shutdown & clarify naming and comments

* Remove extra period making config key invalid

* implement Closeable

* Use try with resources

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
…it (apache#3818)

* add option to detect malformed ORC during commit phase

* better logging

* address comment

* catch more generic exception

* validate ORC file after close

* move validate in between close and commit

* syntax

* whitespace

* update log
)

* Use same flowExecutionId across participants
* Set config field as well in new FlowSpec
* Use gobblin util to create config
* Rename function and move to util
---------
Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
* [GOBBLIN-1951] Emit GTE when deleting corrupted ORC files

This commit adds ORC file validation during the commit phase and deletes
corrupted files. It also includes a test for ORC file validation.

* Linter fixes
…3817)

* Add framework and unit tests for DagActionStoreChangeMonitor

* Add more test cases and validation

* Add header for new file

* Move FlowSpec static function to Utils class

* Remove unused import

* Fix compile error

* Fix unit tests

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
…e#3822)

* Make jobname shortening in GaaS more aggressive

* Change long name prefix to flowgroup
umustafi and others added 23 commits November 15, 2023 14:11
* Monitor number of failed persisting leases to tune linger

* Increase default linger and epsilon values

* Add metric for lease persisting success

* Rename metrics

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
… poll records during runtime (apache#3827)

* address comments

* use connectionmanager when httpclient is not cloesable

* add uite test

* fix typo

* [GOBBLIN-1956] Make Kafka streaming pipeline be able to config the max poll records during runtime

* small refractor

---------

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
…e#3828)

* WIP

* Optimization to limit batchsize based on large record sizes

* Address review
… source vs. destination-side DB and table (apache#3835)

* Allow `IcebergDatasetFinder` to use separate names for source vs. destination-side DB and table

* Adjust Mockito.verify to pass test
…urrentExecution` (apache#3836)

* Prevent NPE in `FlowCompilationValidationHelper.validateAndHandleConcurrentExecution`

* improved `MultiHopFlowCompiler` javadoc
* Delete launch action event after persisting

* Fix default value for flowExecutionId retrieval from metadata map

* Address review comments and add unit test

* Code clean up

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
…apache#3833)

* Emit audit count after commit in IcebergMetadataWriter

* Unit tests by extracting to a post commit

* Emit audit count first

* find bugs complaint
…aaS (apache#3838)

* Add external data node for generic ingress/egress on GaaS

* Address reviews and cleanup

* Use URI representation for external dataset descriptor node

* Fix error message in containing check

* Address review
…or` name for the `IcebergTable`s it creates (apache#3842)

* Allow `IcebergCatalog` to specify the `DatasetDescriptor` name for the `IcebergTable`s it creates

* small method javadoc
…pache#3841)

* Consolidate processing dag actions to one code path

* Delete dag action in failure cases too

* Distinguish metrics for startup

* Refactor to avoid duplicated code and create static metrics proxy class

* Remove DagManager checks that don't apply on startup

* Add test to check kill/resume dag action removal after processing

* Remove unused import

* Initialize metrics proxy with Null Pattern

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
…sh WUs before they've actually run (apache#3844)

* Fix `CopyDataPublisher` to avoid committing post-publish WUs before they've actually run

* fixup findbugsMain
…GaaS (apache#3847)

* Revert "[GOBBLIN-1952] Make jobname shortening in GaaS more aggressive (apache#3822)"

This reverts commit 5619a0a.

* use configuration to keep specified jobname if enabled

* Cleanup
…eduler state (apache#3846)

* Ensure Adhoc Flows can be Executed in Multi-active Scheduler state

* Only delete spec for adhoc flows & always after orchestration

* Delete adhoc flows when dagManager is not present as well

* Fix flaky test for scheduler

* Add clarifying comment about failure recovery

* Re-ordered private method

* Move private methods again

* Enforce sequential ordering of unit tests to make more reliable

---------

Co-authored-by: Urmi Mustafi <umustafi@linkedin.com>
… source and dest files even when source is older (apache#3845)

* change should copy logic

* Add tests, address review

* Fix checkstyle

* Remove unused imports
…riptor` platform name for the `IcebergTable`s it creates (apache#3848)

* Allow an `IcebergCatalog` to override the `DatasetDescriptor` platform for the `IcebergTable`s it creates

* fixup javadoc
Add commit step to Gobblin temporal workflow for job publish
@codecov-commenter
Copy link

codecov-commenter commented Dec 20, 2023

Codecov Report

Attention: 399 lines in your changes are missing coverage. Please review.

Comparison is base (028b85f) 47.35% compared to head (e570361) 48.00%.

Files Patch % Lines
...ervice/monitoring/DagActionStoreChangeMonitor.java 0.00% 78 Missing ⚠️
...a/org/apache/gobblin/source/workunit/WorkUnit.java 0.00% 40 Missing ⚠️
.../management/copy/iceberg/IcebergDatasetFinder.java 0.00% 30 Missing ⚠️
...che/gobblin/data/management/copy/CopyableFile.java 0.00% 24 Missing ⚠️
...in/service/modules/orchestration/Orchestrator.java 41.17% 18 Missing and 2 partials ⚠️
...in/source/extractor/extract/kafka/KafkaSource.java 62.50% 16 Missing and 2 partials ⚠️
...blin/runtime/api/MysqlMultiActiveLeaseArbiter.java 80.48% 7 Missing and 9 partials ⚠️
...a/management/copy/publisher/CopyDataPublisher.java 72.54% 8 Missing and 6 partials ⚠️
...a/org/apache/gobblin/util/DBStatementExecutor.java 61.76% 11 Missing and 2 partials ⚠️
...che/gobblin/runtime/TaskStateCollectorService.java 67.56% 9 Missing and 3 partials ⚠️
... and 29 more

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff              @@
##             master      #25      +/-   ##
============================================
+ Coverage     47.35%   48.00%   +0.65%     
+ Complexity    10957     8885    -2072     
============================================
  Files          2152     1743     -409     
  Lines         85114    67472   -17642     
  Branches       9451     7303    -2148     
============================================
- Hits          40302    32387    -7915     
+ Misses        41163    32267    -8896     
+ Partials       3649     2818     -831     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Will-Lo Will-Lo merged commit edb3726 into master Dec 20, 2023
10 checks passed
@Will-Lo Will-Lo deleted the add-codeql-analysis branch December 20, 2023 23:29
@Will-Lo Will-Lo restored the add-codeql-analysis branch December 21, 2023 22:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants