-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(mcl-processor): Update mcl processor hooks #11134
feat(mcl-processor): Update mcl processor hooks #11134
Conversation
WalkthroughThe recent changes enhance the Kafka consumer functionalities within the metadata jobs, emphasizing metadata change logs and notifications. New classes and methods improve event handling and listener registration, alongside increased configurability through the addition of consumer group suffixes. This refactoring aims to streamline interactions and clarify the codebase. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant MCLKafkaListenerRegistrar
participant MCLKafkaListener
participant MetadataChangeLogHook
Client->>MCLKafkaListenerRegistrar: Register Kafka Listener
MCLKafkaListenerRegistrar->>MCLKafkaListener: Create Listener
MCLKafkaListener->>MetadataChangeLogHook: Invoke Hooks on Data Change
MetadataChangeLogHook-->>MCLKafkaListener: Processed Data
MCLKafkaListener-->>Client: Acknowledge Processing
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
…ifferent consumer groups
60e05a3
to
f1187c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Outside diff range, codebase verification and nitpick comments (9)
metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/MetadataChangeLogHook.java (1)
21-27
: Add Javadoc forgetConsumerGroupSuffix()
.The new method
getConsumerGroupSuffix()
lacks a detailed Javadoc description. Consider adding more context about the purpose and usage of the consumer group suffix./** * Provides a suffix for the consumer group. * + * This suffix can be used to differentiate consumer groups for parallel processing. * * @return suffix */
metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/MCLKafkaListenerRegistrar.java (1)
113-119
: Optimize Consumer Group Name Construction.The
buildConsumerGroupName
method constructs consumer group names. Consider usingString.format
for better readability when constructing strings with multiple components.return suffix.isEmpty() ? consumerGroupBase : String.format("%s-%s", consumerGroupBase, suffix);metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/form/FormAssignmentHook.java (1)
Line range hint
92-109
: Consider Adding Logging for Hook Invocation.In the
invoke
method, consider adding logging to track when hooks are invoked and with what events. This can aid in debugging and monitoring the hook's behavior.log.info("Invoking FormAssignmentHook for event: {}", event);metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/ingestion/IngestionSchedulerHook.java (1)
Line range hint
62-83
: Consider Adding Logging for Scheduler Invocation.In the
invoke
method, consider adding logging to track when the scheduler is invoked and with what events. This can aid in debugging and monitoring the scheduler's behavior.log.info("Invoking IngestionSchedulerHook for event: {}", event);metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/event/EntityChangeEventGeneratorHook.java (2)
87-87
: Add documentation forconsumerGroupSuffix
.Consider adding a brief comment explaining the purpose of the
consumerGroupSuffix
field to improve code readability and maintainability.@Getter private final String consumerGroupSuffix; // Suffix for consumer group configuration
95-96
: Configuration properties are not defined.The configuration properties
entityChangeEvents.enabled
andentityChangeEvents.consumerGroupSuffix
are not defined in any configuration files within the codebase. This could lead to runtime issues if these properties are expected to be set externally. Please ensure they are correctly defined in the appropriate configuration files.Analysis chain
Validate configuration properties.
Ensure that the configuration properties
entityChangeEvents.enabled
andentityChangeEvents.consumerGroupSuffix
are correctly defined in the configuration files.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that the configuration properties are defined. # Test: Search for the properties in configuration files. # Expect: Both properties should be present. rg 'entityChangeEvents\.enabled|entityChangeEvents\.consumerGroupSuffix' --glob "**/resources/**"Length of output: 99
Script:
#!/bin/bash # Description: Verify that the configuration properties are defined anywhere in the codebase. # Test: Search for the properties in all files. # Expect: Both properties should be present somewhere in the codebase. rg 'entityChangeEvents\.enabled|entityChangeEvents\.consumerGroupSuffix'Length of output: 482
metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/incident/IncidentsSummaryHook.java (2)
62-67
: Add comments for renamed fields.Adding comments for the renamed fields
incidentService
,isEnabled
, andmaxIncidentHistory
can improve code readability and help future maintainers understand their purpose.private final IncidentService incidentService; // Service for handling incidents private final boolean isEnabled; // Flag to enable or disable the hook private final int maxIncidentHistory; // Maximum number of incidents to keep in history
74-74
: Configuration propertyincidents.hook.consumerGroupSuffix
is not defined.The configuration property
incidents.hook.consumerGroupSuffix
is used inIncidentsSummaryHook.java
but is not defined in any configuration files within the repository. Please ensure it is added to the appropriate configuration file to avoid potential runtime issues.Analysis chain
Ensure configuration property is defined.
Verify that the configuration property
incidents.hook.consumerGroupSuffix
is correctly defined in the configuration files.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that the configuration property is defined. # Test: Search for the property in configuration files. # Expect: Property should be present. rg 'incidents\.hook\.consumerGroupSuffix' --glob "**/resources/**"Length of output: 68
Script:
#!/bin/bash # Description: Search for the configuration property across the entire repository. # Test: Search for the property in all files. # Expect: Property should be present somewhere in the repository. rg 'incidents\.hook\.consumerGroupSuffix'Length of output: 245
metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/siblings/SiblingAssociationHook.java (1)
74-76
: Add comments for renamed fields.Adding comments for the renamed field
isEnabled
and the new fieldconsumerGroupSuffix
can improve code readability and help future maintainers understand their purpose.private final boolean isEnabled; // Flag to enable or disable the hook @Getter private final String consumerGroupSuffix; // Suffix for consumer group configuration
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (17)
- metadata-jobs/mae-consumer-job/src/main/java/com/linkedin/metadata/kafka/MaeConsumerApplication.java (2 hunks)
- metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/MCLKafkaListener.java (1 hunks)
- metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/MCLKafkaListenerRegistrar.java (1 hunks)
- metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/MetadataChangeLogHook.java (1 hunks)
- metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/UpdateIndicesHook.java (3 hunks)
- metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/event/EntityChangeEventGeneratorHook.java (5 hunks)
- metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/form/FormAssignmentHook.java (5 hunks)
- metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/incident/IncidentsSummaryHook.java (9 hunks)
- metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/ingestion/IngestionSchedulerHook.java (4 hunks)
- metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/siblings/SiblingAssociationHook.java (3 hunks)
- metadata-jobs/mae-consumer/src/test/java/com/linkedin/metadata/kafka/hook/spring/MCLGMSSpringTest.java (2 hunks)
- metadata-jobs/mae-consumer/src/test/java/com/linkedin/metadata/kafka/hook/spring/MCLMAESpringTest.java (2 hunks)
- metadata-jobs/mae-consumer/src/test/java/com/linkedin/metadata/kafka/hook/spring/MCLSpringCommonTestConfiguration.java (1 hunks)
- metadata-jobs/pe-consumer/src/main/java/com/datahub/event/PlatformEventProcessor.java (3 hunks)
- metadata-service/configuration/src/main/resources/application.yaml (3 hunks)
- metadata-service/factories/src/main/java/com/linkedin/gms/factory/kafka/KafkaEventConsumerFactory.java (1 hunks)
- metadata-service/war/src/main/java/com/linkedin/gms/CommonApplicationConfig.java (1 hunks)
Files skipped from review due to trivial changes (1)
- metadata-service/factories/src/main/java/com/linkedin/gms/factory/kafka/KafkaEventConsumerFactory.java
Additional comments not posted (36)
metadata-service/war/src/main/java/com/linkedin/gms/CommonApplicationConfig.java (1)
41-43
: Verify the impact of new packages in@ComponentScan
.The addition of
"com.linkedin.gms.factory.change"
,"com.datahub.event.hook"
, and"com.linkedin.gms.factory.notifications"
to the@ComponentScan
annotation expands the component scanning range. Ensure that these packages do not introduce any unintended beans or configurations that could affect the application context.Verification successful
Impact of New Packages in
@ComponentScan
VerifiedThe
BusinessAttributeUpdateHook
class in thecom.datahub.event.hook
package is annotated with@Component
, indicating it is a Spring bean. This confirms that the addition of this package to@ComponentScan
introduces new beans into the application context. Ensure that the behavior of this component is as expected and does not introduce any unintended side effects.
- Class Identified:
BusinessAttributeUpdateHook
incom.datahub.event.hook
- Annotation:
@Component
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the presence and impact of new packages in the component scan. # Test: List all classes in the newly added packages to check for unintended beans. fd --type f --extension java --exec rg --files-with-matches 'class ' {} \;Length of output: 212798
Script:
#!/bin/bash # Search for class definitions in the newly added packages to check for beans or components. # Packages to check: com.linkedin.gms.factory.change, com.datahub.event.hook, com.linkedin.gms.factory.notifications # List all Java files in the specified packages fd --type f --extension java | grep -E 'com/linkedin/gms/factory/change|com/datahub/event/hook|com/linkedin/gms/factory/notifications' | xargs -I {} rg 'class ' {}Length of output: 326
Script:
#!/bin/bash # Search for class definitions and Spring annotations in the newly added packages to check for beans or components. # Packages to check: com.linkedin.gms.factory.change, com.datahub.event.hook, com.linkedin.gms.factory.notifications # Find Java files in the specified packages and search for class definitions with Spring annotations fd --type f --extension java | grep -E 'com/linkedin/gms/factory/change|com/datahub/event/hook|com/linkedin/gms/factory/notifications' | xargs -I {} rg -A 5 '@Component|@Service|@Configuration' {}Length of output: 445
metadata-jobs/mae-consumer-job/src/main/java/com/linkedin/metadata/kafka/MaeConsumerApplication.java (1)
36-38
: Assess the effect of new packages in@ComponentScan
.The inclusion of
"com.linkedin.gms.factory.change"
,"com.datahub.event.hook"
, and"com.linkedin.gms.factory.notifications"
in the@ComponentScan
annotation suggests a shift towards enhanced event handling. Verify that these changes align with the application's objectives and do not introduce conflicts or redundant beans.metadata-jobs/mae-consumer/src/test/java/com/linkedin/metadata/kafka/hook/spring/MCLGMSSpringTest.java (2)
38-39
: Ensure correct bean retrieval forMCLKafkaListenerRegistrar
.The retrieval of the
MCLKafkaListenerRegistrar
bean from the application context is crucial for the test. Verify that the application context is correctly configured to provide this bean.Verification successful
Bean Configuration Likely Correct via Auto-Configuration
The use of
@EnableAutoConfiguration
in the test files suggests that Spring Boot's auto-configuration is likely handling theMCLKafkaListenerRegistrar
bean setup. This mechanism typically configures beans based on the classpath and other settings, indicating that the application context should be correctly providing this bean.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the configuration of `MCLKafkaListenerRegistrar` in the application context. # Test: Search for the bean definition in the codebase. Expect: Correct configuration of the registrar bean. rg --type java 'MCLKafkaListenerRegistrar'Length of output: 1175
Script:
#!/bin/bash # Description: Search for Spring configuration classes or XML files that define the MCLKafkaListenerRegistrar bean. # Test: Look for @Bean annotation or XML bean definitions for MCLKafkaListenerRegistrar. rg '@Bean.*MCLKafkaListenerRegistrar|<bean.*MCLKafkaListenerRegistrar' --type java --type xmlLength of output: 95
Script:
#!/bin/bash # Description: Search for component scanning configurations that might include MCLKafkaListenerRegistrar as a bean. # Test: Look for @ComponentScan or similar annotations that might automatically configure MCLKafkaListenerRegistrar. rg '@ComponentScan|@SpringBootApplication|@EnableAutoConfiguration' --type javaLength of output: 3894
41-54
: Verify hook assertions forMCLKafkaListenerRegistrar
.The assertions have been updated to reflect the new source of hooks. Ensure that these assertions are accurate and align with the expected behavior of the
MCLKafkaListenerRegistrar
.Verification successful
Hook Assertions Verified for
MCLKafkaListenerRegistrar
.The
getMetadataChangeLogHooks
method inMCLKafkaListenerRegistrar
correctly filters and sorts hooks, aligning with the assertions in the test file. The assertions are valid and reflect the expected behavior of the registrar.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that the `MCLKafkaListenerRegistrar` provides the expected hooks. # Test: Search for the implementation details of `getMetadataChangeLogHooks`. Expect: Correct hook instances. ast-grep --lang java --pattern $'class MCLKafkaListenerRegistrar { $$$ List<MetadataChangeLogHook> getMetadataChangeLogHooks() { $$$ } $$$ }'Length of output: 156
Script:
#!/bin/bash # Check if the MCLKafkaListenerRegistrar class exists in the codebase ast-grep --lang java --pattern 'class MCLKafkaListenerRegistrar { $$$ }' # If the class exists, check for the getMetadataChangeLogHooks method ast-grep --lang java --pattern 'List<MetadataChangeLogHook> getMetadataChangeLogHooks() { $$$ }'Length of output: 173
Script:
#!/bin/bash # Search for the MCLKafkaListenerRegistrar class in the entire codebase rg 'class MCLKafkaListenerRegistrar' -A 10 --glob '*.java' # If the class is found, list all methods within it ast-grep --lang java --pattern 'class MCLKafkaListenerRegistrar { $$$ }' --json | jq '.captures[] | .text' | xargs -I {} ast-grep --lang java --pattern '$_() { $$$ }' {}Length of output: 1783
Script:
#!/bin/bash # Search for all method definitions in the MCLKafkaListenerRegistrar class file rg '^[ \t]*public .* MCLKafkaListenerRegistrar' -A 50 --glob 'metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/MCLKafkaListenerRegistrar.java'Length of output: 7190
metadata-jobs/mae-consumer/src/test/java/com/linkedin/metadata/kafka/hook/spring/MCLMAESpringTest.java (2)
36-37
: Ensure correct bean retrieval forMCLKafkaListenerRegistrar
.The retrieval of the
MCLKafkaListenerRegistrar
bean from the application context is crucial for the test. Verify that the application context is correctly configured to provide this bean.
39-52
: Verify hook assertions forMCLKafkaListenerRegistrar
.The assertions have been updated to reflect the new source of hooks. Ensure that these assertions are accurate and align with the expected behavior of the
MCLKafkaListenerRegistrar
.metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/UpdateIndicesHook.java (3)
55-60
: Ensure testability with@VisibleForTesting
constructor.The constructor annotated with
@VisibleForTesting
facilitates testing by omitting theconsumerGroupSuffix
. Ensure that this constructor is used appropriately in test cases.Verification successful
Testability confirmed for
@VisibleForTesting
constructor.The
@VisibleForTesting
constructor ofUpdateIndicesHook
is appropriately used in theUpdateIndicesHookTest.java
file, ensuring its intended purpose for testing. No further action is needed.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of the `@VisibleForTesting` constructor in test cases. # Test: Search for the usage of the `@VisibleForTesting` constructor. Expect: Correct usage in test files. rg --type java 'new UpdateIndicesHook\('Length of output: 588
40-40
: Ensure proper usage of@Getter
forconsumerGroupSuffix
.The
@Getter
annotation from Lombok is used to generate a getter forconsumerGroupSuffix
. Ensure that this field is accessed correctly throughout the codebase.Verification successful
Proper usage of
@Getter
forconsumerGroupSuffix
verified.The
@Getter
annotation forconsumerGroupSuffix
inUpdateIndicesHook
is correctly used. This field is accessed via thegetConsumerGroupSuffix
method, as required by theMetadataChangeLogHook
interface, and is used appropriately in the codebase.
UpdateIndicesHook
implementsMetadataChangeLogHook
, ensuring the method is correctly defined and used.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `consumerGroupSuffix` getter in the codebase. # Test: Search for the usage of `getConsumerGroupSuffix`. Expect: Correct usage of the getter method. rg --type java 'getConsumerGroupSuffix'Length of output: 367
Script:
#!/bin/bash # Check the context of `getConsumerGroupSuffix` usage in MCLKafkaListenerRegistrar.java rg --type java -A 5 -B 5 'getConsumerGroupSuffix' metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/MCLKafkaListenerRegistrar.java # Check the context of `getConsumerGroupSuffix` usage in MetadataChangeLogHook.java rg --type java -A 5 -B 5 'getConsumerGroupSuffix' metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/MetadataChangeLogHook.javaLength of output: 955
Script:
#!/bin/bash # Check if UpdateIndicesHook implements or extends MetadataChangeLogHook or any related interface/abstract class rg --type java 'class UpdateIndicesHook' -A 5 metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/UpdateIndicesHook.javaLength of output: 401
42-53
: Verify constructor parameter integration forconsumerGroupSuffix
.The constructor now includes a
consumerGroupSuffix
parameter. Ensure that this parameter is correctly passed and utilized within the class.metadata-jobs/mae-consumer/src/test/java/com/linkedin/metadata/kafka/hook/spring/MCLSpringCommonTestConfiguration.java (1)
37-43
: Review the impact of modified component scanning.The changes to the
@ComponentScan
annotation have expanded the scope of component scanning to include additional packages. Ensure that the newly included packages contain the necessary Spring-managed beans and that there are no conflicts or unintended dependencies introduced.Verification successful
No impact from modified component scanning due to absence of files.
The specified directories for the expanded component scanning do not contain any Java files. Therefore, the modification to the
@ComponentScan
annotation does not introduce any new Spring-managed beans or dependencies.
- Directories checked:
metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/dao/producer
metadata-jobs/mae-consumer/src/main/java/com/linkedin/gms/factory/change
metadata-jobs/mae-consumer/src/main/java/com/datahub/event/hook
metadata-jobs/mae-consumer/src/main/java/com/linkedin/gms/factory/notifications
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the presence of Spring-managed beans in the newly included packages. # Test: Search for @Component, @Service, @Repository, and @Configuration annotations in the new packages. # Expect: Relevant Spring-managed beans should be present. rg --type java '@Component|@Service|@Repository|@Configuration' -g 'metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/dao/producer/**' -g 'metadata-jobs/mae-consumer/src/main/java/com/linkedin/gms/factory/change/**' -g 'metadata-jobs/mae-consumer/src/main/java/com/datahub/event/hook/**' -g 'metadata-jobs/mae-consumer/src/main/java/com/linkedin/gms/factory/notifications/**'Length of output: 539
Script:
#!/bin/bash # List all Java files in the specified directories to verify their existence. fd --type f --extension java metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/dao/producer fd --type f --extension java metadata-jobs/mae-consumer/src/main/java/com/linkedin/gms/factory/change fd --type f --extension java metadata-jobs/mae-consumer/src/main/java/com/datahub/event/hook fd --type f --extension java metadata-jobs/mae-consumer/src/main/java/com/linkedin/gms/factory/notificationsLength of output: 416
metadata-jobs/pe-consumer/src/main/java/com/datahub/event/PlatformEventProcessor.java (1)
48-52
: Logging enhancement for platform hooks initialization.The added logging statement provides improved visibility into which platform hooks are enabled during the initialization of
PlatformEventProcessor
. This can aid in debugging and operational monitoring.metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/MCLKafkaListener.java (4)
36-39
: Logging initialization of MCL hooks.The logging statement in the constructor provides insight into the enabled MCL hooks for a given consumer group, which is useful for debugging and operational transparency.
42-54
: Comprehensive logging for message consumption.The detailed logging in the
consume
method provides valuable information about the Kafka message being processed, including consumer group, key, topic, partition, offset, and timestamp. This is beneficial for debugging and monitoring message flow.
57-67
: Error handling during message deserialization.The error handling mechanism for message deserialization is appropriate, as it logs the error and increments a failure counter. However, consider whether additional actions, such as alerting or retry mechanisms, are needed based on the criticality of the messages.
77-94
: Error handling for hook invocation.The error handling for hook invocation is designed to skip failed hooks and continue processing. This approach ensures that a single failure does not block the processing of other hooks, but it may lead to data inconsistency if not handled properly. Consider whether additional logging or alerting is required for failed hooks.
Verification successful
Consider Additional Alerting for Hook Invocation Failures
The current implementation logs errors for failed hook invocations using
log.error
. However, if these failures are critical, you might want to implement additional alerting mechanisms to ensure timely awareness and response to such issues.
- Location to Review:
metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/MCLKafkaListener.java
(Lines 77-94)Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify if there are additional logging or alerting mechanisms for hook invocation failures. # Test: Search for logging or alerting logic related to hook invocation failures. # Expect: Mechanisms should be in place if critical. rg --type java 'log.error|alert' -A 5Length of output: 396810
metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/MCLKafkaListenerRegistrar.java (2)
1-32
: Ensure Proper Configuration and Initialization.The class
MCLKafkaListenerRegistrar
is responsible for setting up Kafka listeners for Metadata Change Log hooks. Ensure that all configurations, such asconsumerGroupBase
,mclVersionedTopicName
, andmclTimeseriesTopicName
, are correctly set in the application properties to avoid runtime issues.
92-111
: Validate Kafka Listener Endpoint Configuration.The
createListenerEndpoint
method sets up the Kafka listener endpoint. Ensure that the method referenceMCLKafkaListener.class.getMethod("consume", ConsumerRecord.class)
is correct and that theMCLKafkaListener
class implements theconsume
method with the expected signature.Verification successful
Kafka Listener Endpoint Configuration Verified
The
consume
method in theMCLKafkaListener
class exists with the correct signature. The method reference increateListenerEndpoint
is valid and correctly configured.
- Method:
void consume(ConsumerRecord<String, GenericRecord> consumerRecord)
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the `consume` method exists in the `MCLKafkaListener` class with the correct signature. # Search for the method definition in the MCLKafkaListener class. ast-grep --lang java --pattern $'class MCLKafkaListener { $$$ void consume(ConsumerRecord<String, GenericRecord> record) { $$$ } $$$ }'Length of output: 148
Script:
#!/bin/bash # Find the file containing the MCLKafkaListener class and list its methods to verify the consume method. # Locate the file containing the MCLKafkaListener class. fd 'MCLKafkaListener.java' --exec cat {} + # List all methods in the MCLKafkaListener class to verify the consume method. ast-grep --lang java --pattern $'class MCLKafkaListener { $$$ $_() { $$$ } $$$ }'Length of output: 4334
metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/form/FormAssignmentHook.java (4)
58-62
: Field Renaming for Consistency.The fields
formService
andisEnabled
have been renamed for consistency with Java naming conventions. This improves code readability and maintainability.
74-76
: Testing Constructor Provides Flexibility.The addition of a testing constructor that defaults
consumerGroupSuffix
to an empty string is a good practice. It facilitates easier unit testing by reducing the dependencies required for instantiation.
87-87
: SimplifiedisEnabled
Method.The
isEnabled
method now directly returns theisEnabled
field, which simplifies the logic and improves readability.
67-72
: Verify Configuration for Consumer Group Suffix.The constructor now requires a
consumerGroupSuffix
parameter. Ensure that the configuration property${forms.hook.consumerGroupSuffix}
is correctly set in the application properties to avoid runtime issues.metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/ingestion/IngestionSchedulerHook.java (4)
33-36
: Field Renaming for Consistency.The fields
scheduler
andisEnabled
have been renamed for consistency with Java naming conventions. This improves code readability and maintainability.
48-51
: Testing Constructor Provides Flexibility.The addition of a testing constructor that defaults
consumerGroupSuffix
to an empty string is a good practice. It facilitates easier unit testing by reducing the dependencies required for instantiation.
56-56
: SimplifiedisEnabled
Method.The
isEnabled
method now directly returns theisEnabled
field, which simplifies the logic and improves readability.
41-46
: Verify Configuration for Consumer Group Suffix.The constructor now requires a
consumerGroupSuffix
parameter. Ensure that the configuration property${ingestionScheduler.consumerGroupSuffix}
is correctly set in the application properties to avoid runtime issues.metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/event/EntityChangeEventGeneratorHook.java (2)
116-116
: Confirm logical consistency ofisEnabled()
.The
isEnabled()
method now returns theisEnabled
variable. Ensure that this change maintains the intended logic throughout the codebase.Verification successful
Logical Consistency Verified for
isEnabled()
MethodThe
isEnabled()
method inEntityChangeEventGeneratorHook.java
correctly returns theisEnabled
variable, which is initialized based on an external configuration with a default value oftrue
. This ensures logical consistency with its intended purpose. No further issues were found.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify logical consistency of `isEnabled()` usage. # Test: Search for all usages of `isEnabled()` method. # Expect: Consistent logic with previous implementation. rg --type java 'isEnabled\(\)'Length of output: 5537
Script:
#!/bin/bash # Find the definition and usage of the `isEnabled` variable within the EntityChangeEventGeneratorHook.java file. fd 'EntityChangeEventGeneratorHook.java' -x rg 'isEnabled' {}Length of output: 395
3-3
: Ensure proper usage of@VisibleForTesting
.The
@VisibleForTesting
annotation is used to indicate that the constructor is primarily for testing purposes. Ensure that this constructor is only used in test scenarios to maintain code clarity and prevent misuse.metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/incident/IncidentsSummaryHook.java (1)
97-97
: Confirm logical consistency ofisEnabled()
.The
isEnabled()
method now returns theisEnabled
variable. Ensure that this change maintains the intended logic throughout the codebase.Verification successful
Logical consistency of
isEnabled()
is confirmed. TheisEnabled()
method inIncidentsSummaryHook
correctly returns theisEnabled
field, which is initialized via configuration and used to control the execution of theinvoke
method. This maintains the intended logic.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify logical consistency of `isEnabled()` usage. # Test: Search for all usages of `isEnabled()` method. # Expect: Consistent logic with previous implementation. rg --type java 'isEnabled\(\)'Length of output: 5537
Script:
#!/bin/bash # Search for the definition and usage of `isEnabled` within the `IncidentsSummaryHook` class. ast-grep --lang java --pattern 'class IncidentsSummaryHook { $$$ boolean isEnabled; $$$ }' # Search for the `isEnabled()` method implementation in `IncidentsSummaryHook`. ast-grep --lang java --pattern 'class IncidentsSummaryHook { $$$ public boolean isEnabled() { return $_; } $$$ }'Length of output: 226
Script:
#!/bin/bash # Search for the class definition of `IncidentsSummaryHook` to understand its structure. rg --type java 'class IncidentsSummaryHook' -A 20 # Search for the `isEnabled` field and method within `IncidentsSummaryHook` more broadly. rg --type java 'isEnabled' metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/incident/IncidentsSummaryHook.java -A 5 -B 5Length of output: 8808
metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/siblings/SiblingAssociationHook.java (2)
82-83
: Ensure configuration property is defined.Verify that the configuration property
siblings.consumerGroupSuffix
is correctly defined in the configuration files.
114-114
: Confirm logical consistency ofisEnabled()
.The
isEnabled()
method now returns theisEnabled
variable. Ensure that this change maintains the intended logic throughout the codebase.Verification successful
Logical consistency of
isEnabled()
is maintained inSiblingAssociationHook
. TheisEnabled()
method returns theisEnabled
variable, and its usage in theinvoke
method is consistent with its intended purpose of conditionally executing logic based on the enabled state. No issues found.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify logical consistency of `isEnabled()` usage. # Test: Search for all usages of `isEnabled()` method. # Expect: Consistent logic with previous implementation. rg --type java 'isEnabled\(\)'Length of output: 5537
Script:
#!/bin/bash # Extract the implementation of the `isEnabled()` method in SiblingAssociationHook ast-grep --lang java --pattern $'class SiblingAssociationHook { $$$ public boolean isEnabled() { $$$ } $$$ }'Length of output: 122
Script:
#!/bin/bash # Locate and extract the `isEnabled()` method implementation in SiblingAssociationHook rg -A 10 'public boolean isEnabled\(\)' metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/hook/siblings/SiblingAssociationHook.javaLength of output: 442
metadata-service/configuration/src/main/resources/application.yaml (6)
299-299
: LGTM!The addition of
consumerGroupSuffix
to thesiblings
section enhances configurability for Kafka consumer groups.
302-302
: LGTM!The addition of
consumerGroupSuffix
to theupdateIndices
section enhances configurability for Kafka consumer groups.
305-305
: LGTM!The addition of
consumerGroupSuffix
to theingestionScheduler
section enhances configurability for Kafka consumer groups.
306-310
: LGTM!The new properties in the
incidents.hook
section provide enhanced configurability for incident management and Kafka consumer groups.
387-387
: LGTM!The addition of
consumerGroupSuffix
to theentityChangeEvents
section enhances configurability for Kafka consumer groups.
472-472
: LGTM!The addition of
consumerGroupSuffix
to theforms.hook
section enhances configurability for Kafka consumer groups.
public void afterPropertiesSet() { | ||
Map<String, List<MetadataChangeLogHook>> hookGroups = | ||
getMetadataChangeLogHooks().stream() | ||
.collect(Collectors.groupingBy(MetadataChangeLogHook::getConsumerGroupSuffix)); | ||
|
||
log.info( | ||
"MetadataChangeLogProcessor Consumer Groups: {}", | ||
hookGroups.keySet().stream().map(this::buildConsumerGroupName).collect(Collectors.toSet())); | ||
|
||
hookGroups.forEach( | ||
(key, hooks) -> { | ||
KafkaListenerEndpoint kafkaListenerEndpoint = | ||
createListenerEndpoint( | ||
buildConsumerGroupName(key), | ||
List.of(mclVersionedTopicName, mclTimeseriesTopicName), | ||
hooks); | ||
registerMCLKafkaListener(kafkaListenerEndpoint, true); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider Error Handling for Hook Registration.
The afterPropertiesSet
method registers Kafka listeners for each consumer group. Consider adding error handling to manage potential issues during listener registration, such as invalid configurations or connectivity problems.
try {
registerMCLKafkaListener(kafkaListenerEndpoint, true);
} catch (Exception e) {
log.error("Failed to register Kafka listener for consumer group: {}", key, e);
}
"Failed to execute MCL hook with name {}", hook.getClass().getCanonicalName(), e); | ||
} | ||
} | ||
// TODO: Manually commit kafka offsets after full processing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider implementing manual offset commits.
The TODO comment suggests implementing manual offset commits after full processing. This can ensure that messages are only marked as processed after successful execution of all hooks, reducing the risk of message loss.
Would you like assistance in implementing manual offset commits?
@VisibleForTesting | ||
public EntityChangeEventGeneratorHook( | ||
@Nonnull OperationContext systemOperationContext, | ||
@Nonnull final EntityChangeEventGeneratorRegistry entityChangeEventGeneratorRegistry, | ||
@Nonnull final SystemEntityClient entityClient, | ||
@Nonnull Boolean isEnabled) { | ||
this(systemOperationContext, entityChangeEventGeneratorRegistry, entityClient, isEnabled, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using builder pattern for constructor.
With multiple parameters, especially with optional ones like consumerGroupSuffix
, consider using a builder pattern to enhance readability and flexibility.
public static class Builder {
private OperationContext systemOperationContext;
private EntityChangeEventGeneratorRegistry entityChangeEventGeneratorRegistry;
private SystemEntityClient entityClient;
private Boolean isEnabled;
private String consumerGroupSuffix = "";
public Builder systemOperationContext(OperationContext systemOperationContext) {
this.systemOperationContext = systemOperationContext;
return this;
}
public Builder entityChangeEventGeneratorRegistry(EntityChangeEventGeneratorRegistry registry) {
this.entityChangeEventGeneratorRegistry = registry;
return this;
}
public Builder entityClient(SystemEntityClient client) {
this.entityClient = client;
return this;
}
public Builder isEnabled(Boolean isEnabled) {
this.isEnabled = isEnabled;
return this;
}
public Builder consumerGroupSuffix(String suffix) {
this.consumerGroupSuffix = suffix;
return this;
}
public EntityChangeEventGeneratorHook build() {
return new EntityChangeEventGeneratorHook(
systemOperationContext, entityChangeEventGeneratorRegistry, entityClient, isEnabled, consumerGroupSuffix);
}
}
@VisibleForTesting | ||
public IncidentsSummaryHook( | ||
@Nonnull final IncidentService incidentService, | ||
@Nonnull Boolean isEnabled, | ||
@Nonnull Integer maxIncidentHistory) { | ||
this(incidentService, isEnabled, maxIncidentHistory, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using builder pattern for constructor.
With multiple parameters, especially with optional ones like consumerGroupSuffix
, consider using a builder pattern to enhance readability and flexibility.
public static class Builder {
private IncidentService incidentService;
private Boolean isEnabled;
private Integer maxIncidentHistory;
private String consumerGroupSuffix = "";
public Builder incidentService(IncidentService incidentService) {
this.incidentService = incidentService;
return this;
}
public Builder isEnabled(Boolean isEnabled) {
this.isEnabled = isEnabled;
return this;
}
public Builder maxIncidentHistory(Integer maxIncidentHistory) {
this.maxIncidentHistory = maxIncidentHistory;
return this;
}
public Builder consumerGroupSuffix(String suffix) {
this.consumerGroupSuffix = suffix;
return this;
}
public IncidentsSummaryHook build() {
return new IncidentsSummaryHook(incidentService, isEnabled, maxIncidentHistory, consumerGroupSuffix);
}
}
@VisibleForTesting | ||
public SiblingAssociationHook( | ||
@Nonnull final SystemEntityClient systemEntityClient, | ||
@Nonnull final EntitySearchService searchService, | ||
@Nonnull Boolean isEnabled) { | ||
this(systemEntityClient, searchService, isEnabled, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using builder pattern for constructor.
With multiple parameters, especially with optional ones like consumerGroupSuffix
, consider using a builder pattern to enhance readability and flexibility.
public static class Builder {
private SystemEntityClient systemEntityClient;
private EntitySearchService searchService;
private Boolean isEnabled;
private String consumerGroupSuffix = "";
public Builder systemEntityClient(SystemEntityClient systemEntityClient) {
this.systemEntityClient = systemEntityClient;
return this;
}
public Builder searchService(EntitySearchService searchService) {
this.searchService = searchService;
return this;
}
public Builder isEnabled(Boolean isEnabled) {
this.isEnabled = isEnabled;
return this;
}
public Builder consumerGroupSuffix(String suffix) {
this.consumerGroupSuffix = suffix;
return this;
}
public SiblingAssociationHook build() {
return new SiblingAssociationHook(systemEntityClient, searchService, isEnabled, consumerGroupSuffix);
}
}
I do think a docs update for this would be helpful, but lgtm 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Outside diff range, codebase verification and nitpick comments (3)
docs/how/kafka-config.md (3)
119-122
: Clarify the default behavior of MCL hooks.The section explains the default behavior but could benefit from a clearer introduction to the new feature of separating hooks into different consumer groups.
Consider rephrasing to emphasize the new capability right at the beginning.
131-138
: Ensure consistency in descriptions.The descriptions in the table vary in detail. Ensure each entry provides sufficient context for users to understand its purpose.
Consider adding more context to some descriptions, such as explaining what "primary processing hook" means in this context.
Line range hint
140-151
: Clarify Docker and Helm configuration instructions.The instructions for applying configurations using Docker and Helm are clear but could benefit from additional examples or links to further documentation.
Consider adding links to detailed guides on setting environment variables in Docker and Helm.
Tools
LanguageTool
[uncategorized] ~117-~117: Loose punctuation mark.
Context: ...ae-consumer -KAFKA_CONSUMER_GROUP_ID
: The name of the kafka consumer's group ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~124-~124: Possible missing comma found.
Context: ...s could alsp be separated into separate groups which allows for controlling paralleliz...(AI_HYDRA_LEO_MISSING_COMMA)
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- docs/how/kafka-config.md (1 hunks)
Additional context used
LanguageTool
docs/how/kafka-config.md
[uncategorized] ~124-~124: Possible missing comma found.
Context: ...s could alsp be separated into separate groups which allows for controlling paralleliz...(AI_HYDRA_LEO_MISSING_COMMA)
The various MCL Hooks could alsp be separated into separate groups which allows for controlling parallelization and | ||
prioritization of the hooks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct the typo and improve clarity.
There's a typo in "alsp" which should be "also." Additionally, a comma is needed for clarity.
- The various MCL Hooks could alsp be separated into separate groups which allows for controlling parallelization and
+ The various MCL Hooks could also be separated into separate groups, which allows for controlling parallelization and
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
The various MCL Hooks could alsp be separated into separate groups which allows for controlling parallelization and | |
prioritization of the hooks. | |
The various MCL Hooks could also be separated into separate groups, which allows for controlling parallelization and | |
prioritization of the hooks. |
Tools
LanguageTool
[uncategorized] ~124-~124: Possible missing comma found.
Context: ...s could alsp be separated into separate groups which allows for controlling paralleliz...(AI_HYDRA_LEO_MISSING_COMMA)
* feat(forms) Handle deleting forms references when hard deleting forms (datahub-project#10820) * refactor(ui): Misc improvements to the setup ingestion flow (ingest uplift 1/2) (datahub-project#10764) Co-authored-by: John Joyce <john@Johns-MBP.lan> Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal> * fix(ingestion/airflow-plugin): pipeline tasks discoverable in search (datahub-project#10819) * feat(ingest/transformer): tags to terms transformer (datahub-project#10758) Co-authored-by: Aseem Bansal <asmbansal2@gmail.com> * fix(ingestion/unity-catalog): fixed issue with profiling with GE turned on (datahub-project#10752) Co-authored-by: Aseem Bansal <asmbansal2@gmail.com> * feat(forms) Add java SDK for form entity PATCH + CRUD examples (datahub-project#10822) * feat(SDK) Add java SDK for structuredProperty entity PATCH + CRUD examples (datahub-project#10823) * feat(SDK) Add StructuredPropertyPatchBuilder in python sdk and provide sample CRUD files (datahub-project#10824) * feat(forms) Add CRUD endpoints to GraphQL for Form entities (datahub-project#10825) * add flag for includeSoftDeleted in scroll entities API (datahub-project#10831) * feat(deprecation) Return actor entity with deprecation aspect (datahub-project#10832) * feat(structuredProperties) Add CRUD graphql APIs for structured property entities (datahub-project#10826) * add scroll parameters to openapi v3 spec (datahub-project#10833) * fix(ingest): correct profile_day_of_week implementation (datahub-project#10818) * feat(ingest/glue): allow ingestion of empty databases from Glue (datahub-project#10666) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * feat(cli): add more details to get cli (datahub-project#10815) * fix(ingestion/glue): ensure date formatting works on all platforms for aws glue (datahub-project#10836) * fix(ingestion): fix datajob patcher (datahub-project#10827) * fix(smoke-test): add suffix in temp file creation (datahub-project#10841) * feat(ingest/glue): add helper method to permit user or group ownership (datahub-project#10784) * feat(): Show data platform instances in policy modal if they are set on the policy (datahub-project#10645) Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com> * docs(patch): add patch documentation for how implementation works (datahub-project#10010) Co-authored-by: John Joyce <john@acryl.io> * fix(jar): add missing custom-plugin-jar task (datahub-project#10847) * fix(): also check exceptions/stack trace when filtering log messages (datahub-project#10391) Co-authored-by: John Joyce <john@acryl.io> * docs(): Update posts.md (datahub-project#9893) Co-authored-by: Hyejin Yoon <0327jane@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * chore(ingest): update acryl-datahub-classify version (datahub-project#10844) * refactor(ingest): Refactor structured logging to support infos, warnings, and failures structured reporting to UI (datahub-project#10828) Co-authored-by: John Joyce <john@Johns-MBP.lan> Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * fix(restli): log aspect-not-found as a warning rather than as an error (datahub-project#10834) * fix(ingest/nifi): remove duplicate upstream jobs (datahub-project#10849) * fix(smoke-test): test access to create/revoke personal access tokens (datahub-project#10848) * fix(smoke-test): missing test for move domain (datahub-project#10837) * ci: update usernames to not considered for community (datahub-project#10851) * env: change defaults for data contract visibility (datahub-project#10854) * fix(ingest/tableau): quote special characters in external URL (datahub-project#10842) * fix(smoke-test): fix flakiness of auto complete test * ci(ingest): pin dask dependency for feast (datahub-project#10865) * fix(ingestion/lookml): liquid template resolution and view-to-view cll (datahub-project#10542) * feat(ingest/audit): add client id and version in system metadata props (datahub-project#10829) * chore(ingest): Mypy 1.10.1 pin (datahub-project#10867) * docs: use acryl-datahub-actions as expected python package to install (datahub-project#10852) * docs: add new js snippet (datahub-project#10846) * refactor(ingestion): remove company domain for security reason (datahub-project#10839) * fix(ingestion/spark): Platform instance and column level lineage fix (datahub-project#10843) Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * feat(ingestion/tableau): optionally ingest multiple sites and create site containers (datahub-project#10498) Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com> * fix(ingestion/looker): Add sqlglot dependency and remove unused sqlparser (datahub-project#10874) * fix(manage-tokens): fix manage access token policy (datahub-project#10853) * Batch get entity endpoints (datahub-project#10880) * feat(system): support conditional write semantics (datahub-project#10868) * fix(build): upgrade vercel builds to Node 20.x (datahub-project#10890) * feat(ingest/lookml): shallow clone repos (datahub-project#10888) * fix(ingest/looker): add missing dependency (datahub-project#10876) * fix(ingest): only populate audit stamps where accurate (datahub-project#10604) * fix(ingest/dbt): always encode tag urns (datahub-project#10799) * fix(ingest/redshift): handle multiline alter table commands (datahub-project#10727) * fix(ingestion/looker): column name missing in explore (datahub-project#10892) * fix(lineage) Fix lineage source/dest filtering with explored per hop limit (datahub-project#10879) * feat(conditional-writes): misc updates and fixes (datahub-project#10901) * feat(ci): update outdated action (datahub-project#10899) * feat(rest-emitter): adding async flag to rest emitter (datahub-project#10902) Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io> * feat(ingest): add snowflake-queries source (datahub-project#10835) * fix(ingest): improve `auto_materialize_referenced_tags_terms` error handling (datahub-project#10906) * docs: add new company to adoption list (datahub-project#10909) * refactor(redshift): Improve redshift error handling with new structured reporting system (datahub-project#10870) Co-authored-by: John Joyce <john@Johns-MBP.lan> Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * feat(ui) Finalize support for all entity types on forms (datahub-project#10915) * Index ExecutionRequestResults status field (datahub-project#10811) * feat(ingest): grafana connector (datahub-project#10891) Co-authored-by: Shirshanka Das <shirshanka@apache.org> Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * fix(gms) Add Form entity type to EntityTypeMapper (datahub-project#10916) * feat(dataset): add support for external url in Dataset (datahub-project#10877) * docs(saas-overview) added missing features to observe section (datahub-project#10913) Co-authored-by: John Joyce <john@acryl.io> * fix(ingest/spark): Fixing Micrometer warning (datahub-project#10882) * fix(structured properties): allow application of structured properties without schema file (datahub-project#10918) * fix(data-contracts-web) handle other schedule types (datahub-project#10919) * fix(ingestion/tableau): human-readable message for PERMISSIONS_MODE_SWITCHED error (datahub-project#10866) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * Add feature flag for view defintions (datahub-project#10914) Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io> * feat(ingest/BigQuery): refactor+parallelize dataset metadata extraction (datahub-project#10884) * fix(airflow): add error handling around render_template() (datahub-project#10907) * feat(ingestion/sqlglot): add optional `default_dialect` parameter to sqlglot lineage (datahub-project#10830) * feat(mcp-mutator): new mcp mutator plugin (datahub-project#10904) * fix(ingest/bigquery): changes helper function to decode unicode scape sequences (datahub-project#10845) * feat(ingest/postgres): fetch table sizes for profile (datahub-project#10864) * feat(ingest/abs): Adding azure blob storage ingestion source (datahub-project#10813) * fix(ingest/redshift): reduce severity of SQL parsing issues (datahub-project#10924) * fix(build): fix lint fix web react (datahub-project#10896) * fix(ingest/bigquery): handle quota exceeded for project.list requests (datahub-project#10912) * feat(ingest): report extractor failures more loudly (datahub-project#10908) * feat(ingest/snowflake): integrate snowflake-queries into main source (datahub-project#10905) * fix(ingest): fix docs build (datahub-project#10926) * fix(ingest/snowflake): fix test connection (datahub-project#10927) * fix(ingest/lookml): add view load failures to cache (datahub-project#10923) * docs(slack) overhauled setup instructions and screenshots (datahub-project#10922) Co-authored-by: John Joyce <john@acryl.io> * fix(airflow): Add comma parsing of owners to DataJobs (datahub-project#10903) * fix(entityservice): fix merging sideeffects (datahub-project#10937) * feat(ingest): Support System Ingestion Sources, Show and hide system ingestion sources with Command-S (datahub-project#10938) Co-authored-by: John Joyce <john@Johns-MBP.lan> * chore() Set a default lineage filtering end time on backend when a start time is present (datahub-project#10925) Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal> Co-authored-by: John Joyce <john@Johns-MBP.lan> * Added relationships APIs to V3. Added these generic APIs to V3 swagger doc. (datahub-project#10939) * docs: add learning center to docs (datahub-project#10921) * doc: Update hubspot form id (datahub-project#10943) * chore(airflow): add python 3.11 w/ Airflow 2.9 to CI (datahub-project#10941) * fix(ingest/Glue): column upstream lineage between S3 and Glue (datahub-project#10895) * fix(ingest/abs): split abs utils into multiple files (datahub-project#10945) * doc(ingest/looker): fix doc for sql parsing documentation (datahub-project#10883) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * fix(ingest/bigquery): Adding missing BigQuery types (datahub-project#10950) * fix(ingest/setup): feast and abs source setup (datahub-project#10951) * fix(connections) Harden adding /gms to connections in backend (datahub-project#10942) * feat(siblings) Add flag to prevent combining siblings in the UI (datahub-project#10952) * fix(docs): make graphql doc gen more automated (datahub-project#10953) * feat(ingest/athena): Add option for Athena partitioned profiling (datahub-project#10723) * fix(spark-lineage): default timeout for future responses (datahub-project#10947) * feat(datajob/flow): add environment filter using info aspects (datahub-project#10814) * fix(ui/ingest): correct privilege used to show tab (datahub-project#10483) Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com> * feat(ingest/looker): include dashboard urns in browse v2 (datahub-project#10955) * add a structured type to batchGet in OpenAPI V3 spec (datahub-project#10956) * fix(ui): scroll on the domain sidebar to show all domains (datahub-project#10966) * fix(ingest/sagemaker): resolve incorrect variable assignment for SageMaker API call (datahub-project#10965) * fix(airflow/build): Pinning mypy (datahub-project#10972) * Fixed a bug where the OpenAPI V3 spec was incorrect. The bug was introduced in datahub-project#10939. (datahub-project#10974) * fix(ingest/test): Fix for mssql integration tests (datahub-project#10978) * fix(entity-service) exist check correctly extracts status (datahub-project#10973) * fix(structuredProps) casing bug in StructuredPropertiesValidator (datahub-project#10982) * bugfix: use anyOf instead of allOf when creating references in openapi v3 spec (datahub-project#10986) * fix(ui): Remove ant less imports (datahub-project#10988) * feat(ingest/graph): Add get_results_by_filter to DataHubGraph (datahub-project#10987) * feat(ingest/cli): init does not actually support environment variables (datahub-project#10989) * fix(ingest/graph): Update get_results_by_filter graphql query (datahub-project#10991) * feat(ingest/spark): Promote beta plugin (datahub-project#10881) Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * feat(ingest): support domains in meta -> "datahub" section (datahub-project#10967) * feat(ingest): add `check server-config` command (datahub-project#10990) * feat(cli): Make consistent use of DataHubGraphClientConfig (datahub-project#10466) Deprecates get_url_and_token() in favor of a more complete option: load_graph_config() that returns a full DatahubClientConfig. This change was then propagated across previous usages of get_url_and_token so that connections to DataHub server from the client respect the full breadth of configuration specified by DatahubClientConfig. I.e: You can now specify disable_ssl_verification: true in your ~/.datahubenv file so that all cli functions to the server work when ssl certification is disabled. Fixes datahub-project#9705 * fix(ingest/s3): Fixing container creation when there is no folder in path (datahub-project#10993) * fix(ingest/looker): support platform instance for dashboards & charts (datahub-project#10771) * feat(ingest/bigquery): improve handling of information schema in sql parser (datahub-project#10985) * feat(ingest): improve `ingest deploy` command (datahub-project#10944) * fix(backend): allow excluding soft-deleted entities in relationship-queries; exclude soft-deleted members of groups (datahub-project#10920) - allow excluding soft-deleted entities in relationship-queries - exclude soft-deleted members of groups * fix(ingest/looker): downgrade missing chart type log level (datahub-project#10996) * doc(acryl-cloud): release docs for 0.3.4.x (datahub-project#10984) Co-authored-by: John Joyce <john@acryl.io> Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Pedro Silva <pedro@acryl.io> * fix(protobuf/build): Fix protobuf check jar script (datahub-project#11006) * fix(ui/ingest): Support invalid cron jobs (datahub-project#10998) * fix(ingest): fix graph config loading (datahub-project#11002) Co-authored-by: Pedro Silva <pedro@acryl.io> * feat(docs): Document __DATAHUB_TO_FILE_ directive (datahub-project#10968) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * fix(graphql/upsertIngestionSource): Validate cron schedule; parse error in CLI (datahub-project#11011) * feat(ece): support custom ownership type urns in ECE generation (datahub-project#10999) * feat(assertion-v2): changed Validation tab to Quality and created new Governance tab (datahub-project#10935) * fix(ingestion/glue): Add support for missing config options for profiling in Glue (datahub-project#10858) * feat(propagation): Add models for schema field docs, tags, terms (datahub-project#2959) (datahub-project#11016) Co-authored-by: Chris Collins <chriscollins3456@gmail.com> * docs: standardize terminology to DataHub Cloud (datahub-project#11003) * fix(ingestion/transformer): replace the externalUrl container (datahub-project#11013) * docs(slack) troubleshoot docs (datahub-project#11014) * feat(propagation): Add graphql API (datahub-project#11030) Co-authored-by: Chris Collins <chriscollins3456@gmail.com> * feat(propagation): Add models for Action feature settings (datahub-project#11029) * docs(custom properties): Remove duplicate from sidebar (datahub-project#11033) * feat(models): Introducing Dataset Partitions Aspect (datahub-project#10997) Co-authored-by: John Joyce <john@Johns-MBP.lan> Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal> * feat(propagation): Add Documentation Propagation Settings (datahub-project#11038) * fix(models): chart schema fields mapping, add dataHubAction entity, t… (datahub-project#11040) * fix(ci): smoke test lint failures (datahub-project#11044) * docs: fix learning center color scheme & typo (datahub-project#11043) * feat: add cloud main page (datahub-project#11017) Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com> * feat(restore-indices): add additional step to also clear system metadata service (datahub-project#10662) Co-authored-by: John Joyce <john@acryl.io> * docs: fix typo (datahub-project#11046) * fix(lint): apply spotless (datahub-project#11050) * docs(airflow): example query to get datajobs for a dataflow (datahub-project#11034) * feat(cli): Add run-id option to put sub-command (datahub-project#11023) Adds an option to assign run-id to a given put command execution. This is useful when transformers do not exist for a given ingestion payload, we can follow up with custom metadata and assign it to an ingestion pipeline. * fix(ingest): improve sql error reporting calls (datahub-project#11025) * fix(airflow): fix CI setup (datahub-project#11031) * feat(ingest/dbt): add experimental `prefer_sql_parser_lineage` flag (datahub-project#11039) * fix(ingestion/lookml): enable stack-trace in lookml logs (datahub-project#10971) * (chore): Linting fix (datahub-project#11015) * chore(ci): update deprecated github actions (datahub-project#10977) * Fix ALB configuration example (datahub-project#10981) * chore(ingestion-base): bump base image packages (datahub-project#11053) * feat(cli): Trim report of dataHubExecutionRequestResult to max GMS size (datahub-project#11051) * fix(ingestion/lookml): emit dummy sql condition for lookml custom condition tag (datahub-project#11008) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * fix(ingestion/powerbi): fix issue with broken report lineage (datahub-project#10910) * feat(ingest/tableau): add retry on timeout (datahub-project#10995) * change generate kafka connect properties from env (datahub-project#10545) Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com> * fix(ingest): fix oracle cronjob ingestion (datahub-project#11001) Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com> * chore(ci): revert update deprecated github actions (datahub-project#10977) (datahub-project#11062) * feat(ingest/dbt-cloud): update metadata_endpoint inference (datahub-project#11041) * build: Reduce size of datahub-frontend-react image by 50-ish% (datahub-project#10878) Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com> * fix(ci): Fix lint issue in datahub_ingestion_run_summary_provider.py (datahub-project#11063) * docs(ingest): update developing-a-transformer.md (datahub-project#11019) * feat(search-test): update search tests from datahub-project#10408 (datahub-project#11056) * feat(cli): add aspects parameter to DataHubGraph.get_entity_semityped (datahub-project#11009) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * docs(airflow): update min version for plugin v2 (datahub-project#11065) * doc(ingestion/tableau): doc update for derived permission (datahub-project#11054) Co-authored-by: Pedro Silva <pedro.cls93@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * fix(py): remove dep on types-pkg_resources (datahub-project#11076) * feat(ingest/mode): add option to exclude restricted (datahub-project#11081) * fix(ingest): set lastObserved in sdk when unset (datahub-project#11071) * doc(ingest): Update capabilities (datahub-project#11072) * chore(vulnerability): Log Injection (datahub-project#11090) * chore(vulnerability): Information exposure through a stack trace (datahub-project#11091) * chore(vulnerability): Comparison of narrow type with wide type in loop condition (datahub-project#11089) * chore(vulnerability): Insertion of sensitive information into log files (datahub-project#11088) * chore(vulnerability): Risky Cryptographic Algorithm (datahub-project#11059) * chore(vulnerability): Overly permissive regex range (datahub-project#11061) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * fix: update customer data (datahub-project#11075) * fix(models): fixing the datasetPartition models (datahub-project#11085) Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal> * fix(ui): Adding view, forms GraphQL query, remove showing a fallback error message on unhandled GraphQL error (datahub-project#11084) Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal> * feat(docs-site): hiding learn more from cloud page (datahub-project#11097) * fix(docs): Add correct usage of orFilters in search API docs (datahub-project#11082) Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com> * fix(ingest/mode): Regexp in mode name matcher didn't allow underscore (datahub-project#11098) * docs: Refactor customer stories section (datahub-project#10869) Co-authored-by: Jeff Merrick <jeff@wireform.io> * fix(release): fix full/slim suffix on tag (datahub-project#11087) * feat(config): support alternate hashing algorithm for doc id (datahub-project#10423) Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com> Co-authored-by: John Joyce <john@acryl.io> * fix(emitter): fix typo in get method of java kafka emitter (datahub-project#11007) * fix(ingest): use correct native data type in all SQLAlchemy sources by compiling data type using dialect (datahub-project#10898) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * chore: Update contributors list in PR labeler (datahub-project#11105) * feat(ingest): tweak stale entity removal messaging (datahub-project#11064) * fix(ingestion): enforce lastObserved timestamps in SystemMetadata (datahub-project#11104) * fix(ingest/powerbi): fix broken lineage between chart and dataset (datahub-project#11080) * feat(ingest/lookml): CLL support for sql set in sql_table_name attribute of lookml view (datahub-project#11069) * docs: update graphql docs on forms & structured properties (datahub-project#11100) * test(search): search openAPI v3 test (datahub-project#11049) * fix(ingest/tableau): prevent empty site content urls (datahub-project#11057) Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * feat(entity-client): implement client batch interface (datahub-project#11106) * fix(snowflake): avoid reporting warnings/info for sys tables (datahub-project#11114) * fix(ingest): downgrade column type mapping warning to info (datahub-project#11115) * feat(api): add AuditStamp to the V3 API entity/aspect response (datahub-project#11118) * fix(ingest/redshift): replace r'\n' with '\n' to avoid token error redshift serverless… (datahub-project#11111) * fix(entiy-client): handle null entityUrn case for restli (datahub-project#11122) * fix(sql-parser): prevent bad urns from alter table lineage (datahub-project#11092) * fix(ingest/bigquery): use small batch size if use_tables_list_query_v2 is set (datahub-project#11121) * fix(graphql): add missing entities to EntityTypeMapper and EntityTypeUrnMapper (datahub-project#10366) * feat(ui): Changes to allow editable dataset name (datahub-project#10608) Co-authored-by: Jay Kadambi <jayasimhan_venkatadri@optum.com> * fix: remove saxo (datahub-project#11127) * feat(mcl-processor): Update mcl processor hooks (datahub-project#11134) * fix(openapi): fix openapi v2 endpoints & v3 documentation update * Revert "fix(openapi): fix openapi v2 endpoints & v3 documentation update" This reverts commit 573c1cb. * docs(policies): updates to policies documentation (datahub-project#11073) * fix(openapi): fix openapi v2 and v3 docs update (datahub-project#11139) * feat(auth): grant type and acr values custom oidc parameters support (datahub-project#11116) * fix(mutator): mutator hook fixes (datahub-project#11140) * feat(search): support sorting on multiple fields (datahub-project#10775) * feat(ingest): various logging improvements (datahub-project#11126) * fix(ingestion/lookml): fix for sql parsing error (datahub-project#11079) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * feat(docs-site) cloud page spacing and content polishes (datahub-project#11141) * feat(ui) Enable editing structured props on fields (datahub-project#11042) * feat(tests): add md5 and last computed to testResult model (datahub-project#11117) * test(openapi): openapi regression smoke tests (datahub-project#11143) * fix(airflow): fix tox tests + update docs (datahub-project#11125) * docs: add chime to adoption stories (datahub-project#11142) * fix(ingest/databricks): Updating code to work with Databricks sdk 0.30 (datahub-project#11158) * fix(kafka-setup): add missing script to image (datahub-project#11190) * fix(config): fix hash algo config (datahub-project#11191) * test(smoke-test): updates to smoke-tests (datahub-project#11152) * fix(elasticsearch): refactor idHashAlgo setting (datahub-project#11193) * chore(kafka): kafka version bump (datahub-project#11211) * readd UsageStatsWorkUnit * fix merge problems * change logo --------- Co-authored-by: Chris Collins <chriscollins3456@gmail.com> Co-authored-by: John Joyce <john@acryl.io> Co-authored-by: John Joyce <john@Johns-MBP.lan> Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal> Co-authored-by: dushayntAW <158567391+dushayntAW@users.noreply.github.com> Co-authored-by: sagar-salvi-apptware <159135491+sagar-salvi-apptware@users.noreply.github.com> Co-authored-by: Aseem Bansal <asmbansal2@gmail.com> Co-authored-by: Kevin Chun <kevin1chun@gmail.com> Co-authored-by: jordanjeremy <72943478+jordanjeremy@users.noreply.github.com> Co-authored-by: skrydal <piotr.skrydalewicz@gmail.com> Co-authored-by: Harshal Sheth <hsheth2@gmail.com> Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com> Co-authored-by: sid-acryl <155424659+sid-acryl@users.noreply.github.com> Co-authored-by: Julien Jehannet <80408664+aviv-julienjehannet@users.noreply.github.com> Co-authored-by: Hendrik Richert <github@richert.li> Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com> Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com> Co-authored-by: Felix Lüdin <13187726+Masterchen09@users.noreply.github.com> Co-authored-by: Pirry <158024088+chardaway@users.noreply.github.com> Co-authored-by: Hyejin Yoon <0327jane@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: cburroughs <chris.burroughs@gmail.com> Co-authored-by: ksrinath <ksrinath@users.noreply.github.com> Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com> Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com> Co-authored-by: Shirshanka Das <shirshanka@apache.org> Co-authored-by: ipolding-cais <155455744+ipolding-cais@users.noreply.github.com> Co-authored-by: Tamas Nemeth <treff7es@gmail.com> Co-authored-by: Shubham Jagtap <132359390+shubhamjagtap639@users.noreply.github.com> Co-authored-by: haeniya <yanik.haeni@gmail.com> Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com> Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com> Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io> Co-authored-by: 808OVADOZE <52988741+shtephlee@users.noreply.github.com> Co-authored-by: noggi <anton.kuraev@acryl.io> Co-authored-by: Nicholas Pena <npena@foursquare.com> Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com> Co-authored-by: ethan-cartwright <ethan.cartwright.m@gmail.com> Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io> Co-authored-by: Nadav Gross <33874964+nadavgross@users.noreply.github.com> Co-authored-by: Patrick Franco Braz <patrickfbraz@poli.ufrj.br> Co-authored-by: pie1nthesky <39328908+pie1nthesky@users.noreply.github.com> Co-authored-by: Joel Pinto Mata (KPN-DSH-DEX team) <130968841+joelmataKPN@users.noreply.github.com> Co-authored-by: Ellie O'Neil <110510035+eboneil@users.noreply.github.com> Co-authored-by: Ajoy Majumdar <ajoymajumdar@hotmail.com> Co-authored-by: deepgarg-visa <149145061+deepgarg-visa@users.noreply.github.com> Co-authored-by: Tristan Heisler <tristankheisler@gmail.com> Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io> Co-authored-by: Davi Arnaut <davi.arnaut@acryl.io> Co-authored-by: Pedro Silva <pedro@acryl.io> Co-authored-by: amit-apptware <132869468+amit-apptware@users.noreply.github.com> Co-authored-by: Sam Black <sam.black@acryl.io> Co-authored-by: Raj Tekal <varadaraj_tekal@optum.com> Co-authored-by: Steffen Grohsschmiedt <gitbhub@steffeng.eu> Co-authored-by: jaegwon.seo <162448493+wornjs@users.noreply.github.com> Co-authored-by: Renan F. Lima <51028757+lima-renan@users.noreply.github.com> Co-authored-by: Matt Exchange <xkollar@users.noreply.github.com> Co-authored-by: Jonny Dixon <45681293+acrylJonny@users.noreply.github.com> Co-authored-by: Pedro Silva <pedro.cls93@gmail.com> Co-authored-by: Pinaki Bhattacharjee <pinakipb2@gmail.com> Co-authored-by: Jeff Merrick <jeff@wireform.io> Co-authored-by: skrydal <piotr.skrydalewicz@acryl.io> Co-authored-by: AndreasHegerNuritas <163423418+AndreasHegerNuritas@users.noreply.github.com> Co-authored-by: jayasimhankv <145704974+jayasimhankv@users.noreply.github.com> Co-authored-by: Jay Kadambi <jayasimhan_venkatadri@optum.com> Co-authored-by: David Leifker <david.leifker@acryl.io>
Allow MCL Hooks to consume messages independently with different consumer groups. By default no change is made to the MCL consumer group names and all hooks share the same consumer group.
Reasons for separate consumer groups include parallel hook execution and prioritization.
Checklist
Summary by CodeRabbit
New Features
consumerGroupSuffix
configuration for improved consumer group management in various hooks.Bug Fixes
Documentation
Refactor