Productionize Streaming Jobs for Service Dependencies #4590
Currently we have two analytics solutions for generating service maps: the Spark-based batch jobs in spark-dependencies, and the Flink-based streaming jobs in jaeger-analytics-flink.

Objectives:

Comments
Would one possible implementation be to use the ServiceGraphConnector to create service dependencies, or is it not suitable for Jaeger's dependency diagrams?
Expression of Interest in this Mentorship Project - Productionize Streaming Jobs for Service Dependencies

Hello everyone, I am genuinely interested in participating in this project as part of the LFX mentorship program in Q3. I have a strong understanding of the distributed tracing domain, having read the entire book Mastering Distributed Tracing, and I have relevant experience that I believe aligns well with the objectives of the project. Specifically, I have installed Jaeger in a production environment using the Kubernetes Operator, and I have configured Spark jobs to detect one-hop service dependencies in a simple instrumented application comprising two services: one fetches the IP address from a remote API endpoint, while the other formats the data.

[Figure: My Instrumented Application Tracing Architecture]
[Figure: My Instrumented Application in Production Cluster]

Questions
Follow-up research resources

As I prepare to contribute, I would greatly appreciate it if you could recommend any additional resources or documentation to help me better understand this project and its specific requirements. Looking forward to participating in this exciting endeavor!
@mohamedawnallah I think it's worth looking into.
I wanted to share my progress so far on this issue. I have gained a clear understanding of the existing implementation. Additionally, I have introduced two new analytics metrics.
These metrics have been essential in understanding the Trace DSL API and the implementation of the Gremlin query/traversal language from the Apache TinkerPop project. Furthermore, I have observed that …
I'm currently considering an implementation approach for this project. One idea is to enhance the existing implementation. @yurishkuro, I'd love to hear your thoughts on this!
What I am curious about is whether it's possible to consolidate the streaming business logic into a library that both runtimes can share.

A few years ago it wasn't possible because Spark and Flink used different APIs to describe the transformation flows. But since Java Streams were introduced, I was under the impression that the UDFs could be expressed in Java Streams and work for both. This is just my assumption; it would be good to confirm. The reason I think this reusability is useful is that supporting Spark allows offline batch processing, which may be a useful feature for some, not to mention that some organizations run only Spark and not Flink.
@yurishkuro I have recently explored the idea of consolidating streaming business logic into a library to make it compatible with multiple streaming runtimes, such as Apache Flink and Apache Spark. Java Streams do not appear to be a good fit: they describe in-process pipelines rather than distributed transformation flows, and Spark still requires its own separate APIs for batch and structured streaming workloads. In contrast, Apache Flink simplifies the process with a unified DataStream API, which can handle both batch and streaming processing modes without the need to rewrite code. This makes Flink a more flexible choice, especially for organizations that want to use a single data processing platform with the same API for both batch and stream processing. In conclusion, while Java Streams might not be suitable for the desired cross-runtime compatibility, Apache Flink's DataStream API offers a promising solution for building reusable streaming and batch business logic that can be deployed seamlessly. @yurishkuro I would also love to hear your thoughts on this!
Hey @yurishkuro, I'd still like to work on this issue outside the official LFX mentorship. Any thoughts?
@mohamedawnallah most of our code is already written for Flink, so it's fine to keep it and package it for prod deployment.
Great! So packaging Jaeger Analytics Flink for production would mean:

1. Packaging the jobs via docker-compose.
2. Integrating with Cassandra as the storage backend.
3. …

@yurishkuro, I'd love to hear your thoughts on this and whether there is anything I'm missing.
Yes, plus (4) set up CI integration tests to validate that those packages are operational. But on (1), the docker-compose is not the "production" packaging; usually it's just an example & integration test, while the actual packaging is just the runnable Docker images. Another option is to extend the K8S Operator to support deployment of these images too (when used with Kafka, of course).
On (2), it would be good to support other backends too, not just Cassandra (at minimum ES/OS).
Thanks @yurishkuro for your additions. I know "ES" stands for Elasticsearch, but what does "OS" stand for in the storage context?
OpenSearch
Okay, I'm going to start working on the issue, but I'd like to know if you have any suggestions about communication while working on it. Also, is the repository dedicated to this issue Jaeger Analytics Flink?
I would recommend creating a proposal / plan of what you plan to do and how. This is not a 1-day project, so the plan should contain multiple milestones. We can copy them as a checklist into the ticket description and tick them off as each milestone is reached. This would provide good visibility into the progress.
Sounds great! I'm going to send a proposal soon describing what I plan to do and how. Regards!