Improve Monitoring for Platform System to Support ATN Data Ingestion Pipeline #126

Open
1 task
iwensu0313 opened this issue Dec 17, 2024 · 4 comments
Labels
ATN Issues relating to the Animal Telemetry Network

Comments


iwensu0313 commented Dec 17, 2024

Who is requesting this?

Axiom

What is being requested?

Build out improved monitoring for ATN data types (platforms) and error logging to assist in troubleshooting.

What is the requested deadline and why?

No response

What is the current status quo (i.e., what happens if this does not get done)?

Over the past 4-6 months there have been issues related to the auto-ingestion pipeline (e.g. a small percentage of deployments not making their way through to the portal, and active deployments that stop flowing).

Troubleshooting has shown that the cause is sometimes missing metadata, and sometimes re-running the ingestion pipeline resolves the problem (though lately that doesn't always do the trick). Time spent troubleshooting the data ingestion pipeline for the ATN DAC has increased, and the cause of the issue is still unclear.

Building transparency through monitoring and better logging can help us become more efficient with ongoing maintenance of the ATN data ingestion pipeline.

What indicates this is done (i.e., how do we know this is complete)?

  • More informative logging for platform system and ATN DAC data to assist in troubleshooting (e.g. show last updated time)
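
One way to surface the "last updated time" mentioned above is a periodic staleness check. The following is a minimal sketch, assuming each deployment record carries an identifier and a timezone-aware timestamp of its last successful ingest. The function name `find_stale_deployments`, the `max_age` threshold, the logger name, and the data shape are all hypothetical and not taken from the platform system.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("atn.ingest.monitor")  # hypothetical logger name


def find_stale_deployments(last_updated, now=None, max_age=timedelta(hours=24)):
    """Return IDs of deployments whose last update is older than max_age.

    last_updated maps a deployment ID to a timezone-aware datetime of its
    last successful ingest (assumed data shape, for illustration only).
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for deployment_id, updated_at in sorted(last_updated.items()):
        age = now - updated_at
        if age > max_age:
            # Log the last updated time so a person can see at a glance
            # how long the deployment has been silent.
            logger.warning(
                "deployment %s last updated %s (%.1f h ago)",
                deployment_id,
                updated_at.isoformat(),
                age.total_seconds() / 3600,
            )
            stale.append(deployment_id)
    return stale
```

The 24-hour default is only a placeholder; a sensible threshold would depend on how often each tag type is expected to report.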

Provide a description or any other important information.

No response

Author

iwensu0313 commented Dec 17, 2024

I added a GitHub ticket summarizing the request I mentioned at the ATN data meeting today, but assigned it a Blocked status since we have not decided to prioritize it at the moment. Let me know if you have additional questions! @MathewBiddle @mmckinzie @conniekot

iwensu0313 changed the title from "Improve Monitoring for ATN Data Ingestion Pipeline" to "Improve Monitoring for Platform/ATN Data Ingestion Pipeline" Dec 17, 2024
iwensu0313 changed the title from "Improve Monitoring for Platform/ATN Data Ingestion Pipeline" to "Improve Monitoring for Platform System to Support ATN Data Ingestion Pipeline" Dec 17, 2024
Author

iwensu0313 commented Dec 18, 2024

I do want to note that some amount of work and troubleshooting will be needed to address the recent observation that some active Wildlife Computers deployments stopped flowing into the portal at the end of October. It's been hard to tell what is going on, so we do need to add better error logging to assist with the troubleshooting. That work is separate from, but related to, the larger effort this ticket references.

I'm not really sure where to provide updates on this or new observed issues, but maybe a ticket in the "Automate ingestion of different tag types" Milestone that hangs around for tracking maintenance and troubleshooting required to fix existing data streams? Or create a new ticket per issue that we need to pull in developers for? For example, I created this one #117 to capture the effort of having to adjust the data pipeline to accommodate GPE3 situations.

MathewBiddle added the ATN label (Issues relating to the Animal Telemetry Network) Dec 18, 2024
@MathewBiddle
Contributor

Thank you for adding this @iwensu0313

To help me further understand this issue, it would be beneficial to know exactly at which point in the process we are running into issues.

  • Ingest: Is this an issue with bringing data into the ATN DAC?
  • Processing: Is this an issue with processing the data once in the DAC?
  • Visualization: Is it an issue with displaying data on the ATN portal?

I'm trying to understand where exactly datasets have been having issues. If it's strictly for visualization purposes, we should discuss how much value add that provides and if it's something we need to support at the moment.

I think we should start thinking about new activities in this kind of a framework to help us prioritize what should be resourced.
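
The three-stage framing above could also be recorded per deployment when triaging. The following is a rough sketch, assuming three yes/no checks are available for each deployment. The names `Stage` and `first_failing_stage` and the three boolean inputs are hypothetical and only illustrate the framing.

```python
from enum import Enum


class Stage(Enum):
    INGEST = "Ingest"
    PROCESSING = "Processing"
    VISUALIZATION = "Visualization"


def first_failing_stage(in_dac, processed, on_portal):
    """Return the earliest stage a deployment failed at, or None if it passed all.

    The three booleans are hypothetical per-deployment checks:
    in_dac    - the data was brought into the DAC
    processed - processing completed once the data was in the DAC
    on_portal - the deployment is displayed on the portal
    """
    if not in_dac:
        return Stage.INGEST
    if not processed:
        return Stage.PROCESSING
    if not on_portal:
        return Stage.VISUALIZATION
    return None
```

Checking the stages in pipeline order means a deployment that never arrived is reported as an Ingest problem rather than a Visualization one.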

Author

iwensu0313 commented Dec 19, 2024

It appears that the issue is happening in the connection between the Wildlife Computers (WC) API and our ingestion system. We have checked the WC API and the data is there. When we then check our ingestion system, some of it is found and some is not, and we are not sure why yet. We've recently been improving the health of our systems, servers, and storage: they have been getting overloaded, so we have been offloading onto new ones. (We do this periodically, but more so lately; it is an example of the regular maintenance needed as part of overall cyberinfrastructure maintenance for ALL projects we support.) Again, we're not exactly sure what the issue is yet, but the amount of human troubleshooting time for ingests seems to be increasing. So I would say Ingest.

I like that framing!
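
The manual check described in this comment (data present in the WC API, only partly present in the ingestion system) can be expressed as a set comparison that logs exactly which deployments are missing. This is a minimal sketch under the assumption that both sides can be reduced to lists of deployment IDs. How those lists are fetched is left out because neither interface is described in this thread, and the names `reconcile` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("atn.ingest.reconcile")  # hypothetical logger name


def reconcile(source_ids, ingested_ids):
    """Compare deployment IDs listed upstream with those found after ingest.

    Both arguments are plain iterables of ID strings (assumed shape).
    """
    source, ingested = set(source_ids), set(ingested_ids)
    missing = sorted(source - ingested)     # upstream has data, ingest does not
    unexpected = sorted(ingested - source)  # ingested but not listed upstream
    for deployment_id in missing:
        logger.error(
            "deployment %s present upstream but not ingested", deployment_id
        )
    return {"missing": missing, "unexpected": unexpected}
```

Running a comparison like this on a schedule would turn "some stuff is found, some stuff is not" into a concrete list of IDs to investigate.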

Projects
Status: Blocked
Development

No branches or pull requests

2 participants