In a crawl I ran recently, I got ~50k "Visit with visit_id xxx got interrupted" events. I was running a crazy crawl, so that was fine, but that's a lot of events.
I was able to use the Sentry API to download ~2,600 of these events and compare their visit_ids with the crawl_history and incomplete tables.
A markdown version of the table is below.
For these 2,600 visits, none reached "ok" status for the "finalize" command, and none were in the incomplete table.
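A minimal Python sketch of that cross-check, assuming the Sentry visit_ids and both tables have already been loaded into plain lists of rows. The column names (`visit_id`, `command`, `command_status`), the `"finalize"` command label, and the `classify_visits` helper are assumptions for illustration, not the actual schema or notebook code. The toy rows are made up and are not data from this crawl.

```python
# Sketch: cross-check interrupted visit_ids against crawl_history and incomplete.
# Column names and the "finalize" command label are ASSUMPTIONS about the schema.

def classify_visits(interrupted_ids, crawl_history, incomplete,
                    finalize_command="finalize"):
    """Count how many distinct interrupted visits finalized with status "ok"
    and how many appear in the incomplete table."""
    # visit_ids whose finalize command reached "ok" status
    finalized_ok = {
        row["visit_id"]
        for row in crawl_history
        if row["command"] == finalize_command and row["command_status"] == "ok"
    }
    # visit_ids recorded in the incomplete table
    incomplete_ids = {row["visit_id"] for row in incomplete}
    # Sentry can report the same visit more than once, so deduplicate first
    ids = set(interrupted_ids)
    return {
        "total": len(ids),
        "finalize_ok": len(ids & finalized_ok),
        "in_incomplete": len(ids & incomplete_ids),
    }


# Made-up toy rows, only to show the shape of the inputs
history = [
    {"visit_id": 1, "command": "get", "command_status": "ok"},
    {"visit_id": 1, "command": "finalize", "command_status": "error"},
    {"visit_id": 2, "command": "finalize", "command_status": "ok"},
]
incomplete = [{"visit_id": 3}]

print(classify_visits([1, 1], history, incomplete))
# {'total': 1, 'finalize_ok': 0, 'in_incomplete': 0}
```

Using sets keeps the check linear in the number of rows and makes repeated Sentry events for the same visit count once.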
This all makes sense, and it's really exciting to see all this information captured in the crawl history table.
Since the crawl history table now appears to capture this information accurately, I'd like to propose removing the exception that propagates up to Sentry, because it's an exception that is already well handled.
That said, folks may find it useful to see this information streaming to Sentry. I just wanted to bring it up.
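If the exception stays in the code but the Sentry noise is unwanted, one option is to filter these events on the client with the `before_send` hook accepted by the Python Sentry SDK (`sentry_sdk`): returning `None` from the hook discards the event. This sketch assumes the text can appear in the event's `message`, `logentry.message`, or exception `value` fields. The function names are hypothetical, and the filter has not been checked against real events from this crawl.

```python
import re

# Hypothetical filter for the interrupted-visit message. \S+ matches either a
# formatted id ("4155575224537606") or an unformatted placeholder ("%i"), in
# case the event carries the raw log template instead of the final string.
INTERRUPTED_RE = re.compile(r"Visit with visit_id \S+ got interrupted")


def _event_messages(event):
    """Yield the message strings a Sentry event dict may carry."""
    logentry = event.get("logentry") or {}
    if logentry.get("message"):
        yield logentry["message"]
    if event.get("message"):
        yield event["message"]
    for exc in (event.get("exception") or {}).get("values") or []:
        if exc.get("value"):
            yield exc["value"]


def drop_interrupted_visits(event, hint):
    """before_send hook: return None to discard the event, else pass it on."""
    if any(INTERRUPTED_RE.search(msg) for msg in _event_messages(event)):
        return None
    return event
```

It would be registered with something like `sentry_sdk.init(dsn=..., before_send=drop_interrupted_visits)`. All other errors still reach Sentry, so this only removes the events the crawl history table already records.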
These visit ids come from the 2,611 events that were downloaded in GatherSentryEvents. There were ~50k of these events, but for whatever reason I was only able to get these 2,611 from the API.
In this notebook we want to see what these visit ids correspond to: failed commands, incomplete visits, or something else?
array(['Visit with visit_id 4155575224537606 got interrupted',
       'Visit with visit_id 4155575224537606 got interrupted',
       'Visit with visit_id 2076258438298646 got interrupted', ...,
       'Visit with visit_id 4347570855194635 got interrupted',
       'Visit with visit_id 8821906423776802 got interrupted',
       'Visit with visit_id 8821906423776802 got interrupted'],
      dtype=object)
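Some messages in the array repeat (4155575224537606 and 8821906423776802 each appear twice in the visible rows), so the ids need to be pulled out of the message text and deduplicated before comparing them with the tables. A small Python sketch, assuming every message follows exactly the format shown; `extract_visit_ids` is a hypothetical helper, not the notebook's code.

```python
import re
from collections import Counter

# Matches the message format shown above and captures the numeric visit_id
VISIT_RE = re.compile(r"Visit with visit_id (\d+) got interrupted")


def extract_visit_ids(messages):
    """Return the integer visit_ids found in the messages, in order."""
    ids = []
    for msg in messages:
        match = VISIT_RE.search(msg)
        if match:
            ids.append(int(match.group(1)))
    return ids


messages = [
    "Visit with visit_id 4155575224537606 got interrupted",
    "Visit with visit_id 4155575224537606 got interrupted",
    "Visit with visit_id 2076258438298646 got interrupted",
]
ids = extract_visit_ids(messages)
print(len(ids), len(set(ids)))  # 3 2
print(Counter(ids).most_common(1))  # [(4155575224537606, 2)]
```

Messages that don't match the pattern are skipped rather than raising, so any differently worded events would need to be counted separately.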
birdsarah changed the title on May 30, 2020
from: Consider removing "visit_id" was interrupted error
to: Consider removing "visit with visit_id xxx was interrupted error"