Json pbp scraping gives incorrect xC/yC data. #38

Open
JB13 opened this issue Apr 22, 2024 · 1 comment

@JB13
Contributor

JB13 commented Apr 22, 2024

I was pulling down data (using both scrape_seasons and scrape_games) and noticed that ~30% of shots either had no xC/yC data or had their coordinates placed on one of the faceoff dots:

(screenshot not preserved)
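A quick way to measure the share of affected shots on your own scrape is to count shot rows with missing coordinates. This is a minimal sketch over plain dicts; the column names `Event`, `xC`, `yC` and the shot event codes `SHOT`/`MISS`/`GOAL` are assumptions about the scraped table, not something confirmed in this issue.

```python
import math

# Assumption: each row is a dict with "Event", "xC" and "yC" keys, and shot
# attempts use the codes SHOT, MISS and GOAL. Adjust to the real columns.
def missing_coord_share(rows, shot_events=("SHOT", "MISS", "GOAL")):
    """Return the fraction of shot rows whose xC or yC is missing (None or NaN)."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    shots = [r for r in rows if r.get("Event") in shot_events]
    if not shots:
        return 0.0
    missing = sum(1 for r in shots if is_missing(r.get("xC")) or is_missing(r.get("yC")))
    return missing / len(shots)


# Hypothetical rows for illustration only.
rows = [
    {"Event": "SHOT", "xC": 64.0, "yC": 42.0},
    {"Event": "SHOT", "xC": None, "yC": None},
    {"Event": "FAC", "xC": 0.0, "yC": 0.0},
    {"Event": "GOAL", "xC": float("nan"), "yC": float("nan")},
]
print(missing_coord_share(rows))  # 0.6666666666666666
```

This only catches the "no xC/yC" case; shots that picked up another event's coordinates still look valid here.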

Looking into it a bit, it looks like the "eventId" values in the JSON are no longer guaranteed to be in order (see the JSON snippet below). In json_pbp.py, I removed the sorted_events logic and now get the data in the "right" order:

(screenshot not preserved)
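A minimal sketch of why sorting on eventId goes wrong, using the eventId/sortOrder/typeDescKey values from the JSON snippet in this issue. It assumes the sorted_events step sorts by eventId, as described above; `event_ids_in_order` is a hypothetical helper, not part of json_pbp.py.

```python
# Excerpt of the "plays" list from the issue, in the order the API returned it.
plays = [
    {"eventId": 102, "sortOrder": 8, "typeDescKey": "period-start"},
    {"eventId": 101, "sortOrder": 9, "typeDescKey": "faceoff"},
    {"eventId": 8, "sortOrder": 15, "typeDescKey": "stoppage"},
    {"eventId": 103, "sortOrder": 17, "typeDescKey": "faceoff"},
    {"eventId": 9, "sortOrder": 20, "typeDescKey": "hit"},
]


def event_ids_in_order(plays):
    """True if eventId strictly increases in the order the plays were returned."""
    ids = [p["eventId"] for p in plays]
    return all(a < b for a, b in zip(ids, ids[1:]))


# Assumption: sorted_events sorts by eventId. Doing that scrambles the sequence.
by_event_id = [p["typeDescKey"] for p in sorted(plays, key=lambda p: p["eventId"])]
as_returned = [p["typeDescKey"] for p in plays]

print(event_ids_in_order(plays))  # False
print(by_event_id)  # ['stoppage', 'hit', 'faceoff', 'period-start', 'faceoff']
print(as_returned)  # ['period-start', 'faceoff', 'stoppage', 'faceoff', 'hit']
```

Once the list is reordered like this, any step that pairs JSON events with HTML events by position will attach coordinates to the wrong event.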

Not sorting seems to mostly work? I still need to investigate cases where the number of HTML events doesn't match the number of JSON events. Sorting by seconds_elapsed doesn't work well when a stoppage and the following faceoff happen at the same time point.
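To see where seconds_elapsed alone is ambiguous, one can group events by (period, seconds) and list the groups with more than one event. This is a sketch; `find_time_ties` and the flattened `period` field are hypothetical, and the values come from the JSON snippet in this issue.

```python
from collections import defaultdict


def seconds_elapsed(time_in_period):
    """Convert an "MM:SS" timeInPeriod string to seconds into the period."""
    minutes, seconds = time_in_period.split(":")
    return int(minutes) * 60 + int(seconds)


def find_time_ties(plays):
    """Return {(period, seconds): [event types]} for timestamps shared by 2+ events."""
    groups = defaultdict(list)
    for p in plays:
        key = (p["period"], seconds_elapsed(p["timeInPeriod"]))
        groups[key].append(p["typeDescKey"])
    return {k: v for k, v in groups.items() if len(v) > 1}


# "period" is a flattened periodDescriptor.number (an assumption for brevity).
plays = [
    {"period": 1, "timeInPeriod": "00:00", "typeDescKey": "period-start"},
    {"period": 1, "timeInPeriod": "00:00", "typeDescKey": "faceoff"},
    {"period": 1, "timeInPeriod": "00:35", "typeDescKey": "stoppage"},
    {"period": 1, "timeInPeriod": "00:35", "typeDescKey": "faceoff"},
    {"period": 1, "timeInPeriod": "00:48", "typeDescKey": "hit"},
]

print(find_time_ties(plays))
# {(1, 0): ['period-start', 'faceoff'], (1, 35): ['stoppage', 'faceoff']}
```

Python's `sorted` is stable, so within each tied group a sort on seconds_elapsed simply keeps whatever order the input already had; it cannot repair a wrong order.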

I might have time to look for a more elegant fix (and maybe add a test that grabs a couple of plays from a game to confirm they're parsed correctly in the future), but I wanted to write this down in case anyone else is looking at it.
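A rough shape for such a regression test, written as a pytest-style function over a few plays copied from the JSON below. `extract_coords` is a hypothetical stand-in for the parsing step in json_pbp.py, not hockey_scraper's actual API; a real test would call the scraper's own function instead.

```python
# Hypothetical stand-in for the JSON parsing step: keeps plays in the order the
# JSON lists them and pulls out each event's own coordinates.
def extract_coords(plays):
    rows = []
    for p in plays:
        details = p.get("details", {})
        rows.append((p["eventId"], p["typeDescKey"], details.get("xCoord"), details.get("yCoord")))
    return rows


# A few plays from the issue's JSON, used as a fixed fixture.
SAMPLE_PLAYS = [
    {"eventId": 8, "typeDescKey": "stoppage", "details": {"reason": "icing"}},
    {"eventId": 103, "typeDescKey": "faceoff", "details": {"xCoord": -69, "yCoord": 22, "zoneCode": "D"}},
    {"eventId": 9, "typeDescKey": "hit", "details": {"xCoord": 64, "yCoord": 42, "zoneCode": "D"}},
]


def test_coords_stay_attached_to_their_events():
    rows = extract_coords(SAMPLE_PLAYS)
    # Order must follow the JSON, not ascending eventId.
    assert [r[0] for r in rows] == [8, 103, 9]
    # The stoppage has no coordinates; the faceoff and hit keep their own.
    assert rows[0][2:] == (None, None)
    assert rows[1][2:] == (-69, 22)
    assert rows[2][2:] == (64, 42)


if __name__ == "__main__":
    test_coords_stay_attached_to_their_events()
    print("ok")
```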

```json
{
    "plays": [
        {
            "eventId": 102,
            "periodDescriptor": {
                "number": 1,
                "periodType": "REG"
            },
            "timeInPeriod": "00:00",
            "timeRemaining": "20:00",
            "situationCode": "1551",
            "homeTeamDefendingSide": "left",
            "typeCode": 520,
            "typeDescKey": "period-start",
            "sortOrder": 8
        },
        {
            "eventId": 101,
            "periodDescriptor": {
                "number": 1,
                "periodType": "REG"
            },
            "timeInPeriod": "00:00",
            "timeRemaining": "20:00",
            "situationCode": "1551",
            "homeTeamDefendingSide": "left",
            "typeCode": 502,
            "typeDescKey": "faceoff",
            "sortOrder": 9,
            "details": {
                "eventOwnerTeamId": 18,
                "losingPlayerId": 8478519,
                "winningPlayerId": 8475158,
                "xCoord": 0,
                "yCoord": 0,
                "zoneCode": "N"
            }
        },
        {
            "eventId": 8,
            "periodDescriptor": {
                "number": 1,
                "periodType": "REG"
            },
            "timeInPeriod": "00:35",
            "timeRemaining": "19:25",
            "situationCode": "1551",
            "homeTeamDefendingSide": "left",
            "typeCode": 516,
            "typeDescKey": "stoppage",
            "sortOrder": 15,
            "details": {
                "reason": "icing"
            }
        },
        {
            "eventId": 103,
            "periodDescriptor": {
                "number": 1,
                "periodType": "REG"
            },
            "timeInPeriod": "00:35",
            "timeRemaining": "19:25",
            "situationCode": "1551",
            "homeTeamDefendingSide": "left",
            "typeCode": 502,
            "typeDescKey": "faceoff",
            "sortOrder": 17,
            "details": {
                "eventOwnerTeamId": 14,
                "losingPlayerId": 8476925,
                "winningPlayerId": 8478519,
                "xCoord": -69,
                "yCoord": 22,
                "zoneCode": "D"
            }
        },
        {
            "eventId": 9,
            "periodDescriptor": {
                "number": 1,
                "periodType": "REG"
            },
            "timeInPeriod": "00:48",
            "timeRemaining": "19:12",
            "situationCode": "1551",
            "homeTeamDefendingSide": "left",
            "typeCode": 503,
            "typeDescKey": "hit",
            "sortOrder": 20,
            "details": {
                "xCoord": 64,
                "yCoord": 42,
                "zoneCode": "D",
                "eventOwnerTeamId": 18,
                "hittingPlayerId": 8474568,
                "hitteePlayerId": 8476453
            }
        }
    ]
}
```

@HarryShomer
Owner

> Looking into it a bit, it looks like the "eventID" values in the json are no longer guaranteed to be in order (See snippet of json below).

@JB13 Thanks for the heads up. That's a bummer.

Looking at the JSON you provided, I wonder what "sortOrder" represents. It seems to increase with each subsequent event. Sorting on it might work, though I have no idea what the actual value represents.
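A minimal sketch of sorting on "sortOrder", assuming it reflects game sequence (the only evidence so far is that it increases event to event in the sample above). `order_by_sort_order` is a hypothetical helper; it also refuses duplicate values, since ties would bring back the same ambiguity as seconds_elapsed.

```python
def order_by_sort_order(plays):
    """Sort plays by the JSON "sortOrder" field, rejecting duplicate values.

    Assumption: sortOrder reflects the real event sequence. This is not
    documented; it is only observed to increase in the issue's sample.
    """
    ordered = sorted(plays, key=lambda p: p["sortOrder"])
    values = [p["sortOrder"] for p in ordered]
    if len(set(values)) != len(values):
        raise ValueError("duplicate sortOrder values; ordering is ambiguous")
    return ordered


# The five plays from the JSON above, deliberately shuffled.
plays = [
    {"eventId": 9, "sortOrder": 20, "typeDescKey": "hit"},
    {"eventId": 101, "sortOrder": 9, "typeDescKey": "faceoff"},
    {"eventId": 103, "sortOrder": 17, "typeDescKey": "faceoff"},
    {"eventId": 102, "sortOrder": 8, "typeDescKey": "period-start"},
    {"eventId": 8, "sortOrder": 15, "typeDescKey": "stoppage"},
]

print([p["eventId"] for p in order_by_sort_order(plays)])  # [102, 101, 8, 103, 9]
```

On this sample the result matches the order the API returned, including the period-start before the opening faceoff that a sort on eventId gets backwards.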
