Convert ticket v3 HTML to JSON tickets #22

Open
wants to merge 1 commit into
base: main
18 changes: 14 additions & 4 deletions lidlplus/api.py
@@ -7,6 +7,7 @@
 import logging
 import re
 from datetime import datetime, timedelta
+from json import JSONDecodeError

 import requests

@@ -16,6 +17,7 @@
     LegalTermsException,
     MissingLogin,
 )
+from lidlplus.html_receipt import parse_html_receipt

 try:
     from getuseragent import UserAgent
@@ -39,7 +41,7 @@ class LidlPlusApi:

     _CLIENT_ID = "LidlPlusNativeClient"
     _AUTH_API = "https://accounts.lidl.com"
-    _TICKET_API = "https://tickets.lidlplus.com/api/v2"
+    _TICKET_API = "https://tickets.lidlplus.com/api"
     _COUPONS_API = "https://coupons.lidlplus.com/api"
     _COUPONS_V1_API = "https://coupons.lidlplus.com/app/api/"
     _PROFILE_API = "https://profile.lidlplus.com/profile/api"
@@ -257,7 +259,7 @@ def tickets(self, only_favorite=False):
             If set to False (the default), all tickets will be retrieved.
         :type onlyFavorite: bool
         """
-        url = f"{self._TICKET_API}/{self._country}/tickets"
+        url = f"{self._TICKET_API}/v2/{self._country}/tickets"
         kwargs = {"headers": self._default_headers(), "timeout": self._TIMEOUT}
         ticket = requests.get(f"{url}?pageNumber=1&onlyFavorite={only_favorite}", **kwargs).json()
         tickets = ticket["tickets"]
@@ -268,8 +270,16 @@ def tickets(self, only_favorite=False):
     def ticket(self, ticket_id):
         """Get full data of single ticket by id"""
         kwargs = {"headers": self._default_headers(), "timeout": self._TIMEOUT}
-        url = f"{self._TICKET_API}/{self._country}/tickets"
-        return requests.get(f"{url}/{ticket_id}", **kwargs).json()
+        url = f"{self._TICKET_API}/v2/{self._country}/tickets/{ticket_id}"
+        try:
+            return requests.get(url, **kwargs).json()
+        except JSONDecodeError:
+            url = f"{self._TICKET_API}/v3/{self._country}/tickets/{ticket_id}"
+            receipt_json = requests.get(url, **kwargs).json()
+            return parse_html_receipt(
+                date=receipt_json["date"],
+                html_receipt=receipt_json["htmlPrintedReceipt"],
+            )

     def coupon_promotions_v1(self):
         """Get list of all coupons API V1"""
65 changes: 65 additions & 0 deletions lidlplus/html_receipt.py
@@ -0,0 +1,65 @@
+from typing import Any
+import re
+
+import lxml.html as html
+
+
+VAT_TYPE_LINE_ENDING_PATTERN = re.compile(r" [A-Z]$")
+
+
+def parse_html_receipt(date: str, html_receipt: str) -> dict[str, Any]:
+    dom = html.document_fromstring(html_receipt)
+
+    receipt = {
+        "date": date,
+        "itemsLine": [],
+    }
+    for node in dom.xpath(r".//span[starts-with(@id, 'purchase_list_line_')]"):
+        if "class" not in node.attrib:
+            if not VAT_TYPE_LINE_ENDING_PATTERN.search(node.text):
+                continue
+
+            *name_parts, price = node.text[:-2].split()
+            receipt["itemsLine"].append(
+                {
+                    "name": " ".join(name_parts),
+                    "originalAmount": price,
+                    "isWeight": True,
+                    "discounts": [],
+                }
+            )
+        elif node.attrib["class"] == "currency":
+            receipt["currency"] = {"code": node.text.strip(), "symbol": node.attrib["data-currency"]}
+        elif node.attrib["class"] == "article":
+            if node.text.startswith(" "):
+                continue
+
+            quantity_text = node.get("data-art-quantity")
+            if quantity_text is None:
+                is_weight = False
+                quantity = 1
+            elif "," in quantity_text:
+                is_weight = True
+                quantity = quantity_text
+            else:
+                is_weight = False
+                quantity = quantity_text
+
+            receipt["itemsLine"].append(
+                {
+                    "name": node.attrib["data-art-description"],
+                    "currentUnitPrice": node.attrib["data-unit-price"],
+                    "isWeight": is_weight,
+                    "quantity": quantity,
+                    "discounts": [],
+                }
+            )
+        elif node.attrib["class"] == "discount":
+            discount = abs(parse_float(node.text.split()[-1]))

This throws an IndexError when I run it, because some of the span elements contain only whitespace, so node.text.split() returns an empty list.

Here's the HTML on my receipt:

<span id="purchase_list_line_3" class="discount css_bold" data-promotion-id="promo_id">   Coupon Plus reward</span>
<span id="purchase_list_line_3" class="discount" data-promotion-id="promo_id">      </span>
<span id="purchase_list_line_3" class="discount" data-promotion-id="promo_id">     </span>
<span id="purchase_list_line_3" class="discount css_bold" data-promotion-id="promo_id">-0.69</span>
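
A minimal reproduction of the failure described above, using plain strings in place of the parsed span text. The names `span_texts` and `last_token` are illustrative and not part of the PR; only the `split()[-1]` expression is taken from it.

```python
# Text content of the four spans from the receipt above.
span_texts = ["   Coupon Plus reward", "      ", "     ", "-0.69"]


def last_token(text: str) -> str:
    # Mirrors the expression used in the PR: node.text.split()[-1]
    return text.split()[-1]


results = []
for text in span_texts:
    try:
        results.append(last_token(text))
    except IndexError:
        # str.split() without arguments discards all whitespace, so a
        # whitespace-only string yields an empty list and [-1] fails.
        results.append(None)
```

The two whitespace-only spans are the ones that raise, which matches the report.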

Author

Man, they just can't be consistent... Thank you for providing another data point with which we can figure out all the formats they use for this! I'll try to implement the format you provided once I have time to actually do this though... You can always suggest changes and I'll be happy to use them of course.

For now I'm thinking a regex searching for something like -\d+[\.,]\d{2}$ would be best.
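
One way the regex idea could look, as a rough sketch only. The pattern is the one proposed in the comment above; the names `DISCOUNT_AMOUNT_PATTERN` and `find_discount_amount`, and the decision to strip the text first, are assumptions made for illustration.

```python
import re
from typing import Optional

# Pattern from the comment: minus sign, digits, a dot or comma,
# and exactly two decimals at the end of the text.
DISCOUNT_AMOUNT_PATTERN = re.compile(r"-\d+[.,]\d{2}$")


def find_discount_amount(text: str) -> Optional[float]:
    """Return the absolute discount amount, or None if the text has none."""
    # Stripping is an assumption: it lets trailing whitespace be ignored.
    match = DISCOUNT_AMOUNT_PATTERN.search(text.strip())
    if match is None:
        return None
    return abs(float(match.group().replace(",", ".")))
```

With this approach, label-only and whitespace-only spans return None instead of raising, so the caller can skip them.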

Btw, shouldn't the code currently fail when parsing the word "reward" on the first line as a float, rather than with the whitespace-split IndexError you're describing?

No rush to implement this. I would have done it myself but I was hesitant to suggest changes cause I'm still trying to understand what's happening. I will for sure once I get more familiar with the project. :)

I think because the class of the first line is discount css_bold instead of just discount it's not parsed at all.
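
To illustrate the point: the PR compares the whole class attribute with `==`, so a value carrying a second token never matches the discount branch. A small sketch with hypothetical helper names (neither function exists in the PR):

```python
def is_exact_discount(class_attr: str) -> bool:
    # The comparison used in the PR: the whole attribute must equal "discount".
    return class_attr == "discount"


def has_discount_class(class_attr: str) -> bool:
    # Token-based alternative: treat the attribute as a space-separated list.
    return "discount" in class_attr.split()


bold_class = "discount css_bold"
exact_match = is_exact_discount(bold_class)   # the bold span is skipped
token_match = has_discount_class(bold_class)  # the bold span is recognised
```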

So, does the HTML differ from country to country? Or does it depend on the coupon you use and whether it's a percentage/flat discount? That's the only receipt with a coupon I have so that's my only data point :/

Author

> I was hesitant to suggest changes cause I'm still trying to understand what's happening. I will for sure once I get more familiar with the project.

No worries :) This is not even that tied to this specific project, but more to the actual Lidl API, since AFAIK there's no public documentation for it and people just somehow reverse-engineered it.

> I think because the class of the first line is discount css_bold instead of just discount it's not parsed at all.

Right, of course, missed that.

> So, does the HTML differ from country to country? Or does it depend on the coupon you use and whether it's a percentage/flat discount?

There's definitely some difference, for some reason. See diogotcorreia/lidl-to-grocy's lidl/test/receipt.html. I think it uses the same format that I saw and implemented in this PR. It's weird that your receipt is different, but we'll probably have to implement common parsing for all possible formats. I'm currently blocked on #23, though, so I can't verify whether anything changed recently in my receipts, but in my Lidl Plus Android app I don't see any discounts in bold, as you probably would.


We could probably do something like this to support both formats:

if ...:
    ...
elif {"discount", "css_bold"}.issubset(node.attrib["class"].split()) and try_parse_float(node.text):
    ...
elif node.attrib["class"] == "discount":
    ...
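
The suggestion above relies on a `try_parse_float` helper that the PR does not define. A possible shape for it, purely as an assumption, together with the class-token check; `is_bold_discount_amount` is also a made-up name for illustration.

```python
from typing import Optional


def try_parse_float(text: Optional[str]) -> Optional[float]:
    """Hypothetical helper: parse "1,23" or "-0.69", returning None on failure."""
    if text is None:
        return None
    try:
        return float(text.strip().replace(",", "."))
    except ValueError:
        # Labels such as "Coupon Plus reward" and whitespace-only text end up here.
        return None


def is_bold_discount_amount(class_attr: str, text: Optional[str]) -> bool:
    # Both class tokens must be present and the text must parse as a number.
    # Comparing against None keeps an amount of 0.0 from being treated as "no match".
    tokens = set(class_attr.split())
    return {"discount", "css_bold"}.issubset(tokens) and try_parse_float(text) is not None
```

One design note: the condition as written in the comment uses the parsed value's truthiness, so a parsed `0.0` would be rejected; the explicit `is not None` check here avoids that, if it matters for real receipts.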

+            receipt["itemsLine"][-1]["discounts"].append({"amount": str(discount).replace(".", ",")})
+
+    return receipt
+
+
+def parse_float(text: str) -> float:
+    return float(text.replace(",", "."))
3 changes: 2 additions & 1 deletion requirements.txt
@@ -3,4 +3,5 @@ oic>=1.4.0
 requests>=2.28.1
 selenium-wire>=5.1.0
 webdriver-manager>=3.8.5
-blinker==1.7.0
+blinker==1.7.0
+lxml>=5.3.0