Convert FlexT descriptions into Kaitai ones #292

KOLANICH · 2017-11-15T09:40:19Z

I've found a web page for one more language for creating binary parsers. It also has its own library of descriptions. Some of them are missing from kaitai_struct_formats, for example SQLite.
http://geos0.icc.ru/scripts/WWWBinV.dll/Cat

In an e-mail conversation Alexei Hmelnov stated that one may assume that all the descriptions are under the MIT license and that he is going to add a license to the site when he has time.

Here is the script to download the formats from the site:

#!/usr/bin/env python3
import json  # don't optionally replace with ujson, serializes differently!
import os
from collections import OrderedDict
from pathlib import Path

import bs4
import dateutil.parser
import ratelimit  # https://github.com/tomasbasham/ratelimit
from tqdm import tqdm

try:
	import httpx
except ImportError:
	import requests as httpx

base = "http://geos0.icc.ru"
catalog = base + "/scripts/WWWBinV.dll/Cat"


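def getCharset(soup: bs4.BeautifulSoup, default: str = "utf-8") -> str:
	# Look for <meta http-equiv="Content-Type" content="text/html; charset=...">
	el = soup.select_one("head > meta[http-equiv=Content-Type]")
	if el:
		for part in el.get("content", "").split(";"):
			if "=" in part:
				part = part.split("=")
				# lower() has to be called; parameters usually follow "; ", so strip whitespace too
				if len(part) > 1 and part[0].strip().lower() == "charset":
					return part[1].strip()
	return default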


def bin2soup(data: bytes) -> bs4.BeautifulSoup:
	# First pass: assume windows-1251 so that the <meta> tag can be read at all
	enc = "windows-1251"
	text = data.decode(encoding=enc, errors="replace")
	soup = bs4.BeautifulSoup(text, "html5lib")
	# Second pass: decode again if the page declares a different charset
	enc1 = getCharset(soup, enc)
	if enc != enc1.lower():
		text = data.decode(encoding=enc1)
		soup = bs4.BeautifulSoup(text, "html5lib")
	return soup


def buildIndex(targetDir: Path) -> OrderedDict:
	catalogCacheFile = targetDir / "Cat"
	if not catalogCacheFile.is_file():
		catalogTextEncoded = httpx.get(catalog).content
		catalogCacheFile.write_bytes(catalogTextEncoded)
	else:
		print("Index source is already present. Delete " + str(catalogCacheFile) + " to regenerate")
		catalogTextEncoded = catalogCacheFile.read_bytes()

	parsed = bin2soup(catalogTextEncoded)
	table = parsed.select_one("table")
	rows = table.select("tr")
	header = [el.text.strip().lower() for el in rows[0].select("td")]

	rows = rows[1:]
	rowsRes = OrderedDict()
	for row in rows:
		rowRes = OrderedDict(zip(header, [el.text.strip() for el in row.select("td")]))
		# rowRes["date"]=dateutil.parser.parse(rowRes["date"])
		rowRes["uri"] = base + row.select_one("a[href]")["href"]

		cat = rowsRes
		if rowRes["class"] not in cat:
			cat[rowRes["class"]] = {}
		cat = cat[rowRes["class"]]
		if rowRes["status"] not in cat:
			cat[rowRes["status"]] = {}
		cat = cat[rowRes["status"]]

		cat[rowRes["file"]] = rowRes
	return rowsRes


def writeSource(soup: bs4.BeautifulSoup, fileName: Path) -> None:
	meta = {}
	for el in soup.select("head > meta"):
		if "name" in el.attrs:
			meta[el.attrs["name"]] = el.attrs["content"]
	metaStr = soup.select_one("font").text
	source = soup.select_one("pre").text

	source = "% " + metaStr + "\n" + "% " + json.dumps(meta) + "\n\n" + source

	with fileName.open("wt", encoding="utf-8") as f:
		f.write(source)


@ratelimit.rate_limited(period=2)
def downloadFormat(uri: str, path: Path) -> None:
	req = httpx.get(uri)
	soup = bin2soup(req.content)
	writeSource(soup, path)


def downloadFormats(index: OrderedDict, targetDir: Path) -> None:
	for clsName, cls in tqdm(index.items(), desc="Classes"):
		clsPath = targetDir / clsName.replace(".", "").replace("/", "").replace("\\", "")
		for statusName, status in tqdm(cls.items(), desc="Statuses in " + clsName):
			statusPath = clsPath / statusName.replace(".", "").replace("/", "").replace("\\", "")
			os.makedirs(str(statusPath), mode=0o771, exist_ok=True)
			for formatName, formatDescr in tqdm(status.items(), desc="Formats in " + statusName):
				formatPath = statusPath / formatName.replace("/", "").replace("\\", "")
				tqdm.write("downloading : " + formatDescr["uri"] + " -> " + str(formatPath))
				downloadFormat(formatDescr["uri"], formatPath)


def main() -> None:
	targetDir = Path(".")
	indexCacheFile = targetDir / "cat.json"
	if not indexCacheFile.is_file():
		# Build the index before opening the cache file, so a failed download does not leave an empty cat.json behind
		index = buildIndex(targetDir)
		with indexCacheFile.open("wt", encoding="utf-8") as f:
			json.dump(index, f, indent="\t")
	else:
		print("Index is already present. Delete " + str(indexCacheFile) + " to regenerate")
		with indexCacheFile.open("rt", encoding="utf-8") as f:
			index = json.load(f)
	downloadFormats(index, targetDir)


if __name__ == "__main__":
	main()

He has also provided me with a formal grammar of his language.
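
The charset lookup that `getCharset` performs on the `content` attribute can be checked in isolation. Below is a minimal stdlib-only sketch, assuming the attribute has the usual `text/html; charset=...` shape. `charset_from_content_type` is a hypothetical helper name and is not part of the script above.

```python
def charset_from_content_type(content: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value, or `default`."""
    for part in content.split(";"):
        key, sep, value = part.partition("=")
        # Parameters are usually written after "; ", so strip before comparing.
        if sep and key.strip().lower() == "charset":
            return value.strip().strip("\"'")
    return default


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=windows-1251"))
```

The comparison is done on the stripped, lowercased key, so `Charset=` and a leading space are both accepted.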

GreyCat · 2017-11-16T10:53:50Z

Thanks, that's another really interesting gem that I had no idea about! Looks like the project is kind of abandoned, though :(

> for example SQLite.

Hmm, actually I recall doing a pretty convincing implementation of SQLite... Have I forgotten to commit it?..

> In an e-mail conversation Alexei Hmelnov stated that one may assume that all the descriptions are under the MIT license and that he is going to add a license to the site when he has time.

Yeah, I've got your e-mail, thanks! Confirming his intention to make it available under an MIT-like license as well.

AFAIU, Alexei used Coco/R for grammar description, which technically can generate parsers for Python or Ruby, so probably it won't be that hard to create a script that would do the conversion.

GreyCat · 2017-11-16T11:03:19Z

set byteorder rev
set byteorder norm

/me facepalms

KOLANICH · 2017-11-16T17:47:44Z

@GreyCat, I have already found that it is a CoCo/R grammar, but there are some issues with the Python generator and/or the grammar file itself.

The generator says that the grammar file is invalid. In fact, it looks like it really is invalid for vanilla CoCo/R: for example, it uses ) ( instead of } { and identifier* as in regexps instead of CoCo/R's {identifier}.
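
One of these rewrites, turning the regexp-style `identifier*` into CoCo/R's `{identifier}`, could be sketched roughly as follows. This is only an illustration and not the actual fix that was applied: the helper name and the sample production are made up, and the naive regex does not skip string literals or comments, so it would also rewrite a `*` that follows a word inside a quoted token.

```python
import re

# An identifier immediately followed by `*`, e.g. `Statement*`.
STAR_REPETITION = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\*")


def star_to_braces(production: str) -> str:
    """Rewrite `name*` as `{name}` (zero-or-more repetition in CoCo/R EBNF)."""
    return STAR_REPETITION.sub(r"{\1}", production)


if __name__ == "__main__":
    # Made-up production, only to show the shape of the rewrite.
    print(star_to_braces("Block = Statement* End ."))
```

A more careful version would tokenize the grammar file first and rewrite only outside of literals.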

I have replaced these with valid CoCo/R, but it still fails in another place. I'm not very familiar with CoCo/R, but another grammar with the same tokens works fine. Maybe there is still something wrong in this grammar, but I won't be sure until I finish fixing the generator so that it can run the test suite.

GreyCat · 2017-11-16T17:51:53Z

Well, maybe manual conversion would be more feasible in this case. Perhaps we should make a list of the formats available there that are not yet available for KS? Quite a few of these formats are already available in our repo.

KOLANICH · 2017-11-16T18:07:21Z

The script in the first post does create the list. There are too many formats we don't have, so I guess we need a tool.
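
For reference, the index that the script stores in `cat.json` is nested as class, then status, then file name. A rough sketch of turning it into a list of formats that KS lacks could look like this. The sample entries and the `known` set are invented for illustration, and matching by lowercased file name is an assumption; real names would likely need a manual mapping.

```python
from typing import Dict, Iterator, List, Set, Tuple

# class name -> status -> file name -> row metadata, as built by buildIndex()
Index = Dict[str, Dict[str, Dict[str, dict]]]


def iter_formats(index: Index) -> Iterator[Tuple[str, str, str]]:
    """Yield (class, status, file) for every entry in the nested index."""
    for cls_name, statuses in index.items():
        for status_name, formats in statuses.items():
            for file_name in formats:
                yield cls_name, status_name, file_name


def missing_formats(index: Index, known: Set[str]) -> List[Tuple[str, str, str]]:
    """Entries whose lowercased file name is not in `known` (assumed matching rule)."""
    return [entry for entry in iter_formats(index) if entry[2].lower() not in known]


if __name__ == "__main__":
    # Invented sample data; the real index comes from json.load() on cat.json.
    sample_index: Index = {
        "Databases": {"Draft": {"SQLITE": {"file": "SQLITE"}}},
        "Images": {"Ready": {"PNG": {"file": "PNG"}}},
    }
    for entry in missing_formats(sample_index, known={"png"}):
        print(" / ".join(entry))
```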

But manually converting the grammar to some actively maintained parser generator such as ply or parglare would be totally feasible if I can't make pyCoCo/R work.

KOLANICH · 2017-11-23T19:03:38Z

I failed to make CoCoPy compile that grammar. (CoCo/R is LL(1), but the grammar seems to use some extensions beyond LL(1) and some additional syntax; I haven't found any description or implementation of CoCo/R that uses these extensions.) I converted the grammar to a parglare one. (parglare claims to be the fastest parser generator for Python, but we know that CoCoPy is the fastest because it generates code and because it is LL(1); parglare generates LR and GLR parsers, and GLR is more powerful.) The grammar compiles but doesn't work even in GLR mode (it says that "\n" is expected) :(. It is damn slow even in LR mode. Needs further debugging.

NotWearingPants · 2023-06-03T19:45:11Z

The original link is broken (when you click a file type); here's a new one:
http://geos0.icc.ru/scripts/WWWBinV.dll/Cat

KOLANICH · 2023-06-03T23:27:18Z

Thanks, @NotWearingPants, for letting us know. Thanks, @generalmimon, for fixing.

@NotWearingPants, I have fixed some bugs in the download script too. If you want to work on the FlexT converter, https://github.com/KOLANICH-specs/kaitai_struct_formats/tree/FlexT/flext may be helpful. It is a temporary branch and should not be PRed as it is; it is just a scratch space for my convenience.

cugu added the related project label Sep 14, 2019

KOLANICH mentioned this issue Jun 9, 2020

Is there a way to convert the template of 010editor into Kaitai's #755

Open

treiher mentioned this issue Jul 14, 2020

FlexT Componolit/systematization-tools#51

Open
