Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite topology tools with Construct to improve the readability and maintainability #755

Merged
merged 5 commits into from
Sep 24, 2021

Conversation

ghost
Copy link

@ghost ghost commented Aug 23, 2021

I'd like to improve the topology parser to get more info like the widget core, but the legacy code contains too many magic number which is hard to understand and maintain. Here I will rewrite the topology tools with Construct which is a powerful declarative and symmetrical parser and builder for binary data to make the code more maintainable.

Signed-off-by: Yongan Lu yongan.lu@intel.com

@ghost ghost requested a review from aiChaoSONG August 23, 2021 05:59
@plbossart
Copy link
Member

@marc-hb @mengdonglin are you good with the added dependency on a package?

@marc-hb
Copy link
Collaborator

marc-hb commented Aug 23, 2021

I agree one should never re-invent the wheel and implement a binary parser without relying on some library like Construct. On the other hand:

@ranj063
Copy link
Contributor

ranj063 commented Aug 23, 2021

There's a decoder in alsa-utils (-d switch with alsatplg IIRC). We should look into using it instead of depending on an external package

@marc-hb
Copy link
Collaborator

marc-hb commented Aug 23, 2021

There's a decoder in alsa-utils (-d switch with alsatplg IIRC).

Is it in C? This PR proposes to "upgrade" our manual Python implementation to a higher level approach in order to reduce code size and maintenance. Downgrading to a low-level, memory management/corruption language would go the opposite direction. Python is slow but I don't think parsing topologies is in any critical path for us.

Is alsa-utils a C-only project? If yes then that would justify keeping an implementation in a higher level language here for the lack of a better place.

@ghost ghost force-pushed the tplgtool-ng branch 4 times, most recently from 5a7125b to 5d11bc7 Compare August 24, 2021 05:48
@ghost
Copy link
Author

ghost commented Aug 24, 2021

Hi @plbossart, I'm trying to use pip to install the dependencies in GitHub Actions because apt seems to install the packages for Python whose path is different than setup-python use. You can see my modification in abf226d.

@ghost ghost changed the title [WIP] Rewrite topology tools with Construct to improve the readability and maintainability [WIP][SKIP CI] Rewrite topology tools with Construct to improve the readability and maintainability Aug 24, 2021
@marc-hb
Copy link
Collaborator

marc-hb commented Aug 24, 2021

pip versus apt is an important question but it's a premature question. A much more important question is: "Does SOF validation goes down when Construct goes down?"

@marc-hb
Copy link
Collaborator

marc-hb commented Aug 24, 2021

but the legacy code contains too many magic number which is hard to understand and maintain.

Naming magic numbers does not require a new library. BTW I see the existing parser already uses https://docs.python.org/3/library/struct.html extensively (which will never "go down")

Here I will rewrite the topology tools with Construct which is a powerful declarative and symmetrical parser and builder for binary data to make the code more maintainable.

How much more maintainable? Can you please share some expected numbers like lines of code and examples of repetitive code not needed any more, number of classes and functions, cyclomatic complexity, ease of unit testing, error message clarity and any other metric of your choice. No precise number, just expected orders of magnitude (if there's no order of magnitude difference then it's probably not worth it)

@ghost
Copy link
Author

ghost commented Aug 24, 2021

Hi @marc-hb, thanks for your feedback, and sorry for my late reply because I spent some time collecting information.

  • Can you explain why Construct is the best choice as opposed to for instance Kaitai? As noted by @plbossart, tying our fate to Construct's is not a decision that should be made lightly. How popular and actively maintained is Construct?

I think Kaitai could be a good choice if we need to reuse the parser in different languages. But what we need here is just a topology parser in Python and Kaitai have more heavy dependency and complex workflow (we describe the binary format in a ksy file and need the compiler to translate Kaitai Struct language to Python). Construct works like an embedded DSL, so we write everything in Python and we can just run pip install construct for the requirement. I only find Construct as a stable, maintained actively, self-contained, declarative binary data parser in PyPI, so I believe this is the best choice. I think Construct is the most popular declarative binary data parser in Python (libraries.io) and it's still actively maintained this year.

Sorry for my misleading words, what we actually make is a topology tool for testing and debuging, and parsing topology is just the first step. With this tool, we can not only dump some topology information but also render a topology graph. As #471 (comment) said, some parameters can only be extracted from topology, and what I want to do is just a replacement for the origin topology tool. Moreover, writing the parser ourselves allow us to parse some private data like schedule core, so it's a more flexible way than an official tool. Also, alsatplg -d can decode a binary topology file to the configuration file, but it just moves the problem from parsing a binary file to parsing a configuration file.

@marc-hb
Copy link
Collaborator

marc-hb commented Aug 24, 2021

Thanks for the detailed explanations, very useful and convincing. I think what could finish convincing me is a before/after comparison on a small sample bit of code.

but it just moves the problem from parsing a binary file to parsing a configuration file.

OK but parsing text is much lower maintenance than binaries and there is probably a wider choice of libraries to help with that.

@ghost
Copy link
Author

ghost commented Aug 24, 2021

How much more maintainable? Can you please share some expected numbers like lines of code and examples of repetitive code not needed any more, number of classes and functions, cyclomatic complexity, ease of unit testing, error message clarity and any other metric of your choice. No precise number, just expected orders of magnitude (if there's no order of magnitude difference then it's probably not worth it)

I'm not familiar with those metrics, so I'd like to give an example in an intuitive way:

For parsing PCM struct, the legacy code:

sof-test/tools/tplgtool.py

Lines 335 to 379 in 5bb6679

def _parse_pcm_struct(self, bytes_data):
pcm_fields = ["size", "pcm_name", "dai_name", "pcm_id", "dai_id", "playback", "capture",
"compress", "stream", "num_streams", "caps", "flag_mask", "flags", "priv"]
values = []
values.append(struct.unpack("I",bytes_data[:4])[0])
values.append(self._parse_char_array(bytes_data[4:48]))
values.append(self._parse_char_array(bytes_data[48:92]))
for i in list(struct.iter_unpack("I", bytes_data[92:112])):
values.append(i[0])
# parse snd_soc_tplg_stream array
stream_list = []
tplg_stream_size = 72 # tplg_stream size is 72
stream_data = bytes_data[112:688]
for idx in range(8):
stream_start = tplg_stream_size * idx
stream_list.append(self._parse_stream_struct(stream_data[stream_start: stream_start + 72]))
values.append(stream_list)
values.append(struct.unpack("I", bytes_data[688:692])[0])
# parse snc_soc_tplg_stream_caps
stream_cap_list = []
tplg_stream_caps_size = 104
stream_cap_data = bytes_data[692:900]
for idx in range(2):
start = tplg_stream_caps_size * idx
stream_cap_list.append(self._parse_stream_cap_struct(stream_cap_data[start:start+tplg_stream_caps_size]))
values.append(stream_cap_list)
values.append(struct.unpack("I", bytes_data[900:904])[0])
values.append(struct.unpack("I", bytes_data[904:908])[0])
priv_size = struct.unpack("I", bytes_data[908:912])[0]
priv = {"size":priv_size}
if priv_size == 0:
priv["data"] = None
else :
priv["data"] = bytes_data[908:908+priv_size]
values.append(priv)
pcm = dict(zip(pcm_fields, values))
if len(bytes_data[912 + priv_size -1:]) < 4:
return pcm, None
return pcm, bytes_data[912+priv_size:]

it contains many magic numbers as field offsets. If there is any change, we need to reevaluate those offsets and hope we won't miscalculate them in some steps.

My new code with Construct:

self._pcm = Struct( # snd_soc_tplg_pcm
"size" / Int32ul,
"pcm_name" / PaddedString(self._id_name_maxlen, "ascii"),
"dai_name" / PaddedString(self._id_name_maxlen, "ascii"),
"pcm_id" / Int32ul,
"dai_id" / Int32ul,
"playback" / Int32ul,
"capture" / Int32ul,
"compress" / Int32ul,
"stream" / Array(self._stream_config_max, self._stream),
"num_streams" / Int32ul,
"caps" / Array(2, self._stream_caps),
"flag_mask" / Int32ul,
"flags" / FlagsEnum(Int32ul, DaiLnkFlag),
"priv" / self._private,
)

it is simply corresponding to struct snd_soc_tplg_pcm in alsa-lib. If something changed, just keep the mapping rule and everything will work fine.

OK but parsing text is much lower maintenance than binaries and there is probably a wider choice of libraries to help with that.

Well, I don't think parsing binary is much higher maintenance than text if we accept Construct. And I find the decoded configuration file seems lost some information or may need another step to retrieve them. So I think it could be a choice but I do not appreciate it.

@marc-hb
Copy link
Collaborator

marc-hb commented Aug 30, 2021

Thanks for the example. I don't think it's a very fair example though: the reason why there are hardcoded offsets everywhere right now is not really because of the difference between struct and construct but because the author didn't really understand how to use struct and used extremely short formats, often just one character long!

However I agree that construct code looks more readable in general. Also, if someone (you) makes the effort to fix the currently pretty bad code then he might as well go the extra mile and switch to a different and more readable solution.

So +1 from me for the Construct dependency. It looks like a good project. (I briefly looked at but haven't reviewed the code in this PR)

@ghost ghost force-pushed the tplgtool-ng branch 4 times, most recently from 47d5d76 to 531371a Compare August 31, 2021 02:55
@ghost ghost force-pushed the tplgtool-ng branch 2 times, most recently from 3abf82b to 6ff8401 Compare September 1, 2021 08:12
marc-hb
marc-hb previously approved these changes Sep 1, 2021
.github/workflows/pull_request.yml Outdated Show resolved Hide resolved
.github/workflows/pull_request.yml Outdated Show resolved Hide resolved
@ghost ghost force-pushed the tplgtool-ng branch 2 times, most recently from 082457a to a6e20cf Compare September 14, 2021 09:03
@ghost
Copy link
Author

ghost commented Sep 14, 2021

All of the pylint warnings for sof-tplgreader.py aren't introduced by me. I'm focusing on tplgtool.py, so I only make minimal changes to sof-tplgreader.py for compatibility.

@ghost ghost changed the title [WIP] Rewrite topology tools with Construct to improve the readability and maintainability Rewrite topology tools with Construct to improve the readability and maintainability Sep 14, 2021
@ghost ghost marked this pull request as ready for review September 14, 2021 09:56
@ghost ghost self-requested a review as a code owner September 14, 2021 09:56
Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the new code looks much more readable! Great effort. I'm surprised that it is not that much shorter but that's probably because of all the pydocs (thanks!) and also a lot more whitespace (now I'm a bit curious about the characters count)

My main issue is with the almost complete rewrite in place and "Big Bang" migration. Instead of replacing the existing files, can you please create two new files sof-tplgreader2.py and tplgtool2.py? Or any other better name you can think of. This will provide a smooth transition period where anyone or anything stuck with a bug or small difference in the new tool can report the bug or incompatibility and then very quickly, locally and temporarily revert back to the old one to unblock themselves.

I checked and there are fewer than 15 references to the old names in sof-test. They don't need to be switched all at the same time: Continuous Integration.

The old scripts can be deleted later. This will break any references outside sof-test that we failed to spot but by that time the new scripts will have been well tested and all new bugs and most incompatibilities squashed.

To not lose the git history of sof-tplgreader.py you can have a very first commit that merely copies to sof-tplgreader2.py without changing it yet, this helps git follow the history.

.github/workflows/pull_request.yml Outdated Show resolved Hide resolved
.github/workflows/pull_request.yml Outdated Show resolved Hide resolved
@ghost
Copy link
Author

ghost commented Sep 15, 2021

@marc-hb Creating new files for smooth migration is a good idea. Because I'm focusing on new tplgtool in this PR and @aiChaoSONG encourage me to rewrite sof-tplgreader.py too in the future, I think we can leave sof-tplgreader.py unchanged this time and postpone the migration for sof-tplgreader.py until rewriting it. For a smooth transition, we can start to draw topology graphs with the new tplgtool2.py in our CI.

@ghost ghost requested a review from marc-hb September 15, 2021 07:39
@ghost ghost force-pushed the tplgtool-ng branch 2 times, most recently from 3c0d446 to a33d8de Compare September 16, 2021 07:36
* python3-graphviz for drawing topology graph
* python3-construct for parsing topology file

Signed-off-by: Yongan Lu <yongan.lu@intel.com>
contruct is a new dependency for topology parsing.

Signed-off-by: Yongan Lu <yongan.lu@intel.com>
aiChaoSONG
aiChaoSONG previously approved these changes Sep 24, 2021
Copy link

@aiChaoSONG aiChaoSONG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes or explanations required, let's use duplex instead of both

@aiChaoSONG
Copy link

Thanks, @YongAnLu00 for the tool, let's merge the tool, let's do bug fixes and features in further PR

@aiChaoSONG aiChaoSONG merged commit 530d8eb into thesofproject:main Sep 24, 2021
@ghost ghost deleted the tplgtool-ng branch September 24, 2021 08:10
1. write a declarative topology file parser with Construct
2. TplgFormatter in legacy code is confusing, split its
functions into more explicit parts: GroupedTplg and TplgGraph
2.1 GroupedTplg works like a simple wrapper to access different
types of blocks more conveniently
2.2 TplgGraph build components graph for drawing and searching
components through graph

Signed-off-by: Yongan Lu <yongan.lu@intel.com>
deprecate legacy tplgtool.py and recommend user migrate to
tplgtool2.py

Signed-off-by: Yongan Lu <yongan.lu@intel.com>
Add new dependency info and tplgtool2.py description to README

Signed-off-by: Yongan Lu <yongan.lu@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants