Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial chars in readme misinterpreted as audio file #17108

Closed
epistemex opened this issue Sep 20, 2021 · 7 comments · Fixed by #23355
Closed

Initial chars in readme misinterpreted as audio file #17108

epistemex opened this issue Sep 20, 2021 · 7 comments · Fixed by #23355
Labels
issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented type/bug

Comments

@epistemex
Copy link

epistemex commented Sep 20, 2021

Gitea Version

1.15.2 built with GNU Make 4.1, go1.16.7

Git Version

not relevant

Operating System

Debian 10/x64

How are you running Gitea?

apt install / locally

Database

SQLite

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Description

When uploading a readme.md file starting with ID3 the file is mistaken for an MP3 audio file. Example content

ID3Toy
======

Read and write ID3 tags (v1.0, v1.1, v2.0, v2.3, v2.4), APE (v1, v2) and LYRICS 3 tags (v1, v2) in MP3 files.
...

(MP3 files may start with ID3 tag but shouldn't be used to identify such files to begin with, but that's another issue :) )

So my README.md files shows up as raw file:

2021-09-20_18-36

and clicking on the file line itself shows me the audio player:

2021-09-20_18-36_1

The content of the readme file is correct, verified and is just text (the example above is a direct copy from it). :)

Screenshots

Network activity snapshot

2021-09-20_18-44

Adding a new-line at the top of the file acts as a workaround and Gitea accepts the file properly as text/md file:

2021-09-20_19-47

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Sep 21, 2021

Just to explain: if the mime type is detected automatically from server side, every file starts with ID3 will be reported as audio file. Maybe some files' mime-types should be detected by other methods.

image

@axifive axifive added type/bug issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented labels Sep 21, 2021
@axifive
Copy link
Member

axifive commented Sep 21, 2021

it's http.DetectContentType() in renderFile() function

@wxiaoguang
Copy link
Contributor

Yep, typesniffer.DetectContentType and http.DetectContentType work like file command, it only checks some bytes of the file. To fix the "bug", the filenames should also be considered, some well-known filenames should have explicit mime-types, but it might need a refactor. Not sure if it's worthy to do so.

@silverwind
Copy link
Member

Seems like a bug in http.DetectContentType, I would report it to golang.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Sep 29, 2021

To be honest, I do not treat it as a bug. Because the mime auto-detectors just work as this behavior. You can not always correctly guess a file type by its content or extension name (eg: nginx, https://github.com/nginx/nginx/blob/master/conf/mime.types )

@delvh
Copy link
Member

delvh commented Sep 29, 2021

The problem is:
As @wxiaoguang mentioned, even the Linux command file produces in that circumstance an audio file. For me as well.
So, if both commonly used type detection systems (file and go-http.DetectContentType) behave the same here, I think it is not a bug and instead intended. This particular char sequence does not seem to be a good choice for the start of a file.
And I can somewhat understand why this needs to be done: audio files must be detected somehow. I simply wouldn't have guessed that such an easy string would be used as the identifier.

As remediation, try to prepend a before the leading chars. On my machine, file then shows ASCII text instead of an audio file.

I think that this issue can then be closed here because even if it is surprisingly a bug and not intended, there is nothing Gitea can or should do here.

@wxiaoguang
Copy link
Contributor

Will be fixed by: Fix ID3 content detection #23355

techknowlogick added a commit that referenced this issue Mar 8, 2023
Close #17108

This PR uses a trick (removing the ID3 tag) to detect the content again
to to see whether the content is text type.

---------

Co-authored-by: delvh <dev.lh@web.de>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
GiteaBot pushed a commit to GiteaBot/gitea that referenced this issue Mar 8, 2023
Close go-gitea#17108

This PR uses a trick (removing the ID3 tag) to detect the content again
to to see whether the content is text type.

---------

Co-authored-by: delvh <dev.lh@web.de>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
GiteaBot pushed a commit to GiteaBot/gitea that referenced this issue Mar 8, 2023
Close go-gitea#17108

This PR uses a trick (removing the ID3 tag) to detect the content again
to to see whether the content is text type.

---------

Co-authored-by: delvh <dev.lh@web.de>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
lunny added a commit that referenced this issue Mar 8, 2023
Backport #23355

Close #17108

This PR uses a trick (removing the ID3 tag) to detect the content again
to to see whether the content is text type.

Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Co-authored-by: delvh <dev.lh@web.de>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
@go-gitea go-gitea locked and limited conversation to collaborators May 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented type/bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants