mime.h
mime.c and mime.h are an existing module to support what trn loosely calls “MIME”, which includes the handling of HTML posts and the MIME-decoding aspect of non-ASCII text posts.
Trn’s HTML scanner is a sloppy string replacement engine similar to w3m’s but generally even sloppier (though in some cases it tries harder). It makes no attempt to, for example, render alt text or lay out tables; it does not detect decorative images or convert HTML links to plain text. But on the other hand, it tries to detect whether a BLOCKQUOTE should be rendered as an indent or as a USENET-style “>” attribution.
HTML is handled by scanning, not parsing, and it’s not very meaningful to talk about supported or unsupported tags. A small number of tags are handled (either in the filter_html() main loop or in tag_action()); unhandled tags are simply thrown out.
Level 2 scanning state is controlled by a bitmask that does not nest. Individual inner elements like TITLE or STYLE can be hidden, but you can’t mark HEAD as hidden: when TITLE or STYLE closes, the hidden flag (HF_IN_HIDING) is reset, so any text that follows inside HEAD would be displayed.
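
The non-nesting problem can be illustrated with a minimal sketch. The flag value and the `demo_` helpers below are hypothetical (the real HF_IN_HIDING value lives in mime.h and the real logic is in filter_html() and tag_action()); the point is only that a single bit cannot remember how many hidden elements are open.

```c
/* Hypothetical flag value for illustration; not the value used in mime.h. */
#define DEMO_HF_IN_HIDING 0x04

/* Opening a hidden element sets the bit. */
static int demo_open_hidden(int flags)
{
    return flags | DEMO_HF_IN_HIDING;
}

/* Closing ANY hidden element clears the bit, regardless of nesting depth. */
static int demo_close_hidden(int flags)
{
    return flags & ~DEMO_HF_IN_HIDING;
}

/* Simulates <HEAD><TITLE>...</TITLE> as if HEAD were also marked hidden,
 * and reports whether text after </TITLE> (still inside HEAD) is hidden.
 * Returns 1 if hidden, 0 if it would be displayed. */
static int demo_head_text_hidden(void)
{
    int flags = 0;
    flags = demo_open_hidden(flags);   /* <HEAD>  */
    flags = demo_open_hidden(flags);   /* <TITLE> */
    flags = demo_close_hidden(flags);  /* </TITLE> clears the only bit */
    return (flags & DEMO_HF_IN_HIDING) != 0;
}
```

In this sketch `demo_head_text_hidden()` returns 0: the text is no longer hidden even though HEAD has not closed. A depth counter instead of a single bit would be one way to support nesting, but that is not how the current code is described.
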
Type | Meaning |
---|---|
struct hblk | |
struct html_tags | Descriptor for a handled HTML tag. |
struct mimecap_entry | |
struct mime_sect |
Constant | Meaning |
---|---|
MSF_INLINE | |
MSF_ALTERNATIVE | |
MSF_ALTERNADONE |
Bitmask | Meaning |
---|---|
HF_IN_TAG | Currently inside a tag |
HF_IN_COMMENT | Currently inside a comment (within a tag) |
HF_IN_HIDING | Any #text found should not be displayed |
HF_IN_PRE | Currently within a PRE element |
HF_IN_DQUOTE | Currently inside double quotes (within a tag) |
HF_IN_SQUOTE | Currently inside single quotes (within a tag) |
HF_QUEUED_P | |
HF_P_OK | |
HF_QUEUED_NL | |
HF_NL_OK | |
HF_NEED_INDENT | |
HF_SPACE_OK | |
HF_COMPACT |
Constant | Meaning |
---|---|
HTML_MAX_BLOCKS |
Bitmask | Meaning |
---|---|
TF_BLOCK | |
TF_HAS_CLOSE | |
TF_NL | |
TF_P | |
TF_BR | |
TF_LIST | |
TF_HIDE | |
TF_SPACE | |
TF_TAB |
These must match tagattr below.
Constant | Meaning |
---|---|
TAG_BLOCKQUOTE | The BLOCKQUOTE tag. |
TAG_BR | The BR tag. |
TAG_DIV | The DIV tag. |
TAG_HR | The HR tag. |
TAG_IMG | The IMG tag. |
TAG_LI | The LI tag. |
TAG_OL | The OL tag. |
TAG_P | The P tag. |
TAG_PRE | The PRE tag. |
TAG_SCRIPT | The SCRIPT tag. |
TAG_STYLE | The STYLE tag. |
TAG_TD | The TD tag. |
TAG_TH | The TH tag. |
TAG_TR | The TR tag. |
TAG_TITLE | The TITLE tag. |
TAG_UL | The UL tag. |
TAG_XML | The XML tag (non-standard). |
LAST_TAG | Total number of handled tags. |
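
Because the TAG_* constants are used as indices into tagattr, the two lists have to stay in the same order and have the same length. The following is a sketch of that relationship with hypothetical `demo_` names and a three-tag subset; the struct layout is an assumption, not the real struct html_tags. The `_Static_assert` (C11) only catches a length mismatch, not a reordering.

```c
#include <stddef.h>

/* Hypothetical subset of the tag constants; the last entry counts the
 * handled tags, in the same way LAST_TAG is described above. */
enum demo_tag_id {
    DEMO_TAG_BLOCKQUOTE,
    DEMO_TAG_BR,
    DEMO_TAG_P,
    DEMO_LAST_TAG
};

/* Assumed descriptor layout, for illustration only. */
struct demo_html_tags {
    const char *name;
    int flags;
};

/* Entries must appear in the same order as enum demo_tag_id. */
static const struct demo_html_tags demo_tagattr[] = {
    { "blockquote", 0 },
    { "br",         0 },
    { "p",          0 },
};

/* Compile-time check that the table and the enum have the same length. */
_Static_assert(sizeof demo_tagattr / sizeof demo_tagattr[0] == DEMO_LAST_TAG,
               "demo_tagattr must have one entry per demo_tag_id");

/* Looks up a tag name by constant; returns NULL for out-of-range values. */
static const char *demo_tag_name(enum demo_tag_id id)
{
    if ((int)id < 0 || (int)id >= DEMO_LAST_TAG)
        return NULL;
    return demo_tagattr[id].name;
}
```
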
Constant | Meaning |
---|---|
CLOSING_TAG | |
OPENING_TAG |
Constant | Meaning |
---|---|
NOT_MIME | Not a MIME post (cf. is_mime). |
TEXT_MIME | A text/plain attachment. |
ISOTEXT_MIME | A text/plain attachment in ISO-8859-1 (not used by the UTF-8 patch). |
MESSAGE_MIME | |
MULTIPART_MIME | |
IMAGE_MIME | |
AUDIO_MIME | |
APP_MIME | |
UNHANDLED_MIME | An unknown MIME attachment. |
SKIP_MIME | |
DECODE_MIME | |
BETWEEN_MIME | |
END_OF_MIME | |
HTMLTEXT_MIME | |
ALTERNATE_MIME |
Not sure how these are used.
Constant | Meaning |
---|---|
MCF_NEEDSTERMINAL | |
MCF_COPIOUSOUTPUT |
Global variable | Type | Meaning |
---|---|---|
auto_view_inline | bool | Whether trn should automatically decode inline attachments. |
mime_article | MIME_SECT | |
mimecap_list | LIST* | |
mime_getc_line | char* | |
mime_section | MIME_SECT* | Level 2 MIME scanning state. See HF_* constants above for the html field. |
mime_state | short | Level 1 MIME scanning state. See *_MIME constants above. |
multipart_separator | char* | Label to represent a MIME boundary in article display. |
tagattr | HTML_TAGS [] | The list of handled HTML tags. Must match TAG_* above. |
- int filter_html(char* t, char* f)
- t: pointer to “to” buffer
- f: pointer to “from” buffer
- Return value: purpose of the return value is unknown
Converts the HTML post in f into plain-text form and puts it in t. Uses tag_action().
Note that the current code strips double and single quotes and looks at only the first 31 characters in a tag, so it’s not possible to handle alt texts, for example.
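
To make the limitation concrete, here is a minimal sketch of a collector that behaves as described: it drops quote characters and keeps only the first 31 characters of the tag. The function `demo_tag_word` and the constant `DEMO_WORD_MAX` are hypothetical and are not taken from filter_html(); the real code may differ in details such as case handling.

```c
#include <stddef.h>

#define DEMO_WORD_MAX 31   /* assumed limit, matching the note above */

/* Copies the text after '<' into word, skipping double and single quotes
 * and storing at most DEMO_WORD_MAX characters. Scanning stops at '>' or
 * at the end of the string. Returns the number of characters stored. */
static size_t demo_tag_word(const char *f, char word[DEMO_WORD_MAX + 1])
{
    size_t n = 0;

    for (; *f != '\0' && *f != '>'; f++) {
        if (*f == '"' || *f == '\'')
            continue;               /* quotes are stripped, not preserved */
        if (n < DEMO_WORD_MAX)
            word[n++] = *f;         /* anything past the limit is dropped */
    }
    word[n] = '\0';
    return n;
}
```

Given the tag body `img src="a.png" alt="A long description of the picture">`, this sketch stores `img src=a.png alt=A long descri`. The alt text is truncated and its quote boundaries are gone, so it cannot be recovered reliably.
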
- mimecap_ptr(n)
- void mime_SetArticle()
Sets up the mime_article structure based on the article's headers. The function manipulates the global variables htype, is_mime and multimedia_mime directly.
mime_SetArticle resets is_mime when Content-Transfer-Encoding is 7bit or 8bit (e.g. CJK). I can’t see how this is justifiable given headers (esp. From and Subject) can still be QP-quoted, but for some reason this seems to be fine.
- static char* tag_action(char* t, char* word, bool_int opening_tag)
- t: pointer to “to” buffer
- word: the first 31 characters inside the angle brackets
- opening_tag: TRUE if we’re on an opening tag, FALSE if we’re on a closing tag
- Return value: an updated pointer to (possibly a different spot in) the “to” buffer
Performs state manipulations based on the tag in word, including possibly modifying the “to” buffer in t, partly based on the tagattr array.
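
The calling convention (write into the “to” buffer, return the new write position) can be sketched as follows. Everything here is hypothetical: `demo_tag_action`, the `demo_actions` table and the text emitted for each tag are illustrations, not trn's behaviour, and the sketch assumes the caller has already lower-cased the tag name and left enough room in the buffer.

```c
#include <string.h>

/* Hypothetical table entry: a tag name and the text emitted when it opens. */
struct demo_action {
    const char *name;
    const char *open_text;
};

static const struct demo_action demo_actions[] = {
    { "br", "\n" },
    { "hr", "\n---\n" },
};

/* Appends output for a handled opening tag at t and returns the updated
 * write position. Closing tags and unhandled tags emit nothing, so the
 * pointer comes back unchanged. The caller must guarantee buffer space. */
static char *demo_tag_action(char *t, const char *word, int opening_tag)
{
    size_t i;

    for (i = 0; i < sizeof demo_actions / sizeof demo_actions[0]; i++) {
        if (strcmp(word, demo_actions[i].name) == 0) {
            if (opening_tag) {
                size_t len = strlen(demo_actions[i].open_text);
                memcpy(t, demo_actions[i].open_text, len);
                t += len;
            }
            return t;
        }
    }
    return t;   /* unhandled tag: simply thrown out */
}
```

A caller would continue writing at the returned pointer, for example `t = demo_tag_action(t, "br", 1);` inside the scanning loop.
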