Skip to content

Latest commit

 

History

History
105 lines (75 loc) · 3.61 KB

UriFilenamePrefixer.adoc

File metadata and controls

105 lines (75 loc) · 3.61 KB

UriFilenamePrefixer

Problem description

An imaginary software application processes HTML documents of a documentation ensemble containing multiple other HTML documents.

The ensemble originates from outside the application and is provided to the application by the client’s staff using commercially available documentation software.

The documents can be virtually connected in a tree-like manner, but are not physically stored hierarchically.

A document contains references to other HTML documents of the ensemble.

Because of the processing of those "raw" documents in the application, it became necessary to prefix the file name segment of those URIs with a given value.

Example structure of the ensemble
/
|-- html
|   |-- doc1.html (Document 1)
|   +-- doc2.html (Document 2)
+-- img
    |-- img1.png
    +-- img2.png

Tasks

Given doc1.html references doc2.html
When prefixing ensemble document references with my_prefix
Then doc1.html contains the reference my_doc2.html.

Questions

  1. Should you care about edge cases?

    Examples:

    • anchored ensemble references (e.g. doc3.html#section-a)

    • fully-qualified relative references (e.g. ./doc2.html)

    • traversing relative references (e.g. ../doc3.html)

    • absolute references to HTML documents (e.g. https://example.org/foo.html)

    • JavaScript references (e.g. javascript:alert('foo.html'))

    • references with non-compliant referenes (e.g. doc2<.html)

Hints

  1. It’s not necessary to create a fully fledged HTML fixture, use direct examples for references to stimulate the system under test.

  2. Assume the reference passed to the prefixer points to an HTML document, no need to filter non-.html-references.

  • RFC 3986 - Uniform Resource Identifiers (URI): Generic Syntax

Diary

While reviewing the solution to this problem, I encountered the following:

  • the content of the references were manipulated by primitive string operations

  • no usage of standard library APIs

  • happy-path testing

The solution handled direct, same-dir references without problems but tangled up references pointing to external HTML resources (URLs) and traversing references.

Examples:

So I came up with some additional edge-case tests and refactored the solution to make use of standard library APIs (URI).

At the review meeting it was argued that

  • no external URLs were ever seen in HTML documents,

  • references with path traversal (../) and JavaScript URIs (javasript:…​) were considered invalid HTML

In my opinion the first argument violates the "Assertive Programming" principle of pragmatic programming. The code simply did not reflect this statement.

The second argument revealed a worrying knowledge gap.

Personal conclusion

I associate two important insights with this experience:

  1. I need to prepare better for incomplete knowledge.

  2. I need to learn to acknowledge pragmatic solutions but also try to better communicate concerns about correctness and completeness.