Integrate Docile.jl and Markdown.jl into Base #8514

Closed
quinnj opened this issue Sep 29, 2014 · 85 comments
Labels
docs This change adds or pertains to documentation

Comments

@quinnj
Member

quinnj commented Sep 29, 2014

Some good discussion started here.

This is to more formally track integrating the necessary parts into Base, since it seems some good consensus is building.

@one-more-minute
@MichaelHatherly

@ViralBShah
Member

Also pinging @shashi @dcjones

@ViralBShah added the docs label Sep 29, 2014
@MichaelHatherly
Member

I'd be happy to get going on this. Also pinging @stevengj since he's been the source of some great input here. I'll put together a PR over the next few days.

@ViralBShah
Member

Also pinging @johnmyleswhite, who originally suggested Docile.jl to me as a good starting point. @dmbates has also been looking for this.

@MikeInnes
Member

@MichaelHatherly I recommend waiting until I've worked through overhauling Markdown.jl before finalising anything / setting up PRs. There will probably be some technical changes to work out within Docile/Markdown to get the string macros etc. working smoothly.

@MichaelHatherly
Member

Yes, just saw your break-everything branch. I'll wait on your changes.

@JeffBezanson
Member

Docile does look quite good.

I'm wondering about

  1. Maybe we should add some special syntax to avoid the ->
  2. How much can we cut down the dependencies?

For point (2), I feel the system should be as lazy as possible, just populating metadata with strings until interaction and display happen.

@MichaelHatherly
Member

Special syntax would be nice. The -> is providing the LineNumberNode that I'm using to get file and line numbers for metadata. Would be great to retain that info.

Docile doesn't really have any dependencies, it's just harvesting strings and metadata. Lexicon's providing the presentation layer.

I agree about the laziness -- I don't have any hard numbers, but when I was parsing docstrings during @doc, package loading was quite a bit slower.

@jakebolewski
Member

Another thing that needs to be hashed out is what non-standard form of markdown we wish to support. Inline latex, tables, and cross references seem necessary.

@MichaelHatherly
Member

@jakebolewski CommonMark [1, 2] looks reasonably promising. Inline math is a must-have feature -- not sure whether that would be part of the spec though.

[1] MichaelHatherly/Docile.jl#33
[2] http://jgm.github.io/stmd/spec.html

@IainNZ
Member

IainNZ commented Sep 29, 2014

HttpServer now uses Docile (thanks to @astrieanna), which could be another interesting case study:
https://github.com/JuliaWeb/HttpServer.jl/blob/master/src/HttpServer.jl

@MichaelHatherly
Member

That's cool, thanks @astrieanna. Guess I can't make breaking changes now!

@IainNZ
Member

IainNZ commented Sep 29, 2014

Hah, doesn't stop anyone else ;)

@johnmyleswhite
Member

I think this is the right way to go. I agree with Jeff's point that special syntax would make Docile nicer to work with.

@stevengj
Member

CommonMark doesn't have any standard for embedded equations; see this discussion. Pandoc's $...$ + heuristic (opening $ can't be followed by whitespace, closing $ can't be followed by a digit or preceded by whitespace) seems like the most widely used at this point, and is what is used in Jupyter/IJulia.
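The heuristic described above can be sketched as a small scanner. This is only an illustration of the three rules as stated in the comment, written in current Julia syntax; `find_inline_math` is a hypothetical helper and not part of Pandoc, Markdown.jl, or Docile.

```julia
# Sketch of the Pandoc-style rule for inline math delimiters.
# `find_inline_math` is a hypothetical name used only for this example.
function find_inline_math(text::AbstractString)
    chars = collect(text)            # index by character, not by byte
    n = length(chars)
    spans = String[]
    i = 1
    while i <= n
        # Rule 1: an opening `$` must not be followed by whitespace.
        if chars[i] == '$' && i < n && !isspace(chars[i + 1])
            j = i + 2                # require at least one character of content
            while j <= n
                # Rule 2: a closing `$` must not be preceded by whitespace.
                # Rule 3: a closing `$` must not be followed by a digit.
                if chars[j] == '$' && !isspace(chars[j - 1]) &&
                   (j == n || !isdigit(chars[j + 1]))
                    break
                end
                j += 1
            end
            if j <= n
                push!(spans, String(chars[i + 1:j - 1]))
                i = j + 1
                continue
            end
        end
        i += 1
    end
    return spans
end

sample = raw"Euler: $e^{i\pi} + 1 = 0$ costs $5 and $10"
found = find_inline_math(sample)
```

In the sample, the currency amounts are not treated as math because the only candidate closing `$` is preceded by a space.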

@stevengj
Member

My understanding in #3988 was always that there would eventually be a special syntax for this; macros are only for prototyping.

@MichaelHatherly
Member

Was syntax ever agreed upon for this? Something along the lines of:

doc """
...
"""
function foo(x)

end

Where doc is a new keyword whose ending keyword is function, type, immutable etc.

Or just without the doc at all and any unassigned string above a documentable block of code is taken to be a docstring?

@StefanKarpinski
Member

Jeff and I just talked about this today and a bare string literal in void context followed by a definition seems like the way to go. This should be lowered by the parser something like this:

"`frob(x)` frobs the heck out of `x`."

function frob(x)
  # commence frobbing
end

becomes the moral equivalent of this:

let doc = "`frob(x)` frobs the heck out of `x`."
  if haskey(__DOC__, :frob)
    __DOC__[:frob] *= doc
  else
    __DOC__[:frob] = doc
  end
end

function frob(x)
  # commence frobbing
end

Important points about this approach:

  1. parsing has no side-effects – the construction of the documentation structure still occurs when the code is actually evaluated, not when it is parsed.
  2. each module has its own const __DOC__ = Dict{Symbol,UTF8String} dictionary; this is important for reloading modules.
  3. This ends up just appending all the docs for a given name, including separate doc strings for a single generic function.

An open issue is how to handle adding methods to functions from other modules. Does the definition go into the current module's __DOC__ dict? What symbol is used for the doc key then?
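A runnable sketch of the lowering described above, in current Julia syntax (`String` stands in for `UTF8String`). The helper `appenddoc!` is hypothetical: it stands for the code the parser would emit for a bare docstring, and a single global table stands in for one module's `__DOC__`.

```julia
# Stand-in for one module's documentation table (String replaces UTF8String).
const __DOC__ = Dict{Symbol,String}()

# Hypothetical helper: what the parser would emit for a bare docstring.
function appenddoc!(name::Symbol, doc::AbstractString)
    if haskey(__DOC__, name)
        __DOC__[name] *= doc     # point 3: later docstrings are appended
    else
        __DOC__[name] = doc
    end
    return nothing
end

appenddoc!(:frob, "`frob(x)` frobs the heck out of `x`.")
frob(x) = x

appenddoc!(:frob, "\n`frob(x, y)` frobs both arguments.")
frob(x, y) = (x, y)
```

Because the table is filled when the code is evaluated, evaluating the same definitions a second time would append the same text again.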

@timholy
Member

timholy commented Sep 30, 2014

Just to comment that it's super-exciting to see momentum on this. Looking forward to seeing what emerges.

@MichaelHatherly
Member

Is using Symbol as the key type a necessary requirement? Does doing this not restrict the kind of things that can be documented -- namely individual Methods of a Function?

If

"`frob(x)` frobs the heck out of `x`."

function frob(x)
  # commence frobbing
end

is instead translated to

function frob(x)
  # commence frobbing
end

let doc = "`frob(x)` frobs the heck out of `x`."
  if haskey(__DOC__, frob)
    __DOC__[frob] *= doc
  else
    __DOC__[frob] = doc
  end
end

then you could use the Function/Method etc as the key instead of a Symbol -- some adjustments to the let-block not shown. Is this approach feasible?

For adding docs to methods that are being extended from those in a different module, I'd be in favour of adding them to the current module's __DOC__. I'd find it a bit odd if the docs I write for a method end up in a different module.
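A small sketch of keying the table by the documented object instead of by a `Symbol`, in current Julia syntax. The table name `OBJDOCS` is hypothetical, and an `IdDict` is an assumption made here so that lookups go by object identity.

```julia
# Hypothetical table keyed by the documented object itself rather than its name.
const OBJDOCS = IdDict{Any,String}()

frob(x::Int) = x + 1
frob(x::String) = x * "!"

# Document the generic function as a whole...
OBJDOCS[frob] = "`frob(x)` frobs the heck out of `x`."

# ...and one specific method, looked up through its argument types.
OBJDOCS[which(frob, (String,))] = "`frob(x::String)` appends an exclamation mark."
```

The definition has to be evaluated before the docstring is stored, which is why the translated form places the `let` block after the function.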

@MikeInnes
Member

Stefan's proposal looks good to me, but +1 for either being aware of methods properly or being limited to one docstring per function (as opposed to concatenating each successive docstring regardless). Another way to do this might be something like

 __DOC__[:frob][(Int, String...)] = "`frob(x)` frobs the heck out of `x`."

function frob(x::Int, ys::String...)
# ...

i.e. indexing doc strings by type as well as name. Key points in this approach:

  1. The redefinition problem is handled at the function level rather than the module level, which means that
    1. Redefining functions/methods works in a sane way, as opposed to endlessly concatenating onto the existing doc string
    2. This will automatically make reloading modules do the expected thing too, so a module-local __DOC__ isn't necessary to solve that problem (though it might be useful for other reasons)
  2. It removes the dependency on the order of definitions. So you could do fancy things like making doc strings for more general methods appear first.

(1.i) is my main concern – redefining functions messing up their own docs is something we could probably live with / work around / ignore, but if we can solve this early it will make for a much better interactive experience, I think.
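A minimal sketch of indexing by name and then by signature, in current Julia syntax, where the signature `(Int, String...)` is written as `Tuple{Int,Vararg{String}}`. The names `SIGDOCS` and `setdoc!` are hypothetical.

```julia
# Hypothetical two-level table: name => (signature => docstring).
const SIGDOCS = Dict{Symbol,Dict{Type,String}}()

function setdoc!(name::Symbol, sig::Type{<:Tuple}, doc::AbstractString)
    table = get!(() -> Dict{Type,String}(), SIGDOCS, name)
    table[sig] = doc             # assignment, so re-evaluation overwrites
    return nothing
end

sig = Tuple{Int,Vararg{String}}
setdoc!(:frob, sig, "first draft")
setdoc!(:frob, sig, "`frob(x, ys...)` frobs the heck out of `x`.")
```

Storing by assignment rather than concatenation is what makes redefinition behave as in (1.i): the second call replaces the first entry instead of growing it.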

@stevengj
Member

Key problems with this approach:

  • As discussed elsewhere (e.g. MichaelHatherly/Docile.jl#29), there is no need for documentation objects to be a string; they can be any Julia object with the appropriate writemime methods. e.g. imagine a documentation object like docfromfile("foo.tex"). A doc keyword allows more generality here.
  • Docile currently allows additional metadata to be stored in the documentation, e.g. doc "frob(x) ..." { :section => "Frobnicators" }
  • Docile currently allows doc* vs. doc in order to distinguish documentation for a Function in general vs. documentation for a Method.

One possibility would be to make the doc keyword optional for string literals (including string macros like md"..."), but to allow it for more complicated documentation.

@shashi
Contributor

shashi commented Sep 30, 2014

Documentation specific to argument signature is definitely better than concatenation, +1 for documentation being anything with a writemime method. A transformation like:

 __DOC__[:frob][(Int, String...)] = () -> "`frob(x)` frobs the heck out of `x`."

function frob(x::Int, ys::String...)
# ...

would also let us evaluate documentation objects only when they are needed. e.g. help(frob) could call the closure and cache the result.
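The closure-plus-cache idea can be sketched as follows, in current Julia syntax. `LazyDoc` and `render!` are hypothetical names; the counter exists only to show that the closure runs once.

```julia
# Hypothetical wrapper: holds a closure and caches its result on first use.
mutable struct LazyDoc
    thunk::Function
    cached::Union{Nothing,String}
end
LazyDoc(thunk::Function) = LazyDoc(thunk, nothing)

function render!(d::LazyDoc)
    if d.cached === nothing
        d.cached = d.thunk()     # evaluated only when help is requested
    end
    return d.cached
end

calls = Ref(0)                   # counts how often the closure runs
doc = LazyDoc(() -> (calls[] += 1; "`frob(x)` frobs the heck out of `x`."))

first_text = render!(doc)
second_text = render!(doc)       # served from the cache
```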

@jakebolewski
Member

I know that this is probably not a popular opinion but I really think we should consider using Restructured Text at least for the default markup in Base. It supports everything we will want (inline math / code, cross-links, tables, etc.), supports extensions in the standard for functionality we would want to add, and would allow us to reuse all the tooling developed in the Python world (Sphinx, ReadTheDocs, etc.) which imo is the best out there.

Otherwise I see us developing yet another superset of Markdown to support our needs which may or may not be consumable by other tools. I guess if we pick a superset with better tooling support (such as PanDoc markdown with all the extensions) we might be able to mitigate this problem.

@JeffBezanson
Member

These are good points. Having an API for this is key, as that will allow even more flexibility than a keyword. For fancy documentation needs, use the API instead of the special syntax.

It's probably also true that we'll want to associate docs with particular type signatures.

I think associating arbitrary metadata with every docstring is overengineering at this point. Where we are, we can't even ask for help for a simple function in a package.

@StefanKarpinski
Member

ReStructured Text is awful. I wrote most of the original manual and writing it in Markdown was a pleasure. Writing documentation has been a painful chore ever since we switched from Markdown to RST. Having complicated formatting types for documentation is overkill and something that we can consider, if at all, only if there's strong evidence of a real need in practice. I don't think there will be any such need. There should be essentially no choice about documentation – the worst possible situation is one where everyone writes docs in their personal favorite format and there are a dozen of them. There should be one reasonable way to write docs that works well and that everyone is familiar with. What we generate during parsing should be simple and easy for the parser to construct – i.e. just strings – and these strings should look decent if you just show them as is. Markdown fits the bill perfectly – it is already (by design) how people intuitively markup plain text content.

@stevengj
Member

@shashi, as I've discussed in the abovementioned Docile issue, the plan for typical documentation objects (e.g. Markdown text) is to store only the unparsed string when the file is loaded. Parsing of the AST, generation of HTML, etcetera, is only performed "lazily" when the help is requested in some format.

@jakebolewski, the choice of format is orthogonal to this feature if my suggestion is adopted. Markdown documentation would be md"..." (creating a MarkdownString object), Restructured Text would be rst"..." (creating a RestructuredText object), etc. Each would have appropriate writemime methods to generate text/html, text/latex, or whatever. We can argue about what format should be used in Base elsewhere.
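A sketch of the format-as-type idea, in current Julia syntax, where `writemime(io, mime, x)` has since become `show(io, mime, x)`. The type `MarkdownDoc` and the `mdoc"..."` macro are hypothetical names chosen for this example, and the HTML output is a toy conversion of code spans only, not a Markdown renderer.

```julia
# Hypothetical documentation type: stores only the unparsed source text.
struct MarkdownDoc
    source::String
end

# String macro, so that mdoc"..." constructs a MarkdownDoc when the file loads.
macro mdoc_str(s)
    return MarkdownDoc(s)
end

# Plain text: show the source as written.
Base.show(io::IO, ::MIME"text/plain", d::MarkdownDoc) = print(io, d.source)

# HTML: converted only when requested (toy rule: `code` spans become <code>).
function Base.show(io::IO, ::MIME"text/html", d::MarkdownDoc)
    body = replace(d.source, r"`([^`]+)`" => s"<code>\1</code>")
    print(io, "<p>", body, "</p>")
end

d = mdoc"`frob(x)` frobs `x`."
html = sprint(show, MIME("text/html"), d)
```

A second format would be another type with its own string macro and `show` methods, which is what keeps the choice of markup independent of the storage mechanism.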

@stevengj
Member

@JeffBezanson, we absolutely have to have some kind of metadata if you want to have any possibility of generating offline documentation, because you can't just have a long list of 3000 functions in Base, sorted alphabetically. At the very least, you have to be able to mark what section and subsection of the manual they should appear in.

@StefanKarpinski
Member

Let's cross that bridge when we get there.

@JeffBezanson
Member

I am extremely hawkish about load time but I'm not really worried about the slowdown from metadata dicts on docstrings, for reasons that have been discussed already: (1) they're not all that slow, (2) not all docstrings will have them, (3) they can be shared among docstrings.

My main concern is getting something simple working first so we can have help and docs for packages ASAP. After that there are concerns about complexity and where various information should be stored, but we can continue to discuss that while enjoying the availability of package help :)

@ViralBShah
Member

+1 to having something that works for packages asap.

@MichaelHatherly
Member

Yes, +1 to having something vaguely like what's been discussed in this thread soon. I'm happy to adjust Docile to match whatever makes it into Base so that 0.3 packages can have documentation too.

@stevengj
Member

I agree that we should get something asap, with the caveat that major flaws and disagreements should be things that are resolvable later without much breakage.

Adding documentation metadata is something that can be done later without breakage, because most docstrings won't have metadata so we will want an optional syntax anyway.

Changing "..." to md"..." if you want markdown-syntax docstrings will be a painful breakage to impose later.

@StefanKarpinski
Member

Regarding the "..." vs md"..." change, if the default is markdown, then changing later is a matter of making markdown the default and allowing other formats optionally. It strikes me as weird to indicate the flavor of markup on a per-doc-string basis. Are you going to use lots of different markups in a single file or even a single project? I'm really not convinced that we'll ever need more than one.

@stevengj
Member

@StefanKarpinski, note that we'll need a string macro anyway in order to easily use LaTeX equations in Markdown (otherwise you have to backslash like crazy).
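To illustrate the backslash problem: an ordinary string literal processes escape sequences and `$` interpolation, while a string macro receives its text unprocessed. The example uses `raw"..."`, a string macro available in current Julia that did not exist when this comment was written.

```julia
# In an ordinary literal, every backslash and dollar sign must be escaped.
plain = "\\alpha + \\beta costs \$5"

# A string macro receives the text as written, so nothing needs escaping.
unescaped = raw"\alpha + \beta costs $5"

same = plain == unescaped
```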

@StefanKarpinski
Member

That would be true if we couldn't change the parser ;-)

@astrieanna
Contributor

I would prefer format-agnostic documentation (requiring only writemime).

I don’t think “getting something out fast” is affected by which of these we choose. Making something work for special Julia Markdown strings only vs. an equivalent MarkdownString type doesn’t seem like a big difference as far as implementation effort.

Forcing everyone to use the same format seems unfortunate. I agree with having a strong default (i.e. shipping and using only one format in base), but choosing not to support any other format is actively preventing anyone from ever using a different format. There is always some dissent about formats, and if someone strongly prefers rst for their project (for the toolchain, or whatever), then there’s no reason to actively prevent them from doing so.

An example of using different types of documentation in one package: some documentation might be in a separate file, so those functions would just like to refer to the file path & have the file actually read lazily. This could be accomplished with a different type (FileDocString or whatever) that behaves appropriately.
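A sketch of such a file-backed documentation type, in current Julia syntax. `FileDoc` is a hypothetical name (the comment above suggests `FileDocString` "or whatever"), and the temporary file exists only to make the example self-contained.

```julia
# Hypothetical documentation type that stores a path, not the text itself.
struct FileDoc
    path::String
end

# The file is read only when the documentation is actually displayed.
Base.show(io::IO, ::MIME"text/plain", d::FileDoc) = print(io, read(d.path, String))

# Demonstration with a temporary file standing in for a real docs file.
path, handle = mktemp()
write(handle, "`frob(x)` frobs the heck out of `x`.")
close(handle)

filedoc = FileDoc(path)          # nothing has been read at this point
shown = sprint(show, MIME("text/plain"), filedoc)
```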

Allowing user-defined documentation formats would also allow users to define their own extensions to Julia Markdown -- and try them out without forcing them on anyone else or needing to modify the Julia parser.

@porterjamesj
Contributor

FWIW I'm in violent agreement with @stevengj w.r.t allowing whatever system we end up with to store arbitrary metadata, not just strings. My impression is that the clojure community (e.g.) has benefited tremendously from this and built some really cool stuff (core.typed anyone?) on top of it, and it seems uncharacteristically restrictive (for what I see as the "Julian" attitude about this sort of thing) to not allow it.

@shashi
Contributor

shashi commented Oct 1, 2014

What @porterjamesj said! Just learned about how Clojure does this: http://en.wikibooks.org/wiki/Learning_Clojure/Meta_Data - very neat! IMO a good implementation would make documentation a special case of a general mechanism to attach metadata to certain kinds of objects. (at least under the hood while providing sufficient syntactic sugar.)

ref: #3988

@porterjamesj
Contributor

IMO a good implementation would make documentation a special case of a general mechanism to attach metadata to certain kinds of objects. (at least under the hood while providing sufficient syntactic sugar.)

which, unless I'm mistaken, is exactly what @stevengj has been arguing for.

@catawbasam
Contributor

I like the idea of having "..." / """...""" be Julia's default Markdown, whatever flavor that is, so we and our tools don't have to think very hard about how to deal with basic comments.

I'd also like to see provision, even if just a placeholder for now, to add flexible metadata. Although most docs right now are either plain text or rich text, there are plenty of areas where a picture or equation would really help, and with tools like IJulia and Juno we already have much of the infrastructure required to serve rich help.

@stevengj
Member

stevengj commented Oct 1, 2014

Note also that if we support attaching an arbitrary "documentation" object with output via writemime, then including dictionaries of metadata can be implemented on top of this. e.g. you can have a MetaDoc type that wraps the "actual" documentation object plus a Symbol=>Any dictionary of other metadata:

doc MetaDoc(md"My documentation...", [:author=>"SGJ", :status=>"buggy"])
foo(...) = ...

(Where, as I mentioned above, we probably need an optional doc keyword for any documentation object that is not a string literal or string macro.)
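The wrapper idea can be sketched in current Julia syntax as follows. The field names and the `PlainDoc` stand-in for an "actual" documentation object are assumptions made for this example; only the name `MetaDoc` and the `Symbol => Any` dictionary come from the comment above.

```julia
# Stand-in for any "actual" documentation object with a display method.
struct PlainDoc
    text::String
end
Base.show(io::IO, ::MIME"text/plain", d::PlainDoc) = print(io, d.text)

# Wrapper pairing a documentation object with arbitrary metadata.
struct MetaDoc{T}
    doc::T
    meta::Dict{Symbol,Any}
end

# Display is forwarded to the wrapped object; metadata stays out of the output.
Base.show(io::IO, mime::MIME"text/plain", d::MetaDoc) = show(io, mime, d.doc)

entry = MetaDoc(PlainDoc("My documentation..."),
                Dict{Symbol,Any}(:author => "SGJ", :status => "buggy"))
rendered = sprint(show, MIME("text/plain"), entry)
```

Tools that care about metadata can read `entry.meta`, while anything that only displays help never has to know the wrapper exists.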

@johnmyleswhite
Member

FWIW, I find @stevengj's suggestion really compelling. It seems much easier to make an initial pass that's very vague about what "should" go in a MetaDoc object and flesh it out, than to take a stricter rule about strings and later replace it with MetaDoc objects.

@carlobaldassi
Member

That would be true if we couldn't change the parser ;-)

I'll just note as a minor point that using some kind of clue, like md or doc or whatever, would make things much simpler for editors' and IDE's highlighting, for properly displaying special characters, LaTeX etc., since we can't really expect editors to implement full-blown parsers. Maybe that could be mitigated by using "a string at global scope is documentation" as a proxy rule, but I suspect that could turn out messy.

@ivarne
Member

ivarne commented Oct 4, 2014

We already have a concept for creating new syntactic elements: macros and string macros. Having different rules for "escaping $variables and \newLatexFunctions in string literals" in different contexts would be inconsistent, making Julia more confusing.

I'd argue that two extra letters to type for Markdown parsing isn't a big problem. If you use markdown for formatting your documentation, you'll probably have a multiline doc, and two characters seem like a small annoyance.

I agree that it is poor style to mix different documentation formats in a single file, but it might sometimes be useful. That way you can gradually change format in a file without having to fix all the issues at once. Design discussions in Julia have usually not been won by the argument "someone is going to use this feature to write horrible unreadable code".

@MikeInnes
Member

I'd argue that two extra letters to type for Markdown parsing isn't a big problem. If you use markdown for formatting your documentation, you'll probably have a multiline doc, and two characters seem like a small annoyance.

I have to disagree with this. Firstly, a lot of docstrings are likely to look like

"`push!(object, x)`: Append x to the object."

i.e. not multiline.

That said, it's not really about the two character overhead. The fact is that most people will use the most convenient documentation form available, so defaulting to plain docstrings amounts to endorsing them.

I'm all for supporting richer formats (tex"" etc.) but supporting both plain and rich docs doesn't make much sense – markdown opens a lot of opportunities (nice presentation, syntax highlighting, structured information etc.) without making things more cumbersome, so we should encourage people to use it over plain text as much as possible. Treating "..." docstrings as md"..." is a very simple and effective way to do that.

@RauliRuohonen

we should encourage people to use it over plain text as much as possible. Treating "..." docstrings as md"..." is a very simple and effective way to do that.

This is a good idea. One of the problems with Python docstrings is that they are plain text, and you can't get people to use anything else unless it's endorsed by the language implementation. TIMTOWTDI leads to everyone using the lowest common denominator, i.e. plain text. Unambiguously going with one default markup language in Julia makes it better. Markdown is a good choice, especially as IJulia is the de facto "more than plaintext" display environment for Julia.

@tonyhffong

Putting myself in the loop to make sure Lint can check through doc string correctly.

@ivarne
Member

ivarne commented Oct 4, 2014

I think the problem with Python docstrings is rather that there is no standard way of specifying the format. That means that when you aggregate documentation from docstrings, you have to guess the format, and computers are bad at guessing, so the feature is little used.

@one-more-minute Maybe that is a valid case, but if I want to save characters to type, I'd rather not have to repeat the signature inside the docstring, but have it automatically captured from the actual signature on the next line.

@stevengj
Member

stevengj commented Oct 9, 2014

By the way, another reason to support (a) plain-text strings and (b) non-literal documentation strings is importing help from other languages.

e.g. in PyPlot I define various functions which are wrappers around Python functions, and I want their help to be automatically imported from the Python docstring (which is plain text). If we have a doc keyword (or @doc) that supports arbitrary Julia expressions, and allows plain-text strings, this will be easy:

const bar_py = pyplot["bar"]
doc convert(String, bar_py["__doc__"])
function bar(...)
end

Note also that if you make the doc macro automatically interpret string literals as Markdown, while interpreting string-valued expressions as plain text, then you will get different results for:

doc "*foo*" foo(x) = ...
# versus:
const foodoc = "*foo*"
doc foodoc foo(x) = ...

Whereas if you interpret "..." consistently as a plain-text string, and require md"..." for Markdown, the behavior is a bit more comprehensible.
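The inconsistency can be reproduced with a toy macro in current Julia syntax. `@docfmt` is a hypothetical stand-in for the proposed `doc` keyword: it only reports which format would be chosen, based on whether its argument reaches the macro as a string literal.

```julia
# Hypothetical macro: picks a format from the syntactic form of its argument.
macro docfmt(ex)
    # A plain literal reaches the macro as a String; anything else is an expression.
    fmt = ex isa String ? :markdown : :plain
    return :(($(QuoteNode(fmt)), $(esc(ex))))
end

foodoc = "*foo*"

from_literal = @docfmt("*foo*")   # the literal is seen at macro-expansion time
from_variable = @docfmt(foodoc)   # only the name `foodoc` is seen
```

The two calls carry the same text but end up tagged `:markdown` and `:plain` respectively, which is the surprising behaviour described above.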

@ViralBShah
Member

Just checking in here to see if we have something usable to start with. Are we still waiting on Markdown.jl?

@MikeInnes
Member

Markdown.jl is already in that other PR (which is good to go as far as I'm concerned, though I'm happy to make any changes if I've missed anything of course).

@MikeInnes
Member

Oh yeah we can totally close this
