The Weary Travelers

A blog for computer scientists


Date: 2023-08-06
Author: Suhail

How to export org-mode to HTML

A previous post on ways to generate static websites had one notable omission: using something that isn’t a static-site generator by itself, but has the capabilities of one (see is-a vs has-a). In this post we will look at one such candidate: org-mode, a package that is distributed as part of GNU Emacs.

In this article we will limit ourselves to converting a single org file to HTML, without making drastic changes to how various syntactic constructs are converted to HTML. Anything beyond that is presently out of scope and will be addressed in subsequent articles.

This article uses the following version of org-mode:

9.6.7

Let’s dig in!

What is org-mode?

org-mode is an overloaded term. The term can be used to refer to:

  • the org-mode package in Emacs,
  • the file syntax that the org-mode package can parse, or
  • the interactive major mode that the org-mode package provides when you open a file in org syntax.

In this post we’ll refer to the syntax as org (or org syntax) and to the package as org-mode. The interactive capabilities of the major mode are out of scope for this article, but they are numerous and worth exploring if the reader is unfamiliar with them. To make the comparison with static-site generators easier, below is a summary in a format we’ve used previously.

Org-mode (extensibility, both for general markup and specifically for syntax highlighting, requires Emacs Lisp knowledge)

  • Written in Emacs Lisp
  • License: GPL-3.0-or-later
  • Initial release: 2003 (initial release of the HTML-export functionality seems to have been in 2011)
  • Stable release: 2023
  • Input formats
  • Supported frontend frameworks
  • Themes
  • Syntax highlighting via Emacs’ major-modes
  • Support for migration to Org

Convert org to HTML

If so desired (but, really, why?), the reliance on Emacs can be reduced to just the process of converting to HTML.

For instance, given an org file with the following content:

#+TITLE: A weary traveler who is lost

* What they might look like
#+caption: A weary traveller
[[../../../static/old-man-suitcase.png]]

* What they might be thinking
#+begin_verse
I grow wearier...
Possibly I am lost, but
I am not yet done.

-- Suhail
#+end_verse

It can be exported using the command below, where $file.org stands for the name of the file:

emacs --batch --no-init-file --file=$file.org --eval "(progn (require 'org) (setq org-export-allow-bind-keywords t) (org-html-export-to-html))"

The above command invokes Emacs in “batch” mode, ensures that it doesn’t process any custom initialization (with --no-init-file), opens the org file in question, and then specifies, via the --eval argument, the Emacs Lisp code to execute, which:

  • Ensures that org-mode is loaded (not strictly necessary, since org-mode is included in recent Emacs and would be loaded by visiting the org file).
  • Enables the use of the BIND keyword.
  • Exports the loaded org file to HTML.

The generated HTML is as follows:

Configure the export process

In this section, we will see in practice how to alter the export process to meet our needs. We will focus on two aspects of the auto-generated HTML: the table of contents and the postamble.

Configure the Table of Contents

The table of contents includes all headlines in the document. Its depth is therefore the same as the headline levels in the file. If you need to use a different depth, or turn it off entirely, set the org-export-with-toc variable accordingly. You can achieve the same on a per file basis, using the following toc item in OPTIONS keyword.
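As a small illustration of the depth setting mentioned above (a sketch; the depth of 2 is an arbitrary choice and is not used elsewhere in this article), the toc item also accepts a number:

```org
#+OPTIONS: toc:2
```

With this line in the file, only the first two headline levels should be listed in the table of contents.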

Using the keyword syntax, we can turn off the table of contents entirely:

#+OPTIONS: toc:nil

#+TITLE: A weary traveler who is lost

* What they might look like
#+caption: A weary traveller
[[../../../static/old-man-suitcase.png]]

* What they might be thinking
#+begin_verse
I grow wearier...
Possibly I am lost, but
I am not yet done.

-- Suhail
#+end_verse

Which, upon export, will result in the below HTML:

Using the num item of the OPTIONS keyword, we can also toggle section numbers:

#+OPTIONS: num:nil

#+TITLE: A weary traveler who is lost

* What they might look like
#+caption: A weary traveller
[[../../../static/old-man-suitcase.png]]

* What they might be thinking
#+begin_verse
I grow wearier...
Possibly I am lost, but
I am not yet done.

-- Suhail
#+end_verse

Which, upon export, will result in the below HTML:

We could also combine the above settings:

#+OPTIONS: toc:nil
#+OPTIONS: num:nil

#+TITLE: A weary traveler who is lost

* What they might look like
#+caption: A weary traveller
[[../../../static/old-man-suitcase.png]]

* What they might be thinking
#+begin_verse
I grow wearier...
Possibly I am lost, but
I am not yet done.

-- Suhail
#+end_verse

Which, upon export, will result in the below HTML:

We can also tweak the text in the table of contents entry using property syntax:

Normally Org uses the headline for its entry in the table of contents. But with ALT_TITLE property, a different entry can be specified for the table of contents.

#+TITLE: A weary traveler who is lost

* What they might look like
:PROPERTIES:
:ALT_TITLE: Looks
:END:
#+caption: A weary traveller
[[../../../static/old-man-suitcase.png]]

* What they might be thinking
:PROPERTIES:
:ALT_TITLE: Thoughts
:END:
#+begin_verse
I grow wearier...
Possibly I am lost, but
I am not yet done.

-- Suhail
#+end_verse

Which, upon export, will result in the below HTML:

We can, if we so desire, alter the placement of the table of contents as well:

Org normally inserts the table of contents directly before the first headline of the file. To move the table of contents to a different location, first turn off the default with org-export-with-toc variable or with #+OPTIONS: toc:nil. Then insert #+TOC: headlines N at the desired location(s).
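The placement described above could look like the following sketch, which reuses the running example and moves the table of contents to the end of the file. The position and the depth of 1 are illustrative assumptions, not what this blog itself uses.

```org
#+OPTIONS: toc:nil
#+TITLE: A weary traveler who is lost

* What they might look like
#+caption: A weary traveller
[[../../../static/old-man-suitcase.png]]

* What they might be thinking
#+begin_verse
I grow wearier...
Possibly I am lost, but
I am not yet done.

-- Suhail
#+end_verse

#+TOC: headlines 1
```

The toc:nil setting suppresses the default table of contents, and the TOC keyword then inserts one at the point where it appears.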

Configure the Postamble

However, there are some export settings that require us to alter some elisp (Emacs Lisp) variables. For instance, in order to alter the postamble we have to:

Set org-html-preamble to a string to override the default format string.

[…]

The above also applies to org-html-postamble and org-html-postamble-format.

In order to set these variables during the export process, we have to use the BIND keyword.

After consulting the source code, we can now modify the postamble:

#+BIND: org-html-postamble "<p class=\"author\">Author: %a</p>\n<p class=\"date\">Date: %d</p>"
#+AUTHOR: Suhail
#+DATE: [2023-08-06 Sun]
#+BIND: org-html-metadata-timestamp-format "%F"

#+OPTIONS: toc:nil
#+OPTIONS: num:nil

#+TITLE: A weary traveler who is lost

* What they might look like
#+caption: A weary traveller
[[../../../static/old-man-suitcase.png]]

* What they might be thinking
#+begin_verse
I grow wearier...
Possibly I am lost, but
I am not yet done.

-- Suhail
#+end_verse

Which, upon export, will result in the below HTML:

When the documentation is insufficient

While org-mode is quite well documented, it also has very many ways of configuring different aspects of it, including the export process. There are times when the documentation proves to be insufficient. In those moments, we have to look at the documentation in the source code.

For exporting org to HTML, there are two places of interest:

For instance, to figure out how to modify how timestamps are formatted, we peruse the options in ox-html until we come across the line with the option :html-metadata-timestamp-format:

(:html-metadata-timestamp-format nil nil org-html-metadata-timestamp-format)

The format of the above options is the same as that for org-export-options-alist. After the property name, the values, in order, are:

  • KEYWORD: A string which denotes the keyword that sets the value of this property.
  • OPTION: A string which denotes how to set the value of this property via the OPTIONS keyword.
  • DEFAULT: The default value of the property and also the variable whose value can be used to alter the value of the property.
  • BEHAVIOR: How to handle multiple keywords, when possible, for the same property. If not provided, the default behaviour is to keep the first value.

The documentation for org-html-metadata-timestamp-format confirms our hypothesis.

"Format used for timestamps in preamble, postamble and metadata. See `format-time-string' for more information on its components."

The referenced function format-time-string is an Emacs function documented here. Among other things it notes:

%F

This stands for the ISO 8601 date format, which is like %+4Y-%m-%d except that any flags or field width override the + and (after subtracting 6) the 4.

Equipped with this information we are now finally able to customize the postamble to meet our needs.

Reference: org syntax cheatsheet

Org is primarily about organizing and searching through your plain-text notes. However, it also provides a lightweight yet robust markup language for rich text formatting and more.

Unlike Markdown, which refers to a collection of similar syntaxes, org is a single syntax (similar to reStructuredText). While the Org mode website does an excellent job of documenting the details, below we summarize some of the highlights.

Metadata syntax

The “metadata syntax” corresponds to the “and more” part of the above comment. In the present context (that of exporting org-mode to HTML), this syntax is used to affect the export process.

Comment syntax

Lines starting with zero or more whitespace characters followed by one # and a whitespace are treated as comments and, as such, are not exported.

Likewise, regions surrounded by #+BEGIN_COMMENT … #+END_COMMENT are not exported.
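A minimal sketch of both comment forms (the wording of the comments is made up for illustration):

```org
# This line is a comment and is not exported.
  # Leading whitespace before the "#" is allowed.

#+BEGIN_COMMENT
This whole region is skipped during export,
however many lines it spans.
#+END_COMMENT

This paragraph is exported as usual.
```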

Keyword syntax

Keywords are structured according to the following pattern:

#+KEY: VALUE

  • KEY: A string consisting of any non-whitespace characters, other than call (which would form a babel call element).
  • VALUE: A string consisting of any characters but a newline.
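For instance, the keywords already used in this article's examples all follow this pattern. The sketch below simply gathers them in one place, with both OPTIONS items combined on a single line:

```org
#+TITLE: A weary traveler who is lost
#+AUTHOR: Suhail
#+DATE: [2023-08-06 Sun]
#+OPTIONS: toc:nil num:nil
```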

Some notable keywords of relevance to the export process:

Property syntax

Properties are key–value pairs. When they are associated with a single entry or with a tree they need to be inserted into a special drawer (see Drawers) with the name PROPERTIES, which has to be located right below a headline, and its planning line (see Deadlines and Scheduling) when applicable. Each property is specified on a single line, with the key—surrounded by colons—first, and the value after it. Keys are case-insensitive. Here is an example:

* CD collection
** Classic
*** Goldberg Variations
    :PROPERTIES:
    :Title:     Goldberg Variations
    :Composer:  J.S. Bach
    :END:

Rich-text syntax

Links

The general link format, … looks like this:

[[LINK][DESCRIPTION]]

or alternatively

[[LINK]]

Additionally, several ways of defining internal links (i.e., within a file such as foo.org) are supported:

  • [[#my-custom-id]] will point to a node with CUSTOM_ID property set to my-custom-id.
  • [[*My section]] will point to a headline with the name My section.
  • [[my target]] will first try and look for (and match to) an occurrence of <<my target>> in the file; if none found, it’ll try and match to an element with the NAME set to my target.
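The three forms above can be sketched together in a single file. The headline names, the table, and the link descriptions are hypothetical and chosen only to mirror the bullets:

```org
* My section
:PROPERTIES:
:CUSTOM_ID: my-custom-id
:END:

Some text with a dedicated target: <<my target>>.

#+NAME: my table
| a | b |

* Elsewhere in foo.org
- [[#my-custom-id]] links by CUSTOM_ID.
- [[*My section]] links by headline text.
- [[my target]] links to the dedicated target above.
- [[my table][the table]] links to the element named "my table".
```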

Paragraphs and text formatting

Paragraphs are separated by at least one empty line. If you need to enforce a line break within a paragraph, use \\ at the end of a line.

In addition, there are also ways to preserve line breaks in a block of text (verse block), quote a passage from another document (quote block), and center some text (center block).

Text can also be emphasized in various ways:
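A short sketch of a forced line break together with the quote and center blocks (a verse block already appears in the export example earlier; the text here is placeholder content):

```org
First line of a paragraph, \\
forced onto a second line.

#+BEGIN_QUOTE
A passage quoted from another document.
#+END_QUOTE

#+BEGIN_CENTER
Some centered text.
#+END_CENTER
```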

You can make words *bold*, /italic/, _underlined_, =verbatim= and ~code~, and, if you must, +strike-through+. Text in the code and verbatim string is not processed for Org specific syntax; it is exported verbatim.

[…]

Sometimes, when marked text also contains the marker character itself, the result may be unsettling… You can use zero width space (Unicode U+200B) to help Org sorting out the ambiguity.

You can also have superscripts and subscripts:

^ and _ are used to indicate super- and subscripts. To increase the readability of ASCII text, it is not necessary, but OK, to surround multi-character sub- and superscripts with curly braces.
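A small sketch, assuming the default setting in which sub- and superscripts are interpreted on export. Braces are used wherever the script spans several characters or is followed directly by more text:

```org
Einstein wrote E = mc^2.
Water is H_{2}O, and a small number is 10^{-3}.
```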

And horizontal lines:

A line consisting of only dashes, and at least 5 of them, is exported as a horizontal line.
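For example (the surrounding text is placeholder content):

```org
Text above the rule.

-----

Text below the rule.
```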

As well as specify custom HTML attributes:

Org files can also have special directives to the HTML export back-end. For example, by using #+ATTR_HTML lines to specify new format attributes (such as CSS class, inlined style, etc.) to, among others, <a> or <img> tags.
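As a sketch of what this looks like for a link, the following follows the example given in the Org manual; the title and style values are illustrative:

```org
#+ATTR_HTML: :title The Org mode website :style color:red;
[[https://orgmode.org]]
```

On export, the attributes should end up on the generated <a> tag.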

Footnotes

Two kinds of footnotes are supported:

  • Anonymous footnotes, i.e., where the definition is inlined at the point of reference.
  • Named footnotes, which may be referenced multiple times.

An inline footnote is as follows:

Some text[fn::An inline footnote.] and then some more text after the footnote.

Whereas a named footnote:

… is started by a footnote marker in square brackets in column 0, no indentation allowed. It ends at the next footnote definition, headline, or after two consecutive empty lines. The footnote reference is simply the marker in square brackets, inside text. Markers always start with fn:. For example:

The Org website[fn:55] now looks a lot better than it used to.
...
[fn:55] The link is: https://orgmode.org

Figures and captions

An image is a link to an image file that does not have a description part, for example

file:./img/cat.jpg

Equivalently, we may also have:

[[./img/cat.jpg]]

We can also add captions:

#+CAPTION: my caption
[[./img/cat.jpg]]

And customize the styling of it:

#+ATTR_HTML: :width 300px
#+CAPTION: my caption
[[./img/cat.jpg]]

Tables

Any line with | as the first non-whitespace character is considered part of a table. | is also the column separator. Moreover, a line starting with |- is a horizontal rule. It separates rows explicitly. Rows before the first horizontal rule are header lines.

The width of columns is automatically determined by the table editor. The alignment of a column is determined automatically from the fraction of number-like versus non-number fields in the column.

[…]

To set the width of a column, one field anywhere in the column may contain just the string <N> where N specifies the width as a number of characters.

[…]

If you would like to overrule the automatic alignment of number-rich columns to the right and of string-rich columns to the left, you can use <r>, <c> or <l> in a similar fashion. You may also combine alignment and field width like this: <r10>.
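A sketch of a small table that uses these cookies (the data is made up): the first column is forced left, the second right, and the third is centered with a width of 12 characters. According to the Org manual, a line containing only such cookies is removed on export.

```org
| Item   | Qty | Notes       |
|--------+-----+-------------|
| <l>    | <r> | <c12>       |
| Apples |   3 | fresh       |
| Pears  |  12 | from market |
```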

LaTeX and special symbols

Org supports special symbols such as Greek letters:

You can use LaTeX-like syntax to insert special symbols (named entities) like \alpha to indicate the Greek letter (α), or \to to indicate an arrow (→)… If you need such a symbol inside a word, terminate it with a pair of curly brackets.

[…]

During export, these symbols are transformed into the native format of the exporter back-end. Strings like \alpha are exported as &alpha; in the HTML output…
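A brief sketch in the spirit of the Org manual's own example. The second sentence shows an entity inside a word, terminated with a pair of curly brackets:

```org
The function maps \alpha \to \beta.
Given a circle of diameter d, the length of its circumference is \pi{}d.
```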

One can also embed LaTeX, which, by default, when exported to HTML will use MathJax, but can also be configured to transcode math into images:

LaTeX fragments do not need any special marking at all. The following snippets are identified as LaTeX source code:

  • Environments of any kind (when MathJax is used, only the environments recognized by MathJax are processed; when dvipng, dvisvgm, or the ImageMagick suite is used to create images, any LaTeX environment is handled). The only requirement is that the \begin statement appears on a new line, preceded by only whitespace.
  • Text within the usual LaTeX math delimiters. To avoid conflicts with currency specifications, single $ characters are only recognized as math delimiters if the enclosed text contains at most two line breaks, is directly attached to the $ characters with no whitespace in between, and if the closing $ is followed by whitespace, punctuation or a dash. For the other delimiters, there is no such restriction, so when in doubt, use \(...\) as inline math delimiters.
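Both kinds of fragment can be sketched as follows; this closely follows the example in the Org manual:

```org
\begin{equation}
x=\sqrt{b}
\end{equation}

If $a^2=b$ and \( b=2 \), then the solution must be
either $$ a=+\sqrt{2} $$ or \[ a=-\sqrt{2} \].
```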

Source code (what org-mode refers to as “Literal examples”)

Source code can be embedded using the #+BEGIN_SRC and #+END_SRC delimiters. The code is then highlighted based on the syntax highlighting configured in Emacs. A consequence is that simply by adding syntax highlighting capabilities to your editor (assuming you use Emacs as your editor), one gets syntax highlighting in the exported output (using the htmlize Emacs package for the HTML output format). For monospace content, the #+BEGIN_EXAMPLE and #+END_EXAMPLE delimiters can be used instead. Additionally,

Both in example and in src snippets, you can add a -n switch to the end of the #+BEGIN line, to get the lines of the example numbered. The -n takes an optional numeric argument specifying the starting line number of the block. If you use a +n switch, the numbering from the previous numbered snippet is continued in the current one. The +n switch can also take a numeric argument. This adds the value of the argument to the last line of the previous block to determine the starting line number.
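The numbering switches can be sketched as below, following the example in the Org manual. The first block starts at line 20 and so ends on line 21; the second adds 10 to that, so it starts at line 31.

```org
#+BEGIN_SRC emacs-lisp -n 20
  ;; This exports with line numbers beginning at 20.
  (message "This is line 21")
#+END_SRC

#+BEGIN_SRC emacs-lisp +n 10
  ;; This is listed as line 31.
  (message "This is line 32")
#+END_SRC
```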

There’s also the ability to link to specific lines in the source code as well as the ability to highlight the specific line in the code example when hovering over a reference in the generated HTML.
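A sketch of such line references, again following the Org manual's example. The labels sc and jump are arbitrary names, and the -r switch removes the labels from the exported code:

```org
#+BEGIN_SRC emacs-lisp -n -r
  (save-excursion                 (ref:sc)
     (goto-char (point-min))      (ref:jump)
#+END_SRC
In line [[(sc)]] we remember the current position. [[(jump)][Line (jump)]]
jumps to point-min.
```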

Footnotes:

4

Deriving from, in this case, ox-html.

8

And likely incompatible with the latest release.

9

In fact, we use it in this blog.

10

A CSS theme inspired by Edward Tufte’s books and handouts which, among other things, supports notes in the right margin.

12

Specifically, ox-html.

19

For different versions of org, alter the git tag as needed.

21

I.e., from here to here.
