The Weary Travelers

A blog for computer scientists


Date: 2023-09-03
Author: Suhail

How to overcome syntactic limitations in org-mode

HTML markup is expressive, but verbose. This verbosity has given rise to several lightweight markup languages that can be exported to HTML (e.g., the Markdown family of syntaxes, reStructuredText, and AsciiDoc).

Each lightweight markup language trades off expressivity and readability. As such, not every valid HTML construct is directly expressible in every lightweight markup language. When describing tabular information using org syntax, for instance, cells that span rows or columns and cells that contain lists aren’t directly expressible.

By contrast, tables in reStructuredText can contain nested tables as well as lists:

+------------+------------+----------------------+
| Header 1   | Header 2   | Header 3             |
+============+============+======================+
| body row 1 | column 2   | column 3             |
+------------+------------+----------------------+
| body row 2 | Cells may span columns.           |
+------------+------------+----------------------+
| body row 3 | Cells may  | - Cells              |
+------------+ span rows. | - can contain        |
|            |            |                      |
| body row 4 |            |    - lists           |
|            |            |                      |
|            |            | - with some nesting  |
+------------+------------+----------------------+

Can we selectively make use of reStructuredText’s syntax within an org-mode file that we intend to export to HTML?

Evaluation of code blocks

org-mode is much more than simply a markup language, and there’s a principled way of leveraging some of its capabilities in order to escape this syntactic limitation. Specifically, by defining evaluation of a code block in a specific markup language as exporting it to a particular output format, one can overcome any syntactic limitations that org-mode may pose.

In addition to its abilities to export to different output formats via exporter back-ends, org-mode also has extensive literate programming capabilities, such as the ability to extract and evaluate code blocks in different languages.

An important feature of Org’s management of source code blocks is the ability to pass variables, functions, and results to one another using a common syntax for source code blocks in any language. Although most literate programming facilities are restricted to one language or another, Org’s language-agnostic approach lets the literate programmer match each programming task with the appropriate computer language and to mix them all together in a single Org document. This interoperability among languages explains why Org’s source code management facility was named Org Babel by its originators, Eric Schulte and Dan Davison.

In order to define how to interpret a code block in a language <lang>, we need to define a function named org-babel-execute:<lang>. Additionally, the convention is to define this function in a package named ob-<lang>. Specifically, the latter convention is what org-babel-do-load-languages depends on for enabling evaluation support.
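As a minimal sketch of this naming convention, the snippet below defines evaluation for a made-up language called shout. The language name, its behaviour, and its default header arguments are assumptions for illustration only and are not part of Org. Evaluating a shout block simply returns its body in upper case.

```emacs-lisp
;; Sketch only: `shout' is a hypothetical language used for illustration.
(require 'ob)

(defvar org-babel-default-header-args:shout '((:results . "output"))
  "Default header arguments for the hypothetical `shout' blocks.")

(defun org-babel-execute:shout (body _params)
  "Evaluate a `shout' block by returning BODY in upper case.
_PARAMS (the header arguments) are ignored in this sketch."
  (upcase body))

;; The feature name follows the ob-<lang> convention.
(provide 'ob-shout)
```

Saved as ob-shout.el, this would make shout blocks evaluable once (shout . t) is added to org-babel-load-languages.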

But what is export, if not simply a particular kind of evaluation?

Exporting as an instance of evaluation

The python3-docutils package provides rst2html4, a command-line utility that exports rst to HTML. Thus, all we need to do in org-babel-execute:rst is invoke rst2html4 with the appropriate parameters.

rst2html4, by default, generates the full HTML page. However, this behaviour can be modified by passing in an explicit --template parameter.

--template=<file>       Template file. (UTF-8 encoded, default:
                        "/usr/lib/python3.11/site-
                        packages/docutils/writers/html4css1/template.txt")

By providing the template below (obtained by trial and error on the default template), we are able to instruct rst2html4 to generate only the content of the <body> tag.

%(body_pre_docinfo)s
%(docinfo)s
%(body)s

We will additionally wrap the generated HTML output in a <div> element with custom classes to allow for styling to be configured.

(require 'org-macs)
(require 'ob)
(require 'ob-dot) ;; we reuse `org-babel-expand-body:dot'

;;; completion support during interactive use
(defconst org-babel-header-args:rst '((class . :any) (cmd . :any))
  "RST-specific header arguments.")

;;; main/essential code
(defvar org-babel-default-header-args:rst
  '((:results . "html") (:class . "") (:cmd . "rst2html4"))
  "Default arguments to use when evaluating an RST source block.")

(defun org-babel-execute:rst (body params)
  "Define execution of an `rst-mode' block as exporting.
BODY is the `rst-mode' code block and PARAMS are the header
arguments.

This function defines an additional header-argument `:class'
which defines additional classes that need to be added to the
wrapping element when exporting to HTML.

Exporting to outputs other than HTML, while possible, isn't yet
implemented."
  (let ((results (split-string (cdr (assq :results params)))))
    (cond
     ((member "html" results)
      (let* ((classes (cdr (assq :class params)))
             (cmd (cdr (assq :cmd params)))
             ;; Template file that restricts the output to the body content.
             (template (org-babel-temp-file "rst-" ".txt"))
             (cmdline (format "--template=%s" (org-babel-process-file-name template)))
             (coding-system-for-read 'utf-8)
             (coding-system-for-write 'utf-8)
             ;; Temporary file holding the RST source to convert.
             (in-file (org-babel-temp-file "rst-" ".rst"))
             (cmdstring (concat cmd
                                " " cmdline
                                " " (org-babel-process-file-name in-file))))
        (with-temp-file template
          (insert "%(body_pre_docinfo)s\n%(docinfo)s\n%(body)s"))
        ;; Substitute `:var' values into the source (via ob-dot's
        ;; expander) before writing it out.
        (with-temp-file in-file
          (insert (org-babel-expand-body:dot body params)))
        ;; Wrap the converter's output in a <div> for styling.
        (format "<div class='%s-snippet %s'>\n %s </div>"
                cmd classes
                (org-babel-eval cmdstring ""))))
     ((member "latex" results)
      (error "LaTeX export of RST block not yet implemented"))
     (t (error "Result format not supported")))))

(defun org-babel-prep-session:rst (_session _params)
  "Return an error because RST does not support sessions."
  (error "RST does not support sessions"))

(provide 'ob-rst)

Defining ob-rst.el as above, while necessary, isn’t sufficient by itself. We also have to enable code evaluation for rst.

By default, only Emacs Lisp is enabled for evaluation. To enable or disable other languages, customize the org-babel-load-languages variable either through the Emacs customization interface, or by adding code to the init file as shown next.

In this example, evaluation is enabled for Emacs Lisp as well as reStructuredText.

(org-babel-do-load-languages
 'org-babel-load-languages
 '((emacs-lisp . t)
   (rst . t)))
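The call above can only succeed if Emacs can find ob-rst.el, since enabling a language loads the corresponding ob-<lang> library. A minimal sketch, assuming the file was saved in a lisp/ directory under the Emacs configuration directory (a hypothetical location; adjust as needed), to be evaluated before the call to org-babel-do-load-languages:

```emacs-lisp
;; Hypothetical location of ob-rst.el; change it to wherever the file lives.
(add-to-list 'load-path (expand-file-name "lisp" user-emacs-directory))

;; Optional sanity check: returns the library's file name if Emacs can
;; find it on `load-path', and nil otherwise.
(locate-library "ob-rst")
```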

Using the above, we can use rst syntax to define richer tables and have them converted to HTML automatically during export. For instance, this rst-mode snippet in an org file:

#+begin_src rst :exports results :eval yes
  +------------+------------+----------------------+
  | Header 1   | Header 2   | Header 3             |
  +============+============+======================+
  | body row 1 | column 2   | column 3             |
  +------------+------------+----------------------+
  | body row 2 | Cells may span columns.           |
  +------------+------------+----------------------+
  | body row 3 | Cells may  | - Cells              |
  +------------+ span rows. | - can contain        |
  |            |            |                      |
  | body row 4 |            |    - lists           |
  |            |            |                      |
  |            |            | - with some nesting  |
  +------------+------------+----------------------+
#+end_src

Results in the following HTML table being generated.

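+------------+------------+----------------------+
| Header 1   | Header 2   | Header 3             |
+============+============+======================+
| body row 1 | column 2   | column 3             |
+------------+------------+----------------------+
| body row 2 | Cells may span columns.           |
+------------+------------+----------------------+
| body row 3 | Cells may  | • Cells              |
+------------+ span rows. | • can contain        |
| body row 4 |            |     • lists          |
|            |            | • with some nesting  |
+------------+------------+----------------------+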

Conclusion

When using org-mode as a lightweight markup language, if a syntactic limitation is encountered (one that is not inherent to the output format being exported to), the remedy is straightforward.

  1. Identify a lightweight markup language more suited to the task.
  2. Define an org-babel-execute:<lang> function (in ob-<lang>.el) which exports code in language <lang> to the desired output format. Pandoc may be used if a native conversion facility doesn’t exist.
  3. Enable <lang> via org-babel-do-load-languages. (Step 4: Profit.)
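Step 2 mentions Pandoc as a fallback. A minimal sketch of that idea follows, assuming a pandoc executable is on PATH and using Markdown as the source language. The choice of language, the default header arguments, and the feature name ob-markdown are assumptions for illustration.

```emacs-lisp
;; Sketch only: assumes `pandoc' is installed and available on PATH.
(require 'ob)

(defvar org-babel-default-header-args:markdown '((:results . "html"))
  "Default header arguments for Markdown blocks in this sketch.")

(defun org-babel-execute:markdown (body _params)
  "Export the Markdown in BODY to an HTML fragment using Pandoc.
_PARAMS (the header arguments) are ignored in this sketch."
  ;; BODY is sent to pandoc on standard input; the HTML that pandoc
  ;; writes to standard output is returned as the block's result.
  (org-babel-eval "pandoc --from=markdown --to=html" body))

(provide 'ob-markdown)
```

Pandoc emits an HTML fragment unless --standalone is passed, so no template is needed here.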

That’s it. That’s the idea.

