Markdown tables are hellish:
- You cannot write text in cells longer than a couple of words, and even less a list.
- If the dialect allows paragraph 1, it is inconvenient to format.
- If the cells are not aligned, the table cannot be read.
- There is no support for similar tables and automation, such as line numbering.
It's time to write a filter for Pandoc that draws tables from structured YAML, with row numbering, horizontal orientation, graph templates, and at the same time figure out how to write Lua filters.
I usually write texts in Markdown and convert to the target format using Pandoc. This is a program that converts documents between formats, for example, from Markdown you can get HTML, and another dialect of MD, and DOCX, and PDF (more than 30 input and more than 50 output formats). Pandoc Markdown has many convenient extensions for links, footnotes, signatures, formulas.
Pandoc works as a composition of functions (it would have been written in Haskell): a specific input format β abstract representation of a document β a specific output format. An abstract representation can be changed using filters written in Lua. Filters do not need to know about the output format, but they can take it into account.
Our filter will search for abstract blocks of code in the conditional language table
, read YAML inside them and generate abstract representations of tables that Pandoc itself will produce in the target format.
pandoc --lua-filter table.lua input.md -o output.html
What are the alternatives and why are they worse?
- HTML tables work only in Markdown and are converted only to HTML; only the problem of rich formatting in cells is solved.
- Table generators require switching from a text editor, it is inconvenient to edit the contents of the cells in them ( example ).
- Editors plugins ( Emacs Org-Mode , VIM plugins ) are not universal and not always accessible.
In contrast, pandoc-crossref
and all Pandoc buns work with the filter for summary tables. The filter can also be used to generate standard Markdown tables by specifying the appropriate output format. Of the disadvantages:
- Cells cannot be merged; Pandoc does not support this (yet).
- For horizontal tables, stylization has to be done using the output format, for example, through CSS.
The description of the table includes three parts:
Table structure
An ordered list of graphs (columns):
- At a minimum, a column should have a title.
- In order to be able to rearrange the columns without touching the data, the record attribute displayed in the column (
id
) must be specified. - Special columns do not have id, but have a description of how to fill them. First you need a serial number (
special: number
). - Column alignment (
align
).
Also, the table can be vertical or horizontal (
orientation
). In the latter case, the graphs will be rows.
Table properties: ID for links (
id
) and signature (caption
). Pandoc allows you to sign tables, but not blocks of code.
Data in the form of an array of YAML dictionaries.
The structure can be common to several tables, so you can describe it both directly with the table, and once in the metadata (front-matter), and then refer to the named template.
Implementation Plan:
From the metadata of the document we form a dictionary of templates.
For each block of code with the class
table
:
- We parse the YAML tables.
- If a template is specified, we take it from the dictionary, otherwise we fill out the template from YAML.
- We fill in the individual properties of the table from YAML.
- We form the table entries from YAML (the record is a row in a regular table or a column in a horizontal one).
- We "draw" a table according to a template, properties and records.
The upper level is implemented as written (all code is available at the link at the end of the article):
function Pandoc(doc) local meta_templates = doc.meta['table-templates'] if meta_templates then for name, value in pairs(meta_templates) do templates[name] = parse_template(value) end end local blocks = pandoc.walk_block(pandoc.Div(doc.blocks), { CodeBlock = create_table }) return pandoc.Pandoc(blocks, doc.meta) end
The parse_template()
function slightly converts the metadata format. Pandoc represents their values ββas MetaBlock
and MetaInline
. Either simple lines are made of them pandoc.utils.stringify()
function (for example, orientation), or visual elements (for example, a block of text in the column heading).
About debugging. There are many examples in the Pandoc documentation, but types are not described in great detail. For debugging filters, it is convenient to have a variable dump function. Serious libraries print too many details, I prefer one of the simple options .
local function to_inlines(content) if content == nil then return {} elseif type(content) == 'string' then return {pandoc.Str(content)} elseif type(content) == 'number' then return to_inlines(tostring(content)) elseif content.t == 'MetaInlines' then inlines = {} for i, item in ipairs(content) do inlines[i] = item end return inlines end end local function to_blocks(content) if (type(content) == 'table') and content.t == 'MetaBlocks' then return content else return {pandoc.Plain(to_inlines(content))} end end
The create_table()
function is called for each block of code in triple backtics.
We are only interested in code blocks βin the languageβ of table
:
if not contains('table', block.classes) then return block end
To parse YAML inside a code block, we create a document consisting only of YAML metadata, parse it with Pandoc and leave only metadata:
local meta = pandoc.read('---\n' .. block.text .. '\n---').meta
Next, from meta
link to a template or table structure and properties of a specific table is read.
The fill_table()
function reads from meta
data for the attributes specified in the description of the graph. At the same stage, if the column is marked as special, its contents are generated:
local data = {} for i, serie in ipairs(template.series) do if serie.special == 'number' then data[i] = to_blocks(#datum + 1) else data[i] = to_blocks(item[serie.id]) end end
The format_table()
function forms the resulting array of cells depending on the orientation of the table and creates an abstract table object. It should be noted that if the widths or headers should be set for all columns or for none, otherwise Pandoc will simply not create a table.
You can put the finished script into ~/.local/share/pandoc
(the ~/.local/share/pandoc
data directory ) to access it by name from anywhere.
PS
As for accounting for the output format with filters. For example, I write spoilers in Pandoc like this:
::: {.spoiler title=""} . :::
There are no spoilers in the Pandoc document model, so the filter should produce raw blocks in approximately the following way. Of course, the real code ( spoiler.lua
) should take into account the output format through the FORMAT
variable, and not mechanically: the fragment below produces raw blocks in HTML, although the output format is markdown.
function Div(el) if not el.attr or not contains('spoiler', el.attr.classes) then return el end local title = el.attr.attributes['title'] or '' table.insert(el.content, 1, pandoc.RawBlock('html', '<' .. 'spoiler title="' .. title .. '">', 'RawBlock')) table.insert(el.content, pandoc.RawBlock('html', '<' .. '/spoiler>', 'RawBlock')) return el.content end