Normal tables in Markdown







Markdown tables are hellish:







  1. You cannot write text in cells longer than a couple of words, and even less a list.
  2. If the dialect allows paragraph 1, it is inconvenient to format.
  3. If the cells are not aligned, the table cannot be read.
  4. There is no support for similar tables and automation, such as line numbering.


It's time to write a filter for Pandoc that draws tables from structured YAML, with row numbering, horizontal orientation, graph templates, and at the same time figure out how to write Lua filters.







I usually write texts in Markdown and convert to the target format using Pandoc. This is a program that converts documents between formats, for example, from Markdown you can get HTML, and another dialect of MD, and DOCX, and PDF (more than 30 input and more than 50 output formats). Pandoc Markdown has many convenient extensions for links, footnotes, signatures, formulas.







Pandoc works as a composition of functions (it would have been written in Haskell): a specific input format β†’ abstract representation of a document β†’ a specific output format. An abstract representation can be changed using filters written in Lua. Filters do not need to know about the output format, but they can take it into account.







Our filter will search for abstract blocks of code in the conditional language table



, read YAML inside them and generate abstract representations of tables that Pandoc itself will produce in the target format.







 pandoc --lua-filter table.lua input.md -o output.html
      
      





What are the alternatives and why are they worse?









In contrast, pandoc-crossref



and all Pandoc buns work with the filter for summary tables. The filter can also be used to generate standard Markdown tables by specifying the appropriate output format. Of the disadvantages:









The description of the table includes three parts:







  1. Table structure







    An ordered list of graphs (columns):







    • At a minimum, a column should have a title.
    • In order to be able to rearrange the columns without touching the data, the record attribute displayed in the column ( id



      ) must be specified.
    • Special columns do not have id, but have a description of how to fill them. First you need a serial number ( special: number



      ).
    • Column alignment ( align



      ).


    Also, the table can be vertical or horizontal ( orientation



    ). In the latter case, the graphs will be rows.







  2. Table properties: ID for links ( id



    ) and signature ( caption



    ). Pandoc allows you to sign tables, but not blocks of code.







  3. Data in the form of an array of YAML dictionaries.









The structure can be common to several tables, so you can describe it both directly with the table, and once in the metadata (front-matter), and then refer to the named template.







Implementation Plan:







  1. From the metadata of the document we form a dictionary of templates.







  2. For each block of code with the class table



    :







    1. We parse the YAML tables.
    2. If a template is specified, we take it from the dictionary, otherwise we fill out the template from YAML.
    3. We fill in the individual properties of the table from YAML.
    4. We form the table entries from YAML (the record is a row in a regular table or a column in a horizontal one).
    5. We "draw" a table according to a template, properties and records.




The upper level is implemented as written (all code is available at the link at the end of the article):







 function Pandoc(doc) local meta_templates = doc.meta['table-templates'] if meta_templates then for name, value in pairs(meta_templates) do templates[name] = parse_template(value) end end local blocks = pandoc.walk_block(pandoc.Div(doc.blocks), { CodeBlock = create_table }) return pandoc.Pandoc(blocks, doc.meta) end
      
      





The parse_template()



function slightly converts the metadata format. Pandoc represents their values ​​as MetaBlock



and MetaInline



. Either simple lines are made of them pandoc.utils.stringify()



function (for example, orientation), or visual elements (for example, a block of text in the column heading).







About debugging. There are many examples in the Pandoc documentation, but types are not described in great detail. For debugging filters, it is convenient to have a variable dump function. Serious libraries print too many details, I prefer one of the simple options .







Functions for converting metadata to document elements
 local function to_inlines(content) if content == nil then return {} elseif type(content) == 'string' then return {pandoc.Str(content)} elseif type(content) == 'number' then return to_inlines(tostring(content)) elseif content.t == 'MetaInlines' then inlines = {} for i, item in ipairs(content) do inlines[i] = item end return inlines end end local function to_blocks(content) if (type(content) == 'table') and content.t == 'MetaBlocks' then return content else return {pandoc.Plain(to_inlines(content))} end end
      
      





The create_table()



function is called for each block of code in triple backtics.







We are only interested in code blocks β€œin the language” of table



:







 if not contains('table', block.classes) then return block end
      
      





To parse YAML inside a code block, we create a document consisting only of YAML metadata, parse it with Pandoc and leave only metadata:







 local meta = pandoc.read('---\n' .. block.text .. '\n---').meta
      
      





Next, from meta



link to a template or table structure and properties of a specific table is read.







The fill_table()



function reads from meta



data for the attributes specified in the description of the graph. At the same stage, if the column is marked as special, its contents are generated:







 local data = {} for i, serie in ipairs(template.series) do if serie.special == 'number' then data[i] = to_blocks(#datum + 1) else data[i] = to_blocks(item[serie.id]) end end
      
      





The format_table()



function forms the resulting array of cells depending on the orientation of the table and creates an abstract table object. It should be noted that if the widths or headers should be set for all columns or for none, otherwise Pandoc will simply not create a table.







You can put the finished script into ~/.local/share/pandoc



(the ~/.local/share/pandoc



data directory ) to access it by name from anywhere.







PS



As for accounting for the output format with filters. For example, I write spoilers in Pandoc like this:







 ::: {.spoiler title=""}  . :::
      
      





There are no spoilers in the Pandoc document model, so the filter should produce raw blocks in approximately the following way. Of course, the real code ( spoiler.lua



) should take into account the output format through the FORMAT



variable, and not mechanically: the fragment below produces raw blocks in HTML, although the output format is markdown.







 function Div(el) if not el.attr or not contains('spoiler', el.attr.classes) then return el end local title = el.attr.attributes['title'] or '' table.insert(el.content, 1, pandoc.RawBlock('html', '<' .. 'spoiler title="' .. title .. '">', 'RawBlock')) table.insert(el.content, pandoc.RawBlock('html', '<' .. '/spoiler>', 'RawBlock')) return el.content end
      
      





References






All Articles