skribilo-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH 3/3] Add CommonMark reader.


From: Ludovic Courtès
Subject: [PATCH 3/3] Add CommonMark reader.
Date: Sat, 17 Feb 2024 11:27:11 +0100

* src/guile/skribilo/reader/commonmark.scm,
tests/readers/commonmark.test: New files.
* configure.ac: Check for (commonmark) ans define ‘BUILD_COMMONMARK_READER’.
* src/guile/Makefile.am (readers, EXTRA_DIST): Add ‘commonmark.scm’.
* doc/user/syntax.skb (The CommonMark Syntax): New section.
(The Outline Syntax): Adjust initial sentence.
* .guix/skribilo-package.scm (skribilo)[propagated-inputs]: Add
GUILE-COMMONMARK.
(skribilo/guile-2.0): Remove GUILE-COMMONMARK from ‘propagated-inputs’.
---
 .guix/skribilo-package.scm               |  16 +-
 configure.ac                             |  10 +
 doc/user/syntax.skb                      |  62 ++++++-
 src/guile/Makefile.am                    |  13 +-
 src/guile/skribilo/reader/commonmark.scm | 221 +++++++++++++++++++++++
 tests/readers/commonmark.test            | 147 +++++++++++++++
 6 files changed, 459 insertions(+), 10 deletions(-)
 create mode 100644 src/guile/skribilo/reader/commonmark.scm
 create mode 100644 tests/readers/commonmark.test

diff --git a/.guix/skribilo-package.scm b/.guix/skribilo-package.scm
index 12bb2d1..c7d47ab 100644
--- a/.guix/skribilo-package.scm
+++ b/.guix/skribilo-package.scm
@@ -1,6 +1,6 @@
 ;;; guix.scm  --  Build recipe for GNU Guix.
 ;;;
-;;; Copyright © 2020, 2023 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2020, 2023-2024 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2023 Arun Isaac <arunisaac@systemreboot.net>
 ;;;
 ;;; This file is part of Skribilo.
@@ -24,7 +24,7 @@
   #:use-module (gnu packages ghostscript)
   #:use-module (gnu packages guile)
   #:use-module ((gnu packages guile-xyz)
-                #:select (guile-lib guile-reader))
+                #:select (guile-commonmark guile-lib guile-reader))
   #:use-module (gnu packages imagemagick)
   #:use-module (gnu packages lout)
   #:use-module (gnu packages pkg-config)
@@ -53,7 +53,9 @@
 
     ;; The 'skribilo' command needs them, and for people using Skribilo as a
     ;; library, these inputs are needed as well.
-    (propagated-inputs (list guile-reader guile-lib))))
+    (propagated-inputs (list guile-reader
+                             guile-lib
+                             guile-commonmark))))
 
 (define with-guile-2.0
   (package-input-rewriting/spec `(("guile" . ,(const guile-2.0)))
@@ -76,7 +78,13 @@
 
 (define-public skribilo/guile-2.0
   (package
-    (inherit (with-guile-2.0 skribilo))
+    (inherit (with-guile-2.0
+              (package
+                (inherit skribilo)
+                (propagated-inputs
+                 (modify-inputs (package-propagated-inputs skribilo)
+                   ;; XXX: Guile-CommonMark has test failures on Guile 2.0.
+                   (delete "guile-commonmark"))))))
     (name "guile2.0-skribilo")
     (inputs
      (modify-inputs (package-inputs skribilo)
diff --git a/configure.ac b/configure.ac
index 5ad964a..af1cb75 100644
--- a/configure.ac
+++ b/configure.ac
@@ -75,6 +75,16 @@ fi
 AM_CONDITIONAL([BUILD_GEMTEXT_READER],
   [test "x$have_srfi_171" == "xyes"])
 
+# Check for Guile-CommonMark, needed for the `commonmark' reader.
+GUILE_MODULE_AVAILABLE([have_commonmark], [(commonmark)])
+if test "x$have_commonmark" != "xyes"; then
+  AC_MSG_WARN([Guile-CommonMark needed by the `commonmark' reader is missing.])
+fi
+
+AM_CONDITIONAL([BUILD_COMMONMARK_READER],
+  [test "x$have_commonmark" == "xyes"])
+
+
 # Look for `convert', from ImageMagick.
 AC_PATH_PROG([CONVERT], [convert])
 if test "x$CONVERT" == "x"; then
diff --git a/doc/user/syntax.skb b/doc/user/syntax.skb
index 2de7cbd..ce8ed6d 100644
--- a/doc/user/syntax.skb
+++ b/doc/user/syntax.skb
@@ -1,7 +1,7 @@
 ;;; syntax.skb  --  The Skribilo syntaxes.
-;;; -*- coding: iso-8859-1 -*-
+;;; -*- coding: utf-8 -*-
 ;;;
-;;; Copyright 2008, 2009  Ludovic Court�s <ludo@gnu.org>
+;;; Copyright 2008, 2009, 2024  Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright 2001, 2002, 2003, 2004  Manuel Serrano
 ;;;
 ;;;
@@ -132,12 +132,66 @@ specifying a width:])
                   in the output document (such as HTML column ,(code "\"0*\"")
                   specification). Note that this way of specifying width
                   is strictly unportable.])))))
-   
+
+   (section :title [The CommonMark Syntax]
+            :ident "commonmark-syntax"
+
+      (p [While the Skribe syntax is super expressive, you may instead prefer
+something less expressive but more lightweight.  The ,(tt [commonmark]) reader
+addresses this needs: it reads inputs that follows the ,(ref :url
+"https://commonmark.org/"; :text [CommonMark syntax])—essentially a subset of
+what you may know as Markdown—and produces a document from there.  The simplest
+CommonMark input looks like this:]
+
+        (example :legend [Example input using the CommonMark syntax.]
+         (tt (pre "\
+# Title of the Document
+
+Here is an introduction.
+
+## First Chapter
+
+This is the first chapter, and here’s a hyperlink to the [CommonMark
+spec](https://commonmark.org).
+
+## Second Chapter
+
+Well, you get the idea."))))
+
+       (p [Alternatively, the input can start with metadata headers in the
+style of ,(ref :url "https://dthompson.us/projects/haunt.html"; :text [Haunt]),
+which is useful for instance if you’d like to specify the document author(s):]
+
+         (example :legend [CommonMark input with metadata headers.]
+          (tt (pre "\
+title: Title of the Document
+author: Charlie
+---
+
+Here goes the introduction.
+
+# First Chapter
+
+This is the first chapter.
+
+## Sub-section
+
+… and this is a subsection."))))
+
+       (p [Once you have a CommonMark document, pass ,(tt 
[--reader=commonmark])
+or ,(tt [-R commonmark]) to ,(tt [skribilo]).  For example, here is how you 
would render
+a CommonMark document as PDF ,(it [via]) LaTeX:]
+
+         (disp :verb #t [
+skribilo -R commonmark -t latex document.md \\
+  | pdflatex])
+
+         [The resulting file is ,(tt [texput.pdf]) by default.]))
 
    (section :title [The Outline Syntax]
             :ident "outline-syntax"
       
-      (p [Alternatively, Skribilo allows documents to be written in a
+      (p [You may also write documents in
 plain text format, with almost no markup.  Instead, conventions borrowed
 from ,(ref :text [Emacs' Outline Mode] :url
 "http://www.gnu.org/software/emacs/manual/html_node/emacs/Outline-Mode.html";)
diff --git a/src/guile/Makefile.am b/src/guile/Makefile.am
index 09bb7da..80c03be 100644
--- a/src/guile/Makefile.am
+++ b/src/guile/Makefile.am
@@ -1,5 +1,4 @@
-# Copyright 2005, 2006, 2007, 2008, 2009, 2010, 2012,
-#   2015, 2018, 2020 Ludovic Courtès <ludo@gnu.org>
+# Copyright 2005-2010, 2012, 2015, 2018, 2020, 2024 Ludovic Courtès 
<ludo@gnu.org>
 # Copyright 2022 Arun Isaac <arunisaac@systemreboot.net>
 #
 # This file is part of Skribilo.
@@ -70,6 +69,16 @@ EXTRA_DIST += skribilo/reader/gemtext.scm
 
 endif !BUILD_GEMTEXT_READER
 
+if BUILD_COMMONMARK_READER
+
+readers += skribilo/reader/commonmark.scm
+
+else !BUILD_COMMONMARK_READER
+
+EXTRA_DIST += skribilo/reader/commonmark.scm
+
+endif !BUILD_COMMONMARK_READER
+
 engines =                                              \
   skribilo/engine/base.scm skribilo/engine/context.scm \
   skribilo/engine/html.scm skribilo/engine/html4.scm   \
diff --git a/src/guile/skribilo/reader/commonmark.scm 
b/src/guile/skribilo/reader/commonmark.scm
new file mode 100644
index 0000000..266dfa8
--- /dev/null
+++ b/src/guile/skribilo/reader/commonmark.scm
@@ -0,0 +1,221 @@
+;;; commonmark.scm  --  Reader for the CommonMark syntax.
+;;;
+;;; Copyright © 2024 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2015 David Thompson <davet@gnu.org>
+;;;
+;;;
+;;; This file is part of Skribilo.
+;;;
+;;; Skribilo is free software: you can redistribute it and/or modify
+;;; it under the terms of the GNU General Public License as published by
+;;; the Free Software Foundation, either version 3 of the License, or
+;;; (at your option) any later version.
+;;;
+;;; Skribilo is distributed in the hope that it will be useful,
+;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;;; GNU General Public License for more details.
+;;;
+;;; You should have received a copy of the GNU General Public License
+;;; along with Skribilo.  If not, see <http://www.gnu.org/licenses/>.
+
+(define-module (skribilo reader commonmark)
+  #:autoload   (commonmark) (commonmark->sxml)
+  #:autoload   (sxml match) (sxml-match)
+  #:use-module (skribilo reader)
+  #:use-module (skribilo utils syntax)
+  #:use-module (srfi srfi-1)
+  #:use-module (srfi srfi-11)
+  #:use-module (srfi srfi-34)
+  #:use-module (srfi srfi-35)
+  #:use-module (ice-9 match)
+  #:use-module (ice-9 rdelim)
+  #:export (reader-specification
+            make-commonmark-reader))
+
+;;; Commentary:
+;;;
+;;; This is a reader for the CommonMark syntax, a subset of what's usually
+;;; referred to as "Markdown".
+;;;
+;;; Code:
+
+(define (sxml->skribilo sxml)
+  (define sectioning
+    `((h1 . chapter)
+      (h2 . section)
+      (h3 . subsection)
+      (h4 . subsubsection)))
+
+  (define (heading? tag)
+    (assq tag sectioning))
+
+  (define (tagged? tag)
+    (match-lambda
+      ((head . _) (eq? head tag))
+      (_ #f)))
+
+  (let loop ((sxml sxml)
+             (result '()))
+    (match sxml
+      (()
+       (reverse result))
+      ((((? heading? tag) . title) . rest)
+       (let ((section (assoc-ref sectioning tag)))
+         (let-values (((body rest)
+                       (break (tagged? tag) rest)))
+           (loop rest
+                 (cons `(,section #:title (list ,@title)
+                                  ,@(sxml->skribilo body))
+                       result)))))
+      ((('p . body) . rest)
+       (loop rest
+             (cons `(paragraph ,@(sxml->skribilo body))
+                   result)))
+      ((('pre ('code ('@ _ ...) . body)) . rest)
+       (loop rest
+             (cons `(pre (prog ,@body)) result)))
+      ((('em . body) . rest)
+       (loop rest
+             (cons `(emph ,@(sxml->skribilo body))
+                   result)))
+      ((('ul ('li . items) ...) . rest)
+       (loop rest
+             (cons `(itemize ,@(map (lambda (body)
+                                      `(item ,@(sxml->skribilo body)))
+                                    items))
+                   result)))
+      ((('ol ('li . items) ...) . rest)
+       (loop rest
+             (cons `(enumerate ,@(map (lambda (body)
+                                        `(item ,@(sxml->skribilo body)))
+                                      items))
+                   result)))
+      ((('a ('@ ('href url)) . body) . rest)
+       (loop rest
+             (cons `(ref #:url ,url
+                         #:text (list ,@(sxml->skribilo body)))
+                   result)))
+      ((('code . body) . rest)
+       (loop rest
+             (cons `(tt ,@(sxml->skribilo body))
+                   result)))
+      ((('blockquote . body) . rest)
+       (loop rest
+             (cons `(blockquote ,@(sxml->skribilo body))
+                   result)))
+      ((('strong . body) . rest)
+       (loop rest
+             (cons `(bold ,@(sxml->skribilo body))
+                   result)))
+      ((('img ('@ ('src url) ('alt text) . _) . _) . rest)
+       (loop rest
+             (cons `(image #:url ,url #:text ,text)
+                   result)))
+      ((('hr) . rest)
+       (loop rest result))
+      (((? string? str) . rest)
+       (loop rest (cons str result)))
+      ((lst ...)
+       `(list ,@(fold loop result lst))))))
+
+(define (string-split-at str char-pred)
+  "Split STR at the first character that matches CHAR-PRED and return
+a list of one or two strings.  Two strings are returned if the string
+was able to be split, with the character matching CHAR-PRED removed.
+A list containing only STR is returned if CHAR-PRED does not match any
+charcter."
+  (let ((i (string-index str char-pred)))
+    (if i
+        (list (string-take str i)
+              (string-drop str (1+ i)))
+        (list str))))
+
+(define (read-headers port)
+  "Read Haunt-style headers from PORT.  Return a list of key/value pairs."
+  (let loop ((headers '()))
+    (match (read-line port)
+      ((? eof-object?)                            ;premature end-of-file
+       (reverse headers))
+      ("---"
+       (reverse headers))
+      (line
+       (match (map string-trim-both (string-split-at line #\:))
+         (((= string->symbol key) value)
+          (loop (alist-cons key value headers)))
+         (_
+          (raise (condition
+                  (&message
+                   (message (G_ "invalid CommonMark header")))))))))))
+
+(define (starts-with-headers? port)
+  "Return true if PORT starts with Haunt-style headers.  Do not actually
+consume those bytes."
+  (let ((line (read-line port 'concat)))
+    (and (not (eof-object? line))
+         (begin
+           (unread-string line port)
+           (string-prefix? "title:" (string-trim line))))))
+
+(define (raise-sectioning body)
+  "Raise sectioning in BODY: sections become chapters, subsections become
+section, and so on."
+  (define mapping
+    '((subsubsection . subsection)
+      (subsection . section)
+      (section . chapter)
+      (chapter . chapter)))
+
+  (define (section? tag)
+    (assq tag mapping))
+
+  (map (lambda (item)
+         (match item
+           (((? section? tag) #:title title . body)
+            (let ((tag (assq-ref mapping tag)))
+              `(,tag #:title ,title
+                     ,@(raise-sectioning body))))
+           (_
+            item)))
+       body))
+
+(define (read-commonmark-document port)
+  "Read a CommonMark document from PORT.  Return Scheme code as an 
s-expression."
+  (set-port-encoding! port "UTF-8")
+  (match (peek-char port)
+    ((? eof-object? eof)
+     eof)
+    (_
+     (let* ((headers (if (starts-with-headers? port)
+                         (read-headers port)
+                         '()))
+            (sexp (sxml->skribilo (commonmark->sxml port)))
+            (title (or (assoc-ref headers 'title)
+                       (match sexp
+                         ((('chapter #:title title _ ...) _ ...)
+                          title)
+                         (_ #f))))
+            (body (if (assoc-ref headers 'title)
+                      sexp
+                      (match sexp
+                        ((('chapter #:title _ body ...))
+                         body)
+                        (_
+                         sexp)))))
+       `(document
+         #:title ,title
+         #:author (list ,@(map (lambda (name)
+                                 `(author #:name ,name))
+                               (if (assoc-ref headers 'author)
+                                   (string-split (assoc-ref headers 'author)
+                                                 #\,)
+                                   '())))
+
+         ,@(if (assoc-ref headers 'title)
+               body
+               (raise-sectioning body)))))))
+
+(define make-commonmark-reader
+  (const read-commonmark-document))
+
+(define-reader commonmark "0.1" make-commonmark-reader)
diff --git a/tests/readers/commonmark.test b/tests/readers/commonmark.test
new file mode 100644
index 0000000..8047d1f
--- /dev/null
+++ b/tests/readers/commonmark.test
@@ -0,0 +1,147 @@
+;;; Exercise CommonMark reader.                  -*- Scheme -*-
+;;;
+;;; Copyright © 2024 Ludovic Courtès <ludo@gnu.org>
+;;;
+;;; This file is part of Skribilo.
+;;;
+;;; Skribilo is free software; you can redistribute it and/or modify it
+;;; under the terms of the GNU Lesser General Public License as published by
+;;; the Free Software Foundation; either version 3 of the License, or (at
+;;; your option) any later version.
+;;;
+;;; Skribilo is distributed in the hope that it will be useful, but
+;;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser
+;;; General Public License for more details.
+;;;
+;;; You should have received a copy of the GNU Lesser General Public License
+;;; along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+(define-module (tests commonmark)
+  #:use-module (ice-9 match)
+  #:use-module (skribilo reader)
+  #:use-module (srfi srfi-64))
+
+(unless (false-if-exception (resolve-interface '(commonmark)))
+  (exit 77))
+
+(define make-commonmark-reader
+  (reader:make (lookup-reader 'commonmark)))
+
+
+(test-begin "commonmark")
+
+(test-equal "document with headers"
+  '(document #:title "CommonMark is cool"
+             #:author (list (author #:name "Charlie Doe"))
+             (chapter #:title (list "First")
+                      (paragraph "Hi there" "!"))
+             (chapter #:title (list "Second")
+                      (paragraph "Ooh.")
+                      (section #:title (list "Sub One")
+                               (paragraph "Something here."))
+                      (section #:title (list "Sub Two")
+                               (paragraph "Not much to add."))))
+  (call-with-input-string "\
+title: CommonMark is cool
+author: Charlie Doe
+---
+
+# First
+
+Hi there!
+
+# Second
+
+Ooh.
+
+## Sub One
+Something here.
+
+## Sub Two
+Not much to add."
+    (make-commonmark-reader)))
+
+(test-equal "document without headers"
+  '(document #:title (list "Title coming from first chapter")
+             #:author (list)
+             (paragraph "Hi there" "!")
+             (chapter #:title (list "Section becomes a chapter")
+                      (section #:title (list "Subsection")
+                               (paragraph "Something here.")))
+             (chapter #:title (list "Conclusion")
+                      (paragraph "Done.")))
+  (call-with-input-string "
+# Title coming from first chapter
+
+Hi there!
+
+## Section becomes a chapter
+
+### Subsection
+Something here.
+
+## Conclusion
+Done."
+    (make-commonmark-reader)))
+
+(test-equal "ornaments"
+  '(document #:title (list "The Title")
+             #:author (list)
+
+             (paragraph "Here's a " (bold "bullet") " list:")
+             (itemize
+              (item "one")
+              (item "two"))
+             (paragraph "Here's a " (emph "numbered") " list:")
+             (enumerate
+              (item "one")
+              (item "two")))
+  (call-with-input-string "
+# The Title
+
+Here's a **bullet** list:
+
+  - one
+  - two
+
+Here's a *numbered* list:
+
+  1. one
+  2. two"
+    (make-commonmark-reader)))
+
+(test-equal "references and images"
+  '(document #:title (list "The Title")
+             #:author (list)
+
+             (paragraph "The " (ref #:url "https://guix.gnu.org";
+                                    #:text (list (emph "awesome") " Guix"))
+                        " project.")
+             (paragraph (image #:url "https://guix.gnu.org/favicon.ico";
+                               #:text "The favicon.")))
+  (call-with-input-string "
+# The Title
+
+The [_awesome_ Guix](https://guix.gnu.org) project.
+
+![The favicon.](https://guix.gnu.org/favicon.ico)"
+    (make-commonmark-reader)))
+
+(test-equal "code block"
+  '(document #:title (list "The Title")
+             #:author (list)
+             (pre (prog "int main () {
+  return 0;
+}")))
+  (call-with-input-string "
+# The Title
+
+```c
+int main () {
+  return 0;
+}
+```"
+    (make-commonmark-reader)))
+
+(test-end)
-- 
2.41.0




reply via email to

[Prev in Thread] Current Thread [Next in Thread]