Using HaXml to make a PDF slideshow from an Inkscape SVG

I recently got a tablet to input handwritten math for slideshow presentations, but instead of using a note-taking program (Jarnal, Xournal, Gournal) I decided that I wanted the full power of image manipulation of a program like Gimp or Inkscape. Neither of these, though, has the level of support for multi-page documents that you find in note-taking software. But Inkscape uses SVG as its native file format, so I wrote this Haskell script to transform the layers of an Inkscape SVG file into the slides of a PDF presentation. I use the HaXml library to manipulate the SVG, the Inkscape command-line interface to convert each page to PDF, and pdftk to glue the whole thing back together.

[Images: slide001.png, slide002.png, slide003.png, slide004.png]

As usual, this post is a literate Haskell file, so you can try it out by saving it to Inkscape.lhs, compiling with ghc --make Inkscape, grabbing the source file for the images above, and running ./Inkscape < demo.svg. The output will appear in Slides.pdf (and your directory will be polluted with temp files, so be aware).

For the record, multi-page documents have been on the Inkscape feature request tracker for many versions, so I presume it is a significant change. I do grok C and C++, thanks to the legacy-oriented education system, but take little enough pleasure from them that I would rather hack around the issue in Haskell.

> import Text.XML.HaXml
> import Text.XML.HaXml.Pretty
> import Text.XML.HaXml.Posn
> import Text.PrettyPrint.HughesPJ
> import Text.Printf
> import Data.List
> import System.IO
> import System.Process -- provides system; supersedes the deprecated System.Cmd

HaXml is based on a combinator library of CFilters that filter, search, output, and otherwise process XML content. It is a little crufty in some ways: many datatypes are transparent, and you have to do a lot of your own setup and teardown. The expected way to use it seems to be via processXmlWith :: CFilter -> IO (), which is not sufficient for today's task. The Hackage documentation pointed to an old version of the API, so I used the current version of the source code for documentation. I'd love any criticism like "you didn't have to do X" or "here is an easier, safer way to do Y".

I couldn't think of a better way to narrate this code, so I'll start with main for a high-level read, and then later fill in all the helper functions. Naturally we start with a call to xmlParse; the "-" is a required filename for error reporting.

> main = do input <- getContents
>           let xml = xmlParse "-" input

Then I grab the names of all the layer objects in the order they appear in the file, except for the special layer "Background" which I'll include behind every slide. The call to verbatim spits them out as Strings instead of XML Content, and the "-" is yet another required filename for error reporting.

>           let names = delete "Background" 
>                       $ map verbatim 
>                       $ filterElem "-" getLayerNames
>                       $ xmlElem 
>                       $ xml
>           putStrLn $ "Making slides from layers:" 
>                        ++ concatMap ("\n\t"++) names ++ "\n"
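
One caveat worth checking here: delete from Data.List removes only the first matching element, so a file with two layers both labelled "Background" would still get a slide for the second one. A minimal sketch with made-up layer names (nothing in it touches HaXml):

```haskell
import Data.List (delete)

-- Hypothetical layer names, in document order.
layerNames :: [String]
layerNames = ["Background", "Title", "Background", "Proof"]

-- What the script does: drop the first "Background" only.
firstRemoved :: [String]
firstRemoved = delete "Background" layerNames

-- An alternative that drops every "Background".
allRemoved :: [String]
allRemoved = filter (/= "Background") layerNames
```

For a file with a single Background layer the two give the same result.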

Then for each layer, make a new version of the file with just that layer visible.

>           let outXmls = map (flip selectLayer xml) names
>               usedSlides = take (length names) slideNames
>           mapM_ (uncurry writeFile) 
>                 (zip (map (++".svg") slideNames) 
>                      (map (renderStyle xmlStyle . document) outXmls))
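
The zip above receives all 999 slideNames, which works because zip stops at the end of the shorter list. A small sketch of that behaviour, with placeholder strings standing in for the rendered SVG documents (the contents are invented for illustration):

```haskell
import Text.Printf (printf)

-- Same naming scheme as the script: Slide001, Slide002, ...
slideNames :: [String]
slideNames = map (printf "Slide%03d") [1 .. 999 :: Int]

-- Placeholder contents; the real script renders HaXml documents here.
rendered :: [String]
rendered = ["<svg>a</svg>", "<svg>b</svg>", "<svg>c</svg>"]

-- zip truncates to the three rendered documents.
outputs :: [(FilePath, String)]
outputs = zip (map (++ ".svg") slideNames) rendered
```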

Next comes some shell scripting done in Haskell. I didn't even try to find a scripting library that would, for example, prevent me from building a malformed command.

>           mapM_ (\slide -> do 
>                    system $ "inkscape --export-text-to-path --export-pdf='" 
>                             ++ slide ++ ".pdf' '" ++ slide ++ ".svg'")
>                 usedSlides
>           
>           system $ "pdftk " 
>                      ++ concat (intersperse " " (map (++".pdf") usedSlides)) 
>                      ++ " cat output Slides.pdf"
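
One way to sidestep shell quoting, sketched under the assumption that inkscape and pdftk are on the PATH and accept the same flags as in the commands above: pass the arguments as a list to rawSystem from System.Process, which runs the program without going through a shell. The helper names are made up for illustration.

```haskell
import System.Exit (ExitCode)
import System.Process (rawSystem)

-- Arguments for converting one slide; flags copied from the command above.
inkscapeArgs :: String -> [String]
inkscapeArgs slide =
  [ "--export-text-to-path"
  , "--export-pdf=" ++ slide ++ ".pdf"
  , slide ++ ".svg"
  ]

-- Arguments for concatenating the per-slide PDFs into Slides.pdf.
pdftkArgs :: [String] -> [String]
pdftkArgs slides = map (++ ".pdf") slides ++ ["cat", "output", "Slides.pdf"]

-- Each argument reaches the program verbatim, so no quoting is needed.
convertSlide :: String -> IO ExitCode
convertSlide = rawSystem "inkscape" . inkscapeArgs

joinSlides :: [String] -> IO ExitCode
joinSlides = rawSystem "pdftk" . pdftkArgs
```

Keeping the argument lists as pure functions also makes them easy to inspect before anything is executed.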

So now to the little details:

Grabbing the layer names

Here is the first helper I wrote, wrapping HaXml's attrval for a common case. This filter returns every tag whose attr attribute has the string value val.

> matchAttrString :: String -> String -> CFilter i
> matchAttrString attr val = attrval (attr, AttValue [Left val])

The next helper maps a tag to the value of one of its attributes and discards everything else it sees. The HaXml function iffind passes the value of the attr attribute to literal, which just returns it. If the attribute isn't found, or the XML data isn't a tag, then none discards it.

> showAttr :: String -> CFilter i
> showAttr attr = iffind attr literal none

The Inkscape layers are contained in <g inkscape:groupmode='layer' ...> tags. The name of the layer is in the inkscape:label attribute. I imagine this will change as Inkscape evolves. The o is the composition operator for CFilters.

> isLayer = matchAttrString "inkscape:groupmode" "layer"
> getLayerNames = showAttr "inkscape:label" `o` isLayer `o` children
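
To see how the composition reads, here is a toy model that needs only the Prelude. The Node type and every function in it are simplified stand-ins invented for this sketch, not HaXml's real definitions; the one thing it is meant to mirror is that a filter maps one node to a list of nodes and that composition runs right to left.

```haskell
-- Toy stand-in for an XML element: tag, attributes, child elements.
data Node = Node
  { nodeTag   :: String
  , nodeAttrs :: [(String, String)]
  , nodeKids  :: [Node]
  }

-- A filter maps one node to zero or more nodes.
type Filter = Node -> [Node]

-- Right-to-left composition: run g, then f on each of its results.
o :: Filter -> Filter -> Filter
f `o` g = concatMap f . g

children :: Filter
children = nodeKids

-- Keep a node only if the attribute has exactly the given value.
hasAttr :: String -> String -> Filter
hasAttr key val n
  | lookup key (nodeAttrs n) == Just val = [n]
  | otherwise                            = []

isLayer :: Filter
isLayer = hasAttr "inkscape:groupmode" "layer"

-- Labels of all layer groups directly below the root.
layerLabels :: Node -> [String]
layerLabels root =
  [ label
  | n <- (isLayer `o` children) root
  , Just label <- [lookup "inkscape:label" (nodeAttrs n)]
  ]

-- A made-up document with two layers and two non-layer children.
sampleSvg :: Node
sampleSvg = Node "svg" []
  [ Node "defs" [] []
  , Node "g" [("inkscape:groupmode", "layer"), ("inkscape:label", "Background")] []
  , Node "g" [("inkscape:groupmode", "layer"), ("inkscape:label", "Intro")] []
  , Node "g" [("id", "plain-group")] []
  ]
```

Evaluating layerLabels sampleSvg in GHCi lists the two layer labels in document order and skips the defs element and the unlabelled group.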

Isolating the layers

Again proceeding from the outside of my program inwards, a layer is isolated with this helper, using iffind to match either the layer name or the layer "Background" which I'm going to leave in all the output files. The final keep argument to iffind says to keep parts of the XML that don't have the "inkscape:label" attribute.

> selectLayer :: String -> Document Posn -> Document Posn
> selectLayer layer doc = onContent "-" (chip (visible `o` onlyLayer)) doc
>     where onlyLayer = iffind "inkscape:label" layerOrBG keep
>           layerOrBG l = if l == layer || l == "Background" then keep else none

In writing visible I was surprised that there was a combinator to set all attributes for a tag, but none to set a single attribute.

> visible = setAttr "style" "display:inline"
> setAttr key val (CElem (Elem tag attrs cs) i) = [CElem (Elem tag newattrs cs) i]
>     where newattrs = (key, AttValue [Left val]) : filter ((/= key) . fst) attrs
> setAttr key val other = [other] -- Hackish?
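
The attribute update in setAttr is ordinary association-list surgery. A minimal sketch on plain string pairs (setKey and the sample attributes are made up here; the real code stores AttValue rather than String):

```haskell
-- Replace or add one key, dropping any earlier binding for it.
setKey :: Eq k => k -> v -> [(k, v)] -> [(k, v)]
setKey key val attrs = (key, val) : filter ((/= key) . fst) attrs

-- Attributes of a hypothetical hidden layer.
hiddenLayer :: [(String, String)]
hiddenLayer =
  [ ("id", "layer2")
  , ("style", "display:none")
  , ("inkscape:label", "Proof")
  ]

-- The same layer made visible.
shownLayer :: [(String, String)]
shownLayer = setKey "style" "display:inline" hiddenLayer
```

Two consequences follow from this shape: the updated attribute moves to the front of the list, and the whole old style value is replaced, so any other declarations it held are lost.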

As I mentioned before, there is no way that I see to directly apply this filter to an XML file using HaXml. The type CFilter = Content -> [Content] needs wrapping to apply to an XML Element directly. Notice how I have to pass in a file name for error reporting; it feels like I'm doing things I'm not supposed to.

> filterElem :: FilePath -> CFilter Posn -> Element Posn -> [Content Posn]
> filterElem file f e = f (CElem e (posInNewCxt file Nothing))

> xmlElem (Document _ _ e _) = e

And now the function to actually apply a filter to an XML document. This is straight from the body of processXmlWith in the HaXml source, with filterElem pulled out.

> onContent :: FilePath -> (CFilter Posn) -> Document Posn -> Document Posn
> onContent file filter (Document p s e m) =
>     case filterElem file filter e of
>              [CElem e' _] -> Document p s e' m
>              []           -> error "produced no output"
>              _            -> error "produced more than one output"
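
If calling error feels too abrupt, the same three-way case can be written as a total function that reports the problem in an Either. This is only a sketch of an alternative, with a made-up helper name, and it works on any list rather than on HaXml content:

```haskell
-- Succeeds only when the list holds exactly one element.
exactlyOne :: [a] -> Either String a
exactlyOne [x] = Right x
exactlyOne []  = Left "produced no output"
exactlyOne _   = Left "produced more than one output"
```

A caller can then decide whether to abort, skip the slide, or print the message.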

Bits and pieces

I also used a modified style for the HughesPJ pretty printer:

> xmlStyle = style { mode = LeftMode }
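
To see what LeftMode changes, here is a small sketch that uses only the pretty-printing library, with a hand-built document standing in for HaXml's output:

```haskell
import Text.PrettyPrint.HughesPJ

-- A tiny hand-built document with one nested line.
sampleDoc :: Doc
sampleDoc = text "<g>" $+$ nest 2 (text "<path/>") $+$ text "</g>"

-- Default style: the nested line keeps its indentation.
indented :: String
indented = renderStyle style sampleDoc

-- LeftMode: every line starts at the left margin.
flush :: String
flush = renderStyle style { mode = LeftMode } sampleDoc
```

Comparing lines indented with lines flush in GHCi shows the nested line losing its leading spaces in LeftMode.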

And a big list of slide names with three digits, for this one-off job. Better would be to use an API for generating fresh temporary files.

> slideNumbers = map (printf "%03d") ([1..999] :: [Int])
> slideNames = map ("Slide"++) slideNumbers
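
For the temporary-file route, base already offers openTempFile in System.IO, which picks an unused name from a template. A minimal sketch with a hypothetical helper; the script itself does not do this, and the file is not deleted automatically:

```haskell
import System.IO (hClose, hPutStr, openTempFile)

-- Write contents to a freshly named file in dir and return its path.
-- The template "slide.svg" yields names of the form slideXXX.svg.
writeFreshSlide :: FilePath -> String -> IO FilePath
writeFreshSlide dir contents = do
  (path, handle) <- openTempFile dir "slide.svg"
  hPutStr handle contents
  hClose handle
  return path
```

The returned path would then be handed to the Inkscape step in place of a fixed SlideNNN name.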

4 Responses to “Using HaXml to make a PDF slideshow from an Inkscape SVG”

  1. I was just today thinking about making animated slides from an Inkscape SVG file by generating a set of PDF files with only a subset of layers. This seems like a pretty good place to start.

    For this project, when you don't need to examine or update the top-level element itself, I found it useful to use

        myFilterElem cfilt (Elem _ _ contents) = concatMap cfilt contents
        layerNames = map verbatim . myFilterElem getLayerNames . xmlElem

    and similarly

        filterLayers layerPred doc = myOnContent (visible `o` onlyLayer) doc
            where onlyLayer = iffind "inkscape:label" layerOrBG keep
                  layerOrBG l = if layerPred l then keep else none

        myOnContent cfilt (Document p s (Elem en ea contents) m) =
            Document p s (Elem en ea (concatMap cfilt contents)) m

  2. See also: resources from the Inkscape Wiki

  3. Nice. What tablet model did you get? I'd like to be able to embed images (in particular free-form drawings from a tablet) in my literate haskell code, perhaps as XML or some mime-encoded binary format(?). Would of course need a decent editor to go with it…

  4. I got this Wacom Bamboo mostly because it was so cheap. (Student budget)
