Why are publishing workflows so hard to fix?

An old friend contacted me the other day. He has a new client project, and his goal is to improve the entire process of publishing reports in a big institution. Knowing that I worked in scientific publishing for many years, he asked me: “What is it that typically goes wrong?”

If I had to answer that question in one sentence, it would be this: Institutions try to fix the symptoms rather than identifying and addressing the root cause and acknowledging the consequences that come with it.

The dream

There’s this dream of the perfect digital publishing system. One where you can create new documents and go through the entire process from authoring to review, copy editing, and doing the final touches (known as typesetting) before an authorized person hits a big “Publish” button. Then the publication is not only posted on the institution’s website in a beautifully rendered web version, but a PDF version is generated as well. Charts in the web version are interactive and contain the source data, so that you can easily reproduce the results stated in the report.

The reality

In reality, most institutions work with their own hand-crafted publishing workflows built around popular tools from the print era: Microsoft Word and PDF. As a consequence, sophisticated converters need to be developed and maintained to get content into and out of the system. Some of these solutions are optimized to produce web-native HTML publications and offer terrible-looking PDFs as a fallback. Other systems focus on the quality of the print output and have horrible websites. Or they do it like some newspapers and split their organization into separate “print” and “web” departments.

Let’s look at a few problems that very typically happen in today’s publishing workflows:

Converters: Often the automatic result of importing a submitted manuscript needs so much fixing that it would have been faster to copy and paste the text manually into the publishing system.
Adoption: Authors don’t accept new tools if it takes them any effort to learn them or if they don’t support certain features. For instance, they keep sending a DOCX file instead of using a specialized authoring tool your institution provides. Or they may use their preferred tools for creating charts (e.g. RStudio, MATLAB) instead of your provided integrated solution.
Review: You have implemented a sophisticated review system, which tracks all activities for transparency. But people don’t use it. They prefer to send an email with their review as they have always done.
Changes: Some corrections need to be made, and the author sends an updated DOCX file. There is no way to apply those changes easily to the published version. Either you manually change the updated passages, or you run through the whole publishing process again.
System Fragmentation: A subsystem goes down and blocks the workflow. That happens when your institution relies on an external system that provides metadata (in science this was Crossref or ORCID among other systems).
Interactivity: There are impressive examples of interactive web publications that include data and visualizations that you can explore. There’s the wish to do that at scale at institutions to add extra value to the publication and allow reproducibility. However, in my almost 10 years I’ve seen every attempt at doing that fail for reasons that are too complex to describe here.

The way out?

I believe the solution lies in simplicity. Only a simple system will be able to make writing, editing and publishing reports easy, maybe even fun?

Before I offer my view on possible solutions, let’s look at the underlying goals that I believe most public institutions share:

Transparency: You want to inform the public about your activities. That’s why you publish in the first place.
Ease of use: You want authors and reviewers to not even think about the publishing system they are using. They should be able to spend 99% of their time doing research and documenting their results.
Speed: Using an integrated publishing system should take as much time as editing a text passage in Microsoft Word, not more. You want to enable authors to publish at the click of a button. Submitting new revisions should be equally simple.

I believe that, particularly in tech, minimalistic solutions are the only sustainable ones. Why? Because you can actually make changes without being locked in for years to a complex technology with lots of dependencies. I’ve written about it here and here.

To move towards a future-proof digital publishing workflow, I believe we must be ready to get rid of the print legacy. It is easy to argue that we should get rid of printed reports, if not for simpler publishing systems, then at least for a cleaner environment.

It might feel like a great loss at first. I usually enjoy reading a book more than scrolling through some text on the screen. However, if the digital version is “well-done” with respect to information presentation and accessibility, it provides an equally pleasant reading experience at a fraction of the cost.

If I were hired by an institution to lead the digital transformation of the publishing workflow, this would be roughly what I’d do:

I’d stop further advancing the existing print-first publishing workflow. I’d keep using it as it is; it will be there for a while longer, and it frees us up to focus on the modernization process.
Next I’d pick a small subset of publications, where you have 5-10 people responsible for the whole process. This includes authors, reviewers, copy editors and typesetters.
I’d start a pilot project for the new publishing system with them and select an author who hasn’t used anything but Microsoft Word until now. I want to make sure a person like them would be satisfied with using the new system (it’s crucial that it feels like Microsoft Word). Then I’d invite more people who represent the “typical” way of doing things right now. The pilot is not successful until these people assure me that they’d be happy to start using this new software.
I’d limit the available content types to an absolute minimum. Basically there would be just text (paragraphs, headings, lists, quotations), images (photos, illustrations, charts) and files (supporting content such as datasets, source code etc.).
For review iterations, I’d implement a simple comment system. The idea would be that authors can do reviews like they are used to. To me, quoting a piece of text and commenting on it has been more efficient than complex tools such as track changes, inline comments etc. It’s also much easier to implement.
At a gut-feeling level I’d design it as a single-user system (only one person can edit content). The friction that comes with introducing collaborative editing (accepting or declining changes) usually outweighs the benefits (working with multiple people on the same document in real time). I think it’s very natural that one person edits the document from start to finish. Of course they’d get help from all the other people (co-authors, reviewers, copy editors), but that would happen in separate conversations attached to the document.
I’d try to eliminate multi-stage workflows where possible. At best the document is accessible to all contributors all the time. It just evolves over time: the author has control, and all the other people contribute with comments. That way we wouldn’t need to set up a complicated permission system. And we can avoid phone calls like: “Hey Frank, could you give me access to stage 7 in our document workflow? I need to attach some more data.”
To ensure reproducibility, I’d go with a low-tech solution. First, I’d request a dedicated section in each document, which provides short written recipes on how to reproduce the charts provided with the report. There’s no silver bullet for solving reproducibility, as there are so many different tools for specific disciplines. The optimal case is when you can attach everything to the document itself (e.g. the dataset and a MATLAB project file). In some cases links are useful (e.g. to an interactive web page to explore the data, such as a project on observablehq.com). However, you have to be aware that those internet resources could break at any time, so local files are always more robust. At best the publication is 100% self-contained.
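
To make the last few points a bit more concrete, here is a minimal sketch in Python of what such a content model could look like: three content types, one editor per document, and comments that quote a passage. All names (Document, Text, Image, File, Comment) are hypothetical and chosen for illustration only; this is one possible reading of the approach, not an existing system.

```python
from dataclasses import dataclass, field
from typing import Literal, Union


@dataclass
class Text:
    kind: Literal["paragraph", "heading", "list", "quotation"]
    content: str


@dataclass
class Image:
    kind: Literal["photo", "illustration", "chart"]
    path: str
    caption: str = ""


@dataclass
class File:
    path: str  # supporting content, e.g. an attached dataset or source code
    description: str = ""


Block = Union[Text, Image, File]


@dataclass
class Comment:
    author: str
    quote: str   # the passage being commented on, copied verbatim
    remark: str


@dataclass
class Document:
    title: str
    editor: str  # assumption: exactly one person may edit the content
    blocks: list[Block] = field(default_factory=list)
    comments: list[Comment] = field(default_factory=list)

    def append(self, user: str, block: Block) -> None:
        # Single-user editing: everyone else contributes through comments.
        if user != self.editor:
            raise PermissionError(f"{user} may comment, but only {self.editor} may edit")
        self.blocks.append(block)

    def comment(self, user: str, quote: str, remark: str) -> Comment:
        # Any contributor may comment, as long as the quote occurs in a text block.
        if not any(isinstance(b, Text) and quote in b.content for b in self.blocks):
            raise ValueError("quoted passage not found in the document")
        new_comment = Comment(user, quote, remark)
        self.comments.append(new_comment)
        return new_comment
```

In this sketch there are no workflow stages and no permission table. The only rule is that the editor changes content and everybody else attaches comments, which is roughly the level of complexity I would aim for in a pilot.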

I’m convinced that with the described approach there will be results (the first purely digital publications) within a year, and the potential to move the whole organization to a fully digital workflow within another year.

PS: This is a quick draft, and needs much more reflection. I’m aware my proposed approach is very opinionated, and I don’t claim to be right on every point. This document is meant as the starting point for an open conversation. Add your questions, remarks, corrections by starting a conversation below.
