Back To The Books

1 Back To The Books

Toward the end of my PhD I was struggling to learn new skills such as data analysis and programming. It was at that time that I rediscovered books (duh!) and discovered a very special kind of them, known as open books or open source books. This kind of books helped me in my professional development, so I feel particularly attached to them.

By now I have found three types of open source books:

  • Books that are simply distributed by the author on their own website.
  • Books that are written through wikis, known as wikibooks.
  • Books that are written through Github.

Here I will discuss the last type, because I have the impression that Github is a powerful tool for writing books that also fits perfectly to academics.

A disclaimer first: I come here as a beginner. My area of expertise is biology and I have never written a book about it, but I would really like to do so. Meanwhile I collected the necessary tools and some idea on why we should do it. And I would like to share them with you.

2 Tools For Writing Open Books on Github

It all revolves around two tools: Markdown and, you have guessed it, Github.

2.1 Markdown

Markdown is a simple markup language; we can use it for annotating plain text, so that we know which part is the title, which part should be bold or italic, which part makes up paragraph, etc.

The most widely used markup languages are HTML and LATEX. Compared to them markdown is less versatile, but incredibly simple and easy to use.

For example, in markdown:

# This is a title

## This is a subtitle

This is a paragraph.

This text is **bold**, and this text is *italic*

This is a bullet list:

- item1
- item2

[This is a link](to/your/favourite/URL)

etc.

A big chunk of the web, for example this blog and many others, is written in Markdown.

Once you have written markdown text what can you do with it?

Markdown is a universal source that can be easily converted into nicely formatted docx and pdf, but also into HTML and epub/mobi. You can convert any markdown file into any of those formats with pandoc , no edits required.

Since a markdown file is plain text, you can open it on any computer and any platform, you just need a text editor (I generally use atom). It is not even worth mentioning that pandoc and atom are open source.

Also, since markdown is plain text, it can be tracked with Git and hosted on Github. (if you already know what they are, skip the next section)

2.2 Git and Github

Git is a version control software. It was designed for software developers that need to keep track of what they write and delete while they develop their software; and also to collaborate while doing this. Git is a tool that is somehow complementary to a backup, it keeps track of changes to specific files on a project-wise schedule.

By tracking the modifications to a file, Git assigns exact authorship to each line of such file. So Git, plus its online hub Github allows you to publish, update and track any modification (and authorship of those modifications) of files, online and in an open way. This features make Git a great tool for open but controlled collaboration.

Since the source code of programs is indeed text, Git works perfectly also for writing books (and blog, websites, manual etc.), as long as they are written in plain text formats, such as markdown.

2.3 Putting it together: Books, authorship and contributions

A combination of Markdown (universal source) and Github (publish, author, track and collaborate) is great for publishing open source books. Many academics such as Hadley Wickham are already using it.

2.3.1 Authorship and contributions

We academics care a great deal that our work gets accredited to us, and only because we are proud of it: this is very important for us in order to keep our job or to the next one. So, I would like to take one of Hadley Wickham’s books as example to discuss authorship and contributions.

As you can see, this book is tracked on Github. Github does not only tracks and keeps records of any file that constitute the book, it also keeps records of the authorship of every line of those files. As you can see, Hadley Wickham authored most of the book himself, but many other people provided small contribution. For every contributor, you can tell exactly what they wrote or deleted in the various files that constitute the source for that book. What you can’t see is that each of these contribution had to be approved from the main author.

So, a combination of markdown and Github allows you to self publish your own books. Moreover, this system provides strong authoring tools: anybody can propose changes / contributions to your book, and every contribution must be approved by you (or by any of the authors that have such privilege).

Be aware, since your book now is an open source project, it can be forked

2.4 Bonus: Bookdown

An experienced user can rely on its own skills to convert all the markdown files that constitute a book into a nicely formatted web page, the good news is that we don’t have to do this by ourself: instead we can use Bookdown

Bookdown is an R package that renders markdown books into HTML pages and many other formats. Bookdown’s manual is written in Bookdown itself, so you can peek how the rendering looks like. Since Bookdown is written in R, it is most widely known to the R community. Indeed most of the book written in Bookdown are about R, but they don’t need to, using Bookdown requires very little knowledge of R.

Another powerful tool is Gitbook, it was used to author this collection of technical books

3 Publishing

Open source books can be published in many ways, often one way does not exclude the others.

3.1 Own website

Markdown and Pandoc provide an HTML output, so your book can be easily integrated into any website, manually or automatically. For example, travis-ci can build your book automatically from Github and push it to your website; but I have not checked the details of how to do this, and setting this up might not be easy.

In this way, you fully control the copyright and the license to your book (although it is reasonable to use Github together with open source or share-alike licenses).

3.2 Rstudio connect, through Bookdown

Bookdown provides an integrated system to publish your book through Rstudio connect to its website. This is detailed on the “Publishing” chapter of bookdown’s manual together with other extensive information on publishing.

3.3 Publishers

It is also possible to sell open source books through publishers. Among those Leanpub does a good job, I personally purchased books there. Leanpub accepts to sell open source books written in markdown directly from Github. Their publishing process is fast; indeed they accept to distribute your book in real time while you write it, providing updates at anytime to the customers.

Most of the books on Leanpub are about programming, but it does not have to be like this, we could publish there books on biology, or about any of your fields of interest.

If your open source book is incredibly good, you can also sell printed versions through traditional publishers. Indeed Hadley Wickham sells his books through Chapman & Hall.

4 Impact

A small note: I haven’t written or published an open source book yet; I wrote this section starting from my personal observations and experience while studying and working in academia in the last years.

4.1 Paywall

We scientists often complain about how people don’t rely on the knowledge that we produce during years. But how much of this knowledge is locked behind a paywall that has questionable reasons to exist? Writing open source books, at least while we are funded with public money, could have an ethical side.

What about the practical side? We academics do not earn huge amounts of money for writing books that are often sold at high prices. If we would distribute open source books instead, we would not lose much. And we could gain on impact: an open source book can be accessed worldwide, and it could be also easily translated.

Beside, remember that open source books can always be sold through Leanpub.

4.2 Updates and crowdsourcing

Imagine the impact of a book that can be freely:

  • accessed online
  • downloaded on a e-reader
  • printed

And that can also be updated in real time. When we specialize we sometimes feel that books gets useless because they quickly become obsolete. Well, open source books do not.

Moreover they are less of a burden on the author, because readers can contribute to the book and improve it, from fixing typos to writing whole sections. For example, the author of Bookdown, (Yihui Xie), is also writing an R package to make website (that, by the way, I am using to make this blog). The manual for this new package is being written with Bookdown and many people are already contributing to it.

What better peer review than the readers themselves?

5 Wrapping it up and what’s next

Markdown and Github are powerful tools to publish open source books. Markdown books can be easily compiled to very nicely formatted HTML books using packages such as Bookdown. Github enables open but controlled collaboration and provides strong authorship tools.

With this system we could publish beginner-level books or also short high level reference books on the our topics of interest. Our readers could be University students or any person that want to learn such topics.

If academics would engage in this activity they could have a great impact on society.

I don’t know if I am expert enough to write an open source book on any of my area of interest. But it sounds like a good long term project and I look forward to do it. The tools are all there.

Avatar
Otho Mantegazza
Biologist, Data Scientist
comments powered by Disqus