The LWP’s handling of mathematical formulae evolves

The LWP's Blog · Categories: LWP meta, Technology and infrastructure

By Michele Lavazza · 21 January 2025

The way mathematical and logical notation is handled on the website of the Ludwig Wittgenstein Project recently changed. The new method allows for faster loading of pages and better overall stability, but it required some updates to our ebook export feature.

Maintenance is key

A website requires a lot of maintenance to remain functional and secure.

Even the simplest static HTML websites run on servers that need to stay up to date: at the very least, they must receive updates to the operating system (for example, Ubuntu) and to the web server software (for example, Apache).

The Ludwig Wittgenstein Project is powered by MediaWiki, the same piece of software which also powers Wikipedia and its sister projects.

MediaWiki is a relatively complex and very powerful system that is optimised to create and manage collaborative knowledge bases and open content repositories. It is written in PHP and relies on a database to store its textual content. Its development is coordinated by the Wikimedia Foundation, but its free and open-source nature entails that many independent programmers also work on maintaining and expanding its functionalities.

As a sophisticated piece of software which is developed by a lively community, MediaWiki is constantly updated: a new version is released every six months, and one in four of those is a “long-term support” (LTS) version, which is guaranteed to receive security updates for three years. Updating MediaWiki from one version to the next is not entirely trivial, and so we at the LWP only use LTS versions: the website currently runs MediaWiki 1.39, the legacy LTS version, which will be supported until November 2025, and it will be upgraded to MediaWiki 1.43, the current LTS version, before then.

The “Math” extension

One of the distinctive features of MediaWiki is its extensibility. There is a rich ecosystem of plugins, or extensions, which expand the functionalities of the MediaWiki core. Some ship together with the “vanilla” package, while others can be installed by the administrators of an individual wiki so as to add this or that feature to their website and customise it in a modular way.

One of the extensions the LWP’s wiki relies on is “Math”, which is used to accurately and elegantly display complex mathematical and logical formulae. This is vital to producing high-quality editions of some of the texts which are available on this website, such as the Notes on Logic and the Tractatus Logico-Philosophicus.

If the “Math” extension is installed on a wiki, its editors can use LaTeX syntax in a page’s source code, and their formulae will be displayed in a beautiful and accessible format. For example,

<math>
a + \sqrt{b} = c^2
</math>

produces the following output:

[math]\displaystyle{ a + \sqrt{b} = c^2 }[/math]

Extension “Math” has evolved significantly between MediaWiki 1.39 (November 2022) and MediaWiki 1.42 (June 2024).

Up until 1.39, in the configuration which was adopted on our website, the extension relied on Wikimedia’s so-called “Mathoid” service to convert the LaTeX source code into the visual rendition of the formula. In other words, the MediaWiki site (for example, the LWP’s site) would send a request to the Wikimedia servers via an API; the Mathoid service, running on the Wikimedia servers, would perform the rendering; the rendered formula would be sent back to the MediaWiki site as an SVG image; and it would then simply be displayed as an image in the user’s browser.

[Figure: MediaWiki Math extension with Mathoid]

For the LWP, this system mostly worked. Sometimes, however, the communication between our website and the remote Wikimedia servers would fail, and an ugly error message would be displayed instead of the formula. This outcome was more likely on pages with a relatively large number of formulae (for example, the Tractatus editions). Moreover, even when the communication was successful, it would increase the loading time of the page significantly. In the case of long pages with very many formulae (for example, the multilingual side-by-side view of the Tractatus), the process would often time out and the page would then fail to load altogether.

The “SimpleMathJax” extension

This convinced us that we needed a better solution even before we learned that the MediaWiki developers were themselves planning to move away from Mathoid. We therefore decided to switch to a different MediaWiki extension to handle mathematical formulae: SimpleMathJax.

The way SimpleMathJax works is different from the process I described above in that there is no communication between the LWP’s site and the Wikimedia servers. When the page is loaded, the rendering of the formula is performed client-side, in the user’s web browser, thanks to the “MathJax” JavaScript library.

[Figure: MediaWiki Math extension with MathJax]

This solution is very stable, and the output is as pretty and as accessible as the Mathoid-generated version. Moreover, pages load much faster than they did before.

There was still, however, one problem…

The LWP’s ebook export feature

Back in summer 2023, thanks to the help of our good friend and developer extraordinaire Frederic Kettelhoit, we released an ebook export feature on the LWP’s website.

As I often repeat when I talk about the LWP’s technical infrastructure and distribution strategy, the website is king. This means that the LWP’s editions are best consulted directly on the website, where they are:

  • Always up-to-date (and thus reflect the latest improvements or corrections);
  • Accessible to screen readers;
  • And fully mobile-responsive.

However, there are clearly some contexts where it is important for the users to be able to read the texts offline.

The simplest scenario, printing the web page on paper or to a PDF file, has been taken care of since day one through a very simple implementation: all desktop browsers (and most mobile browsers) have a native print feature which allows users to export a cleanly formatted, dynamic PDF version in which they can customise the size of the page, the font, and the margins, as well as other parameters.

Frederic set out to design an application to take care of the more complex scenarios and, in particular, to allow users to download:

  • A static PDF with fixed page, font, and margin sizes, which would allow for fixed pagination;
  • Files in the EPUB and MOBI e-reader formats, which would allow for a comfortable reading experience through specialised hardware.

The result of Frederic’s work was the LWP’s ebook export feature, which is not a MediaWiki extension but is also free and open-source software (the code is hosted on GitHub). It is mostly written in Python and works as follows: it uses the Beautiful Soup web scraping library to retrieve the list of books to be converted from our “All texts” page; for each of those, it retrieves the plain HTML version which MediaWiki generates when the string ?action=render is appended to the page’s URL; it uses a custom-made parser to convert that HTML code into Markdown, a simple markup language designed to preserve only the formatting which is semantically relevant; it then uses Pandoc to convert the Markdown code into PDF, EPUB, and MOBI files. In addition to those, the Markdown file remains available as a very clean, plain-text version of the ebook.
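To make two of these steps more concrete, here is a minimal Python sketch. It is not taken from the export tool's code: the function names and the example.org addresses used below are hypothetical, and the sketch only builds the ?action=render URL and the Pandoc command line without fetching or converting anything. It assumes that Pandoc is called as a command-line program and that it infers the output format from the file extension; only EPUB and PDF are shown here.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def render_url(page_url: str) -> str:
    """Return the URL of MediaWiki's plain-HTML rendition of a page."""
    parts = urlsplit(page_url)
    # Keep any existing query string and add action=render to it.
    extra = urlencode({"action": "render"})
    query = "&".join(q for q in (parts.query, extra) if q)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


def pandoc_command(markdown_path: str, output_path: str) -> list[str]:
    """Build a Pandoc call; Pandoc infers the format from the extension."""
    return ["pandoc", markdown_path, "-o", output_path]


if __name__ == "__main__":
    # Hypothetical page address, used for illustration only.
    print(render_url("https://example.org/wiki/Some_text"))
    print(pandoc_command("some_text.md", "some_text.epub"))
```

The list returned by pandoc_command could then be passed to subprocess.run, once the Markdown file has been written to disk.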

[Figure: Wittgenstein texts, from HTML to ebook]

The procedure runs automatically every 24 hours through GitHub Actions; the output files are hosted on GitHub, but they can be downloaded through direct links from the LWP’s website.

At the time this system was first deployed, the website used the “Math” extension and the Mathoid service to process the formulae; therefore, in Frederic’s version of the export feature, the formulae were handled as images.

With the website moving away from extension “Math” and towards extension “SimpleMathJax”, however, this was not going to work anymore. In Frederic’s code, the HTML of individual books was expected to contain images where the formulae were:

<span class="mwe-math-element">
  <img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/27496729a92504f4c31ffcae34a1adf369dc5749" class="mwe-math-fallback-image-inline" aria-hidden="true" style="vertical-align: -0.838ex; width:22.276ex; height:3.009ex;" alt="{\displaystyle (\Omega ^{\nu })^{\mu \prime }x=\Omega ^{\nu \times \mu \prime }x{\text{ Def.}}}">
</span>

After the adoption of SimpleMathJax, however, the HTML would instead contain LaTeX markup, because the rendering of the visual appearance of the formula would be performed after loading the HTML and directly in the user’s browser, via client-side JavaScript.

<span style="opacity:.5" class="smj-container">
  [math]\displaystyle{ ( \Omega^{ \nu} )^{\mu \prime} x = \Omega^{ \nu \times \mu \prime} x \text{ Def.} }[/math]
</span>
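As an illustration of what the updated parser has to do, the following sketch collects the LaTeX markup from spans with the smj-container class. It is only a sketch under stated assumptions: it uses Python's standard html.parser module rather than the export tool's actual parser, the class and function names are hypothetical, and it assumes the markup looks exactly like the sample above.

```python
from html.parser import HTMLParser


class FormulaExtractor(HTMLParser):
    """Collect the text inside <span class="smj-container"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.formulae: list[str] = []
        self._depth = 0  # nesting depth of <span> tags inside a container
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            # A nested span inside a container we are already reading.
            self._depth += 1
        elif "smj-container" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.formulae.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_formulae(html: str) -> list[str]:
    """Return the raw markup of every formula found in the HTML."""
    parser = FormulaExtractor()
    parser.feed(html)
    parser.close()
    return parser.formulae
```

Each returned string still includes the [math] wrapper shown in the sample, so it has to be unwrapped before the formula can be rendered.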

Therefore, in order to safely migrate to SimpleMathJax without breaking the ebook export feature, we had to update the latter.

In order to do this, we were helped with great professionalism and kindness by another friend and LWP volunteer, Liu Ruoxiao. She managed to integrate the Matplotlib Python library into Frederic’s code, so that, upon receiving the raw LaTeX markup as it is embedded in the new version of the HTML source code (the one generated by MediaWiki with extension “SimpleMathJax”), the application could perform the rendering of the formulae itself, save them locally as images, and include these in the PDF, EPUB, and MOBI files.
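A rendering step of this kind could be sketched as follows. This is not Ruoxiao's actual code: the function names are hypothetical, and the sketch assumes that the formula arrives wrapped as in the HTML sample above and that Matplotlib's built-in mathtext engine is used. That engine supports only a subset of LaTeX, so some formulae may need to be adjusted before they can be rendered this way.

```python
import re

# Matches the wrapper seen in the SimpleMathJax HTML sample.
_WRAPPER = re.compile(
    r"^\[math\]\s*\\displaystyle\s*\{(.*)\}\s*\[/math\]$", re.DOTALL
)


def unwrap_formula(markup: str) -> str:
    """Strip the [math]...[/math] wrapper, if present, and return bare LaTeX."""
    markup = markup.strip()
    match = _WRAPPER.match(markup)
    return match.group(1).strip() if match else markup


def render_formula(latex: str, path: str, dpi: int = 200) -> None:
    """Save a formula as an image file using Matplotlib's mathtext engine."""
    # Imported here so that unwrap_formula works without Matplotlib installed.
    from matplotlib import mathtext

    mathtext.math_to_image(f"${latex}$", path, dpi=dpi)


if __name__ == "__main__":
    source = r"[math]\displaystyle{ a + \sqrt{b} = c^2 }[/math]"
    render_formula(unwrap_formula(source), "formula.png")
```

Here unwrap_formula reduces the sample markup to a + \sqrt{b} = c^2, and render_formula writes the image to formula.png, from where it can be referenced in the Markdown file.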

Conclusion

I thought all of this was worth sharing for two reasons. First, this blog post may serve as a piece of technical documentation, for us to track some of the changes in the website’s infrastructure and for others to potentially build upon our code to create a similar export feature for their own website. Second, it tells the story of a successful cooperation between several people who didn’t necessarily know each other very well, but were brought together by the common goal of making a valuable collection of texts more widely available and easily accessible—something that always warms my heart.

About the author

Michele Lavazza

Michele Lavazza is a translator, digital humanities enthusiast, learning technologies specialist, and free culture advocate. He holds a Master’s Degree in philosophy from the University of Milan; his thesis on Wittgenstein and transcendental philosophy was partly written in Milan and partly at the Paris 1 Panthéon-Sorbonne University. He founded the Ludwig Wittgenstein Project in 2020.

Cover image: "Ludwig Wittgenstein Skjolden Norge 2024" by Vadim Chuprina, CC BY-SA 4.0