Line 10: | Line 10: | ||
MediaWiki is a relatively complex and very powerful system that is optimised to create and manage collaborative knowledge bases and open content repositories. It is written in PHP and relies on a database to store its textual content. Its development is coordinated by the Wikimedia Foundation, but its free and open-source nature entails that many independent programmers also work on maintaining and expanding its functionalities.
As a sophisticated piece of software which is developed by a lively community, MediaWiki is [[mediawikiwiki:Version lifecycle|constantly updated]]: a new version is released every six months, and one in four of those is a “long-term support” (LTS) version, which is guaranteed to receive security updates for three years. Updating MediaWiki from one version to the next is not entirely trivial, and so we at the LWP only use LTS versions: the website [[Special:Version|currently runs]] MediaWiki 1.39, the legacy LTS version, which will be supported until November 2025, and it will be upgraded to MediaWiki 1.43, the current LTS version, before then.
==The “Math” extension==
Line 33: | Line 33: | ||
Extension “Math” has evolved significantly between MediaWiki 1.39 (November 2022) and MediaWiki 1.42 (June 2024).
Up until 1.39, in the configuration which was adopted on our website, the extension relied on Wikimedia’s so-called “Mathoid” service to convert the LaTeX source code into the visual rendition of the formula. In other words, the MediaWiki site (for example, the LWP’s site) would send a request to the Wikimedia servers via an API; the Mathoid service, [[mediawikiwiki:RESTBase|running on the Wikimedia servers]], would perform the rendering; the rendered formula would be sent back to the MediaWiki site as an SVG image; and it would then simply be displayed as an image in the user’s browser.
[[File:MediaWiki Math extension with Mathoid.png|center|thumb|500px]]
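To make the round trip more concrete, here is a minimal sketch of the two requests a client might prepare when asking a remote service for an SVG rendition of a formula. It is not the extension’s actual code: the base URL and the <code>check</code>/<code>render</code> endpoint paths are assumptions based on Wikimedia’s public REST API, the function names are made up for illustration, and the hash is a placeholder. The requests are only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Wikimedia's public REST API base URL (not taken from this post).
API_BASE = "https://wikimedia.org/api/rest_v1/media/math"


def build_check_request(latex: str) -> Request:
    """Step 1: submit the LaTeX source for validation (POST, form field 'q')."""
    body = urlencode({"q": latex}).encode("utf-8")
    return Request(f"{API_BASE}/check/tex", data=body, method="POST")


def build_render_request(resource_hash: str) -> Request:
    """Step 2: ask for the SVG identified by the hash obtained in step 1."""
    return Request(f"{API_BASE}/render/svg/{resource_hash}", method="GET")


# Build (but do not send) both requests for a sample formula.
check = build_check_request(r"\frac{a}{b}")
render = build_render_request("0123abcd")  # placeholder, not a real hash
```

The point of the sketch is that every formula costs at least one exchange with a third-party server before the page can display it.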
Line 42: | Line 42: | ||
This made us realise we needed to find a better solution even before we learned that MediaWiki developers were themselves planning to move away from Mathoid. We then decided to switch to a different MediaWiki extension to handle mathematical formulae, namely [[mediawikiwiki:Extension:SimpleMathJax|SimpleMathJax]].
The way SimpleMathJax works is different from the process I described above in that there is no communication between the LWP’s site and the Wikimedia servers. When the page is loaded, the rendering of the formula is performed client-side, in the user’s web browser, thanks to the “[https://www.mathjax.org/ MathJax]” JavaScript library.
[[File:MediaWiki Math extension with MathJax.png|center|thumb|500px]]
Line 53: | Line 53: | ||
Back in summer 2023, thanks to the help of our good friend and developer extraordinaire Frederic Kettelhoit, we released an ebook export feature on the LWP’s website.
As I often repeat when I talk about the LWP’s technical infrastructure and distribution strategy, the website is king. This means that the LWP’s editions are best consulted directly on the website, where they are:
*Always up-to-date (and thus reflect the latest improvements or corrections);
*Accessible to screen readers;
Line 66: | Line 66: | ||
*Files in the EPUB and MOBI e-reader formats, which would allow for a comfortable reading experience through specialised hardware.
The result of Frederic’s work was [[Project:Downloading, exporting, and manipulating the texts|the LWP’s ebook export feature]], which is not a MediaWiki extension but is also free and open-source software ([https://github.com/wittgenstein-project/wittgenstein-project.github.io the code is hosted on GitHub]). It is mostly written in Python and works as follows: it uses the [https://pypi.org/project/beautifulsoup4/ Beautiful Soup] web scraping library to retrieve the list of books to be converted from our “[[Project:All texts|All texts]]” page; for each of those, it retrieves the plain HTML version which MediaWiki generates when the string <code>?action=render</code> is appended to the page’s URL; it uses a custom-made parser to convert that HTML code into [[wikipedia:Markdown|Markdown]] code, a simple markup language which is designed to only preserve the formatting which is semantically relevant; it then uses the [https://pandoc.org/ Pandoc] document converter to turn the Markdown code into PDF, EPUB, and MOBI files. In addition to those, the Markdown file remains available as a very clean, plain-text version of the ebook.
[[File:Wittgenstein texts HTML to ebook.png|center|thumb|600px]]
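The steps above can be sketched in a few lines of Python. This is a deliberately tiny illustration and not Frederic’s code: it uses only the standard library instead of Beautiful Soup, the converter handles just paragraphs, italics, and bold, and all function and class names are hypothetical. The Pandoc command is only assembled, not executed.

```python
from html.parser import HTMLParser


def render_url(page_url):
    """Append action=render, which makes MediaWiki return the bare page HTML."""
    separator = "&" if "?" in page_url else "?"
    return f"{page_url}{separator}action=render"


class TinyMarkdownConverter(HTMLParser):
    """Toy converter: handles only <p>, <i>/<em> and <b>/<strong>."""

    MARKS = {"i": "*", "em": "*", "b": "**", "strong": "**"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Inline tags open a Markdown marker; every other tag adds nothing.
        self.parts.append(self.MARKS.get(tag, ""))

    def handle_endtag(self, tag):
        # A closing </p> becomes a blank line; inline tags close their marker.
        self.parts.append("\n\n" if tag == "p" else self.MARKS.get(tag, ""))

    def handle_data(self, data):
        self.parts.append(data)

    def markdown(self):
        return "".join(self.parts).strip()


def html_to_markdown(html):
    """Convert a small HTML fragment into Markdown text."""
    converter = TinyMarkdownConverter()
    converter.feed(html)
    converter.close()
    return converter.markdown()


def pandoc_command(markdown_path, output_path):
    """Build a Pandoc call; Pandoc infers the output format from the extension."""
    return ["pandoc", markdown_path, "-o", output_path]


sample = html_to_markdown("<p>The world is <i>all</i> that is the case.</p>")
```

In a real run, the list returned by <code>pandoc_command</code> could be passed to <code>subprocess.run</code>, provided Pandoc is installed on the machine.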
Line 72: | Line 72: | ||
The procedure runs automatically every 24 hours through GitHub Actions; the output files are hosted on GitHub, but they can be downloaded through direct links from the LWP’s website.
At the time this system was first deployed, the website used the “Math” extension and the Mathoid service to process the formulae; therefore, in Frederic’s version of the export feature, the formulae were handled as images.
With the website moving away from extension “Math” and towards extension “SimpleMathJax”, however, this was not going to work anymore. In Frederic’s code, the HTML of individual books was expected to contain images where the formulae were:
Line 92: | Line 92: | ||
Therefore, in order to safely migrate to SimpleMathJax without breaking the ebook export feature, we had to update the latter.
In order to do this, we were helped with great professionalism and kindness by another friend and LWP volunteer, Liu Ruoxiao. She managed to integrate the [https://matplotlib.org/ Matplotlib] Python library into Frederic’s code, so that, upon receiving the raw LaTeX markup as it is embedded in the new version of the HTML source code (the one generated by MediaWiki with extension “SimpleMathJax”), the application could perform the rendering of the formulae itself, save them locally as images, and include these in the PDF, EPUB, and MOBI files.
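As a rough illustration of this rendering step, the sketch below turns a LaTeX string into a PNG file with Matplotlib’s <code>mathtext</code> module. It is not Ruoxiao’s actual code: the function names and the hash-based file naming scheme are hypothetical, and it assumes that the formula has already been extracted from the HTML as a plain string.

```python
import hashlib


def formula_filename(latex):
    """Derive a stable image name from the LaTeX source (hypothetical scheme)."""
    digest = hashlib.sha1(latex.encode("utf-8")).hexdigest()[:12]
    return f"formula-{digest}.png"


def as_mathtext(latex):
    """Matplotlib's mathtext expects the formula wrapped in dollar signs."""
    stripped = latex.strip()
    return stripped if stripped.startswith("$") else f"${stripped}$"


def render_formula(latex, directory="."):
    """Render one formula to a PNG file and return the file's path."""
    # Imported lazily so the helpers above work without Matplotlib installed.
    import os
    from matplotlib import mathtext

    path = os.path.join(directory, formula_filename(latex))
    mathtext.math_to_image(as_mathtext(latex), path, dpi=200, format="png")
    return path
```

Calling <code>render_formula(r"\frac{a}{b}")</code> would write the image to the current directory. Note that Matplotlib’s built-in math renderer supports only a subset of LaTeX, so unusual commands may need special handling.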
==Conclusion==
I thought all of this was worth sharing for two reasons: first, this blog post may serve as a piece of technical documentation, for us to track some of the changes in the website’s infrastructure and for others to potentially build upon our code to create a similar export feature for their own website; second, it tells the story of a successful cooperation between several people who didn’t necessarily know each other very well, but were brought together by the common goal of making a valuable collection of texts more widely available and easily accessible, something that always warms my heart.