Blog:The LWP’s handling of mathematical formulae evolves

'''The way mathematical and logical notation is handled on the website of the Ludwig Wittgenstein Project recently changed. The new method allows for faster loading of pages and better overall stability, but it required some updates to our ebook export feature.'''

==Maintenance is key==
A website requires a lot of maintenance to remain functional and secure.
Even the simplest static HTML websites run on servers which need to stay up to date. This means they must receive updates, at the very least, to the operating system (for example, Ubuntu) and to the web server software (for example, Apache).

The Ludwig Wittgenstein Project is powered by [[mediawikiwiki:MediaWiki|MediaWiki]], the same piece of software which also powers Wikipedia and its sister projects.

MediaWiki is a relatively complex and very powerful system that is optimised to create and manage collaborative knowledge bases and open content repositories. It is written in PHP and relies on a database to store its textual content. Its development is coordinated by the Wikimedia Foundation, but its free and open-source nature entails that many independent programmers also work on maintaining and expanding its functionalities.

As a sophisticated system which is developed by a lively community, MediaWiki is [[mediawikiwiki:Version lifecycle|constantly updated]]: a new version is released every six months, and one in four of those is a “long-term support” (LTS) version, which is guaranteed to receive security updates for three years. Updating MediaWiki from one version to the next is not entirely trivial, and so we at the LWP only use LTS versions: the website [[Special:Version|currently runs]] MediaWiki 1.39, the legacy LTS version, which will be supported until November 2025, and it will be upgraded to MediaWiki 1.43, the current LTS version, before then.

==The “Math” extension==
One of the distinctive features of MediaWiki is its extensibility. There is a rich ecosystem of plugins, or [[mw:Manual:Extensions|extensions]], which expand the functionalities of the MediaWiki core. Some ship together with the “vanilla” package, while others can be installed by the administrators of an individual wiki so as to add this or that feature to their website and customise it in a modular way.

One of the extensions the LWP’s wiki relies on is “[[mediawikiwiki:Extension:Math|Math]]”, which is used to accurately and elegantly display complex mathematical and logical formulae. This is vital to producing high-quality editions of some of the texts which are available on this website, such as the ''[[Notes on Logic]]'' and the [[Tractatus Logico-Philosophicus|''Tractatus Logico-Philosophicus'']].

Up until MediaWiki 1.39, in the configuration which was adopted on our website, the Math extension relied on Wikimedia’s so-called “Mathoid” service to convert the LaTeX source code into the visual rendition of the formula. In other words, the MediaWiki site (for example, the LWP’s site) would send a request to the Wikimedia servers via an API; the Mathoid service, [[mediawikiwiki:RESTBase|running on the Wikimedia servers]], would perform the rendering; the rendered formula would be sent back to the MediaWiki site as a PNG image; and it would then simply be displayed as an image.

[[File:MediaWiki Math extension with Mathoid.png|center|thumb|500px]]


For the LWP, this system mostly worked. Sometimes, however, the communication between our website and the remote Wikimedia servers would fail, and an ugly error message would be displayed instead of the formula. This outcome was more likely on pages with relatively many formulae (for example, the ''Tractatus'' editions). Moreover, even when the communication was successful, it would increase the loading time of the page significantly. In the case of long pages with very many formulae (for example, the [[Tractatus Logico-Philosophicus (multilingual side-by-side view)|multilingual side-by-side view]] of the ''Tractatus''), the process would often time out and the page would then fail to load altogether.

SimpleMathJax works differently from the process I described above, in that there is no communication between the local site and remote servers. When the page is loaded, the rendering of the formula is performed client-side, in the user’s web browser, thanks to the “[https://www.mathjax.org/ MathJax]” JavaScript library.

[[File:MediaWiki Math extension with MathJax.png|center|thumb|500px]]


This solution is very stable, and the output is as pretty and as accessible as the Mathoid-generated version. Moreover, pages load much faster than they did before.

The result of Frederic’s work was the LWP’s ebook export feature, which is not a MediaWiki extension but is also free and open-source software ([https://github.com/wittgenstein-project/wittgenstein-project.github.io the code is hosted on GitHub]). It is mostly written in Python and works as follows: it uses the [https://pypi.org/project/beautifulsoup4/ Beautiful Soup] web scraping library to retrieve the list of books to be converted from our “All texts” page; for each of those, it retrieves the plain HTML version which MediaWiki generates when the string <code>?action=render</code> is appended to the page’s URL; it uses a custom-made parser to convert that HTML code into [[wikipedia:Markdown|Markdown]] code, a simple markup language which is designed to only preserve the formatting which is semantically relevant; it then uses the [https://pandoc.org/ Pandoc] library to convert the Markdown code into PDF, EPUB, and MOBI files. In addition to those, the Markdown file remains available as a very clean, plain-text version of the ebook.

[[File:Wittgenstein texts HTML to ebook.png|center|thumb|600px]]
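
The two ends of this pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Frederic’s actual code: the function names <code>render_url</code> and <code>pandoc_command</code>, the <code>example.org</code> addresses, and the file names are hypothetical, and the custom HTML-to-Markdown parser in the middle is left out entirely. The first function adds <code>action=render</code> to a page URL; the second builds a Pandoc command line, relying on Pandoc inferring the output format from the file extension.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def render_url(page_url: str) -> str:
    """Return page_url with action=render in its query string (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep any existing parameters (e.g. title=...), but drop a previous "action".
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != "action"]
    query.append(("action", "render"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def pandoc_command(markdown_path: str, output_path: str) -> list[str]:
    """Build (but do not run) a Pandoc call that converts Markdown to output_path."""
    # "-f markdown" sets the input format; the output format follows the extension.
    return ["pandoc", "-f", "markdown", markdown_path, "-o", output_path]


if __name__ == "__main__":
    print(render_url("https://example.org/wiki/Notes_on_Logic"))
    print(pandoc_command("notes-on-logic.md", "notes-on-logic.epub"))
```

Passing the list returned by <code>pandoc_command</code> to <code>subprocess.run</code> would perform the conversion, provided Pandoc is installed on the machine.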


The procedure runs automatically every 24 hours through GitHub Actions; the output files are hosted on GitHub, but they can be downloaded through direct links from the LWP’s website.

With the website moving away from the “Math” extension and towards the “SimpleMathJax” extension, however, this was not going to work anymore. In Frederic’s code, the HTML of individual books was expected to contain images where the formulae were:

<syntaxhighlight lang="html">
<span class="mwe-math-element">
  <img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/27496729a92504f4c31ffcae34a1adf369dc5749" class="mwe-math-fallback-image-inline" aria-hidden="true" style="vertical-align: -0.838ex; width:22.276ex; height:3.009ex;" alt="{\displaystyle (\Omega ^{\nu })^{\mu \prime }x=\Omega ^{\nu \times \mu \prime }x{\text{ Def.}}}">
</span>
</syntaxhighlight>
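
With markup of this kind, the LaTeX source survives in the <code>alt</code> attribute of the image, so a converter can recover it from there. The following is a minimal sketch of that idea and not the parser used in the export feature: it relies on Python’s standard <code>html.parser</code> module rather than Beautiful Soup, and the class name <code>MathImageAltCollector</code> is hypothetical.

```python
from html.parser import HTMLParser


class MathImageAltCollector(HTMLParser):
    """Collect the alt text of <img> tags nested inside span.mwe-math-element."""

    def __init__(self):
        super().__init__()
        self.span_stack = []  # one bool per open <span>: is it a math wrapper?
        self.formulae = []    # LaTeX strings found so far

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "span":
            classes = (attributes.get("class") or "").split()
            self.span_stack.append("mwe-math-element" in classes)
        elif tag == "img" and any(self.span_stack) and attributes.get("alt"):
            # The image sits inside a math wrapper: its alt text is the LaTeX source.
            self.formulae.append(attributes["alt"])

    def handle_endtag(self, tag):
        if tag == "span" and self.span_stack:
            self.span_stack.pop()


if __name__ == "__main__":
    sample = (
        '<p><span class="mwe-math-element">'
        '<img src="formula.svg" alt="{\\displaystyle a=b}"></span>'
        '<img src="logo.png" alt="logo"></p>'
    )
    collector = MathImageAltCollector()
    collector.feed(sample)
    print(collector.formulae)
```

Images outside a <code>mwe-math-element</code> span, such as the logo in the sample, are ignored.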


After the adoption of SimpleMathJax, however, the HTML would instead contain LaTeX markup, because the formula would be rendered only after the HTML had loaded, directly in the user’s browser, via client-side JavaScript.

<syntaxhighlight lang="html">
<span style="opacity:.5" class="smj-container">
  [math]\displaystyle{ ( \Omega^{ \nu} )^{\mu \prime} x = \Omega^{ \nu \times \mu \prime} x \text{ Def.} }[/math]
</span>
</syntaxhighlight>
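
One way to handle markup of this kind is to rewrite each <code>[math]…[/math]</code> pair as inline TeX math between dollar signs, a form that Pandoc’s Markdown reader understands. The sketch below illustrates this under assumptions and is not necessarily how the export feature was updated: the function name <code>smj_to_markdown_math</code> is hypothetical, the surrounding <code>span</code> is left untouched, and HTML entities inside the formula are decoded so that, for instance, <code>&amp;lt;</code> becomes <code>&lt;</code> again.

```python
import html
import re

# Non-greedy match, so that several formulae on one line are handled separately.
MATH_TAG = re.compile(r"\[math\](.*?)\[/math\]", re.DOTALL)


def smj_to_markdown_math(fragment: str) -> str:
    """Replace every [math]...[/math] pair in fragment with $...$ (hypothetical helper)."""

    def to_inline_math(match: re.Match) -> str:
        # Decode entities and trim the spaces, which may not touch the dollar signs.
        latex = html.unescape(match.group(1)).strip()
        return f"${latex}$"

    # A replacement function (not a string) keeps the LaTeX backslashes literal.
    return MATH_TAG.sub(to_inline_math, fragment)


if __name__ == "__main__":
    print(smj_to_markdown_math(r"[math]\displaystyle{ \Omega^{\nu} x }[/math]"))
```

The surrounding whitespace is trimmed because Pandoc only recognises <code>$…$</code> as math when the opening dollar sign is immediately followed by a non-space character and the closing one immediately preceded by one.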


Therefore, in order to safely migrate to SimpleMathJax without breaking the ebook export feature, we had to update the latter.