Blog:The LWP’s handling of mathematical formulae evolves: Difference between revisions

Frederic set off to design an application to take care of the more complex scenarios and, in particular, to allow users to download:
*A static PDF with fixed page, font, and margin sizes, which would allow for fixed pagination;
*Files in the EPUB and MOBI e-reader formats, which would allow for a comfortable reading experience through specialised hardware.


The result of Frederic’s work was the [[Downloading, exporting, and manipulating the texts|LWP’s ebook export feature]], which is not a MediaWiki extension but is also free and open-source software ([https://github.com/wittgenstein-project/wittgenstein-project.github.io the code is hosted on GitHub]). It is mostly written in Python and works as follows: it uses the [https://pypi.org/project/beautifulsoup4/ Beautiful Soup] web scraping library to retrieve the list of books to be converted from our “All texts” page; for each of those, it retrieves the plain HTML version which MediaWiki generates when the string <code>?action=render</code> is appended to the page’s URL; it uses a custom-made parser to convert that HTML code into [[wikipedia:Markdown|Markdown]] code, a simple markup language designed to preserve only the semantically relevant formatting; it then uses the [https://pandoc.org/ Pandoc] document converter to turn the Markdown code into PDF, EPUB, and MOBI files. In addition to those, the Markdown file remains available as a very clean, plain-text version of the ebook.


[[File:Wittgenstein texts HTML to ebook.png|center|thumb|600px]]
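The steps described above can be sketched roughly as follows. This is only an illustration, not the LWP’s actual code: the function names and the example URL are hypothetical, the standard library’s <code>html.parser</code> stands in for Beautiful Soup so that the snippet has no dependencies, and the custom HTML-to-Markdown parser is left out entirely. The Pandoc call is only built, not executed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def list_book_urls(index_html, base_url):
    """Return absolute URLs for all links on an index page (hypothetical helper)."""
    collector = LinkCollector()
    collector.feed(index_html)
    return [urljoin(base_url, href) for href in collector.links]


def render_url(page_url):
    """Append action=render so MediaWiki returns only the page's content HTML."""
    separator = "&" if "?" in page_url else "?"
    return f"{page_url}{separator}action=render"


def pandoc_command(markdown_path, output_path):
    """Build a Pandoc call; Pandoc infers the output format from the file extension."""
    return ["pandoc", markdown_path, "-o", output_path]
```

For instance, given an index page containing <code><nowiki><a href="/Tractatus">Tractatus</a></nowiki></code> and the placeholder base URL <code>https://example.org/</code>, <code>list_book_urls</code> yields <code>https://example.org/Tractatus</code>, and <code>render_url</code> turns that into <code>https://example.org/Tractatus?action=render</code>. The list returned by <code>pandoc_command("book.md", "book.epub")</code> could then be passed to <code>subprocess.run</code>.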
Therefore, in order to safely migrate to SimpleMathJax without breaking the ebook export feature, we had to update the latter.


In order to do this, we were helped with great professionalism and kindness by another friend and LWP volunteer, Liu Ruoxiao. She managed to integrate the [https://matplotlib.org/ Matplotlib] Python library into Frederic’s code, so that, upon receiving the raw LaTeX markup as it is embedded in the new version of the HTML source code (the one generated by MediaWiki with the SimpleMathJax extension), the application could render the formulae itself, save them locally as images, and include these in the PDF, EPUB, and MOBI files.
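To give an idea of what rendering a formula with Matplotlib can look like, here is a minimal sketch. It is not taken from the LWP’s code: the helper name <code>render_formula</code> and its parameters are hypothetical, and it assumes that Matplotlib’s built-in mathtext engine, which supports only a subset of LaTeX, is sufficient for the formula at hand.

```python
from pathlib import Path

from matplotlib import mathtext


def render_formula(latex, out_path, dpi=200):
    """Render one LaTeX formula to a PNG file (hypothetical helper)."""
    # mathtext expects the formula to be wrapped in dollar signs.
    expression = latex if latex.startswith("$") else f"${latex}$"
    # math_to_image draws the expression on a tightly sized figure and saves it.
    mathtext.math_to_image(expression, str(out_path), dpi=dpi, format="png")
    return Path(out_path)
```

Calling <code>render_formula(r"\frac{a}{b}", "formula.png")</code> would write a small PNG that can then be referenced from the Markdown source as an ordinary image. Formulae that use commands outside the mathtext subset would need a full LaTeX installation instead.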


==Conclusion==
I thought all of this was worth sharing for two reasons: first, this blog post may serve as a piece of technical documentation (for us to track some of the changes in the website’s infrastructure, and for others to potentially build upon our code to create a similar export feature for their own website); second, it tells the story of a successful cooperation between several people who didn’t necessarily know each other very well, but were brought together by the common goal of making a valuable collection of texts more widely available and easily accessible—something that always warms my heart.