Blog:The LWP’s handling of mathematical formulae evolves: Difference between revisions

*Files in the EPUB and MOBI e-reader formats, which would allow for a comfortable reading experience through specialised hardware.


The result of Frederic’s work was [[Project:Downloading, exporting, and manipulating the texts|the LWP’s ebook export feature]], which is not a MediaWiki extension but is likewise free and open-source software ([https://github.com/wittgenstein-project/wittgenstein-project.github.io the code is hosted on GitHub]). It is mostly written in Python and works as follows: it uses the [https://pypi.org/project/beautifulsoup4/ Beautiful Soup] web scraping library to retrieve the list of books to be converted from our “All texts” page; for each of those, it retrieves the plain HTML version that MediaWiki generates when the string <code>?action=render</code> is appended to the page’s URL; it uses a custom-made parser to convert that HTML code into [[wikipedia:Markdown|Markdown]], a simple markup language designed to preserve only the formatting that is semantically relevant; it then uses [https://pandoc.org/ Pandoc] to convert the Markdown code into PDF, EPUB, and MOBI files. In addition to those, the Markdown file remains available as a very clean, plain-text version of the ebook.
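To make the steps above more concrete, here is a minimal sketch of the pipeline's shape. It is not the code from the GitHub repository: the function names (<code>render_url</code>, <code>html_to_markdown</code>, <code>pandoc_command</code>) are hypothetical, the converter handles only a handful of tags, and it uses Python's standard-library <code>html.parser</code> instead of Beautiful Soup so that it runs without extra dependencies. It only builds the Pandoc command lines rather than running them, and it leaves MOBI out because it makes no assumption about how that format is produced.

```python
from html.parser import HTMLParser
from pathlib import Path


def render_url(page_url):
    """Return the URL of the plain HTML rendering of a wiki page.

    Appends action=render, using "&" if the URL already has a query string.
    """
    separator = "&" if "?" in page_url else "?"
    return f"{page_url}{separator}action=render"


class TinyMarkdown(HTMLParser):
    """Toy HTML-to-Markdown converter (hypothetical, far simpler than a real one)."""

    MARKS = {"em": "*", "i": "*", "strong": "**", "b": "**"}
    HEADINGS = ("h1", "h2", "h3")

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.MARKS:
            self.out.append(self.MARKS[tag])
        elif tag in self.HEADINGS:
            # <h2> becomes "## ", and so on.
            self.out.append("#" * int(tag[1]) + " ")

    def handle_endtag(self, tag):
        if tag in self.MARKS:
            self.out.append(self.MARKS[tag])
        elif tag == "p" or tag in self.HEADINGS:
            # Paragraphs and headings are separated by a blank line.
            self.out.append("\n\n")

    def handle_data(self, data):
        self.out.append(data)


def html_to_markdown(html):
    """Convert a small HTML fragment to Markdown, dropping all other markup."""
    parser = TinyMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip() + "\n"


def pandoc_command(markdown_path, fmt):
    """Build (but do not run) a Pandoc command writing next to the input file."""
    output_path = Path(markdown_path).with_suffix("." + fmt)
    return ["pandoc", str(markdown_path), "-o", str(output_path)]
```

In a real run, the HTML passed to <code>html_to_markdown</code> would be fetched from <code>render_url(...)</code> for each book, the result written to a <code>.md</code> file, and each command list handed to <code>subprocess.run</code>. Keeping Markdown as the intermediate format is what makes the plain-text version of each ebook available essentially for free.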


[[File:Wittgenstein texts HTML to ebook.png|center|thumb|600px]]