Taking the fight to the establishment.

Overview

Throwdown

Taking the fight to the establishment.

Wat?

I wanted a simple markdown interpreter in python and/or javascript to output html for my website. Python does not have a bug-free official distribution, javascript only has things you install through npm and I don't want to have anything to do with node and the 100 MB of dependencies you end up uploading to your FTP server in order to do the most basic tasks.

So writing my own parser it is then eh? I tried to trudge through the commonmark markdown spec and had a heart attack at the complexity. 24722 words and 181 pages of complicated language explaining features I absolutely don't need.

I just want minimal, well defined, syntactical elements with maximum payoff, so here is throwdown. Taking the fight to the establishment to have a stupidly minimal markup language in both definition and capability.

Goals

Keep it a subset for markdown so we can use existing IDEs & plugins. Support HTML tags in line with text. Write a well defined language spec in the manual to ease creating new interpreters for it.

The spec

Tokenization

Given a piece of text we tokenize the following concepts (these are regular expressions using DOTALL and MULTILINE modifiers):

blank_line(s): (\r | \n | \r\n){2,}
html tag: <.*?>
code: ```.*?```
unescaped italic: ^_|(?

any characters inbetween matching tokens are flagged content.

Content itself gets an additional treatment where we replace this regex

\\(?)

for escaped characters, with whatever was in the matched group. I currently do this in the generation stage but it could move to any stage.

I am not sure if in the UTF8 2/3/4 byte characters any of these elements may match, so make sure to perform these single-characetr checks per unicode char, not per byte.

Parsing

We then have a parsing pass that tries to group matching tokens:

In this example:

This *word* is bold but this* is wrong.

We have the following tokens:

content, bold, content, bold, content, bold, content

The parser simply finds any content block surrounded by matching code|italic|bold neighbours, and then 'consumes' these neighbours so they can not be picked up more than once. Reading from left to right this means we get (note we search outwards from content recursively to support *_content_* notations, instead of holding on to the boundary tokens as soon as we encounter them):

content group content bold content

Then, any token outside of a group gets merged into it's content, any consecutive content gets merged into 1 content. The first step reduced the bold into it's left neighbour:

content group content content

The next step reduces the two content blocks into one:

content group content

The above step should include html tags.

A final step is to remove the blank line tokens, but first we must make sure to merge consecutive group and content blocks, because after this any consecutive content and/or group tokens are known unique paragraphs (or headers) so the blank lines are no longer necessary to imply this separation.

Generation

Then there is the generation step. We simply walk the resulting tokens and output a html document.

  • If a content group is preceded by a heading, the node gets wrapped into tags where n is the number of #.
  • Every other content node gets wrapped into

    tags.

  • Every group gets wrapped based on the first and last tokens (which are identical).
    • italic becomes In this case the wrapping is recursive, a bold group in an intalic group may exist.
    • bold becomes In this case the wrapping is recursive, an italic group in a bold group may exist.
    • code becomes

Write
to insert single line breaks manually.

TODO:

Consider bullet points and numbered lists, though the html is not super invasive.

Owner
Trevor van Hoof
I write tools & shaders TropicalTrevor in the Demoscene
Trevor van Hoof
Um sistema de llogin feito em uma interface grafica.

Interface-para-login Um sistema de login feito com JSON. Utilizando a biblioteca Tkinter, eu criei um sistema de login, onde guarda a informações de l

Mobben 1 Nov 28, 2021
Flask-built web application that simulates a time and cost calculator for charging Electric Vehicles.

ev_charging_calculator Flask-built web application that simulates a time and cost calculator for charging Electric Vehicles. The project aims to simul

1 Nov 03, 2021
The Open edX platform, the software that powers edX!

This is the core repository of the Open edX software. It includes the LMS (student-facing, delivering courseware), and Studio (course authoring) compo

edX 6.2k Jan 01, 2023
An end-to-end Python-based Infrastructure as Code framework for network automation and orchestration.

Nectl An end-to-end Python-based Infrastructure as Code framework for network automation and orchestration. Features Data modelling and validation. Da

Adam Kirchberger 15 Oct 14, 2022
B-Pkg is a simple tool in python for installing all basic package in termux

Basic-Pkg 👉🏻 Basic-Pkg 👈🏻 B-Pkg is a simple tool in python for installing all basic package in termux This is my first tool, I hope you will like

Macgaiver 3 Oct 21, 2021
Remote execution of a simple function on the server

FunFetch Remote execution of a simple function on the server All types of Python support objects.

Decave 4 Jun 30, 2022
A gamey, snakey esoteric programming language

Snak Snak is an esolang based on the classic snake game. Installation You will need python3. To use the visualizer, you will need the curses module. T

David Rutter 3 Oct 10, 2022
STAC in Jupyter Notebooks

stac-nb STAC in Jupyter Notebooks Install pip install stac-nb Usage To use stac-nb in a project, start Jupyter Lab (jupyter lab), create a new noteboo

Darren Wiens 32 Oct 04, 2022
The bidirectional mapping library for Python.

bidict The bidirectional mapping library for Python. Status bidict: has been used for many years by several teams at Google, Venmo, CERN, Bank of Amer

Joshua Bronson 1.2k Dec 31, 2022
Load, explore and analyse data from Scotland and rest of the world related to Covid19.

Streamlit Examples This is my first attempt with Streamlit. It is an open-source framework, free, Python-based and easy to use tool to build and deplo

Eyad Elyan 12 Mar 01, 2021
Verification of Monty Hall problem by experimental simulation.

Verification of Monty Hall problem by experimental simulation. |中文|English| In the process of learning causal inference, I learned about the Monty Hal

云端听茗 1 Nov 22, 2022
MeerKAT radio telescope simulation package. Built to simulate multibeam antenna data.

MeerKATgen MeerKAT radio telescope simulation package. Designed with performance in mind and utilizes Just in time compile (JIT) and XLA backed vectro

Peter Ma 6 Jan 23, 2022
A code to clean and extract a bib file based on keywords.

These are two scripts I use to generate clean bib files. clean_bibfile.py: Removes superfluous fields (which are not included in fields_to_keep.json)

Antoine Allard 4 May 16, 2022
A conda-smithy repository for boost-histogram.

The official Boost.Histogram Python bindings. Provides fast, efficient histogramming with a variety of different storages combined with dozens of composable axes. Part of the Scikit-HEP family.

conda-forge 0 Dec 17, 2021
The refactoring tutorial I wrote for PyConDE 2022. You can also work through the exercises on your own.

Refactoring 101 planet images by Justin Nichol on opengameart.org CC-BY 3.0 Goal of this Tutorial In this tutorial, you will refactor a space travel t

Kristian Rother 9 Jun 10, 2022
This is some simple code to scrape vistbook's system to get an overview of the different cabins availability.

DNT_cabin_availability_system This is some simple code to scrape visbook's system to get an overview of the different cabins availability. The system

Andreas Lorentzen 1 Sep 25, 2022
Reconhecimento de voz, em português, com python

Speech_recognizer Reconhecimento de voz, em português, com python O ato de falar nada mais é que criar vibrações no ar. Por meio de um conversor analó

Marcus Vinícius Ribeiro Andrade 1 Dec 14, 2021
NES development tool made with Python and Lua

NES Builder NES development and romhacking tool made with Python and Lua Current Stage: Alpha Features Open source "Build" project, which exports vari

10 Aug 19, 2022
Remote Worker

Remote Worker Separation of Responsibilities There are several reasons to move some processing out of the main code base for security or performance:

V2EX 69 Dec 05, 2022
Battery conservation Python script for ubuntu to enable battery conservation mode at 60% 80% or 90%

Description Batteryconservation is a small python script wich creates an appindicator for ubuntu which can be used to enable / disable battery conserv

3 Jan 04, 2022