Throwdown

Taking the fight to the establishment.

Wat?

I wanted a simple markdown interpreter in Python and/or JavaScript to output HTML for my website. Python has no official, bug-free markdown distribution, and JavaScript only has packages you install through npm. I don't want anything to do with node and the 100 MB of dependencies you end up uploading to your FTP server just to do the most basic tasks.

So writing my own parser it is then, eh? I tried to trudge through the CommonMark markdown spec and had a heart attack at the complexity: 24722 words and 181 pages of complicated language explaining features I absolutely don't need.

I just want minimal, well-defined syntactic elements with maximum payoff, so here is Throwdown: taking the fight to the establishment with a markup language that is stupidly minimal in both definition and capability.

Goals

  • Keep it a subset of markdown, so we can use existing IDEs & plugins.
  • Support HTML tags inline with text.
  • Write a well-defined language spec in the manual, to ease creating new interpreters for it.

The spec

Tokenization

Given a piece of text, we tokenize the following concepts (these are regular expressions using the DOTALL and MULTILINE modifiers):

blank_line(s): (\r | \n | \r\n){2,}
html tag: <.*?>
code: ```.*?```
unescaped italic: ^_|(?

Any characters in between matching tokens are flagged as content.
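A minimal Python sketch of this tokenizer, assuming tokens are (kind, text) pairs. The code and html patterns are taken from the list above. blank_line is written as `(?:\r\n|\r(?!\n)|\n){2,}` because with the plain `(\r|\n|\r\n){2,}` the regex engine can backtrack and match a single `\r\n` as `\r` plus `\n`, so one Windows line ending would count as a blank line. The heading, bold and italic patterns are assumptions, since the spec above does not spell them out in full.

```python
import re

# (kind, pattern) pairs. code/html follow the spec; blank_line is tightened so a
# lone "\r\n" is not two breaks. ASSUMED: heading, bold and italic patterns.
TOKEN_PATTERNS = [
    ("code", r"```.*?```"),
    ("html", r"<.*?>"),
    ("blank_line", r"(?:\r\n|\r(?!\n)|\n){2,}"),
    ("heading", r"^#+"),     # assumed: a run of '#' at the start of a line
    ("bold", r"(?<!\\)\*"),  # assumed: a '*' not preceded by a backslash
    ("italic", r"(?<!\\)_"), # assumed: a '_' not preceded by a backslash
]

# One alternation with a named group per kind; match.lastgroup tells us which fired.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS),
    re.DOTALL | re.MULTILINE,
)


def tokenize(text):
    """Split text into (kind, text) tokens; gaps between matches become content."""
    tokens = []
    pos = 0
    for match in TOKEN_RE.finditer(text):
        if match.start() > pos:
            tokens.append(("content", text[pos:match.start()]))
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    if pos < len(text):
        tokens.append(("content", text[pos:]))
    return tokens
```

Because `finditer` consumes each match, markers inside a code block or an HTML tag are never seen as separate tokens.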

Content itself gets an additional treatment, where we replace matches of this regex

\\(?)

for escaped characters with whatever was captured in the group. I currently do this in the generation stage, but it could move to any stage.
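A sketch of that replacement in Python. It assumes the pattern is `\\(.)`, a backslash followed by any single character that is kept while the backslash is dropped; treat that as a guess at the intent rather than the spec's exact regex.

```python
import re

# ASSUMED escape pattern: a backslash followed by any one character
# (DOTALL so an escaped newline is handled too).
ESCAPE_RE = re.compile(r"\\(.)", re.DOTALL)


def unescape(content):
    """Replace each backslash-escaped character with the character itself."""
    return ESCAPE_RE.sub(r"\1", content)
```

An escaped backslash (`\\`) collapses to a single backslash, so it is still possible to write a literal backslash in front of a marker.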

I am not sure whether any of these elements may match inside UTF-8 2/3/4-byte characters, so make sure to perform these single-character checks per Unicode character, not per byte.
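For what it's worth, UTF-8 is designed so this cannot happen: every byte of a 2, 3 or 4-byte character has its high bit set (lead bytes 0xC2 to 0xF4, continuation bytes 0x80 to 0xBF), so none of them can equal an ASCII marker like `_`, `*` or `#`. The Python snippet here checks that for a few sample characters; working on decoded `str` values, which iterate per code point, sidesteps the question anyway.

```python
def ascii_safe(ch):
    """True if ch is ASCII, or if all of its UTF-8 bytes are >= 0x80."""
    encoded = ch.encode("utf-8")
    return len(encoded) == 1 or all(byte >= 0x80 for byte in encoded)


# A 1-, 2-, 3- and 4-byte character next to each other.
samples = "_é€😀"
for ch in samples:  # iterating a str yields code points, not bytes
    encoded = ch.encode("utf-8")
    print(repr(ch), len(encoded), [hex(byte) for byte in encoded], ascii_safe(ch))
```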

Parsing

We then have a parsing pass that tries to group matching tokens:

In this example:

This *word* is bold but this* is wrong.

We have the following tokens:

content, bold, content, bold, content, bold, content

The parser simply finds any content block surrounded by matching code, italic or bold neighbours, and then 'consumes' those neighbours so they cannot be picked up more than once. Note that we search outwards from the content recursively, to support *_content_* notation, instead of holding on to boundary tokens as soon as we encounter them. Reading from left to right, this gives:

content group content bold content
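A minimal Python sketch of this grouping pass, assuming tokens are (kind, text) pairs and groups are hypothetical ("group", kind, children) tuples. It repeatedly looks for a content token, or an already-built group (which is what allows *_content_*), flanked by two identical code/italic/bold tokens, and replaces all three with a group.

```python
# Token kinds that can wrap content into a group.
WRAPPERS = {"code", "italic", "bold"}


def group_tokens(tokens):
    """Fold marker/content/marker triples into ("group", kind, [child]) nodes."""
    tokens = list(tokens)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(tokens) - 1):
            left, middle, right = tokens[i - 1], tokens[i], tokens[i + 1]
            if (left[0] in WRAPPERS and left[0] == right[0]
                    and middle[0] in ("content", "group")):
                # Consume both boundary tokens so they cannot be matched again.
                tokens[i - 1:i + 2] = [("group", left[0], [middle])]
                changed = True
                break  # restart the left-to-right scan on the shorter list
    return tokens
```

Restarting the scan after every fold is quadratic, but it keeps the sketch short and guarantees an inner group exists before an outer pair of markers tries to wrap it.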

Then any token outside of a group gets merged into its neighbouring content, and any consecutive content gets merged into one content token. The first step merges the stray bold into its left neighbour:

content group content content

The next step reduces the two content blocks into one:

content group content

This merging step should include HTML tags as well.
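One way to sketch this merge in Python, using hypothetical (kind, text) tokens and ("group", kind, children) tuples: any bold, italic, code or html token left outside a group is turned back into plain content, and adjacent content tokens are concatenated. Heading and blank_line tokens pass through untouched.

```python
# Token kinds that become plain text when they are left outside a group.
STRAY_KINDS = {"bold", "italic", "code", "html"}


def merge_strays(tokens):
    """Turn ungrouped markers and HTML tags into content, then fuse adjacent content."""
    merged = []
    for token in tokens:
        if token[0] in STRAY_KINDS:
            token = ("content", token[1])  # unmatched marker or HTML tag: keep as text
        if token[0] == "content" and merged and merged[-1][0] == "content":
            merged[-1] = ("content", merged[-1][1] + token[1])
        else:
            merged.append(token)
    return merged
```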

The final step removes the blank line tokens. First, though, we must merge consecutive group and content blocks: once that is done, any consecutive content and/or group tokens are known to be separate paragraphs (or headers), so the blank lines are no longer needed to imply that separation.
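A sketch of that final step in Python, assuming the merged token list is split on blank_line tokens into a list of paragraphs, each a list of the heading, content and group tokens between two blank lines; the blank_line tokens themselves are dropped.

```python
def to_paragraphs(tokens):
    """Split merged tokens on blank_line tokens and drop the blank lines."""
    paragraphs, current = [], []
    for token in tokens:
        if token[0] == "blank_line":
            if current:  # ignore leading or repeated blank lines
                paragraphs.append(current)
            current = []
        else:
            current.append(token)
    if current:
        paragraphs.append(current)
    return paragraphs
```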

Generation

Then there is the generation step: we simply walk the resulting tokens and output an HTML document.

  • If a content group is preceded by a heading, the node gets wrapped into <hn> tags, where n is the number of # characters.
  • Every other content node gets wrapped into <p> tags.
  • Every group gets wrapped based on its first and last tokens (which are identical).
    • italic becomes an italic tag. The wrapping is recursive: a bold group may exist inside an italic group.
    • bold becomes a bold tag. The wrapping is recursive: an italic group may exist inside a bold group.
    • code becomes a code tag.

Write <br> to insert single line breaks manually.
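A minimal Python sketch of the generation walk. It assumes the parsed output is a list of paragraphs, each a list of hypothetical ("heading", "#..."), ("content", text) and ("group", kind, children) tuples. The <hn> and <p> wrapping follows the list above; the tag names for italic, bold and code (`i`, `b`, `code`) are assumptions, and escape handling is left out to keep it short.

```python
# ASSUMED tag names for groups; swap in em/strong/pre or whatever you prefer.
GROUP_TAGS = {"italic": "i", "bold": "b", "code": "code"}


def render_node(node):
    """Render a content token or a (possibly nested) group to HTML."""
    if node[0] == "content":
        return node[1]  # content, including merged HTML tags, is emitted verbatim
    _, kind, children = node  # ("group", kind, children)
    tag = GROUP_TAGS[kind]
    inner = "".join(render_node(child) for child in children)  # groups may nest
    return f"<{tag}>{inner}</{tag}>"


def render_paragraph(paragraph):
    """Wrap a paragraph in <hn> if it starts with a heading, else in <p>."""
    if paragraph and paragraph[0][0] == "heading":
        level = len(paragraph[0][1])  # n = number of '#' characters
        # strip() trims the space after the '#' run and any trailing whitespace
        body = "".join(render_node(node) for node in paragraph[1:]).strip()
        return f"<h{level}>{body}</h{level}>"
    body = "".join(render_node(node) for node in paragraph).strip()
    return f"<p>{body}</p>"


def generate(paragraphs):
    """Join the rendered paragraphs into an HTML fragment, one per line."""
    return "\n".join(render_paragraph(p) for p in paragraphs)
```

Emitting content verbatim is what lets inline HTML such as <br> pass straight through to the output.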

TODO:

Consider bullet points and numbered lists, though writing the HTML for them by hand is not super invasive.

Owner
Trevor van Hoof
I write tools & shaders TropicalTrevor in the Demoscene