Markup for note taking

Overview

Subtext: markup for note-taking

Subtext is a text-based, block-oriented hypertext format. It is designed with note-taking in mind. It has a simple, people-friendly syntax with a passing resemblance to Markdown.

See the Speculative Specification.

We're experimenting with Subtext as part of Subconscious, a new tool for thought.

Warning to implementors: Subtext is currently experimental status. We'll be spending some time living with Subtext and building experimental tools on top of it before committing to anything. The language design is just a hypothesis! It might undergo radical breaking changes! This is work in progress, and shared in the spirit of working with the garage door open.

A bit of Subtext

Here’s an example:

# Heading

Plain text

- List item
- List item

> Quoted text

& example.csv
& https://example.com

Subtext is line-oriented. Each line in the file is treated as a discrete block of content. The type of a line is determined by a sigil character, like #, &, >, at the front of the line. If a line doesn’t have a sigil character, it is treated as plain text. This makes Subtext very easy to parse, and very easy to write. It is currently impossible to write broken Subtext, which is nice!

Subtext is for notes

Today the book is already… an outdated mediation between two different filing systems. For everything that matters is to be found in the card box of the researcher who wrote it, and the scholar studying it assimilates it into his own card index. (Walter Benjamin)

HTML comes in web pages. The analogy for an HTML document is quite literally a page. The image that springs to mind is of an 8.5x11” sheet, carefully typeset, with multiple fonts, headings, complex formatting, perhaps laid out across many columns. HTML is a publication format, designed to produce complete, indivisible artifacts, called pages.

The right mental analogy for Subtext is not the page. It is the the index card.

Subtext deliberately avoids the kind of complex presentation features offered by publishing formats like HTML, PDF, and LaTex. It has no opinions about fonts, colors, sizes.

Like a stack of index cards, there are many ways to use Subtext, beyond simple linear layout. It isn’t just for narrative. It’s hypertext montage.

Subtext is block-oriented

Subtext represents block-oriented documents as line-oriented markup.

A block-oriented document is made up of a list of blocks of different types (or occasionally, a tree of blocks). Each block type may be displayed differently. For example, a quote block may render as quote-formatted text, while an image block may render an image in-place.

Some of the earliest hypertext proposals were block-oriented, including Ted Nelson's ELF (Nelson, 1965). Block-oriented documents have also independently evolved within many contemporary tools-for-thought, including Notion, Roam, and Ward Cunningham's Federated Wiki.

Why does this pattern keep re-emerging? One reason might be that block-oriented editing is an easy way to express rich formatting. But more importantly…

Blocks are composable

Blocks are thought legos. A block-oriented document is composable (and decomposable). You can break it apart into component blocks, filter it down to blocks of a particular type, merge documents, pluck out blocks, link to specific blocks, etc.

In theory, this is true of any tree-based markup language, such as HTML. But try meaningfully merging two HTML files in practice... Yikes! Tag soup!

A linear block-oriented format resolves the problem by radically simplifying it. With a linear data model, the range of meaningful document structures is narrowed, and this means you can make complex, yet meaningful programmatic decisions, without much context about the specific document:

  • Excerpt a document by taking the first text block
  • Select all quotes from a collection of documents
  • Select all links, and generate a link graph for a collection of documents
  • Find all backlinks and append them to the document as links

Linear block-oriented documents are like shipping containers for discrete thoughts. Because blocks are structurally uniform, they can be automatically moved around and reorganized. Software can split, join, and merge documents easily and effectively, because the document structure is simple.

Subtext is hypertext

Link blocks (&) are the most important feature in Subtext. They let you reference other files, and URLs. You can link to any kind of file, including other Subtext files!

The plan is to have Subconscious display these links as transclusions. Rather than linked words in text, imagine something more like a quote tweet… Links to images display as literal images, links to videos display as playable videos with playback controls, links to documents display some or all of the content inside of the linked document. This lets you compose hypertext documents from many smaller documents.

This keeps Subtext simple. Rather than extending the syntax of Subtext to include a complex feature like tables, we might, for example, link to a .csv file, which then gets rendered as a table. This also means the data stays in its native file type, and can be used in other applications.

One of the many attempts of nature to evolve a Xanadu

By an accident of convergent evolution, Subtext happens to have some similarities to Ted Nelson's ELF format (Nelson, 1965).

Ted Nelson “A File Structure for the Complex, the Changing, and the Indeterminate”, 1965

Like ELF, Subtext documents are made up of a list of small blocks. Also like ELF, links are transcluded. Big documents can be composed by linking to small documents.

I discovered Ted Nelson’s ELF paper after writing up my first draft of Subtext. Uncovering this bit of convergent evolution was encouraging! It suggests I’m pulling on a worthwhile thread. Xanadu by way of Markdown? Something like that.

Why not Markdown?

I took a deep breath before thinking about the jump from Markdown. If you’re a programmer, Markdown is a de-facto standard for formatted text. For many, it is the first obvious choice for this kind of thing. So why Subtext?

Subtext has evolved out personal experiments with plain-text note-taking, spanning 10 years and 12k notes. Many of these notes are written in Markdown. However, over time, I noticed that my markup needs for note-taking were different from my markup needs for publishing. My note-taking style organically converged on a tiny subset of Markdown's features: text, links, lists, quotes, and one level of heading. To have more may be useful for publishing, but is often overkill for note-taking.

At the same time, I began to write small generative programs that worked with this collection of notes, little scripts that would combine ideas, remix notes, algorithmically generate new notes… these were the seeds that would later become Subconscious.

Here, I started to run into limitations with Markdown and HTML. As a complex publishing format, it is unclear how to meaningfully decompose or merge Markdown/HTML documents. When you combine documents, heading levels may need to be changed, lists may need to be flattened or nested. Because the document format is complex, foreknowledge of the meaning of the document is necessary to make meaningful changes. That limits what you can do with software.

Subtext is an attempt to resolve the problem by radically simplifying it. Paradoxically, by limiting the format to a flat list of blocks, we radically expand what software can usefully do with it. Blocks are easy to parse, easy to work with, and you can do all sorts of interesting generative algorithmic things with them.

The syntax is also simple, and hard to mess up, and I’m happy about that, too.

Project links

Owner
Gordon Brander
Building something new (prev @google, @mitmedialab, @mozilla).
Gordon Brander
Using Opencv ,based on Augmental Reality(AR) and will show the feature matching of image and then by finding its matching

Using Opencv ,this project is based on Augmental Reality(AR) and will show the feature matching of image and then by finding its matching ,it will just mask that image . This project ,if used in cctv

1 Feb 13, 2022
Toolbox for OCR post-correction

Ochre Ochre is a toolbox for OCR post-correction. Please note that this software is experimental and very much a work in progress! Overview of OCR pos

National Library of the Netherlands / Research 117 Nov 10, 2022
([email protected]) Boosting Co-teaching with Compression Regularization for Label Noise

Nested-Co-teaching ([email protected]) Pytorch implementation of paper "Boosting Co-tea

YINGYI CHEN 41 Jan 03, 2023
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted. ocrmypdf # it's a scriptable c

jbarlow83 7.9k Jan 03, 2023
APS 6º Semestre - UNIP (2021)

UNIP - Universidade Paulista Ciência da Computação (CC) DESENVOLVIMENTO DE UM SISTEMA COMPUTACIONAL PARA ANÁLISE E CLASSIFICAÇÃO DE FORMAS Link do git

Eduardo Talarico 5 Mar 09, 2022
learn how to use Gesture Control to change the volume of a computer

Volume-Control-using-gesture In this project we are going to learn how to use Gesture Control to change the volume of a computer. We first look into h

Diwas Pandey 49 Sep 22, 2022
Extract tables from scanned image PDFs using Optical Character Recognition.

ocr-table This project aims to extract tables from scanned image PDFs using Optical Character Recognition. Install Requirements Tesseract OCR sudo apt

Abhijeet Singh 209 Dec 06, 2022
Image processing using OpenCv

Image processing using OpenCv Write a program that opens the webcam, and the user selects one of the following on the video: ✅ If the user presses the

M.Najafi 4 Feb 18, 2022
color detection using python

colordetection color detection using python In this color detection Python project, we are going to build an application through which you can automat

Ruchith Kumar 1 Nov 04, 2021
document image degradation

ocrodeg The ocrodeg package is a small Python library implementing document image degradation for data augmentation for handwriting recognition and OC

NVIDIA Research Projects 134 Nov 18, 2022
Image Recognition Model Generator

Takes a user-inputted query and generates a machine learning image recognition model that determines if an inputted image is or isn't their query

Christopher Oka 1 Jan 13, 2022
list all open dataset about ocr.

ocr-open-dataset list all open dataset about ocr. printed dataset year Born-Digital Images (Web and Email) 2011-2015 COCO-Text 2017 Text Extraction fr

hongbomin 95 Nov 24, 2022
Character Segmentation using TensorFlow

Character Segmentation Segment characters and spaces in one text line,from this paper Chinese English mixed Character Segmentation as Semantic Segment

26 Aug 25, 2022
A program that takes in the hand gesture displayed by the user and translates ASL.

Interactive-ASL-Recognition Using the framework mediapipe made by google, OpenCV library and through self teaching, I was able to create a program tha

Riddhi Bajaj 3 Nov 22, 2021
Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.

doc2text doc2text extracts higher quality text by fixing common scan errors Developing text corpora can be a massive pain in the butt. Much of the tex

Joe Sutherland 1.3k Jan 04, 2023
Source Code for AAAI 2022 paper "Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching"

Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching This repository is an official implementation of

HKUST-KnowComp 13 Sep 08, 2022
A simple component to display annotated text in Streamlit apps.

Annotated Text Component for Streamlit A simple component to display annotated text in Streamlit apps. For example: Installation First install Streaml

Thiago Teixeira 312 Dec 30, 2022
A pkg stiching around view images(4-6cameras) to generate bird's eye view.

AVP-BEV-OPEN Please check our new work AVP_SLAM_SIM A pkg stiching around view images(4-6cameras) to generate bird's eye view! View Demo · Report Bug

Xinliang Zhong 37 Dec 01, 2022
Textboxes implementation with Tensorflow (python)

tb_tensorflow A python implementation of TextBoxes Dependencies TensorFlow r1.0 OpenCV2 Code from Chaoyue Wang 03/09/2017 Update: 1.Debugging optimize

Jayne Shin (신재인) 20 May 31, 2019
Pure Javascript OCR for more than 100 Languages 📖🎉🖥

Version 2 is now available and under development in the master branch, read a story about v2: Why I refactor tesseract.js v2? Check the support/1.x br

Project Naptha 29.2k Jan 05, 2023