Block elements in rich text CRDT (for Peritext 2)

We need to support the following features:

Block elements like <p>, <ul><li>, <ol><li>, <h1>, <section>, <aside>, <blockquote>, etc.
Nesting of those elements to arbitrary depth
Splitting and joining paragraphs without anomalies such as duplicated text
Converting one block type to another (e.g. paragraph to bullet point or heading)
Marks (e.g. comments) that cross a paragraph boundary
Tables that allow rows and columns to be added or removed

I propose handling tables differently from other block elements, as explained below.

Non-tabular block elements

New insight: we should not think about manipulating blocks; rather, the objects we're manipulating are the boundaries between blocks (where one block ends and the next one starts). Then the content of a block is just the text between one boundary and the next. This makes splitting and joining blocks easy: splitting means adding a boundary, and joining means removing a boundary.

I suggest that we keep the entire text of the document (excluding text in tables, see below) in a single plain text CRDT such as RGA. We add three new CRDT operation types:

splitBlock is similar to an insert, except that instead of adding a character to the text, it adds a special block boundary marker element to the sequence of characters. The splitBlock operation has some additional fields to indicate the block type and other properties (discussed below).
joinBlock is similar to a remove operation; it marks a block boundary as deleted (identified by the opId of a prior splitBlock operation) without actually removing it from the RGA sequence.
updateBlock modifies the properties of an existing block boundary, identified by the opId of a prior splitBlock operation. For example, this can change a <p> block into an <h1> block, or change the attributes of a <p> block from text-align: left to text-align: justify. Multiple concurrent updates of the same property of the same block boundary are resolved by last writer wins using opIds.
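The three operations can be sketched roughly as follows. This is only an illustration under several assumptions that are not part of the proposal: an opId is modelled as a (counter, actorId) tuple compared lexicographically, the RGA is reduced to a Python list with the usual "skip over elements with a greater opId" insertion rule, and the names Element, Doc, split_block, join_block and update_block are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Assumption: an opId is (counter, actorId), ordered lexicographically.
OpId = Tuple[int, str]


@dataclass
class Element:
    op_id: OpId
    char: Optional[str] = None  # None means this element is a block boundary marker
    deleted: bool = False       # tombstone flag; elements are never physically removed
    # property name -> (value, opId of the operation that wrote it)
    props: Dict[str, Tuple[Any, OpId]] = field(default_factory=dict)


class Doc:
    def __init__(self) -> None:
        self.seq: List[Element] = []

    def _index(self, op_id: OpId) -> int:
        for i, el in enumerate(self.seq):
            if el.op_id == op_id:
                return i
        raise KeyError(op_id)

    def _place(self, ref: Optional[OpId], el: Element) -> None:
        # Simplified RGA rule: start right after the reference element and
        # skip any elements with a greater opId, so that concurrent inserts
        # at the same position end up in the same order on every replica.
        i = 0 if ref is None else self._index(ref) + 1
        while i < len(self.seq) and self.seq[i].op_id > el.op_id:
            i += 1
        self.seq.insert(i, el)

    def insert(self, op_id: OpId, ref: Optional[OpId], char: str) -> None:
        self._place(ref, Element(op_id, char=char))

    def split_block(self, op_id: OpId, ref: Optional[OpId], props: Dict[str, Any]) -> None:
        # Like insert, but the new element is a boundary marker with properties.
        tagged = {key: (value, op_id) for key, value in props.items()}
        self._place(ref, Element(op_id, props=tagged))

    def join_block(self, target: OpId) -> None:
        # Like remove: mark the boundary as deleted, keep it in the sequence.
        self.seq[self._index(target)].deleted = True

    def update_block(self, op_id: OpId, target: OpId, key: str, value: Any) -> None:
        # Last writer wins per property: keep the write with the greater opId.
        props = self.seq[self._index(target)].props
        if key not in props or props[key][1] < op_id:
            props[key] = (value, op_id)
```

In this sketch, two replicas that apply the same set of splitBlock and updateBlock operations in different orders end up with identical sequences and identical boundary properties, because both placement and property resolution depend only on opIds.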

To render the document in a text editor, the text from one non-deleted block boundary to the next non-deleted block boundary becomes the content of a block, and the type and properties of that block are determined by the properties of its starting boundary. The text between the beginning of the document and the first block boundary (if any) becomes a <p> block with default properties.
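The rendering rule above can be sketched as a single pass over the sequence. This is a minimal illustration, not a definitive implementation: the dictionary layout of the elements and the name render are hypothetical, and it assumes that the implicit leading <p> block is omitted when there is no text before the first boundary (the proposal does not say which way this should go).

```python
def render(seq):
    """Turn a sequence of character and boundary elements into a list of blocks."""
    blocks = []
    # Implicit block for text before the first boundary: <p> with default properties.
    current = {"type": "p", "props": {}, "text": ""}
    leading = True  # True until the first non-deleted boundary is seen

    for el in seq:
        if el.get("deleted"):
            continue  # tombstones: removed characters and joined boundaries
        if el["kind"] == "boundary":
            # Close the current block. Assumption: an empty leading block is dropped.
            if not (leading and current["text"] == ""):
                blocks.append(current)
            # The new block takes its type and properties from its starting boundary.
            current = {
                "type": el["type"],
                "props": dict(el.get("props", {})),
                "text": "",
            }
            leading = False
        else:
            current["text"] += el["char"]

    if not (leading and current["text"] == ""):
        blocks.append(current)
    return blocks
```

Note that a deleted boundary simply disappears from the output, so the text on either side of it flows into one block, and a boundary with no following text still yields an empty block.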

Why does splitBlock insert a marker element into the sequence? Because this makes it easy to handle blocks with empty text content. Moreover, if two users concurrently perform a splitBlock at the same position in the document, but with different properties (e.g. one user inserts an <h1> while the other inserts an <h2>), having a marker element ensures that all users end up with the same blocks in the same order.

This approach has no problems with a mark that spans block boundaries. For example, if someone hits enter in the middle of a bold span, that's just fine: there is still only one bold span, and that span just happens to now contain a block boundary. When the CRDT state is mapped to the text editor state, the bold span will get split into the portion that appears in one paragraph and the portion that appears in the next paragraph, but this split is just part of the rendering of the CRDT state. Conceptually there is just one bold span.
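To illustrate the rendering step described above, the sketch below splits a single mark into one fragment per block. It makes simplifying assumptions that are not in the proposal: the sequence contains only visible elements, a mark is given as a half-open index range over that sequence, and the names BOUNDARY and split_mark_by_block are hypothetical.

```python
BOUNDARY = object()  # hypothetical sentinel standing in for a block boundary marker


def split_mark_by_block(seq, start, end):
    """Split a mark covering seq[start:end] into per-block fragments.

    Returns a list of (block_index, start_offset, end_offset) tuples, where
    block 0 is the implicit block before the first boundary and the offsets
    are character positions within that block (end exclusive).
    """
    fragments = []
    block, offset = 0, 0
    for i, el in enumerate(seq):
        if el is BOUNDARY:
            # A boundary starts a new block; it contributes no character itself.
            block += 1
            offset = 0
            continue
        if start <= i < end:
            last = fragments[-1] if fragments else None
            if last is not None and last[0] == block and last[2] == offset:
                # Extend the fragment already open in this block.
                fragments[-1] = (block, last[1], offset + 1)
            else:
                fragments.append((block, offset, offset + 1))
        offset += 1
    return fragments
```

For the text "hello" and "world" separated by one boundary, a bold mark over indices 3 to 8 (covering "lo", the boundary, and "wo") renders as two fragments, one at the end of the first block and one at the start of the second, while the CRDT state still holds a single span.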

Bulleted/numbered lists

The description so far works for a document that is a flat sequence of block elements, but it does not explain how to handle nesting of elements. Let's look at that next. The challenge with nesting is how to handle operations that change the tree structure.

As an example, take a list with three bullet points: