Developing a Beautiful and Performant Block Editor in Qt C++ and QML
How I developed the Daino Notes block editor from scratch: a cross-platform, native-like application
Computers are fast, but modern software, especially web apps, is so bloated that it holds their full potential back. This is why I've decided to build my own block editor from scratch using Qt C++ and QML.
Native/Non-Native
When people refer to native apps, they typically mean applications developed using the GUI frameworks provided by the operating system. However, using native frameworks isn't always the best choice. For instance, Apple's SwiftUI is reportedly slow[1][2][3][4], and Microsoft tends to abandon each new UI framework every five years[5][6].
So, it's more useful to ask: what do we expect from good native apps? We expect them to:
1. Look visually appealing and consistent with the rest of the OS.
2. Behave consistently.
3. Perform well.
I will attempt to convince you that it's possible to achieve all three of these qualities using Qt, at least to some extent. In the Qt community, we refer to our applications as 'native-like' because they can perform and look like native apps, even though they aren't built with native frameworks.
*While Qt apps typically don't look or behave exactly like native apps, I'm going to argue that they can.
Block Editor
The previous version of Daino Notes[7] was a simple note-taking app built using Qt Widgets. It featured a familiar three-pane design. The text editor was a basic plain text editor[8] with Markdown syntax highlighting[9]. It was simple and worked fairly well.
But for a long time, I wanted something more. Daino Notes has been featured as one of the top results on Google searches for the keyword 'notes,' so many of its users are not technical. They aren't concerned with Markdown; they simply want standard formatting options and an app that just works. They want a WYSIWYG (What You See Is What You Get) experience. I wanted that, too.
At the same time, I've grown fond of Markdown. The idea that all my notes are formatted in a syntax that will essentially last as long as computers exist—plain text—is very reassuring. I don't want to change that, so Markdown is here to stay.
Around that time, I became interested in new types of editors that were gaining popularity, particularly those popularized by Notion. Notion's block editor is a brilliant concept—each piece of content, whether it's text, an image, a to-do list, or a complex view like Kanban, is treated as an individual block. This allows for great flexibility in organizing and manipulating content, such as dragging and dropping different types of blocks. Most importantly, it enables the integration of complex block types like Kanban and Timeline views within the same document. For instance, the Daino Notes Block Editor is so versatile that a block in the middle of a document could easily be a video game, if I so desired.
The main problems with Notion are that:
1. It's a resource hog, with high CPU and RAM utilization, leading to battery drain and inefficiency.
2. It's too complex.
Notion is a resource hog because it's built on multiple layers of abstraction, like many other web apps, preventing optimal utilization of computer resources. Although there are faster web-based alternatives, even the fastest one (MarkText) is 60 times slower than Daino Notes and uses six times more RAM (when it doesn't hang). I'll explore performance comparisons with other apps later.
The complexity of Notion might be both its strength and its weakness. In Notion, you can create databases that hold data in a table-like structure, allowing you to organize a list of tasks for your project under different categories. You can then use the same database across different pages in your workspace to visualize the data in various ways. This way, you can transform a table-like dataset into a Kanban view, Timeline view, Calendar, and more.
I don't believe the complexity lies inherently in the use of databases but rather in the overwhelming number of options and variations available to users. Too many options can lead to stress, and stress can cause procrastination. I want to liberate users from the stress of excessive choices. Daino Notes' solution is to eliminate the need for users to think about databases altogether. Want to create a Task Board (i.e., Kanban)? Just insert a Task Board. There's no need to worry about linking 'data sources' from different databases. It's an opinionated design, but the core goal of Daino Notes is to let users focus on their content[10].
So there I was, aiming to create a note-taking app where:
1. The underlying data is simple plain text (with Markdown syntax).
2. The editor is a flexible block editor.
3. The app is simple and intuitive, even for the most non-technical users.
In the next sections, I'll show you how I built it.
Architecture
I briefly mentioned Qt Widgets, which represent the classical approach to creating GUI apps imperatively using Qt in C++. While powerful, Qt Widgets lack some essential modern features, in my opinion, such as declarative UI, bindings, behaviors, anchors, and more. These features enable the creation of beautiful, animated UIs simply and quickly, as seen in QML.
Note: I won't elaborate on how Qt/C++/QML works but will provide an overview of the implementations.
I've been experimenting for a while with ways to incorporate active widgets—ranging from a simple checkbox for a to-do list item to an image or a Kanban board—inside QTextEdit, which is part of Qt Widgets. However, everything I tried proved cumbersome or too complex for my skill set. Enter QML. Although relatively new compared to Qt Widgets, QML is starting to mature. It's the new approach to creating modern UI interfaces in Qt. It's possible (and recommended) to write the logic in a compiled language like C++, which is officially supported. There are also numerous bindings to other languages, such as Rust, for this purpose. The view/UI is then written in QML, and that was my approach.
After experimenting and studying QML, I realized that my block editor could be a simple ListView where each delegate (the type of item that makes up a ListView) is a Block type. I could then load different components based on the current block type (e.g., Regular Text, Todo item, Kanban, Image, etc.). While this approach is feasible with Qt Widgets, I find it much simpler and better to use QML due to the modern features I described earlier.
Qt follows the MVC (Model-View-Controller) architecture. While it might seem complicated at first, it's actually simple, straightforward, and logical. Let's take my block editor as an example. Here's a quick overview of the Daino Notes architecture:
Data → This is the content you want to display. In the case of Daino Notes, it's the content of a note stored as a plain text string in a local SQLite database.
Model (QAbstractListModel) → This is the code that manages the underlying structure of the document. It takes the data and organizes it in a way that the view can understand. I named mine BlockModel. It is written in Qt C++ and communicates with the view, which is written in QML.
View (QML ListView) → This is where the code for the rendered components resides—the elements the user actually sees, such as checkboxes, text, images, etc. The view receives its structured data from the model. It is written in QML and uses Qt Quick, a library of components provided by Qt and accessible via QML.
Within the BlockModel each item is a QObject of the Block class type. A Block can hold simple information such as blockType (e.g. Todo, Heading, Kanban, Image, etc., using C++ enums), indentationLevel, text, and more. It also contains pointers to other objects that represent more complex blocks, such as Kanban* and Image*. These objects are initially NULL and are only created if the block is of a type that requires them (e.g., Kanban or Image).
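To make the shape of a block more concrete, here is a minimal sketch in plain standard C++. It is not the actual Daino Notes class: the real Block is a QObject living inside a QAbstractListModel, and the names makeBlock, the Kanban and Image fields, and the exact set of enum values are assumptions made for illustration only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical block types; the article names Todo, Heading, Kanban and Image.
enum class BlockType { RegularText, Heading, Todo, Kanban, Image };

// Placeholder payloads for the "advanced" blocks (fields are illustrative).
struct Kanban { std::vector<std::string> columns; };
struct Image  { std::string path; };

struct Block {
    BlockType type = BlockType::RegularText;
    int indentationLevel = 0;
    std::string text;
    // Both stay null until the block type actually needs them.
    std::unique_ptr<Kanban> kanban;
    std::unique_ptr<Image> image;
};

// Creates a block and allocates an advanced payload only when required.
inline Block makeBlock(BlockType type, std::string text = std::string(),
                       int indentationLevel = 0) {
    assert(indentationLevel >= 0);
    Block b;
    b.type = type;
    b.text = std::move(text);
    b.indentationLevel = indentationLevel;
    if (type == BlockType::Kanban) b.kanban.reset(new Kanban());
    if (type == BlockType::Image)  b.image.reset(new Image());
    return b;
}
```

A to-do item created with makeBlock(BlockType::Todo, "Buy milk") carries no Kanban or Image payload at all, which keeps simple blocks cheap.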
Because each item in the ListView is the same Block component (which loads different types of components), I have the flexibility to place very different and complex components within the same document, drag and drop between them, convert one type into another, and so on.
The first thing I did after coming up with this idea was to test whether I could implement text selection across different blocks. I knew that successfully achieving this would enable me to build the entire editor from scratch. That was my proof-of-concept. After trying various approaches, I found an excellent blog post[11] by Shantanu Tushar along with his source code[12] which laid the groundwork for my proof of concept.
The main idea behind Shantanu Tushar's code for text selection between different blocks (or discrete delegates in a ListView) is to:
1. Emit a selectionChanged() signal each time the cursor position changes during a mouse press-and-move event.
2. Have every currently visible delegate react to that signal by checking whether its index falls between selectionStartIndex and selectionEndIndex.
3. Select the delegate's text if it does.
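The range check each delegate performs can be sketched in a few lines of standard C++. This is an illustration rather than Shantanu Tushar's or my actual code (which lives in QML); SelectionRange and delegateIsSelected are hypothetical names, and the use of -1 for "no selection" is an assumption.

```cpp
#include <algorithm>
#include <cassert>

struct SelectionRange {
    int selectionStartIndex = -1; // delegate index where the press started
    int selectionEndIndex = -1;   // delegate index currently under the cursor
};

// Returns true if the delegate at `index` should show selected text.
// Backward selections (end above start) are handled by ordering the bounds.
inline bool delegateIsSelected(const SelectionRange& r, int index) {
    assert(index >= 0);
    if (r.selectionStartIndex < 0 || r.selectionEndIndex < 0) return false;
    const int lo = std::min(r.selectionStartIndex, r.selectionEndIndex);
    const int hi = std::max(r.selectionStartIndex, r.selectionEndIndex);
    return index >= lo && index <= hi;
}
```

Ordering the two bounds before comparing is what lets the same check serve both forward and backward selection.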
It worked very well. I sent a Pull Request[13] with some improvements including: backward selection, word/line selection, smooth accelerated scrolling, and more, and then continued the development.
Basics of A Text Editor
There are many interactions I took for granted when using a text editor. For example, consider the simple action of moving the cursor up or down. In most text editors, the x position of the cursor at the start of the operation is saved, and the editor tries to maintain that x position as you navigate.
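As a rough illustration of that behavior, the sketch below remembers a desired column and clamps it to each line's length while moving vertically. It is a simplification under stated assumptions: a real editor tracks a pixel x position rather than a character column, and Cursor, moveTo, and moveVertically are hypothetical names, not Daino Notes functions.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Cursor {
    int line = 0;
    int column = 0;        // actual position within the current line
    int desiredColumn = 0; // remembered position used for vertical movement
};

// Any explicit placement (click, left/right movement) resets the remembered column.
inline void moveTo(Cursor& c, int line, int column) {
    c.line = line;
    c.column = column;
    c.desiredColumn = column;
}

// Moves up (delta = -1) or down (delta = +1) without touching desiredColumn,
// so passing through a short line does not lose the original position.
inline void moveVertically(const std::vector<std::string>& lines, Cursor& c, int delta) {
    assert(!lines.empty());
    const int last = static_cast<int>(lines.size()) - 1;
    c.line = std::max(0, std::min(last, c.line + delta));
    const int lineLength = static_cast<int>(lines[c.line].size());
    c.column = std::min(c.desiredColumn, lineLength);
}
```

Starting at column 10, moving down onto a two-character line places the cursor at column 2, and moving down again onto a longer line restores column 10.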
I discovered that I needed to implement many operations I had taken for granted, such as cursor movements (up/down, left/right), copy and paste, undo and redo, and one of the most challenging tasks—displaying the raw Markdown when the cursor is inside a Markdown-formatted text—from scratch.
Undo & Redo
When implementing the undo/redo stack, I opted for the simplest solution: using simple structs to hold the information for each operation. For each operation (a SingleAction struct), I save both the old underlying plain text and the new one. This approach might seem redundant, as a diff algorithm could be more efficient, but I chose to stick with simplicity.
Another surprising aspect I hadn't considered before is the merging of single operations (or SingleActions). For example, when you type some characters in a text editor and then press 'undo,' you expect all the characters you just inserted to be removed at once. What happens in the background is that each OneCharOperation of a SingleAction (such as inserting or removing a character) is merged into a single CompoundAction. This way, if a user types a long text and then presses undo/redo, the user doesn't have to wait for each individual character operation to undo/redo.
There are many other examples of merging single operations in Daino Notes. For instance, when you select multiple blocks and indent them, each block is indented separately, but all indent operations are merged into one CompoundAction. This means that when the user presses undo, all these blocks will unindent together. Another example is moving a task from one place in the Task Board (or Kanban) to another by dragging and dropping. In the background, this involves two different single operations merged into one: first, removing the current task, and second, inserting it in the desired location.
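The merging idea can be sketched with standard C++ containers. This is a simplified illustration, not the Daino Notes implementation: the field names, the UndoStack class, and the mergeWithPrevious flag are assumptions, and blocks are modeled as plain strings.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One edit to one block: the full text before and after (no diffing).
struct SingleAction {
    int blockIndex;
    std::string oldText;
    std::string newText;
};

// Several single actions that undo and redo as one step.
struct CompoundAction {
    std::vector<SingleAction> actions;
};

class UndoStack {
public:
    // Records an action. With mergeWithPrevious set (and a previous step
    // available), the action joins that step instead of starting a new one.
    void push(SingleAction action, bool mergeWithPrevious) {
        redo_.clear(); // a new edit invalidates the redo history
        if (!mergeWithPrevious || undo_.empty()) undo_.push_back(CompoundAction());
        undo_.back().actions.push_back(std::move(action));
    }

    // Reverts the latest step on `blocks`, newest single action first.
    bool undo(std::vector<std::string>& blocks) {
        if (undo_.empty()) return false;
        CompoundAction step = std::move(undo_.back());
        undo_.pop_back();
        for (auto it = step.actions.rbegin(); it != step.actions.rend(); ++it) {
            assert(it->blockIndex >= 0 &&
                   it->blockIndex < static_cast<int>(blocks.size()));
            blocks[it->blockIndex] = it->oldText;
        }
        redo_.push_back(std::move(step));
        return true;
    }

    // Reapplies the most recently undone step, oldest single action first.
    bool redo(std::vector<std::string>& blocks) {
        if (redo_.empty()) return false;
        CompoundAction step = std::move(redo_.back());
        redo_.pop_back();
        for (const SingleAction& a : step.actions) {
            assert(a.blockIndex >= 0 &&
                   a.blockIndex < static_cast<int>(blocks.size()));
            blocks[a.blockIndex] = a.newText;
        }
        undo_.push_back(std::move(step));
        return true;
    }

private:
    std::vector<CompoundAction> undo_;
    std::vector<CompoundAction> redo_;
};
```

Typing "a" and then "b" would push two single actions, the second with mergeWithPrevious set to true, so one undo restores the block to its state before both keystrokes.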
When dealing with advanced blocks like Kanban boards and images, I initially implemented their own undo/redo stacks as private members within their respective classes. However, I quickly realized that once these objects were deleted, their undo/redo stacks were deleted along with them. The solution was to store all advanced blocks' undo/redo stacks within the BlockModel. Here's how it looks inside the class:
Each advanced block object class defines its own SingleAction and all the other data necessary for undoing/redoing its operations. Below are the undo/redo structs for the Kanban and Image:
Markdown Under Cursor
One of my primary goals was to make Daino Notes accessible to even the most non-technical users by creating a WYSIWYG (What You See Is What You Get) editor—similar to how a Word document displays content as it will appear when printed. If a user bolds a word in Daino Notes, the word is displayed in bold.
In Markdown format (**like this**), the choice of how to display or render the text is left to the editor. Some editors do nothing, some render the word in bold but keep the asterisks, and others, like Daino Notes, hide the asterisks and display the word in bold. However, whenever the cursor is inside the formatted text, the Markdown syntax becomes visible. This approach bridges the gap between WYSIWYG (what the user sees) and showing the underlying Markdown-formatted text.
Getting this right—rendering the underlying Markdown when the cursor is inside a Markdown-formatted text—was quite challenging. For example, if the cursor is inside this bold and italicized text it will show as ***bold and italicized text***. Once the cursor moves outside of it, the text will render normally.
The main problem was determining the underlying Markdown from the cursor's position in the rendered text. I scoured the web for implementation ideas, but found nothing useful for Qt and QML, so I devised my own.
Obtaining the underlying plain text from the rendered text at the cursor position is crucial for almost any interaction in the editor. Consider this example: you have the text "The quick brown fox jumps over the lazy dog" and you want to select and copy exactly "ick brown fox jump" from the text, which corresponds to the underlying Markdown text: ick brown_** fox ***jump. How do we achieve that? Let's begin with what we know:
1. We know that the entire underlying Markdown text is: The **_quick brown_** fox ***jumps*** over the lazy dog
2. We know that the selection starts at position 6 and ends at position 23, in the rendered text.
String Manipulator: A small program I developed to visualize strings
However, since each piece of text in the editor is a QML TextArea with the property textFormat: TextArea.RichText, which is represented in HTML, it's not straightforward to determine the cursor or selection positions in the underlying Markdown. This is because we're dealing with rendered text. After much experimentation, the most obvious solution emerged:
1. Obtain the HTML of the entire TextArea and convert it to Markdown.
2. Extract the HTML from position 0 to the desired position (start/end selection) using text.getFormattedText(0, selectionStartPos) and convert it to Markdown.
3. Identify the longest common prefix between the two Markdown strings.
4. This process gives us the underlying Markdown from position 0 to selectionStartPos. With this, we can determine the starting position of the selection in the underlying Markdown. We can then repeat the process for selectionEndPos if we are selecting text.
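Step 3 is the only part that doesn't depend on Qt, so it is easy to sketch in standard C++. The function names below are hypothetical, and the prefix strings in the usage note are assumptions about what an HTML-to-Markdown converter would plausibly return for the truncated rendered text; a real converter may pick different but equivalent markers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the longest common prefix of two strings.
inline std::size_t commonPrefixLength(const std::string& a, const std::string& b) {
    std::size_t n = 0;
    while (n < a.size() && n < b.size() && a[n] == b[n]) ++n;
    return n;
}

// Maps a rendered-text position to a position in the Markdown source.
// fullMarkdown:   Markdown of the whole TextArea (step 1).
// prefixMarkdown: Markdown converted from rendered text [0, pos) (step 2).
// The converter closes any open markers in the prefix, so the two strings
// diverge exactly where the prefix's real content ends.
inline std::size_t markdownPosition(const std::string& fullMarkdown,
                                    const std::string& prefixMarkdown) {
    assert(!fullMarkdown.empty());
    return commonPrefixLength(fullMarkdown, prefixMarkdown);
}
```

With the example sentence, a prefix of "The **_qu_**" maps to Markdown position 9 and a prefix of "The **_quick brown_** fox ***jump***" maps to position 33. Taking the 24 characters between them from the full Markdown yields ick brown_** fox ***jump.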
However, there were some issues. Using QTextDocument to convert the HTML of the TextArea didn't work well due to a bug in its .toMarkdown() function, which doesn't correctly close Markdown characters. So, could we simply use a different library to convert HTML to Markdown? Not quite. It turns out that Qt uses a very unusual inline syntax for its HTML, which standard HTML-to-Markdown parsers don't understand. Fortunately, the person who reported the issue also provided a solution by creating QBasicHtmlExporter, a library that converts QTextDocument's HTML into a standard HTML syntax.
With that in place, all that was left was to use an HTML-to-Markdown parser (another great open-source tool!), and voilà, we could extract the underlying Markdown text from the rendered text positions.
Well, there are still some more things to do. Now that we know the positions of the underlying Markdown from the rendered text we need to check if the cursor is within a Markdown syntax. If it is, we send the TextArea HTML with the part where the cursor is in plain text to display the Markdown, while keeping the rest rendered. If the cursor isn't within Markdown syntax, we simply show the rendered text. This is accomplished using some regular expressions (QRegularExpression) and some Voodoo spell.
I try to avoid using regular expressions as much as possible, as they are often slow and inefficient (I'm not using any in any other part of the editor's code). I usually try to break down what I need, and write a custom algorithm that is much faster. However, since cursor position changes are not CPU-intensive operations, it is safe to use regular expressions here.
Advanced Blocks
Every block must be saved in plain text, including 'advanced blocks' like Kanban boards and images. I aimed to make these blocks as useful in plain text as they are when rendered.
Syntax
The syntax for an advanced block is as follows:
{{blockType "parameter1":value,"parameter2":value}}
{{/blockType}}
1. It starts with an opening tag of two curly braces, followed by
2. the block type,
3. the metadata in JSON format, followed by two closing curly braces, and
4. a closing tag of the same block type.
That's it.
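Recognizing such a tag takes very little code. The sketch below is a hedged illustration in standard C++ and not the parser Daino Notes uses: OpeningTag, parseOpeningTag, and closingTagFor are hypothetical names, the metadata is kept as a raw string rather than parsed as JSON, and the "kanban" type name in the usage note is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct OpeningTag {
    std::string blockType; // text right after the opening braces
    std::string metadata;  // raw metadata text, left unparsed in this sketch
};

// Parses a line such as: {{blockType "parameter1":value,"parameter2":value}}
// Returns false if the line is not an opening tag.
inline bool parseOpeningTag(const std::string& line, OpeningTag& out) {
    const std::string open = "{{";
    const std::string close = "}}";
    if (line.size() < open.size() + close.size()) return false;
    if (line.compare(0, open.size(), open) != 0) return false;
    if (line.compare(line.size() - close.size(), close.size(), close) != 0) return false;
    const std::string body =
        line.substr(open.size(), line.size() - open.size() - close.size());
    if (body.empty() || body[0] == '/') return false; // "{{/type}}" closes a block
    const std::size_t space = body.find(' ');
    out.blockType = body.substr(0, space);
    out.metadata = (space == std::string::npos) ? std::string() : body.substr(space + 1);
    return !out.blockType.empty();
}

// Builds the closing tag that must end a block of the given type.
inline std::string closingTagFor(const std::string& blockType) {
    assert(!blockType.empty());
    return "{{/" + blockType + "}}";
}
```

For a line like {{kanban "title":"Plan"}}, the parser would report the block type kanban and the metadata "title":"Plan", and every following line up to {{/kanban}} belongs to the block.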
Let's take the syntax of a Kanban board as an example. In plain text, it appears as normal headers and a list of to-dos, encapsulated by the minimal syntax described earlier:
If you paste this exact plain text into Daino Notes, you'll see it rendered as a Task Board:
Even in plain text, it is simple enough to read and closely resembles a regular Markdown-formatted list of to-dos.
Drag & Drop
One of the neatest features in Daino Notes is being able to drag and drop any block inside the editor the way drag and drop should always have been: the dragged object simply "pops" from its place and the surrounding blocks make room for it, as one would intuitively expect.
If you remember, all blocks are contained within a ListView—a type of virtualized list that renders only what the user needs to see, optimizing memory usage. Can you guess the issue this might cause for dragging items? It means that if we attempt to drag a delegate while scrolling, the delegate (and thus the dragged item) could be destroyed because the ListView no longer needs to display it.
One potential solution is to increase the cacheSize of the ListView while dragging an item, which instructs Qt to allocate more memory for items not currently visible. However, this approach can be memory-intensive, slow, and inefficient.
The solution I came up with is to create a copy of the block being dragged at its exact location when the user initiates the drag action. Then, display the replica, hide the original block, and drag the replica. This way, the replica can be moved anywhere without being destroyed by the ListView. No one notices it's a replica because a) it happens so quickly, and b) it's an exact replica at the exact location of the original block. As they say, magic is skillful deception, and sometimes programming is much the same.
External Drag & Drop
Daino Notes supports the external drag and drop of images from local files or other applications, such as browsers. Achieving this functionality the way I envisioned was not trivial. Initially, I wanted the dragged image from an external source to quickly hide the image provided by the OS and display a replica at the cursor's location, allowing control over size and effects. However, it seemed nearly impossible to remove or hide the OS-provided image using Qt in a cross-platform manner, so I opted for a different approach.
When an image is dragged from an external source into Daino Notes, the first step—after detecting and creating a copy of the image (more on that shortly)—is to create an invisible replica block of the image. This block acts as an object that other blocks must accommodate during the drag, giving the user the impression that the OS-provided image is interacting with the application.
In order to detect an external drag I had to jump through some hoops. Unfortunately, the QML DragEvent doesn't support retrieving image data and unlike QDropEvent, there's no direct access to QMimeData from QML DragEvent. That's annoying. Fortunately, KDE wrote their own declarative drag and drop functionality by subclassing QDropEvent and exposing it via QML and open sourced it. I had to make some changes to it - for some reason QMimeData::setData didn't set the imageData or didn't do it correctly, and I added a Q_PROPERTY image (a variable that can be exposed to QML) that was lacking. I'll publish these changes in the Daino Notes public repository where I also share common files with my previous open-source note-taking app, Notes (FOSS), to comply with its MPL license.
NOTE: Currently, Daino Notes doesn't support 'natural' drag and drop, like that provided by react-beautiful-dnd. Implementing this requires two hotspots for collision detection, which is not straightforward in QML. However, it's definitely on my to-do list.