Markdown Virtual Table: Implementing SELECT

Tags:
django rust sqlite

I have been continuing my earlier virtual table prototype, and it is slowly coming along. I uploaded a version to Codeberg while I was debugging Python packaging. It is not quite working right yet, but I think I will be using setuptools-rust for now (pr ).

This post is partly to document what I have discovered along the way, so it will not have complete code examples in every case. In the future I will upload the project to Codeberg and later to PyPI.

Most of my Rust code is likely not great, so I would appreciate a mentor!

Django Updates

While digging into some of the Django classes a bit more, I found some methods under the schema editor I could use. This gives us a more complete table definition, generated by Django itself, that we can use.

@receiver(connection_created)
def define_tables(connection: DatabaseWrapper, **kwargs):
    with connection.schema_editor() as schema_editor:
        for app_model in apps.get_models():
            # We only want to process models that require our specific vendor
            # otherwise we assume it's a regular sqlite model
            if app_model._meta.required_db_vendor == connection.vendor:
                # We can use django to generate a sqlite ready create table statement
                sql, _params = schema_editor.table_sql(app_model)
                # We use our base directory setting plus the models label to
                # as the search path for our markdown files
                model_path = settings.CONTENT_DIR / app_model._meta.label
                create_table = f"""
        CREATE VIRTUAL TABLE IF NOT EXISTS temp.{app_model._meta.db_table}
        USING {connection.vendor}(schema='{sql}', path='{model_path}');
        """.strip()
                schema_editor.execute(create_table)
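For a concrete picture, the statement this receiver ends up executing looks something like the following (the table name, column list, and path here are made up for illustration):

```sql
CREATE VIRTUAL TABLE IF NOT EXISTS temp.blog_post
USING markdowndb(schema='CREATE TABLE "blog_post" ("slug" varchar(50) NOT NULL, "content" text NOT NULL)', path='/srv/content/blog.Post');
```

The vendor name doubles as the virtual table module name, which is why connection.vendor appears in the USING clause.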

Models that should go through this backend are flagged with required_db_vendor in their Meta:

class MyModel(models.Model):
    ...

    class Meta:
        required_db_vendor = "markdowndb"
        managed = False

In general, there is not much that Django needs to know about our specialized database.

Implementing SELECT in Rust

xConnect

One of the first things we need to implement is the xConnect method. This is where we parse our schema and path arguments.

Internally, most things use numeric IDs, but our driver will be using column names. We can use the sqlite3-parser crate to parse the schema and give us that mapping. Currently I’m using a Vec<String>, but I may switch to a different structure later that maintains more type information.

#[derive(Debug, Clone)]
pub struct Schema(Vec<String>);

impl Schema {
    /// Given a schema declaration, return a mapping of columns
    pub fn parse(schema: &str) -> Result<Self> {
        ...
    }
}
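As a rough sketch of what the body could do, here is a deliberately naive string-splitting version. This is hypothetical: my real implementation walks the sqlite3-parser AST, and I return Option here instead of the crate's Result to keep the sketch dependency-free.

```rust
// A simplified stand-in for the real parser. sqlite3-parser walks a proper
// AST; this sketch just slices up the column list by hand.
#[derive(Debug, Clone)]
pub struct Schema(Vec<String>);

impl Schema {
    pub fn parse(schema: &str) -> Option<Self> {
        // Grab everything between the outermost parentheses
        let start = schema.find('(')? + 1;
        let end = schema.rfind(')')?;
        let columns = schema[start..end]
            .split(',')
            // The first word of each column definition is its name
            .filter_map(|col| col.split_whitespace().next())
            .map(|name| name.trim_matches('"').to_string())
            .collect();
        Some(Self(columns))
    }

    /// Map a numeric column index back to its column name
    pub fn index(&self, i: usize) -> &String {
        &self.0[i]
    }
}
```

This falls over on commas inside constraints or type parameters like decimal(10,2), which is exactly why a real parser is worth the dependency.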

xOpen, xFilter, xNext

xOpen is called for many of our queries. This is where we create an instance of our cursor object to track where we are in a SELECT statement.

xFilter kicks off our SELECT operation. This is where we figure out what our cursor needs to point to and do any filtering. In our case, we create a walker object that will search our path for all *.markdown files we can find.

xNext will then be called each time we’re ready to read the next row.

Our basic operation now looks like this:

  • xOpen to create an instance of our Cursor
  • xFilter to get our list of files and open it
  • xNext each time we advance and open the next file
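To make the shape of that lifecycle concrete, here is a stripped-down sketch. The names and signatures are illustrative rather than the real sqlite callback API, and I have folded in the xEof check sqlite uses to know when iteration is finished:

```rust
use std::path::PathBuf;

// Illustrative cursor: tracks the file list and the current row position
struct Cursor {
    files: Vec<PathBuf>,
    row: usize,
}

impl Cursor {
    // xOpen: a fresh cursor with nothing to iterate over yet
    fn open() -> Self {
        Self { files: Vec::new(), row: 0 }
    }

    // xFilter: keep only the *.markdown files we found and rewind to row zero
    fn filter(&mut self, candidates: Vec<PathBuf>) {
        self.files = candidates
            .into_iter()
            .filter(|p| p.extension().map_or(false, |ext| ext == "markdown"))
            .collect();
        self.row = 0;
    }

    // xNext: advance to the next row
    fn next(&mut self) {
        self.row += 1;
    }

    // xEof: true once we have walked past the last file
    fn eof(&self) -> bool {
        self.row >= self.files.len()
    }
}
```

sqlite drives this loop itself: after xFilter it alternates xEof and xNext until xEof reports true.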

xColumn and reading data

Instead of returning a row at a time, the virtual table driver returns a column at a time. Our xFilter and xNext point us at the current markdown file; within the xColumn method, we need to translate the request for a numeric column into a named field that we understand.

    fn column(&self, ctx: &mut Context, i: c_int) -> Result<()> {
        // Given our schema, convert the numeric column index to a
        // key that we can look up
        let key = self.vtab().schema.index(i as usize);
        // Then use this key to look up the specific field in our open
        // file handle
        let value = self.handle()?.get(key);

        // Lastly we set this value in a sqlite context that will be
        // returned to the client
        ctx.set_result(value)
    }

Looking up a value in frontmatter

I am currently using the grey_matter crate to parse a file’s markdown frontmatter and return it in a format that we can use.

Most of the time, any of our key lookups will be for the markdown frontmatter itself, but there are also a few internal keys that I am implementing.

impl Post {
    pub fn new(path: PathBuf) -> Self {
        let contents = fs::read_to_string(&path).expect("Read File");
        let mut matter = Matter::<YAML>::new();
        matter.excerpt_delimiter = Some("<!--more-->".to_string());

        Self {
            path,
            parsed: matter.parse(&contents),
        }
    }
    pub fn get(&self, key: &String) -> Option<Pod> {
        // First we check to see if we have a valid value in our parsed data
        match &self.parsed.data {
            Some(Pod::Hash(hash)) => hash.get(key).cloned(),
            _ => None,
        }
        // Otherwise, we look for a builtin value
        // I am using inline functions here for readability
        .or(match key.as_str() {
            // Full Path to the markdown file
            "path" => self.pod_path(),
            // Relative directory path to our base directory
            "dir" => self.pod_dir(),
            // date of the file in case there is no date in frontmatter
            "date" => self.pod_date(),
            // The post body of the document
            "content" => self.pod_content(),
            // A useful slug which is the filename without extension
            "slug" => self.pod_slug(),
            // This is anything before a <!--more--> tag
            "excerpt" => self.pod_excerpt(),
            // This returns the full frontmatter data
            "metadata" => self.parsed.data.clone(),
            // Returns the inode of a file as a potentially useful primary key
            "inode" => self.pod_inode(),
            _ => None,
        })
    }
}

The grey_matter crate uses a custom Pod type to represent a dynamic structure (similar to serde_json’s Value), so in our xColumn we need to convert this to a type that sqlite can handle.

match value {
    // Send this back as a json list, but we might do a separated
    // list or something else later
    Some(Pod::Array(array)) => {
        let json_value: Value = Pod::Array(array).deserialize().unwrap();
        ctx.set_result::<String>(&json_value.to_string())
    }
    // Hash is easiest to send back as json
    Some(Pod::Hash(hash)) => {
        let json_value: Value = Pod::Hash(hash).deserialize().unwrap();
        ctx.set_result::<String>(&json_value.to_string())
    }
    // Anything else we can typically use regular types
    Some(Pod::Boolean(bool)) => ctx.set_result(&bool),
    Some(Pod::Float(float)) => ctx.set_result(&float),
    Some(Pod::Integer(integer)) => ctx.set_result(&integer),
    Some(Pod::Null) => Ok(()),
    Some(Pod::String(string)) => ctx.set_result(&string),
    None => Ok(()),
}

Back in Django

Now that I have most of the initial bits implemented in my driver, I can define a model like this in Django.

from django.db import models


class FrontmatterModel(models.Model):
    inode = models.PositiveIntegerField(primary_key=True, editable=False)
    path = models.FilePathField(unique=True, editable=False)
    slug = models.SlugField()

    # Full text body. Might use a MarkdownField in the future
    content = models.TextField()
    # Anything before the <!--more-->
    excerpt = models.TextField(blank=True)

    metadata = models.JSONField()
    # I could also rename this if I wanted
    # frontmatter = models.JSONField(db_column='metadata')

    # As implemented in my Post::get in rust, any lookups will
    # check the frontmatter first, and fall back to other
    # internal values if they exist.

    class Meta:
        managed = False
        required_db_vendor = "markdowndb"

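With that in place, ordinary queryset filters turn into plain SELECT statements that sqlite answers through the xFilter/xNext/xColumn path above. A lookup like FrontmatterModel.objects.get(slug="hello") would roughly issue the following (the table name and slug value are illustrative; the real table name depends on the app label):

```sql
SELECT "inode", "path", "slug", "content", "excerpt", "metadata"
FROM "myapp_frontmattermodel"
WHERE "slug" = 'hello';
```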
Future: Implementing INSERT, UPDATE, DELETE

I am planning on adding in the rest of these later, and while I have made some initial progress, I will likely document those in a future post.

Similar to my demo project, I will likely upload my work-in-progress version to Codeberg in the near future, though it will be a while longer before I’m ready to upload something to PyPI.