MVC + SQL = IMF (Interface Micromanagement Frustration)

•January 31, 2012

I’ve loved to hate Qt’s model-view controllers ever since I started using them.  While they do hide some UI nitty-gritty that I don’t really want to deal with and let me use snazzy things like selectable views and tables, their interface to my code is disgustingly baroque, and really using them requires an astounding number of completely different objects with very similar names.  There are ItemViews, ItemModels, SelectionModels (which actually have nothing to do with the ItemModels), ProxyModels, and ModelIndexes.  And actually, the last time I tried to use a ProxyModel, the program crashed in Windows, but not in Linux.

For example, suppose I have this UI and would like to know what text is selected:

[Screenshot: Actual CD Phonemes Tab]

This is the code I have to write: phonemeListView->selectionModel ()->currentIndex ().data ().toString ();

And of course, the current index is not necessarily the same as the selection, and is determined by some arcane logic known only to the selectionModel.  If you want to get a list of all the stuff that is selected, you have to call a completely different function (selectedIndexes ()).  Nine times out of ten when my MVC code hasn’t worked, it’s been because I was referencing the current index instead of the selection, or vice versa.

Really, though, I’d gotten used to all this.  I’d gotten a recipe for how to whip up one of these things and actually make it work the way I want it to, and resolved to just stay away from the ProxyModels, and generally stuff worked.  However, this was all back when I was still storing all the data in in-memory data structures full of pointers.  Now that I’m rewriting the backend of Conlang Dictionary to use an actual SQLite database instead, I have to go back to wrangling with the MVC again, and it’s actually worse.

You see, Qt has anticipated that people might actually want to use their MVC architecture with SQL, so they included model classes that automagically populate themselves from SQL tables and views (due to an unbelievably terrible terminology clash, SQL views really have nothing at all to do with MVC views, just in case you weren’t confused yet).  The good thing about this is that a controller will handle the model-populating micromanagement for you.  The bad thing about this is also that a controller will handle the model-populating micromanagement for you.  All of the data micromanagement processes I’ve encoded in my existing MVC recipe will be replaced by Qt micromanagement processes, which are, of course, subtly and irritatingly different from what I’m used to dealing with.  If you ever wonder why I don’t spend more time working on CD these days, this is the reason.

The recipe I’ve been using for in-memory data storage goes roughly like this:

  • an in-memory data structure (usually a list of pointers) to store all of the actual data
  • an item model to store the data to be displayed by the view (often just the name of each object stored in the data structure)
  • an item view to place as a widget in the GUI layout and hook up to the item model
  • Whenever certain widgets on the right-hand pane are manipulated while something is selected, the selected item must be updated appropriately.  This requires some fighting with the current index vs. selection distinction and translating the current index/selection from the view into the correct item in the model and then into the correct object in the data structure.
  • Whenever any of the data in the data structure is updated (usually due to the above manipulations), the model is updated, triggering the view to be updated as well.  This obliterates the current selection in the view, which is not always what I want, so I sometimes have to do a bit of micromanagement to re-select the things that should be selected after this operation.  Fortunately, the data-modifying operations are always handled in functions written by me, so preserving the selection is easy (if rather tedious).
  • Whenever a new item is selected, the widgets on the right pane have to be populated with data from the object in the data structure.  Once again, this requires dealing with the current index or the selection and translating it from view to model to data structure.
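The selection-preserving step in the recipe above boils down to remembering the selection by something that survives the update (here, each item's display name) rather than by row number. The following is a minimal sketch in plain C++, not Qt: ListModel and resetPreservingSelection are hypothetical stand-ins for an item model and its view's selection, and the sketch assumes display names are unique.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an item model: display names plus a selection
// stored as row numbers, which is how a view tracks what is selected.
struct ListModel {
    std::vector<std::string> rows;
    std::set<int> selectedRows;

    // Replacing the rows invalidates row numbers, so the selection is
    // cleared, mimicking what happens when a real model is reset.
    void reset(std::vector<std::string> newRows) {
        rows = std::move(newRows);
        selectedRows.clear();
    }
};

// Remember the selection by name before the update, then re-select the
// same names afterwards. Assumes display names are unique.
void resetPreservingSelection(ListModel& model, std::vector<std::string> newRows) {
    std::set<std::string> selectedNames;
    for (int row : model.selectedRows)
        selectedNames.insert(model.rows[row]);

    model.reset(std::move(newRows));

    for (int row = 0; row < static_cast<int>(model.rows.size()); ++row)
        if (selectedNames.count(model.rows[row]))
            model.selectedRows.insert(row);
}
```

In a real Qt model the same idea applies: record the selected items' names (or database keys) before the model is reset, then look them up and re-select them once it has been repopulated.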

But now, instead of a data structure, I have a database.  Accessing data (such as the names of all of the phonemes) from the database requires executing an SQL query at runtime, rather than simply stripping it out of an easily accessible list of pointers.  The database is undeniably better, since it is faster, but the problem is that the SQL isn’t evaluated at all until it is actually run; I won’t find out about typos or syntax errors, much less flawed logic, in the SQL statements until I actually run the program and attempt to populate the model.  The error messages from Qt’s SQLite driver are oracular enough that I really don’t want to sprinkle complicated SQL queries all over my project, so I have quarantined all of the SQL in a special class which handles all direct interaction with the database and simply provides a rational, non-SQL interface with the rest of the code, allowing me to write something sensible like database.addPhoneme ("a"); instead of a potentially error-throwing SQL query.
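One way to build the kind of SQL quarantine described above is a single class that owns every SQL string and exposes ordinary member functions. This is a hypothetical sketch, not Conlang Dictionary's actual code: ConlangDatabase, SqlExecutor, and the phonemes table are illustrative names, and the executor is injected so the class can be exercised without a real database. Passing values as bound parameters to "?" placeholders, rather than pasting them into the SQL text, also keeps a symbol containing an apostrophe from breaking the query.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical executor: runs one SQL statement, binding `params` to its "?"
// placeholders in order, and reports success. In a real program this would
// wrap the database driver; injecting it lets the class be tested without one.
using SqlExecutor = std::function<bool(const std::string& sql,
                                       const std::vector<std::string>& params)>;

// Every SQL string lives inside this class; the rest of the program only
// calls plain member functions such as addPhoneme ().
class ConlangDatabase {
public:
    explicit ConlangDatabase(SqlExecutor exec) : exec_(std::move(exec)) {}

    // The table and column names here are illustrative assumptions.
    bool addPhoneme(const std::string& symbol) {
        return exec_("INSERT INTO phonemes (symbol) VALUES (?)", {symbol});
    }

    bool renamePhoneme(const std::string& oldSymbol, const std::string& newSymbol) {
        return exec_("UPDATE phonemes SET symbol = ? WHERE symbol = ?",
                     {newSymbol, oldSymbol});
    }

private:
    SqlExecutor exec_;
};
```

With Qt, the executor would be the natural place for QSqlQuery::prepare and addBindValue, so that even the driver calls stay inside the quarantine.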

It’d be silly to pull data out of the database into an in-memory data structure and then manually stuff it into an item model, as it really just adds an extra step to the already overcomplicated chain of data manipulation.  Qt has special model classes specifically designed for sucking data directly out of databases, and I would be an idiot not to use them.  However, using them takes some of the jurisdiction of direct database manipulation away from my SQL containment class; I need direct access to the database to create these models, but I need the models to be the property of the same collection of GUI architecture as the item views (which are front-end UI widgets).  The compromise I’ve decided on involves creating the models with new inside the SQL ghetto and then passing them to the front-end GUI classes, which are responsible for deallocating them.

These models come in two flavors – a QueryModel, which is read-only (it only sucks data out of the database, it never writes it back), and a TableModel, which is fully read-write and functions as a model version of an entire database table.  The TableModel is actually really attractive, because I can hook it up directly to all those widgets on the right pane and have them automagically populated from the database, updated when the database is updated, and also have the database updated whenever the user twiddles the widgets.  The problem?  All of this updating happens behind my back, as it were; there is now direct congress between the capricious and unpredictable user and the database, without actually being mediated by any of my own code!  Which means I can’t perform the second step of my tried-and-true recipe – preserving the view’s selection.  If I use TableModels associated with the same database table for the right-hand pane and the list view on the left, twiddling the widgets on the right will cause an automatic, totally unauthorized update of the data model for the list view, obliterating its current selection and thereby actually resetting the right-hand widgets to their default states in a hellish feedback loop.  It may sound silly to quibble about preserving the selection, but trust me when I say this makes the application completely unusable.  It’s just another one of those software design issues that don’t seem like they’d be dealbreakers until you screw them up.

Try as I might, I could never preserve the selection across an update with all of that convenient, automatic, concurrently running data management going on.  I wound up using a simple QueryModel for the selectable list views, and turning off the automatic updates both between the right-hand pane widgets and the TableModel and between the TableModel and the database.  Now that I can manually trigger the updates, I can also manually preserve the selection, but I really hate feeling like I have to fight all of these incredibly convenient features which seemed so shiny and exciting at first blush.

Just another reason to irrationally hate SQL, I suppose.

Making basic file processing in C++ less painful

•January 23, 2012

After being singularly unproductive for a while, I got back to work on MythMuse.  As with basically everything I’ve ever written in vanilla C++ without the immensely helpful Qt libraries, the hardest, most annoying part of it is the text processing.  I love C++, and it’s kind of like my native computer language, but I hate std::string only slightly less than I hate null-terminated character arrays.  I can’t link Qt into everything, though, and it’d really just be overkill for this project.

Past experiences with releasing conlanging-related programs (mostly SCAs) on the ZBB have shown one thing pretty conclusively: everyone hates having to use the command line.  Also, Windows text editors tend to append “.txt” to text files that don’t already end in “.txt”, so it’s pretty useless to have a program read user-edited text files that end in something other than .txt, because it will only cause mass confusion among non-technical Windows users.  In an attempt to be user-friendly, I am designing MythMuse to read a text file called “options.txt” out of the current directory, from which it will intuit the location of the motifs file(s), other options files, the files it should write outputs and error messages to, and user-designed weighting schemes.  It doesn’t have to do much, so its syntax will be dead simple – tokens separated by spaces, statements separated by lines.  The command line is not necessary – it should be possible to simply create the text file(s), double-click on the program, and have the output files appear in the same directory.

By rights, the options file should be easy to parse, too, but one of the things I hate most about std::string is that it has no tokenizing function.  Even character arrays have a tokenizing function!  It’s a pain in the ass to use it for strings, though, because you have to convert the strings into character arrays, and then convert the character arrays back into strings again afterwards.  I usually just wind up setting up some int variables to keep track of the last and next matched delimiter, and walking down the string with them, copying out substrings as I go.  It’s a bit annoying, because it’s very easy to make off-by-one errors which screw everything up, but I have done this so many times now that I can open up almost any one of my non-Qt projects and just copy the code out of it.
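The walk-down-the-string approach described above can be written once with std::string::find_first_of and substr. This is a minimal sketch, and the name split is mine rather than anything from the standard library. Using std::string::size_type instead of int for the positions avoids signed/unsigned comparisons against npos, and remembering that substr takes a length, not an end position, is where most of the off-by-one errors get caught.

```cpp
#include <string>
#include <vector>

// Split `line` on any character in `delims`, walking down the string with
// the positions of the previous and next delimiter and copying out the
// substrings in between. Empty tokens between adjacent delimiters are kept.
std::vector<std::string> split(const std::string& line, const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type next = line.find_first_of(delims, start);
        if (next == std::string::npos) {
            tokens.push_back(line.substr(start));  // last token runs to the end
            break;
        }
        tokens.push_back(line.substr(start, next - start));  // a length, not an end index
        start = next + 1;  // skip past the delimiter itself
    }
    return tokens;
}
```

For example, split ("P0:2,3,3", ":,") yields the tokens P0, 2, 3, and 3.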

And that made me think – if what I want to do 90% of the time that I’m reading files with vanilla C++ is tokenize each line into a vector of strings based on varying sets of delimiters, why haven’t I written a single function to do that yet?

So, after looking up the syntax for functions with variable numbers of arguments (which I’ve never actually used before), I did.  Here is the code on pastebin.  Basically, the idea is that you pass the tokenize function an open istream (so, usually a file, but you could also use it to tokenize stdin), a bool (cont) that specifies whether you want more than the minimum number of tokens, the minimum number of tokens, and each of the delimiters, in order.  The number of delimiters should be one less than the minimum number of tokens.  If cont is true, the final delimiter will be used to continue tokenizing the line until it runs out of characters.  Every time the function is called, it will tokenize the next acceptable line in the file and return a vector containing the tokens.  It skips blank lines, lines containing only whitespace, and lines whose first non-whitespace character is “#” (i.e. comments).  So, for example, you could do this:

vector<string> tok = tokenize (myFile, true, 3, ':', ',');
for (; !tok.empty (); tok = tokenize (myFile, true, 3, ':', ','))  {
    // use the tokens of the current line; tok[0] is the text before the ':'
}

In the body of this loop, you would get the tok variable which contains the next line of tokens read from a file with this format:

P0:2,3,3
P1:4,10
P2:5,1,9,3

(This is from a process scheduling simulation written for an OS class – the numbers represent successive CPU burst times for different processes.)  It should be noted that it will also tokenize lines with only one number (or even no numbers), despite the fact that the minimum number of tokens is three – if it doesn’t find a delimiter, it returns what it did parse with no fuss.
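The pastebin code isn't reproduced here, but the interface described above can be sketched roughly as follows. Treat it as a reconstruction under assumptions, not the original: the parameter name count and the internals are mine. One detail worth knowing for any variadic function like this is that char arguments passed through ... are promoted to int, so they have to be read back with va_arg (args, int); reading them as char is undefined behavior.

```cpp
#include <cstdarg>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads lines from `in` until it finds one worth tokenizing, then splits it.
// `count` is the minimum number of tokens and is followed by count - 1 char
// delimiters, used in order. If `cont` is true, the last delimiter keeps
// being applied until the line runs out. Returns an empty vector at end of input.
std::vector<std::string> tokenize(std::istream& in, bool cont, int count, ...) {
    // chars passed through "..." arrive promoted to int.
    std::vector<char> delims;
    va_list args;
    va_start(args, count);
    for (int i = 0; i < count - 1; ++i)
        delims.push_back(static_cast<char>(va_arg(args, int)));
    va_end(args);

    std::string line;
    while (std::getline(in, line)) {
        // Drop a stray '\r' left over from Windows line endings.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        // Skip blank lines, whitespace-only lines, and "#" comment lines.
        std::string::size_type first = line.find_first_not_of(" \t");
        if (first == std::string::npos || line[first] == '#')
            continue;

        std::vector<std::string> tokens;
        std::string::size_type start = 0;
        std::size_t d = 0;
        while (d < delims.size()) {
            std::string::size_type next = line.find(delims[d], start);
            if (next == std::string::npos)
                break;  // fewer tokens than expected: return what was parsed
            tokens.push_back(line.substr(start, next - start));
            start = next + 1;
            // Move on to the next delimiter unless cont says to reuse the last one.
            if (!(cont && d + 1 == delims.size()))
                ++d;
        }
        tokens.push_back(line.substr(start));  // remainder of the line
        return tokens;
    }
    return {};
}
```

With the process-scheduling file above, successive calls return {"P0", "2", "3", "3"}, {"P1", "4", "10"}, and {"P2", "5", "1", "9", "3"}, then an empty vector once the file runs out. With cont set to false, the last token simply takes the rest of the line, so P0:2,3,3 comes back as {"P0", "2", "3,3"}.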

I can use this to parse not only MythMuse’s options file, but also the motifs file, and any file that uses a simple delimiter system like this.  It could use some improvement – notably, I should probably have it store its arguments in static variables, the way the character array tokenizer (strtok) does, so that it can be called successively without needing the arguments passed again.  But from now on, instead of vaguely copying the huge array of while loops that I normally use to parse files like this, I can just copy this shiny new function.

MythMuse and the Thompson Motif Index

•January 5, 2012

I’ve got a recent project in C++ called MythMuse, which is meant to be a program to create randomly-generated inspiration for writing conmyths, though I imagine it could also be used for general creative writing purposes as well.  When it’s done, it’ll be quite flexible, but not especially powerful in and of itself – the program provides the randomness, but inspiration isn’t made of pseudo-random numbers.

Here’s how the idea came about:

When I was in high school, I developed a kind of obsession with British fairy folklore, once I discovered that popular depictions of fairies are not really accurate, and most of them are more like weird-looking tricksters than magical paragons of niceness in the form of tiny teenage girls with wings.  At some point I discovered this book (which I distinctly remember being out-of-print and nigh unpurchasable at the time), and managed to get it as a birthday present somehow.  Anyway, the most interesting thing about this book is that all of the entries are indexed in the back in a kind of Dewey-decimal system for categorizing folktales called the Aarne-Thompson Tale Type Index.  So, for example, there is a classification for Cinderella-like stories, and a classification for Rumpelstiltskin-like stories (which I believe is Type 500), and so on.  There’s some argument that this isn’t the best classification system for folktales (What if Cinderella is a boy?  What if there aren’t any evil stepsisters?  What if it was an evil aunt, not an evil stepmother?  What if the coach turned into a butternut squash?  Is it still Cinderella?), but the type index is accompanied by a motif index which doesn’t have these problems – instead of indexing the whole story, it indexes individual story elements.  So, there will be a motif for glass slippers, for transmogrified pumpkins, for enchantments that break at midnight, for people being identified by their ability to wear certain magical shoes, etc.  This is all me making stuff up (Cinderella was not actually terribly involved in this book), but from the real index you get, e.g.:

B81: Mermaid.
B81.2.2: Mermaids tear their mortal lovers to pieces.
B81.3.1: Mermaid entices people into water.
B81.7: Mermaid warns of bad weather.
B81.13.2: Mermaid is washed up on beach.
B81.13.4: Mermaid gives mortals gold from sea-bottom.
B81.13.11: Mermaid captured.
B81.13.13: Mermaid rewards man who puts her back under water.

(Clearly, Disney are lying liars who lie.)  Anyway, it’s missing a lot of intermediate classifications, but you can see that there’s a method to it – B81.13 is probably about personal mermaid encounters with humans, and so forth, and all of the B motifs are about animals of various kinds.  My idea was, if you had the whole, comprehensive index, you could pick a random collection of these motifs and get an inspiration for a folktale that incorporated all of those elements.  Maybe there would be a mermaid and a glass slipper.  Make that one work!  Anyway, if, instead of using this motif index, you made your own based on the kinds of motifs that tend to pop up in your conculture’s mythology, you could use this method to write stories that feel like they are all from the same tradition.  For example, to test the basic algorithm I made the following tree:

1 Shapeshifter
1.1 Shapeshifter impersonates animal
1.1.1 Shapeshifted animal distracts hunters
1.1.1.1 Shapeshifted animal leads hunters to their deaths
1.1.1.2 Shapeshifted animal leads hunters to game
1.1.2 Shapeshifted animal steals food
1.1.3 Shapeshifter only killable in animal form
1.1.4 Real animals detect imposter
1.1.5 Shapeshifted animal understands human language
1.1.6 Shapeshifted domestic animal aids masters
1.2 Shapeshifter impersonates murdered human
1.2.1 Shapeshifter impersonates dead community leader
1.2.2 Shapeshifter impersonates dead religious leader
1.2.3 Shapeshifter identified when body is located
1.3 Shapeshifter impersonates sleeping/absent human
1.4 Shapeshifter adopts imperfect form
1.4.1 Shapeshifted human missing facial features
1.4.2 Shapeshifted human missing identifying characteristics
1.4.3 Shapeshifted form is crippled
1.5 Shapeshifter marries human
1.5.1 Half-human child of shapeshifter
1.5.1.1 Child of shapeshifter has limited shapeshifting abilities
1.5.1.2 Child of shapeshifter understands animal language
1.6 Shapeshifter impersonates foreign trader
1.6.1 Shapeshifted trader sells magical items
1.6.1.1 Items traded by shapeshifter do not function when stolen
1.6.2 Shapeshifted trader cheats humans
1.7 Shapeshifter abducts child
1.7.1 Human child raised by shapeshifter

The principle behind this classification system is simple: every motif classified as x.y includes motif x, but with extra detail.  Every time you add another detail, you add another branch.  So, when choosing randomly, you follow branches and get clusters of similar motifs (but the clusters themselves aren’t necessarily closely related), and you have to include motif x if you included motif x.y.  And there will be options to change the random weighting or favor certain trees, make different motifs count for a greater percentage of the story, keep certain sets of motifs from showing up in the same stories, and so forth.
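The ancestor rule (including x.y means including x) is easy to enforce if motif IDs are kept as dotted strings: the parent of an ID is everything before its last dot. Below is a minimal sketch, assuming the motif tree has already been loaded into a map from ID to description; parentId and chooseMotifs are hypothetical names, and plain uniform random picks stand in for the weighting options, which aren't spelled out here. Each pick drags its whole chain of ancestors along with it, which is what produces the clusters of related motifs.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <set>
#include <string>
#include <vector>

// Parent of a dotted motif ID: "1.1.2" -> "1.1". Top-level IDs have no parent,
// which is signalled by returning an empty string.
std::string parentId(const std::string& id) {
    std::string::size_type dot = id.rfind('.');
    return dot == std::string::npos ? std::string() : id.substr(0, dot);
}

// Picks `picks` motifs uniformly at random from `motifs` (ID -> description)
// and adds every ancestor of each pick, because including x.y means including x.
// Assumes a well-formed tree; duplicate picks simply collapse in the set.
std::set<std::string> chooseMotifs(const std::map<std::string, std::string>& motifs,
                                   int picks, std::mt19937& rng) {
    std::set<std::string> chosen;
    if (motifs.empty())
        return chosen;

    std::vector<std::string> ids;
    for (const auto& entry : motifs)
        ids.push_back(entry.first);

    std::uniform_int_distribution<std::size_t> pick(0, ids.size() - 1);
    for (int i = 0; i < picks; ++i) {
        // Walk from the picked motif up to its root, adding each level.
        for (std::string id = ids[pick(rng)]; !id.empty(); id = parentId(id))
            chosen.insert(id);
    }
    return chosen;
}
```

For instance, if a pick lands on 1.1.1.2 (“Shapeshifted animal leads hunters to game”), the result also contains 1.1.1, 1.1, and 1.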

But, in order to test the thing thoroughly, I needed a large system with multiple trees, since you’re not going to get anything interesting if you have only one available subject.  (This is what I meant by saying that inspiration is not made of pseudo-random numbers.)  I could spend a long time making a complete motif index for my conculture (of which the Shapeshifter tree is a part), but I just want to play around with the tool right now, and confirm that it works.  If it doesn’t work well enough, I probably won’t use the motif index system anyway.  It’ll probably work fine, but it’s the principle of the thing.

So, I went to see if I could find the entire Thompson motif index anywhere online.  It turns out that you can.  And it is massive.  It has motifs from many, many different folklore traditions; it has “Brahma as creator” next to “Angel Michael created from fire.”  It has Coyote listed next to the Judeo-Christian Devil.  If you really wrote a myth based on a random selection of these tropes, you’d probably get a massive hodgepodge that would seem most at home in a webcomic.

It also doesn’t follow my idealized classification system very well.  It does have trees like the mermaid tree, but it also has trees that are semantically obvious but aren’t structurally marked in a way that lets MythMuse read them correctly.  For example, still sticking to the A section, motif A110 is “Origin of the gods” and A120 is “Nature and appearance of the gods”, and all the motifs in between are sub-motifs of A110, while the motifs immediately following A120 are sub-motifs of A120.  So to actually form it into a tree, you’d want to reclassify them as A.1.1 and A.1.2 (A.1 being about gods in general), with the intermediate motifs being A.1.1.1, etc.  But this isn’t consistent either, because A130 is not a new heading, and the A13X motifs are more sub-motifs of A120.  This is easy for a human to figure out, but it would have to be completely reorganized for the computer program.  And it’s a bit repetitious – early in the As, there’s a section for the Creator, which includes motifs about the creation of the universe, and then later there is a section about the Universe that itself has a section for the creation of the universe.

It’s certainly possible to reorganize the whole index into a stricter tree structure, but it would take ages.  I’ll probably just steal a few trees from the parts that look most interesting, and build up a relatively broad set of trees for testing purposes.  Funny how the thing that inspired the program in the first place won’t actually work well with it.

First Post

•January 3, 2012

Pretty much what it says.

This is a blog for musing on my hobbies, which are generally programming, languages and linguistics, and constructing fictional worlds.  It’ll probably be fairly technical, but I don’t aim to be confusing.  I often find that if I try to explain how a project should work as if I were talking to someone who had no idea how it was supposed to work or what it was supposed to do, I start turning up all kinds of overlooked bugs, so this blog will serve as an outlet for some of that.  Regardless of whether anyone else reads or comments, I figure keeping this up will keep me moving.

An interesting design issue occurred to me while setting the theme up.  WordPress has a set of theme “features” that you can check or leave blank, which change the set of themes you’re looking at when browsing.  It’s interesting (to me) because it’s not immediately obvious whether checking a feature has the effect of restricting all your results to only those that have that feature, or expanding the results to include any theme that has that feature, regardless of what other features it has.  That is, will checking both “blue” and “black” show me only themes that have both blue and black, or all the themes that have at least one of either?  With the color options, the former seems slightly more likely to me, but other categories are explicitly exclusive (one column vs. two column vs. three column); if I check two-column and three-column both, I hardly want to see only the results that have both features (which I’m going to assume is 0).  I ran into the issue in a program I wrote, and I wound up doing it a different way – all the options are checked initially, and you uncheck options to exclude elements with certain features.  What you want depends entirely on what you’re listing and how exclusive your categories are, but I think it’s properly polite and user-friendly to explain how it works in any case.
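The three filtering behaviors above can be made concrete in a few lines. This is a minimal sketch with hypothetical Theme records and feature names: matchAll restricts the results to themes with every checked feature (AND), matchAny expands them to themes with at least one (OR), and excludeUnchecked is the uncheck-to-exclude approach, where everything starts out checked.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical theme record: a name plus the set of feature tags it has.
struct Theme {
    std::string name;
    std::set<std::string> features;
};

// Restrict: keep themes that have every checked feature (logical AND).
std::vector<std::string> matchAll(const std::vector<Theme>& themes,
                                  const std::set<std::string>& checked) {
    std::vector<std::string> out;
    for (const Theme& t : themes) {
        // std::includes works here because both sets are sorted the same way.
        if (std::includes(t.features.begin(), t.features.end(),
                          checked.begin(), checked.end()))
            out.push_back(t.name);
    }
    return out;
}

// Expand: keep themes that have at least one checked feature (logical OR).
std::vector<std::string> matchAny(const std::vector<Theme>& themes,
                                  const std::set<std::string>& checked) {
    std::vector<std::string> out;
    for (const Theme& t : themes) {
        bool hit = std::any_of(checked.begin(), checked.end(),
                               [&](const std::string& f) { return t.features.count(f) > 0; });
        if (hit)
            out.push_back(t.name);
    }
    return out;
}

// Exclude: everything starts checked; unchecking a feature hides every
// theme that has it.
std::vector<std::string> excludeUnchecked(const std::vector<Theme>& themes,
                                          const std::set<std::string>& unchecked) {
    std::vector<std::string> out;
    for (const Theme& t : themes) {
        bool excluded = std::any_of(t.features.begin(), t.features.end(),
                                    [&](const std::string& f) { return unchecked.count(f) > 0; });
        if (!excluded)
            out.push_back(t.name);
    }
    return out;
}
```

Given one blue-and-black theme, one blue-only theme, and one black-only theme, checking blue and black returns one theme under matchAll and all three under matchAny, while checking two-column and three-column under matchAll returns nothing, which is exactly the problem with mutually exclusive categories.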

On another note, WordPress’s color categories failed to have a distinction for one of the few things that really matter to me in terms of color: whether the blog has dark text on a light background, or light text on a dark background.  I guess it’s probably obvious which I prefer, now that I’ve got this reasonably set up.

I have actual non-WordPress-related things to talk about later, but this will do for now; I have to see if I can get a stuck window fixed.  It’s not Chicago, but it’s still pretty cold here in January.