OpenCabinet was born out of frustration with the difficulties involved in storing data structures in relational database tables. While there are plenty of object persistance solutions out there, we do not feel comfortable with the complexities involved in using them, especially when there are requirements for efficient storage and fast access paths.
We hope that by limiting ourselves to very specific, yet versatile data structures (lists, sets and maps of known data types) we can create a system that is easy to use and still very powerful in terms of storage efficiency, scalability and query performance.
Imagine that we have to implement an issue tracking system. We might store the bug reports in a relational database table like the following TBL_BUGS (inspired by the schema used in Bugzilla):
| Field | Type | Remarks |
| bug_id | int | The bug ID |
| bug_severity | varchar(64) | The bug severity level |
| bug_status | varchar(64) | The bug's status |
| product_id | int | Foreign key into products table |
| assigned_to | int | The current owner of the bug (user account number) |
| reporter | int | The user who reported this (user account number) |
| creation_ts | datetime | The time of the bug's creation. |
| description | text | The bug report and follow-up discussions |
As it turns out, a single bug can affect more than one product. Also, we may want to assign more than one person to it. How do we handle that? The product_id and assigned_to columns can unfortunately store only a single value. Database 101 tells us that we need child tables.
A natural approach is to create TBL_BUGS_TO_PRODUCTS with the bug_id as a foreign key to link back into TBL_BUGS which holds one row for every every affected product per bug.
| Field | Type | Remarks |
| bug_id | int | The bug ID |
| product_id | int | Foreign key into products table |
The same goes for assigned_to and TBL_BUGS_ASSIGNED.
We prefer an alternative approach: The schema of the child tables is the same, but instead of using the bug_id as a foreign key we introduce an 'artificial' foreign key (the set_id) that has nothing to do with the bug_id. Since the set_id has to be stored in the parent table TBL_BUGS we keep the columns product_id and assigned_to. The meaning of their contents has changed, though. Instead of storing a product's id, product_id now contains a set_id for TBL_PRODUCT_SETS.
| Field | Type | Remarks |
| set_id | int | The set ID |
| product_id | int | Foreign key into products table |
Accordingly, we also have TBL_USER_SETS instead of TBL_BUGS_ASSIGNED.
For reasons that will become clear very soon, we also limit our activities with the child tables to only inserting new data. Once a row has been committed to TBL_PRODUCT_SETS we will never update it.
The main advantage of this is that a set_id (and its associated data) can be reused: All bug reports that are assigned to the same group of people can share the same value for assigned_to.
Depending on your application, this may lead to a garbage collection problem. On the other hand, if you are looking into implementing a system that keeps track of previous versions of its data, you never want to delete anything anyway.
OpenCabinet makes sure that the search for existing data is fast (since this data never changes, it can also be cached easily), and that new rows in the child tables are created and committed before handing out the set_id.
TBL_BUGS stores the bug_severity and bug_status in varchar columns. The strings used there are examples of what we call an IdString: They come from a fixed (or at least very finite) set of possible options (such as Critical or Minor) and it is acceptable to impose a length limit on them. In the Java programming language, we would probably use a String constant or an Enumeration to represent them. Since there are only a few possible values, a great number of bug reports share the same IdString.
The opposite scenario for a string is a free text field, such as description: The content can be anything and quite long, and it is very unlikely that two separate bug reports share the same description (Although, if we support versioning, it is very likely that two versions of the same report have descriptions that are identical or only slightly different).
In OpenCabinet, all IdStrings are encoded as integers. It also keeps track of the mapping, so that you do not have to do it in your program.
We have already seen in the example of TBL_PRODUCT_SETS how to store a set of integers. Let us investigate some other collection types.
The difference between a list and a set is that elements can occur more than once and that there is an ordering of elements that needs to be preserved. On a relational database table, this can be implemented by adding a position column:
| Field | Type | Remarks |
| list_id | int | The list ID |
| position | int | the list index of this element (0-based) |
| product_id | int | Foreign key into products table |
A map is like a lookup table that associates a key to a value (in this case an integer). OpenCabinet requires that the key be an IdString, which seems like a natural fit anyway, so that the key can also be representated as an integer:
| Field | Type | Remarks |
| list_id | int | The list ID |
| key | int | the key (an IdString) |
| product_id | int | Foreign key into products table |