By: Brian Rosenthal and Zvi Boshernitzan
Copyright 2006 Brian
Rosenthal and Zvi Boshernitzan
Reducing the size of sourcecode is essential to create applications that are quicker to develop and easier to manage, and efficient data-tier programming is one of the most important factors to reducing code size.
In this article, we describe three techniques to reduce the size of data-tier programming within an application.
We describe the following:
(a) A database abstraction syntax, for encapsulating the data tier.
(b) A structured method of building database queries to avoid string parsing.
(c) A methodology of inference-based database programming, whose goal is to eliminate most of the code to manage database entities.
The strategies described in this article are designed for PHP 5, but they are described in a sufficiently abstract way intended to be applied to other languages as well.
This article assumes a familiarity with database programming and also exposure to principles of data modeling – normalization concepts and ER diagrams. It also assumes a basic knowledge of PHP.
1. Javascript Object Notation is used to specify associative and sequential arrays:
Associative Arrays are specified with curly-braces. Thus, the following is a syntax for an associate array:
{‘first_name’:’Zvi’, ‘last_name’:’Boshernitzan’}
Sequential Arrays are specified with square-braces. Thus, the following is a syntax for an sequential array:
[‘one’, ‘two’, ‘three’]
The concepts in this article build on the following general concepts in data modeling, commonly used to organize and communicate the structure of the data tier of an application.
1. Entities
We will use the concept of entity type in this article as a type of data used by the application. Each entity type may have persistent instances stored by the application, that we will call entities.
For example, a book is an entity with entity type “book”.
2. Schemas
A schema is set of definitions of entity types. For example, a schema might look like this:
authors :=
first_name char(32)
last_name char(32)
books :=
title char(128)
Notice that none of
the schemas above have “id” as a field.
We will assume that every entity has a field called “id”, probably
implemented as an unsigned integer.
We will build up a signature for a class “db” which has a useful syntax for database programming.
Please note: the implementation of this would use PEAR::DB underneath, and would not be meant to replace PEAR::DB. The purpose of this is mainly to simplify syntax.
We define a “where spec” is an associatve array which specifies conditions on a SQL statement.
For example, consider the following dictionary:
{‘author_id’:1, ‘last_name’:’Rosenthal’}
This is the “where_spec” for:
‘author_id = 1 and last_name = ‘Rosenthal’
For the sake of simplifying common use-cases, we allow the following alternative syntaxes:
1. Just a number (assumes you mean “id”)
db::where(n) to be ‘id = 3’.
2. Just a string (assume you mean a SQL condition)
db::where(“last_name like ‘Smith%’ ”)
3. A sequence of conditions:
db::where([‘id > 5’, ‘id < 10’])
1. db::insert($table_name, $dictionary)
Inserts an entity into the database. Returns the id of the new entity.
Example:
db::insert(‘authors’, {‘first_name’:’Brian’, ‘last_name’:’Rosenthal’})
2. db::update($table_name, $dictionary, $where_spec)
Updates the entities in the specified table with the dictionary, matching the conditions in the where_spec.
Example:
db::update(‘authors’, {‘first_name’:’Brian’}, 3)
“Updates the first name of the author with id = 3 to be ‘Brian’”
db::update(‘authors’, {‘first_name’:’Brian’}, {‘last_name’:’Rosenthal’})
“Updates the first name of the authors with the last_name as ‘Rosenthal’ to be ‘Brian’”
3. db::delete($table_name, $where_spec)
Deletes entities from the specified table matching the conditions in the where-spec.
Example:
db::delete(‘authors’, 3)
“Deletes author with id = 3”
db::delete(‘authors’, {‘last_name’:’Rosenthal’})
“Deletes all authors with last-name Rosenthal.
4. db::update_or_insert($table_name, $dictionary, $where_spec)
Tries to update the entries as above, with the matching where_spec condition, and inserts the entry on failure.
Parameters: $table_name, $where_spec
1. db::count($table_name, $where_spec)
Returns the number of rows in the specified table matching the conditions in the where_spec
Example:
How many books are there with author_id = 1?
db::count(‘books’, {‘author_id’:1})
2. db::exists($table_name, $where_spec)
Returns true if there exists a row in the specified table matching the where_spec. Otherwise, returns false.
Example:
Are there any books with author_id = 1?
db::exists(‘books’, {‘author_id’:1})
Parameters: (a) sql OR (b) $table_name, $column_name, $where_spec
1. db::string(...)
Returns the first field from the first row of the query, or null if there are no rows.
Example:
db::string(‘select title from books where id = 3’)
db::string(‘books’, ‘title’, 3)
2. db::strings(...)
Returns a sequence of strings, the first field of each of the result-set.
Example:
db::strings(“select id from customers where last_name like ‘Smith%’ ” );
db::strings(‘customers’, ‘id’, “last_name like ‘Smith%’ ”);
3. db::sum(...)
The sum of the specified column, across rows that match the condition.
Example:
db::sum(‘line_items’, ‘total’, ‘sales_order_id = 3’)
Parameters: (a) SQL or (b) table_name, where_spec
1. db::object, db::assoc, db::row
Returns a structure containing the first row of the result-set.
Examples:
“Get customer id 3 as an object”
db::object(‘select * from customers where id = 3’);
db::object(‘customers’, 3)
... OBJECT:{‘first_name’:’Zvi’, ‘last_name’:’Boshernitzan’}
“Get customer id 3 as an associative array”
db::assoc(‘customers’, 3)
...{‘first_name’:’Zvi’, ‘last_name’:’Boshernitzan’}
“Get customer id 3 as an sequence of fields”
db::row(‘customers’, 3)
...[3, ‘Zvi’, ’Boshernitzan’]
2. db::objects, db::assocs, db::rows
Returns a sequence of structures containing the rows of the result set.
Example:
“Get a sequence of sales orders for customer id 3”
db::assocs(‘select * from sales_orders where customer_id = 3’)
Many people program SQL as they might treat HTML – as a big string that they build up and then execute. There are a number of problems with this approach. Namely, (1) it makes it more difficult to protect against SQL injection, (2) it makes it more difficult to have many different criteria build up vastly different SQL (mainly because you might add the same inner join more than once).
The following describes a way of treating SQL as a structured object, until execution that has worked elegantly for the authors.
The idea is that you instantiate a new “dbquery” object, alter it in different ways, and then export it to either a string or an associative array.
The standard use case is initializing a new dbquery instance from an associative array:
$q = new dbquery({‘select’:[‘firstname’, ‘lastname’]
, ‘from’: ‘customers c’
, ‘ijoin’:{‘s’:{‘table’:’sites’, ‘on’:’s.id = c.site_id’}}
, ‘where’:[“c.name like ‘%brian%’”, “c.id > 10”]
, ‘limit’:10
, ‘offset’:5
});
Methods that add or set parts of the SQL statement:
$q->add_select(…), $q->set_select(…)
$q->set_from(...)
$q->add_ijoin(…)
$q->add_lojoin(…)
$q->add_where(...)
$q->set_limit(...)
$q->set_offset(...)
$q->add_order_by(...),$q->set_order_by(...)
$q->add_group_by(...), $q->set_group_by(...)
$q->load_dict({‘select’:’c.address_id’, ‘where’:…})
Methods that allow the dbquery to be exported:
$q->to_sql()
$q->to_dict()
Get a set of customers:
$q = new dbquery({‘select’:[‘firstname’, ‘lastname’]
, ‘from’: ‘customers c’
, ‘where’:[“c.name like ‘%brian%’”, “c.id > 10”]
, ‘limit’:10
, ‘offset’:5
});
// only include the site ijoin if the condition is specified.
if ($site_name) {
$q->add_ijoin({‘s’:{‘table’:’sites’, ‘on’:’s.id = c.site_id’}});
$q->add_where({‘s.name’:$site_name}
}
$customers = db::assocs($q->to_sql())
A lot of information is stored in the database schema. The idea is to use that information whenever possible. We’ve been using many of these practices for a long time.
For example, take the following schema:
products (foreign keys for category and brand)
keywords (many-to-many relation between product and keyword)
If you were writing classes to manage these database entities, you really should get a lot of functions for free.
products::get(product_id)
products::create({‘name’: ‘12 inch valve’, ‘category_id’:2, ‘brand_id’:100})
products::mget({‘category_id’:2})
products::mget_by_category(...)
products::mget_by_brand(...)
products::mget_by_keyword(...)
products::map_keyword(product_id, keyword_id)
keywords::mget_by_product(...)
products::get_name(product_id)
products::colnames()
We asked the following question:
What if we could write a class that would allow all of these functions to exist without ever being written?
1. Call handler override.
PHP 5 allows a class to override the default call handler, in the case that a function is not defined. Therefore, it is possible to accept function names that use words that are specific to the entity.
2. MYSQL reflection
The following information is useful to obtain from the database:
1. column names
2. column types
3. foreign key detection
4. required versus not required columns
5. primary key name
6. unique column names
7. enums for the entity
3. Singleton / factory semantics
Because excess database queries may slow down an application, it makes sense to use singleton classes for the management of database entities.
Singleton semantics is the idea of using instantiated classes, but only having one class per page request.
This is implemented by using a function (say, we call it “mg” for management) which itself keeps a static variable which is an associative array of classes.
So, mg(‘products’) would instantiate a “products” class the first time it is called, but the second time it is called, it would just return the same instance.
4. Autoloads
The autoloads feature of PHP5 is particularly useful with this architecture. PHP 5 exposes a special function called “__autoload($classname)” which is called if a class cannot be found.
This function can check for the class in a particular directory so that you don’t have to include every class in each PHP script; they can be dynamically bound at runtime.
We have found this fast and efficient, and a substantial savings in our own coding time.
The idea here is to have a class that is inherited by other entities that may customize the implementation of the class. The dbentity class contains the logic for common database queries and manipulation.
class dbentity {
....
}
A few notes:
1. All functions of the dbentity base class can be overwritten, and there are some functions which are more specifications than functions.
2. There is no reason at all to define a class for any table. The idea here is to have mg(‘products’) generate an instance of a class for which function calls use the table products. If the behavior is very predictable, there should be no need to define the products class at all.
However, you may want to specify relationships, or have custom functions so the idea is that you would define a class:
class products extends dbentity {
function references() {...}
function mappings() {...}
function sort_key() { ... }
function custom_function(...) {...}
}
3. Most functions that accept a single dictionary can also accept a list of arguments, and create a dictionary intelligently.
For example, create takes a dictionary, and could look like all of these:
Just a list of
arguments, key,value,key,value...
create(‘first_name’, ’Brian’,
‘last_name’, ‘Rosenthal’, ‘hometown’, ’
Just a dictionary:
create({‘first_name’:’Brian’,
‘last_name’: ‘Rosenthal’, ‘hometown’:’
A combination:
create({‘first_name’:’Brian’},
‘last_name’, ‘Rosenthal’, {‘hometown’:’
Takes an id or associative array with a set of unique columns and returns an object with the entity or null if no entity exists matching the condition.
Example
Get product id 3
mg(‘products’)->get(3)
Args can pass in a dbquery structure that would include joins:
get(id, ‘dbquery’, {ijoin:...})
Create takes an associative array and returns the inserted id.
It also:
(a) Enforces referential integrity
(b) Takes care of any “sorting” requirements (assuming there is a sort column)
(c) Validates the arguments
(d) automatically adds mappings if specified
(e) Provides hooks for create_validate...
Example: Create a product
$product_id = mg(‘products’)->create({‘name’:’Valve’})
Takes an id and an associative array and executes an update statement with the args on the id.
This can be called as follows:
mg(‘products’)->edit(3, ‘name’, ‘Valve’);
mg(‘products’)->edit(3, {‘name’:‘Valve’});
mg(‘products’)->edit({‘sku’:’RB001’}, ‘name’, ‘Valve’);
(notice the unique column sku can specify a product, also, but this requires a unique constraint on sku)
Deletes the entity.
It also:
(a) Fails if any dependent entities exist.
(b) Deletes any mapping relationships
(c) Adjusts sort keys
(d) May cascade, if specified.
Example:
mg(‘products’)->delete(3);
Alternate syntax:
mg(‘products’)->delete(‘name’, ‘Valve’)
The point of the function “references” is to specify the entity-type of foreign references within the table.
The default implementation is returning an empty array. You override this:
function references() {
return array('brand_id' => 'brands', 'category_id' => 'categories');
}
Foreign references specify columns in other entities that reference this entity type.
Example:
// here, customers is referenced by sales orders, in the “customer_id” column.
class customers extends dbentity {
function foreign_references() {
return array(
'sales_orders' => array('sales_orders','customer_id')
);
}
...
}
The point of the function is to specify the sort keys that should be re-sorted on insertions, updates, and deletes.
This function specifies that there is a column “sort_order”, and that products are sorted for a particular category_id. (so within each category_id, the “sort_order” of products will be numbered 0,1,2,3)
Example:
// there is a sort key on “sort_order” for particular categories
class products extends dbentity {
function sort_key() {
return array('sort_order', array('category_id'));
}
...
}
A specification-type function defined by the class, if there are mapping relationships.
function mappings() {
return array(
'related_products'
=> array('product_relations', 'from_product_id', 'to_product_id', 'products')
);
}
specifies an array of column names for which changes are logged.
Function specified by the instance class which may rename inputs.
This allows arguments to be passed in with multiple names.
“category” or “category_id” should really mean the same thing.
Rename keys is called during most function calls, so its main purpose is to be specified if necessary.
function rename_keys($arr) {
rba::rename_keys($arr, {‘category’ => ‘category_id’})
return $arr;
}
Function specified by the instance class which takes an id or object and returns a string that may be printed to the user.
As described above, PHP 5 allows a class to specify a call handler to resolve all functions not defined.
function __call($m, $args) {
}
The following patterns can detected and handled:
“get_...”
products::get_name(...) [get_field...]
products::get_category(...) [get_entity]
categories::get_products(...) [get_entities]
“mget_by_...”
products::mget_by_category(...) [mget_by_field]
“get_by_...”
products::mget_by_name(...) [get_by_unique_field]
“has_...”
products::has_keyword(...) [has_entity]
products::has_keywords(...) [has_entities]
“n_...”
categories::n_products(...) [n_entities]
“map_...”
products::map_keyword(...) [map_entity]
“set_...”
products::set_keywords(...) [set_entities]
“unmap_...”
products::unmap_keyword(...) [unmap_entity]
“unmap_all_...”
products::unmap_all_keywords(...) [unmap_all_entities]
“..._name” / “..._id”: enum lookups
We expose the following functions:
mg(‘products’)->colnames()
mg(‘products’)->cols() (which returns more information than just the names)
mg(‘products’)->enum_get_as_dict
mg(‘products’)->enum_name_to_id
mg(‘products’)->enum_id_to_name
mg(‘products’)->required_colnames
mg(‘products’)->unique_colnames
mg(‘products’)->col_type
The idea behind this programming is to have one class that will allow you to have a programming interface to all of your data entities that is readable, flexible, and powerful, while writing the minimal amount of code for each entity.
We have described three ways of programming databases which may help to structure and reduce your data-tier. Please send comments to data_tier_article AT robocommerce.com.