A Little Database Theory - Part 3

At last! My original intention was to write an article about the difference between relational calculus and relational algebra. In order to get here we needed to see what databases looked like before the relational model and what the relational model actually means. Now we can get to the nub of the grist of the nitty, and/or gritty.

Lets look at a simple mathematical problem: quadratic equations. Just in case you can’t remember back to your 1st week at big school, a quadratic equation is an equation of the form:

ax² + bx + c = 0

There are lots of approaches that you can take to solve these equations. The Egyptians didn’t have algebra and so they solved these problems in a geometric way, by drawing. I have a fantastic book on the history of numbers which explains how to do this. If I can ever find it I will post the explanation. It also explains how Sumerians did crosswords and Sudoku (that’s actually a lie, it does nothing of the sort). You might have learnt how to solve these equations by factorisation (I never got the hang of that) or by using a formula:

quad

Clearly, in the cat-skinning department, the number of solutions > 1. Well, the same goes for set-theoretic problems. The 2 cat-skinning approaches I learnt (yonks ago), vis-a-vis set theory, were relational algebra and relational calculus.

As I mentioned in the last post, the database world has pretty much decided on a language called SQL (Structured Query Language) to process relational data. This language is a practical implementation of some features of both the algebra and calculus.

Relational Algebra

I find this the easiest approach to conceptualise, largely because the notation is much closer to English (well, if not English, then at least closer to the kind of algebra that we all understand).

Just as normal algebra combines and manipulates operators symbolically, so does relational algebra. The relational algebra operators are things like:

SELECTION – σ

PROJECTION – π

RENAME – ρ

UNION – ∪

CARTESIAN PRODUCT – x

Selection is about finding stuff. To find Tom in the People relation we do:

σ _{Name = Tom}(People)

Projection is about choosing the attributes (columns) that we want to see.

π _{Name, Age, Height} (People)

It is (sometimes) easy to see how relational algebra maps to SQL statements:

π _{Name, Age, Height} (σ _{Name = Tom}(People))

Becomes:

SELECT Name, Age, Height

FROM People

WHERE Name = Tom

Very often, as in the example of finding the employer for each person, there are multiple steps which have to be done in a particular order. Think back to the last post where we identified who worked for whom. This took several steps:.

Take the Cartesian Product of People and Employers (x)
Ignore tuples (rows) where the People(Employer) is not the same as the Employers(Employer) (SELECTION σ)
Choose the columns that we want from the results (PROJECTION π)

The key point here is that relational algebra is procedural. In this example there are 3 steps and they have to be carried out in the correct sequence.

Another important point is that with relational algebra you specify what you want and how to get it.

Relational Calculus

Relational calculus is a formal logic. There are many examples of formal logics, such as Predicate Calculus. Had I paid more attention at university I would be able to tell you the difference between 1st-order logics and higher-order logics, but I didn’t and so I can’t. This does not seem to have had an adverse effect on my day-to-day life. If my wife has noticed my shortcomings in this department then she has never mentioned this (at least not in my presence).

The thing that puts people off formal logic is the notation – it doesn’t look like English. For example, the meaning of the expression below doesn’t exactly leap of the page:

{t.Name, t.Age, t.Height | People(t) AND t.Name = ‘Tom’}

This is the art of jargon – invent a language that very few people can understand and then all pat yourselves on the head because you understand it. This is perhaps a little unkind to computer scientists. If you are going to describe things in very precise, unambiguous ways then you need some special notation that perhaps the man on the Clapham omnibus might not understand.

Things get a bit simpler once you know how to “pronounce” the special, magic words and symbols. The “|” symbol is pronounced as “such that“. ∃ symbol means ‘there exists‘ and ∀ means “for all“.

If you want to skip the complicated stuff, the key points about relational calculus are:

It is not procedural
The order in which expressions are evaluated in is not important

Put simply, while in relational algebra you specify a) what you want and b) how to get it, in the calculus you just specify what you want. There are different types of relational calculus (well, there would be, wouldn’t there?). The types I am aware of are Tuple Relational Calculus (TRC) and Domain Relational Calculus (DRC). Again, perhaps I should have paid more attention at school. I don’t really know much about DRC and so I’ll do the polite (and sensible) thing and skip over it. Incidentally, don’t bother Googling DRC – you will find lots of references to what, when I was a youngster, was called the Belgian Congo.

The TRC expression above translates into SQL as:

SELECT Name, Age, Height

FROM People

WHERE Name = ‘Tom’

A TRC expression is generally of the form:

{t | COND(t)}

i.e. define a new tuple made up of all members of the tuple t which satisfy the condition COND. The condition can be made up of multiple sub-conditions separated by AND and OR

Here’s a slightly more complex example. It finds all people who work for Smiths Ltd.

{p.Name, p.Age, p.Height | PEOPLE(p) AND (∃e) (EMPLOYERS(e) AND e.Name = ‘Smiths Ltd’ AND p.Employer = e.Employer) }

Which means (informally):

Create a new tuple containing Name, Age & Height from the PEOPLE tuple (i.e. relation/table) to include only a) people p who are in the PEOPLE relation/table, b) there exists an employer e in the EMPLOYERS relation whose Name is ‘Smiths Ltd’ and whose identifier matches the Employer identifier in PEOPLE.

In SQL this translates into:

SELECT Name, Age, Height

FROM People

INNER JOIN Employers

ON People.Employer = Employers.Employer

I hope that you have found this article to be moderately interesting. It’s curious that writing this kind of article is an almost cathartic experience. You know full-well that nobody is ever going to read it, but you somehow feel better for having written it. I hope at least that I have successfully (and correctly) explained the difference between relational algebra and the relational calculus, which is what I set out to do.

Hey-ho! It’s better than working for a living.

A Little Database Theory – Part 3

2 Comments

Submit a Comment Cancel reply

Recent Posts

A Little Database Theory – Part 3

2 Comments

Submit a Comment Cancel reply

Recent Posts

Tag Cloud