Monthly ArchiveJanuary 2017

ByGraham

The Footprint Manager

If you haven’t already, it is strongly recommended that you read this blog article about groups & permissions before you look into group footprints. Just to recap, a footprint defines the parts of the database that a group of users can see.

venn

The “Hospital A” group can maybe just see their own data, and likewise the “Hospital B” group might be only able to see theirs, but an administrator would be able to see data across both sites. This is not the same as a permission: all three groups have read access to equipment (for example); the footprint defines the data that they can actually see.

Discriminators

If you are going to split a dataset up into different chunks that can be seen by different people, then you will need some discriminator. i.e. some way of identifying which users can see which chunks. For example, the e-Quip database might be used by several completely independent sets of users; such as: Medical Physics, EBME, Facilities etc. How is it best to say, “this supplier is here for the Facilities users“, “this model is here for Medical Physics users“, etc? In some cases this is easy. You might, for example, be able to say, “this  data should only be visible to users at Hospital A“. However, it can rapidly get quite complex. It might be that a spare part must be visible to users at both “Hospital A” and “Hospital B”, but the individual bins should only be visible to the appropriate site.

Far and away the most commonly-used discriminator in e-Quip is the team. It is for this reason that almost every screen in e-Quip has a team field. In the simplest case we simply define teams for “Hospital A” and “Hospital B“. We will dig into how this works, why it sometimes doesn’t and then some more complex alternatives.

How is a Footprint Different from a Filter?

Would it be sufficient to just create a base filter (i.e. a filter that the user can’t remove) for each team? Could you just add a filter of “… AND Team = ‘Hospital A” onto every search that “Hospital A” users make? Well, you could, but you would have to do this for every single screen. e-Quip allows you to create a base filter for every screen for a group but this would be quite time-consuming to manage. This would work, but wouldn’t handle lookups properly, since lookups can work from any screen. For example, you might have an equipment filter that says “… AND Team = ‘Hospital A” so you can only see “Hospital A” equipment, but if you were selecting the location for a device using the location lookup, the location lookup isn’t aware of equipment filters and so the lookup would show locations at all hospitals.

What we really need is an “entity-specific” mechanism rather than a “screen-specific” approach. This means that when we specify “you can only see information for Hospital A“, this restriction applies not only to every screen, but also to lookups. This, essentially, is the purpose of the Footprint Manager. While filters are screen-specific (and do not apply to lookups), footprints are entity-specific (and do apply to lookups).

Inclusions & Exclusions

The Footprint Manager takes discriminators and says “you can only see …” (an inclusion) or “you can never see …” (an exclusion). If you use a footprint to say “members of this group will only see data where the Team = Hospital A“, then this applies across all screens and also applies to lookups. These users will only see equipment, jobs, contracts etc., which have been assigned to the team “Hospital A“.

You might also want to specify a footprint by saying what users can’t see. A footprint might say “members of this group will never see decommissioned equipment“. If we combined these this might mean that “members of this group will only see data where Team = Hospital A but never show decommissioned equipment“. This would apply to every screen, so if the user clicked Show All Records on the equipment screen they would only see equipment assigned to the team Hospital A which was not decommissioned. The same reasoning would apply when raising a job; it would not be possible for a user with this footprint to create a job for a device for a team other than Hospital A, or for a decommissioned device.

So, an inclusion means “you can only ever see …”, while an exclusion means “you will never see …”.

Would you ever have an Inclusion and Exclusion on the Same Data type?

Basically, no! Is it meaningful to say “You can only ever see data from Hospital A and you will never see data from Hospital B“? If you can only see data from Hospital A there is no need to say that you cannot see data from Hospital B.

Using the Footprint Manager

The Footprint Manager is extremely simple to use. It is an external application under the Tools program files menu (depending on your operating system). When you run it you will be asked to enter a username and password. Once you have logged in you will see a list of groups. To edit a group footprint just double-click on the group.

At the top of the screen you will see a drop-down list which allows you to choose the discriminator that you have decided to use.

footprint1

In order to specify a team inclusion on “Hospital A“, click the lookup at the top of the screen and select team “Hospital A“. For a manager you might also select an inclusion on “Hospital B“.

footprint3

This means that these users can only see data which relates to teams “Hospital A” and “Hospital B“.

You might, alternatively, choose to say that these users can see all data except that which relates to “Hospital C“. This would be achieved with an exclusion.

footprint4

The differences between these choices can be very subtle.

What about NULLs?

If you specify an exclusion, that is fairly unambiguous. i.e. “you will never see any data where the Team = Hospital A“. Any data at all, equipment, job, contract, spare part, location, anything at all; if the team is “Hospital A” then you won’t see it.

Inclusions are a bit more problematic. Let’s say that we have a spare part which is used in both Hospital A & B. However, there are bins for this part in Hospital A and Hospital B. The bins are no problem, we just assign the appropriate team, but what about the spare part itself? If the part is associated with only a single team (maybe it is a boiler spare part for the Facilities team) then there is no problem, but in this example the part needs to be visible to several teams.

This forces us to interpret inclusions as “you will only ever see data where either the Team = Hospital A or the Team is empty“. You can also see why this is the case when you look at entity types which appear several times within one record. For example, on the equipment property page there is not a single supplier field. rather, there are fields for Original Purchase supplier, New Item Purchase Supplier, Spare Part supplier, etc.

suppliers

This is a device which was originally purchased from Beaver Medical but which is now purchased from Cardiac Services. Contracts, however, are purchased from Philips. Could we just rely on inclusions for Beaver Medical, Cardiac Services and Philips Healthcare? No, because the spare part supplier, callout supplier, training supplier and loan supplier are all blank. In order to display this record the inclusions would have to be interpreted as:

Supplier = Cardiac Services

OR

Supplier = Beaver Medical

OR

Supplier = Philips Medical

OR

Supplier is Empty

This will cause some headaches when we are looking at complex (or realistic) scenarios.

Alternative Discriminators

The team entity is the clear winner when it comes to choosing a discriminator which helps in “sharing out” your e-Quip database between different functional groups of users. However, there are alternative candidates.

Sites are commonly used for this purpose, i.e. “you can only see data where the Site = Hospital A“. For our non-NHS users the customer entity can also be useful, i.e. “you can only see data where the Customer = BUPA“.

e-Quip can handle many complex scenarios, but sometimes you may need to be very imaginative in how you decide to chop up your data.

ByGraham

Group Permissions

You may have come across the series of articles about configuring e-Quip using the Role Manager. This is a really big subject and even when tackled in small chunks the articles are tending to become long which might give the impression that this is more complex than it actually is. For that reason we have moved some of the discussions about groups and permissions, along with the Footprint Manager, into separate posts.

In order to use e-Quip you must have a user account and that account will belong to a single group. Essentially a group defines your permissions or rights. i.e. what you are allowed to do within e-Quip. Almost always a group will have multiple members (i.e. user accounts) but remember, a user account is a member of a single group.

Item Permission Masks

Every data item in e-Quip is protected by a permissions mask which can be any combination of:

Read –  This data can be seen

Write – This data can be edited

Add – New records can be created

Delete – Records can be deleted

Control – This grants the read, write, add & delete permissions and also allows records to be archived

Is the term data item the same as saying screen? For example, do permissions on the Location data type just refer to the location screen? No – these permissions apply across the whole of e-Quip. If a group has Add rights on locations, for example, then not only will the Create New Location menu on location screen be enabled, but the Add New button on the location Lookup control will be enabled everywhere it occurs, even in places like the QBE (Query by Example) utility.

Why is it called a mask? This is a bit of IT terminology used to describe something where the overall effect is defined by adding individual items, often called flags. Suppose that the read flag was represented by the number 1, write by 2, add by 4 and delete by 8. Using these flags, the mask 15 represents: 1 + 2 + 4 + 8. Thus 15 = read + write + add + delete. They are held this way (as ascending powers of 2) because computers can compare such masks very quickly.

Setting Permissions

The User Manager is responsible for defining the permissions associated with a group. To run this utility, click the Office menu (the round menu button in the top left-hand corner) on the dashboard screen and select Manage Users.

A screen will appear with two tabs, one to manage groups and another to manage individual users.

grpadmin

Group creation is trivial: simply click the New button and enter a code, name & description for the group. The code is largely unimportant and is used sometimes for advanced system customisations.

To open or edit a group, simply double-click on it in the grid. When a group is displayed, click the Permissions tab to see the rights mask for every data type.

mask

Editing permissions is as easy as you would think; simply tick the boxes for the rights that you want to set. You can set the rights for multiple data types by selecting them in the list and then clickign Edit Selection. This will display the following screen:

mask2

Just tick the appropriate rights then click the Apply button and those rights will be applied to every selected data type.

Delete Rights

When is it appropriate to delete a record from a database? I would start from “never” and then argue back very vociferously from there! My personal view is that if you create a job by mistake, then you should just set its status to Cancelled, not delete it, but that’s just my personal view. Incidentally, it isn’t possible in e-Quip to delete something that is referenced by something else.

Links Between Entity Types

Let’s suppose that a group has write & add permissions on jobs, so they can create new and edit existing jobs. What happens if those users need to a) add parts to a job and b) remove parts from a job that they may have added accidentally.

There are three separate data types involved here:

Jobs

Spare Parts

Part / Job Links

You can’t add parts to a job unless you can actually edit jobs in the first place. You cannot select from a list of spare parts if you don’t have at least read access to spare parts. But, to add a spare part to a job you also need write access to the Spare Part / Job Link entity. These are three distinct permissions. You might not be able to add new jobs, or create new spare parts, but you can still add spare parts to a job. There are separate rights masks for all three. have a look at the permissions below:

masklinks3

This group can see (i.e. read) spare parts but has read + write + add + delete rights to spare part job links, spare part model links and spare part supplier links.

This means that these users can:

Add a spare part to a job (“this spare part was used on this job”)

Add a spare part to a model (“this spare part is applicable to this model”)

Add a spare part to a supplier (“this spare part can be purchased from this supplier”)

It should be reasonably obvious that adding (i.e. creating) a spare part is not the same as linking an existing spare part to a job, model or supplier.

Deleting Links

Notice that this group has permission to delete spare part links. This is not the same as being able to delete spare parts. If a user adds a part by mistake to a job then they need to be able to remove the spare part from the job. Similarly, if a supplier stops selling a particular part, removing it from the supplier is not the same as deleting the spare part itself. It will almost always be the case that if a group has the rights to add a link then they will also be able to delete that link.

Footprints

 

Although groups primarily exist as a permissions mechanism, each group also has a Footprint, which defines the data which is visible to that group. If you’ve ever seen a Venn diagram then you’ll have a good idea what this means.

venn

The “Hospital A” group can maybe just see their own data, and likewise the “Hospital B” group might be only able to see theirs, but an administrator would be able to see data across both sites. This is not the same as a permission: all three groups have read access to equipment (for example); the footprint defines the data that they can actually see.

This blog article explains how the Footprint Manager operates.

 

ByGraham

A Little Database Theory – Part 3

At last! My original intention was to write an article about the difference between relational calculus and relational algebra. In order to get here we needed to see what databases looked like before the relational model and what the relational model actually means. Now we can get to the nub of the grist of the nitty, and/or gritty.

Lets look at a simple mathematical problem: quadratic equations. Just in case you can’t remember back to your 1st week at big school, a quadratic equation is an equation of the form:

ax2 + bx + c = 0

There are lots of approaches that you can take to solve these equations. The Egyptians didn’t have algebra and so they solved these problems in a geometric way, by drawing. I have a fantastic book on the history of numbers which explains how to do this. If I can ever find it I will post the explanation. It also explains how Sumerians did crosswords and Sudoku (that’s actually a lie, it does nothing of the sort). You might have learnt how to solve these equations by factorisation (I never got the hang of that) or by using a formula:

quad

Clearly, in the cat-skinning department, the number of solutions > 1. Well, the same goes for set-theoretic problems. The 2 cat-skinning approaches I learnt (yonks ago), vis-a-vis set theory, were relational algebra and relational calculus.

As I mentioned in the last post, the database world has pretty much decided on a language called SQL (Structured Query Language) to process relational data. This language is a practical implementation of some features of both the algebra and calculus.

Relational Algebra

I find this the easiest approach to conceptualise, largely because the notation is much closer to English (well, if not English, then at least closer to the kind of algebra that we all understand).

Just as normal algebra combines and manipulates operators symbolically, so does relational algebra. The relational algebra operators are things like:

SELECTION – σ

PROJECTION – π

RENAME – ρ

UNION – ∪

CARTESIAN PRODUCT – x

Selection is about finding stuff. To find Tom in the People relation we do:

σ Name = Tom(People)

Projection is about choosing the attributes (columns) that we want to see.

π Name, Age, Height (People)

It is (sometimes) easy to see how relational algebra maps to SQL statements:

π Name, Age, HeightName = Tom(People))

Becomes:

SELECT Name, Age, Height

FROM People

WHERE Name = Tom

Very often, as in the example of finding the employer for each person, there are multiple steps which have to be done in a particular order. Think back to the last post where we identified who worked for whom. This took several steps:.

  1. Take the Cartesian Product of People and Employers (x)
  2. Ignore tuples (rows) where the People(Employer) is not the same as the Employers(Employer) (SELECTION σ)
  3. Choose the columns that we want from the results (PROJECTION π)

The key point here is that relational algebra is procedural. In this example there are 3 steps and they have to be carried out in the correct sequence.

Another important point is that with relational algebra you specify what you want and how to get it.

Relational Calculus

Relational calculus is a formal logic. There are many examples of formal logics, such as Predicate Calculus. Had I paid more attention at university I would be able to tell you the difference between 1st-order logics and higher-order logics, but I didn’t and so I can’t. This does not seem to have had an adverse effect on my day-to-day life. If my wife has noticed my shortcomings in this department then she has never mentioned this (at least not in my presence).

The thing that puts people off formal logic is the notation – it doesn’t look like English. For example, the meaning of the expression below doesn’t exactly leap of the page:

{t.Name, t.Age, t.Height | People(t) AND t.Name = ‘Tom’}

This is the art of jargon – invent a language that very few people can understand and then all pat yourselves on the head because you understand it. This is perhaps a little unkind to computer scientists. If you are going to describe things in very precise, unambiguous ways then you need some special notation that perhaps the man on the Clapham omnibus might not understand.

Things get a bit simpler once you know how to “pronounce” the special, magic words and symbols. The “|” symbol is pronounced as “such that“. ∃ symbol means ‘there exists‘ and ∀ means “for all“.

If you want to skip the complicated stuff, the key points about relational calculus are:

  1. It is not procedural
  2. The order in which expressions are evaluated in is not important

Put simply, while in relational algebra you specify a) what you want and b) how to get it, in the calculus you just specify what you want. There are different types of relational calculus (well, there would be, wouldn’t there?). The types I am aware of are Tuple Relational Calculus (TRC) and Domain Relational Calculus (DRC). Again, perhaps I should have paid more attention at school. I don’t really know much about DRC and so I’ll do the polite (and sensible) thing and skip over it. Incidentally, don’t bother Googling DRC – you will find lots of references to what, when I was a youngster, was called the Belgian Congo.

The TRC expression above translates into SQL as:

SELECT Name, Age, Height

FROM People

WHERE Name = ‘Tom’

A TRC expression is generally of the form:

{t | COND(t)}

i.e. define a new tuple made up of all members of the tuple t which satisfy the condition COND. The condition can be made up of multiple sub-conditions separated by AND and OR

Here’s a slightly more complex example. It finds all people who work for Smiths Ltd.

{p.Name, p.Age, p.Height | PEOPLE(p) AND (∃e) (EMPLOYERS(e) AND e.Name = ‘Smiths Ltd’ AND p.Employer = e.Employer) }

Which means (informally):

Create a new tuple containing Name, Age & Height from the PEOPLE tuple (i.e. relation/table) to include only a) people p who are in the PEOPLE relation/table, b) there exists an employer e in the EMPLOYERS relation whose Name is ‘Smiths Ltd’ and whose identifier matches the Employer identifier in PEOPLE.

In SQL this translates into:

SELECT Name, Age, Height

FROM People

INNER JOIN Employers

ON People.Employer = Employers.Employer

I hope that you have found this article to be moderately interesting. It’s curious that writing this kind of article is an almost cathartic experience. You know full-well that nobody is ever going to read it, but you somehow feel better for having written it. I hope at least that I have successfully (and correctly) explained the difference between relational algebra and the relational calculus, which is what I set out to do.

Hey-ho! It’s better than working for a living.

 

ByGraham

A Little Database Theory – Part 2

The first part of this article discussed what databases looked like before Codd’s seminal paper in 1969 introduced the relation model. This post describes what the relational model actually is. By the way, the last article gave some examples from non-relational databases that I’d worked on in the past. I don’t want to give the impression that I was writing software before 1969! Just like there are still plenty of dBase applications still around (what do you mean, “what’s dBase?”) when I was first learning database programming there were still plenty of ISAM applications still running – there probably still are.

Codd’s paper revolutionised the data processing world. Data processing was now based on mathematics. There were now rules which everyone knew and understood. That’s why virtually all databases that you see now are relational and why the old-style ISAM databases disappeared.

So hands up who knows what a relation actually is? Unless you are a computer scientist or mathematician you would probably be wrong. Many people think that relations describe relationships between entities, such as equipment and models. Wrong. A relation is a concept from set theory, which you probably learnt about in junior school.

Suppose we have 2 sets: A = Names and B = Ages. A common way to define the contents of a set is as a comma-separated list between curly brackets. e.g.

A = {Tom, Dick, Harry}

B = {20, 21, 30}

A relation is a new set created by making ordered pairs from all elements of original sets. R1 below is a relation on A and B.

R1 = { {Tom, 20}, {Dick, 21}, {Harry, 30} }

Notice that R1 is itself actually a set which contains:

{

{(the 1st member of set A, the 1st member of set B)},

{(the 2nd member of set A, the 2nd member of set B)}

{(the 3rd member of set A, the 3rd member of set B)}

}

You might see this referred to as:

{

{(A1, B1 )},

{(A2, B2 )},

{(A3, B3 )}

}

The values inside the parentheses are called tuples, or n-tuples and we will see more of them in the next post. In this case n = 2, so these are 2-tuples.

Suppose we had a 3rd set C = Heights:

C = { 5′ 8″, 6′ 2″, 6′ 0″ }

The relation R2 over A, B & C would be:

R2 = { {Tom, 20, 5′ 8″}, {Dick, 21, 6′ 2″}, {Harry, 30, 6′ 0″} }

(In this case the tuples are 3-tuples)

You should now be able to spot how relations are important. Let’s just represent this a different way, with the names of the sets at the top.

table

We can see now that a relation is effectively a database table. This is the fundamental element that underpins the entire relational database model. RDBMS’s (Relational DataBase Management Systems) all manage data in what have become known as tables, where table is synonymous with relation. All of the data is defined by relations and everything that you can conceivably do to or with these relations is defined by set theory.

Incidentally, RDBMS’s are not the same thing as ROUS’s (Rodents of Unusual Size). They’re a whole different kettle of ballgames, as they say.

Just step back and ponder the enormity of this. We have 3 sets, Names, Ages and Heights, each having the same cardinality (i.e. number of members). By using some fairly elementary maths we can combine those into a mathematical model of the now-familiar concept of a database table. There are some rules that these sets and relations have to follow but the power and flexibility this model gives us is truly amazing (compared with the disadvantages of the older database systems discussed in the last article).

So far this simple relation describes a single, real-world entity, namely a person. Ok, a person is defined by more than their name, age & height, but getting a model that more-closely resembles reality simply involves adding more sets for things like hair colour, shoe size etc. The relational model allows us to define relations for many different entities and to link them together in all kinds of ways.

As an example, let’s look at some set theory and see how this can extend our model. In set theory the Cartesian Product (normally written as “x”) over 2 sets gives a new set containing every element of set 1 combined with every element of set 2.

If:

A = {a, b}

B = {x, y}

Then:

A x B = { {a, x}, {a, y}, {b, x}, {b, y} }

How is this useful? Lets create a new relation, Employers. to list companies that people can work for and rename our earlier relation R2 to People. We can add a new column to the People relation, called Employer.

Employers = { {1, Smiths Ltd}, {2, Browns Ltd} }

People = { {Tom, 1},  {Dick, 1}, {Harry, 2} }

If we take the Cartesian Product of these we get:

join

Now we just need to get rid of every row where Employer (from People) doesn’t match Employer (from Employers). This will give (I have got rid of the columns I’m not interested in):

join2

It’s hard to grasp how revolutionary this was back in 1969, now that all databases work like this. We can now keep different lumps of data and combine them using trivial set theory. Each real-world entity (like a person or employer) now only ever needs to be defined once. If Smiths Ltd changes its name to Jones Ltd, that change only needs to be made in a single place, in the employers relation. This is one of the key requirements of properly-designed relational databases: “one fact – one place”.

Set theory contains many operations that let you combine sets (UNION), identify things which several sets have in common (INTERSECTION), subtract one set from another and so on. It would be tiresome for programmers if they had to process data using set-theoretical concepts. Just as high-level programming languages (Fortan, COBOL, C, C++, C#, Pascal, BASIC etc. [Look out- I feel another article coming on!]) shield programmers from having to know how the chips inside the computer work, so high-level data processing languages shield programmers from the grubby maths.

The database world has pretty much decided on a language called SQL (Structured Query Language) to process relational data. There is no consensus as to how this is pronounced; one man’s S-Q-L is another man’s Sequel.

In our simple example above the query “who works where” is expressed in SQL by:

SELECT People.Name, Employers.Name

FROM People INNER JOIN Employers

ON People.Employer = Employers.Employer

If we only wanted to see who works for Smiths Ltd we could add a restriction, such as:

WHERE Employers.Name = Smiths Ltd

This is hiding the mathematical complexities from us and allowing programmers to work in a language which is close to English. However, scratch the surface and the maths is still there.

In the next post I’ll discuss both relational calculus and algebra and dig a bit deeper into what SQL is doing.

 

ByGraham

A Little Database Theory – Part 1

The other day I came across a forum on the web and spotted a post by someone who wanted to know the difference between relational calculus and relational algebra. As happens so often on the web, all of the responses were written by people who had virtually no idea what they were talking about. So, having a couple of hours to kill I thought I’d give it a go.

Almost everyone has come across the term “relational database” and probably to many people “relational databases” and “databases” are the same thing. They’re not, but the relational model has become so popular that non-relational databases have (except in very specialist situations) pretty much disappeared.

I thought I’d split this article into 3 separate posts:

  1. What did databases look like before the relational model?
  2. What does “relational model” actually mean?
  3. A comparison between relational calculus and relational algebra

Let’s have a look at what the e-Quip database might have looked like before the Ted Codd presented his ground-breaking paper in 1969. Imagine that you have a book and you have decided to have one page per device. At the front of the book is an index which tells you the page number for a particular device. To get information about Equipment No 12345 you would look in the index and see that this device record can be found on page 253. You then quickly flick through to that page and Bob’s your uncle, you can see all the information you need.

If the book becomes a computer disk file then there were, and still are, plenty of ways of maintaining an index and quickly getting to the right part of the file based on the value from the index. Perhaps the most common was known as ISAM, or Indexed, Sequential Access Method. This meant that you used the index to get to the right place and then you read each character sequentially from that point onward.

If you’re in charge of the book you might come up with some rules, like:

  1. The first 10 characters on line 1 are the Equipment No
  2. The next 10 characters on line 1 are the Serial No
  3. The 2nd line holds the model name and description
  4. The first 10 characters on the 3rd line hold the purchase date in the format dd/mm/yyyy
  5. The next 10 characters on the 3rd line hold the installation date in the same format

So far so good(-ish). How about jobs? You might decide to have 2 lines per job, starting from line 20, which might give you enough space for 20 jobs.

isam

I have actually worked on plenty of applications like this. The biggest problems that we had were a) the dates were in dd/mm/yy format (oops! what’s going to happen the day after 31/12/99?) and b) nobody ever got round to writing down the rules so if we ever needed to add anything we just picked a bit of space somewhere on the “page” that looked vacant and used that. Unfortunately, after a few years the vacant spaces get few and far between. When space gets really tight things get difficult. Suppose we need to add another date (e.g. Replacement Date) but there aren’t 10 unused spaces in the file. There might be 3 spaces available on line 3, one at the start of line 4 and another 2 at the end of line 4, and 4 in the middle of line 7. To read this date the programmer needs to read data from 4 different places in the file, join them together in memory and then interpret the whole as a date. Writing a date is even more complex. The programmer would start with “01/12/2015” and have to split this into 4 separate pieces (“01/”, “1”, “2/” &  “2015”) then write each piece to the appropriate place in the file.

What happens when you finally run out of space on the page? Suppose a device has 21 jobs. You can’t just write this 21st job underneath the 20th because you will be overwriting the next equipment record. Well, there are things you can do, like start another book (with its own index) just for all the additional information do but it all gets complicated and the programmers have to handle the complexity.

Searching for things was very difficult. While you could easily find the right page for a particular device, how do you find all jobs completed last month? You start at page 1 and look through every line (from line 20 onward), then look at the date in columns 10 – 20. If that date is in last month, then add the Job No to your list. Then turn to page 2, and keep doing this until you have read the whole file. You could get clever and create more indexes at the front of the book but this all added to the complexity.

It’s not really surprising that these databases were often processed in batch. i.e. once a month you run a program which produces a whole load of printed reports. If you want to know how many jobs there were last month, then get last month’s report off the shelf and read it. Interactive software applications where you could ask any question any time you liked were not that common.

What do you do if some eejit writes the wrong thing in the wrong space? You have positioned the file at a place where you are expecting to see a date, but instead of “21/01/2015” you find “HELLO”. Just letting your program collapse in a heap wouldn’t be a good idea, although plenty did. You ended up having to write complex code to do simple things, just to take care of possible errors.

Another problem is that the file (i.e. the virtual equipment book) is completely separate from the application. A file is just a file – any programmer can write whatever they like to it, regardless of the format that you have specified. To keep the contents of the file meaningful you have to rely on every program that accesses it a) knowing the rules and b) sticking to the rules. Even if everyone does this, programs will inevitably have bugs. The programmer might have intended to write “Graseby MS26 Syringe Driver” starting from the 1st character on the 2nd line of the particular equipment record, but programmers make mistakes, especially when keeping track of where they are in a large, complicated file. The end result is that it is hugely difficult to prevent the file from being corrupted.

As an aside, I once worked on a database which managed the bonuses paid to travel agents depending on how quickly they submitted details of the travellers’ cheques that they had sold. The database was exactly like the ISAM file described above. In order to handle errors, the standard technique was:

1000 IF something goes wrong

1010 THEN LET X = 1/0

All of the applications were written Data General Business BASIC (which conveniently had ISAM built into the language). If you’ve ever used BASIC you’ll know that it is an interpreted language. The interpreter reads a line, translates it into machine code then executes it. It then moves on to the next line and does the same again. If something goes wrong the interpreter just stops and displays a message, then sits and waits for someone to tell it what to do. LET X = 1/0 is going to cause an error, since you can’t divide anything by 0. The interpreter will display something like “Divide by zero error at line 1010” then just sit and wait.

The computer operator could then call the programmers, who could then a) sit in front of the computer, b) find out what has gone wrong, c) fix it, then d) tell the interpreter to carry on.

From a programmer’s perspective this is awful. Every single database application is different, even in the same organisation. You might be an expert on one system and not have the slightest idea about another. Codd’s paper describing his new relational model, published in 1969, revolutionised the data processing world. Data processing was now based on mathematics. There were now rules which everyone knew and understood. That’s why virtually all databases that you see are relational and why the old-style ISAM (Indexed Sequential Access Method) databases disappeared.

My next post will look at what we actually mean by “relational model”.

 

ByGraham

Turing Machines and the Turing Test

 

Everyone these days has heard of Alan Turing, the father of computer science, the man who cracked the Enigma code and the eponymous “inventor” of both Turing Machines and the Turing Test. Science fiction tends to conflate the two, but they are completely different things, both emanating from that same genius.

The Turing Machine

I don’t work as a computer scientist. I know people who do, but because I have a mortgage to pay I work as a software engineer, even though my discipline is computer science. Computer scientists live in colleges, generally in Cambridge although there may be one or two in Oxford. Back in the 70’s there were even some in the US of A, copying the stuff they learnt from Cambridge (Ok, Kernighan, Ritchie, Thompson et al, this isn’t meant to be serious, please don’t sue me) . They don’t do anything as demeaning as writing software. So what do they do?

How to be a genius: step 1. Think in abstract terms. Einstein worked with thought experiments, like, “if I were sitting on a photon, what would I see if another photon went past me, and at what time would we both think it was time for tea?”. How did Alan Turing imagine computers before they existed? What’s even more astonishing is that Turing worked out the kinds of problems that you could and couldn’t solve with a computational engine before there was any such thing as a computational engine. That is the power of abstraction in the hands of a genius. I’ve only ever met one true genius in my life and his gift was to explain the complex in very simple terms. There are not many people in the world with this gift. Think Feynman, Turing, Einstein…

Can you imagine a machine that can add 3 and 4, without building the actual machine, even before such a machine has been built?

Turing suggested that you imagine a machine which:

a. Takes its input from an infinite tape made up of cells

b. Has a read/write head which can read the cell under the head

c. Has some instructions (a stored program) which tells it what to do depending on what is under the head.

For example, the instruction might say:

Do the following forever:

If there is a non-blank under the head Then

Move the head right to the 2nd blank cell

Write “1”

Move the head left to the 2nd blank cell

Move the head right one space

Erase the cell contents

Otherwise

Stop

These are the instructions to add two numbers together. When the machine stops the answer should be on the tape.

(I can’t do graphics so the following images are from https://vimeo.com/46913004)

turing1

If you just follow the instructions (i.e. program) then you will end up with:

turing2

It’s worthwhile just working through this example. It has nothing to do with binary notation or real tapes – it doesn’t matter whether each cell on the tape has 0’s, 1’s, apples or bananas in it. The model is purely conceptual.

Not only did Alan Turing come up with this fantastically elegant idea, he also defined the limits of this wonderful abstract machine; not only what could it do, but what couldn’t it do. Adding is fine, so is multiplication, division etc. But one very important question is, “if the input is infinite, is the machine ever guaranteed to reach the ‘Otherwise Stop‘ state?”. In Computer science this is called the Halting Problem. It says a lot that Turing was aware of this even back in 1937. Things have moved on a lot since then but computer science is still the science of the nature of problems that can and can’t be solved by computational engines.

It turns out that there is a whole class of problems that can be solved with Turing machines. If you’re interested you might want want to check out the Game of Life, invented by John Conway, which, while not only being a mathematical model for evolution also turns out to be able to solve any problem solvable by a Turing machine.

 

The Turing Test

Even back in the 1930’s as a young under-graduate, Turing could see the logical consequences of his assertions about what could and could not be computed with a computational engine. Naturally (if you’re a genius) his thoughts turned to machine intelligence. Imagine that you were having a conversation with another “person” on the other side of a wall. How could you be certain that you were talking with a person rather than a computational engine or robot, of some kind? This is similar to the famous “cogito” problem in philosophy. We all know that external factors can affect our thoughts. Descartes put this in terms of a mischievous imp but we can imagine that things like brain tumours, hallucinogenic drugs etc. can induce visions. The question is, how can we prove that what we see is real? Descartes reduced this to Cogito ergo sum (I think therefore I am). i.e. the fact that I am thinking means that I must exist. Everything else could be down to the mischievous imp. So imagine a conversation through a brick wall. How could you be certain that the entity (mischievous imp or otherwise) on the other side 0f the wall is intelligent? In Turing’s terms “is it possible for machinery to show intelligent behaviour?”.

This is the so-called Turing Test. The basic idea is that if you can’t tell the difference then artificial intelligence has been achieved (this is a massive generalisation! Many computer scientists would disagree  with this statement. Fortunately, this is a blog, not a PhD thesis). The really important point is that someone was even thinking these ideas before computers actually existed.

So, this is computer science. Turing Machines and Turing Tests are completely different things. Perhaps you did a physics degree but are working as a clinical scientist, bio-med or EBME engineer. The difference between being a “physicist” and a bio-med is similar to the difference between being a computer scientist and a software engineer. I studied the former, but my mortgage is paid by the latter.

It’s the difference between SQL and relational calculus/algebra (I’m sure there’ll be a post about this soon!).

As a post-script, many years ago when I was at university, like most students I developed a skill for knowing what was likely to be in the exams. Exam questions, by their very nature, have to be answerable in a given time or number of paragraphs. Artificial Intelligence is so mathematically complex that only the most general areas lend themselves to exam questions. Thus, when it came to revision, AI was not top of my list, as the maths was too complex for a 4-hour exam. Imagine my horror when the AI question required a mathematical explanation of the differentiation between circles and tubes in satellite photographs (i.e. rocket launchers versus sewage works). It’s a good job I wasn’t working for JFK in the 60’s as I would have recommended a pre-emptive strike on the Ely sewage works!