Tag ArchiveA Developer’s Day

ByGraham

A Little Database Theory – Part 3

At last! My original intention was to write an article about the difference between relational calculus and relational algebra. In order to get here we needed to see what databases looked like before the relational model and what the relational model actually means. Now we can get to the nub of the grist of the nitty, and/or gritty.

Lets look at a simple mathematical problem: quadratic equations. Just in case you can’t remember back to your 1st week at big school, a quadratic equation is an equation of the form:

ax2 + bx + c = 0

There are lots of approaches that you can take to solve these equations. The Egyptians didn’t have algebra and so they solved these problems in a geometric way, by drawing. I have a fantastic book on the history of numbers which explains how to do this. If I can ever find it I will post the explanation. It also explains how Sumerians did crosswords and Sudoku (that’s actually a lie, it does nothing of the sort). You might have learnt how to solve these equations by factorisation (I never got the hang of that) or by using a formula:

quad

Clearly, in the cat-skinning department, the number of solutions > 1. Well, the same goes for set-theoretic problems. The 2 cat-skinning approaches I learnt (yonks ago), vis-a-vis set theory, were relational algebra and relational calculus.

As I mentioned in the last post, the database world has pretty much decided on a language called SQL (Structured Query Language) to process relational data. This language is a practical implementation of some features of both the algebra and calculus.

Relational Algebra

I find this the easiest approach to conceptualise, largely because the notation is much closer to English (well, if not English, then at least closer to the kind of algebra that we all understand).

Just as normal algebra combines and manipulates operators symbolically, so does relational algebra. The relational algebra operators are things like:

SELECTION – σ

PROJECTION – π

RENAME – ρ

UNION – ∪

CARTESIAN PRODUCT – x

Selection is about finding stuff. To find Tom in the People relation we do:

σ Name = Tom(People)

Projection is about choosing the attributes (columns) that we want to see.

π Name, Age, Height (People)

It is (sometimes) easy to see how relational algebra maps to SQL statements:

π Name, Age, HeightName = Tom(People))

Becomes:

SELECT Name, Age, Height

FROM People

WHERE Name = Tom

Very often, as in the example of finding the employer for each person, there are multiple steps which have to be done in a particular order. Think back to the last post where we identified who worked for whom. This took several steps:.

  1. Take the Cartesian Product of People and Employers (x)
  2. Ignore tuples (rows) where the People(Employer) is not the same as the Employers(Employer) (SELECTION σ)
  3. Choose the columns that we want from the results (PROJECTION π)

The key point here is that relational algebra is procedural. In this example there are 3 steps and they have to be carried out in the correct sequence.

Another important point is that with relational algebra you specify what you want and how to get it.

Relational Calculus

Relational calculus is a formal logic. There are many examples of formal logics, such as Predicate Calculus. Had I paid more attention at university I would be able to tell you the difference between 1st-order logics and higher-order logics, but I didn’t and so I can’t. This does not seem to have had an adverse effect on my day-to-day life. If my wife has noticed my shortcomings in this department then she has never mentioned this (at least not in my presence).

The thing that puts people off formal logic is the notation – it doesn’t look like English. For example, the meaning of the expression below doesn’t exactly leap of the page:

{t.Name, t.Age, t.Height | People(t) AND t.Name = ‘Tom’}

This is the art of jargon – invent a language that very few people can understand and then all pat yourselves on the head because you understand it. This is perhaps a little unkind to computer scientists. If you are going to describe things in very precise, unambiguous ways then you need some special notation that perhaps the man on the Clapham omnibus might not understand.

Things get a bit simpler once you know how to “pronounce” the special, magic words and symbols. The “|” symbol is pronounced as “such that“. ∃ symbol means ‘there exists‘ and ∀ means “for all“.

If you want to skip the complicated stuff, the key points about relational calculus are:

  1. It is not procedural
  2. The order in which expressions are evaluated in is not important

Put simply, while in relational algebra you specify a) what you want and b) how to get it, in the calculus you just specify what you want. There are different types of relational calculus (well, there would be, wouldn’t there?). The types I am aware of are Tuple Relational Calculus (TRC) and Domain Relational Calculus (DRC). Again, perhaps I should have paid more attention at school. I don’t really know much about DRC and so I’ll do the polite (and sensible) thing and skip over it. Incidentally, don’t bother Googling DRC – you will find lots of references to what, when I was a youngster, was called the Belgian Congo.

The TRC expression above translates into SQL as:

SELECT Name, Age, Height

FROM People

WHERE Name = ‘Tom’

A TRC expression is generally of the form:

{t | COND(t)}

i.e. define a new tuple made up of all members of the tuple t which satisfy the condition COND. The condition can be made up of multiple sub-conditions separated by AND and OR

Here’s a slightly more complex example. It finds all people who work for Smiths Ltd.

{p.Name, p.Age, p.Height | PEOPLE(p) AND (∃e) (EMPLOYERS(e) AND e.Name = ‘Smiths Ltd’ AND p.Employer = e.Employer) }

Which means (informally):

Create a new tuple containing Name, Age & Height from the PEOPLE tuple (i.e. relation/table) to include only a) people p who are in the PEOPLE relation/table, b) there exists an employer e in the EMPLOYERS relation whose Name is ‘Smiths Ltd’ and whose identifier matches the Employer identifier in PEOPLE.

In SQL this translates into:

SELECT Name, Age, Height

FROM People

INNER JOIN Employers

ON People.Employer = Employers.Employer

I hope that you have found this article to be moderately interesting. It’s curious that writing this kind of article is an almost cathartic experience. You know full-well that nobody is ever going to read it, but you somehow feel better for having written it. I hope at least that I have successfully (and correctly) explained the difference between relational algebra and the relational calculus, which is what I set out to do.

Hey-ho! It’s better than working for a living.

 

ByGraham

A Little Database Theory – Part 2

The first part of this article discussed what databases looked like before Codd’s seminal paper in 1969 introduced the relation model. This post describes what the relational model actually is. By the way, the last article gave some examples from non-relational databases that I’d worked on in the past. I don’t want to give the impression that I was writing software before 1969! Just like there are still plenty of dBase applications still around (what do you mean, “what’s dBase?”) when I was first learning database programming there were still plenty of ISAM applications still running – there probably still are.

Codd’s paper revolutionised the data processing world. Data processing was now based on mathematics. There were now rules which everyone knew and understood. That’s why virtually all databases that you see now are relational and why the old-style ISAM databases disappeared.

So hands up who knows what a relation actually is? Unless you are a computer scientist or mathematician you would probably be wrong. Many people think that relations describe relationships between entities, such as equipment and models. Wrong. A relation is a concept from set theory, which you probably learnt about in junior school.

Suppose we have 2 sets: A = Names and B = Ages. A common way to define the contents of a set is as a comma-separated list between curly brackets. e.g.

A = {Tom, Dick, Harry}

B = {20, 21, 30}

A relation is a new set created by making ordered pairs from all elements of original sets. R1 below is a relation on A and B.

R1 = { {Tom, 20}, {Dick, 21}, {Harry, 30} }

Notice that R1 is itself actually a set which contains:

{

{(the 1st member of set A, the 1st member of set B)},

{(the 2nd member of set A, the 2nd member of set B)}

{(the 3rd member of set A, the 3rd member of set B)}

}

You might see this referred to as:

{

{(A1, B1 )},

{(A2, B2 )},

{(A3, B3 )}

}

The values inside the parentheses are called tuples, or n-tuples and we will see more of them in the next post. In this case n = 2, so these are 2-tuples.

Suppose we had a 3rd set C = Heights:

C = { 5′ 8″, 6′ 2″, 6′ 0″ }

The relation R2 over A, B & C would be:

R2 = { {Tom, 20, 5′ 8″}, {Dick, 21, 6′ 2″}, {Harry, 30, 6′ 0″} }

(In this case the tuples are 3-tuples)

You should now be able to spot how relations are important. Let’s just represent this a different way, with the names of the sets at the top.

table

We can see now that a relation is effectively a database table. This is the fundamental element that underpins the entire relational database model. RDBMS’s (Relational DataBase Management Systems) all manage data in what have become known as tables, where table is synonymous with relation. All of the data is defined by relations and everything that you can conceivably do to or with these relations is defined by set theory.

Incidentally, RDBMS’s are not the same thing as ROUS’s (Rodents of Unusual Size). They’re a whole different kettle of ballgames, as they say.

Just step back and ponder the enormity of this. We have 3 sets, Names, Ages and Heights, each having the same cardinality (i.e. number of members). By using some fairly elementary maths we can combine those into a mathematical model of the now-familiar concept of a database table. There are some rules that these sets and relations have to follow but the power and flexibility this model gives us is truly amazing (compared with the disadvantages of the older database systems discussed in the last article).

So far this simple relation describes a single, real-world entity, namely a person. Ok, a person is defined by more than their name, age & height, but getting a model that more-closely resembles reality simply involves adding more sets for things like hair colour, shoe size etc. The relational model allows us to define relations for many different entities and to link them together in all kinds of ways.

As an example, let’s look at some set theory and see how this can extend our model. In set theory the Cartesian Product (normally written as “x”) over 2 sets gives a new set containing every element of set 1 combined with every element of set 2.

If:

A = {a, b}

B = {x, y}

Then:

A x B = { {a, x}, {a, y}, {b, x}, {b, y} }

How is this useful? Lets create a new relation, Employers. to list companies that people can work for and rename our earlier relation R2 to People. We can add a new column to the People relation, called Employer.

Employers = { {1, Smiths Ltd}, {2, Browns Ltd} }

People = { {Tom, 1},  {Dick, 1}, {Harry, 2} }

If we take the Cartesian Product of these we get:

join

Now we just need to get rid of every row where Employer (from People) doesn’t match Employer (from Employers). This will give (I have got rid of the columns I’m not interested in):

join2

It’s hard to grasp how revolutionary this was back in 1969, now that all databases work like this. We can now keep different lumps of data and combine them using trivial set theory. Each real-world entity (like a person or employer) now only ever needs to be defined once. If Smiths Ltd changes its name to Jones Ltd, that change only needs to be made in a single place, in the employers relation. This is one of the key requirements of properly-designed relational databases: “one fact – one place”.

Set theory contains many operations that let you combine sets (UNION), identify things which several sets have in common (INTERSECTION), subtract one set from another and so on. It would be tiresome for programmers if they had to process data using set-theoretical concepts. Just as high-level programming languages (Fortan, COBOL, C, C++, C#, Pascal, BASIC etc. [Look out- I feel another article coming on!]) shield programmers from having to know how the chips inside the computer work, so high-level data processing languages shield programmers from the grubby maths.

The database world has pretty much decided on a language called SQL (Structured Query Language) to process relational data. There is no consensus as to how this is pronounced; one man’s S-Q-L is another man’s Sequel.

In our simple example above the query “who works where” is expressed in SQL by:

SELECT People.Name, Employers.Name

FROM People INNER JOIN Employers

ON People.Employer = Employers.Employer

If we only wanted to see who works for Smiths Ltd we could add a restriction, such as:

WHERE Employers.Name = Smiths Ltd

This is hiding the mathematical complexities from us and allowing programmers to work in a language which is close to English. However, scratch the surface and the maths is still there.

In the next post I’ll discuss both relational calculus and algebra and dig a bit deeper into what SQL is doing.

 

ByGraham

A Little Database Theory – Part 1

The other day I came across a forum on the web and spotted a post by someone who wanted to know the difference between relational calculus and relational algebra. As happens so often on the web, all of the responses were written by people who had virtually no idea what they were talking about. So, having a couple of hours to kill I thought I’d give it a go.

Almost everyone has come across the term “relational database” and probably to many people “relational databases” and “databases” are the same thing. They’re not, but the relational model has become so popular that non-relational databases have (except in very specialist situations) pretty much disappeared.

I thought I’d split this article into 3 separate posts:

  1. What did databases look like before the relational model?
  2. What does “relational model” actually mean?
  3. A comparison between relational calculus and relational algebra

Let’s have a look at what the e-Quip database might have looked like before the Ted Codd presented his ground-breaking paper in 1969. Imagine that you have a book and you have decided to have one page per device. At the front of the book is an index which tells you the page number for a particular device. To get information about Equipment No 12345 you would look in the index and see that this device record can be found on page 253. You then quickly flick through to that page and Bob’s your uncle, you can see all the information you need.

If the book becomes a computer disk file then there were, and still are, plenty of ways of maintaining an index and quickly getting to the right part of the file based on the value from the index. Perhaps the most common was known as ISAM, or Indexed, Sequential Access Method. This meant that you used the index to get to the right place and then you read each character sequentially from that point onward.

If you’re in charge of the book you might come up with some rules, like:

  1. The first 10 characters on line 1 are the Equipment No
  2. The next 10 characters on line 1 are the Serial No
  3. The 2nd line holds the model name and description
  4. The first 10 characters on the 3rd line hold the purchase date in the format dd/mm/yyyy
  5. The next 10 characters on the 3rd line hold the installation date in the same format

So far so good(-ish). How about jobs? You might decide to have 2 lines per job, starting from line 20, which might give you enough space for 20 jobs.

isam

I have actually worked on plenty of applications like this. The biggest problems that we had were a) the dates were in dd/mm/yy format (oops! what’s going to happen the day after 31/12/99?) and b) nobody ever got round to writing down the rules so if we ever needed to add anything we just picked a bit of space somewhere on the “page” that looked vacant and used that. Unfortunately, after a few years the vacant spaces get few and far between. When space gets really tight things get difficult. Suppose we need to add another date (e.g. Replacement Date) but there aren’t 10 unused spaces in the file. There might be 3 spaces available on line 3, one at the start of line 4 and another 2 at the end of line 4, and 4 in the middle of line 7. To read this date the programmer needs to read data from 4 different places in the file, join them together in memory and then interpret the whole as a date. Writing a date is even more complex. The programmer would start with “01/12/2015” and have to split this into 4 separate pieces (“01/”, “1”, “2/” &  “2015”) then write each piece to the appropriate place in the file.

What happens when you finally run out of space on the page? Suppose a device has 21 jobs. You can’t just write this 21st job underneath the 20th because you will be overwriting the next equipment record. Well, there are things you can do, like start another book (with its own index) just for all the additional information do but it all gets complicated and the programmers have to handle the complexity.

Searching for things was very difficult. While you could easily find the right page for a particular device, how do you find all jobs completed last month? You start at page 1 and look through every line (from line 20 onward), then look at the date in columns 10 – 20. If that date is in last month, then add the Job No to your list. Then turn to page 2, and keep doing this until you have read the whole file. You could get clever and create more indexes at the front of the book but this all added to the complexity.

It’s not really surprising that these databases were often processed in batch. i.e. once a month you run a program which produces a whole load of printed reports. If you want to know how many jobs there were last month, then get last month’s report off the shelf and read it. Interactive software applications where you could ask any question any time you liked were not that common.

What do you do if some eejit writes the wrong thing in the wrong space? You have positioned the file at a place where you are expecting to see a date, but instead of “21/01/2015” you find “HELLO”. Just letting your program collapse in a heap wouldn’t be a good idea, although plenty did. You ended up having to write complex code to do simple things, just to take care of possible errors.

Another problem is that the file (i.e. the virtual equipment book) is completely separate from the application. A file is just a file – any programmer can write whatever they like to it, regardless of the format that you have specified. To keep the contents of the file meaningful you have to rely on every program that accesses it a) knowing the rules and b) sticking to the rules. Even if everyone does this, programs will inevitably have bugs. The programmer might have intended to write “Graseby MS26 Syringe Driver” starting from the 1st character on the 2nd line of the particular equipment record, but programmers make mistakes, especially when keeping track of where they are in a large, complicated file. The end result is that it is hugely difficult to prevent the file from being corrupted.

As an aside, I once worked on a database which managed the bonuses paid to travel agents depending on how quickly they submitted details of the travellers’ cheques that they had sold. The database was exactly like the ISAM file described above. In order to handle errors, the standard technique was:

1000 IF something goes wrong

1010 THEN LET X = 1/0

All of the applications were written Data General Business BASIC (which conveniently had ISAM built into the language). If you’ve ever used BASIC you’ll know that it is an interpreted language. The interpreter reads a line, translates it into machine code then executes it. It then moves on to the next line and does the same again. If something goes wrong the interpreter just stops and displays a message, then sits and waits for someone to tell it what to do. LET X = 1/0 is going to cause an error, since you can’t divide anything by 0. The interpreter will display something like “Divide by zero error at line 1010” then just sit and wait.

The computer operator could then call the programmers, who could then a) sit in front of the computer, b) find out what has gone wrong, c) fix it, then d) tell the interpreter to carry on.

From a programmer’s perspective this is awful. Every single database application is different, even in the same organisation. You might be an expert on one system and not have the slightest idea about another. Codd’s paper describing his new relational model, published in 1969, revolutionised the data processing world. Data processing was now based on mathematics. There were now rules which everyone knew and understood. That’s why virtually all databases that you see are relational and why the old-style ISAM (Indexed Sequential Access Method) databases disappeared.

My next post will look at what we actually mean by “relational model”.

 

ByGraham

Turing Machines and the Turing Test

 

Everyone these days has heard of Alan Turing, the father of computer science, the man who cracked the Enigma code and the eponymous “inventor” of both Turing Machines and the Turing Test. Science fiction tends to conflate the two, but they are completely different things, both emanating from that same genius.

The Turing Machine

I don’t work as a computer scientist. I know people who do, but because I have a mortgage to pay I work as a software engineer, even though my discipline is computer science. Computer scientists live in colleges, generally in Cambridge although there may be one or two in Oxford. Back in the 70’s there were even some in the US of A, copying the stuff they learnt from Cambridge (Ok, Kernighan, Ritchie, Thompson et al, this isn’t meant to be serious, please don’t sue me) . They don’t do anything as demeaning as writing software. So what do they do?

How to be a genius: step 1. Think in abstract terms. Einstein worked with thought experiments, like, “if I were sitting on a photon, what would I see if another photon went past me, and at what time would we both think it was time for tea?”. How did Alan Turing imagine computers before they existed? What’s even more astonishing is that Turing worked out the kinds of problems that you could and couldn’t solve with a computational engine before there was any such thing as a computational engine. That is the power of abstraction in the hands of a genius. I’ve only ever met one true genius in my life and his gift was to explain the complex in very simple terms. There are not many people in the world with this gift. Think Feynman, Turing, Einstein…

Can you imagine a machine that can add 3 and 4, without building the actual machine, even before such a machine has been built?

Turing suggested that you imagine a machine which:

a. Takes its input from an infinite tape made up of cells

b. Has a read/write head which can read the cell under the head

c. Has some instructions (a stored program) which tells it what to do depending on what is under the head.

For example, the instruction might say:

Do the following forever:

If there is a non-blank under the head Then

Move the head right to the 2nd blank cell

Write “1”

Move the head left to the 2nd blank cell

Move the head right one space

Erase the cell contents

Otherwise

Stop

These are the instructions to add two numbers together. When the machine stops the answer should be on the tape.

(I can’t do graphics so the following images are from https://vimeo.com/46913004)

turing1

If you just follow the instructions (i.e. program) then you will end up with:

turing2

It’s worthwhile just working through this example. It has nothing to do with binary notation or real tapes – it doesn’t matter whether each cell on the tape has 0’s, 1’s, apples or bananas in it. The model is purely conceptual.

Not only did Alan Turing come up with this fantastically elegant idea, he also defined the limits of this wonderful abstract machine; not only what could it do, but what couldn’t it do. Adding is fine, so is multiplication, division etc. But one very important question is, “if the input is infinite, is the machine ever guaranteed to reach the ‘Otherwise Stop‘ state?”. In Computer science this is called the Halting Problem. It says a lot that Turing was aware of this even back in 1937. Things have moved on a lot since then but computer science is still the science of the nature of problems that can and can’t be solved by computational engines.

It turns out that there is a whole class of problems that can be solved with Turing machines. If you’re interested you might want want to check out the Game of Life, invented by John Conway, which, while not only being a mathematical model for evolution also turns out to be able to solve any problem solvable by a Turing machine.

 

The Turing Test

Even back in the 1930’s as a young under-graduate, Turing could see the logical consequences of his assertions about what could and could not be computed with a computational engine. Naturally (if you’re a genius) his thoughts turned to machine intelligence. Imagine that you were having a conversation with another “person” on the other side of a wall. How could you be certain that you were talking with a person rather than a computational engine or robot, of some kind? This is similar to the famous “cogito” problem in philosophy. We all know that external factors can affect our thoughts. Descartes put this in terms of a mischievous imp but we can imagine that things like brain tumours, hallucinogenic drugs etc. can induce visions. The question is, how can we prove that what we see is real? Descartes reduced this to Cogito ergo sum (I think therefore I am). i.e. the fact that I am thinking means that I must exist. Everything else could be down to the mischievous imp. So imagine a conversation through a brick wall. How could you be certain that the entity (mischievous imp or otherwise) on the other side 0f the wall is intelligent? In Turing’s terms “is it possible for machinery to show intelligent behaviour?”.

This is the so-called Turing Test. The basic idea is that if you can’t tell the difference then artificial intelligence has been achieved (this is a massive generalisation! Many computer scientists would disagree  with this statement. Fortunately, this is a blog, not a PhD thesis). The really important point is that someone was even thinking these ideas before computers actually existed.

So, this is computer science. Turing Machines and Turing Tests are completely different things. Perhaps you did a physics degree but are working as a clinical scientist, bio-med or EBME engineer. The difference between being a “physicist” and a bio-med is similar to the difference between being a computer scientist and a software engineer. I studied the former, but my mortgage is paid by the latter.

It’s the difference between SQL and relational calculus/algebra (I’m sure there’ll be a post about this soon!).

As a post-script, many years ago when I was at university, like most students I developed a skill for knowing what was likely to be in the exams. Exam questions, by their very nature, have to be answerable in a given time or number of paragraphs. Artificial Intelligence is so mathematically complex that only the most general areas lend themselves to exam questions. Thus, when it came to revision, AI was not top of my list, as the maths was too complex for a 4-hour exam. Imagine my horror when the AI question required a mathematical explanation of the differentiation between circles and tubes in satellite photographs (i.e. rocket launchers versus sewage works). It’s a good job I wasn’t working for JFK in the 60’s as I would have recommended a pre-emptive strike on the Ely sewage works!

 

 

ByGraham

So What exactly is a Virus?

We all have anti-virus software and naturally  take precautions against viruses, but do we know what a virus actually is? It’s obviously a piece of software that does something nasty, but how does that nastiness manage to find its way onto our computer? The other day I was with Phil, Sarah and Peter (Bishop, not Spencer – we have way too many Petes in Integra!) discussing our latest joint project and the subject of SQL Injection came up. Now Peter is a pretty clever chap (he has a PhD and everything!) but this was was new to him. Don’t worry, it wasn’t new to me, but I thought it might be worth writing a blog article about. After all, Peter used to design turbine blades for Rolls Royce Jet (“Jet” starts with a capital letter to distinguish it from glider engines and other engines that work by magic, as opposed to “proper” engines which work by, umm, jet) engines. So if Peter doesn’t know this, maybe lots of other people don’t either.

Now Peter is a very clever chap and, in common with very many clever chaps, he thinks that everyone in the world is a jolly good egg. Not being a jet engine designer I know that the world is not exclusively populated by nice people (I have even met some of them). Let’s suppose that we have a web site with access controlled by a username and password. Obviously we must have a screen where we ask people to enter a username and password. Let’s say that a genuine user visits our web site and enters his username as “fred” and his password as “secret”. Not wishing to bore everyone with the details of SQL we might be tempted to execute a command in the database (i.e. SQL) to validate this username/password combination. The obvious choice is:

SELECT * FROM Users
WHERE UserName = ‘fred’ AND Password = ‘secret’

(forgetting for the time being that no good system designer would ever store a password in a database in plain-text)

Well, that looks harmless enough. If the database USER table has a record for UserName = ‘fred’ where the password is ‘secret’ then what we have here is a good egg logging on to our application.

Suppose for just a minute that the world isn’t populated exclusively by PLU (“People Like Us” – if you’re a child of the 60’s you may recall this Nancy Mitford TLA). What if the user typed in:

”rubbish’ OR 1=2

When our application sends this to the server it turns out to be:

SELECT * FROM Users
WHERE UserName = ‘fred’ AND Password = ‘rubbish’ OR 1=1

There is a very good chance that the password in the database for the user Fred won’t be ‘rubbish’, but if you have an understanding of Boolean logic you will realise that while Password = ‘rubbish’ is FALSE, 1 = 1 is TRUE. As the Boolean OR evaluates FALSE OR TRUE to be TRUE, the user has managed to gain access to our system.

So far so bad, but things can get worse, much worse, Let’s suppose we have a really bad egg visit our web site. Now e-Quip is an equipment management database and so a well-informed hacker might guess that there was a table in the database called EQUIPMENT, or maybe INVENTORY. If this hacker guessed that e-Quip was based on Microsoft SQL-Server, then in this particular database system it is possible to chain together multiple commands in a single SQL statement, using the “;” character to separate the commands. What would happen if the user typed a password of:

”rubbish’ OR 1=2;DELETE FROM Equipment;DELETE FROM Job

A curious, but disastrous password! I don’t think you need to be a rocket scientist (or even a jet-engine scientist like Peter) to work out that not only will this password give you access to the system, but it will also destroy the system. If there isn’t a table called Equipment but there is one called Inventory that’s no problem for the hacker. It’s pretty simple to try out millions of combinations of names just by writing a little bit of code.

This kind of attack is known as “SQL Injection” and fortunately, programmers are well aware of this vulnerability and guard against it (I won’t go into details as that would make life way too easy for the bad guys). I used SQL in that example because the SQL language it is sufficiently English-like for anyone to get a rough idea what is going on and allow you to get the idea that bad things happen when code and data get mixed up. Although there are lots of SQL injection attacks out there, the baddies are way cleverer than that (some may even be cleverer than Peter) and they can turn data into executable code in lots of very clever ways.

I suppose that technically speaking that’s not really a definition of a virus, it’s more like a description of how the bad guys can trick an innocent application into doing something that it was never intended to do. What makes it a virus is the ability to replicate itself in another computer. Once a bit of software is doing naughty things in your name it’s only a small step to emailing itself to everyone in your global Address Book. That’s when it becomes a virus.

In case you’re thinking “why would someone bother to do this to me?”, it’s not you personally, it’s everyone. There are bits of software running 24 hours a day trying every combination of IP addresses and sending thousands of combinations of attacks to them. Here’s an extract from the Windows Event Log on a computer running Microsoft SQL-Server Express. This database server comes with a built-in system administrator account called “SA”.

Login failed for user ‘sa’. Reason: An error occurred while evaluating the password. [CLIENT: xx.xx.xx.xx]

(I’ve blanked out the bad guy’s IP address in case they sue me, or send the boys with baseball bats round)

On the computer I looked at this is happening every 60 seconds, all day, every day. A piece of software sitting on a perfectly innocent computer (possibly yours) is sending out millions of SQL-Server attacks in the hope that someone didn’t disable the “SA” account when they installed SQL-Server Express. How does it know where SQL-Express is installed? It doesn’t matter – when you have enough computers around the world doing this 24/7 it’s easy enough to try every computer in the world.

Ok, that’s not a virus, that’s somebody exploiting a security flaw. But the software that is doing this almost certainly got onto someone’s computer as a result of a virus. So let’s be careful out there and be sure to keep our data separate from our executable code!

 

 

 

ByGraham

Hacking Medical Devices

Pete Spencer came across this article. http://www.wired.com/2014/04/hospital-equipment-vulnerable/ You might find it interesting

 

ByGraham

Q. Just how easy is it to change your password?

Answer. A lot Harder than you might think!

The other week it occurred to me that I really ought to get around to changing the passwords that pretty much control my life. You might think that would be pretty simple, but this “project” has been going for 3 or more months and I still feel like I’ve only scratched the surface. I have more than once thought that it was a mistake to start, but going back to where I started is probably no longer an option. So what better way can there be to unburden myself of all of this self-inflicted stress than to inflict some of it on you?

Some Password History

It’s funny how attached you can become to a password. Years ago, in the early days of CAD/CAM in Cambridge, all of us at Integra were involved with a CAD system called Medusa. I chose “medusa” for my password and it’s  a safe bet that 90% of my colleagues and students did the same. It certainly made life easier – using Medusa was hard enough without having to remember a password as well!

In the early-90’s I found myself at Canary Wharf working for a Swiss bank in the back office of the derivatives team. Nowadays derivatives traders get a lot of bad press. True, they may have brought the financial system crashing down around our ears and have condemned the country to 10 years of austerity that we will probably never recover from, but fair’s fair – they were probably just having a bad day. Aside from developing a taste for Bolly I also picked up a new password. Naturally, being in a secure banking environment our passwords had to be ultra-secure, and unfortunately they had to be at least 8 characters long, so good old medusa couldn’t withstand the rigours of  investment banking. I knew that I had no chance of remembering 8 letters so I asked a colleague what her password was (did I mention that this was a secure banking environment?).  This was my first encounter with the cunning use  of numbers which look like letters since learning how to write “hollies” (a popular beat combo of the period) on an LED calculator way back in the 70’s. It had never occurred to me to use this feature in passwords. Well, her password couldn’t be bettered as an example of obfuscation, so I decided to use it unchanged. Thus was born “soonbesummer”, made up of a combination of letters and numbers, which had the added advantage that if I ever forgot it I could always ask her. I liked it so much that I continued to use it for almost 20 years. One of the nice things about “soonbesummer” was that if I was forced to change it I could simply switch to “soonbewinter”.

Both “medusa” and “soonbesummer”/”winter” have stood me in good stead over the decades, but times have changed. We need to take things like identity theft very seriously these days and so these passwords have now been consigned to the dustbin (that’s what we used to call wheelie bins before they had wheels on) of history.

How Did it Come to This?

Years ago computers were relatively rare and it was pretty unlikely that you would have access to more than one.  To run a program you had to a) get into a building connected to computer, b) find a terminal and “log on” (because the system kept a log of what you did), for which you needed a username and password . Then you had to know enough about the particular operating system in order to find and run a program. That was considered enough of a barrier – no further protection was deemed necessary.  Computer programs existed then to do useful work. What would have been the point in gaining illicit access to a computer only to find that you could do nothing except calculate the hoop stresses in thin-shelled pressure-vessels? The computer itself was guarded, like a queen in a hive, by the drones of the Computer Department  (it was quite a few years later that people starting to call it IT, probably about the same time that Peking was renamed to Beijing). The Computer Department existed to a) keep people off the computer and b) play with the computer. Most of their effort was  expended on a) because that made the computer go faster for b). I know this to be true, because in a past life I was the VAX/VMS Systems Manager at McDonnell  Douglas in Cambridge, and I spent 60% of my time doing combinations of a) and b), and the other 40% on my knees in the machine room trying to coax computers into life.

When PC’s came long things got better. The “P” in “PC” stood for “Personal”: the Computer Department couldn’t tell you to stop using it because you  were slowing down their attempts to print pictures of naked ladies using a line printer.

 Ok, she’s not naked, but you get the idea.

You could use PC’s whenever you wanted with no interference from “IT” and with no pesky passwords. Then one day, someone in the  Computer Department looked up from his terminal, peered out from behind his beard and noticed that nobody except them was using their computer any more. Great news for some (more time to play) but bad news if you wanted to keep your job. So in a stroke of pure genius – they invented networking.

Unfortunately, because of the “P” in “PC”, everybody could do pretty much whatever they wanted to with their PC’s, and they did exactly that, which meant it wasn’t long before all the PC’s stopped working. This was made much worse by the complete absence of any backups. Of course, once you had broken your PC you were on your own. Imagine calling a professional mechanic after your attempts at dismantling your gearbox have failed.  Well, try taking your PC to the Computer Department when it fails to boot after some seemingly harmless editing of CONFIG.SYS. Imagine a pod of whales all simultaneously taking a very deep breath before plunging down into the Stygian  depths; that would be as the mere murmuring of a gentle summer breeze compared to the collective sharp intake of breath from the bearded ranks.

Anyway, back to the plot. Very cleverly they decided that they should join up all the PC’s together with string, and then they would be in charge of the string. Just for good measure they put a computer with lots of disks into the machine room (right where the VAX used to be), and said we could use their big disks instead of the tiny ones on our PC’s. They would even do the backups! I’m ashamed to say that we fell for it.  To use the disks you had to use the network, for which you had to have a password. They were back in charge; if they took their string or disks away you couldn’t play any more, and the space where the VAX used to be filled up with more and more file servers. Funnily enough, the servers then gradually got bigger and bigger until they took up all of the space  where the VAX used to be, and everybody was happy again. Most of us now had computers on our desks and as long as we were nice to IT, we could even use them every now and then.

Going from Bad to Worse – The Internet

As long as you didn’t need to venture further than the end of the Computer Department’s string, things still weren’t all that bad. DBase came along and you could use your PC (albeit no longer quite so personal) for databases, but not many of these required you to use passwords. Nobody kept anything important on PC’s anyway, so it didn’t really matter, and in order to use your PC someone had to first get into your office.

You may remember the days when a program was delivered as a set of 15 disks: 1 for the program itself, and another 14 for all of the printer drivers. What did you do when some enterprising company brought out yet another model of printer? In order to get a new printer driver you could either pay the manufacturer to post you a copy (probably with a compliments slip stapled through the disk), or you could use a bulletin board. Probably the biggest in the world back then was CompuServe. You could connect your 300 baud modem (I still have it in the garage, next to the tape deck) and dial in to their service, and from there you could do things like download updated device drivers and ask technical questions to people clever enough to answer them. Of course, in order to use CompuServe you had to have a username and password. Worse still, almost everyone that had something useful there also required a username and password.

We were on a slippery slope, the gradient of which rapidly sharpened with the arrival of Windows 3.1. At last ordinary people had a reason to use a computer – to play solitaire. Then Tim Berners-Lee invented “the Web” and the slippery slope turned into a death spiral (apparently it’s OK to mix metaphors on the web). Companies like CompuServe turned into ISP’s; browsers like Mosaic allowed us to surf rather than to type, and bulletin boards became forums (yes, that is the correct plural of forum). We didn’t need line printer drawings of naked ladies any more, we could get actual pictures of naked ladies!

Around about this time the people who built hardware and wrote software realised that they didn’t have to provide any manuals any more: they could just put them on the web. The people running call-centres realised that they could sack all of their staff and replace them with an on-line forum. I have to say that I’m a great fan of this trend. On a forum I can always find someone who really knows what they’re talking about, whereas the chaps on the phone only ever seemed to tell you to turn it off and back on again. The forum has replaced product documentation. If I get error “x800B0100” when installing Windows 7 Service Pack 1, I don’t reach for a manual, I reach for Google. The problem is that now I have a forum account for just about every software product that I use – and they all need a password.

So, we have arrived at a point at which the number of “medusa”s and “soonbesummer”s has multiplied out of control. Despite the Second Law of Thermodynamics, something must be done to reduce the chaos and restore some order. And so, like an intrepid explorer  I set out into the void to boldly go and change my passwords.

A Strategy

What I needed was a strategy, and perhaps things might have gone better had I come up with a strategy before I set of on my journey; but you can’t change the habits of a lifetime. The first thing that became obvious was that I would need more than one password. There are some passwords that we share within the company. Anyone at Integra, for  example, can use my account on the Infragistics forum to get technical advice, software patches etc. The worst thing that anyone can do with this account is to ask really dumb questions on the forum under my name and make people send me emails like “when you say that the text was blank do you mean that the output was a non-blank series of blanks, terminated with a blank, or was it just blank?”. That’s fine, I can live with that, but I don’t want to give these guys access to my bank account.

Then again, we also occasionally have to tell passwords to customers. We try hard not to, but sometimes you just can’t help yourself. It’s a bit like telling someone your home phone number: it goes against your principles but one day you really need to talk to someone and your mobile signal disappears so you call them from the land-line. They call 1471, find your number and that’s it, they’ll call you at home every time you don’t answer your mobile.

So, I’ll be needing three passwords: 1 for me, 1 that I can share with Integra and 1 that I can share with just about anyone else. Now all I need to do is to think of 3 random combinations of letters & numbers, with at least 1 upper-case and 1 punctuation character. On top of that it can’t contain more than 4 characters from my old password or some sites won’t allow it. Whatever I come up with has to be easy to remember and difficult to guess. Hmmm. I’ve used “medusa” for nearly 30 years, I can’t use the wife’s name again! I could use my kids’ names and birthdays, only I can’t remember them.  In the end I give up on logic and decide to look out of the window and choose the 1st 3 things that I see.

Where do I Start

Any adventure starts with the 1st step, and mine starts with the local, non-networked things closest to me. The thing that I use most is SQL-Server, and I have 2 laptops (one of which has 3 virtual machines, each of which has SQL 2000, 2005 & 2008. The other laptop only has 2000 and 2005), plus the server which fortunately only has SQL 2008. So that’s 12 “medusa”s gone. Now all I need to do is reconfigure every single product that I support which connects to SQL and change the client password. I reckon that’s another 5 gone at least. Changing the connection string in the configuration file for http://demo.e-quip.uk.net is probably the last.

The next step is to change my Window’s logins on all of these computers, so that’s 3 more gone, along with the administrator password on the server. So far at least 22 “medusa”s have bit the dust. Now I can start on networked stuff. The hardest of those is going to be email. BT has a strange approach to accessing services on line. They give you one username which you have to use for 2 purposes, 1) as your username to log on to the web site, and b) as your username to access your email. Not only is this inconvenient as you have to log in twice, but the web site insists that you have a punctuation character in your password, while the email system won’t allow punctuation. Of course, this isn’t documented anywhere and it takes a few wasted hours to discover that I’m going to need a 4th password.  Anyway, that’s a couple less “soonbesummer”s.

Right, now my email server has a new password, I’ll change to change my email clients accordingly. I only use Outlook on my Windows 7 laptop, so that’s one change, but I also use email on my phone, so that’s one more. Normally to configure your email you need to enter passwords to access the POP3 server (for incoming mail) and the SMTP server (for outgoing mail). Since my recent BT email “upgrade”, BT now use Microsoft Outlook 365 to provide their mail services. Unfortunately the Outlook 365 servers won’t allow me to send emails from graham@integra.uk.net, since I am clearly the only one of BT’s customers in the entire country that doesn’t want a @btconnect.com email address. They have created an additional account for me which allows this, but Outlook 365 clearly hasn’t been told about these features. So I have to used one account for Outlook 365 POP3 services, and a different BT server for SMTP. At least they now have the same password, even if I can’t remember both of the account names.

I forgot that my router also has a password which I need to change just so that I’m being consistent. The only problem there is remembering how to connect to it. I’m sure it’s 192.168.something or other. Fortunately a bit of Googling tells me that with Google Chrome I can just type in “router”, which is nice and simple. I must make a note of that somewhere. Where was I?  I mustn’t forget my KIndle. That has to connect to Amazon using my email address and password, so that has to change too. That reminds me, I sometimes use the Kindle reader on my wife’s i-Pad so I’ll need to change that too. So, server, laptops (real & virtual), phone, Kindle, i-Pad, I must almost be done; time to start looking at my on-line accounts.

The accounts that I use most often are:

1. Microsoft Passport, which covers almost all Microsoft sites

2. The Infragistics support forum

3. Our web domain management control panel

4. Our web hosting management control panel

5. Our blog control panel

6. Out ftp site

That’s 6 more “soonbesummer”s gone. As far as work goes, all I have left to do is change the passwords for the accounts that I don’t use very often. These include: a) Virtual Link (who we buy our icons from), b) Component Source (who we buy most of our non-Microsoft software from), c) Norton (for our firewalls & anti-virus software), d) WinZip, e) Crystal Reports support account, f) Adobe, g) Dell, and a few more that I rarely use and have completely forgotten the passwords for.

On the Home Straight!

That’s probably most of the work stuff out of the way, now for my Personal Accounts.  In no particular order of precedence we have: i-Tunes, e-Bay (I’ve got some great Meccano brass gears, if anyone’s interested), Google, Barclay Card, a couple of on-line store accounts, Holiday Inn, British Gas, E-on, Cambridge Water, BT (home phones, not email), and the list goes on.

There are some accounts that I will probably never ever be able to use again, even if I want to. Fortunately I have never really bought into the whole “so-shul-meeja” concept, so I don’t have to worry about Facebook or Twitter or tosh like Friends Re-united. The closest I got to this was the family history research site ancestry.co.uk (one less “soonbesummer” to worry about).

So, I think I’m nearly there, but today my wife asked me if I could check on the progress of the planning application to replace the old stable next to our house. I went to http://www.huntsdc.gov.uk/ and guess what I was confronted with?

You Couldn’t Make it Up!

So even though I think that I must have changed getting on for a hundred passwords on about 10 devices (virtual or otherwise),maybe I have a bit further to go.

One last thing: I use Microsoft OneNote to keep track of things like lists of passwords, and I have updated this with all of the new instances of my 3 new passwords. Naturally, it’s password protected. Those of you familiar with Bertrand Russell’s famous paradox will appreciate the irony of:

 

ByGraham

A Developer’s Day – Wrestling with Filters

I Need to Make Some Space in my Head – What can I Throw Out?

This is the second article in an informal series of shock exposés in which I lift the lid on the white-knuckle ride which constitutes the daily rough and tumble of the E-Quip gerontocracy. Over the years I have written a lot of code, a fair proportion of which seems to work, and somehow or other I seem to remain in control of all the 0’s and 1’s floating around in my head. There are, however, two exceptions – bits of code which, while not exceptionally complex,  I seem to have to re-learn every time I look at them.

The first example is Transformation Matrices, which are central to our 3D graphics and CAD applications. In case you don’t know, if you represent a point in space as a 4-element column vector (called homogeneous coordinates), there are several matrices that you can multiply this vector by to do things like rotation, scaling, transformation etc. This is how CAD systems manipulate graphical objects. It’s not rocket science and doesn’t really involve anything more complex than matrix multiplication. Twenty or more years ago I wrote a graphics library to handle all this good stuff and, for the most part, it just works. It doesn’t need changing very often: once to add support for Windows (that will give you an idea how long this library has been around), again to add support for 32-bit processors (did I mention that this code has been around for a while?), once to convert it from C to C++, once to add support for 3D, and again to convert it to C#. The trouble is, every time I look at this code, it’s like it’s the first time I’ve ever seen it! I don’t just have to “brush up” on my theory, I have to dive back into Foley & Van Dam or reach for my old university lecture notes (yes, I do have all of my old lecture notes). These algorithms (or, perhaps more accurately, how I turned them into code over 20 years ago) just won’t stay in my head.

The second example is expression evaluation, which lies at the heart of filtering in all of our products. There is absolutely nothing difficult about parsing an expression from a given grammar; it’s a pretty common 1st-year undergraduate assignment which comes somewhere between the matriculation dinner and finding your coat-hook and locker.

A grammar defines a set of rules which are used to build an expression tree. Grammars are often described using BNF ( Backus – Naur Form), and they specify things like how factors are combined to form terms, and how terms combine to form expressions. A very simple example is shown below. Of course, being software developers we have to be different, and in the interests of bio-diversity, our trees have their roots at the top and their leaves at the bottom. I have yet to work out where worms and/or birds appear in these trees, yet alone which way up the nests need to go.

Each node of this tree has three key features:

1. An operator (+, – etc)

2. Two operands, Op1 & Op2

A really simple expression, like “1 + 2” would be represented by a single node tree where the operator is +, Op1 is 1 and Op2 is 2. In more complicated trees like the example above, both Op1 & Op2 can be other tree nodes.

Filter Expressions

A filter is also an expression, albeit a Boolean rather than an arithmetic one, and so they are handled in a very similar way. The tree is slightly different to a simple arithmetic expression because the operands are predicates, i.e. statements which can evaluate to true or false. The tree below is taken from the ICL Reference Documentation (ICL is a set of class libraries used for all Integra products): it is the parse tree for the filter: A = B OR ( A = 5 AND B = 3)

Note that brackets can be used to change the way that the tree is built, but do not actually appear in the tree itself. Turning this parse tree into SQL (that’s the language used to get data out of the database) is simply a case of doing a recursive, left-right tree walk of our upside-down tree, starting from the root (i.e. the bit nearest the sky) translating each node into SQL syntax as we go.

On the off-chance that you’re interested, compilers translate entire software programs into parse trees in exactly the same way as this, the only difference is that the grammar is more complex, with each programming language having its own grammar. Compilers often have a front end which builds the parse tree, and then a back end which does the tree-walk to turn the tree into assembler instructions.

Now, every time I have to make any changes to the code which implements this, I run straight into the old mental block and have to go right back to basics. Fortunately this isn’t code that changes very often, but, like transformation matrices, there just doesn’t seem to be any room for this in my head on a long-term basis.

What’s the Problem?

The thing that makes this complicated isn’t creating the parse tree itself; that really is pretty simple stuff. There are three things about filters that make them complicated-ish:

1. They use public names for data, rather than the names that are actually in the database. The predicate [Status = ‘Active’] must be translated into “EquipmentStatusShortName = ‘Active’“. This means that when we do the left-right tree walk (keep up!) to translate the filter into SQL, each public name has to be looked up in the database to find its equivalent database name.

2. They are defined in client-side code and have to be sent to the server. E-Quip uses a type of formatted text called XML to communicate between the client and the server, and unfortunately there are characters (<, > and ‘ are the most common) that have special meaning in XML but which are commonly used in filters. An expression like [Unit Price > 5000] can’t be sent directly to the server because XML is using the > character for its own purposes.

3. The filter processing code itself has many reserved words and characters which users also might want to include in filters. For example, the filter [Status = ‘Scrapped – Not in use’] actually includes the words “Not In” which have special meaning to the filter mechanism. We also use the [ ] characters to delimit predicates which makes building the parse tree much easier, but which cause problems if users actually want to include those characters in filter text. To make things even trickier, [ & ] are actually SQL wildcards: “Model LIKE 3[1-5]00” means all models where the name has a 3 followed by any number between 1 and 5, followed by 00, i.e 3100, 3200, 3300 etc.

Problems with BETWEEN

In fact, it is exactly this last point that has prompted me to write this article. A filter expression is a series of terms separated by OR’s, each of which can be a series of factors separated by AND’s, just like an arithmetic expression is a series of terms separated by + and -. You might not have noticed, but in version 2.0 support for the BETWEEN operation was added to E-Quip filters. Now, most predicates are made up of three bits: a field name to filter on (e.g. Status), an operator (e.g. =) and a comparison value (e.g. Active). Put these three together and you get the filter [Status = ‘Active’].

BETWEEN is unusual in that it doesn’t have three components, it has four, and the last two are separated by the word AND, e.g. [Purchase Date BETWEEN 1-JAN-2010 AND 31-DEC-2010]. This is particularly inconvenient, not only because of the extra operand, but we have already seen how the word AND is used to separate the factors in a filter expression. Although this doesn’t cause any problems when evaluating the filter, it does confuse the Filter Wizard when you open a saved Query Builder filter which contains BETWEEN. This relatively simple type of filter is always made up of a number predicates ANDed together, and the Filter Wizard uses the word “AND” to split a filter expression up into its constituent elements. It then writes each factor to a separate line in the Query Builder. This extra AND convinces the Query Builder that  the expression has two factors:

1. Purchase Date BETWEEN 1-JAN-2010

2. 31-DEC-2010

Neither of these terms is valid according to our grammar, and if you applied this filter you would get an error. Note that the problem only appeared in the Query Builder and not in any of the other Filter Wizard options, because only the Query Builder attempts to split an expression into individual terms.

The solution to the problem is pretty straightforward: because we are in charge of the translation we can use whatever words we like. So, instead of using BETWEEN … AND, I just changed the code to use FROM … TO. Hey presto – the extra AND has disappeared and sanity has been restored. We do a very similar thing for the LIKE predicate. Strictly speaking the opposite of LIKE is NOT LIKE but that extra word is problematic and makes the creation of the parse tree a tad more complex, so we use the word UNLIKE instead (try it – create a filter using the DOES NOT CONTAIN comparison and look at the filter that is produced).

So, it took me 5 minutes to come up with a solution to the problem, and then the rest of the day to dig through the documentation (yes, we do have documentation) to remind myself how this stuff all hangs together and to check to see if my simple fix would introduce other problems.

So that’s it for filters – until the next time. By the time I have clicked on Publish to put this article on the blog, everything that I have just re-taught myself about about filter expressions will have been flushed from my brain yet again. Perhaps I should be glad that there are only two things that seem not to fit in there!

 

 

 

 

 

 

 

ByGraham

A Developer’s Day – A Bad Day at Stansted!

The Lost Passport & Driving Licence Saga

I thought I would write a series of occasional articles about the kind of nonsense that fills my day, either at work or otherwise, and to kick this off I shall share with you how a software developer goes about replacing a lost driving licence. Think of the old “how many … does it take to change a light bulb” jokes, and you might see why we developers are not often allowed out to meet other people! In my defence, I had plenty of spare time because I was confined to bed for a few days, and Maria insisted that I was not allowed to work, otherwise I would have been hiding under the covers with my torch dashing off E-Quip code – honest!

On the way back from Dublin a couple of weeks ago I left my wallet (containing £100 and all my plastic) and passport (which had my driving licence in the same holder) on the long-stay parking bus at Stansted airport. This would be a pain at the best of times, but I’m  off to Italy shortly, for which the passport and driving licence would come in very handy.

So, first things first – all plastic cancelled. Next, how do I get a new passport and driving licence in 2 weeks. I can fast-track my passport application and get that within a week, so no problem there, but what about the driving licence? I can apply on-line, for which I need my current driving licence and passport numbers, but of all the important numbers that I can remember (the wife’s birthday, wedding anniversary, my old RAF service number, my first ever phone number, the number of yards in a mile and the coronation date of George III), I have no record of either of these two numbers. Incidentally, George III was crowned in 1760, which is the number of yards in a mile, so that should perhaps only count as one number.

The only way that I can get a driving licence without knowing my passport number is to apply by post, but unfortunately DVLA want all the same documents as the Passport Office, and neither of these will accept copies. This means that unless I want to wait 6 weeks I have no choice but to find these two numbers.

Getting my driving  licence number wasn’t too bad. When I changed my car a few months ago the nice people at Peugeot photocopied my licence, so all I had to do was call them, convince them that I was who I said I was, and they sent me a copy. So far, so good. Now for the passport number. I’d just come back from Dublin, and I still had a bit of my boarding pass – surely your passport  number is printed on that?

Diversion 1 – Boarding Passes

Had all this happened a year ago that would have been the end of the story, but Ryanair no longer print passport numbers on boarding passes.

However, just because the passport number doesn’t appear on the printed boarding doesn’t mean it’s not there! When your aircraft eventually arrives and you finally get to the end of the queue at the gate, there must be some way for the keen, efficient, polite & courteous Ryanair staff to check your passport number against your boarding pass. No prizes for guessing that it must be embedded in the data matrix bar code.

Unfortunately, I don’t happen to own a data matrix scanner, but I do own a mobile phone, and a little bit of searching found a couple of free “apps” for my phone which claim to be able to read bar codes. A after a little bit of experimentation with a packet of Paracetamol and a jar of anchovies I’m ready to move up from bar codes to data matrix. Unfortunately, while my phone could happily read bar codes, even though the software had an option to read data matrix labels, it failed at every attempt, so I’m back to square one.

By now I should have spotted all the characteristic signs of a project that was set to waste a huge amount of time, but, undaunted, I stumbled on. A bit more Googling told me that the standard symbology adopted by IATA for boarding passes is called PDF-417, and its use is defined by a standard called BCBP (Bar Coded Boarding Pass). Still more Googling found some web sites that can decode PDF-417 labels, and having far more spare time than common sense I persevered and headed to http://www.datasymbol.com/barcode-recognition-sdk/barcode-reader/online-barcode-decoder.html. Success!

This is what is hidden in the boarding pass:

M1STANBURY/GRAHAM LIKWFK STNDUBFR 225 088Y000 104 10C DEMO

4D315354414E425552592F47524148414D2020202020204C494B57464B2053544E445542465220

203232352030383859303030203130342020313043202020202020202044454D4F

0 10 146 10 0 603 0 603 146

Surely my passport number is buried in that little lot somewhere – perhaps I should have stayed on the IATA BCBP web site a little longer! Any code is easier to crack if you have a crib, just ask Alan Turing, and I will admit that knowing that my name is Graham Stanbury and that the boarding bass was for flight FR 225 from Stansted to Dublin did contribute significantly to simplifying the decryption. But I still couldn’t see anything that looked like a passport number, so back the IATA BCBP web site to dig into the standard.

The first character, “M” means that this is a multi-leg flight, and the next number indicates how many legs, so “M1” means that this is a multi-leg flight consisting of 1 leg. Hmmm, I can’t say  I’m entirely sure about a multi-leg journey which consists of a single leg as a philosophical concept, but I’ll leave that one for the philosophers.

After the flight number FR 225 there is 088, which, according to the standard is the date of the flight, i.e. day 88 which is 29th March. The “Y” refers to the compartment number. Now I’ve never found a Ryanair flight which has an upstairs bar area, or even a little blue curtain between the front and the back, so I guess this is always “Y” for Ryanair. Then we have the seat number – for any Ryainair frequent flyers I’ll just allow a few minutes for the laughter to die down.

Then we’ve got the check-in sequence number, 104. I guess that this means that I was the 104th passenger to check in for this flight. The next “1” is the passenger status, which I think means “adult”.

After that we have OC, which as we all know is 12 in hexadecimal. Which means that the next chunk of information is 12 characters long. All the programmers among you will have immediately spotted the sequence of 20’s in:

202020202020202044454D4F0 10 146 10 0 603 0 603 146

Now all programmers know instinctively that 20 is hexadecimal for 32, which is the ASCII space character, which means there are then 8 spaces, followed by “DEMO”. i.e. 44 is 68 in hexadecimal, which is the ASCII code for D, and 45 represents E, etc. Should we be concerned that the word DEMO appears in here, almost like this is part of a test system which has been released by mistake? So that takes care of the 12 characters,  now it looks like we’re getting close to a passport number! Is this it?

0 10 146 10 0 603 0 603 146

And that’s round about where the wheel fell off!  The last block of boarding pass data is the security section, and it is airline specific – well, it bloody would be, wouldn’t it! There is a group of 9 numbers, and a passport number is 9 digits long, but these aren’t ASCII characters representing numbers, because those only range from 48 to 57 (0 – 9), which means the passport number is encrypted.

Still, if anyone fancies a challenge (believe it or not there is a limit to how much time I will waste on this kind of adventure), it can’t be very complex because it has to be readable by any scanner at the airport, and it can’t use 1-way encryption for the same reason.

Diversion 2 – Internet Browser Caches

Ok, so I have to give up on the boarding pass strategy, but this is where we software developers show that we can waste time on an industrial scale – did I give up? Not likely!

In order to get the boarding pass in the first place I had to check-in on-line. Surely the details of that would still be in my web browser cache? Can’t I just open up the history window of my browser and go back and look at the web page where I originally entered the passport information?

I use Google Chrome, and this is what my cache looked like for the day in question:

I clicked on each of these and got nothing but error messages. So, give up or dig deeper? No contest!

Did you know that you could type chrome:cache as a URL into Google Chrome? This shows you the browser cache in a lot more detail, including HTTP headers and the binary file contents, but still failed to yield a passport number.

Yet more Googling uncovered a tool called the Chrome Cache Viewer, which shows more information about your cache than you could imagine, but did it show my passport number? That was a rhetorical question, by the way –  no, it didn’t!

The Solution

There must be some way that equipped with all of this technology I can find my blasted passport number. Well yes, there is. Peter reminded me that back in 2010  our accountant had photocopied both of our passports. One email – problem solved – Italy here we come 🙂

So was all of this a waste of time? Far from it. I’ve learned some new stuff, I can now use my mobile phone to check bar codes on jars of anchovies, and it’s even looking like I might end up with a driving licence and a passport. Result!