Page 1: Identification Numbers and Error Detection

Identification Numbers and Error Detection

Meredith Wachs

Page 2: Identification Numbers and Error Detection

Where do you find identification numbers?

Checks
Credit cards
Driver’s licenses
VINs
ZIP codes
Social Security numbers (SSNs)
ISBNs
Bar codes/UPCs (Universal Product Codes)

Page 3: Identification Numbers and Error Detection

Check Digits

Used to reduce error.

“Grocery items, credit cards, overnight mail, magazines, personal checks, traveler’s checks, soft drink cans, and automobiles” have check digits (COMAP).

Page 4: Identification Numbers and Error Detection

Congruence mod n

“a is congruent to b, mod n, if n divides a-b” (Stillwell, Elements of Number Theory).

For example, 27 is congruent to 6 mod 3 because 27-6=21, which is divisible by 3.

Similarly, a number is congruent (mod n) to its remainder after being divided by n; for example, 20/3 = 6 R 2, so 20 ≡ 2 (mod 3).

Page 5: Identification Numbers and Error Detection

Money Order

COMAP example: The identification number on a money order is 63024383845.

The last digit, 5, is the check digit, so the number formed by the first 10 digits should be congruent to 5 (mod 9).

How do we divide such a large number by 9? How do we know this?

Page 6: Identification Numbers and Error Detection

Calculating Divisibility Rules

Take a number z = abcdefg, where each letter stands for one digit.

This could also be written as z = g*10^0 + f*10^1 + e*10^2 + d*10^3 + c*10^4 + b*10^5 + a*10^6.

Let’s replace each 10^n by its congruence mod 9.

Page 7: Identification Numbers and Error Detection

Divisibility by 9 continued

10^0 = 1 ≡ 1 (mod 9)
10^1 = 10 ≡ 1 (mod 9)
10^2 = 10*10 ≡ 1*1 = 1 (mod 9)

Continuing in the same way, we see that every place has a weight of 1, so a(1)+b(1)+c(1)+d(1)+e(1)+f(1)+g(1) leaves the same remainder (mod 9) as abcdefg. This is our divisibility rule: a number and the sum of its digits leave the same remainder when divided by 9.
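The digit-sum rule can be tried on the money order number from the earlier slide. This is a minimal sketch; the function names are made up for illustration and are not from COMAP.

```python
def digit_sum_mod9(number: str) -> int:
    """Remainder mod 9 of a decimal string, using the digit-sum rule."""
    return sum(int(d) for d in number) % 9


def money_order_is_valid(id_number: str) -> bool:
    """The last digit is the check digit; the digits before it must leave that remainder mod 9."""
    body, check = id_number[:-1], int(id_number[-1])
    return digit_sum_mod9(body) == check


print(digit_sum_mod9("6302438384"))          # digit sum is 41, and 41 mod 9 is 5
print(int("6302438384") % 9)                 # direct division gives the same remainder
print(money_order_is_valid("63024383845"))   # True
```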

Page 8: Identification Numbers and Error Detection

Divisibility by 11

We’ll proceed in the same way:

10^0 ≡ 1 (mod 11)
10^1 ≡ -1 (mod 11)
10^2 ≡ (-1)*(-1) = 1 (mod 11)
10^3 = 10^2*10^1 ≡ 1*(-1) = -1 (mod 11)

In this way, we see that abcdefg ≡ (g+e+c+a)-(f+d+b) (mod 11).

Ex. 3458679 ≡ (9+6+5+3)-(7+8+4) = 4 (mod 11), so it is not divisible by 11.
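The alternating-sum rule can be checked against ordinary division. A short sketch, with a hypothetical function name:

```python
def mod11_alternating(number: str) -> int:
    """Remainder mod 11 via the alternating digit sum, starting with + at the units digit."""
    total = 0
    for place, digit in enumerate(reversed(number)):
        # Even powers of 10 have weight +1 and odd powers have weight -1 (mod 11).
        total += int(digit) if place % 2 == 0 else -int(digit)
    return total % 11


print(mod11_alternating("3458679"))   # (9+6+5+3)-(7+8+4) = 4
print(3458679 % 11)                   # direct division agrees
```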

Page 9: Identification Numbers and Error Detection

Money Order Revisited

If the check digit only checks the divisibility of the sum, what can be overlooked?

Page 10: Identification Numbers and Error Detection

Potential Problems with the Check Digit

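Any digit can be replaced by another digit that is congruent to it mod 9 without changing the remainder (so a 0 can be replaced with a 9 in this example and the error goes unnoticed).

Also, any two of the summed digits can be transposed without detection, because the digit sum does not depend on the order of the digits.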
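Both weaknesses show up on the money order number. The altered numbers below are made up for illustration; only 63024383845 comes from the COMAP example.

```python
def money_order_is_valid(id_number: str) -> bool:
    """The digit sum of everything but the last digit must match the check digit mod 9."""
    body, check = id_number[:-1], int(id_number[-1])
    return sum(int(d) for d in body) % 9 == check


original = "63024383845"
zero_to_nine = "63924383845"   # the 0 in the third position replaced by 9
transposed = "36024383845"     # the first two digits swapped
other_error = "63124383845"    # the 0 replaced by 1 instead

print(money_order_is_valid(original))      # True
print(money_order_is_valid(zero_to_nine))  # True: the error slips through
print(money_order_is_valid(transposed))    # True: the error slips through
print(money_order_is_valid(other_error))   # False: this error is caught
```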

Page 11: Identification Numbers and Error Detection

Other Uses of the Check Digit

Traveler’s checks and Euro banknotes also use a check digit, but there the entire number, including the check digit, should be divisible by 9.

If you know the divisibility rule, you can easily figure out what the check digit should be. For example, if a traveler’s check ID number is 3487956321 and the full number must be divisible by 9, what should the check digit be?

Page 12: Identification Numbers and Error Detection

Answer

The sum of the digits is 48, and 48+6=54, which is divisible by 9, so the check digit should be 6.
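The same answer can be computed directly. A sketch with a hypothetical function name; it returns the smallest digit that works.

```python
def mod9_check_digit(body: str) -> int:
    """Digit that makes the digit sum of body plus check digit divisible by 9.

    When the result is 0, a check digit of 9 would also satisfy the rule.
    """
    return (-sum(int(d) for d in body)) % 9


print(mod9_check_digit("3487956321"))   # digit sum is 48, and 48 + 6 = 54
```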

Page 13: Identification Numbers and Error Detection

More Sophisticated: UPCs

A 12-digit number that is found along the bottom of a barcode.

Format: A BBBBB CCCCC D

“A” is the type of good, “B” is the manufacturer’s code, “C” is the product code, and “D” is the check digit (COMAP).

Page 14: Identification Numbers and Error Detection

UPC Error Prevention

Take the code A BCDEF GHIJK L. Going from left to right, calculate 3A+B+3C+D+3E+F+3G+H+3I+J+3K+L. If it isn’t divisible by 10, it is an incorrect UPC.

This “detects all single-position errors and about 89% of other errors” (COMAP 326).

Try this on your own!
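One way to try the weighted sum is the sketch below. The function name and the sample code 0 36000 29145 2 are assumptions chosen for illustration; they do not come from COMAP.

```python
def upc_is_valid(code: str) -> bool:
    """Check 3A+B+3C+D+3E+F+3G+H+3I+J+3K+L for divisibility by 10."""
    digits = [int(c) for c in code if c.isdigit()]   # ignore spaces
    if len(digits) != 12:
        return False
    # Positions 1, 3, 5, ... (index 0, 2, 4, ...) get weight 3; the others get weight 1.
    total = sum((3 if i % 2 == 0 else 1) * d for i, d in enumerate(digits))
    return total % 10 == 0


print(upc_is_valid("0 36000 29145 2"))   # weighted sum is 60
print(upc_is_valid("0 36000 29145 7"))   # wrong check digit
```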

Page 15: Identification Numbers and Error Detection

The U.S. Banking System

Each bank has an identification number (the first string of digits along the bottom of a check). The string is ABCDEFGHI, where ABCDEFGH identifies the bank and I is the check digit. “I” must be the last digit of the sum 7A+3B+9C+7D+3E+9F+7G+3H.

Page 16: Identification Numbers and Error Detection

Here, the first eight digits are 12100024, and 7(1)+3(2)+9(1)+7(0)+3(0)+9(0)+7(2)+3(4) = 48. Since the check digit is 8, the last digit of 48, this is a valid number.

http://blog.wellsfargo.com/GuidedByHistory/images/Earth_Check_large.jpg
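The bank check digit from this slide can be computed as follows. A minimal sketch; the names are hypothetical.

```python
WEIGHTS = (7, 3, 9, 7, 3, 9, 7, 3)


def bank_check_digit(first_eight: str) -> int:
    """Last digit of 7A+3B+9C+7D+3E+9F+7G+3H."""
    total = sum(w * int(d) for w, d in zip(WEIGHTS, first_eight))
    return total % 10


print(bank_check_digit("12100024"))   # the sum is 48, so the check digit is 8
```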

Page 17: Identification Numbers and Error Detection

Credit Card Numbers

A more effective algorithm, which is used for credit cards, is called Codabar, and is also used by “libraries, blood banks, photofinishing companies, German banks, and the South Dakota driver’s license department” (COMAP 327).

Codabar “allows computers to detect 100% of single-position errors and about 98% of other common errors” (COMAP 328).

The algorithm was developed by Hans Peter Luhn (1896-1964), after whom it is also known as the Luhn algorithm, and was patented in 1960 (http://www.merriampark.com/anatomycc.htm).

Page 18: Identification Numbers and Error Detection

The Algorithm

Assume a 16-digit card number, with the final digit being the check digit. Add the digits in the odd-numbered positions (counting from left to right) and multiply the sum by two.

Then add the number of digits in odd-numbered positions that are greater than 4.

Finally, add all the digits in the even-numbered positions (except for the check digit).

The check digit will need to be whatever makes the total divisible by 10.

Ex. What is the check digit for a card number whose first 15 digits are 124785943967210?

Page 19: Identification Numbers and Error Detection

Solution

Ex. What is the check digit for card number 124785943967210?

1+4+8+9+3+6+2+0 = 33, and 33*2 = 66
66+3 = 69
69+2+7+5+4+9+7+1 = 104
104+6 = 110, so the check digit is 6.
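The three steps of the slide translate almost line for line into code. This is a sketch of the procedure as described here, with a hypothetical function name, not a full card validator.

```python
def card_check_digit(first_fifteen: str) -> int:
    """Check digit for a 16-digit card number, following the steps on the slide."""
    digits = [int(c) for c in first_fifteen]
    odd = digits[0::2]    # positions 1, 3, 5, ..., 15 counting from the left
    even = digits[1::2]   # positions 2, 4, ..., 14
    total = 2 * sum(odd)                      # step 1: double the odd-position sum
    total += sum(1 for d in odd if d > 4)     # step 2: count odd-position digits above 4
    total += sum(even)                        # step 3: add the even-position digits
    return (-total) % 10                      # digit that brings the total to a multiple of 10


print(card_check_digit("124785943967210"))   # total is 104, so the check digit is 6
```

Adding 1 for each odd-position digit above 4 has the same effect mod 10 as doubling that digit and adding the two digits of the result (for example, 2*7+1 = 15 and 1+4 = 5).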

Page 20: Identification Numbers and Error Detection

ISBN Numbers

The 10-digit International Standard Book Number detects “100% of single errors and 100% of transposition errors” (COMAP 328)

An ISBN of A-BCDE-FGHI-J, with J as the check digit, is valid if 10A+9B+8C+7D+6E+5F+4G+3H+2I+J is divisible by 11.

Page 21: Identification Numbers and Error Detection

ISBN’s continued

Example: 0-387-95587-9

10(0)+9(3)+8(8)+7(7)+6(9)+5(5)+4(5)+3(8)+2(7)+9 = 286

286 is divisible by 11 because (6+2)-8 = 0, which is divisible by 11.
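The ISBN test can be sketched as below. The function name is hypothetical, and the second number is the example with two adjacent digits swapped on purpose.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Check that 10A+9B+8C+7D+6E+5F+4G+3H+2I+J is divisible by 11."""
    chars = [c for c in isbn.upper() if c.isdigit() or c == "X"]   # drop hyphens
    if len(chars) != 10 or "X" in chars[:9]:
        return False            # X (meaning 10) may only appear as the check digit
    values = [10 if c == "X" else int(c) for c in chars]
    total = sum(w * v for w, v in zip(range(10, 0, -1), values))
    return total % 11 == 0


print(isbn10_is_valid("0-387-95587-9"))   # weighted sum is 286 = 11 * 26
print(isbn10_is_valid("0-387-95578-9"))   # transposed digits: the sum becomes 285
```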

Page 22: Identification Numbers and Error Detection

Why does this work every time?

COMAP proof: Suppose the digit in the B slot is replaced by an incorrect digit B’.

For the error to go undetected, both calculations (with and without the error) must be divisible by 11. Then the difference between the calculations must also be divisible by 11.

Page 23: Identification Numbers and Error Detection

Proof continued…

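So (10A+9B+8C+7D+6E+5F+4G+3H+2I+J) - (10A+9B’+8C+7D+6E+5F+4G+3H+2I+J) = 9(B-B’).

Since B and B’ are digits from 0 to 9, B-B’ lies between -9 and 9, so 11 divides B-B’ only if B=B’. Because 11 is prime and does not divide 9, 9(B-B’) cannot be divisible by 11 unless B=B’. The same argument works for every slot, since each weight is between 1 and 10.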
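The argument can also be confirmed by brute force. The sketch below (not part of the COMAP proof) tries every weight and every possible nonzero gap between two slot values, and does the same for transpositions of two unequal values.

```python
WEIGHTS = range(10, 0, -1)   # 10, 9, ..., 1
GAPS = range(1, 11)          # size of a nonzero difference between two slot values (X counts as 10)

# A single error in a slot with weight w changes the sum by w * gap (up to sign).
single_errors_caught = all((w * gap) % 11 != 0 for w in WEIGHTS for gap in GAPS)

# Swapping unequal values in slots with weights w1 and w2 changes the sum by (w1 - w2) * gap.
transpositions_caught = all(
    ((w1 - w2) * gap) % 11 != 0
    for w1 in WEIGHTS
    for w2 in WEIGHTS
    if w1 != w2
    for gap in GAPS
)

print(single_errors_caught, transpositions_caught)
```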

Page 24: Identification Numbers and Error Detection

Another Example

What is the check digit for the ISBN 0-7167-1910?

Page 25: Identification Numbers and Error Detection

Solution

10(0)+9(7)+8(1)+7(6)+6(7)+5(1)+4(9)+3(1)+2(0)= 199

The next number divisible by 11 is 199+10=209, so the check digit must have the value 10.

To represent 10, ISBNs use an X, so the full number is 0-7167-1910-X.
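The check digit can be computed directly from the first nine digits. A sketch with a hypothetical function name:

```python
def isbn10_check_char(first_nine: str) -> str:
    """Check character that makes the weighted ISBN-10 sum divisible by 11."""
    digits = [int(c) for c in first_nine if c.isdigit()]   # drop hyphens
    if len(digits) != 9:
        raise ValueError("expected exactly nine digits")
    total = sum(w * d for w, d in zip(range(10, 1, -1), digits))   # weights 10 down to 2
    value = (-total) % 11
    return "X" if value == 10 else str(value)


print(isbn10_check_char("0-7167-1910"))   # sum is 199, so the check value is 10, written X
print(isbn10_check_char("0-387-95587"))   # sum is 277, so the check digit is 9
```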

Page 26: Identification Numbers and Error Detection

Code 39

This uses the digits 0-9 and letters A-Z (which correspond to 10-35)

Code 39 is used by the DoD, automotive companies, and the health industry (COMAP 329).

A 15-character string abcdefghijklmno is valid if 15a+14b+13c+…+1*o is divisible by 36 (with o as the check character).

The VIN system is a more complicated alphanumeric system.
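The weighted sum for Code 39, as stated on this slide, might be sketched like this. The function names and the 14-character sample string are invented for illustration, and the sketch assumes only the characters 0-9 and A-Z appear.

```python
import string

# '0'-'9' map to 0-9 and 'A'-'Z' map to 10-35, as on the slide.
ALPHABET = string.digits + string.ascii_uppercase


def code39_is_valid(text: str) -> bool:
    """Weighted sum with weights n, n-1, ..., 1 must be divisible by 36."""
    values = [ALPHABET.index(c) for c in text.upper()]
    n = len(values)
    return sum((n - i) * v for i, v in enumerate(values)) % 36 == 0


def code39_check_char(body: str) -> str:
    """Character with weight 1 that completes body to a valid string."""
    values = [ALPHABET.index(c) for c in body.upper()]
    n = len(values) + 1   # the check character takes the last place
    total = sum((n - i) * v for i, v in enumerate(values))
    return ALPHABET[(-total) % 36]


body = "210SA0162322ZG"               # hypothetical 14-character body
full = body + code39_check_char(body)
print(full, code39_is_valid(full))
```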

Page 27: Identification Numbers and Error Detection

Bar Codes

“To decode the information in a bar code, a beam of light is passed over the bars and spaces via a scanning device, such as a handheld wand or a fixed-beam device. The dark bars reflect very little light back to the scanner, whereas the light spaces reflect much light. The differences in reflection intensities are detected by the scanner and converted to strings of 0’s and 1’s that represent specific numbers and letters.” (COMAP 334)

Page 28: Identification Numbers and Error Detection

Postnet Codes

A bar code for a ZIP+4 code (+1 check digit)

There are 52 long or short bars, with one “guard bar” on either side and the remaining 50 bars grouped into 10 groups of 5, with 2 long bars and 3 short bars each.

The check digit makes the sum of all ten numbers divisible by 10.
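The Postnet check digit follows the same pattern as the earlier sums. In this sketch the function name and the ZIP+4 code 12345-6789 are made up for illustration.

```python
def postnet_check_digit(zip_plus4: str) -> int:
    """Digit that makes the sum of all ten digits divisible by 10."""
    digits = [int(c) for c in zip_plus4 if c.isdigit()]   # ignore the hyphen
    if len(digits) != 9:
        raise ValueError("expected a nine-digit ZIP+4 code")
    return (-sum(digits)) % 10


print(postnet_check_digit("12345-6789"))   # digit sum is 45, so the check digit is 5
```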

Page 29: Identification Numbers and Error Detection

http://www-math.cudenver.edu/~wcherowi/jcorner/barcodes.html

Page 30: Identification Numbers and Error Detection

UPC Bar Codes

Each digit is made up of seven modules.

There are guard bars, a center division, and a difference between the manufacturer and product number encodings to make reading the codes as accurate as possible.

Bar codes for UPC’s have been in use since a pack of Wrigley Juicy Fruit gum was scanned on June 26, 1974 in Marsh’s Supermarket in Troy, Ohio. The first barcodes were made by National Cash Register, but smearing ink problems soon made IBM the top contender in the market. (http://en.wikipedia.org/wiki/Barcode)

Page 31: Identification Numbers and Error Detection

Digit Manufacturer Product

0 0001101 1110010

1 0011001 1100110

2 0010011 1101100

3 0111101 1000010

4 0100011 1011100

5 0110001 1001110

6 0101111 1010000

7 0111011 1000100

8 0110111 1001000

9 0001011 1110100

http://en.wikipedia.org/wiki/Universal_Product_Code

Is the UPC above valid? (3*0+1*3+3*6+…3*5+1*2 = 60 = 0 mod 10, so yes, it is valid.)
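The table of seven-module patterns can be used as a lookup in both directions. The sketch below copies the patterns from the table; the helper names are hypothetical. It also shows that each product-side pattern is the manufacturer-side pattern with 0s and 1s exchanged.

```python
# Index in each list = the digit it encodes (patterns copied from the table above).
MANUFACTURER = ["0001101", "0011001", "0010011", "0111101", "0100011",
                "0110001", "0101111", "0111011", "0110111", "0001011"]
PRODUCT = ["1110010", "1100110", "1101100", "1000010", "1011100",
           "1001110", "1010000", "1000100", "1001000", "1110100"]


def complement(bits: str) -> str:
    """Exchange 0s and 1s."""
    return "".join("1" if b == "0" else "0" for b in bits)


def decode(bits: str, table) -> list:
    """Split a bit string into seven-module groups and look each one up."""
    return [table.index(bits[i:i + 7]) for i in range(0, len(bits), 7)]


print(all(complement(m) == p for m, p in zip(MANUFACTURER, PRODUCT)))
print(decode("00011010111101", MANUFACTURER))   # two manufacturer-side digits
print(decode("10011101110100", PRODUCT))        # two product-side digits
```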

Page 32: Identification Numbers and Error Detection

Illinois Driver’s License Numbers

In contrast to Social Security numbers, an Illinois driver’s license number can help reconstruct a person’s surname (by sound, not by spelling), first and middle initials, date of birth, and gender.

These forms of ID numbers are also used in the National Archives, the Library of Congress, and in genealogy research. (COMAP 341-2).

Page 33: Identification Numbers and Error Detection

Soundex Coding System for Surnames

1. Delete all occurrences of h and w. (so “Wachs” becomes “acs”)

2. Assign numbers as follows: a,e,i,o,u,y = 0; b,f,p,v = 1; c,g,j,k,q,s,x,z = 2; d,t = 3; l = 4; m,n = 5; r = 6 (so “acs” = 022)

3. If two or more letters with the same numeric value are adjacent, omit all but the first (so we’re left with “ac”)

Page 34: Identification Numbers and Error Detection

Soundex continued…

4. Delete the first character of the original name if still present.

5. Delete all occurrences of a,e,i,o,u,y. (so we’re left with “c”)

6. Retain only the first three digits corresponding to the remaining letters; append trailing 0’s if fewer than three letters remain; precede the three digits with the first letter of the surname (so we have W200)

Because of the way this is coded, many errors in spelling are taken into account.

*taken directly from COMAP 342
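The six steps can be followed literally in code. This is a sketch of the procedure as listed on these two slides, with hypothetical names; it assumes a non-empty surname made of unaccented letters.

```python
CODES = {}
for letters, digit in (("aeiouy", "0"), ("bfpv", "1"), ("cgjkqsxz", "2"),
                       ("dt", "3"), ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(surname: str) -> str:
    name = [c for c in surname.lower() if c.isalpha()]
    first = name[0]
    # Step 1: delete all occurrences of h and w.
    letters = [c for c in name if c not in "hw"]
    # Steps 2-3: among adjacent letters with the same code, keep only the first.
    kept = []
    for c in letters:
        if not kept or CODES[c] != CODES[kept[-1]]:
            kept.append(c)
    # Step 4: delete the first character of the original name if it is still present.
    if first not in "hw" and kept:
        kept = kept[1:]
    # Step 5: delete all occurrences of a, e, i, o, u, y (the letters coded 0).
    consonants = [c for c in kept if CODES[c] != "0"]
    # Step 6: keep three digits, pad with 0s, and put the first letter in front.
    digits = "".join(CODES[c] for c in consonants)[:3].ljust(3, "0")
    return first.upper() + digits


print(soundex("Wachs"))                     # W200, as worked out on the slides
print(soundex("Smith"), soundex("Smyth"))   # two spellings, one code
```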

Page 35: Identification Numbers and Error Detection

The Middle Digits-First Initial

Initial  Code   Initial  Code   Initial  Code   Initial  Code
A        0      H        320    O        640    V        860
B        60     I        400    P        660    W        880
C        100    J        420    Q        700    X        940
D        160    K        500    R        720    Y        960
E        200    L        520    S        780    Z        980
F        240    M        540    T        800
G        280    N        620    U        840

Page 36: Identification Numbers and Error Detection

The Middle Digits-Middle Initial

Initial  Code   Initial  Code   Initial  Code   Initial  Code
A        1      H        8      O        14     V        18
B        2      I        9      P        15     W        19
C        3      J        10     Q        15     X        19
D        4      K        11     R        16     Y        19
E        5      L        12     S        17     Z        19
F        6      M        13     T        18
G        7      N        14     U        18

Page 37: Identification Numbers and Error Detection

Calculating the Middle Digits

Add the code for the first initial to the code for the middle initial. For my initials, MJ, you have 540+10=550.

So far, my number is W200-550.

All middle-digits information is taken from http://www.highprogrammer.com/alan/numbers/dl_us_shared.html

Page 38: Identification Numbers and Error Detection

The Last Five Digits

In Illinois, the last five digits record the birth date and sex of the person.

Each month is assumed to have 31 days (starting with January 1 as 001). My birthday, March 2, is therefore 2*31+2 = 64, written 064. If male, you are done. If female, add 600 to this number (so I am 664).

Put the last two digits of your birth year before this number. Thus, the last five digits of my Illinois driver’s license are 8-8664. My complete number is W200-5508-8664. As the website author points out, because Illinois has no overflow numbers, it is likely that multiple people have the same number printed on their licenses.

*COMAP 343
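Putting the middle digits and the last five digits together gives the whole number. The sketch below copies the two initial tables from the earlier slides; the function name and its parameters are hypothetical, and the Soundex code is passed in as a string.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
FIRST_INITIAL = dict(zip(ALPHABET, [
    0, 60, 100, 160, 200, 240, 280, 320, 400, 420, 500, 520, 540,
    620, 640, 660, 700, 720, 780, 800, 840, 860, 880, 940, 960, 980]))
MIDDLE_INITIAL = dict(zip(ALPHABET, [
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
    14, 14, 15, 15, 16, 17, 18, 18, 18, 19, 19, 19, 19]))


def illinois_license(soundex_code, first, middle, year_last_two, month, day, female):
    """Assemble SSSS-MMMY-YDDD from the pieces described on the slides."""
    middle_digits = FIRST_INITIAL[first] + MIDDLE_INITIAL[middle]
    # Every month counts as 31 days; women add 600.
    day_code = (month - 1) * 31 + day + (600 if female else 0)
    digits = f"{middle_digits:03d}{year_last_two:02d}{day_code:03d}"
    return f"{soundex_code}-{digits[:4]}-{digits[4:]}"


print(illinois_license("W200", "M", "J", 88, 3, 2, female=True))   # W200-5508-8664
```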

Page 39: Identification Numbers and Error Detection

Conclusions

We have seen several ways of identifying objects or people with numbers, along with the likelihood of error in each one and the ways those errors are caught. As COMAP points out on p. 329, “Like many practices in the ‘real world,’ historical accident and lack of knowledge about existing methods seem to be the explanation [for having so many means of identification].” We see this especially in the difference between driver’s license numbers and SSNs, which were assigned prior to computers and the development of many of the systems used (like Soundex).