u03b - data type implementationthenry/csc301/old/u03b1/u03... · 2013-05-30 · •!high cost in...

Data Types

Data Types

Introduction

Primitive Data Types

Composite Data Types

Structured Data Types

Abstract Data Types

Data Types

Introduction

Introduction

A Data Type is a

Collection of Data Objects

Possible r-values for a memory cell

Set of operations on those objects

Descriptor

Collection of attributes for a variable

Data Types

Binding a Data Type binds:

Range of possible values

Set of operations

How the data will be stored

Structure of descriptor

dope vector

Signature

implications of operations on the data

Data Types

Primitive Data Types (Scalar)

Not defined in terms of other types

Usually tied to the hardware implementation

Composite Data Types

Data type made up of similar primitives

Complex structures created by compiler

Data Types

Structured Data Types

An aggregate of other data types

Heterogeneous Composite of Data Types

Abstract Data Type

Combination of data and methods that operate on that data

Objects and Classes

Data Types

Common Primitive Data Types

Primitive Types

Boolean

Character

Integer

Floating Point

Decimal (historical)

Boolean

True or False (1 or 0)

Improves program readability

Implementation

Single Bit - saves storage

Byte - usually faster access

C implemented as integer

False - 0

True - any other value

Character

Stored as either

ASCII Codes - 7 bits (128 codes)

Unicode - 16-bit code units (international characters)

Possible Values

Character Data (A...Z, a...z)

Numeric Digits (0...9)

Special Symbols (! @ # $ % ^ & ...)

Escape Codes (nul, cr, ack, bs, sp, esc, ...)

Integer

Two's complement representation

Commonly a four-byte representation

Range: -(2^31) to 2^31 - 1

Options:

signed, unsigned, long, short, byte

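Integer

[Figure: 8-bit two's complement examples. 85 is stored as 0 1 0 1 0 1 0 1 (64 + 16 + 4 + 1) with sign bit 0. -86 is stored as 1 0 1 0 1 0 1 0 with sign bit 1; inverting its bits gives 0 1 0 1 0 1 0 1, and adding 1 gives 0 1 0 1 0 1 1 0 = 86 (64 + 16 + 4 + 2).]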
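The invert-and-add-one rule from the integer slides can be sketched in C. This is a minimal illustration, not part of the original slides; the function names `negate_twos_complement` and `sign_bit` are hypothetical, and the work is done on `uint8_t` so the 8-bit wrap-around is well defined.

```c
#include <stdint.h>

/* Negate an 8-bit pattern the two's complement way:
   invert every bit, then add 1. (Hypothetical helper name.) */
uint8_t negate_twos_complement(uint8_t bits)
{
    return (uint8_t)(~bits + 1u);
}

/* Return the sign bit, i.e. the most significant of the 8 bits. */
int sign_bit(uint8_t bits)
{
    return (bits >> 7) & 1;
}
```

Passing `0xAA` (the pattern 1 0 1 0 1 0 1 0, which reads as -86) to `negate_twos_complement` yields `0x56`, the pattern for 86.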

Floating Point

IEEE Standard 754 for storage

32- and 64-bit precisions

Numbers consist of three fields

Sign Field

Exponent

Mantissa

Floating Point

Sign Field (S)

One bit

Zero is positive

Exponent (E)

Excess-127 notation

Values range from 0 to 255 (for 8-bit exponent)

Represent exponents ranging from -127 to 128

Exponent is biased

Floating Point

Mantissa (M)

First bit of mantissa is always one

It is not explicitly stored

Inserted by hardware

Effectively yields an extra bit of precision

Parameters             Value

E = 255 and M ≠ 0      An invalid number (NaN)

E = 255 and M = 0      ∞

0 < E < 255            2^(E-127) * (1.M)

E = 0 and M ≠ 0        2^(-126) * (0.M)

E = 0 and M = 0        0

Floating Point

32-bit precision

8-bit Exponent, 23-bit Mantissa

Range 10^-38 to 10^38

64-bit precision (Double Precision)

11-bit Exponent, 52-bit Mantissa

Range 10^-308 to 10^308

Floating Point

IEEE 32-bit Floating Point Representation

Layout: Sign bit (1 bit) | Exponent (8 bits) | Mantissa (23 bits stored, 24 with the hidden bit)

Value   Scaled form   Biased form (binary mantissa)   Bit pattern

        (all fields zero)                             00000000000000000000000000000000

+1.0    2^0 * 1       2^(127-127) * 1.0               00111111100000000000000000000000

+1.5    2^0 * 1.5     2^(127-127) * 1.1               00111111110000000000000000000000

-5      2^2 * 1.25    2^(129-127) * 1.01              11000000101000000000000000000000
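The three fields can be pulled out of a stored value with shifts and masks. The sketch below is illustrative only and assumes that `float` is a 32-bit IEEE 754 value on the target machine (common, but not guaranteed by the C standard); `struct float_fields` and `split_float` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a 32-bit IEEE 754 value (hypothetical struct). */
struct float_fields {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, stored in excess-127 form */
    uint32_t mantissa;  /* 23 stored bits; the leading 1 is implicit */
};

/* Assumes float is a 32-bit IEEE 754 single-precision value. */
struct float_fields split_float(float value)
{
    uint32_t bits;
    struct float_fields f;

    /* memcpy copies the raw bytes without violating aliasing rules. */
    memcpy(&bits, &value, sizeof bits);

    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For -5 this gives sign 1, exponent 129, and a mantissa whose only set bit is the second from the top, matching 1.01 in binary.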

Decimal

Binary Coded Decimal (BCD)

Stores a fixed number of digits

One or Two digits stored per byte

Nine (9) is 1001 binary (four bits)

For business applications (COBOL)

Very accurate

Limited Range

Wastes memory

02 UNIT-PRICE PICTURE IS 999V99.
02 BAL-ON-HAND PICTURE IS 9(5).
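Storing two decimal digits per byte can be sketched as follows. This is an illustration of packed BCD in general, not of any particular COBOL implementation; `bcd_pack` and `bcd_unpack` are hypothetical helpers, and each digit argument is assumed to be in the range 0 to 9.

```c
#include <stdint.h>

/* Pack two decimal digits (each assumed to be 0..9) into one byte:
   the tens digit in the high four bits, the ones digit in the low four. */
uint8_t bcd_pack(unsigned tens, unsigned ones)
{
    return (uint8_t)((tens << 4) | ones);
}

/* Recover the two-digit decimal value from a packed BCD byte. */
unsigned bcd_unpack(uint8_t packed)
{
    return (packed >> 4) * 10u + (packed & 0x0Fu);
}
```

The digit nine occupies the four bits 1001, so `bcd_pack(9, 9)` produces the byte `0x99`.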

Data Types


Composite Types

Increase Readability

Common Implementations

Ordinal (Enumerated) Types

Sub-range Data Types

String Data Type

Arrays

Enumerated Types

List (enumerate) possible data values

Values associated with positive integers

Values become “Symbolic Constants”

Greatly increase program readability

Colors, Months, Days of Week

Increase Reliability

Compiler can check operations and ranges

Enumerated Types

C, C++
typedef enum {RED, BLUE, GREEN} colorType;
colorType color = RED;

Pascal
type colorType = (RED, BLUE, GREEN);
var color : colorType;
color := BLUE;

Java
Enumerated interface

Pascal, C, C++ do not allow reuse of names across type definitions.

Sub-Range Types

Contiguous subsequence of ordinal type

Behaves as parent type

Increased reliability and readability

Compiler can insert code to restrict range

Pascal:

type posInt = 0 .. MAXINT;

C++ (no built-in sub-range type; a user-defined Range template is assumed):

Range<0, MAXINT> i = x;
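The range-restricting code a compiler might insert for a sub-range assignment can be imitated by hand. The sketch below is only an approximation of that idea in C, which has no sub-range types; `assign_checked` is a hypothetical helper.

```c
/* Assign value to *target only if low <= value <= high.
   Returns 1 on success and 0 (leaving *target unchanged) otherwise.
   Hypothetical helper; C itself performs no such check. */
int assign_checked(int *target, int value, int low, int high)
{
    if (value < low || value > high)
        return 0;
    *target = value;
    return 1;
}
```

A language with real sub-range types would typically raise a run-time error instead of returning a status code.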

Data Types

Character Strings

Character Strings

Composed of a character sequence

ASCII Characters (7/8-bit)

Unicode Characters (16-bit)

String Specific Operations

Increase Writability

Character Strings

Instantiating Strings

‘test’ vs “test”

Concatenation: &, +, ||, strcat()

Relational Operations: <, >

Lexicographical Ordering (by code)

Java - .compareTo() method

Input/Output Formatting

Character Strings

Substring Operations

Selection based on position

Selection based on pattern

Substring Assignment Overlay Issue

str1 = “stringTest”
str1[2:5] = str1[1:4]
print str1

What’s printed?

sssssgTest - if character-by-character copy
sstrigTest - if block copy
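Both outcomes can be reproduced in C. The sketch below assumes the slide's subscripts are 1-based and inclusive, so `str1[2:5] = str1[1:4]` copies four characters one position to the right; the function names are hypothetical.

```c
#include <string.h>

/* Copy len characters one at a time, front to back. When dst starts
   inside the source range, characters that were already overwritten
   are read again. */
void copy_forward(char *dst, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}

/* str[2:5] = str[1:4] (1-based, inclusive) as a character-by-character copy. */
void overlay_char_by_char(char *str)
{
    copy_forward(str + 1, str, 4);
}

/* The same assignment as a block copy; memmove is defined for
   overlapping ranges and behaves as if it copied via a temporary. */
void overlay_block(char *str)
{
    memmove(str + 1, str, 4);
}
```

Applied to a writable copy of "stringTest", the first version leaves "sssssgTest" and the second leaves "sstrigTest".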

Character Strings

Memory Allocation for Strings

Static Length Strings

Limited Dynamic Length Strings

Dynamic Length Strings

Descriptor Record (Compile-Time)

Data Storage

Character Strings

Static Length String

Fixed Declared Length

FORTRAN, COBOL, Pascal

Padded with blanks

Most implementations output the entire declared length.

[Figure: the two parts of a static string. Descriptor: Static String, Length (14), Address. Data storage: R E L A T I V I T Y]

Character Strings

Limited Dynamic Length String

Variable Length to Declared Bounds

[Figure: descriptor record (compile-time) with dynamic maintenance. Descriptor: Limited Dynamic String, Maximum Length (14), Current Length (10), which is the length of the current string, and Address. Data storage: R E L A T I V I T Y]

Character Strings

Limited Dynamic Length String

Variable Length to Declared Bounds

C, C++

C & C++ do not track the current length in a descriptor.

Instead, the string is “null terminated” -- \0 or 0x00 hex
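A limited dynamic string in C might look like the sketch below: the declared bound is fixed, and the current length is found by scanning for the terminator instead of being stored in a descriptor. `MAX_LEN` and `store_limited` are hypothetical names, with the bound of 14 borrowed from the figures.

```c
#include <string.h>

/* Declared bound: up to 14 characters plus the '\0' terminator. */
#define MAX_LEN 14

/* Store text in a bounded buffer, truncating anything past the bound.
   buf must have room for MAX_LEN + 1 bytes. Returns the current
   length, which strlen finds by scanning for the terminator. */
size_t store_limited(char *buf, const char *text)
{
    strncpy(buf, text, MAX_LEN);
    buf[MAX_LEN] = '\0';   /* strncpy does not terminate on truncation */
    return strlen(buf);
}
```

Storing "RELATIVITY" in a 15-byte buffer returns a current length of 10, while a longer string is cut off at the maximum length of 14.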

Character Strings

Dynamic Length String

Unbounded Length

Perl, Javascript, PHP

Descriptor Record (Run-Time)

[Figure: descriptor with Dynamic String and Address fields, pointing to the data R E L A T I V I T Y]

String is always “null terminated”

• Only characters in current string are output.
• Provides potential space savings
• High cost in storage management

Data Types

Arrays

Arrays

Array Concepts

Array Storage

Array Access

Array Slices

Associative Arrays

Arrays

An aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate relative to the first element.

Ordered sequence of identical objects

Ordering determined by a scalar object

Usually integer or enumerated data

Referred to as the Subscript or Index

Arrays

Design Issues

What types are legal for subscripts?

Are subscripting expressions in element references range checked?

When are subscript ranges bound?

When does allocation take place?

What is the maximum number of subscripts?

Can array objects be initialized?

Are any kind of slices allowed?

Arrays

Array Initialization

List of values placed in array in the order in which the array elements are stored in memory

Indexing

Specifying an element’s position

Mapping function from indices to elements

map(array_name, index_value) → an element

Arrays

Array Operations

APL - all about arrays

Assignment: RHS can be an aggregate constant or an array name

Concatenation: for all single-dimensioned arrays

Relational operators (what is the exact meaning?)

Intrinsics (functions or operators): matrix multiplication, vector dot product

Array Storage

Storage Allocation

Static

Fixed Stack Dynamic

Stack Dynamic

Heap Dynamic

Array Storage

Static

Loaded into memory at program load

Provides execution efficiency

No allocation/deallocation penalty

FORTRAN 77

Array Storage

Fixed Stack Dynamic

Subscript range is statically bound [ ]

Storage is bound at elaboration (creation)

Activation Record Instance

Space efficiency

C/C++ locals not declared static

Array Storage

Stack Dynamic

Subscript range and storage are dynamic

Becomes fixed once variable is instantiated

Fixed for lifetime of variable

Flexible

Array size need not be known until it is to be used

Array Storage

Heap Dynamic

Subscript range and storage are dynamic

Bindings are never fixed

All Java arrays (objects) are heap dynamic

PHP, Perl & Javascript

Arrays can change size as needed
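C has no built-in heap-dynamic array, but the behavior can be approximated by hand with `realloc`. The sketch below is one possible arrangement, not how any of the languages above implement it; `struct int_array`, `int_array_push`, and `int_array_release` are hypothetical names, and the doubling policy is an assumption.

```c
#include <stdlib.h>

/* A growable array of int kept on the heap (hypothetical type). */
struct int_array {
    int   *data;
    size_t length;    /* elements currently in use */
    size_t capacity;  /* elements currently allocated */
};

/* Append value, growing the storage when it is full.
   Returns 0 on success and -1 if the allocation fails. */
int int_array_push(struct int_array *a, int value)
{
    if (a->length == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 4;
        int *p = realloc(a->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        a->data = p;
        a->capacity = new_cap;
    }
    a->data[a->length++] = value;
    return 0;
}

/* Release the storage and reset the array to empty. */
void int_array_release(struct int_array *a)
{
    free(a->data);
    a->data = NULL;
    a->length = 0;
    a->capacity = 0;
}
```

Start from a zero-initialized `struct int_array` and call `int_array_push` as often as needed; both the subscript range and the storage change as the array grows.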

Array Access

To store and retrieve data values

Determine element’s L-value (address)

Array subscript range

Upper & Lower Bounds

array[L1:U1, L2:U2]

Lower bound is often 0 (zero)

Array Access

[Figure: Array Descriptor (Dope Vector) layouts for a single-dimension array and a multi-dimension array]

Array Access

Determining Element’s Address

var arr: array[-2 .. 2, -3 .. 3] of int;

arr[1, 2] := 6;

Allocate storage beginning at base address α

total_bytes = (U1-L1+1) * (U2-L2+1) * element size

Array Access

L-value access function:

es - (element size) based on element type:

Integer - 4 bytes

Float - 4 bytes (single) or 8 bytes (double)

Char - 1 byte

Structures - based on size of pointer (4 bytes)

Array Access

L-Value Access Function:

row_size = numberOfElementsInRow * elementSize

row_size = (U2 - L2 + 1) * es

row = i - L1

col = j - L2

L-value(arr[i, j])

= α + row * row_size + col * es    (α is the base address of the array's storage)

L-Value Access Function:

For the statement:

arr[1,2] = 6;

Where is the 6 stored?

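Array Access

[Figure: logical storage versus actual storage for arr[L1 : U1, L2 : U2], here arr[-2 .. 2, -3 .. 3]. Logical storage is a grid with row index i and column index j, with arr[-2, -3], arr[-2, 3], and arr[1, 2] marked. Actual storage is a single row-by-row sequence starting at the base address α: arr[L1, L2], arr[L1, L2+1], arr[L1, L2+2], ..., arr[L1, U2], arr[L1+1, L2], ... The question posed is where arr[1, 2] falls in that sequence.]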

Array Access

L-value of arr[1,2] => L-value(arr[i, j]), where α is the base address of the array's storage

= α + row * row_size + col * es

= α + (i - L1) * row_size + (j - L2) * es

= α + (i-L1) * (U2-L2+1) * es + (j-L2) * es

= α + es * ( (i-L1) * (U2-L2+1) + (j-L2) )

= α + 4 * ( (1-(-2)) * (3-(-3)+1) + (2-(-3)) )

= α + 4 * ( (3) * (7) + (5) )

= α + 4 * (26 element offset)

= α + 104 byte offset

[Figure: storage cells numbered 0 to 26 starting at α; the value 6 is stored in cell 26.]
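The access function worked through above can be written as a small C routine. This is a sketch under the slide's assumptions (row-by-row storage, bounds L1..U1 and L2..U2); `struct bounds2d`, `element_offset`, and `byte_offset` are hypothetical names, and the result is an offset from the base address, not an address.

```c
#include <stddef.h>

/* Bounds of a two-dimensional array arr[L1..U1, L2..U2] stored row by row. */
struct bounds2d {
    int l1, u1;   /* first subscript range */
    int l2, u2;   /* second subscript range */
};

/* Number of elements between the start of storage and arr[i, j]:
   (i - L1) * (U2 - L2 + 1) + (j - L2). */
size_t element_offset(struct bounds2d b, int i, int j)
{
    size_t row_len = (size_t)(b.u2 - b.l2 + 1);
    return (size_t)(i - b.l1) * row_len + (size_t)(j - b.l2);
}

/* Byte offset from the base address for element size es. */
size_t byte_offset(struct bounds2d b, int i, int j, size_t es)
{
    return es * element_offset(b, i, j);
}
```

With bounds {-2, 2, -3, 3} and a 4-byte element, arr[1, 2] lands at element offset 26, which is 104 bytes past the base.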

Array Access

Virtual Origin (VO)

Element at i = 0; j = 0 ⇒ arr[0, 0]

L-value(arr[0, 0]), where α is the base address of the array's storage

= α + es * ( (i-L1) * (U2-L2+1) + (j-L2) )

= α + 4 * ( (0-(-2)) * (3-(-3)+1) + (0-(-3)) )

= α + 4 * ( (2) * (7) + (3) )

= α + 4 * (17 element offset)

= α + 68 byte offset

[Figure: storage cells numbered 0 to 17 starting at α; VO marks cell 17.]

Array Access

Dope Vector

Use a dope vector to access an array element

VO - virtual origin (address)

row size

element size
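With the three dope vector fields, element access no longer needs the lower bounds: the location of arr[i, j] is VO + i * row size + j * element size. The sketch below illustrates this with byte offsets relative to the start of storage, so it never forms an out-of-range pointer; `struct dope_vector`, `make_dope_vector`, and `offset_of_element` are hypothetical names.

```c
#include <stddef.h>

/* A hypothetical dope vector holding the three fields listed above.
   virtual_origin is a byte offset relative to the start of storage. */
struct dope_vector {
    ptrdiff_t virtual_origin;  /* where arr[0, 0] is, or would be */
    ptrdiff_t row_size;        /* bytes per row */
    ptrdiff_t element_size;    /* bytes per element */
};

/* Build the dope vector for arr[l1.., l2..u2] with element size es. */
struct dope_vector make_dope_vector(int l1, int l2, int u2, ptrdiff_t es)
{
    struct dope_vector dv;
    dv.element_size = es;
    dv.row_size = (u2 - l2 + 1) * es;
    /* arr[l1, l2] sits at offset 0, so
       0 = VO + l1 * row_size + l2 * es; solve for VO. */
    dv.virtual_origin = -(l1 * dv.row_size + l2 * es);
    return dv;
}

/* Byte offset of arr[i, j] from the start of storage. */
ptrdiff_t offset_of_element(struct dope_vector dv, int i, int j)
{
    return dv.virtual_origin + i * dv.row_size + j * dv.element_size;
}
```

For arr[-2 .. 2, -3 .. 3] with 4-byte elements the virtual origin comes out at 68 bytes, and arr[1, 2] at 68 + 28 + 8 = 104 bytes, agreeing with the longer access function.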

Array Slices

Slices

A substructure of an array - row, column, plane

A referencing mechanism

Very useful in languages with array operations

Slice Examples (FORTRAN 90):

INTEGER MAT (1:4, 1:4)

MAT(1:4, 1) - the first column

MAT(2, 1:4) - the second row

Array Slices

Associative Arrays

An unordered collection of data elements indexed by an equal number of values (keys)

Associative Arrays in Perl (PHP is similar)

Declare and Initialize
%hi_temps = ("Monday" => 77, "Tuesday" => 79,…);

Index and Assign value
$hi_temps{"Wednesday"} = 83;

Remove Elements
delete $hi_temps{"Tuesday"};

Data Types

Structured Data Types

Records and Unions

Records

A heterogeneous aggregate of data elements where individual elements are identified by names

Individual Elements - “Fields”

struct date { char *month; int day; int year; };

Records

C / C++

Declarations:

Structure Type

struct date { char *month; int day; int year; };

struct date myDate;

User Type Definition

typedef struct { char *month; int day; int year; } dateType;

dateType myDate;

Records

C / C++

Use:

dateType myDate;

myDate.day = 13;
myDate.year = 2004;

Field Access Dereferencing (Pointers)

dateType* pDate = &myDate;   /* point at an existing record before dereferencing */

pDate->day = 13;
pDate->year = 2004;

Records

Record Descriptor

Compile Time

Records

Comparing Records and Arrays

Array element access is slower

Subscripts are dynamic (data[i])

Field names are static (myDate.day)

Unions

Variables allowed to store different type values at different times during execution

Pascal:

type intreal = record
  case tagg : Boolean of
    true : (blint : integer);
    false : (blreal : real);
end;
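A rough C counterpart of the Pascal variant record pairs a tag with a `union`. The sketch below is illustrative; the field names echo the Pascal example, while `is_int`, `make_int`, `make_real`, and `as_double` are hypothetical. Unlike Pascal's declared tag field, C leaves it entirely to the programmer to keep the tag and the active member in step.

```c
/* A tag plus a union, mirroring the Pascal intreal record. */
struct intreal {
    int is_int;            /* the tag: nonzero means blint is active */
    union {
        int    blint;
        double blreal;
    } value;
};

/* Build a value whose integer member is active. */
struct intreal make_int(int n)
{
    struct intreal v;
    v.is_int = 1;
    v.value.blint = n;
    return v;
}

/* Build a value whose real member is active. */
struct intreal make_real(double x)
{
    struct intreal v;
    v.is_int = 0;
    v.value.blreal = x;
    return v;
}

/* Read the value safely by consulting the tag first. */
double as_double(struct intreal v)
{
    return v.is_int ? (double)v.value.blint : v.value.blreal;
}
```

Reading the member that the tag does not select is the error this arrangement is meant to prevent, so callers should go through a function such as `as_double`.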
