u03b - data type implementation · thenry/csc301/old/u03b1/u03... · 2013-05-30
TRANSCRIPT
Data Types
Data Types
Introduction
Primitive Data Types
Composite Data Types
Structured Data Types
Abstract Data Types
Data Types
Introduction
Introduction
A Data Type is a
Collection of Data Objects
Possible r-values for a memory cell
Set of operations on those objects
Descriptor
Collection of attributes for a variable
Data Types
Binding a Data Type binds:
Range of possible values
Set of operations
How the data will be stored
Structure of descriptor
dope vector
Signature
implications of operations on the data
Data Types
Primitive Data Types (Scalar)
Not defined in terms of other types
Usually tied to the hardware implementation
Composite Data Types
Data type made up of similar primitives
Complex structures created by compiler
Data Types
Structured Data Types
An aggregate of other data types
Heterogeneous Composite of Data Types
Abstract Data Type
Combination of data and methods that operate on that data
Objects and Classes
Data Types
Common Primitive Data Types
Primitive Types
Boolean
Character
Integer
Floating Point
Decimal (historical)
Boolean
True or False (1 or 0)
Improves program readability
Implementation
Single Bit - saves storage
Byte - usually faster access
C implements Boolean as an integer
False - 0
True - any other value
Character
Stored as either
ASCII Codes - 7 bits (128 codes, 0 to 127)
Unicode - 16 bits (international characters)
Possible Values
Character Data (A...Z, a...z)
Numeric Digits (0...9)
Special Symbols (! @ # $ % ^ & ...)
Escape Codes (nul, cr, ack, bs, sp, esc, ...)
Integer
Two's complement representation
Commonly a four-byte representation
Range: $-2^{31}$ to $2^{31}-1$
Options:
signed, unsigned, long, short, byte
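Integer
85 = 0 1 0 1 0 1 0 1 (sign bit 0; bit weights 64 + 16 + 4 + 1)
-86 = 1 0 1 0 1 0 1 0 (sign bit 1)
Magnitude of -86: invert the bits to get 0 1 0 1 0 1 0 1, then add 1 to get 0 1 0 1 0 1 1 0 (bit weights 64 + 16 + 4 + 2 = 86)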
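The invert-and-add-one rule from the Integer slide can be sketched in C as follows. This is a minimal illustration, not part of the original slides: the helper names negate8 and bits8 are hypothetical, and the code assumes the platform stores int8_t in two's complement (as virtually all current hardware does).

```c
#include <stdint.h>

/* Negate an 8-bit two's complement value: invert every bit, then add 1.
   The final conversion back to int8_t assumes a two's complement platform. */
int8_t negate8(int8_t value)
{
    uint8_t bits = (uint8_t)value;   /* work on the raw bit pattern */
    bits = (uint8_t)(~bits + 1u);    /* invert, add one, keep the low 8 bits */
    return (int8_t)bits;
}

/* Write the 8 bits of value into out, sign bit first, as '0'/'1' characters.
   out must have room for 9 chars (8 bits plus the terminator). */
void bits8(int8_t value, char out[9])
{
    uint8_t bits = (uint8_t)value;
    for (int i = 0; i < 8; i++) {
        out[i] = (bits & (0x80u >> i)) ? '1' : '0';
    }
    out[8] = '\0';
}
```

Calling bits8 on 85 and on -86 gives the two bit patterns shown on the slide, and negate8(-86) recovers the magnitude 86.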
Floating Point
IEEE Standard 754 for storage
32- and 64-bit precisions
Numbers consist of three fields
Sign Field
Exponent
Mantissa
Floating Point
Sign Field (S)
One bit
Zero is positive
Exponent (E)
Excess-127 notation
Values range from 0 to 255 (for 8-bit exponent)
Represent exponents ranging from -127 to 128
Exponent is biased
Floating Point
Mantissa (M)
First bit of mantissa is always one
It is not explicitly stored
Inserted by hardware
Effectively yields an extra bit of precision
Parameters              Value
E = 255 and M ≠ 0       An invalid number (NaN)
E = 255 and M = 0       ∞
0 < E < 255             $2^{E-127} \times (1.M)$
E = 0 and M ≠ 0         $2^{-126} \times (0.M)$
E = 0 and M = 0         0
Floating Point
32-bit precision
8-bit Exponent, 23-bit Mantissa
Range $10^{-38}$ to $10^{38}$
64-bit precision (Double Precision)
11-bit Exponent, 52-bit Mantissa
Range $10^{-308}$ to $10^{308}$
Floating Point
IEEE 32-bit Floating Point Representation
Sign bit | Exponent 8 bits | Mantissa 23 bits (24 with the hidden leading 1)
+1.0 = $2^{0} \times 1$ = $2^{(127-127)} \times 1.0$
0 01111111 00000000000000000000000
+1.5 = $2^{0} \times 1.5$ = $2^{(127-127)} \times 1.1_2$
0 01111111 10000000000000000000000
-5 = $-(2^{2} \times 1.25)$ = $-(2^{(129-127)} \times 1.01_2)$
1 10000001 01000000000000000000000
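The three fields can be pulled out of a float in C to check the worked examples. This is a sketch under the assumption that float is the IEEE 754 32-bit format on the target machine; the FloatFields type and float_fields function are hypothetical names introduced only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* The three IEEE 754 single-precision fields, each held in its own integer. */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 is positive */
    uint32_t exponent;  /* 8 bits, stored in excess-127 form */
    uint32_t mantissa;  /* 23 stored bits; the leading 1 is implied */
} FloatFields;

/* Split a float into its fields. Assumes float is IEEE 754 binary32. */
FloatFields float_fields(float f)
{
    uint32_t bits;
    FloatFields out;

    memcpy(&bits, &f, sizeof bits);       /* copy the raw bit pattern */
    out.sign     = bits >> 31;            /* top bit */
    out.exponent = (bits >> 23) & 0xFFu;  /* next 8 bits */
    out.mantissa = bits & 0x7FFFFFu;      /* low 23 bits */
    return out;
}
```

For -5 this yields sign 1, exponent 129, and a mantissa whose top stored bits are 01, matching $2^{(129-127)} \times 1.01_2$.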
Decimal
Binary Coded Decimal (BCD)
Stores a fixed number of digits
One or Two digits stored per byte
Nine (9) is 1001 binary (four bits)
For business applications (COBOL)
Very accurate
Limited Range
Wastes memory
02 UNIT-PRICE PICTURE IS 999V99.
02 BAL-ON-HAND PICTURE IS 9(5).
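The "two digits per byte" storage can be sketched in C as below. This is an illustrative packing only, assuming values from 0 to 99; bcd_pack and bcd_unpack are hypothetical helper names, not a COBOL runtime routine.

```c
#include <stdint.h>

/* Pack a value 0..99 into one byte: tens digit in the high 4 bits,
   ones digit in the low 4 bits. Behavior for larger values is not defined here. */
uint8_t bcd_pack(unsigned value)
{
    return (uint8_t)(((value / 10u) << 4) | (value % 10u));
}

/* Recover the decimal value from a packed BCD byte. */
unsigned bcd_unpack(uint8_t packed)
{
    return (unsigned)(packed >> 4) * 10u + (packed & 0x0Fu);
}
```

Packing 9 gives 0x09 (binary 0000 1001), and packing 42 gives 0x42, so each hex digit of the byte reads as a decimal digit.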
Data Types
Composite Data Types
Composite Types
Increase Readability
Common Implementations
Ordinal (Enumerated) Types
Sub-range Data Types
String Data Type
Arrays
Enumerated Types
List (enumerate) possible data values
Values associated with positive integers
Values become “Symbolic Constants”
Greatly increase program readability
Colors, Months, Days of Week
Increase Reliability
Compiler can check operations and ranges
Enumerated Types
C, C++
typedef enum {RED, BLUE, GREEN} colorType;
colorType color = RED;
Pascal
type colorType = (RED, BLUE, GREEN);
var color : colorType;
color := BLUE;
Java
Enumerated interface
Pascal, C, C++ do not allow reuse of names across type definitions.
Sub-Range Types
Contiguous subsequence of ordinal type
Behaves as parent type
Increased reliability and readability
Compiler can insert code to restrict range
Pascal:
type posInt = 0 .. MAXINT;
C++:
Range<0, MAXINT> i = x;
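C has no built-in sub-range type, so the range-restricting code a compiler would insert can only be imitated by hand. The following is a rough sketch of that check; the SubRange struct and subrange_assign function are hypothetical names used for illustration.

```c
#include <stdbool.h>

/* A hand-rolled sub-range: the bounds travel with the value. */
typedef struct {
    int lo;     /* lowest legal value */
    int hi;     /* highest legal value */
    int value;  /* current value, always within lo..hi */
} SubRange;

/* Assign only if candidate lies inside the declared range.
   Returns false and leaves the value unchanged otherwise. */
bool subrange_assign(SubRange *r, int candidate)
{
    if (candidate < r->lo || candidate > r->hi) {
        return false;
    }
    r->value = candidate;
    return true;
}
```

With a SubRange of 1 to 12, assigning 7 succeeds, while assigning 13 is rejected and the stored value stays at 7.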
Data Types
Character Strings
Character Strings
Composed of a character sequence
ASCII Characters (7/8-bit)
Unicode Characters (16-bit)
String Specific Operations
Increase Writability
Character Strings
Instantiating Strings
‘test’ vs “test”
Concatenation & + || strcat()
Relational Operations < >
Lexicographical Ordering (by code)
Java - .compareTo() method
Input/Output Formatting
Character Strings
Substring Operations
Selection based on position
Selection based on pattern
Substring Assignment Overlay Issue
str1 = “stringTest”
str1[2:5] = str1[1:4]
print str1
What’s printed?
sssssgTest - if character by character copy
sstrigTest - if block copy
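Both outcomes of the overlay can be reproduced in C. The sketch below assumes the slide's positions are 1-based and inclusive, so str1[2:5] = str1[1:4] copies four characters one position to the right; the function names are hypothetical.

```c
#include <string.h>

/* Copy len characters one at a time, front to back.
   When dst overlaps src, earlier writes are re-read by later steps. */
void copy_forward(char *dst, const char *src, size_t len)
{
    for (size_t k = 0; k < len; k++) {
        dst[k] = src[k];
    }
}

/* str1[2:5] = str1[1:4] as a character-by-character copy. */
void overlay_char_by_char(char *s)
{
    copy_forward(s + 1, s, 4);
}

/* The same assignment as a block copy: memmove behaves as if the
   source were saved first, so overlap does not corrupt it. */
void overlay_block(char *s)
{
    memmove(s + 1, s, 4);
}
```

Applied to a writable copy of "stringTest", overlay_char_by_char leaves "sssssgTest" and overlay_block leaves "sstrigTest".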
Character Strings
Memory Allocation for Strings
Static Length Strings
Limited Dynamic Length Strings
Dynamic Length Strings
Descriptor Record (Compile-Time)
Data Storage
Character Strings
Static Length String
Fixed Declared Length
FORTRAN, COBOL, Pascal
Padded with blanks
Most implementations output entire declared length.
Two parts of a string
Descriptor Record: Static String | Length (14) | Address
Data Storage: R E L A T I V I T Y (padded with blanks to length 14)
Character Strings
Limited Dynamic Length String
Variable Length to Declared Bounds
Descriptor Record (Compile-Time, Dynamic Maintenance): Limited Dynamic String | Maximum Length (14) | Current Length (10) | Address
Current Length is the length of the current string
Data Storage: R E L A T I V I T Y
Character Strings
Limited Dynamic Length String
Variable Length to Declared Bounds
C, C++
C & C++ do not track current length in descriptor.
Instead, string is “null terminated” -- \0 or 0x00 hex
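The two bookkeeping styles can be contrasted in C. This is a sketch, not how any particular compiler lays out strings: LimitedString, limited_assign, and scan_length are hypothetical names, and MAX_LEN of 14 simply mirrors the slide's example.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LEN 14

/* Limited dynamic string with a descriptor that tracks both lengths. */
typedef struct {
    size_t max_length;      /* declared bound, fixed */
    size_t current_length;  /* maintained at run time */
    char   data[MAX_LEN];   /* character storage, no terminator needed */
} LimitedString;

/* Store text if it fits within the declared bound; otherwise reject it. */
bool limited_assign(LimitedString *s, const char *text)
{
    size_t n = strlen(text);
    if (n > s->max_length) {
        return false;
    }
    memcpy(s->data, text, n);
    s->current_length = n;
    return true;
}

/* C-style alternative: no stored length, so scan for the '\0' terminator. */
size_t scan_length(const char *text)
{
    size_t n = 0;
    while (text[n] != '\0') {
        n++;
    }
    return n;
}
```

The descriptor version answers "how long is it?" in constant time, while the null-terminated version must walk the string each time.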
Character Strings
Dynamic Length String
Unbound Length
Perl, Javascript, PHP
Descriptor Record (Run-Time): Dynamic String | Address
Data Storage: R E L A T I V I T Y
String is always “null terminated”
• Only characters in current string are output.
• Provides potential space savings
• High cost in storage management
Data Types
Arrays
Arrays
Array Concepts
Array Storage
Array Access
Array Slices
Associative Arrays
Arrays
An aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate relative to the first element.
Ordered sequence of identical objects
Ordering determined by a scalar object
Usually integer or enumerated data
Referred to as the Subscript or Index
Arrays
Design Issues
What types are legal for subscripts?
Are subscripting expressions in element references range checked?
When are subscript ranges bound?
When does allocation take place?
What is the maximum number of subscripts?
Can array objects be initialized?
Are any kind of slices allowed?
Arrays
Array Initialization
List of values placed in array in the order in which the array elements are stored in memory
Indexing
Specifying an element’s position
Mapping function from indices to elements
map(array_name, index_value) → an element
Arrays
Array Operations
APL - all about arrays
Assignment: RHS can be an aggregate constant or an array name
Concatenation: for all single-dimensioned arrays
Relational operators: what is the exact meaning?
Intrinsics (functions or operators): matrix multiplication, vector dot product
Array Storage
Storage Allocation
Static
Fixed Stack Dynamic
Stack Dynamic
Heap Dynamic
Array Storage
Static
Loaded into memory at program load
Provides execution efficiency
No allocation/deallocation penalty
FORTRAN 77
Array Storage
Fixed Stack Dynamic
Subscript range is statically bound [ ]
Storage is bound at elaboration (creation)
Activation Record Instance
Space efficiency
C/C++ locals not declared static
Array Storage
Stack Dynamic
Subscript range and storage are dynamic
Becomes fixed once variable is instantiated
Fixed for lifetime of variable
Flexible
Array size need not be known until it is to be used
Array Storage
Heap Dynamic
Subscript range and storage are dynamic
Bindings are never fixed
All Java arrays (objects) are heap dynamic
PHP, Perl & Javascript
Arrays can change size as needed
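Three of the storage categories have rough counterparts in C, sketched below. The names load_time, sum_fixed_stack, and resize_array are hypothetical, and using realloc to model a heap-dynamic array is an approximation of what languages such as Perl do internally.

```c
#include <stdlib.h>

/* Static: storage is bound before execution and zero-initialized. */
static int load_time[4];

/* Fixed stack-dynamic: the size (4) is fixed at compile time, but the
   storage is allocated each time the function is entered. */
int sum_fixed_stack(void)
{
    int local[4] = {1, 2, 3, 4};
    int total = 0;
    for (int k = 0; k < 4; k++) {
        total += local[k] + load_time[k];
    }
    return total;
}

/* Heap-dynamic: the element count is chosen at run time and may change.
   Passing NULL allocates a new array; the caller must free the result. */
int *resize_array(int *arr, size_t new_count)
{
    return realloc(arr, new_count * sizeof *arr);
}
```

A caller can start with resize_array(NULL, 2), later grow the same array to 100 elements, and the first two values are preserved across the resize.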
Array Access
To store and retrieve data values
Determine element’s L-value (address)
Array subscript range
Upper & Lower Bounds
array[L1:U1, L2:U2]
Lower bound is often 0 (zero)
Array Access
Single-dimension array
Multi-dimension array
Array Descriptor (Dope Vector)
Array Access
Determining Element’s Address
var arr: array[-2 .. 2, -3 .. 3] of int;
arr[1, 2] := 6;
Allocate storage beginning at base address α
total_bytes = (U1-L1+1) * (U2-L2+1) * element_size
Array Access
L-value access function:
es - (element size) based on element type:
Integer - 4 bytes
Float - 4 bytes (single) or 8 bytes (double)
Char - 1 byte
Structures - based on size of pointer (4 bytes)
Array Access
L-Value Access Function:
row_size = numberOfElementsInRow * elementSize
row_size = (U2 - L2 + 1) * es
row = i - L1
col = j - L2
Array Access
L-Value Access Function:
L-value(arr[i, j]) = α + row * row_size + col * es
For the statement:
arr[1,2] = 6;
Where is the 6 stored?
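Array Access
[Figure: logical storage of arr[L1 : U1, L2 : U2], here arr[-2 .. 2, -3 .. 3], with row index i and column index j, mapped onto actual storage beginning at α in row-major order: arr[L1, L2], arr[L1, L2+1], arr[L1, L2+2], ..., arr[L1, U2], arr[L1+1, L2], ... The figure marks arr[-2, -3], arr[-2, 3], and arr[1, 2], and asks where arr[1, 2] falls in actual storage.]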
Array Access
L-value of arr[1,2] => L-value(arr[i, j])
= α + row * row_size + col * es
= α + (i - L1) * row_size + (j - L2) * es
= α + (i-L1) * (U2-L2+1) * es + (j-L2) * es
= α + es * ( (i-L1) * (U2-L2+1) + (j-L2) )
= α + 4 * ( (1-(-2)) * (3-(-3)+1) + (2-(-3)) )
= α + 4 * ( (3) * (7) + (5) )
= α + 4 * (26 element offset)
= α + 104 byte offset
[Figure: storage cells at element offsets 0 through 26 from α; the value 6 is stored in cell 26.]
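The access function from the derivation translates directly into C. This is a sketch assuming row-major storage; element_offset is a hypothetical name, and it returns the byte offset from the base address rather than an actual pointer.

```c
#include <stddef.h>

/* Byte offset of arr[i, j] from the base address for an array declared
   arr[L1..U1, L2..U2] and stored row by row. U1 is not needed for the
   offset, only for the total size. es is the element size in bytes. */
ptrdiff_t element_offset(int i, int j, int l1, int l2, int u2, size_t es)
{
    ptrdiff_t row_len = (ptrdiff_t)u2 - l2 + 1;   /* elements per row */
    return (ptrdiff_t)es * ((i - l1) * row_len + (j - l2));
}
```

For arr[-2 .. 2, -3 .. 3] with 4-byte elements, element_offset(1, 2, -2, -3, 3, 4) gives 104, which is 26 elements past the base, matching the worked result.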
Array Access
Virtual Origin (VO)
Element at i = 0; j = 0 ⇒ arr[0, 0]
L-value(arr[0, 0])
= α + es * ( (i-L1) * (U2-L2+1) + (j-L2) )
= α + 4 * ( (0-(-2)) * (3-(-3)+1) + (0-(-3)) )
= α + 4 * ( (2) * (7) + (3) )
= α + 4 * (17 element offset)
= α + 68 byte offset
[Figure: storage cells at element offsets 0 through 17 from α; VO marks cell 17.]
Array Access
Dope Vector
Use a dope vector to access an array element
VO - virtual origin (address)
row size
element size
[Figure: dope vector alongside the array storage.]
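A dope vector holding the three listed fields might look like the following in C. This is a sketch under stated assumptions: the DopeVector layout and the make_dope and element_address names are hypothetical, and the virtual origin is kept as a byte offset from the base so that no out-of-bounds pointer is ever formed when lower bounds are positive.

```c
#include <stddef.h>

/* Descriptor for a two-dimensional, row-major array. */
typedef struct {
    char     *base;          /* address of the first stored element */
    ptrdiff_t vo_offset;     /* virtual origin: where arr[0, 0] would sit,
                                in bytes relative to base */
    ptrdiff_t row_size;      /* bytes per row */
    ptrdiff_t element_size;  /* bytes per element */
} DopeVector;

/* Build the descriptor once, when the array is created. */
DopeVector make_dope(void *base, int l1, int l2, int u2, size_t es)
{
    DopeVector d;
    d.base = base;
    d.element_size = (ptrdiff_t)es;
    d.row_size = ((ptrdiff_t)u2 - l2 + 1) * d.element_size;
    d.vo_offset = -((ptrdiff_t)l1 * d.row_size + (ptrdiff_t)l2 * d.element_size);
    return d;
}

/* With the virtual origin precomputed, each access needs no
   subtraction of lower bounds: just two multiplies and adds. */
void *element_address(const DopeVector *d, int i, int j)
{
    ptrdiff_t offset = d->vo_offset + i * d->row_size + j * d->element_size;
    return d->base + offset;
}
```

For arr[-2 .. 2, -3 .. 3] with 4-byte elements the stored vo_offset is 68 bytes, the same value the Virtual Origin slide derives, and element_address for [1, 2] lands 104 bytes past the base.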
Array Slices
Slices
A substructure of an array - row, column, plane
A referencing mechanism
Very useful in languages with array operations
Slice Examples (FORTRAN 90):
INTEGER MAT (1:4, 1:4)
MAT(1:4, 1) - the first column
MAT(2, 1:4) - the second row
Associative Arrays
An unordered collection of data elements indexed by an equal number of values (keys)
Associative Arrays in Perl (PHP is similar)
Declare and Initialize
%hi_temps = ("Monday" => 77, "Tuesday" => 79,…);
Index and Assign value
$hi_temps{"Wednesday"} = 83;
Remove Elements
delete $hi_temps{"Tuesday"};
Data Types
Structured Data Types
Records and Unions
Records
A heterogeneous aggregate of data elements where individual elements are identified by names
Individual Elements - “Fields”
struct date { char *month; int day; int year; };
Records
C / C++
Declarations:
struct date { char *month; int day; int year; };
struct date myDate;
Structure Type
typedef struct { char *month; int day; int year; } dateType;
dateType myDate;
User Type Definition
Records
C / C++
Use:
dateType myDate;
myDate.day = 13;
myDate.year = 2004;
Field Access Dereferencing (Pointers)
dateType* pDate = &myDate;
pDate->day = 13;
pDate->year = 2004;
Records
Record Descriptor
Compile Time
Records
Comparing Records and Arrays
Array element access is slower
Subscripts are dynamic (data[i])
Field names are static (myDate.day)
Unions
Variables allowed to store different type values at different times during execution
Pascal:
type intreal = record
  case tagg : Boolean of
    true : (blint : integer);
    false : (blreal : real);
end;
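A rough C counterpart of the Pascal variant record pairs a union with an explicit tag. This is a sketch: the field names follow the Pascal example, but the IntReal type and the make_int, make_real, and as_double helpers are hypothetical, and unlike Pascal, C does not check the tag for you.

```c
#include <stdbool.h>

/* Tagged union: tagg says which member of u currently holds a value. */
typedef struct {
    bool tagg;        /* true: blint is valid; false: blreal is valid */
    union {
        int   blint;
        float blreal;
    } u;              /* both members share the same storage */
} IntReal;

IntReal make_int(int v)
{
    IntReal r;
    r.tagg = true;
    r.u.blint = v;
    return r;
}

IntReal make_real(float v)
{
    IntReal r;
    r.tagg = false;
    r.u.blreal = v;
    return r;
}

/* Read the stored number, consulting the tag before touching the union. */
double as_double(IntReal r)
{
    return r.tagg ? (double)r.u.blint : (double)r.u.blreal;
}
```

Reading through as_double keeps every access consistent with the tag; reading r.u.blreal after storing blint would reinterpret the same bytes as a different type.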