Data Types

Two principal purposes
• provide implicit context for operators and subroutine calls in general
  - e.g. a + b, new p(), overloading
• limit the set of operations that may be performed
  - e.g. adding a character to a record?
  - type systems help to catch typing (and thinking) errors

Chapter contents

Meaning and purpose of types
Type equivalence & compatibility
• Are types T1 and T2 the same?
• Can we use a value of type T1 in a context expecting a value of type T2?
Syntactic, semantic, pragmatic issues of the most common (and important) types
• records
• arrays
• pointers (also naming issues & heap management)
• strings, sets, files (also I/O in general)

Type Systems

Computer hardware
• can interpret bit sequences in various ways
  - instructions, addresses, characters
  - integer & real numbers (of various lengths)
• machine does not know which interpretation is the correct one
  - assembly languages can operate on memory locations in any way they wish
High-level languages
• always associate variables with types using a type system
• to provide the context & to check errors

Components of a type system

Mechanism
• to define types and
• to associate them with other language constructs
Rules for
• type equivalence,
• type compatibility, and
• type inference
  - derive the type of an expression from its parts and from its context

What ‘things’ must have types?

Anything that may have a value or refer to something having a value
• constants (named & explicit literals)
• variables
• record fields
• parameters & return values
• subroutines themselves (if 1st or 2nd class)
• all expressions containing these
‘Type of name’ and ‘type of the object named’ can be different!
• but usually type-compatible
• important in polymorphic (e.g. o-o) languages
  - the same name can refer to objects of different types

Type checking

“Process of ensuring that the program follows the type rules”
• violation = type clash
Strongly typed languages
• language implementation prevents inappropriate use of objects
Statically typed languages
• strongly typed and
• type checking can be carried out at compile-time
• the term is often used even when some of the checks are performed at run-time

Some example languages

Java: strongly typed but not statically typed (type casts)
Ada: ‘almost’ statically typed
Pascal: variant records create a loophole in strong typing
ANSI C: union types, subroutines with varargs, array/pointer interoperability
‘Good old’ C: implementations rarely check anything at run-time
Dynamic scope, late binding → dynamic type checking
• LISP, Scheme, Smalltalk
Polymorphism does not necessarily imply dynamic checking
• Eiffel & type inheritance
• ML, Haskell & type inference

Definition of types

Type declaration
• gives a name to some type
• happens in some scope
Type definition
• describes the type itself
Declaration <> definition
• although they quite often appear combined
• e.g. TYPE intvec = ARRAY[1..10] OF Integer;
Declaring without defining
• forward declarations, opaque types, abstract data types, ...
Defining without declaring
• ‘anonymous’ types
• e.g. VAR x: ARRAY[1..10] OF Integer;

Three points of view
• denotational
• constructive
• abstraction-based

Denotational view of types

Type = set of values
• domain
• the values the objects of that type can take
• if constant value v ∈ T then v is of type T
• if v ∈ T for all values v of x then x is of type T
Widely used in formalizing semantics of programming languages
• record: n-tuple, array: function
• assignment: mapping store → store
Ignores implementation issues

Abstraction-based view

Pioneered by Simula-67 and Smalltalk
Type is an interface
• set of operations allowed for that type
• explains the meaning and purpose of the type
Operations should
• have well-defined semantics (pre- & postconditions)
• respect the data invariant of the type

Constructive view

Pioneered by Algol W and Algol 68
Tells ‘how the type is built’
Built-in types
• a.k.a. primitive, predefined
• integer, Boolean, ...
Composite types
• created by applying a type constructor to one or more simpler types
• ‘simpler types’ may be composite, too
• typical constructors: record, array, set
The rest of the chapter focuses on this constructive point of view

Built-in types...

Note: some (but not many) languages may have exceptions to what is said here
Built-in types are (roughly) the ones the hardware supports
Booleans
• implemented as 1-byte quantities
• 0: false, 1: true (other values illegal)
• C89: no Boolean type (int 0 = false, anything else = true); C99 adds _Bool (bool in <stdbool.h>)
Characters
• ASCII encoding → one byte
• Unicode → 2 bytes (Java char is a 16-bit UTF-16 code unit)
• C++: char & wchar_t (wide char)
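
A minimal C sketch of the Boolean points above: any nonzero scalar acts as true, and since C99 converting a value to bool (_Bool) normalizes every nonzero value to 1. The function names are made up for illustration, and as_byte assumes 8-bit chars.

```c
#include <assert.h>
#include <stdbool.h>

/* Pre-C99 style: an int used as a truth value; any nonzero int counts as true. */
int is_true_int(int x) {
    return x ? 1 : 0;          /* same result as !!x */
}

/* C99 style: converting any scalar to bool yields exactly 0 or 1. */
bool to_bool(int x) {
    return (bool)x;
}

/* Keeping a "truth value" in an 8-bit unsigned char can silently lose it:
   256 wraps around to 0 (assuming CHAR_BIT == 8), whereas to_bool(256) is 1. */
unsigned char as_byte(int x) {
    return (unsigned char)x;
}
```
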

...Built-in types

Integers
• different lengths (C, Fortran)
• signed and unsigned (C; Modula-2: cardinal)
Floating-point numbers
• different lengths (→ precision & magnitude)

Some non-common builtins...

Note: languages which don’t have these as built-ins quite commonly provide them via libraries
Fixed-point numbers (Ada)
• can be implemented as integers
• fast summation (if same precision)
• compared to floating-point numbers with the same number of bits: more digits of precision, but a much smaller range of magnitudes
Decimal types (Cobol, PL/I)
• some processors support BCD arithmetic
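
Fixed-point arithmetic can be sketched in C with scaled integers. The type name fixed2 and the scale of 100 (two decimal places) are assumptions for illustration; this is not how Ada declares its fixed-point types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point type with two decimal places: the stored integer
   is the value multiplied by FIXED_SCALE, e.g. 12.34 is stored as 1234. */
typedef int64_t fixed2;
#define FIXED_SCALE 100

/* Both parts are expected to have the same sign, e.g. (-1, -25) for -1.25. */
fixed2 fixed_from_parts(int64_t whole, int64_t hundredths) {
    return whole * FIXED_SCALE + hundredths;
}

/* Adding two values of the same scale is plain integer addition:
   no alignment or normalization step is needed, unlike floating point. */
fixed2 fixed_add(fixed2 a, fixed2 b) {
    return a + b;
}

/* Multiplication needs a rescale, because (a*S) * (b*S) = (a*b)*S*S.
   This sketch truncates toward zero; real implementations may round. */
fixed2 fixed_mul(fixed2 a, fixed2 b) {
    return (a * b) / FIXED_SCALE;
}
```

Adding 0.10 and 0.20 this way gives exactly 30 hundredths, whereas with IEEE doubles 0.1 + 0.2 is not exactly equal to 0.3.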

...some non-common builtins

Complex numbers (Fortran, LISP)
• implemented as a pair of f.p. numbers
Rational numbers (Scheme, LISP)
• pair of integers
Arbitrary precision integers (Smalltalk)
• multiple words
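
Rational numbers as a pair of integers, sketched in C. The struct name rational and the normalization policy (positive denominator, lowest terms) are illustrative choices, and overflow of long is ignored.

```c
#include <assert.h>
#include <stdlib.h>

/* A rational number kept normalized: den > 0 and gcd(num, den) == 1. */
typedef struct {
    long num;
    long den;
} rational;

/* Euclid's algorithm on absolute values. */
static long gcd(long a, long b) {
    a = labs(a);
    b = labs(b);
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

rational make_rational(long num, long den) {
    assert(den != 0);
    if (den < 0) {              /* move the sign to the numerator */
        num = -num;
        den = -den;
    }
    long g = gcd(num, den);     /* g >= 1 because den != 0 */
    rational r = { num / g, den / g };
    return r;
}

/* a/b + c/d = (a*d + c*b) / (b*d), then normalized. */
rational rational_add(rational x, rational y) {
    return make_rational(x.num * y.den + y.num * x.den, x.den * y.den);
}
```
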

Some terminology

Discrete type
• countable domain
Ordinal type
• each element has a successor and a predecessor (except the min & max element)
Scalar type
• elements of the type can ‘express scale’
• all numeric types

Enumeration types

Introduced in Pascal
    type weekday = (sun, mon, tue, wed, thu, fri, sat);
Ordered set of named elements
• comparisons make sense
    mon < fri
• predecessor, successor
    fri = succ(thu)
• enumeration-controlled loops
    for today := mon to fri do begin ...
• indexing arrays
    var number_of_absent : array [weekday] of integer;
• each element has its unique ordinal value → mappings
  - Pascal: ord(c) → ASCII code of c (if c is of type Char), ordinal value otherwise; chr(int) → inverse
  - Ada: weekday'pos(mon), weekday'val(1)
• Ada, ANSI C: ordinal values other than the ‘default ones’
• Ada: overloading of enum names is allowed

Why not just use integers?
• more readable programs
Why not just use integer constants?
• C enum is just syntactic sugar
    enum weekday {sun, mon, tue, wed, thu, fri, sat};
  is (essentially) equivalent to
    typedef int weekday;
    const weekday sun = 0, mon = 1, tue = 2,
                  wed = 3, thu = 4, fri = 5, sat = 6;
• different in other languages
  - the compiler can catch errors when enumerations are real types of their own, e.g., one cannot use an integer in place of an enumeration type
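
A C sketch of the enumeration features listed earlier and of the ‘syntactic sugar’ point. succ and pred are hypothetical helpers (C has no built-in equivalents), and the coin enumeration only illustrates non-default ordinal values.

```c
#include <assert.h>

/* Default ordinal values: sun = 0, mon = 1, ..., sat = 6. */
enum weekday { sun, mon, tue, wed, thu, fri, sat };

/* Pascal-style successor/predecessor. No range check:
   succ(sat) would produce the invalid value 7. */
enum weekday succ(enum weekday d) { return (enum weekday)(d + 1); }
enum weekday pred(enum weekday d) { return (enum weekday)(d - 1); }

/* Non-default ordinal values are allowed in ANSI C (as in Ada). */
enum coin { penny = 1, nickel = 5, dime = 10, quarter = 25 };

/* Enumeration-controlled loop and array indexing by weekday;
   both work because enumerators are plain int constants in C. */
int total_absent(const int number_of_absent[7]) {
    int total = 0;
    for (enum weekday today = mon; today <= fri; today++)
        total += number_of_absent[today];
    return total;
}

/* The "syntactic sugar" point: C silently accepts an arbitrary int
   where an enum weekday is expected, with no range check. */
enum weekday from_int(int n) {
    enum weekday d = n;    /* compiles in C even though 42 is not a weekday */
    return d;
}
```

Compiled as C++, the assignment inside from_int is rejected without an explicit cast, which is the kind of error catching described above for languages where enumerations are real types.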

Subrange types

Values comprise a contiguous subset of another discrete type
• base type, parent type
• integer, character, enumeration, another subrange
Ada makes a distinction between
    type testscore is new integer range 0..100;
    subtype workday is weekday range mon..fri;
• derived types (not assignment compatible)
• constrained subtypes
Advantages of subranges
• ‘automatic documentation’ of an integer range
• compiler can generate range checking code
• compiler can ‘compress’ the subrange (120..125 needs only 3 bits)
  - usually the implementation takes the ‘expected’ amount
    type water_temperature = 273..373; (* degrees Kelvin *)
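
C has no subrange types, but both the range check and the ‘compression’ of 120..125 into 3 bits can be emulated by hand. The names below are hypothetical; a Pascal or Ada compiler would generate equivalent checks automatically.

```c
#include <assert.h>
#include <stdbool.h>

/* Emulating "type test_score = 0..100": the range check a Pascal or
   Ada compiler would insert before storing into such a variable. */
enum { SCORE_MIN = 0, SCORE_MAX = 100 };

bool score_in_range(int v) {
    return v >= SCORE_MIN && v <= SCORE_MAX;
}

/* "Compressing" the subrange 120..125: store value - 120 in a 3-bit
   bit-field, since 6 distinct values fit in 3 bits. */
enum { SMALL_BASE = 120, SMALL_MAX = 125 };

struct packed_small {
    unsigned offset : 3;   /* holds 0..5, meaning 120..125 */
};

bool pack_small(struct packed_small *p, int v) {
    if (v < SMALL_BASE || v > SMALL_MAX)
        return false;                   /* range check failed */
    p->offset = (unsigned)(v - SMALL_BASE);
    return true;
}

int unpack_small(struct packed_small p) {
    return SMALL_BASE + (int)p.offset;
}
```
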

Common composite types

Records (structures)
• collection of fields
• Cartesian product of (field) domains
Arrays (vectors, tables)
• function from index type to component type
• strings are quite often ‘just’ arrays of characters with some special operations
Variant records
• union of field types
• alternative fields under one name, only one alternative is valid at a time
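
A variant record sketched in C as a tagged (discriminated) union. The shape example is hypothetical. C does not check the tag when a union member is read; that discipline is left to the programmer, similar to the loophole mentioned for Pascal variant records.

```c
#include <assert.h>

enum shape_kind { CIRCLE, RECTANGLE };

/* A tag (discriminant) plus a union of the alternatives:
   only the member selected by kind is meaningful at any time. */
struct shape {
    enum shape_kind kind;
    union {
        struct { double radius; } circle;
        struct { double width, height; } rect;
    } u;
};

struct shape make_circle(double r) {
    struct shape s;
    s.kind = CIRCLE;
    s.u.circle.radius = r;
    return s;
}

struct shape make_rect(double w, double h) {
    struct shape s;
    s.kind = RECTANGLE;
    s.u.rect.width = w;
    s.u.rect.height = h;
    return s;
}

/* Reads only the alternative named by the tag; returns -1.0 for an
   unknown tag. Pi is hard-coded to avoid non-standard macros. */
double area(struct shape s) {
    switch (s.kind) {
    case CIRCLE:
        return 3.14159265358979 * s.u.circle.radius * s.u.circle.radius;
    case RECTANGLE:
        return s.u.rect.width * s.u.rect.height;
    }
    return -1.0;
}
```
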

...common composite types

Sets
• powerset of its (discrete) base type
Pointers (l-values)
• references to objects of the pointer’s base type
• often implemented as machine addresses (not necessarily!)
• requirement for recursive data structures
Lists
• sequences of elements (like arrays)
• recursive definition instead of an indexing function
• variable length
• fundamental to functional & logic languages
Files
• data on mass storage devices
• like arrays (if ‘seek’ allowed) with known ‘current position’
• like lists (if only sequential access allowed)
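
A set over a small discrete base type is commonly represented as a bit vector, one bit per element of the base type. A minimal C sketch, reusing the weekday enumeration from the Pascal example; the type and function names are hypothetical.

```c
#include <assert.h>

enum weekday { sun, mon, tue, wed, thu, fri, sat };

/* Bit i is 1 exactly when the element with ordinal value i is a member.
   With 7 weekdays there are 2^7 = 128 possible sets: the powerset. */
typedef unsigned int day_set;

#define EMPTY_SET 0u

day_set set_add(day_set s, enum weekday d)      { return s | (1u << d); }
int     set_contains(day_set s, enum weekday d) { return (s >> d) & 1u; }
day_set set_union(day_set a, day_set b)         { return a | b; }
day_set set_intersect(day_set a, day_set b)     { return a & b; }
```

Union and intersection become single bitwise instructions, which is why small sets are cheap in languages such as Pascal.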

Orthogonality and types

Must everything have a type?
• statements as expressions: void type
• expressions as statements: cast result to void
    foo_index = insert_in_symbol_table(foo);
    ...
    (void) insert_in_symbol_table(bar);
Pascal is not orthogonal
• variant part must be the last part of a record
• functions can return only scalars & pointers
• no true subroutine type
ML is almost completely orthogonal

...orthogonality and types

Literals
• commonly found for scalar types
• but not for composite ones
Ada: aggregates
• positional and named (composite) values
• can be used on the right-hand side of an assignment statement
    type person is record
       name : string (1..10);
       age  : integer;
    end record;
    p, q : person;
    A, B : array (1..10) of integer;
    ...
    p := ("Jane Doe  ", 37);
    q := (age => 36, name => "John Doe  ");
    A := (1, 0, 3, 0, 3, 0, 3, 0, 0, 0);
    B := (1 => 1, 3 | 5 | 7 => 3, others => 0);
C, C++: initializers
• classically usable only in declarations (C99 compound literals and C++11 braced initializers relax this)
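
For comparison with the Ada aggregates, a C sketch of declaration initializers (positional, and C99 designated ones) and of a C99 compound literal, which allows the same notation in an assignment. The array b mirrors the Ada array B, shifted to C's 0-based indexing; reset_person is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

struct person {
    char name[11];   /* 10 characters plus the terminating '\0' */
    int age;
};

/* Positional and designated (named) initializers, allowed in declarations. */
struct person p = { "Jane Doe", 37 };
struct person q = { .age = 36, .name = "John Doe" };
int b[10] = { [0] = 1, [2] = 3, [4] = 3, [6] = 3 };   /* all others are 0 */

/* A C99 compound literal brings the same notation into an assignment. */
void reset_person(struct person *target) {
    *target = (struct person){ .name = "Nobody", .age = 0 };
}
```
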

Type checking

Typed objects
• every definition of an object must also specify the object’s type
Typed contexts
• rules of the language tell what types are allowed in each context
• sometimes finding this out requires type inference
Type checking
• may an object of type T be used in some given context?
• if types are equivalent (same): yes
• if types are compatible: depends on the language
  - casts / conversions
  - coercion
  - nonconverting casts
• type inference

Type equivalence

Two principal ways
Structural equivalence
• based on the content of definitions
• (roughly put) types are the same if they
  - consist of the same components and
  - are composed in the same way
• Algol 68, Modula-3, C & ML (with various ‘wrinkles’)
Name equivalence
• based on the lexical occurrence of definitions
• each definition defines a new type
• more popular in recent languages (Java, Ada)
Note: separate compilation creates some problems
• see section 9.6

What is structurally equivalent?

See examples on page 331
What differences are important and what not?
• format of declaration
• order of fields in a record
• representations of the same constant values
• index values of an array
Algorithm to decide structural equivalence
• expand all definitions until no user-defined types are left
• check if the 2 expanded definitions are the same
• recursive types give some trouble (must match graphs)
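
The algorithm above can be sketched in C over a hypothetical, minimal type representation. Instead of expanding definitions textually (which never terminates for recursive types), it walks both type graphs in parallel and assumes a pair of types equivalent if it meets that pair again while still comparing it. Field names and field order are treated as significant here; as noted above, languages differ on such details.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical type representation for the sketch. */
enum kind { K_INT, K_REAL, K_POINTER, K_ARRAY, K_RECORD };
#define MAX_FIELDS 4

struct type {
    enum kind kind;
    const struct type *elem;       /* K_POINTER target or K_ARRAY element */
    int length;                    /* K_ARRAY: number of elements */
    int nfields;                   /* K_RECORD */
    const char *field_names[MAX_FIELDS];
    const struct type *field_types[MAX_FIELDS];
};

/* Stack of pairs currently being compared. Meeting a pair again means a
   recursive type looped back, and the pair is assumed equivalent. */
#define MAX_ASSUMED 64
static const struct type *assumed_a[MAX_ASSUMED], *assumed_b[MAX_ASSUMED];
static int n_assumed = 0;

static bool equiv(const struct type *a, const struct type *b) {
    if (a == b) return true;
    if (a->kind != b->kind) return false;
    for (int i = 0; i < n_assumed; i++)
        if (assumed_a[i] == a && assumed_b[i] == b) return true;
    if (n_assumed == MAX_ASSUMED) return false;   /* sketch: give up */
    assumed_a[n_assumed] = a;
    assumed_b[n_assumed] = b;
    n_assumed++;

    bool result = true;
    switch (a->kind) {
    case K_INT:
    case K_REAL:
        break;
    case K_POINTER:
        result = equiv(a->elem, b->elem);
        break;
    case K_ARRAY:
        result = a->length == b->length && equiv(a->elem, b->elem);
        break;
    case K_RECORD:
        result = a->nfields == b->nfields;
        for (int i = 0; result && i < a->nfields; i++)
            result = strcmp(a->field_names[i], b->field_names[i]) == 0 &&
                     equiv(a->field_types[i], b->field_types[i]);
        break;
    }
    n_assumed--;   /* pop this pair again */
    return result;
}

bool structurally_equivalent(const struct type *a, const struct type *b) {
    n_assumed = 0;
    return equiv(a, b);
}
```
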

Problems with structural equivalence

Unintentional equivalence (p. 332)
• programmer defines 2 types that have nothing in common
  - different name
• but the type system thinks they are the same
  - same internal structure
Name equivalence resolves this
• ‘if the programmer takes the effort to define 2 types then he most probably has the intention that those types are different’

Name equivalence

Aliasing
• define a type using just the name of another type
Problem
• are these 2 types the same (name equivalent) or not?
• essential for the Modula-2 example to work

• but sometimes we do not want this

Strict name equivalence
• aliased types are distinct
Loose name equivalence (Pascal, Modula-2)
• aliased types are considered equivalent
Ada: ‘best of both worlds’
• derived type: incompatible with base type
    type celsius_temp is new integer;
    type fahrenheit_temp is new integer;
• subtype: compatible
    subtype stack_element is integer;
Modula-3: branded types (otherwise structural equivalence)

Strict and loose

TYPE A = B
• strict: a declaration and a definition
• loose: just a declaration, A shares the definition of B
Example on p. 333
• strict: p & q, r & u
• loose: r & s & u, p & q
• structural: all 6 variables
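
In C terms, a typedef only creates an alias (in the spirit of loose name equivalence), while every struct declaration in a translation unit introduces a distinct type. Wrapping an int in a one-member struct therefore gives something close to Ada's derived types. The type names follow the Ada example above; to_fahrenheit and add_alias are hypothetical helpers.

```c
#include <assert.h>

/* typedef is only an alias: celsius_alias and int are the same type. */
typedef int celsius_alias;

/* Two separately declared structs are distinct types even though their
   members are identical, so mixing them up is a compile-time error. */
typedef struct { int degrees; } celsius_temp;
typedef struct { int degrees; } fahrenheit_temp;

/* Crossing between the two requires an explicit conversion.
   Integer division truncates; good enough for the sketch. */
fahrenheit_temp to_fahrenheit(celsius_temp c) {
    fahrenheit_temp f = { c.degrees * 9 / 5 + 32 };
    return f;
}

int add_alias(celsius_alias a, int b) {
    return a + b;          /* alias and int mix freely */
}

/* Uncommenting this makes the compiler reject the return statement,
   because celsius_temp and fahrenheit_temp are incompatible:
   fahrenheit_temp wrong(celsius_temp c) { return c; }
*/
```
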

Type conversions

Contexts expecting values of a specific type
• assignment
• expressions with overloaded operators
• subroutine calls
Suppose types must match exactly → explicit type conversions required
Conversion depends on the types
• types are structurally equivalent: conversion just makes them name equivalent
  - no run-time code
• different subsets of values, common values are represented in the same way
  - e.g. signed & unsigned integers
  - check that the value is in the common range, then use the machine representation as such
• different low-level representations
  - must use some mapping routine
  - 32-bit integer → 64-bit float: ok
  - opposite direction: loss of precision (round/trunc), overflow
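
The three cases above, sketched in C with fixed-width integers. The function names are hypothetical. The range check before the double to int32 conversion matters because converting an out-of-range floating-point value to an integer type is undefined behaviour in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Overlapping value sets, same representation for the common part:
   signed -> unsigned only needs a non-negativity check. */
bool int32_to_uint32(int32_t v, uint32_t *out) {
    if (v < 0) return false;        /* outside the common range */
    *out = (uint32_t)v;
    return true;
}

/* Different representations: int32 -> double is always exact,
   since a double's 53-bit significand holds any 32-bit integer. */
double int32_to_double(int32_t v) {
    return (double)v;
}

/* The opposite direction can overflow and drops the fraction
   (C truncates toward zero), so check the range first. */
bool double_to_int32(double d, int32_t *out) {
    if (!(d > (double)INT32_MIN - 1.0 && d < (double)INT32_MAX + 1.0))
        return false;               /* overflow, or NaN */
    *out = (int32_t)d;              /* e.g. -2.7 becomes -2 */
    return true;
}
```
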

Nonconverting type casts

Change the type of a value without changing the underlying implementation
• useful in systems programming
• example 1: memory allocation
  - heap is allocated as an array of (say) integers
  - it can contain addresses and different user-defined data structures
• example 2: high-performance arithmetic
  - treat an IEEE floating-point number as a record
  - use exponent, sign & mantissa as integers

...nonconverting casts

Ada
• generic subroutine ‘unchecked_conversion’
C
• type cast → run-time conversion with no checking
• nonconverting casts possible by ‘clever’ use of pointers
    r = *((float *) &n);
• also possible with union types (and variant records in other languages)
C++
• static_cast: type conversion
• reinterpret_cast: nonconverting
• dynamic_cast: run-time check
Dangerous!
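
A C sketch of the IEEE example from the previous slide: viewing a float as a record of sign, exponent, and mantissa. It copies the bits with memcpy rather than using the pointer cast shown above, because reading a float object through an lvalue of an unrelated type breaks C's strict aliasing rules (undefined behaviour). It assumes float is 32-bit IEEE 754 binary32, which the C standard does not guarantee; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* A float viewed as a "record" of its binary32 fields. */
struct float_fields {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 8 bits, biased by 127 */
    unsigned mantissa;   /* 23 bits, without the implicit leading 1 */
};

/* Nonconverting cast: the bits are copied unchanged; compare (uint32_t)f,
   which would convert the numeric value instead. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

struct float_fields split_float(float f) {
    uint32_t bits = float_bits(f);
    struct float_fields r;
    r.sign = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}
```
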

Why type compatibility?

A := B
• type of B must be compatible with the type of A
A + B
• types of A & B must be compatible with an integer type or with a float type
C := p(A,B)
• types of A & B must be compatible with the types of the formal parameters of p
• return value of p must be type compatible with C

Examples of type compatibility

Ada: type S is compatible with type T iff
• S & T are equivalent or
• S is a subtype of T (or vice versa) or
• S & T are subtypes of the same type or
• S & T are arrays with the same dimensions, ranges and component types
Pascal
• integers can be used in place of reals

Implementing type compatibility

Scenario
• A & B are type compatible → A := B allowed
• A & B have different semantics (e.g. subrange) → compiler must generate type checking code
• A & B have different low-level representations → compiler must convert B to the type of A
Coercion
• implicit type conversion provided automatically by the compiler
• may require run-time code
  - checks (Ada coercions need only these)
  - actual conversions

To coerce or not?

Coercion
• allows types to be mixed without explicit indication from the programmer
• significantly weakens type security
• ‘the weaker the type system, the more coercions the language provides’ (Fortran & C)
  - most numeric types can be intermixed
  - compiler coerces results ‘back and forth’ when necessary
Example
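
A few C coercions of the kind described above, each inserted by the compiler without any cast in the source. The functions are illustrative only.

```c
#include <assert.h>

double mixed_sum(int i, double d) {
    return i + d;               /* i is coerced to double before adding */
}

int store_in_int(double d) {
    int n = d;                  /* double -> int: fraction silently dropped */
    return n;
}

/* Returns 1 only if no information was lost: (a + b) / 2 is an integer
   division, and only its result is coerced to double. */
int average_is_exact(int a, int b) {
    double avg = (a + b) / 2;
    return avg == (a + b) / 2.0;
}

/* "Usual arithmetic conversions": -1 is coerced to unsigned int
   (becoming UINT_MAX), so this comparison evaluates to 0 (false). */
int signed_vs_unsigned(void) {
    return -1 < 1u;
}
```
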

...to coerce or not

Most modern languages try to
• get closer to strong typing and
• further from coercions
But not C++
• motivation: coercions are the natural way to support data abstraction & program extensibility
• extremely rich programmer-extensible set of coercion rules
• programmer can define coercion functions for his own classes
• add overloading and templates to this and you’ll have arguably one of the most complicated type systems ever created
overloading

Coercion and literals

Are literals polymorphic?
• constant nil and all pointer types
• empty set and all set types
Usually treated as special cases in the type system
• constants have their own (predefined) types
• these types are coerced when necessary (even if the language doesn’t support other coercions)
Ada ‘solution’
• integer constants have type universal_integer
• compatible with all types derived from integer
• similarly for real numbers (universal_real)

Generic references

References to objects of any type
• systems programming
• generic container classes
• easy to implement because machine addresses are of the same size no matter the base type
• C/C++: void *, Java: Object, Clu: any, Modula-2: address, Modula-3: refany, ...
Type safety?
• generic reference type usually has no operations → ok
• assignment p = q when p & q refer to different types
  - we must make a dynamic check that the base types are compatible

Type-safe generic assignments
Make objects self-descriptive
• the system can determine at run time what the type of an object is and what operations it supports

Implementation
• augment objects with a type tag
• consumes some additional space (1 word)
• Java: A = (class of A) B
  - explicit type cast required
  - run-time check with the class of B and the class of A
  - example (next slide)
• Eiffel: A ?= B (instead of A := B)
• C++: dynamic_cast

No type tags ⇒ unchecked conversions
• C++ provides type tags only for polymorphic types (for speed)
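Here is a minimal C sketch of a checked generic assignment that makes the tag idea concrete. It assumes a hypothetical convention in which every object begins with a `TypeTag` word. `TypeTag`, `Point`, `Circle` and `checked_cast` are illustrative names, not part of any library. C itself performs no check at all when converting from `void *`, which is exactly the unchecked-conversion case above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags: each object starts with a one-word tag naming its type. */
typedef enum { TAG_POINT, TAG_CIRCLE } TypeTag;

typedef struct { TypeTag tag; int x, y; } Point;
typedef struct { TypeTag tag; int x, y, r; } Circle;

/* Checked generic assignment: hand back the object only if its run-time tag
   matches the expected type, otherwise NULL (similar in spirit to a C++
   dynamic_cast on pointers or Eiffel's ?=). */
static void *checked_cast(void *obj, TypeTag expected) {
    if (obj == NULL) {
        return NULL;
    }
    TypeTag actual = *(const TypeTag *)obj;  /* the tag is the first member */
    return actual == expected ? obj : NULL;
}
```

A language runtime would hide the tag in an object header instead of exposing it as a field. Keeping it visible here keeps the sketch self-contained.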

Type inference
Type checking ensures that
• the components of an expression are type compatible with the types expected for them in that expression
• but how do we find out the ‘type of an expression’?

Often easy
• function call: the corresponding function result type
• assignment statement: the type of the assigned value

Problematic case: operations that do not preserve the types of their operands
• operations on subranges
• operations on (some) composite types

Arithmetic on subranges
See example on p. 341
• what is the type of ‘a + b’?
  - a new range 10..40?
• Pascal (and descendants)
  - the base type of the subrange (integer in this case)

for-loop in Ada
• the subrange gives the type of the index variable
• for compatibility: type = base type of the range bounds

Avoiding run-time checks
• the compiler can keep track of min/max bounds
• some checks (or half of a check) may be avoided this way
• sometimes we may even catch semantic errors (low bound of one range > high bound of the other)
• not always possible (user-defined functions, p. 342)
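The bounds tracking described above can be sketched as simple interval arithmetic. This is a minimal C illustration under assumed names: `Range`, `range_add`, `checks_needed` and the `CheckKind` values are hypothetical. It is not the algorithm of any particular compiler, and it ignores overflow and handles only addition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical compile-time view of a subrange: every value lies in [lo, hi]. */
typedef struct { long lo, hi; } Range;

/* Bounds of a + b, given the bounds of a and b (overflow ignored). */
static Range range_add(Range a, Range b) {
    assert(a.lo <= a.hi && b.lo <= b.hi);
    Range r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

typedef enum { CHECK_NONE, CHECK_LOW, CHECK_HIGH, CHECK_BOTH, ALWAYS_ERROR } CheckKind;

/* Which run-time checks does "target := value" still need? */
static CheckKind checks_needed(Range value, Range target) {
    if (value.hi < target.lo || value.lo > target.hi) {
        return ALWAYS_ERROR;            /* disjoint ranges: report at compile time */
    }
    bool low  = value.lo < target.lo;   /* value might fall below the target range */
    bool high = value.hi > target.hi;   /* value might exceed the target range */
    if (low && high) return CHECK_BOTH;
    if (low) return CHECK_LOW;
    if (high) return CHECK_HIGH;
    return CHECK_NONE;
}
```

Take a in 10..20 and b in 0..20, so a + b lies in 10..40:
- assigning the sum to a variable of subtype 1..100 needs no check;
- assigning it to a variable of subtype 20..100 needs only the lower check;
- assigning it to a variable of subtype 50..60 is always an error.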

Operations on composite types

The type of the result of an operation differs from the types of its operands

Example: strings in Ada: "abc" & "defg"
• string is an ‘incomplete’ type
• a string of length n is compatible with any array of characters of length n
• the actual index range does not matter
  - the type of the result of string concatenation depends on the context

Example: sets in Pascal and Modula
• operand types are compatible if they have the same base type
• requires a dynamic semantic check
• the check can be avoided by keeping track of the minimum and maximum members, i.e., the values known to be in the set

Fortran 90
• array arithmetic
• type inference is not an issue

Records and Variants

Structures and unions
• C++: struct is a special form of a class (or vice versa)
• Java: class is the only ‘struct-like’ type constructor

Pascal & C syntax for records (next slide)
• records consist of named fields
• anonymous fields ⇒ tuples (ML)

Referring to fields
• usually via the ‘dot notation’
  - Fortran 90: %-notation
• some languages use functional notation
  - projection functions (Cobol, Algol 68): name of copper, atomic_weight of copper
  - ML: #fieldname record-object, e.g. #name copper
  - Common Lisp: element-name copper

Nested definitions
• directly (Pascal) or using intermediate structures (F90)
  - malachite.element_yielded.atomic_number

ML: the order of fields is insignificant
• these two records are the same:
  {name = "Cu", atomic_number = 29, atomic_weight = 63.546, metallic = true}
  {atomic_number = 29, name = "Cu", atomic_weight = 63.546, metallic = true}
• tuples are records with numbered fields: ("Cu", 29) = {1 = "Cu", 2 = 29} = {2 = 29, 1 = "Cu"}

Implementation
Prime reason why the order of the fields in a record should matter
• fields are usually stored one after another

Accessing a record field
• find the base pointer (frame/global)
• add to it
  - the record’s offset from the base and
  - the field’s offset within the record
• generate the corresponding load/store instruction
• assumes alignment, i.e., fields start at memory word boundaries

Example: Figure 7.1 (next slide)
• alignment creates ‘holes’ in the memory layout
• an array of such records would allocate 20 bytes for each
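A quick way to see alignment holes is to ask a C compiler where it placed each field. The struct below is only loosely modeled on the element record: the field names are borrowed and the types are an assumption. The exact sizes depend on the target ABI.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

/* Loosely modeled on the element record; field types are assumptions. */
struct element {
    char   name[2];
    int    atomic_number;
    double atomic_weight;
    _Bool  metallic;
};

/* Bytes of the record not used by any field (holes + trailing padding). */
static size_t padding_bytes(void) {
    size_t used = sizeof(char[2]) + sizeof(int) + sizeof(double) + sizeof(_Bool);
    assert(sizeof(struct element) >= used);
    return sizeof(struct element) - used;
}
```

On x86-64 Linux, for example, the struct occupies 24 bytes although its fields need only 15. The remaining 9 bytes are holes.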

Packed records

Pascal keyword PACKED
• can be applied to record, array, file, set
• tells the compiler to use the minimum amount of memory
• ‘push fields together’
• accessing fields is slower
  - collect the pieces and reassemble them in registers
  - trades access speed for memory

Example in Figure 7.2 (next slide)
• an array of these would allocate 16 bytes for each
• a PACKED array would allocate 15

Record operations...

Assignment r1 := r2
• most languages allow this
• naive implementation: copy each field separately
• fast implementation: use block memory transfers
  - just transfer all bits of r2 into r1
  - block_copy(source, dest, length)
  - hardware support

...Record operations

Comparison r1 = r2
• most languages do NOT support this
  - exception: Ada
  - in C++ (and many others) one can program one’s own equality tests for one’s own classes
• implementation
  - block compare
• problem: the garbage in the holes also gets tested
  - always fill holes with zeroes (takes time)
  - field-by-field comparison
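The following C sketch contrasts the two comparison strategies. `struct rec`, `rec_equal` and `rec_equal_block` are hypothetical names. C allows struct assignment (`r1 = r2`) but not `==` on structs, so equality must be written by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical record; on most ABIs there is a hole after c. */
struct rec { char c; int n; };

/* Field-by-field comparison: ignores whatever is in the holes. */
static bool rec_equal(const struct rec *a, const struct rec *b) {
    return a->c == b->c && a->n == b->n;
}

/* Block comparison: fast, but also compares the padding bytes. */
static bool rec_equal_block(const struct rec *a, const struct rec *b) {
    return memcmp(a, b, sizeof *a) == 0;
}
```

The C standard leaves the contents of padding bytes unspecified. As a result, the `memcmp` version can report two field-wise equal records as different, while the field-by-field version cannot.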

Saving space

Holes in records waste space
• packing ⇒ heavy cost in access time

Compromise solution
• rearrange the fields so that the waste caused by word alignment is minimal
• greedy heuristic for this minimization
  - sort fields according to their (alignment) size
  - place the smallest fields first
• bytes, half-words, words, double words, ...
  - larger fields are never (unnecessarily) split over several words
• example (next slide)
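The reordering heuristic can be checked directly in C. There the programmer must reorder fields by hand, because the compiler keeps declaration order. Both struct names are hypothetical.

```c
#include <assert.h>

/* Hypothetical records with the same fields in two orders.
   C keeps declaration order, so reordering is the programmer's job. */
struct mixed  { char a; double d; char b; int i; };   /* sizes interleaved */
struct sorted { char a; char b; int i; double d; };   /* smallest first */
```

On x86-64 Linux, `struct mixed` takes 24 bytes and `struct sorted` takes 16, although both hold the same 14 bytes of data.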

Does the ordering matter?

Usually not
• the compiler can rearrange fields as it wishes

Some systems programming tasks
• require knowledge of the exact location and length of the fields
  - systems programming languages allow the programmer to specify these
  - C and C++ guarantee that the order of fields is not changed

WITH statements & records

Introduced in Pascal
• aim: simplify manipulation of deeply nested structures (x1.f.g.y := x2.f.g.y)
• example pp. 355-356

A WITH statement opens a new scope
• the fields of the opened record become ordinary variable names
• formalizes the notion of the elliptical references of Cobol
  - which allow a field name to be used as a variable if it is unique

...WITH statements
Problems
• How to manipulate the fields of 2 similar records simultaneously?
• Naming conflicts
  - new scope ⇒ local variables with the same names as fields become inaccessible
• Long and nested statements
  - which field comes from which WITH record?
  - the type definition may be very far away

Modula-3 solution
• WITH creates aliases instead of opening records
• fields are not directly visible but are accessible via the aliases
• aliases can be used for other objects, too
• examples on page 357

WITH without WITH
C simulation
• use local pointer variables as aliases
• needs the capability of
  - declaring variables in nested blocks
  - taking the address of stack (non-heap) variables
  - Pascal has neither
• C++: use reference types instead

Implementation
• each WITH creates a local ‘hidden pointer’ to the opened or aliased record
• access to fields via this pointer & offsets
• a good optimizer might ‘invent’ these pointers automatically
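Below is a minimal C version of the pointer-alias technique. The record types `x`, `f` and `g` are made up to mirror the expression x1.f.g.y := x2.f.g.y from the WITH slide. Unlike a Pascal WITH, two aliases can be open at the same time.

```c
#include <assert.h>

/* Made-up nested record types mirroring x1.f.g.y := x2.f.g.y */
struct g { int y; };
struct f { struct g g; };
struct x { struct f f; };

/* "WITH without WITH": local pointers act as aliases for the nested records. */
static void copy_y(struct x *x1, const struct x *x2) {
    struct g *dst = &x1->f.g;         /* alias for the record being updated */
    const struct g *src = &x2->f.g;   /* second alias, open at the same time */
    dst->y = src->y;
}
```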

Variant records
Aim
• provide alternative fields
• only one of them is valid at a given time

Pascal variant record (next slide)
• tag field (naturally_occurring)
• variants (in parentheses)

Implementation
• variants may share the same space (slide+2)
• origin: the EQUIVALENCE statement of Fortran I (use the same space for different variables)

    integer i
    real r
    logical b
    equivalence (i, r, b)

Why is ‘variant’ better than union?

Pascal integrates variants with records
• variations seldom appear elsewhere
• variant fields can be accessed with the standard dot notation

C & unions (next slide)
• need to create intermediate structures
  - extra levels of naming to access variant data

C access path to a variant field: copper.extra_fields.natural_info.source

Variants and type safety
Fortran equivalence
• no built-in mechanism to verify which of the ‘equivalenced’ objects is currently valid
  - programming errors (not caught by the compiler):

    r = 3.0
    ...
    print '(I10)', i

Algol 68 & unions
• the implementation must keep track of the valid alternative
• access via a special kind of case statement
  - the implementation must maintain a hidden variable containing the variant tag
  - updated at each assignment
  - the Pascal tag field makes this variable explicit
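Here is an explicit discriminated union in C, sketched after the element record with its naturally_occurring tag. C does not check the tag, so the accessors do it with `assert`. The type `Element` and the functions `prevalence`, `lifetime` and `make_synthetic` are illustrative assumptions, and the field values used with them are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical element record: fixed fields, an explicit tag, and a variant
   part whose alternatives share the same storage (the union). */
typedef struct {
    const char *name;
    bool naturally_occurring;                 /* the tag (discriminant) */
    union {
        struct { const char *source; double prevalence; } natural;
        struct { double lifetime; } synthetic;
    } u;
} Element;

/* Checked accessors: the run-time tag test that C does not do for us. */
static double prevalence(const Element *e) {
    assert(e->naturally_occurring);
    return e->u.natural.prevalence;
}

static double lifetime(const Element *e) {
    assert(!e->naturally_occurring);
    return e->u.synthetic.lifetime;
}

/* Change tag and variant together, so the tag never describes stale data. */
static void make_synthetic(Element *e, double new_lifetime) {
    e->naturally_occurring = false;
    e->u.synthetic.lifetime = new_lifetime;
}
```

Changing the tag only through `make_synthetic`, which writes the new variant at the same time, avoids the tag/variant mismatch discussed under Pascal variants.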

Pascal variants & type safety
Problem 1
• the explicit tag field can be manipulated directly
  - in Algol 68 only indirectly, via assignments
  - it is possible to update the tag without updating the dependent variant
  - type safety is lost (p. 361)
• remedy: a tag change should make the dependent data uninitialized
  - or a tag change should be possible only together with a new value for the variant

Problem 2
• tag fields are actually optional
  - no need to allocate space for the tag field
  - not possible to check illegal accesses at run time

Unions or not?

Discriminated unions
• variant records with tags (visible or not)
• Pascal, Ada, Modula-2

Nondiscriminated unions
• no tags
• union types of C (much like Fortran equivalence)

Java, Modula-3: no variant type
• use subtyping / subclasses instead

Ada variants
The tag must appear in the record header (next slide)
• always present
• possible to assign a default value

    copper : element;
    plutonium : element (false);
    neptunium : element (naturally_occurring => false);

• note: the tag (discriminant) may also contain other information that affects the size of the record (next slide)
• may define subtypes

Assignments
• whole record
• or via an aggregate (both tag & variant part)

Variable declarations
• may bind the tag value (constrained to one variant)
• or leave it open

Implementation

Constrained variants (Ada)
• allocate space only for the chosen alternative

Unconstrained
• Pascal, Ada: the variant part must be the last in the declaration
  - every field has a constant offset
  - no ‘non-alignment’ holes
• Modula-2: any order is allowed, space is allocated for the largest alternative (next slide)

Arrays
Homogeneous collection of elements
• records: heterogeneous
• the most common and important composite data type
• a fundamental part of any programming language

Semantics
• mapping from an index type to a component (element) type
• most languages restrict the index to a discrete type
  - more general arrays require a hash-table implementation
  - C++, Java: maps
• elements can usually be of any type
  - Fortran 77: components must be scalars

Array syntax...

Accessing elements
• Pascal, C, ...: A[3]
  - no confusion with subroutine calls
• Fortran, Ada: A(3)
  - Fortran: keypunch machines did not have ‘[’ and ‘]’
  - Ada: deliberate design decision
• arrays are mappings, that is, functions
• easy to replace an array with the corresponding mapping (function) or vice versa
• example (next slide)

...Array syntax
Declaring array types
• append subscript notation to a ‘normal’ scalar declaration
  - C: char upper[26], lower bound = 0
  - Fortran: character upper(26), lower bound = 1
• use an array constructor
  - Pascal: upper: ARRAY['a'..'z'] OF Char;

Multidimensional arrays
• syntactic sugar for ‘arrays of arrays’
• Ada distinguishes between
  - a 2-dimensional array and
  - an array of 1-dimensional arrays
  - the latter is more flexible to use (matrix(3) is an ordinary array)
• C: int matrix[3][4]
  - matrix[2] is a reference (to an int or to an array of ints, depending on context)

Array operations
Selecting & assigning elements

Slices / sections
• Fortran 90: many operations
  - slice = rectangular portion of an array
  - next figure: a matrix & some slices
• Ada supports only 1-dimensional slices
  - slice = contiguous subrange of elements

Comparing equality

Ada
• lexicographic ordering (A < B) for 1-dim arrays of discrete elements
• OR/AND/XOR on Boolean arrays

Fortran 90, APL: many built-in array operations (over 60)
• A + B, tan(A), ...
• structural equivalence ⇒ same element type & shape (useful when using slices)
• most built-in scalar operations generalize to arrays
• also ‘array-specific’ operations (like matrix transposition)

Allocating arrays
Depends on
• the lifetime of the array
• the time at which the shape of the array is known

Possibilities
1. global, static shape
   - bounds & dimensions known at compile time
   - allocate from the global memory area
2. local, static shape (needed for recursive subroutines)
   - allocate from the stack frame
3. local, shape known at elaboration time (next slide)
   - divide the stack frame into a fixed & a variable part
   - allocate a pointer in the fixed part, the array itself in the variable part
   - nested definitions ⇒ delay array allocation
4. arbitrary lifetime, shape known at elaboration time (e.g. Java): use the heap
   - int[] A creates a reference
   - A = new int[size] or A = B
5. dynamic shape
   - must use the heap (array may grow from both ends)
   - re-allocation & copy when necessary

Some examples of arrays in different languages

Arrays whose shape is not known until elaboration time are heavily used in numerical software

Pascal: conformant array parameters
• passed by value or by reference

C: array = pointer (an array name is converted to a pointer to its first element)

Ada (next slide)
• arrays with unspecified bounds (useful as parameter types)
  - all ‘real’ arrays are subtypes of these (and thus type compatible)
• local arrays with elaboration-time shape
• access to the actual array attributes

Fortran 90 (slide+2)
• dynamic allocation of arrays
• can be simulated using pointers

Java strings (immutable)
• new value ⇒ new string

C++ vector class

Dynamic arrays: Clu, Perl, Lisp

Memory layout
Elements in contiguous locations
• possible alignment holes (esp. with records)

Multidimensional arrays
• row-major order
  - the ‘last’ dimension varies fastest in consecutive locations
  - A[1,1], A[1,2], ..., A[1,max2], A[2,1], ...
  - most languages use this
• column-major order
  - the ‘first’ dimension varies fastest in consecutive locations
  - A[1,1], A[2,1], ..., A[max1,1], A[1,2], ...
  - Fortran
• straightforward generalization to m > 2 dimensions
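The two orders differ only in which index is scaled by the extent of the other dimension. Below is a minimal C sketch for a 0-based rows x cols matrix; the function names are hypothetical. C itself stores arrays in row-major order.

```c
#include <assert.h>
#include <stddef.h>

/* Element offset of A[i][j] in a rows x cols matrix, row-major (C). */
static size_t row_major(size_t i, size_t j, size_t rows, size_t cols) {
    assert(i < rows && j < cols);
    return i * cols + j;          /* last index varies fastest */
}

/* Element offset of A[i][j] in a rows x cols matrix, column-major (Fortran). */
static size_t col_major(size_t i, size_t j, size_t rows, size_t cols) {
    assert(i < rows && j < cols);
    return j * rows + i;          /* first index varies fastest */
}
```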

Row- or column order?
Row-major
• easy to define a matrix as an array of subarrays

Computational efficiency
• better performance if the array elements are in the cache
• cache miss ⇒ several adjacent array elements are loaded
• if subsequent accesses use these elements, we are doing well
• Fig. 7.10: good cache hit ratio with row order, worse with column order
• ‘good’ and ‘bad’ depend on the program’s access pattern!
• one might implement BOTH orders and use the appropriate one

Row-pointer implementation
Memory layout
• rows can be anywhere in memory
• an auxiliary array of pointers to the rows
• generalizes to m > 2 dimensions

Advantages
• sometimes faster to access row elements
  - may depend on hardware (indirect addressing vs. multiplication)
• rows can be of different lengths

May waste or save space
• the pointer array takes some space
• ‘dynamic’ row lengths may save more

Languages
• C & C++ support both row-major & row-pointer layouts (Fig. 7.11)
• Java uses row-pointer
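C's row-pointer layout is the familiar array of pointers to separately allocated rows. This sketch builds a ragged array of strings in the spirit of Fig. 7.11. `make_ragged` and `free_ragged` are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Row-pointer layout: an auxiliary array of pointers, each row allocated
   separately and only as long as needed (a "ragged" array). */
static char **make_ragged(const char *const words[], size_t n) {
    char **rows = malloc(n * sizeof *rows);
    if (rows == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        rows[i] = malloc(strlen(words[i]) + 1);
        if (rows[i] == NULL) {              /* undo the rows built so far */
            while (i > 0) free(rows[--i]);
            free(rows);
            return NULL;
        }
        strcpy(rows[i], words[i]);
    }
    return rows;
}

static void free_ragged(char **rows, size_t n) {
    for (size_t i = 0; i < n; i++) free(rows[i]);
    free(rows);
}
```

An access `rows[i][j]` first loads the row pointer and then indexes into the row. No multiplication by a row length is needed.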

Address calculations

Example
• 3-dimensional array with row-major ordering
  - generalizes easily to any number of dimensions
  - the computation is similar for the column-major case
• A: [L1..U1, L2..U2, L3..U3]
• Define
  - S3 = size of the element type
  - S2 = size of a row = (U3 - L3 + 1) * S3
  - S1 = size of a 2-d plane = (U2 - L2 + 1) * S2
• address of A[i,j,k]?
  = &A + (i - L1)*S1 + (j - L2)*S2 + (k - L3)*S3
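The formula above translates directly into code. This is a minimal C sketch that assumes the bounds are passed at run time in a hypothetical `Bounds3` struct and that sizes are measured in bytes. It returns the offset from &A rather than an absolute address.

```c
#include <assert.h>

/* Hypothetical run-time bounds of A[L1..U1, L2..U2, L3..U3]. */
typedef struct { long L1, U1, L2, U2, L3, U3; } Bounds3;

/* Byte offset of A[i,j,k] from &A in row-major order:
   S3 = element size, S2 = (U3-L3+1)*S3, S1 = (U2-L2+1)*S2. */
static long offset3(Bounds3 b, long elem_size, long i, long j, long k) {
    assert(b.L1 <= i && i <= b.U1);
    assert(b.L2 <= j && j <= b.U2);
    assert(b.L3 <= k && k <= b.U3);
    long S3 = elem_size;
    long S2 = (b.U3 - b.L3 + 1) * S3;
    long S1 = (b.U2 - b.L2 + 1) * S2;
    return (i - b.L1) * S1 + (j - b.L2) * S2 + (k - b.L3) * S3;
}
```

With all lower bounds 0, the result matches C's own layout of `double A[2][3][4]`.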

Faster address calculations
The previous computation involves
• 5 multiplications and 10 additions/subtractions
IF
• Li & Ui (i = 1, 2, 3) are known at compile time
THEN
• Si (i = 1, 2, 3) are compile-time constants
  - move the subtractions of Li out of the formula
• &A[i,j,k] =
    &A + i*S1 + j*S2 + k*S3               (run-time computation)
    - [(L1*S1) + (L2*S2) + (L3*S3)]       (compile-time constant)
• 3 multiplications & 4 additions/subtractions
  - if A is a global/static variable then &A is also a compile-time constant
• corresponding machine code on page 376
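Splitting the formula into a part computed once and a part computed per access might look like this in C. `Layout3`, `layout3` and `offset3_fast` are hypothetical names. Here the "compile-time" part is simply computed once at run time; a compiler would instead fold it into constants.

```c
#include <assert.h>

/* Hypothetical precomputed layout of A[L1..U1, L2..U2, L3..U3] (row-major):
   strides S1..S3 and the constant C = L1*S1 + L2*S2 + L3*S3. */
typedef struct { long S1, S2, S3, C; } Layout3;

/* "Compile-time" part: everything that depends only on bounds and element size. */
static Layout3 layout3(long L1, long U1, long L2, long U2, long L3, long U3, long elem_size) {
    assert(L1 <= U1 && L2 <= U2 && L3 <= U3 && elem_size > 0);
    Layout3 l;
    l.S3 = elem_size;
    l.S2 = (U3 - L3 + 1) * l.S3;
    l.S1 = (U2 - L2 + 1) * l.S2;
    l.C  = L1 * l.S1 + L2 * l.S2 + L3 * l.S3;
    return l;
}

/* Run-time part: 3 multiplications and 3 additions/subtractions
   (adding &A would be the 4th). */
static long offset3_fast(const Layout3 *l, long i, long j, long k) {
    return i * l->S1 + j * l->S2 + k * l->S3 - l->C;
}
```

For A[1..2, 1..3, 1..4] of 8-byte elements:
- S1 = 96, S2 = 32, S3 = 8, and the constant is 136;
- A[2,3,4] therefore lies 2*96 + 3*32 + 4*8 - 136 = 184 bytes from &A.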

Restricted & generalized cases

Indexes (i, j, k) may be known at compile time
• move them to the ‘static part’ of the computation

Lower/upper bounds may be unknown
• move them to the ‘dynamic part’ of the computation

Example
• L1 not known, k = 3

C, C++, Java
• lower bounds are always 0 ⇒ they never contribute to run-time cost

Static & dynamic address computations

So far only arrays, but the idea can be used for any structure

Example (p. 378)
• V = local array of records R
• R has a 2-dimensional array in field M
• &V[i].M[3,j] = ?

Row-pointer addresses

Computations are much simpler: A[i,j,k] =
• (*(*A[i])[j])[k] in C notation
• A[i]^[j]^[k] in Pascal notation
• instruction sequence on p. 378

Speed vs. row-major implementation
• on earlier machines multiplication was so slow that indirect addressing was faster

Dope vectors
= run-time array descriptors
• dimension and bounds of the array
• the compiler needs them anyway
• useful to store them with arrays at run time as well

Contents
• number of dimensions
• lower bound & size (of each dimension)
• upper bound, if dynamic bounds checks are done

Notes
• some of the descriptor information may be known at compile time
• but it still makes sense to store it
  - space usage is modest
  - all arrays look the same
  - the information is there when needed
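A dope vector for a 2-D array could be laid out as below. The fields (dimension count, plus lower bound and extent per dimension) follow the contents listed above. The struct `Dope2` and the accessor `dope2_at` are hypothetical, and row-major order is assumed.

```c
#include <assert.h>

/* Hypothetical dope vector for a 2-D, row-major array of doubles. */
typedef struct {
    int     ndims;       /* number of dimensions (always 2 here) */
    long    lower[2];    /* lower bound of each dimension */
    long    extent[2];   /* number of elements; upper = lower + extent - 1 */
    double *data;        /* pointer to the contents */
} Dope2;

/* Bounds-checked element access driven entirely by the descriptor. */
static double *dope2_at(const Dope2 *d, long i, long j) {
    assert(d->ndims == 2);
    assert(i >= d->lower[0] && i < d->lower[0] + d->extent[0]);
    assert(j >= d->lower[1] && j < d->lower[1] + d->extent[1]);
    long row = i - d->lower[0];
    long col = j - d->lower[1];
    return &d->data[row * d->extent[1] + col];
}
```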

Implementing arrays with descriptors

Array = combination of
• a pointer to the contents and
• a descriptor
• this part always has the same size
  - the contents part may be dynamic
  - place the descriptor & pointer in the static part of the stack frame, the contents in the dynamic part

Array operations (some)
• elaboration: allocate space (stack/heap) & initialize the descriptor
• assignment (A := B)
  - deallocate the old contents and allocate new contents (if needed)
  - block-copy the contents, copy the descriptor
• projections
  - need their own descriptor (copy the appropriate part from the ‘parent’)

Records containing dynamic arrays?
• contiguous contents ⇒ no fixed field offsets ⇒ need a record descriptor
• non-contiguous contents ⇒ block copy/compare impossible
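The assignment A := B described above can be sketched in C for a 1-D array whose fixed-size part is a descriptor plus a contents pointer. `DynArray` and `dynarray_assign` are hypothetical. Only `double` elements are handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size part of a 1-D array of doubles: the descriptor
   plus a pointer to the (variable-size) contents. */
typedef struct {
    long    lower;    /* lower bound */
    long    length;   /* number of elements */
    double *data;     /* pointer to the contents */
} DynArray;

/* A := B: reallocate A's contents only if the sizes differ, then block-copy
   the contents and copy the descriptor. Returns 0 on success, -1 if out of memory. */
static int dynarray_assign(DynArray *a, const DynArray *b) {
    assert(b->length >= 0);
    if (a == b) {
        return 0;
    }
    if (a->length != b->length) {
        double *fresh = NULL;
        if (b->length > 0) {
            fresh = malloc((size_t)b->length * sizeof *fresh);
            if (fresh == NULL) {
                return -1;
            }
        }
        free(a->data);                 /* deallocate the old contents */
        a->data = fresh;
    }
    if (b->length > 0) {
        memcpy(a->data, b->data, (size_t)b->length * sizeof *b->data);
    }
    a->lower = b->lower;               /* copy the descriptor */
    a->length = b->length;
    return 0;
}
```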


Strings

(Just) an array of characters, or a special data type with its own operators
• dynamic array
  - even if the language doesn't otherwise support dynamic arrays
• many applications require strings
• strings are easier to implement than arrays in general
  - 1 dimension, byte elements


String Literals

Sequence of characters in quotation marks
• character literals (char = string of length 1?)
• escape sequences for non-printable characters
  - C: '\t' (tab), '\n' (newline), '\006' (octal! ASCII code)
  - Java: C escapes + numeric escapes '\uxxxx' for Unicode characters


String operations

Often implementation-dependent
• size known at elaboration time
  - C, Pascal, Ada
  - contiguous array of characters
  - restricted set of operations
  - lexicographic ordering (<, >)
  - C: no built-in operations (only library functions)
• size can change dynamically
  - Lisp, ML, Icon, Java
  - heap implementation (block, chain of blocks)
  - concatenation, length
  - substrings, pattern matching
  - ability to define own string-valued functions
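In C, strings are plain character arrays, so ordering and concatenation come from the library or from hand-written functions. A minimal sketch: `str_less` wraps the standard `strcmp`, and `concat` is a hypothetical helper that builds a new heap string, roughly what a language with dynamic strings does implicitly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Lexicographic ordering via strcmp (which returns <0, 0 or >0). */
int str_less(const char *a, const char *b) {
    return strcmp(a, b) < 0;
}

/* Concatenation into a newly allocated heap block; the caller must free() it. */
char *concat(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    char *r = malloc(la + lb + 1);     /* +1 for the terminating '\0' */
    if (r == NULL) return NULL;
    memcpy(r, a, la);
    memcpy(r + la, b, lb + 1);         /* also copies b's terminating '\0' */
    return r;
}
```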


Sets
Collection of elements
• like arrays
  - homogeneous
  - element type = base type of the set
• different from arrays
  - unordered
  - all elements are different
  - arbitrary size

Part of the Pascal language
• many others have library support
• creation, literals, union, intersection, difference


Implementing sets
Numerous standard data structures
• e.g. tree structures
Usually as a bit vector
• bit i = 1 → the i-th element is a member of the set
• bit i = 0 → the i-th element is not a member of the set
• suitable only for small base types
  - a base domain of size n needs a vector of n bits
  - 32-bit integers → 2^32 bits = 2^29 bytes = 512 MiB (about 537 MB) of memory
  - typical bound: 256 elements (set of Char)
• easy to implement and/or/xor/not (union, intersection, symmetric difference, complement)
  - just use the corresponding bit operations
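A bit-vector set over a 256-element base type (like Pascal's set of Char) can be sketched in C with eight 32-bit words; the set operations become word-wise bit operations. The type `charset` and the `cs_*` functions are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_SIZE 256                /* size n of the base domain */
#define WORDS (BASE_SIZE / 32)       /* n bits packed into 32-bit words */

typedef struct { uint32_t w[WORDS]; } charset;

void cs_clear(charset *s) {
    for (int i = 0; i < WORDS; i++) s->w[i] = 0;
}

/* Set bit c: element c becomes a member. */
void cs_add(charset *s, unsigned c) {
    assert(c < BASE_SIZE);
    s->w[c / 32] |= UINT32_C(1) << (c % 32);
}

/* Test bit c. */
int cs_member(const charset *s, unsigned c) {
    assert(c < BASE_SIZE);
    return (s->w[c / 32] >> (c % 32)) & 1u;
}

/* Union, intersection, difference and complement map onto |, &, & ~ and ~. */
charset cs_union(charset a, charset b)        { for (int i = 0; i < WORDS; i++) a.w[i] |= b.w[i];  return a; }
charset cs_intersection(charset a, charset b) { for (int i = 0; i < WORDS; i++) a.w[i] &= b.w[i];  return a; }
charset cs_difference(charset a, charset b)   { for (int i = 0; i < WORDS; i++) a.w[i] &= ~b.w[i]; return a; }
charset cs_complement(charset a)              { for (int i = 0; i < WORDS; i++) a.w[i] = ~a.w[i];  return a; }
```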


Pointers and recursive types
Recursive types
• objects contain references to other objects of the same type
• typically records
  - some data in addition to those references
• generally used to build linked data structures like lists and trees
Easy to define with the reference variable model
• everything is a reference anyway
Value model needs a special pointer type
• value of a pointer = reference to some object
• restricted to point only to heap objects (Pascal, Modula-3, Ada 83)
  - new pointers created only via memory allocation
• references to stack objects allowed (C, C++, Ada 95)
  - new pointers also created with the 'address-of' operator


Pointers and addresses

Pointer is a high-level concept
• a reference to an object
Address is a low-level concept
• a location in computer memory
Pointers can be implemented as addresses
• addresses do not make sense in distributed environments
• an address may be augmented with other information to implement a pointer


Storage reclamation
How long is the program supposed to run?
• a short run → just forget about reclamation
• a long / infinite run → memory leaks are a real problem
Explicit reclamation (C, Pascal)
• the programmer's responsibility
• simplifies the implementation
• dangers
  - we may forget to reclaim unused objects → memory leak
  - we may reclaim objects still in use → dangling pointers
Automatic reclamation (Java, Ada)
• garbage collector
• how to distinguish garbage from live objects?


Pointer operations

Allocation and deallocation of objects in the heap
Dereferencing
Assignment
Language: functional or imperative
Reference or value model for variables/names


Pointer assignment A := B
• reference model: A refers to the same object as B
• value model
  - if B is a reference → A refers to B's object
  - if B is an object → copy its contents to A
Primitive types & reference model
• inefficient to use pointers
• number '3' never changes
  - immutable types (int, float, char) → use the actual object instead of a pointer
• use pointers only for mutable types
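C uses the value model, so both cases of A := B can be seen directly: assigning a struct copies its contents, while assigning a pointer makes both names refer to the same object. A minimal sketch with a hypothetical `struct point`.

```c
#include <assert.h>

struct point { int x, y; };

/* B is an object: A := B copies the contents (a block copy of the record). */
int value_copy_is_independent(void) {
    struct point b = {1, 2};
    struct point a = {0, 0};
    a = b;
    a.x = 99;
    return b.x == 1 && a.x == 99;      /* b is unaffected */
}

/* B is a pointer: A := B makes A refer to B's object. */
int pointer_copy_is_shared(void) {
    struct point obj = {1, 2};
    struct point *b = &obj;
    struct point *a;
    a = b;                             /* copies the reference only */
    a->x = 99;
    return b->x == 99 && obj.x == 99;  /* the change is visible through b */
}
```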


Defining recursive data types...

Reference model languages
• ML example (slide +1)
  - tagged tuples
• Lisp example (slide +2)
  - everything is a cons cell or an atom
• note: data structures of purely functional languages are always acyclic
  - new objects may only point to older ones
  - old ones never change
• mutually recursive types
  - ML: declare together in a group (p. 386)


ML
datatype chr_tree = empty | node of char * chr_tree * chr_tree;

node (#"R", node (#"X", empty, empty),
            node (#"Y", node (#"Z", empty, empty),
                        node (#"W", empty, empty)))


Lisp
'(#\R (#\X ()()) (#\Y (#\Z ()()) (#\W ()())))


...Defining recursive data types
Value model languages
• examples (p. 387)
  - forward declarations (Pascal)
  - incomplete declarations (Ada, C)
• note that in C the type name is 'struct chr_tree'
  - no 'aggregates': structures must be built in programs
• allocation
  - using built-in functions (Pascal, Ada)
  - using library functions (C): note sizeof & casting
  - using constructors (C++, Java): parameters & overloading
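The chr_tree example in C, sketched under the assumption that it builds the same tree as the ML and Lisp versions: the type is named `struct chr_tree`, allocation goes through `malloc` with `sizeof` (and an optional cast), and because there are no aggregates for linked structures, the tree is built by calls in the program. `new_node`, `count_nodes` and `free_tree` are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* While being defined the struct is incomplete, but pointers to it are allowed. */
struct chr_tree {
    struct chr_tree *left, *right;
    char val;
};

/* Library allocation: sizeof gives the size; the cast is optional in C (required in C++). */
struct chr_tree *new_node(char val, struct chr_tree *left, struct chr_tree *right) {
    struct chr_tree *t = (struct chr_tree *) malloc(sizeof(struct chr_tree));
    if (t == NULL) abort();
    t->val = val;
    t->left = left;
    t->right = right;
    return t;
}

/* t->left is shorthand for (*t).left. */
int count_nodes(const struct chr_tree *t) {
    return t == NULL ? 0 : 1 + count_nodes(t->left) + count_nodes(t->right);
}

void free_tree(struct chr_tree *t) {
    if (t == NULL) return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}
```

Usage: `new_node('R', new_node('X', NULL, NULL), new_node('Y', new_node('Z', NULL, NULL), new_node('W', NULL, NULL)))` builds the five-node tree.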


Accessing pointed-to objects
Explicit dereferencing
• Pascal: '^', C: '*'
Dereferencing and records
• recall: recursive data structures are almost always records
  - this justifies a special syntax for accessing fields of pointed-to records
  - C: r->f (shorthand for (*r).f)
• Ada: no special notation
  - pointed-to records are used just like standard records
  - implicit dereferencing
  - pseudofield 'all' to copy all of the record

ML language
• has an imperative part (with side effects)
• assignment statement allowed, but only if the l.h.s. is a pointer (a ref)
• see example on p. 389


Pointers and arrays in C
A 1-dimensional array is almost the same as a pointer to its first element
• see example on p. 389

a[i][j] equiv. (*(a+i))[j] equiv. *(a[i]+j) equiv. *(*(a+i)+j)
• arrays are always passed to subroutines as pointers

Pointer arithmetic
• add/subtract an integer
• subtract another pointer: p-q
• compare 2 pointers: p<q
• results are automatically scaled according to the element size
• common to iterate over arrays using pointers instead of indexes
  - used to be faster
  - 'more elegant'?

Differences
• space allocation (and thus the result of sizeof)
• int *a[n] vs. int a[n][m]
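A small C sketch of pointer iteration and of the a[i][j] equivalences above; `sum_with_pointer` and `same_element` are hypothetical helpers. Incrementing `p` advances by `sizeof(int)` bytes, and `same_element` checks that all four spellings yield the same address.

```c
#include <assert.h>
#include <stddef.h>

/* Iterates with a pointer instead of an index; p++ advances by sizeof(int) bytes. */
int sum_with_pointer(const int *a, size_t n) {
    int sum = 0;
    for (const int *p = a; p < a + n; p++)   /* pointer comparison p < q */
        sum += *p;
    return sum;
}

/* Returns 1 if the four spellings of a[i][j] denote the same element.
   As a parameter, a has type int (*)[4]. */
int same_element(int a[3][4], int i, int j) {
    int *e = &a[i][j];
    return e == &(*(a + i))[j] && e == a[i] + j && e == *(a + i) + j;
}
```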


How to read C type declarations?

(short course)
start at the name of the variable
loop
  work right as much as possible (until a closing parenthesis)
  work left as much as possible
  jump out of the parentheses
until everything has been read

Examples
• int *a[n]: a is an array of n pointers to int
• int (*a)[n]: a is a pointer to an array of n ints
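One way to check a reading is with `sizeof` and pointer steps. A sketch with n = 4; the variable and function names are hypothetical. `ptr_array` holds N pointers, while `array_ptr` is a single pointer whose increment skips a whole row of N ints.

```c
#include <assert.h>
#include <stddef.h>

#define N 4

int *ptr_array[N];     /* ptr_array is an array of N pointers to int */
int (*array_ptr)[N];   /* array_ptr is a pointer to an array of N ints */
int matrix[2][N];      /* contiguous rows for array_ptr to walk over */

size_t size_of_ptr_array(void) { return sizeof ptr_array; }   /* N * sizeof(int *) */
size_t size_of_pointee(void)   { return sizeof *array_ptr; }  /* N * sizeof(int); not evaluated */

/* array_ptr + 1 advances by one whole array of N ints. */
ptrdiff_t row_step_bytes(void) {
    array_ptr = matrix;
    return (char *)(array_ptr + 1) - (char *)array_ptr;
}
```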


Passing array parameters in C
One-dimensional: pointer to the array (to its first element)
2-dimensional, row-pointer layout
• int *a[] or int **a
2-dimensional, contiguous layout
• int a[][m] or int (*a)[m]
• the size of the first dimension is irrelevant
• the declaration must contain enough info to compute the sizes of the elements
  - int a[][] is not enough (cannot compute a+i or a[i])
  - exception: the size can be deduced from an aggregate (initializer)

2-dimensional, contiguous layout, sizes not known
• pass a pointer & the dimension sizes
• compute addresses explicitly with pointer arithmetic (p. 391)
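The three 2-dimensional cases can be sketched as C functions that sum a matrix; the function names and the column count `M` are hypothetical. The last version does the row-major address computation by hand: element (i, j) is at offset i*cols + j from the first element.

```c
#include <assert.h>
#include <stddef.h>

#define M 3   /* number of columns, known at compile time */

/* Contiguous layout, int a[][M] (same as int (*a)[M]); the row count is passed separately. */
int sum_fixed(int a[][M], size_t rows) {
    int s = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < M; j++)
            s += a[i][j];               /* the compiler scales i by M * sizeof(int) */
    return s;
}

/* Row-pointer layout, int **a (same as int *a[]); each row is a separate array. */
int sum_rows(int **a, size_t rows, size_t cols) {
    int s = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            s += a[i][j];               /* load the row pointer, then the element */
    return s;
}

/* Contiguous layout, sizes unknown: pointer to the first element plus the sizes. */
int sum_flat(const int *a, size_t rows, size_t cols) {
    int s = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            s += *(a + i * cols + j);   /* explicit address computation */
    return s;
}
```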


Dangling references
A live pointer that no longer points to a valid object
Created by
• explicit reclamation (p. 391)
  - dispose, delete (+ destructor)
  - other pointers may still point to the same object
• references to 'dead' stack objects
  - the lifetime of the reference exceeds the lifetime of the referred object
Dangers
• the memory area may be allocated to some other object
  - the dangling reference may read it or write random bits over it
• data structures are corrupted
• the memory area may even contain heap bookkeeping data


Workarounds
Algol 68
• a pointer is not allowed to point to an object which has a shorter lifetime than the pointer
  - e.g. no pointers from the heap to the stack, or from an outer subroutine's frame to an inner one's
• problem: pointer & object parameters
  - pointers & objects must be augmented with lifetime information
Ada 95
• forbids references to objects whose lifetime is briefer than that of the pointer's type
• can be checked at compile time in most cases


Tombstones
Mechanism to catch all dangling references at run time
• works for both stack & heap references
• tombstone = an extra level of indirection between the reference and the object
• all references point to the tombstone
• the tombstone points to the object
• should be used for all references (even for global data) to avoid special cases
Reclamation of an object
• set the tombstone to some special value (non-address)
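A minimal tombstone sketch in C, assuming heap objects only and a hypothetical API (`ts_new`, `ts_deref`, `ts_reclaim`). Every reference holds the address of a tombstone; reclamation frees the object and stores NULL (the non-address value) in the tombstone, so every later access through any copy of the reference can be checked. Tombstones themselves are never freed here, which is the 'leave tombstones' option.

```c
#include <assert.h>
#include <stdlib.h>

/* A tombstone holds the only direct address of the object; NULL = reclaimed. */
typedef struct { void *object; } tombstone;

/* A "pointer" in this scheme is the address of a tombstone. */
typedef tombstone *ref;

ref ts_new(size_t size) {
    assert(size > 0);
    ref r = malloc(sizeof *r);
    if (r == NULL) abort();
    r->object = malloc(size);
    if (r->object == NULL) abort();
    return r;
}

/* Every access goes through the tombstone (double indirection);
   a NULL result means a dangling reference was detected. */
void *ts_deref(ref r) {
    return r->object;
}

/* Reclaim the object; all copies of r now see the invalidated tombstone. */
void ts_reclaim(ref r) {
    free(r->object);
    r->object = NULL;
}
```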


Maintaining tombstones
Explicit reclamation → easy
Subroutine return
• all references to the stack frame should be updated
• the implementation must be able to find these references (and the associated tombstones)
Possible solution
• keep 'stack tombstones' in a list, sorted according to the memory addresses of the stack frames
• creation of a pointer → add its tombstone to the top of the list
• passing a pointer parameter → scan the list to the right address and insert there (to keep the list sorted)
• subroutine return: invalidate all the tombstones of the current frame, remove them from the list
• the list can be allocated from the heap or from a separate memory pool (no fragmentation problems, faster allocation)


Cost of tombstones
Time overhead
• creation (allocation, &)
• validity check on each access
  - almost free if the hardware catches illegal addresses, e.g. outside the program's memory area
• double indirection
Space overhead
• significant (almost 1 per live reference)
• simple implementation: reclaim objects but leave tombstones (tombstones are usually much smaller)
• augment with reference counters (reclaim a tombstone when its count reaches 0)


Benefits of tombstones

Dangling references are caught
Easy to rearrange heap objects
• all references go through the tombstone
  - only the tombstone's reference must be updated
• rearrangement is necessary when compacting the heap (to eliminate external fragmentation)
Book: not widely used in language implementations, but the Macintosh OS uses them


Locks and keys

Alternative to tombstones
Disadvantages
• works only for heap objects
• does not give 100% protection
Advantages
• avoids the need to keep tombstones forever (or to reclaim them)


Implementing locks & keys
Every pointer consists of
• the actual reference
• and a key
Every heap object begins with a lock field
Access is valid if key = lock (next slide)
Allocation → create a new key/lock value
Reclamation → set the lock to some special value
Why does it work?
• even if the memory area is used by some other object, it is very unlikely that the word in the lock position has the same value as the key in the dangling reference
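A locks-and-keys sketch in C. To keep the example well-defined, it allocates from a small hypothetical static pool instead of `malloc`, so a reclaimed block's lock can still be read (with a real allocator, reading freed memory is undefined behavior). Keys come from a simple counter that is assumed never to wrap around; `lk_alloc`, `lk_valid`, `lk_reclaim` and the pool layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 8
#define NO_LOCK 0u                  /* special value for reclaimed blocks; keys start at 1 */

/* Every heap object begins with a lock field. */
typedef struct { uint32_t lock; int data; } block;

/* Every pointer carries the actual reference plus a key. */
typedef struct { block *ref; uint32_t key; } lk_ptr;

static block pool[POOL_SIZE];       /* zero-initialized: all blocks free */
static uint32_t next_key = 1;

/* Allocation: take a free block and give it a fresh lock/key value. */
lk_ptr lk_alloc(void) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool[i].lock == NO_LOCK) {
            pool[i].lock = next_key++;
            return (lk_ptr){ &pool[i], pool[i].lock };
        }
    }
    return (lk_ptr){ NULL, NO_LOCK };   /* pool exhausted */
}

/* An access is valid only if key == lock. */
int lk_valid(lk_ptr p) {
    return p.ref != NULL && p.key == p.ref->lock;
}

/* Reclamation: reset the lock, invalidating every copy of the pointer. */
void lk_reclaim(lk_ptr p) {
    assert(lk_valid(p));
    p.ref->lock = NO_LOCK;
}
```

Reusing a block gives it a new lock, so an old key no longer matches even though the memory is in use again.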


Cost of locks & keys

Space overhead
• an extra word in every pointer & heap object
Time overhead
• copying pointers
• each access involves a key/lock comparison
• unclear whether cheaper than tombstones
  - tombstone: max 2 indirect accesses (and cache misses)
  - lock & key: 1 indirect access + some arithmetic


Language design

Most languages
• do not (by default) generate code to catch dangling references
• a 'debug mode' enables checks
Pascal
• the programmer can enable dynamic checks
  - the compiler uses the locks & keys technique for pointers
C
• not even optional checks


Garbage collection

Automatic reclamation of storage
• essential in functional/logic languages
  - no 'stack objects', everything is in the heap
• more and more popular in imperative languages
  - difficult to implement
  - convenience of programming
• slower than explicit 'manual' reclamation
  - but eliminates the need to check for dangling references


Reference counts
When is an object X 'not useful'?
• when no pointers to X exist
• place a counter in each object = number of pointers referring to this object
Maintaining reference counts
• creation of object X → X.rc = 1
• assignment p := q
  - increment q^.rc (if q <> NIL)
  - decrement p^.rc (if p <> NIL)
  - increment first, so that p := p does not reclaim the object
• subroutine return → pointers deallocated from the stack frame
  - decrement the rc of each pointed-to object
• hierarchical structures → recursive updates to components
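A reference-counting sketch in C for a hypothetical binary node type (`rc_node`, `rc_new`, `rc_assign`, `rc_release` and the `live_nodes` counter are illustrative). `rc_assign` follows the rule for p := q, incrementing before decrementing; releasing a node at count 0 recursively updates its components.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct rc_node {
    int rc;                          /* number of pointers referring to this node */
    struct rc_node *left, *right;    /* components (may be NULL) */
} rc_node;

static int live_nodes = 0;           /* for demonstration only */

rc_node *rc_new(void) {
    rc_node *n = calloc(1, sizeof *n);   /* pointer fields start as NULL */
    if (n == NULL) abort();
    n->rc = 1;                           /* creation: rc = 1 */
    live_nodes++;
    return n;
}

/* Drop one reference; at 0, reclaim and recursively update the components. */
void rc_release(rc_node *n) {
    if (n == NULL) return;
    assert(n->rc > 0);
    if (--n->rc == 0) {
        rc_release(n->left);
        rc_release(n->right);
        free(n);
        live_nodes--;
    }
}

/* p := q, where p is the address of a pointer variable or field. */
void rc_assign(rc_node **p, rc_node *q) {
    if (q != NULL) q->rc++;          /* increment first: safe when *p == q */
    rc_release(*p);                  /* decrement the old target, if any */
    *p = q;
}

int rc_live_nodes(void) { return live_nodes; }
```

If two nodes point to each other, releasing both outside pointers leaves each count at 1, so neither is reclaimed: the circular-structure problem.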


Implementing reference counts
Implementation
• must 'know' the location of every pointer
  - must know which parts contain pointers
  - in stack frames (subroutine return)
  - in heap objects (reclamation → update rc in pointed-to sub-objects)
• a type descriptor contains this information
  - for each distinct type (class)
  - for each subroutine
• epilogue code uses this to update reference counters
  - e.g. a table containing an offset to each pointer and a pointer to the type descriptor of each pointer
• counter = 0 → reclaim the object (and update sub-objects)
• each pointer must be initialized to NIL to prevent the garbage collector from following dangling pointers


Cost of reference counts
Space
• extra counter field in every heap object
• may be significant for small objects (e.g. cons cells)
Time
• updating reference counts
• depends on the 'nature' of the program
Problem
• an object may be useless even if rc > 0 (next slide)
• caused by circular structures
  - not a problem with non-recursive structures (e.g. strings)
  - not a problem in purely functional languages (no cycles)
Reference counts may be used with tombstones
• explicit reclamation of objects
• automatic reclamation of tombstones
• rc > 0 → the programmer has not reclaimed the referred object (cyclic or not)


Mark-and-sweep collection
Better definition of "object X is not useful"
• X cannot be reached from valid pointers outside the heap
• covers the situation of the previous figure
Mark-and-sweep garbage collection
1. mark all heap objects as 'useless'
2. mark all reachable objects as 'useful'
   - begin from stack frames & recurse into structures
   - if a block is already marked 'useful' → return
3. move all 'useless' blocks of the heap to the free list (reclaim)
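A mark-and-sweep sketch in C over a small hypothetical heap of fixed-size blocks, each with two pointer fields; the `roots` array stands in for the pointers found in stack frames. Fixed-size blocks sidestep the variable-size issue, and marking uses plain recursion (the stack that Schorr-Waite avoids). All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_BLOCKS 8

typedef struct blk {
    struct blk *child[2];    /* pointer fields */
    int useful;              /* mark flag */
    int is_free;             /* free/used indicator */
    struct blk *next_free;   /* free-list link */
} blk;

static blk heap[HEAP_BLOCKS];
static blk *free_list = NULL;

void heap_init(void) {
    free_list = NULL;
    for (int i = HEAP_BLOCKS - 1; i >= 0; i--) {
        heap[i] = (blk){ {NULL, NULL}, 0, 1, free_list };
        free_list = &heap[i];
    }
}

blk *ms_alloc(void) {
    blk *b = free_list;
    if (b == NULL) return NULL;              /* out of space: time to collect */
    free_list = b->next_free;
    *b = (blk){ {NULL, NULL}, 0, 0, NULL };
    return b;
}

/* Step 2: recursively mark everything reachable from b. */
static void mark(blk *b) {
    if (b == NULL || b->useful) return;      /* already marked: return */
    b->useful = 1;
    mark(b->child[0]);
    mark(b->child[1]);
}

/* Returns the number of blocks reclaimed. */
int collect(blk **roots, size_t nroots) {
    for (int i = 0; i < HEAP_BLOCKS; i++)    /* step 1: everything 'useless' */
        heap[i].useful = 0;
    for (size_t r = 0; r < nroots; r++)      /* step 2: mark from the roots */
        mark(roots[r]);
    int reclaimed = 0;
    for (int i = 0; i < HEAP_BLOCKS; i++) {  /* step 3: sweep to the free list */
        if (!heap[i].is_free && !heap[i].useful) {
            heap[i].is_free = 1;
            heap[i].next_free = free_list;
            free_list = &heap[i];
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Unlike reference counting, an unreachable cycle is reclaimed, because reachability rather than a count decides.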


Potential problems

Steps 1 & 3
• the collector must know where every 'in-use' heap block begins and ends
• variable sizes → each block must
  - start with its size
  - contain a free/used indicator
Step 2
• the collector must know the locations of pointers
  - place a pointer to the object's type descriptor into each heap block


Cost of 'mark-and-sweep'
Extra space for heap objects
• address of the type descriptor
  - the type descriptor contains the size
• if type descriptor addresses are word-aligned, then the last 2 bits of the address can be used for
  - the 'free' flag and
  - the 'useful' flag
Step 2
• needs a recursion stack for the exploration
  - but garbage collection is done because we are OUT of space!
• Schorr & Waite (1967): no stack needed
  - redirect pointers to find the way back
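Packing the two flags into the low bits of a word-aligned type-descriptor address can be sketched in C with `uintptr_t`. This assumes descriptors are at least 4-byte aligned (checked by an assert); `tdesc`, `header` and the `hdr_*` helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type descriptor: size plus (not shown) pointer offsets. */
typedef struct { size_t size; } tdesc;

#define FREE_FLAG   ((uintptr_t)1)   /* bit 0 */
#define USEFUL_FLAG ((uintptr_t)2)   /* bit 1 */
#define FLAG_MASK   (FREE_FLAG | USEFUL_FLAG)

/* Block header: one word holding the descriptor address plus two flag bits. */
typedef struct { uintptr_t word; } header;

void hdr_init(header *h, const tdesc *d) {
    assert(((uintptr_t)d & FLAG_MASK) == 0);      /* relies on word alignment */
    h->word = (uintptr_t)d;
}

const tdesc *hdr_desc(const header *h) {
    return (const tdesc *)(h->word & ~FLAG_MASK); /* strip the flags */
}

void hdr_set(header *h, uintptr_t flag, int on) {
    if (on) h->word |= flag;
    else    h->word &= ~flag;
}

int hdr_test(const header *h, uintptr_t flag) {
    return (h->word & flag) != 0;
}
```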


Schorr-Waite technique
Figure → next slide
• embeds the stack in the fields of heap blocks
  - keep track of the current & previous block (Y, R)
• exploring from Y to W
  - reverse the pointer to W so that it points to R
  - set current block to W, previous to Y
• returning from W to Y
  - use the reversed pointer in Y to find the previous block R
  - flip the reversed pointer back to W
  - set current block to Y, previous to R
Fact: at most one pointer per block is reversed
• it must be marked somehow → bookkeeping data in the block
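A sketch of pointer-reversal marking in C for blocks with exactly two pointer fields, following the description above: `cur` is the current block (Y), `prev` the previous one (R), and the per-block `flag` is the bookkeeping that records which field is currently reversed. The names are hypothetical; a real collector would find the pointer fields through the type descriptor.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sw_node {
    struct sw_node *left, *right;   /* the two pointer fields */
    int mark;                       /* 'useful' flag */
    int flag;                       /* reversed field: 0 = left, 1 = right */
} sw_node;

/* Marks everything reachable from root without an explicit stack. */
void sw_mark(sw_node *root) {
    sw_node *cur = root;            /* current block (Y) */
    sw_node *prev = NULL;           /* previous block (R): top of the embedded stack */
    for (;;) {
        if (cur != NULL && !cur->mark) {
            /* Explore cur->left, reversing that field to point back to prev. */
            sw_node *next = cur->left;
            cur->mark = 1;
            cur->flag = 0;
            cur->left = prev;
            prev = cur;
            cur = next;
        } else {
            /* Return past blocks whose two fields have both been explored. */
            while (prev != NULL && prev->flag == 1) {
                sw_node *back = prev->right;   /* reversed field leads further back */
                prev->right = cur;             /* flip the pointer back */
                cur = prev;
                prev = back;
            }
            if (prev == NULL) return;          /* back above the root: done */
            /* Left field finished: restore it and reverse the right field instead. */
            sw_node *back = prev->left;
            prev->left = cur;
            prev->flag = 1;
            cur = prev->right;
            prev->right = back;
        }
    }
}
```

When `sw_mark` returns, every reversed pointer has been flipped back, so the heap graph is unchanged apart from the flags.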


Storage compaction
Remove external fragmentation
• easy with tombstones
Stop-and-copy technique
• compaction while eliminating steps 1 and 3 of the mark-and-sweep algorithm
• divide the heap into 2 halves (virtual memory!), say H1 & H2
• all allocations are done in H1
• memory full → copy all reachable data to H2
  - use 'useful' flags to keep track of shared structures
  - not 'useful' → the pointer points to H1 → copy the data to H2, update the pointer to H2
  - 'useful' → the pointer points to H2 → just copy the reference
• swap H1 & H2
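A stop-and-copy sketch in C with two small hypothetical semispaces of fixed-size cells. The `forward` field plays the role of the 'useful' flag: once a cell has been copied it holds the new H2 address, so shared structures and cycles are copied only once. Copying is recursive here, as in the description above; all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define SEMI 8                               /* cells per half */

typedef struct cell {
    struct cell *child[2];
    struct cell *forward;                    /* non-NULL once copied: new address in H2 */
    int value;
} cell;

static cell space_a[SEMI], space_b[SEMI];
static cell *from = space_a, *to = space_b;  /* H1 (allocation) and H2 */
static size_t used = 0;                      /* cells allocated in H1 */

cell *sc_alloc(int value) {
    if (used == SEMI) return NULL;           /* memory full: caller should collect */
    cell *c = &from[used++];
    *c = (cell){ {NULL, NULL}, NULL, value };
    return c;
}

/* Copy the structure reachable from c into H2 and return its new address. */
static cell *copy(cell *c, size_t *top) {
    if (c == NULL) return NULL;
    if (c->forward != NULL) return c->forward;   /* already copied: reuse the reference */
    cell *n = &to[(*top)++];
    *n = *c;
    c->forward = n;                              /* set before recursing (handles cycles) */
    n->forward = NULL;
    n->child[0] = copy(c->child[0], top);
    n->child[1] = copy(c->child[1], top);
    return n;
}

/* Copy everything reachable from the roots, update the roots, swap H1 & H2. */
void sc_collect(cell **roots, size_t nroots) {
    size_t top = 0;
    for (size_t i = 0; i < nroots; i++)
        roots[i] = copy(roots[i], &top);
    cell *tmp = from; from = to; to = tmp;
    used = top;                                  /* survivors are compacted at the start */
}

size_t sc_used(void) { return used; }
```

Garbage is never touched, which is why the time overhead is proportional to the live data only.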


Cost of 'stop-and-copy'

Only half of the heap is in use
• not a problem with virtual memory
Time overhead
• proportional to the amount of non-garbage blocks
• mark-and-sweep: proportional to all blocks


M-a-S vs. RC
Time usage
• M-a-S has lower overhead than RC in 'normal' operation
  - costs only when a GC is performed
• suffers from the "stop-the-world" symptom
  - everything freezes during GC
  - execution happens in bursts
  - the more GC is needed, the more it costs (lots of heap data)
Space usage is comparable
• reversed-pointer indicator / reference counter
• address of the type descriptor


Improved M-a-S
Idea: trade GC accuracy for GC speed
• divide the heap into a permanent and a dynamic half
• GC is performed only in the dynamic half
• data is moved to the permanent half if it survives 1 or 2 GCs
• like 'stop-and-copy' but no swapping
• risk: the permanent area may get full
  - should not happen with 'normal' programs
Avoiding 'stop-the-world'
• interleave normal execution & GC
• multiprocessor computers: P1 executes, P2 does GC


GC and weak typing

• Most GC techniques use type descriptors
  - they need to find the pointers inside objects
• Weakly typed languages & GC?
  - probabilistic (conservative) approach
    - number of blocks in the heap << number of possible bit patterns in addresses
    - so the probability that a non-pointer data area contains a ‘heap address’ is small
    - assume that everything that looks like a pointer is a pointer & apply the standard mark-and-sweep algorithm
  - properties
    - never reclaims useful blocks
      - unless the programmer ‘hides’ pointers (possible in C)
    - some useless blocks may get marked as useful
    - compaction is impossible: we never know which ‘pointers’ should be changed
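
A minimal Python sketch of the conservative mark phase, under simplifying assumptions: the heap is a hypothetical dictionary from block start addresses to the words stored in each block, and a word counts as a pointer only if it equals some block's start address (a real collector may also have to handle pointers into the middle of a block). `conservative_mark` and `sweep` are illustrative names.

```python
def conservative_mark(heap: dict[int, list[int]], roots: list[int]) -> set[int]:
    """Mark phase that cannot tell pointers from integers: any word (in the
    roots or in a marked block) equal to a block address is followed."""
    marked: set[int] = set()
    stack = [w for w in roots if w in heap]
    while stack:
        addr = stack.pop()
        if addr in marked:
            continue
        marked.add(addr)
        # every word that looks like a pointer is treated as one
        stack.extend(w for w in heap[addr] if w in heap)
    return marked


def sweep(heap: dict[int, list[int]], marked: set[int]) -> list[int]:
    """Free (delete) every unmarked block and return the freed addresses."""
    freed = sorted(addr for addr in heap if addr not in marked)
    for addr in freed:
        del heap[addr]
    return freed


heap = {
    0x1000: [0x2000, 7],  # a real pointer and a plain integer
    0x2000: [42],
    0x3000: [1],
    0x4000: [5],
}
# Assume 0x3000 in the roots is really an integer that only looks like an address.
marked = conservative_mark(heap, roots=[0x1000, 0x3000])
print(sorted(hex(a) for a in marked))         # ['0x1000', '0x2000', '0x3000']
print([hex(a) for a in sweep(heap, marked)])  # ['0x4000']

# A 'hidden' pointer (here XOR-ed with a mask) is not recognized at all.
heap2 = {0x1000: [1]}
hidden = 0x1000 ^ 0xFFFF
print([hex(a) for a in sweep(heap2, conservative_mark(heap2, [hidden]))])  # ['0x1000']
```

The integer `0x3000` keeps an otherwise useless block alive, while the hidden pointer lets a block be freed even though the program could still reconstruct its address.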

Lists

• recursive definition: a list is
  - an empty list, or
  - a pair consisting of an object and a list
• ‘arrays of functional languages’
  - useful in imperative programs, too
  - can be implemented in any language with records and pointers
  - homogeneous in statically typed languages (ML)
  - Lisp lists are heterogeneous (dynamically typed language)
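
The recursive definition maps directly onto a record with two fields. In this Python sketch (all names hypothetical), a frozen dataclass plays the role of the record, ordinary object references serve as the pointers, and `None` stands for the empty list. Because Python is dynamically typed, such a list can be heterogeneous like a Lisp list; an ML list would require all elements to share one type.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Pair:
    """A record of two fields: an object and (a pointer to) the rest of the list."""
    head: Any
    tail: Optional["Pair"]  # None represents the empty list


def from_items(*items: Any) -> Optional[Pair]:
    """Build a list back to front, one pair per element."""
    lst: Optional[Pair] = None
    for x in reversed(items):
        lst = Pair(x, lst)
    return lst


def length(lst: Optional[Pair]) -> int:
    # Mirrors the recursive definition: empty -> 0, pair -> 1 + length(rest)
    return 0 if lst is None else 1 + length(lst.tail)


def to_items(lst: Optional[Pair]) -> list[Any]:
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


mixed = from_items(1, "two", 3.0)
print(length(mixed), to_items(mixed))  # 3 [1, 'two', 3.0]
```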

Implementation

• Chain of blocks (ML)
  - the component object may be contained in the block
    - useful for primitive types
  - or the block contains a pointer to the component
    - must have some ‘tag bit’ to tell which case holds
• Chain of ‘cons-cells’ (Lisp)
  - each cell is a combination of 2 pointers
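
One common way to realize the tag bit is to rely on word alignment: block addresses are even, so words whose lowest bit is 1 can be reserved for integers stored directly in the block. The Python sketch below models this at the word level; the heap dictionary, `NIL = 0` and the helper names are assumptions made for illustration (the scheme is similar in spirit to how OCaml represents integers, but this is not any real runtime).

```python
# Hypothetical word-level model of an ML-style chain of blocks. Each list
# block is a pair (component, next); NIL (address 0) is the empty list.
# Block addresses are assumed even (word-aligned), so their low bit is 0.
NIL = 0


def tag_int(n: int) -> int:
    """Store a small integer in a word: shift left and set the low bit to 1."""
    return (n << 1) | 1


def is_int(word: int) -> bool:
    return (word & 1) == 1  # low bit 1: immediate integer; 0: pointer


def untag_int(word: int) -> int:
    return word >> 1  # arithmetic shift, so negative numbers survive


def components(heap: dict[int, object], lst: int) -> list[object]:
    """Walk the chain, decoding each component according to its tag bit."""
    out: list[object] = []
    while lst != NIL:
        comp, nxt = heap[lst]
        out.append(untag_int(comp) if is_int(comp) else heap[comp])
        lst = nxt
    return out


heap = {
    0x10: (tag_int(5), 0x20),  # component stored inside the block
    0x20: (0x30, 0x40),        # component elsewhere: pointer to 0x30
    0x30: "a boxed string",
    0x40: (tag_int(-3), NIL),
}
print(components(heap, 0x10))  # [5, 'a boxed string', -3]
```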

Basic operations

• Convenience notation
  - ML: [a,b,c,d]
  - Lisp: (a b c d)
    - also: (a . (b . (c . (d . nil)))) (dotted pair notation)
    - note: (a . b) is NOT a proper list
• List manipulation
  - construction, extraction, concatenation
  - Lisp
    - car, cdr, cons, append
    - car & cdr (cdr is pronounced ‘could-er’) are ‘historical accidents’
    - Common Lisp: car and cdr of the empty list just return nil
  - ML
    - hd, tl, ::, @ (:: and @ use infix notation)
    - illegal uses cause a runtime exception
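
The two error-handling styles can be contrasted in a short Python sketch. Cons cells are modelled as 2-tuples and `None` as nil; `car`/`cdr` follow the Common Lisp convention for the empty list, while the ML-style `hd`/`tl` raise an exception (here a `ValueError`, an arbitrary choice). `dotted` prints a structure in dotted pair notation and `is_proper` checks whether the chain of cells ends in nil. The sketch assumes list elements are atoms, not tuples.

```python
from typing import Any

NIL = None  # the empty list


def cons(a: Any, d: Any) -> tuple:
    return (a, d)  # a cons cell: two 'pointers'


# Common Lisp style: car/cdr of the empty list return nil instead of failing
def car(x: Any) -> Any:
    return NIL if x is NIL else x[0]


def cdr(x: Any) -> Any:
    return NIL if x is NIL else x[1]


# ML style: taking hd/tl of the empty list raises an exception
def hd(x: Any) -> Any:
    if x is NIL:
        raise ValueError("hd of empty list")
    return x[0]


def tl(x: Any) -> Any:
    if x is NIL:
        raise ValueError("tl of empty list")
    return x[1]


def append(xs: Any, ys: Any) -> Any:
    # Copies the cells of xs; ys is shared, not copied
    return ys if xs is NIL else cons(car(xs), append(cdr(xs), ys))


def is_proper(x: Any) -> bool:
    # A proper list is a chain of cells that ends in nil
    while isinstance(x, tuple):
        x = x[1]
    return x is NIL


def dotted(x: Any) -> str:
    if isinstance(x, tuple):
        return f"({dotted(x[0])} . {dotted(x[1])})"
    return "nil" if x is NIL else str(x)


abcd = cons("a", cons("b", cons("c", cons("d", NIL))))
print(dotted(abcd))                                       # (a . (b . (c . (d . nil))))
print(dotted(cons("a", "b")), is_proper(cons("a", "b")))  # (a . b) False
print(car(NIL), cdr(NIL))                                 # None None
print(dotted(append(cons(1, NIL), cons(2, NIL))))         # (1 . (2 . nil))
```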

List functions

• Typical built-in functions
  - test for emptiness
  - length
  - n-th element
  - reversal
• Polymorphic functions
  - filter, map, accumulate
• Haskell (a functional language influenced by ML)
  - list comprehension = convenience notation for combinations of generation, filtering and mapping
  - much like the corresponding mathematical definition of sets

[i*i | i <- [1..100], i `mod` 2 == 1]
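
Python has list comprehensions too, so the Haskell example can be checked against its expansion into generation, filtering and mapping. This sketch uses the built-in `map` and `filter`, and `functools.reduce` for ‘accumulate’; it illustrates the idea in Python rather than Haskell semantics (note Python's 0-based indexing for the n-th element).

```python
from functools import reduce

nums = range(1, 101)  # generation: [1..100]

# Filtering and mapping spelled out with polymorphic higher-order functions ...
odd_squares_hof = list(map(lambda i: i * i, filter(lambda i: i % 2 == 1, nums)))

# ... and the same list as a single comprehension
odd_squares = [i * i for i in nums if i % 2 == 1]

assert odd_squares == odd_squares_hof

# 'accumulate' as a left fold: ((0 + 1) + 9) + 25 + ...
total = reduce(lambda acc, x: acc + x, odd_squares, 0)

# the other typical built-ins
print(len(odd_squares) == 0)  # emptiness test: False
print(len(odd_squares))       # length: 50
print(odd_squares[2])         # n-th element (0-based here): 25
print(odd_squares[::-1][:3])  # reversal: [9801, 9409, 9025]
print(total)                  # 166650
```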

Assignment & equality

• Primitive types
  - obvious semantics & implementation
  - bitwise copying
  - bitwise comparison
• Structured types, abstract data types?
• Example: for strings s & t, does s = t mean that s & t
  - are aliases?
  - occupy bitwise identical storage?
    - uninteresting: unused parts of the storage may hold garbage bits
  - contain the same sequence of characters?
  - would appear the same if printed?
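
The possible meanings of s = t can be told apart in Python. The byte buffers below are a hypothetical stand-in for C-style strings in fixed-size storage, and Unicode NFC normalization is used as one possible proxy for ‘would appear the same if printed’; actual rendering also depends on fonts and other factors.

```python
import unicodedata


def c_string(buf: bytes) -> bytes:
    """Characters up to (not including) the first NUL byte."""
    return buf.split(b"\x00", 1)[0]


# Two hypothetical 8-byte buffers holding the C-style string "abc"; the bytes
# after the NUL terminator are leftover garbage.
buf1 = b"abc\x00\x13\x37\x00\x00"
buf2 = b"abc\x00\xff\xee\xdd\xcc"
print(buf1 == buf2)                      # False: storage differs bitwise
print(c_string(buf1) == c_string(buf2))  # True: same characters

s = "caf\u00e9"   # 'é' as a single code point
t = "cafe\u0301"  # 'e' followed by a combining acute accent
u = s
print(u is s)     # True: aliases (same object)
print(s == t)     # False: different character sequences
# Normalizing both to NFC is one proxy for 'would appear the same if printed'
print(unicodedata.normalize("NFC", s) == unicodedata.normalize("NFC", t))  # True
```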

Deep and shallow equality & assignment

• E1 = E2 in the reference model
  - E1 & E2 are the same object = shallow equality
  - E1 & E2 refer to objects that are (in some sense) equal = deep equality
    - may require recursive testing
• E1 := E2 in the reference model
  - suppose E2 refers to object O
  - shallow assignment
    - make E1 a reference to O
  - deep assignment
    - create a copy, say C, of O
    - make E1 a reference to C
• E1 := E2 in the value model
  - ‘deep’ for primitive types
  - always shallow for pointers (the pointer is copied, not the object it points to)
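
Python uses the reference model for all objects, so the notions above map onto it directly: `is` tests shallow equality, `==` on lists is deep (recursive) equality, plain assignment is shallow assignment, and `copy.deepcopy` gives deep assignment. The explicit `deep_equal` function is a hypothetical helper that just spells out the recursive testing.

```python
import copy


def deep_equal(a: object, b: object) -> bool:
    """Recursive structural equality for nested lists (Python's own == on
    lists already works this way; it is spelled out here for clarity)."""
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(deep_equal(x, y) for x, y in zip(a, b))
    return a == b


o = [1, [2, 3]]               # the object O
e2 = o                        # E2 refers to O
e1_shallow = e2               # shallow assignment: E1 refers to O itself
e1_deep = copy.deepcopy(e2)   # deep assignment: E1 refers to a copy C of O

print(e1_shallow is e2)                        # True: shallow equality
print(e1_deep is e2, deep_equal(e1_deep, e2))  # False True: deep equality only

o[1].append(4)                # mutate a nested component of O
print(e1_shallow)             # [1, [2, 3, 4]]: an alias sees the change
print(e1_deep)                # [1, [2, 3]]: the copy does not
```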

Language design

• Most languages provide only the ‘shallow’ versions
• Scheme (a widely known Lisp dialect)
  - provides 3 equality-testing functions
  - eq?, eqv?, equal?
• Deep assignment is rare
  - Clu: copy1, copy
• Languages with ADTs
  - the programmer should think carefully about which versions to implement
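
The choice an ADT designer faces can be illustrated with a hypothetical array-based stack in Python. Its `__eq__` implements deep equality on the abstract value rather than comparing raw storage, and its `__copy__` picks one particular copy depth (new storage, shared elements); `copy.deepcopy` still falls back to Python's default, fully recursive copy. The class and its policies are an illustrative sketch, not a prescribed design.

```python
import copy


class Stack:
    """Hypothetical array-based stack ADT. Popping only moves `_top`, so the
    slots above it keep stale values, like garbage bits in raw storage."""

    def __init__(self, capacity: int = 8) -> None:
        self._items: list = [None] * capacity
        self._top = 0

    def push(self, x: object) -> None:
        self._items[self._top] = x  # sketch: no overflow check
        self._top += 1

    def pop(self) -> object:
        self._top -= 1  # sketch: no underflow check
        return self._items[self._top]

    def __eq__(self, other: object) -> bool:
        # Deep equality on the abstract value: compare live elements only,
        # never the raw storage, whose stale slots may differ.
        if not isinstance(other, Stack):
            return NotImplemented
        return self._items[: self._top] == other._items[: other._top]

    def __copy__(self) -> "Stack":
        # Chosen shallow copy: fresh storage, but the elements are shared.
        # The default copy.copy would share the storage list itself, so pushes
        # on the original and on the copy would overwrite each other's elements.
        new = Stack(len(self._items))
        new._items = list(self._items)
        new._top = self._top
        return new


a, b = Stack(), Stack()
a.push(1)
a.push(2)
a.pop()                      # slot 1 of a's storage still holds 2
b.push(1)
print(a == b)                # True: same abstract value
print(a._items == b._items)  # False: raw storage differs
c = copy.copy(a)
c.push(3)
print(a == c, a._items[1])   # False 2: the copy has its own storage
```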
