
Modern programming languages:

ByteCode and Virtual Machines

CSE 6329, Spring 2011

Christoph Csallner, UTA

Old days: No Virtual Machine

• You write: Program in source language, MyProg.cpp

– Source language specification, e.g., the book “The C++ Programming Language”

• Source-to-machine compiler, e.g., (old) Microsoft Visual Studio

– Program in Intermediate Representation

• Program in machine code: MyProg.exe in MS Windows x86 binary

– Machine instruction set

• Actual machine

2

Today: Virtual Machines popular

• You write: Program in source language, MyProg.java

– Source language spec: Java Specification

• Src-to-bytecode compiler: javac MyProg.java

• Program in bytecode: MyProg.class

– Bytecode language spec: JVM Specification

• Virtual machine: java MyProg

• Program in machine code (mostly): MyProg in MS Windows x86 binary

– Machine instruction set

• Actual machine

3

Program Analysis today

• Many programs compiled to bytecode

– Virtual machine executes bytecode

• Bytecode has advantages over source language

• Many Program Analyses analyze bytecode

– Results translated back to your original Java/C#/… source program

• Example program analyses that are very easy to use:

– For Java: FindBugs: http://findbugs.sourceforge.net/

– For C#: Pex for fun: http://www.pexforfun.com/

4

Big picture

• You write: MyProg.java

• Source to bytecode compiler, e.g.: javac, MS Visual Studio

• Bytecode: MyProg.class

– Program analysis works on the bytecode, e.g.: FindBugs, Pex

• Virtual machine, e.g.: JVM, .Net runtime

• OS, machine code, e.g.: Windows x86

5

Why is bytecode good for Program Analysis?

Simple yet powerful

• Bytecode is simpler than source language

– Similar to compiler IR

– Simplifies analysis

– Java, C#, VB, F#, etc. are far more complex

• Retains most information of source language

– Similar to compiler IR

– Enables meaningful analysis

8

Simple

• Fewer language elements = less “syntactic sugar”

• Example: Explicit loop constructs in Java

– Sourcecode: 4

• Which ones?

9

Simple

• Fewer language elements = less “syntactic sugar”

• Example: Explicit loop constructs in Java

– Sourcecode: 4

• while, do (“until”), basic for, enhanced for

– Bytecode: 0

• All 4 are mapped to jumps

– Makes program analysis easier to implement
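The four source-level loop forms can be put side by side in a small sketch. The class name LoopForms and its method names are made up for illustration; each method computes the same sum with a different loop construct.

```java
// Hypothetical demo class: the same sum written with all four Java loop forms.
class LoopForms {
    static int sumWhile(int[] a) {
        int sum = 0, i = 0;
        while (i < a.length) { sum += a[i]; i++; }
        return sum;
    }

    static int sumDoWhile(int[] a) {
        if (a.length == 0) return 0;      // a do loop always runs its body at least once
        int sum = 0, i = 0;
        do { sum += a[i]; i++; } while (i < a.length);
        return sum;
    }

    static int sumBasicFor(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i];
        return sum;
    }

    static int sumEnhancedFor(int[] a) {
        int sum = 0;
        for (int x : a) sum += x;
        return sum;
    }
}
```

After compiling, `javap -c LoopForms` prints the bytecode of each method. The exact instructions depend on the compiler, but none of them is a loop instruction: the loops show up as conditional and unconditional jumps.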

11

Powerful

• Still a non-trivial, Turing-complete language

– At least as expressive as Java source language

– Supports all legal Java source programs (and more)

• Bytecode retains most information of original source

program

– Allows automatic reconstruction of source from bytecode

– “Dis-assembler” fast, powerful, and convenient

12

Accessible

• Several “dis-assembler” libraries provide a nice API to

retrieve and even change bytecode

– Beyond capability of Java or C# built-in reflection

– BCEL and ASM for Java bytecode

– ExtendedReflection (part of Pex) for .Net bytecode

13

Documented Standard

• Carefully designed and specified

– Better than most compiler IR

• Java Virtual Machine specification

– http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html

• .Net Virtual Machine specification

– http://www.ecma-international.org/publications/standards/Ecma-335.htm

14

Shared Standard

• Shared standard among different languages

– Java, C#, VB, F#, etc. all compiled to same bytecode

– Programs in many source languages can be checked with

single Program Analysis tool

• Shared standard among different operating systems

– Cell phones, mainframe, etc. all run same bytecode

– Programs on many OS can be checked with single tool

15

Old days: Typically no shared intermediate language

• You write: MyProg.cpp → Visual Studio → MyProg.exe in Windows x86 → Windows machine

• You write: MyProg.ada → gcc gnat → MyProg in Linux x86 → Linux machine

16

Bytecode: Shared intermediate language

• You write: MyProg.java, MyProg.cs, or MyProg.ada

• Source-to-bytecode compiler

• Bytecode: MyProg.class

• Java virtual machine, one per platform

– MyProg in Windows x86 → Windows machine

– MyProg in Linux x86 → Linux machine

– MyProg in XYZ binary → Cell phone

17

Many software engineering papers focus on

combination of Java source with Java bytecode

• Probably easiest to understand

• Other combinations work similarly

• Well documented, many research papers

• Industrial-strength, but still relatively simple

– C# started with Java-like features

– But C# grew faster → more complex now

– C++ more complex than Java

– Other combinations more obscure

20

javac compiler implements our source-bytecode combination

• Comes with JDK

• Open source

• Translates any legal Java program into a Java bytecode program

• Eclipse compiler is an alternative

• Pipeline: You write MyProg.java (Java spec) → javac → Bytecode: MyProg.class (JVM spec) → Java virtual machine → …

21

Overview

• Following overview gives a flavor

– Slightly simplified: Details may differ from JVM

– Omits several parts: Exceptions, floating point, …

• May be intimidating

– But remember that you can typically use a powerful

disassembler to help with bytecode

• Following mostly copied from Java virtual machine specification, 2nd edition: http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html

22

JVM Specification:

STRUCTURE OF THE JAVA VIRTUAL MACHINE

JVM Specification: http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html

Iain D. Craig, Virtual Machines, Springer, 2005, Chapter 3: http://www.amazon.com/Virtual-Machines-Iain-D-Craig/dp/1852339691

23

Structure of the Java Virtual Machine

= Sections of chapter 3 of JVM Spec

1. The class file format

2. Data types

3. Primitive types and values

4. Reference types and values

5. Runtime data areas

6. Frames

7. …

24

Class file format

• Standard format for Java bytecode

• JVM accepts bytecode only in class file format

• JVM Spec, Section 4, defines class file format

– Contents

– Order

– Representation

– Verification [Section 4.9]

25

Class file format

• Binary format

• Independent of hardware and OS

– Fixes byte order (“endianness”),

regardless of byte order of current machine

• Independent of actual files, despite the name

• Class may arrive at runtime as a byte array from

elsewhere

– From a class generator

– From the web
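A minimal sketch of a class that arrives as a byte array rather than as a file. The loader class ByteArrayLoader and its helper names are made up for illustration; defineClass is the standard protected method of java.lang.ClassLoader. The bytes are the empty "public interface I" example that appears later in these slides.

```java
// Hypothetical loader that turns an in-memory byte array into a class.
class ByteArrayLoader extends ClassLoader {
    // The bytes of the empty interface I from these slides, grouped by class-file item.
    static final String INTERFACE_I_HEX =
        "cafebabe" + "0000" + "0032" + "0007"
      + "070002" + "01000149" + "070004"
      + "010010" + "6a6176612f6c616e672f4f626a656374"
      + "01000a" + "536f7572636546696c65"
      + "010006" + "492e6a617661"
      + "0601" + "0001" + "0003" + "0000" + "0000" + "0000"
      + "0001" + "0005" + "00000002" + "0006";

    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    Class<?> define(String name, byte[] classFile) {
        // No file is involved: the JVM only sees bytes in class file format.
        return defineClass(name, classFile, 0, classFile.length);
    }

    static Class<?> loadInterfaceI() {
        return new ByteArrayLoader().define("I", fromHex(INTERFACE_I_HEX));
    }
}
```

The same call would work for bytes produced by a class generator or downloaded from the web, as long as they form a valid class file.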

26

Class/interface ↔ class file

• 1:1 mapping between (class or interface) and class

file

– Class file can define a class or an interface

– Each class is defined in its own class file

– Each interface is defined in its own class file

• Applies to top-level types and nested types

– Java compiler creates a separate class file for each nested

class

27

Basic organization

• Class file = stream of bytes, 1 byte = 8 bits

• Multibyte items stored in big-endian = High byte first

• Read consecutive bytes

• Interpret consecutive bytes as unsigned number

– 8 bit item = 1 byte [0..255]

– 16 bit item = 2 consecutive bytes [0..65,535]

– 32 bit item = 4 consecutive bytes [0..4,294,967,295]

– 64 bit item = 8 consecutive bytes [0..18,446,744,073,709,551,615]

28

Class file data types

• Own simple data types

– Different from Java data types

– Different from JVM data types

– Neither “byte” nor “int” nor “long”

• Just three types

– u1 = unsigned byte

– u2 = unsigned 2 consecutive bytes: (high, low)

– u4 = unsigned 4 consecutive bytes
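The three class file types can be read from a byte array as sketched below. The class name ClassFileInts and its method names are made up for illustration. Java's own byte and int are signed, so each value is widened and masked.

```java
// Hypothetical helper: reads the class file types u1, u2, u4 from a byte array.
class ClassFileInts {
    static int u1(byte[] b, int pos) {
        return b[pos] & 0xFF;                       // mask removes Java's sign extension
    }

    static int u2(byte[] b, int pos) {
        return (u1(b, pos) << 8) | u1(b, pos + 1);  // big-endian: (high, low)
    }

    static long u4(byte[] b, int pos) {
        // long, because Java's signed int cannot hold values above 2,147,483,647
        return ((long) u2(b, pos) << 16) | u2(b, pos + 2);
    }
}
```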

29

Class file structure

u4 magic;

u2 minor_version;

u2 major_version;

u2 constant_pool_count;

cp_info constant_pool[constant_pool_count-1];

u2 access_flags;

u2 this_class;

u2 super_class;

u2 interfaces_count;

u2 interfaces[interfaces_count];

u2 fields_count;

field_info fields[fields_count];

u2 methods_count;

method_info methods[methods_count];

u2 attributes_count;

attribute_info attributes[attributes_count];

30

Class/Interface

Header

Magic

• Magic number

• First four bytes of a Java class file

• Each class file has the same magic number

• Helps OS recognize this file as a Java class file

• Value is 3405691582 = CAFEBABE in hex

• More on CafeBabe:

– http://www.artima.com/insidejvm/whyCAFEBABE.html

33

minor_version, major_version

• Together define the version of the class file format

used in the class file

• Tells JVM if it understands the format of the class file

– An older JVM can refuse to load a class file if the class file uses a class file format that was defined after the JVM was released
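A minimal sketch of the header check, assuming the class file is already in memory. HeaderReader, readVersion, and canLoad are made-up names, and canLoad is a simplification of the real version rules.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for the first eight bytes of a class file.
class HeaderReader {
    /** Returns {minor_version, major_version}; rejects input without the magic number. */
    static int[] readVersion(byte[] classFile) {
        ByteBuffer in = ByteBuffer.wrap(classFile);   // ByteBuffer is big-endian by default
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number");
        }
        int minor = in.getShort() & 0xFFFF;
        int major = in.getShort() & 0xFFFF;
        return new int[] { minor, major };
    }

    /** Simplified rule: a JVM accepts formats up to the newest one it knows. */
    static boolean canLoad(int majorOfClassFile, int newestMajorKnownToJvm) {
        return majorOfClassFile <= newestMajorKnownToJvm;
    }
}
```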

34

Constant Pool

of this Class/Interface

Constant Pool

• Constants from user source program

– Constant String objects, int, float, long, double

• Internal String values

– Unicode character sequences

• Names and signatures of

– Classes, interfaces, methods, fields

37

Constant Pool

• constant_pool_count = Number of entries in the

constant_pool (+ 1)

• constant_pool = Sequence of cp_info items

• cp_info = {u1 tag; u1 info[]; }

• Tag byte defines the kind of cp_info, e.g.:

– 3 indicates a CONSTANT_Integer_info

• Info array holds the actual data, e.g.:

– Info array of CONSTANT_Integer_info is one u4

{3; byte; byte; byte; byte}

39

Index into Constant Pool

• u2 value

– Greater than zero

– Less than constant_pool_count

• Example

– constant_pool_count = 7

– 1 = Index of first element

– 6 = Index of last element

40

Constant String Objects

• Declared in the user program as constant objects of

the type String, e.g.:

– String s = “CSE 6329 rocks”;

• CONSTANT_String_info = {

u1 tag; // 8

u2 string_index; } // index into cp

• cp at string_index must be a CONSTANT_Utf8_info

41

Internal String Values

• Holds a character sequence

– Each character is a Unicode character

– Each character represented by 1, 2, or 3 bytes

• Used for both user program constant objects and internal Strings (method signatures, etc.)

• CONSTANT_Utf8_info = {

u1 tag; // 1

u2 length; // nr of bytes

u1 bytes[length];} // not null-terminated
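The constant pool layout described so far can be sketched as a small parser. ConstantPoolSketch is a made-up name. The sketch handles only the four tags shown in these slides (1, 3, 7, 8) and decodes Utf8 entries as standard UTF-8, which matches the class file encoding for plain ASCII text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical parser for a constant pool, starting at constant_pool_count.
class ConstantPoolSketch {
    static Object[] parse(byte[] bytesFromPoolCount) {
        ByteBuffer in = ByteBuffer.wrap(bytesFromPoolCount);   // big-endian by default
        int count = in.getShort() & 0xFFFF;
        Object[] cp = new Object[count];     // cp[0] stays unused: valid indices are 1..count-1
        for (int i = 1; i < count; i++) {
            int tag = in.get() & 0xFF;
            switch (tag) {
                case 1: {                    // CONSTANT_Utf8_info: length, then bytes
                    byte[] chars = new byte[in.getShort() & 0xFFFF];
                    in.get(chars);
                    cp[i] = new String(chars, StandardCharsets.UTF_8);
                    break;
                }
                case 3:                      // CONSTANT_Integer_info: one u4
                    cp[i] = in.getInt();
                    break;
                case 7:                      // CONSTANT_Class_info: name_index
                    cp[i] = "Class -> cp[" + (in.getShort() & 0xFFFF) + "]";
                    break;
                case 8:                      // CONSTANT_String_info: string_index
                    cp[i] = "String -> cp[" + (in.getShort() & 0xFFFF) + "]";
                    break;
                default:
                    throw new IllegalArgumentException("tag not handled in this sketch: " + tag);
            }
        }
        return cp;
    }
}
```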

42

Access Rights

of this Class/Interface

Class/Interface Access Rights: access_flags

• Bit mask – each bit represents a flag

• Each flag represents an access permission or a

property of this class or interface

– Flag = (class/interface) was declared …

– 0x0001 = public

– 0x0010 = final, no subclasses allowed

– 0x0020 = [low-level detail]

– 0x0200 = an interface, not a class

– 0x0400 = abstract, may not be instantiated
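Decoding the bit mask can be sketched as follows. ClassAccessFlags and decode are made-up names, and only the flags named on this slide are decoded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for the class/interface access_flags bit mask.
class ClassAccessFlags {
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        if ((accessFlags & 0x0001) != 0) names.add("public");
        if ((accessFlags & 0x0010) != 0) names.add("final");
        if ((accessFlags & 0x0200) != 0) names.add("interface");
        if ((accessFlags & 0x0400) != 0) names.add("abstract");
        return names;
    }
}
```

For instance, the value 0x0601 has the bits for public, interface, and abstract set.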

45

Class/Interface Access Rights:

Public or Default

• Class/interface either has public flag set or not

– No “private” or “protected” flags

• Public flag set

– Access from within or outside its package

• Default access rights, if public flag not set

– Access only from within its package

46

Name and Direct Super-Class

of this Class/Interface

this_class

• Index into the constant_pool (cp)

• cp[this_class] is a CONSTANT_Class_info = {

u1 tag; // 7

u2 name_index; } // index into cp

• cp[name_index] = CONSTANT_Utf8_info

– Name of class or interface

– In “internal” notation: Replace “.” with “/”

– Example: “java/lang/Object”

49

super_class

• If this class file defines a class,

– super_class must be zero or an index into the cp

• If super_class is zero

– This class file must represent java.lang.Object – the root class of the Java class hierarchy

• If super_class is non-zero,

– cp at super_class must be a CONSTANT_Class_info

representing the direct super class

50

super_class

• If this class file defines an interface

– super_class must be an index into the cp

– cp at super_class must be a CONSTANT_Class_info for

java.lang.Object

• This is a bit confusing

– An interface does not have a super class

– E.g., the instance method getSuperclass() of java.lang.Class

returns null if invoked on an interface
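The reflection view of super classes can be checked with a short sketch. SuperClassDemo and its method names are made up; getSuperclass() and getName() are standard java.lang.Class methods.

```java
// Hypothetical demo of what reflection reports for classes and interfaces.
class SuperClassDemo {
    static String superName(Class<?> c) {
        Class<?> s = c.getSuperclass();
        return s == null ? "none" : s.getName();
    }

    // Internal notation as used in class files: "." replaced with "/".
    static String internalName(Class<?> c) {
        return c.getName().replace('.', '/');
    }
}
```

Reflection reports no super class for an interface such as Runnable, even though the class file of an interface names java/lang/Object in super_class.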

51

Direct Interfaces

of this Class/Interface

interfaces[interfaces_count]

• interfaces_count = Number of direct super interfaces

• Interfaces = Array of indices into cp

• cp at each index must be a CONSTANT_Class_info that represents a direct super interface

54

Fields

of this Class/Interface

fields[fields_count]

• fields_count = Number of fields declared by this class

or interface

– Includes static fields and instance fields

– Does not include any inherited fields

• fields = Sequence of field_info items

– Each field_info represents one field declared by this class

or interface

57

field_info

• field_info = {

– u2 access_flags; // Access rights

– u2 name_index; // Simple name

– u2 descriptor_index; // Type

– u2 attributes_count; // Attributes

– attribute_info attributes[attributes_count]; }

58

Field Access Rights

field_info access_flags

• Flag = Field was declared …

– The field is accessible …

• 0x0001 = public

– Within or outside its package

• 0x0002 = private

– Only within its defining class

• 0x0004 = protected

– Within its package

– From subclasses inside or outside its package

59

Field Access Rights

• Only one of the access flags (public, private,

protected) may be set

• “Default” access, if no access flag is set

– Only within its package

• Reminder from Java Spec: Class X can access a field

C.f only if it can access class C.

– Public field f may not be accessible for class X

60

More Field Access Rights

field_info access_flags

• Flag = Field was declared …

• 0x0008 = static

– Class field (one per class)

– Not an instance field (one per instance)

• 0x0010 = final

– No further assignment after initialization

• 0x0040 = volatile

• 0x0080 = transient
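The field flag values above match the constants in java.lang.reflect.Modifier, so a field's access_flags value can be printed with the standard Modifier.toString method. FieldFlags and its method names are made up. The second method sketches the rule that at most one of public, private, protected may be set.

```java
import java.lang.reflect.Modifier;

// Hypothetical helper for field_info access_flags.
class FieldFlags {
    static String describe(int accessFlags) {
        return Modifier.toString(accessFlags);
    }

    static boolean atMostOneAccessFlag(int accessFlags) {
        // 0x0007 selects the public (1), private (2), and protected (4) bits
        return Integer.bitCount(accessFlags & 0x0007) <= 1;
    }
}
```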

61

Field Signature

• cp[name_index] is a CONSTANT_Utf8_info

– Simple name of field, e.g.:

– double[] foo; // “foo”

– static Object bar; // “bar”

• cp[descriptor_index] is CONSTANT_Utf8_info

– Type of field, e.g.:

– double[] foo; // “[D”

– static Object bar; // “Ljava/lang/Object;”

62

Descriptor Notation

• Cryptic type notation used in Java bytecode

– Notation Java type interpretation

– Z boolean true or false

– C char Unicode character

– L<name>; reference instance of <name>

– [ reference one array dimension

63

Descriptor Notation

– Notation Java type interpretation

– B byte 8 bit signed integer

– S short 16 bit signed integer

– I int 32 bit signed integer

– J long 64 bit signed integer

– F float 32 bit floating-point

– D double 64 bit floating-point
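The descriptor notation can be translated back to Java type names with a small sketch. FieldDescriptors and toJavaType are made-up names, and the sketch assumes a well-formed field descriptor.

```java
// Hypothetical translator from field descriptors to Java type names.
class FieldDescriptors {
    static String toJavaType(String d) {
        int dims = 0;
        while (d.charAt(dims) == '[') dims++;        // each "[" is one array dimension
        String base;
        switch (d.charAt(dims)) {
            case 'Z': base = "boolean"; break;
            case 'C': base = "char"; break;
            case 'B': base = "byte"; break;
            case 'S': base = "short"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'F': base = "float"; break;
            case 'D': base = "double"; break;
            case 'L':                                // L<name>; with "/" as package separator
                base = d.substring(dims + 1, d.length() - 1).replace('/', '.');
                break;
            default:
                throw new IllegalArgumentException("unknown descriptor: " + d);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) sb.append("[]");
        return sb.toString();
    }
}
```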

64

Field Attributes:

attributes[attributes_count]

• attributes_count = Number of attributes for this field

• attributes = Sequence of attribute_info items

– Each attribute_info represents one attribute

• Examples:

– @Deprecated int myDeprecatedField = 0;

– @Deprecated @MyAttribute int otherField = 1;

65

Methods

of this Class/Interface

methods[methods_count]

• methods_count = Number of methods declared by

this class or interface

– Includes static methods and instance methods

– Includes constructors and static initializers

– Does not include inherited methods

• methods = Sequence of method_info items

– Each method_info represents one method declared by this

class or interface

68

method_info

• method_info {

– u2 access_flags; // Method access rights

– u2 name_index; // Simple name

– u2 descriptor_index; // Signature

– u2 attributes_count; // Attributes

– attribute_info attributes[attributes_count]; }

69

Method Access Rights

method_info access_flags

• Next two slides identical to field access rights

– Public, private, protected, default

• Fields, constructors, methods are all “members” of a class or interface

– Similar access right rules

70

Method Access Rights

method_info access_flags

• Flag = Method was declared …

– The method is accessible …

• 0x0001 = public

– Within or outside its package

• 0x0002 = private

– Only within its defining class

• 0x0004 = protected

– Within its package

– From subclasses inside or outside its package

71

Method Access Rights

• Only one of the access flags (public, private,

protected) may be set

• “Default” access, if no access flag is set

– Only within its package

• Reminder from Java Spec: Class X can access a

method C.m only if it can access class C.

– Public method m may not be accessible for class X

72

More Method Access Rights

method_info access_flags

• Flag = Method was declared …

• 0x0008 = static

– Class method (called independent of instance)

– Not an instance method (which needs an instance as a “receiver instance” or “this parameter”)

• instance.method(p2, p3, ..)

• 0x0010 = final

– May not be overridden by sub-classes

73

More Method Access Rights

method_info access_flags

• Flag = Method was declared …

• 0x0020 = synchronized

• 0x0100 = native

– Implemented in a language other than Java

• 0x0400 = abstract

– No implementation is provided

• 0x0800 = strictfp

74

Method Name

• cp[name_index] is a CONSTANT_Utf8_info

– Simple name of method

– Constructor: “<init>”

– Class initializer: “<clinit>”

• Examples

– public int foo() { } // “foo”

– MyClass(long p) {} // “<init>”

– static { bar = 5; }// “<clinit>”

75

Method Signature

• cp[descriptor_index] is CONSTANT_Utf8_info

– (Parameter types) Return type

– In same cryptic notation as field types

– “V” = void is also a legal return type

– Never includes a “receiver type”

• Examples

– public int foo() { } // “()I” -- instance method

– MyClass(long p) {} // “(J)V” -- constructor

– static { bar = 5; } // “()V”
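The standard library can produce method descriptors itself: java.lang.invoke.MethodType has a toMethodDescriptorString method. The wrapper class MethodDescriptors and its method descriptor are made-up names.

```java
import java.lang.invoke.MethodType;

// Hypothetical wrapper that builds a method descriptor from Java types.
class MethodDescriptors {
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        // The receiver ("this") is never part of the descriptor.
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }
}
```

For example, descriptor(int.class) corresponds to foo() above, and descriptor(void.class, long.class) to the MyClass(long) constructor.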

76

Method Attributes:

attributes[attributes_count]

• attributes_count = Number of attributes for this

method

• attributes = Sequence of attribute_info items

– Each attribute_info represents one attribute

– Code attribute, present iff the method is neither abstract

nor native

– Exceptions attribute, lists declared exceptions

– @Deprecated attribute

77

Code of a method/constructor/clinit:

In a Code Attribute

• Code_attribute {

– u2 attribute_name_index; // “Code”

– u4 attribute_length; // length of attribute

– u2 max_stack; // max size of operand stack

– u2 max_locals; // max nr of local variables

– u4 code_length; // nr bytes in code array

– u1 code[code_length]; // actual byte codes

– u2 exception_table_length; // exception handlers

– { u2 start_pc; u2 end_pc; u2 handler_pc; u2 catch_type; } exception_table[exception_table_length];

– u2 attributes_count; // debugging info, etc.

– attribute_info attributes[attributes_count]; }

78

Attributes

of this Class/Interface

Class/Interface Attributes:

attributes[attributes_count]

• attributes_count = Number of attributes for this class

or interface

• attributes = Sequence of attribute_info items

– Each attribute_info represents one attribute

– @Deprecated attribute

– Etc.

• Example:

– @Deprecated interface Foo { /*empty*/ }

81

Referring to fields/methods

in other classes

• So far: How to define the elements of a class

– Class name

– Access rights of the class

– Fields of the class

– Etc.

• Next: How can a byte code instruction refer to a field

of another class?

– Add ref items to constant pool of this class file

83

CONSTANT_Fieldref_info

CONSTANT_Methodref_info

• Reference to a field/method/constructor

• CONSTANT_Fieldref_info { // similar for all

– u1 tag;

– u2 class_index; // type declaring this member

// CONSTANT_Class_info

– u2 name_and_type_index;

// simple name and descriptor

// CONSTANT_NameAndType_info

}

84

Wait,

Aren’t there many more details?

Yes, see the JVM Specification or ask in

class, office hours, mailing list

Do you have a small example?

The simplest possible class file

Interface I in Java source code

• /**

* @author [email protected] (Christoph Csallner)

*/

public interface I {

}

87

Interface I in Java bytecode (hex)

• cafebabe00000032000707000201000149070004010

0106a6176612f6c616e672f4f626a65637401000a536

f7572636546696c65010006492e6a61766106010001

000300000000000000010005000000020006

88

Interface I in Java bytecode (hex)

• Cafebabe 0000 0032 // Header

• 0007 // Constant Pool

– 07 0002 // cp[1]

– 01 0001 49 // cp[2]

– 07 0004 // …

– 01 0010 6a6176612f6c616e672f4f626a656374

– 01 000a 536f7572636546696c65

– 01 0006 492e6a617661 // cp[6]

• 0601 // Access Rights

• 0001 0003 000000000000 0001 0005000000020006

89

Header

• Ca fe ba be // magic

• 00 00 // minor_version: 0

• 00 32 // major_version: 50

90

Constant Pool

Count and first two items

• 00 07 // constant_pool_count: 7

• 07 // cp[1]: First cp item, index 1

// cp_info tag 7 = Class

– 00 02 // CONSTANT_Class name_index: 2

• 01 // cp[2], cp_info tag: 1 = Utf8

– 00 01 // CONSTANT_Utf8 length: 1 byte

– 49 // only byte of Utf8 String value: “I”

91

Rest of Constant Pool

• 07 0004 // cp[3] = Class, points to item 4

• 01 0010 // cp[4] = Utf8, 16 bytes

– 6a 61 76 61 2f 6c 61 6e 67 2f 4f 62 6a 65 63 74

// “java/lang/Object”

• 01 000a // cp[5] = Utf8, 10 bytes

– 53 6f 75 72 63 65 46 69 6c 65 // “SourceFile”

• 01 0006 // cp[6] = Utf8, 6 bytes

– 49 2e 6a 61 76 61 // “I.java”

92

Access rights, name, subclass relation,

interfaces, fields, methods

• 06 01 // Access flags

– 0x0001 = Public

– 0x0200 = Interface

– 0x0400 = Abstract

• 00 01 // this class: cp[1]

• 00 03 // super class: cp[3]

• 00 00 // interfaces count: zero

• 00 00 // fields count: zero

• 00 00 // methods count: zero

93

Attributes

• 00 01 // attributes count: one

• 00 05 // attributes[1], index of name

// points to cp[5] = “SourceFile”

• 00 00 00 02 // length of attribute: 2 bytes

• 00 06 // index of source file

// points to cp[6] = “I.java”
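The manual walk through interface I can be repeated in code. The sketch below uses the hex bytes from these slides; TinyClassFileReader and its method names are made up. It handles only what this example needs: Utf8 and Class constants, no interfaces, fields, or methods, and the SourceFile attribute.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical reader, just large enough for the interface I example.
class TinyClassFileReader {
    static final String INTERFACE_I_HEX =
        "cafebabe" + "0000" + "0032"                    // magic, minor, major
      + "0007"                                          // constant_pool_count
      + "070002" + "01000149" + "070004"                // cp[1], cp[2], cp[3]
      + "010010" + "6a6176612f6c616e672f4f626a656374"   // cp[4] "java/lang/Object"
      + "01000a" + "536f7572636546696c65"               // cp[5] "SourceFile"
      + "010006" + "492e6a617661"                       // cp[6] "I.java"
      + "0601" + "0001" + "0003"                        // access_flags, this_class, super_class
      + "0000" + "0000" + "0000"                        // interfaces, fields, methods counts
      + "0001" + "0005" + "00000002" + "0006";          // one attribute: SourceFile

    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static int u2(ByteBuffer in) { return in.getShort() & 0xFFFF; }

    static String summarize(byte[] classFile) {
        ByteBuffer in = ByteBuffer.wrap(classFile);     // big-endian by default
        if (in.getInt() != 0xCAFEBABE) throw new IllegalArgumentException("bad magic number");
        int minor = u2(in);
        int major = u2(in);
        int cpCount = u2(in);
        String[] utf8 = new String[cpCount];            // Utf8 entries by cp index
        int[] nameIndex = new int[cpCount];             // name_index of Class entries
        for (int i = 1; i < cpCount; i++) {
            int tag = in.get() & 0xFF;
            if (tag == 1) {
                byte[] chars = new byte[u2(in)];
                in.get(chars);
                utf8[i] = new String(chars, StandardCharsets.UTF_8);
            } else if (tag == 7) {
                nameIndex[i] = u2(in);
            } else {
                throw new IllegalArgumentException("tag not handled in this sketch: " + tag);
            }
        }
        int accessFlags = u2(in);
        String thisName = utf8[nameIndex[u2(in)]];
        String superName = utf8[nameIndex[u2(in)]];
        if (u2(in) != 0 || u2(in) != 0 || u2(in) != 0) {
            throw new IllegalArgumentException("interfaces, fields, methods not handled in this sketch");
        }
        StringBuilder out = new StringBuilder();
        out.append("version ").append(major).append('.').append(minor)
           .append(", flags ").append(String.format("0x%04x", accessFlags))
           .append(", this ").append(thisName)
           .append(", super ").append(superName);
        int attributes = u2(in);
        for (int i = 0; i < attributes; i++) {
            String name = utf8[u2(in)];
            int length = in.getInt();
            if ("SourceFile".equals(name)) {
                out.append(", SourceFile ").append(utf8[u2(in)]);
            } else {
                in.position(in.position() + length);    // skip attributes this sketch does not know
            }
        }
        return out.toString();
    }
}
```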

94

How can I …?

Low-level tool support

Editor for binary files

• Frhed for MS Windows, open source

– http://frhed.sourceforge.net/

97

Class File Disassembler

• Part of JDK, usage:

javap -verbose <ClassName>

98