64 bits for developers


By Roman Okolovich

Introduction

x86-64 is a superset of the x86 instruction set architecture. x86-64 processors can run existing 32-bit or 16-bit x86 programs at full speed, but also support new programs written with a 64-bit address space and other additional capabilities.

The x86-64 specification was designed by Advanced Micro Devices (AMD), who have since renamed it AMD64. This was the first time any company other than Intel made significant additions to the IA-32 architecture.

x86-64 is backwards compatible with 32-bit code without any performance loss.

x64 architectural features

64-bit integer capability: All general-purpose registers (GPRs) are expanded from 32 bits to 64 bits, and all arithmetic and logical operations, memory-to-register and register-to-memory operations, etc. can now operate directly on 64-bit integers. Pushes and pops on the stack are always in 8-byte strides, and pointers are 8 bytes wide.

Additional registers: In addition to increasing the size of the general-purpose registers, the number of named general-purpose registers is increased from eight (eax, ebx, ecx, edx, ebp, esp, esi, edi) in x86-32 to 16: the original eight are widened to rax, rbx, rcx, rdx, rbp, rsp, rsi, rdi, and eight new registers, r8 through r15, are added. It is therefore possible to keep more local variables in registers rather than on the stack, and to let registers hold frequently accessed constants; arguments for small and fast subroutines may also be passed in registers to a greater extent. However, AMD64 still has fewer registers than many common RISC processors (which typically have 32–64 registers) or VLIW-like machines such as the IA-64 (which has 128 registers).

Additional XMM (SSE) registers: Similarly, the number of 128-bit XMM registers (used for Streaming SIMD instructions) is also increased from 8 to 16.

Larger virtual address space: Current processor models implementing the AMD64 architecture can address up to 256 TB (281,474,976,710,656 bytes)[4] of virtual address space. The architecture allows future implementations to raise this limit to 16 EB (18,446,744,073,709,551,616 bytes), compared with just 4 GB (4,294,967,296 bytes) for 32-bit x86. This means that very large files can be operated on by mapping the entire file into the process's address space (which is sometimes faster than working with file read/write calls), rather than having to map regions of the file into and out of the address space.
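Mapping a whole file at once, as described above, can be sketched with the POSIX mmap call. This is a minimal sketch that assumes a POSIX system (Linux or macOS); on Windows the corresponding APIs are CreateFileMapping and MapViewOfFile. The function name count_byte_in_file is hypothetical.

```cpp
#include <cstddef>
#include <fcntl.h>     // open (POSIX)
#include <sys/mman.h>  // mmap, munmap (POSIX)
#include <sys/stat.h>  // fstat (POSIX)
#include <unistd.h>    // close (POSIX)

// Hypothetical helper: counts occurrences of `target` in the file at `path`
// by mapping the entire file into the address space in one call, instead of
// reading it chunk by chunk. Returns -1 on error.
long long count_byte_in_file(const char* path, unsigned char target) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  // mmap rejects length 0

    std::size_t len = static_cast<std::size_t>(st.st_size);
    void* base = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return -1;

    const unsigned char* bytes = static_cast<const unsigned char*>(base);
    long long count = 0;
    for (std::size_t i = 0; i < len; ++i)
        if (bytes[i] == target) ++count;

    munmap(base, len);
    return count;
}
```

With a 64-bit address space, len can exceed 4 GB without the caller having to map and unmap windows of the file.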

Larger physical address space: Current implementations of the AMD64 architecture can address up to 1 TB (1,099,511,627,776 bytes) of RAM; the architecture permits extending this to 4 PB (4,503,599,627,370,496 bytes) in the future (limited by the page table entry format). In legacy mode, Physical Address Extension (PAE) is included, as it is on most current 32-bit x86 processors, allowing access to a maximum of 64 GB (68,719,476,736 bytes).

Instruction pointer relative data access: Instructions can now reference data relative to the instruction pointer (RIP register). This makes position independent code, as is often used in shared libraries and code loaded at run time, more efficient.

SSE instructions: The original AMD64 architecture adopted Intel's SSE and SSE2 as core instructions. SSE3 instructions were added in April 2005. SSE2 replaces the x87 instruction set's IEEE 80-bit precision with the choice of either IEEE 32-bit or 64-bit floating-point mathematics. This provides floating-point operations compatible with many other modern CPUs. The SSE and SSE2 instructions have also been extended to operate on the eight new XMM registers. SSE and SSE2 are available in 32-bit mode in modern x86 processors; however, if they're used in 32-bit programs, those programs will only work on systems with processors that have the feature. This is not an issue in 64-bit programs, as all AMD64 processors have SSE and SSE2, so using SSE and SSE2 instructions instead of x87 instructions does not reduce the set of machines on which x64 programs can be run. SSE and SSE2 are generally faster than, and duplicate most of the features of, the traditional x87 instructions, MMX, and 3DNow!.
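As an illustration of relying on SSE2 in x64 code, the following sketch adds two arrays of doubles two elements at a time in 128-bit XMM registers, using the SSE2 intrinsics from <emmintrin.h>. It assumes a compiler that defines __SSE2__ (GCC, Clang) or _M_X64/_M_AMD64 (MSVC) on x64 targets and falls back to scalar code elsewhere; add_doubles is a hypothetical name. x64 compilers already emit SSE2 instructions for ordinary double arithmetic, so intrinsics like these only matter for explicit vectorization.

```cpp
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>  // SSE2 intrinsics; always available on x64
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
void add_doubles(const double* a, const double* b, double* out, std::size_t n) {
    std::size_t i = 0;
#if HAVE_SSE2
    // Each 128-bit XMM register holds two 64-bit IEEE doubles.
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);  // unaligned load of a[i], a[i+1]
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    // Scalar tail for an odd element (or the whole array without SSE2).
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
```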

No-Execute bit: The "NX" bit (bit 63 of the page table entry) allows the operating system to specify which pages of virtual address space can contain executable code and which cannot. An attempt to execute code from a page tagged "no execute" will result in a memory access violation, similar to an attempt to write to a read-only page. This should make it more difficult for malicious code to take control of the system via "buffer overrun" or "unchecked buffer" attacks. A similar feature has been available on x86 processors since the 80286 as an attribute of segment descriptors; however, this works only on an entire segment at a time. Segmented addressing has long been considered an obsolete mode of operation, and all current PC operating systems in effect bypass it, setting all segments to a base address of 0 and a size of 4 GB (4,294,967,296 bytes). AMD was the first x86-family vendor to implement no-execute in linear addressing mode. The feature is also available in legacy mode on AMD64 processors, and recent Intel x86 processors, when PAE is used.

Removal of older features: A number of "system programming" features of the x86 architecture are not used in modern operating systems and are not available on AMD64 in long (64-bit and compatibility) mode. These include segmented addressing (although the FS and GS segments are retained in vestigial form for use as extra base pointers to operating system structures)[5], the task state switch mechanism, and Virtual 8086 mode. These features do of course remain fully implemented in "legacy mode," thus permitting these processors to run 32-bit and 16-bit operating systems without modification.

Virtual address space details

Although virtual addresses are 64 bits wide in 64-bit mode, current implementations (and any chips known to be in the planning stages) do not allow the entire virtual address space of 16 EB (18,446,744,073,709,551,616 bytes) to be used. Most operating systems and applications will not need such a large address space for the foreseeable future (for example, Windows implementations for AMD64 are only populating 16 TB (17,592,186,044,416 bytes), or 44 bits' worth).

AMD therefore decided that, in the first implementations of the architecture, only the least significant 48 bits of a virtual address would actually be used in address translation (page table lookup).

[Figure: canonical address space layouts for the current 48-bit implementation, a 56-bit implementation, and a 64-bit implementation. Source: http://en.wikipedia.org/wiki/File:AMD64-canonical--48-bit.svg]
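Because only 48 bits are translated, AMD64 requires bits 48 through 63 of a virtual address to be copies of bit 47; such addresses are called canonical, and using a non-canonical address causes a fault. The check can be sketched as follows; is_canonical48 is a hypothetical helper that assumes the current 48-bit implementation, not a wider one.

```cpp
#include <cstdint>

// Hypothetical helper for the 48-bit implementation: an address is canonical
// when bits 63..47 (17 bits) are all zeros or all ones, i.e. bits 48..63 are
// a sign extension of bit 47.
constexpr bool is_canonical48(std::uint64_t addr) {
    return (addr >> 47) == 0 || (addr >> 47) == 0x1FFFF;
}

// Top of the lower half (user space on Windows and Linux).
static_assert(is_canonical48(0x00007FFFFFFFFFFFULL), "lower half");
// Bottom of the upper half (kernel space on Windows and Linux).
static_assert(is_canonical48(0xFFFF800000000000ULL), "upper half");
// Bit 47 set but bits 48..63 clear: inside the non-canonical gap.
static_assert(!is_canonical48(0x0000800000000000ULL), "non-canonical");
```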



Advantages of using x64

• Compiling code for 64-bit can increase performance: the expected gain from an ordinary recompilation is 5-15%.
• Adobe claims that the new 64-bit Photoshop CS4 is 12% faster than its 32-bit version.
• Some programs that process large data arrays may run faster thanks to the larger address space.
• Using ptrdiff_t, size_t, and derived types can speed up program code by up to 30%.

Support x64 in Visual Studio (1 of 2)

/Wp64: Detects 64-bit portability problems on types that are also marked with the __w64 keyword.

The /Wp64 compiler option and __w64 keyword are deprecated and will be removed in a future version of the compiler. Instead of using this option and keyword to detect 64-bit portability issues, use a Visual C++ compiler that targets a 64-bit platform. For more information, see 64-Bit Programming with Visual C++.

Support x64 in Visual Studio (2 of 2)

How to: Configure Visual C++ Projects to Target 64-Bit Platforms

1. Click Configuration Manager to open the Configuration Manager dialog box.
2. Click the Active Solution Platform list, and then select the <New…> option to open the New Solution Platform dialog box.
3. Click the Type or select the new platform drop-down arrow, and then select a 64-bit platform.
4. Click OK. The platform you selected in the preceding step will appear under Active Solution Platform in the Configuration Manager dialog box.
5. Click Close in the Configuration Manager dialog box, and then click OK in the <Projectname> Property Pages dialog box.


Data model (sizes in bits)

Data type           LP32 ILP32 ILP64 LLP64  LP64
char                   8     8     8     8     8
short                 16    16    16    16    16
int32                  -     -    32     -     -
int                   16    32    64    32    32
long                  32    32    64    32    64
long long (int64)      -     -     -    64     -
pointer               32    32    64    64    64

• ISO/IEC 9899:1990, Programming Languages - C (ISO C), left the sizes of short int, int, long int, and pointers open, to avoid constraining hardware architectures that might benefit from defining these data types independently of one another.

• Notation: int (I), long (L), and pointer (P).

• ILP32 is used in Win32.

• LLP64 is used in Win64.

• LP64 is used in 64-bit UNIX systems.

• The relationship between the fundamental integer types can be expressed as:

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)

Note that sizeof(long) = sizeof(size_t) holds in ILP32 and LP64 but not in LLP64 (Win64), where long is 32 bits and size_t is 64 bits.
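The data-model table above can be turned into a compile-time check. This sketch classifies the target by the sizes of int, long, and pointers, assuming only the five models listed; DataModel and data_model are hypothetical names, and C++14 is assumed. A typical 64-bit Windows compiler yields LLP64, and 64-bit Linux or macOS compilers yield LP64.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical names: the five data models from the table.
enum class DataModel { LP32, ILP32, ILP64, LLP64, LP64, Unknown };

// Classifies the target from the sizes (in bits) of int, long, and pointers.
constexpr DataModel data_model() {  // C++14: multiple statements allowed
    const std::size_t i = sizeof(int) * CHAR_BIT;
    const std::size_t l = sizeof(long) * CHAR_BIT;
    const std::size_t p = sizeof(void*) * CHAR_BIT;

    if (i == 16 && l == 32 && p == 32) return DataModel::LP32;
    if (i == 32 && l == 32 && p == 32) return DataModel::ILP32;  // Win32
    if (i == 64 && l == 64 && p == 64) return DataModel::ILP64;
    if (i == 32 && l == 32 && p == 64) return DataModel::LLP64;  // Win64
    if (i == 32 && l == 64 && p == 64) return DataModel::LP64;   // 64-bit UNIX
    return DataModel::Unknown;
}

// Example compile-time guard: stop the build on an unexpected data model.
static_assert(data_model() != DataModel::Unknown, "unsupported data model");
```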

General issues relating to x64 (1 of 4)

An int and a long are 32-bit values on 64-bit Windows operating systems.

You should not assign pointers to 32-bit variables. Pointers are 64-bit on 64-bit platforms, and you will truncate the pointer value if you assign it to a 32-bit variable.

Common assumptions about the relationships between the fundamental data types, such as sizeof(int) = sizeof(long) = sizeof(pointer), may no longer be valid in a 64-bit data model.
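When a pointer really must be stored in an integer, the integer type should be pointer-sized. A minimal sketch using std::uintptr_t from <cstdint> (optional in the standard, but provided by mainstream 32-bit and 64-bit compilers); the helper names are hypothetical. On Windows, the SDK's UINT_PTR and INT_PTR types serve the same purpose.

```cpp
#include <cstdint>

// Hypothetical helpers. std::uintptr_t is pointer-sized (32 bits on Win32,
// 64 bits on Win64 and 64-bit UNIX), so the round trip preserves the address.
std::uintptr_t to_handle(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

const void* from_handle(std::uintptr_t h) {
    return reinterpret_cast<const void*>(h);
}

// The mistake described above: on a 64-bit target only the low 32 bits of
// the address survive, so converting back may yield a different pointer.
std::uint32_t truncated_handle(const void* p) {
    return static_cast<std::uint32_t>(reinterpret_cast<std::uintptr_t>(p));
}
```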

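General issues relating to x64 (2 of 4)

size_t and ptrdiff_t (used throughout the STL) and time_t are 64-bit values on 64-bit Windows operating systems.

    size_t n = bigValue;
    for (unsigned i = 0; i != n; ++i)
    { ... }

This is fine only if bigValue <= UINT_MAX. Otherwise i wraps around to 0 before it ever reaches n, and the loop never terminates.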

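    size_t a = bigValue;
    // Each of these equivalent casts converts a 64-bit size_t to a 32-bit int:
    int b1 = (int)a;
    int b2 = (int)(a);
    int b3 = int(a);
    int b4 = static_cast<int>(a);

The result value will be truncated if a exceeds INT_MAX.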
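Where a size_t value genuinely has to be passed to an API that takes int, the conversion can be checked instead of silently truncated as the casts above do. A minimal sketch; checked_to_int is a hypothetical helper (the Guidelines Support Library offers a similar gsl::narrow).

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: converts a size_t to int only if the value fits,
// and throws instead of silently dropping the high bits.
int checked_to_int(std::size_t value) {
    if (value > static_cast<std::size_t>(INT_MAX))
        throw std::out_of_range("size_t value does not fit in int");
    return static_cast<int>(value);
}
```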

    size_t n = bigValue;
    unsigned index = 0;
    for (size_t i = 0; i != n; ++i)
    {
        array[index++] = 10;  // index wraps around to 0 after UINT_MAX
    }

Don't mix up size_t and other fundamental data types: an unsigned int can hold at most UINT_MAX, but an array can contain more than UINT_MAX items. Use size_t for both the loop counter and the index.

General issues relating to x64 (3 of 4)

The %x (hex int format) printf specifier will not work as expected for 64-bit values on a 64-bit Windows operating system: it operates on only 32 bits of the value passed to it.

Use %I32x to display a 32-bit integer (for example, a pointer-sized value on 32-bit Windows) and %I64x to display a 64-bit integer (for example, a pointer-sized value on 64-bit Windows). These size prefixes are Microsoft-specific.

The %p (hex format for a pointer) specifier will work as expected on a 64-bit Windows operating system.
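A portable alternative to the Microsoft-specific prefixes is the standard C99/C++11 length modifiers: the PRIx64 macro from <cinttypes> for a 64-bit integer and %zx for a size_t (supported by Visual C++ from Visual Studio 2015 on). A minimal sketch that formats into a buffer; format_u64 and format_size are hypothetical names.

```cpp
#include <cinttypes>  // PRIx64; also brings in std::uint64_t
#include <cstddef>
#include <cstdio>

// Hypothetical helper: hex-formats a 64-bit integer on any platform.
int format_u64(char* buf, std::size_t cap, std::uint64_t v) {
    return std::snprintf(buf, cap, "%" PRIx64, v);
}

// Hypothetical helper: hex-formats a size_t (32 or 64 bits, per target).
int format_size(char* buf, std::size_t cap, std::size_t n) {
    return std::snprintf(buf, cap, "%zx", n);
}
```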

General issues relating to x64 (4 of 4): data alignment

The MyStruct2 structure is 12 bytes in a 32-bit program and only 16 bytes in a 64-bit program. Moreover, in terms of data access efficiency, the MyStruct1 and MyStruct2 structures are equivalent.

A common recommendation is to arrange structure members in descending order of their size.
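The MyStruct1 and MyStruct2 definitions are not reproduced in this text, so the following uses a hypothetical pair of structs (not the original definitions) to illustrate the descending-size recommendation. Exact sizes depend on the compiler and ABI.

```cpp
// Hypothetical struct: members in mixed order. Padding is inserted after `c`
// so that `p` is pointer-aligned, and again after `i` to round the size up.
struct MixedOrder {
    char  c;
    void* p;
    int   i;
};

// The same members in descending order of size: only tail padding remains.
struct DescendingOrder {
    void* p;
    int   i;
    char  c;
};
```

On a typical x64 compiler (8-byte, 8-byte-aligned pointers and a 4-byte int), MixedOrder occupies 24 bytes and DescendingOrder 16; on a typical 32-bit compiler both occupy 12.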

References

x86-64 (Wikipedia): http://en.wikipedia.org/wiki/X64

64-bit Programming (How Do I in Visual C++): http://msdn.microsoft.com/en-us/library/ms177550.aspx

64-Bit Programming with Visual C++: http://msdn.microsoft.com/en-us/library/h2k70f3s.aspx

FAQ for Development on 64-bit Windows: http://msdn.microsoft.com/en-us/isv/bb190527.aspx

Common Visual C++ 64-bit Migration Issues: http://msdn.microsoft.com/en-us/library/3b2e7499.aspx

Viva64, a tool for porting your applications to 64-bit platforms: http://www.viva64.com/

Optimization of 64-bit programs: http://www.codeproject.com/KB/winsdk/Optimization_64_bit.aspx


