Mike Ash devoted a post on his blog to the practical implications of the switch to a 64-bit architecture in the iPhone 5s. This article draws on his findings.

This text exists mainly because of the large amount of misinformation being spread about what the new iPhone 5s and its 64-bit ARM processor actually mean for users and the market. Here we try to give objective information about the performance, the capabilities, and the implications of this transition for developers.

"64-bit"

There are two parts of a processor that the "X-bit" label can refer to - the width of the integer registers and the width of the pointers. Fortunately, on most modern processors these widths are the same, so in the case of the A7 this means 64-bit integer registers and 64-bit pointers.

However, it is equally important to point out what "64-bit" does NOT mean:

  • RAM physical address size. The number of bits used to communicate with RAM (and thus the amount of RAM a device can support) is not tied to the CPU's bit width. ARM processors have physical addresses anywhere between 26 and 40 bits wide, and this width can be changed independently of the rest of the system.
  • Data bus size. The amount of data fetched from RAM or cache is similarly independent. Individual processor instructions may request different amounts of data, but the hardware can either split a request into smaller chunks or fetch more than is needed. It depends on the size of the memory transfer unit. The iPhone 5, with its 32-bit processor, already fetches data from memory in 64-bit chunks, and chunk sizes of up to 192 bits can be encountered.
  • Anything related to floating point. The size of the floating-point (FPU) registers is again independent of the integer side of the processor. ARM processors had 64-bit FPU registers well before ARM64 (the 64-bit ARM architecture).
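To make the first point concrete, here is a small arithmetic sketch of how much memory a given physical address width can reach. The function names are made up for this illustration; only the 26-bit and 40-bit figures come from the text above, and 32 is added for comparison.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


def human(n: int) -> str:
    """Format a byte count using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


# The physical address width, not the register width, bounds the RAM.
for bits in (26, 32, 40):
    print(bits, "address bits ->", human(addressable_bytes(bits)))
```

A processor with 32-bit registers and 40-bit physical addresses can therefore be attached to far more than 4 GB of RAM, even though a single program cannot see all of it at once.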

General advantages and disadvantages

If we compare otherwise identical 32-bit and 64-bit architectures, they are generally not that different. This is one reason for the public's confusion about why Apple is moving to 64-bit in mobile devices as well. The benefits come from the specific properties of the A7 (ARM64) processor and from how Apple uses it, not just from the fact that the processor has a 64-bit architecture.

Still, if we look closely at the two architectures, we find several differences. The obvious one is that 64-bit integer registers can handle 64-bit integers more efficiently. Such values could be used on 32-bit processors too, but that usually meant splitting them into 32-bit pieces, which made the calculations slower. A 64-bit processor can generally compute with 64-bit types just as fast as with 32-bit ones, so applications that make heavy use of 64-bit types can run much faster on a 64-bit processor.
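The splitting into 32-bit pieces can be sketched as follows. This is an illustration of the idea only, written in Python rather than machine code; the function name is hypothetical. A single 64-bit addition becomes two 32-bit additions plus carry handling.

```python
MASK32 = 0xFFFFFFFF


def add64_with_32bit_ops(a: int, b: int) -> int:
    """Add two unsigned 64-bit values using only 32-bit-wide pieces."""
    # Split each operand into a low and a high 32-bit half.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # First addition: the low halves. Bit 32 of the result is the carry.
    lo = a_lo + b_lo
    carry = lo >> 32
    lo &= MASK32

    # Second addition: the high halves plus the carry, wrapped to 32 bits.
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo


print(hex(add64_with_32bit_ops(0xFFFFFFFF, 1)))
```

On a 64-bit processor the same result comes out of one add instruction on one pair of registers.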

Although 64-bit does not affect the total amount of RAM the processor can use, it makes it easier to work with large amounts of RAM within one program. Any single program running on a 32-bit processor has only 4 GB of address space. Since the operating system and standard libraries take up part of it, the program is left with somewhere between 1 and 3 GB for its own use. If a 32-bit system has more than 4 GB of RAM, using that memory is more complicated. We have to ask the operating system to map pieces of the larger memory in and out of the program's address space on request, or we can split the program into multiple processes (each of which again has, in theory, 4 GB of directly addressable memory).

However, these "hacks" are so difficult and slow that very few applications use them. In practice, on a 32-bit processor each program uses only its 1 to 3 GB of memory, and any additional RAM serves to run more programs at the same time or acts as a cache. These uses are practical, but we would like any program to be able to use more than 4 GB of memory easily.

Now we come to the frequent (and incorrect) claim that a 64-bit architecture is useless without more than 4 GB of memory. A larger address space is useful even on a system with less memory. Memory-mapped files are a handy tool in which part of a file's contents is logically linked to the process's memory without the entire file having to be loaded. The system can thus, for example, gradually process files many times larger than the RAM capacity. On a 32-bit system such large files cannot be reliably memory-mapped, whereas on a 64-bit system it is a piece of cake thanks to the much larger address space.

However, the larger pointers also bring one big disadvantage: otherwise identical programs need more memory on a 64-bit processor, because the larger pointers have to be stored somewhere. Since pointers are everywhere in programs, this difference can put pressure on the cache, which in turn makes the entire system run slower. In other words, if we changed nothing but the processor architecture, the move to 64-bit would actually slow the whole system down. This cost has to be offset by optimizations elsewhere.
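As a rough illustration of memory mapping, the sketch below counts a byte value in a file through a read-only mapping, using Python's standard mmap module. The helper name and the tiny demo file are assumptions made for this example; a real use case would point at a file far larger than RAM.

```python
import mmap
import os
import tempfile


def count_byte_mmap(path: str, needle: int) -> int:
    """Count occurrences of one byte value in a file via a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file into the address space.
        # Pages are read from disk only when they are touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = 0
            chunk = 1 << 20  # walk the mapping 1 MiB at a time
            for start in range(0, len(m), chunk):
                # Slicing copies just this one chunk into a bytes object.
                count += m[start:start + chunk].count(needle)
            return count


# Demo on a small temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc\n" * 1000)
    demo_path = tmp.name

newline_count = count_byte_mmap(demo_path, ord("\n"))
os.remove(demo_path)
print(newline_count)
```

The mapping needs a contiguous range of addresses as large as the file, which is exactly what a 32-bit address space cannot offer for multi-gigabyte files.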
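The memory cost of wider pointers can be estimated with a small sketch. It models a hypothetical doubly linked list node (one 32-bit payload plus two pointers) using Python's struct module, with unsigned integers standing in for pointers. Alignment padding is deliberately ignored here; a real compiler would typically pad the 64-bit node further.

```python
import struct

# "<" selects standard sizes with no alignment padding.
# "i" is a 4-byte integer payload; "I" (4 bytes) and "Q" (8 bytes)
# stand in for 32-bit and 64-bit pointers respectively.
NODE_32 = struct.calcsize("<iII")  # payload + next + prev, 32-bit pointers
NODE_64 = struct.calcsize("<iQQ")  # payload + next + prev, 64-bit pointers

print("32-bit node:", NODE_32, "bytes")
print("64-bit node:", NODE_64, "bytes")
print("growth factor:", round(NODE_64 / NODE_32, 2))
```

The payload is identical in both cases, yet fewer of the larger nodes fit into the same cache.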

ARM64

The A7, the 64-bit processor powering the new iPhone 5s, isn't just a regular ARM processor with wider registers. ARM64 contains major improvements over the older, 32-bit version.

Apple A7 processor.

Registers

ARM64 has twice as many integer registers as 32-bit ARM (be careful not to confuse the number of registers with their width, which we covered in the "64-bit" section: ARM64 registers are both twice as wide and twice as numerous). 32-bit ARM has 16 integer registers: a program counter (PC, which holds the address of the current instruction), a stack pointer (which points to the top of the current stack), a link register (which holds the address to return to when the function ends), and the remaining 13 are for general use. ARM64 has 32 integer registers, including a zero register, a link register, a frame pointer (similar to a stack pointer), and one that is reserved. This leaves 28 registers for general use, more than double what 32-bit ARM offers. ARM64 also doubles the number of 128-bit floating-point (FPU) registers from 16 to 32.

But why is the number of registers so important? Memory is generally much slower than the CPU's calculations, and reading or writing can take a very long time. A fast processor would have to keep waiting for memory, and we would hit a natural speed limit of the system. Processors try to hide this handicap with layers of caches, but even the fastest one (L1) is still slower than the processor's calculations. Registers, however, are memory cells located directly in the processor, and reading or writing them is fast enough not to slow the processor down. The number of registers is in practice the amount of the fastest memory available for calculations, which greatly affects the speed of the entire system.

At the same time, this speed depends on good optimization support in the compiler, so that compiled code actually keeps values in these registers and does not have to store everything in ordinary (slow) memory.

Instruction set

ARM64 also brings major changes to the instruction set. An instruction set is the set of atomic operations that a processor can perform (e.g. 'ADD register1 register2' adds the numbers in two registers). The functions available in individual languages are composed of these instructions. More complex functions must execute more instructions, so they can be slower.

New in ARM64 are instructions for AES encryption and for the SHA-1 and SHA-256 hash functions. Instead of a complex software implementation, a program simply issues these instructions, which brings a huge speedup to such computations and, hopefully, added security in applications. For example, the new Touch ID also uses these instructions for encryption, allowing for real speed and security (in theory, an attacker would have to modify the processor itself to access the data, which is impractical to say the least given its miniature size).

Compatibility with 32-bit

It is important to mention that the A7 can run fully in 32-bit mode without any need for emulation. This means that the new iPhone 5s can run applications compiled for 32-bit ARM without any slowdown. Such applications cannot use the new ARM64 features, however, so it is always worthwhile to make a separate build just for the A7, which should run much faster.

Runtime changes

The runtime is the code that supports the programming language while the application is running, that is, after compilation. Since Apple does not need to maintain binary compatibility here (a 64-bit binary never has to run on 32-bit), it could afford to make a few more improvements to the Objective-C runtime.

One of them is the so-called tagged pointer. Normally, objects and the pointers to those objects are stored in separate parts of memory. Tagged pointers, however, allow classes with little data to store the object directly in the pointer. This eliminates the need to allocate memory for the object: the pointer itself holds the object's data. Tagged pointers are supported only on the 64-bit architecture, partly because a 32-bit pointer does not leave enough room to store a useful amount of data. That is why iOS, unlike OS X, did not support this feature until now. With the arrival of ARM64 this changes, and iOS catches up with OS X in this regard as well.
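The idea of a tagged pointer can be sketched with plain integers standing in for 64-bit pointer words. The layout below (lowest bit as the tag, remaining 63 bits as the payload) is an assumption chosen for simplicity and is not Apple's actual encoding. It relies on real object addresses being aligned, so their lowest bit is always 0.

```python
TAG_BIT = 1              # hypothetical: lowest bit set means "data, not an address"
PAYLOAD_BITS = 63        # the rest of the 64-bit word carries the value


def make_tagged_int(value: int) -> int:
    """Pack a small non-negative integer directly into a 64-bit pointer word."""
    if not 0 <= value < (1 << PAYLOAD_BITS):
        raise ValueError("value does not fit in the payload bits")
    return (value << 1) | TAG_BIT


def is_tagged(word: int) -> bool:
    """True if the word holds inline data instead of a heap address."""
    return (word & TAG_BIT) == 1


def tagged_value(word: int) -> int:
    """Recover the inline value; no memory access is needed."""
    if not is_tagged(word):
        raise ValueError("word is an ordinary pointer")
    return word >> 1


word = make_tagged_int(42)
print(hex(word), is_tagged(word), tagged_value(word))
```

Creating such a "number object" costs no allocation at all, and reading it back is a shift instead of a memory load.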

Although pointers are 64 bits long, on ARM64 only 33 bits are used for the address itself. If we can reliably mask off the remaining bits, we can use that space to store additional data, as with the tagged pointers mentioned above. Conceptually, this is one of the biggest changes in the history of Objective-C, although it is not a marketable feature, so most users will never know how Apple is moving Objective-C forward.

As for the useful data that can be stored in this spare space, Objective-C now uses it, for example, to store the reference count. Previously, the reference count was stored elsewhere in memory, in a hash table set up for the purpose, and this could slow the whole system down when there were many alloc/dealloc/retain/release calls. The table had to be locked for thread safety, so the reference counts of two objects could not be changed from two threads at the same time. Now the count lives in the spare bits of each object's isa pointer (the pointer to the object's class). This is another inconspicuous but huge advantage and a source of speedups in the future, and it could never be achieved on a 32-bit architecture.

Other information is now stored in the spare bits of the isa pointer as well: whether the object has associated objects, whether it is weakly referenced, whether a destructor has to be run for it, and so on. Thanks to this information, the Objective-C runtime can take substantial shortcuts, which is reflected in the speed of every application. According to testing, this amounts to a speedup of about 40-50% for all memory management calls, just from switching to 64-bit pointers and using the newly available space.
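The packing of a class address, a few flags and a reference count into one 64-bit word can be sketched as below. Only the 33-bit address width comes from the text; the positions of the flags and of the counter are hypothetical and chosen for readability, not taken from the actual Objective-C runtime.

```python
CLASS_BITS = 33                       # address width given in the text
CLASS_MASK = (1 << CLASS_BITS) - 1

# Hypothetical flag positions directly above the class address.
HAS_ASSOC = 1 << 33                   # object has associated objects
WEAKLY_REFERENCED = 1 << 34           # object is weakly referenced
HAS_DESTRUCTOR = 1 << 35              # a destructor must run on dealloc

RC_SHIFT = 36                         # hypothetical: counter in the top 28 bits
RC_ONE = 1 << RC_SHIFT
RC_MAX = (1 << (64 - RC_SHIFT)) - 1


def make_isa(class_address: int, flags: int = 0) -> int:
    """Combine a class address and flag bits into one 64-bit word."""
    if class_address & ~CLASS_MASK:
        raise ValueError("class address needs more than 33 bits")
    return class_address | flags


def class_pointer(isa: int) -> int:
    """Mask off everything except the class address."""
    return isa & CLASS_MASK


def retain_count(isa: int) -> int:
    return isa >> RC_SHIFT


def retain(isa: int) -> int:
    """Increment the inline counter; no side table and no lock involved."""
    if retain_count(isa) == RC_MAX:
        raise OverflowError("inline counter is full")
    return isa + RC_ONE


isa = make_isa(0x100004000, HAS_ASSOC)
isa = retain(retain(isa))
print(hex(class_pointer(isa)), retain_count(isa), bool(isa & HAS_ASSOC))
```

In a real runtime the increment would have to be an atomic operation on the word, but it touches only the object itself, so two threads working on different objects never contend for a shared table.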

Conclusion

Although competitors will try to spread the idea that moving to a 64-bit architecture is unnecessary, you now know that this is just a very uninformed opinion. It is true that switching to 64-bit without adapting the language or the applications does not bring much by itself, and it can even slow the entire system down. But the new A7 uses the modern ARM64 architecture with a new instruction set, and Apple has taken the trouble to modernize the Objective-C runtime and take advantage of the new capabilities, hence the promised speedup.

Here we have mentioned a large number of reasons why a 64-bit architecture is the right step forward. It is another revolution "under the hood", thanks to which Apple will try to stay at the forefront not only with design, user interface and rich ecosystem, but mainly with the most modern technologies on the market.

Source: mikeash.com