This is a follow-on post to A look inside blocks: Episode 1, in which I looked into the innards of blocks and how the compiler sees them. In this article I take a look at blocks that are not constant and how they are formed on the stack.
In the first article we saw that the block had a class of _NSConcreteGlobalBlock. The block structure and descriptor were both fully initialised at compile time since all variables were known. There are a few different types of block, each with its own associated class. However, for simplicity’s sake, we just need to consider 3 of them:
_NSConcreteGlobalBlock is a block defined globally, fully complete at compile time. These are the blocks that don’t capture any scope, such as an empty block.
_NSConcreteStackBlock is a block located on the stack. This is where a block that captures scope starts out, before it is eventually copied onto the heap.
_NSConcreteMallocBlock is a block located on the heap. After a block is copied, this is where it ends up. Once here it is reference counted and freed when the reference count drops to zero.
A block that captures scope
This time we’re going to look at the following bit of code:
[Code listing not preserved: the source defining foo, runBlockA and doBlockA]
The function called foo is just there so that the block captures something, by having a function to call with a captured variable. Once again, we look at the armv7 assembly produced, relevant bits only:
[Assembly listing not preserved: runBlockA]
First of all, the runBlockA function is the same as before. It’s calling the invoke function of the block. Then onto doBlockA:
[Assembly listing not preserved: doBlockA as emitted by the compiler]
Well, this is very different to before. Instead of seeing a block get loaded from a global symbol, it looks like a lot more work is being done. It might look daunting, but it’s pretty easy to see what’s going on. It’s probably best to consider the function rearranged; believe me that this doesn’t alter anything functionally. The compiler has emitted the instructions in the order it has as an optimisation, to reduce pipeline bubbles and the like. So, rearranged, the function looks like this:
[Assembly listing not preserved: doBlockA, rearranged]
This is what that is doing:
r7 is pushed onto the stack because it’s going to get overwritten and is a register which must be preserved across function calls. lr is the link register and contains the address of the next instruction to execute when this function returns. See the function epilogue for more on that. Also, the stack pointer is saved into r7.
Subtract 24 from the stack pointer. This makes room for 24 bytes of data to be stored in stack space.
This little block of code looks up the L__NSConcreteStackBlock$non_lazy_ptr symbol relative to the program counter, so that it works wherever the code may end up in the binary when finally linked. The value is then stored to the address of the stack pointer.
1073741824 is stored to the stack pointer + 4.
0 is stored to the stack pointer + 8. By now it may be becoming clear what’s going on. A Block_layout structure is being created on the stack! Up until now there’s the isa pointer, the flags and the reserved values being set.
The address of ___doBlockA_block_invoke_0 is stored at the stack pointer + 12. This is the invoke member of the block structure.
The address of ___block_descriptor_tmp is stored at the stack pointer + 16. This is the descriptor member of the block structure.
128 is stored at the stack pointer + 20. Ah. If you look back at the Block_layout struct you’ll see that there are only 5 values in it. So what is being stored after the end of the struct? Well, you’ll notice that the value is 128, which is the value of the variable captured in the block. So this must be where blocks store values that they use: after the end of the Block_layout struct.
The stack pointer, which now points to a fully initialised block structure, is put into r0 and runBlockA is called. (Remember that r0 contains the first argument to a function in the ARM EABI.)
Finally the stack pointer has 24 added back to it to balance out the subtraction at the start of the function. Then 2 values are popped off the stack into r7 and pc respectively. The r7 balances the push from the prologue, and the pc will now get the value that was in lr when the function began. This effectively performs the return of the function, as it sets the CPU to continue executing (the pc, program counter) from where the function was told to return to (lr, the link register).
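To make those steps concrete, here is a minimal sketch in plain C of what the assembly appears to be doing. It does not use the blocks extension at all; it only models the structure by hand. Every name ending in _model, the stand-in class symbol and the recording version of foo are assumptions made for illustration, and the offsets in the comments are the armv7 ones from the walkthrough (on a 64-bit host the real offsets differ).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the article's foo(): it records its argument. */
int last_foo_arg = -1;
void foo(int value) { last_foo_arg = value; }

/* Model of the block descriptor: a reserved word and the block's size. */
struct BlockDescriptorModel {
    unsigned long reserved;
    unsigned long size;
};

/* Model of Block_layout followed by the single captured int. */
struct BlockWithIntModel {
    void *isa;                                  /* armv7: sp + 0  */
    int32_t flags;                              /* armv7: sp + 4  */
    int32_t reserved;                           /* armv7: sp + 8  */
    void (*invoke)(struct BlockWithIntModel *); /* armv7: sp + 12 */
    struct BlockDescriptorModel *descriptor;    /* armv7: sp + 16 */
    int captured;                               /* armv7: sp + 20 */
};

/* The flags value seen in the assembly is a single bit, 1 << 30. The Clang
   block ABI documentation names that bit BLOCK_HAS_SIGNATURE. */
static_assert((1 << 30) == 1073741824, "flags value from the assembly");

/* Placeholder for the class symbol; the real one lives in the blocks runtime. */
int stack_block_class_model;

/* Plays the role of the invoke function: read the capture, call foo. */
void block_invoke_model(struct BlockWithIntModel *block) {
    foo(block->captured);
}

/* Plays the role of runBlockA: call through the invoke pointer, passing the
   block itself as the first argument. */
void run_block_model(struct BlockWithIntModel *block) {
    block->invoke(block);
}

struct BlockDescriptorModel descriptor_model = {
    0, sizeof(struct BlockWithIntModel)
};

/* Plays the role of doBlockA: build the block in this stack frame, field by
   field, then hand its address to run_block_model. Returns the flags so the
   caller can inspect them. */
int32_t do_block_model(void) {
    struct BlockWithIntModel block;
    block.isa = &stack_block_class_model;
    block.flags = 1 << 30;
    block.reserved = 0;
    block.invoke = block_invoke_model;
    block.descriptor = &descriptor_model;
    block.captured = 128;
    run_block_model(&block);
    return block.flags;
}
```

Calling do_block_model() should end with foo having been handed 128, which is the same data flow the assembly shows: the value is written just past the five standard members and read back through the block pointer.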
Wow! You still with me? Brilliant!
The final bit of this little section is to check what the invoke function and the descriptor look like. We would expect them to be not much different to the global block from episode 1. Here they are:
[Assembly listing not preserved: the invoke function and the block descriptor]
And yep, there’s not much difference really. The only difference is the size parameter of the block descriptor. It’s now 24 rather than 20. This is because there’s an integer value captured by the block, so the block structure is 24 bytes rather than the standard 20. We saw the extra bytes being added to the end of the structure when it was created.
Also, in the actual block function, i.e. __doBlockA_block_invoke_0, you can see the value being read out of the end of the block structure, i.e. from r0 + 20. This is the variable captured in the block.
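The 20 versus 24 arithmetic, and the read from r0 + 20, can be checked with a small model. This is only a sketch: to reproduce the armv7 numbers on any host, every pointer-sized member is modelled as a 4-byte integer, and the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The five standard members, each modelled as one 4-byte word as on armv7. */
struct BlockLayout32 {
    uint32_t isa;
    int32_t  flags;
    int32_t  reserved;
    uint32_t invoke;
    uint32_t descriptor;
};

/* The same header followed by one captured int. */
struct BlockWithInt32 {
    struct BlockLayout32 header;
    int32_t captured;
};

static_assert(sizeof(struct BlockLayout32) == 20, "a plain block is 20 bytes");
static_assert(sizeof(struct BlockWithInt32) == 24, "one captured int adds 4 bytes");
static_assert(offsetof(struct BlockWithInt32, captured) == 20,
              "the capture sits straight after the header");

/* Mirrors the load from r0 + 20 in the invoke function: given only a pointer
   to the block, fetch the 4 bytes that follow the header. */
int32_t read_captured(const void *block) {
    int32_t value;
    memcpy(&value, (const unsigned char *)block + 20, sizeof value);
    return value;
}
```

If any of the static assertions failed the file would not compile, so a successful build already confirms the sizes quoted above for this 32-bit model.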
What about capturing object types?
The next thing to consider is what happens if, instead of an integer, the block captures an object type such as an NSString. To see what happens there, consider the following code:
[Code listing not preserved: the version of the code that captures an NSString]
I won’t go into the details of doBlockA because that doesn’t change much. What is interesting is the block descriptor structure that’s created:
[Assembly listing not preserved: the block descriptor]
Notice there are pointers to functions called ___copy_helper_block_ and ___destroy_helper_block_. Here are the definitions of those functions:
[Assembly listing not preserved: the copy and destroy helper functions]
I assume these functions are what get run when blocks are copied and destroyed. They must be retaining and releasing the object that was captured by the block. It looks like the copy function takes 2 parameters, as both r0 and r1 are addressed as if they contain valid data. The destroy function looks like it just takes 1. All of the hard work looks like it’s done by _Block_object_dispose. The code for that is within the block runtime code, part of the compiler-rt project within LLVM.
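As a rough illustration of that guess, here is a plain C model of a descriptor carrying copy and destroy helpers. It is not the runtime’s code: the reference-counted ObjectModel stands in for the NSString, all of the type and function names are hypothetical, and the retain and release are done inline rather than through the runtime. The argument order (destination first, then source) follows the published block ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object standing in for the NSString. */
struct ObjectModel {
    int retain_count;
};

struct BlockWithObjectModel;

/* Descriptor model: the usual two words plus the two helper pointers. */
struct DescriptorWithHelpersModel {
    unsigned long reserved;
    unsigned long size;
    void (*copy)(struct BlockWithObjectModel *dst,
                 struct BlockWithObjectModel *src);
    void (*dispose)(struct BlockWithObjectModel *src);
};

/* Block model: the five standard members followed by the captured object. */
struct BlockWithObjectModel {
    void *isa;
    int flags;
    int reserved;
    void (*invoke)(struct BlockWithObjectModel *);
    struct DescriptorWithHelpersModel *descriptor;
    struct ObjectModel *captured;
};

/* Copy helper model: two arguments, matching the use of r0 and r1. The new
   copy of the block takes its own reference to the captured object. */
void copy_helper_model(struct BlockWithObjectModel *dst,
                       struct BlockWithObjectModel *src) {
    assert(src->captured != NULL);
    dst->captured = src->captured;
    dst->captured->retain_count += 1;
}

/* Destroy helper model: one argument. The block being torn down gives up
   its reference to the captured object. */
void destroy_helper_model(struct BlockWithObjectModel *src) {
    assert(src->captured != NULL);
    src->captured->retain_count -= 1;
}
```

In this model, copying a block whose captured object has a count of 1 raises it to 2, and destroying that copy brings it back to 1, which is the retain and release pairing the assembly hints at.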
If you want to go away and have a read of the code for the blocks runtime, take a look at the source, which can be downloaded from http://compiler-rt.llvm.org. In particular, runtime.c is the file to look at.
In the next episode I shall take a look into the blocks runtime by investigating the code for Block_copy and seeing just how that does its business. This will give an insight into the copy and destroy helper functions we’ve just seen get created for blocks that capture objects.