objc_msgSend is a hand-optimized, magical function of the Objective-C runtime.
It is how messages are implemented. In order to consume its API in C, a user
must select the correct message send variant and explicitly cast the function.
The compiler can abstract the machine details away with the information
provided by static typing. While implementing the message send API, I found a
bit of compile-time magic in the C and Objective-C implementations.
Cupertino.js, a “JavaScript to Cocoa compiler” I’ve been excited about lately,
uses objc_msgSend to dispatch call invocations. Since the JavaScript runtime is
implemented on top of the Objective-C runtime, all variants of message send
need to be supported.
Armed with the public runtime header for guidance:
/**
 * Sends a message with a data-structure return value to an instance of a class.
 *
 * @see objc_msgSend
 */
OBJC_EXPORT void objc_msgSend_stret(id self, SEL op, ...)
I derived a broken prototype:
Much like C code that uses the API, it declared a pointer to objc_msgSend_stret
and cast it to the proper type. It was invoked with the target, selector, and
other arguments like objc_msgSend. It accessed the return value like
objc_msgSend.
My initial test machine was a 32-bit x86 machine, and the message send worked
great for most cases. Things got interesting when I started testing calls to
-[UIScreen bounds], which returns a struct CGRect. The machine for that test
had a 32-bit ARM CPU, formally armv7. It was obvious the implementation was
wrong after objc_msgSend_stret exploded.
So I coded the equivalent struct-returning message send in Objective-C:
UIView *view = [UIView new];
CGRect frame = view.frame;
Here’s part of the ARM (armv7) assembly Clang emitted for the above code:
movw r0, :lower16:(L_OBJC_SELECTOR_REFERENCES_7-(LPC2_2+4))
Ltmp9:
mov r1, r4
Ltmp10:
movt r0, :upper16:(L_OBJC_SELECTOR_REFERENCES_7-(LPC2_2+4))
LPC2_2:
add r0, pc
ldr r2, [r0]
mov r0, sp
blx _objc_msgSend_stret
The standard arguments are shifted over by one!
The standard argument registers are r0, r1, r2, and r3. The stack pointer,
which is used for the struct variable, is at r0, the UIView instance view is at
r1, and the selector frame is at r2. More specifically, directly before
objc_msgSend_stret is called, the stack pointer is moved to r0. Up a few lines,
the selector is loaded into r2 and the local variable view is moved to r1.
From the perspective of the C API, this seems odd. However, in practice, the API
is intuitive. The temporary struct frame is just a stack variable. Its lifetime
is the scope of this frame. As a result, its memory location must be passed to
the function.
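To make the shifted arguments concrete, here is a minimal sketch in plain C. It does not use the real runtime: Rect, View, view_frame, view_frame_stret, and frame_width_via_stret are hypothetical stand-ins, and the sketch assumes an ABI that returns this struct indirectly through a caller-supplied address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real CoreGraphics, UIKit, or runtime types. */
typedef struct { double x, y, width, height; } Rect;
typedef struct { Rect frame; } View;

/* Source-level view: a method that returns a struct by value. */
static Rect view_frame(View *self, const char *sel) {
    (void)sel; /* the selector is unused in this toy method */
    return self->frame;
}

/* Lowered view, assuming an ABI that returns this struct indirectly: the
 * caller's result address comes first, so self and the selector shift over. */
static void view_frame_stret(Rect *out, View *self, const char *sel) {
    assert(out != NULL); /* the caller must supply storage for the result */
    *out = view_frame(self, sel);
}

/* The caller owns the result slot, as `mov r0, sp` does in the assembly. */
static double frame_width_via_stret(View *view) {
    Rect frame;                              /* temporary on the caller's stack */
    view_frame_stret(&frame, view, "frame"); /* result address goes first */
    return frame.width;
}
```

The result address takes the first argument slot, the receiver the second, and the selector the third, which matches the r0, r1, r2 assignment in the armv7 listing.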
When it comes to the Objective-C runtime, there is no better reference than a
working implementation. I read the message send implementation in Clang. There’s
a comment that explains the proper use of objc_msgSend_stret. Take a look at
CGObjCMac.cpp:
/// void objc_msgSend_stret (id, SEL, ...)
///
/// The messenger used when the return value is an aggregate returned
/// by indirect reference in the first argument, and therefore the
/// self and selector parameters are shifted over by one.
There are other nuances to the objc_msgSend implementation. Not all
architectures use objc_msgSend_stret for struct returns, and some conditionally
use objc_msgSend depending on the struct size.
It’s instructive to read the logic of CGObjCCommonMac::EmitMessageSend,
specifically this bit:
llvm::Constant *Fn = nullptr;
if (CGM.ReturnSlotInterferesWithArgs(MSI.CallInfo)) {
  if (!IsSuper) nullReturn.init(CGF, Arg0);
  Fn = (ObjCABI == 2) ? ObjCTypes.getSendStretFn2(IsSuper)
    : ObjCTypes.getSendStretFn(IsSuper);
} else if (CGM.ReturnTypeUsesFPRet(ResultType)) {
  Fn = (ObjCABI == 2) ? ObjCTypes.getSendFpretFn2(IsSuper)
    : ObjCTypes.getSendFpretFn(IsSuper);
} else if (CGM.ReturnTypeUsesFP2Ret(ResultType)) {
  Fn = (ObjCABI == 2) ? ObjCTypes.getSendFp2RetFn2(IsSuper)
    : ObjCTypes.getSendFp2retFn(IsSuper);
} else {
  // arm64 uses objc_msgSend for stret methods and yet null receiver check
  // must be made for it.
  if (!IsSuper && CGM.ReturnTypeUsesSRet(MSI.CallInfo))
    nullReturn.init(CGF, Arg0);
  Fn = (ObjCABI == 2) ? ObjCTypes.getSendFn2(IsSuper)
    : ObjCTypes.getSendFn(IsSuper);
}
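The order of the checks above can be restated as a small decision function. This is only a sketch in plain C: ReturnInfo, Messenger, select_messenger, and messenger_name are hypothetical names, the three flags stand in for the predicates Clang queries, and the super-send and ABI-version variants are left out.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SEND, SEND_STRET, SEND_FPRET, SEND_FP2RET } Messenger;

/* Hypothetical summary of the three predicates queried in the quoted code. */
typedef struct {
    bool return_slot_interferes_with_args; /* aggregate returned via first arg */
    bool uses_fpret;
    bool uses_fp2ret;
} ReturnInfo;

/* Mirrors the if / else-if chain: stret first, then fpret, then fp2ret. */
static Messenger select_messenger(ReturnInfo info) {
    /* Assumption of this sketch: a return type sets at most one FP flag. */
    assert(!(info.uses_fpret && info.uses_fp2ret));
    if (info.return_slot_interferes_with_args) return SEND_STRET;
    if (info.uses_fpret) return SEND_FPRET;
    if (info.uses_fp2ret) return SEND_FP2RET;
    return SEND; /* plain objc_msgSend, including the arm64 struct case */
}

/* Maps the choice to the runtime symbol name of the ordinary variant. */
static const char *messenger_name(Messenger m) {
    switch (m) {
    case SEND_STRET:  return "objc_msgSend_stret";
    case SEND_FPRET:  return "objc_msgSend_fpret";
    case SEND_FP2RET: return "objc_msgSend_fp2ret";
    default:          return "objc_msgSend";
    }
}
```

Which flag is set for a given return type depends on the target architecture, which is exactly the knowledge the compiler has and a hand-written C call site must supply.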
Conclusion
The runtime’s nice user API is quite deceptive: there’s magic happening under
the hood for objc_msgSend! It’s a sensible design to allocate stack memory for
this variable as part of the function prologue. To use the C API, the user must
cast the function to the correct type because the compiler needs the type
information to allocate the correct amount of memory and select a messenger.
Higher-level languages abstract the machine away. Static typing is required for
objc_msgSend in C and Objective-C.
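The casting requirement can be illustrated without the runtime at all. The sketch below is plain C with hypothetical names (GenericFn, Counter, counter_add, lookup, send_add); it only mimics the pattern of taking an untyped entry point and casting it to the exact signature before the call, so the compiler knows how to pass arguments and read the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An untyped entry point: it says nothing about arguments or return value. */
typedef void (*GenericFn)(void);

typedef struct { int value; } Counter; /* hypothetical receiver type */

/* A toy method with a concrete signature: receiver, selector, one int. */
static int counter_add(Counter *self, const char *sel, int amount) {
    (void)sel;
    self->value += amount;
    return self->value;
}

/* Hypothetical lookup that hands the method back as an untyped pointer. */
static GenericFn lookup(const char *sel) {
    return strcmp(sel, "add:") == 0 ? (GenericFn)counter_add : NULL;
}

/* The call site casts back to the exact signature before calling. */
static int send_add(Counter *receiver, int amount) {
    typedef int (*AddFn)(Counter *, const char *, int);
    GenericFn untyped = lookup("add:");
    assert(untyped != NULL);      /* the selector must resolve */
    AddFn typed = (AddFn)untyped; /* restore the real function type */
    return typed(receiver, "add:", amount);
}
```

Calling through the pointer without the cast would leave the compiler guessing at the calling convention, which is the same failure mode as picking the wrong messenger.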
When in doubt, use the source, Luke.
It’s worth noting that the official runtime documentation is quite useful.
objc_msgSend and its many variants are well documented:
Sends a message with a data-structure return value to an instance of a class.
void objc_msgSend_stret(void * stretAddr, id theReceiver, SEL theSelector, ...)
Conclusion Pt 2: What would this look like in JavaScript?
Cupertino.js introduces a “cast operator” to add static typing to JavaScript function calls. The cast operator is named after the corresponding type and syntactically works like a function call:
var frame = CGRect(view.frame)
This provides enough information for the compiler to allocate the local variable and select the correct messenger.
The cast operator is necessary because the static typing information provided
by -[UIView frame] isn’t enough. In a completely dynamic language it would be
incorrect to assume that all methods with the name frame return a CGRect. A
method could legally have the name frame but not return a struct.
If you spotted any errors in this post, please let me know!