Finally, I wrote a post about [[trivial_abi]]!
This is a new feature in Clang trunk, as of February 2018. It is a vendor extension to the C++ language: it is not standard C++, it is not supported by GCC trunk, and, as far as I know, there is no active WG21 proposal to add it to the C++ standard.
I did not participate in the implementation of this feature. I just looked at the patches on the cfe-commits mailing list and applauded silently to myself. But this is such a cool feature that I think everyone should know about it.
The first thing to know: this is not a standard attribute, and Clang trunk does not support the standard attribute spelling [[trivial_abi]] for it. Instead, you must write it in the old style, as one of the following:
    __attribute__((trivial_abi))
    __attribute__((__trivial_abi__))
    [[clang::trivial_abi]]
And, since this is an attribute, the compiler is very picky about where you put it, and passive-aggressively silent if you put it in the wrong place (because unrecognized attributes are simply ignored without a diagnostic). This is a feature, not a bug. The correct syntax is this:
    #define TRIVIAL_ABI __attribute__((trivial_abi))

    class TRIVIAL_ABI Widget {
        // ...
    };
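Because other compilers do not know the attribute, it can help to define the macro conditionally. The following is only a sketch under that assumption: the guard uses Clang's `__has_attribute` check, and the body of `Widget` is a hypothetical example, not code from the Clang patches.

```cpp
// Sketch: TRIVIAL_ABI expands to the attribute only when the compiler
// reports support for it; otherwise it expands to nothing.
#if defined(__clang__)
#  if __has_attribute(trivial_abi)
#    define TRIVIAL_ABI __attribute__((trivial_abi))
#  endif
#endif
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI  // attribute unavailable: ordinary calling convention
#endif

// Hypothetical example class. The attribute goes between the class key
// and the class name.
class TRIVIAL_ABI Widget {
    int value_ = 0;
public:
    Widget() = default;
    explicit Widget(int v) : value_(v) {}
    ~Widget() {}  // user-provided, hence non-trivial
    int value() const { return value_; }
};
```

With this guard the same source builds everywhere; only compilers that report the attribute get the register-passing convention.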
What problem does this solve?
Remember my post on 04/17/2018 where I showed two versions of the class?
Translator's note: since the post from 04/17/2018 is short, I did not publish it separately and instead inserted it here under a spoiler.
Post from 04/17/2018: The downsides of omitting trivial destructor calls
See the C++ standard proposals mailing list. For which of the two functions, foo or bar, will the compiler generate better code?
    struct Integer {
        int value;
        ~Integer() {}
    };
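For reference, here is a sketch of the two functions in a form consistent with the assembly and the explanation that follow. The exact bodies are an assumption: the names foo, bar, and Integer and the constant 0xDEADBEEF come from the post, while the helper `check_both_shrink_by_one` is hypothetical.

```cpp
#include <cassert>
#include <vector>

struct Integer {
    int value;
    ~Integer() {}  // user-provided, hence non-trivial
};

// Assumed shape of foo: scale the last element, then drop it.
void foo(std::vector<int>& vec) {
    vec.back() *= 0xDEADBEEF;
    vec.pop_back();
}

// Assumed shape of bar: the same operation on the wrapper type.
void bar(std::vector<Integer>& vec) {
    vec.back().value *= 0xDEADBEEF;
    vec.pop_back();
}

// Hypothetical helper: the observable result is identical for both,
// namely a vector that is one element shorter.
inline void check_both_shrink_by_one() {
    std::vector<int> ints{1, 2, 3};
    foo(ints);
    assert(ints.size() == 2);

    std::vector<Integer> wrapped(3);
    bar(wrapped);
    assert(wrapped.size() == 2);
}
```

Since the multiplied element is popped immediately, the multiplication is a dead store in both functions; the question is only whether the compiler can see that.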
Compile with GCC and libstdc++. Did you guess right?
    foo:
        movq    8(%rdi), %rax
        imull   $-559038737, -4(%rax), %edx
        subq    $4, %rax
        movl    %edx, (%rax)
        movq    %rax, 8(%rdi)
        ret
    bar:
        subq    $4, 8(%rdi)
        ret
Here is what is going on: GCC is smart enough to understand that when a destructor runs for a region of memory, that region's lifetime ends, and all previous writes to it are “dead”. But GCC is also smart enough to understand that a trivial destructor (such as the pseudo-destructor ~int()) does nothing and has no effects.
So bar calls pop_back, which runs ~Integer(), which makes vec.back() dead, and GCC removes the multiplication by 0xDEADBEEF entirely.
On the other hand, foo calls pop_back, which runs the pseudo-destructor ~int() (it could skip the call entirely, but it does not); GCC sees that it is empty and forgets about it. So GCC does not see that vec.back() is dead, and it does not remove the multiplication by 0xDEADBEEF.
This happens with any trivial destructor, not just with pseudo-destructors such as ~int(). Replace our ~Integer() {} with ~Integer() = default; and watch the imull instruction reappear!
    struct Foo {
        int value;
        ~Foo() = default;
    };

    struct Bar {
        int value;
        ~Bar() {}
    };
That post showed code for which the compiler generated worse code for Foo than for Bar. It is worth discussing why this was unexpected. Programmers intuitively expect “trivial” code to do better than “non-trivial” code. In most situations it does. In particular, it does when we make a function call or return:
    template<class T>
    T incr(T obj) {
        obj.value += 1;
        return obj;
    }
incr<Foo> compiles to the following code:
    leal    1(%rdi), %eax
    retq
(leal is an x86 instruction that here simply means “add”.) We see that our 4-byte obj is passed to incr in the %edi register; we add 1 to its value and return it in %eax. Four bytes in, four bytes out, nice and simple.
Now let's look at incr<Bar> (the case with the non-trivial destructor).
    movl    (%rsi), %eax
    addl    $1, %eax
    movl    %eax, (%rsi)
    movl    %eax, (%rdi)
    movq    %rdi, %rax
    retq
Here obj is not passed in a register, even though it is the same 4 bytes with the same semantics. Here obj is passed and returned by address. The caller reserves some space for the return value and passes us a pointer to that space in %rdi, and it passes us a pointer to the parameter obj in the next argument register, %rsi. We load the value from (%rsi), add 1, store it back to (%rsi) to update obj itself, and then (trivially) copy the 4 bytes of obj into the return slot pointed to by %rdi. Finally, we copy the original pointer passed by the caller from %rdi to %rax, because the x86-64 ABI document (p. 22) tells us to.
The reason Bar is treated so differently from Foo is that Bar has a non-trivial destructor, and the x86-64 ABI (p. 19) specifically states:
If a C++ object has either a non-trivial copy constructor or a non-trivial destructor, it is passed by invisible reference (the object is replaced in the parameter list by a pointer [...])
The later Itanium C++ ABI document specifies the following:
If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference:
[...]
A type is considered non-trivial for the purposes of calls if:
It has a non-trivial copy constructor, move constructor, or destructor, or all of its copy and move constructors are deleted.
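The destructor part of that rule can be checked at compile time with a standard trait. This is a minimal sketch that assumes Foo and Bar are defined as described in this post. It tests only one ingredient of the rule, since the standard library offers no trait for “trivial for the purposes of calls” itself.

```cpp
#include <type_traits>

struct Foo {
    int value;
    ~Foo() = default;  // trivial destructor
};

struct Bar {
    int value;
    ~Bar() {}  // user-provided, hence non-trivial even though it is empty
};

// Same size and same members...
static_assert(sizeof(Foo) == sizeof(Bar), "identical layout");

// ...but only Foo satisfies the destructor condition of the ABI rule.
static_assert(std::is_trivially_destructible<Foo>::value,
              "Foo can be passed in a register");
static_assert(!std::is_trivially_destructible<Bar>::value,
              "Bar is passed by invisible reference");
```

The empty braces are enough to flip the trait, which is why two types with identical layout end up with different calling conventions.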
So that explains everything: Bar gets worse code generation because it is passed by invisible reference. It is passed by invisible reference because of an unlucky combination of two independent facts:
- The ABI document says that objects with a non-trivial destructor are passed by invisible reference.
- Bar has a non-trivial destructor.
This is a classic syllogism: the first point is the major premise, and the second is the minor premise. The conclusion is that Bar is passed by invisible reference.
Suppose someone gives us this syllogism:
- All people are mortal
- Socrates is a man.
- Consequently, Socrates is mortal.
If we want to refute the conclusion “Socrates is mortal,” we must refute one of the premises: either the major premise (perhaps some people are not mortal) or the minor premise (perhaps Socrates is not a man).
To get Bar passed in a register (like Foo), we must refute one of the two premises. The standard C++ way is to give Bar a trivial destructor, which knocks out the minor premise. But there is another way!
How [[trivial_abi]] solves the problem
The new Clang attribute knocks out the major premise. Clang extends the ABI document as follows:
If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference:
[...]
A type is considered non-trivial for the purposes of calls if it is not marked [[trivial_abi]] and:
It has a non-trivial copy constructor, move constructor, or destructor, or all of its copy and move constructors are deleted.
That is, even a class with a non-trivial move constructor or destructor can be considered trivial for the purposes of calls, as long as it is marked [[trivial_abi]].
So now, using Clang, we can write this:
    #define TRIVIAL_ABI __attribute__((trivial_abi))

    struct TRIVIAL_ABI Baz {
        int value;
        ~Baz() {}
    };
compile incr<Baz>, and get the same code as incr<Foo>!
Warning #1: [[trivial_abi]] sometimes does nothing
I had hoped that we could make “trivial for the purposes of calls” wrappers around standard library types, like this:
    template<class T, class D>
    struct TRIVIAL_ABI trivial_unique_ptr : std::unique_ptr<T, D> {
        using std::unique_ptr<T, D>::unique_ptr;
    };
Alas, this does not work. If your class has any base class or non-static data member that is itself “non-trivial for the purposes of calls”, then the Clang extension, as currently written, makes your class “irrevocably non-trivial”, and the attribute has no effect. (No diagnostic is issued. This means that you can put [[trivial_abi]] on a class template and have the class be “conditionally trivial”, which is sometimes useful. The downside, of course, is that you can mark a class as trivial and later discover that the compiler quietly ignored you.)
The attribute is also ignored without a diagnostic if your class has a virtual base class or virtual functions. In those cases it may not fit in registers anyway, and I do not know why you would want to pass it by value, but presumably you do.
So, as far as I know, the only ways to use TRIVIAL_ABI on “standard utility types” such as optional<T>, unique_ptr<T>, and shared_ptr<T> are to
- implement them yourself from scratch and apply the attribute, or
- break into your local copy of libc++ and insert the attribute there by hand
(in the open-source world, both methods are essentially the same).
Warning #2: destructor responsibility
In the Foo / Bar example, the class had an empty destructor. Now let's give our class a destructor that actually does something.
    struct Up1 {
        int value;
        Up1(Up1&& u) : value(u.value) { u.value = 0; }
        ~Up1() { puts("destroyed"); }
    };
This should look familiar: it is unique_ptr<int> stripped down to the bare minimum, with a message printed on destruction.
Without TRIVIAL_ABI, incr<Up1> looks just like incr<Bar>:
    movl    (%rsi), %eax
    addl    $1, %eax
    movl    %eax, (%rdi)
    movl    $0, (%rsi)
    movq    %rdi, %rax
    retq
With TRIVIAL_ABI added (call that version Up2), incr<Up2> looks bigger and scarier!
    pushq   %rbx
    leal    1(%rdi), %ebx
    movl    $.L.str, %edi
    callq   puts
    movl    %ebx, %eax
    popq    %rbx
    retq
In the traditional calling convention, types with non-trivial destructors are always passed by invisible reference, which means that the callee (incr in this case) always receives a pointer to a parameter object that it does not own. The caller owns the object. This is what makes copy elision work!
When a [[trivial_abi]] type is passed in registers, we are essentially making a copy of the parameter object.
Since x86-64 has only one return register (I am hand-waving a little), the callee has no way to hand the object back at the end. The callee must take ownership of the object we passed to it! That means the callee must call the destructor of the parameter object when it is done with it.
In the earlier Foo / Bar / Baz example, this destructor call was happening too, but the destructor was empty, so we did not notice. Now, in incr<Up2>, we see the extra code produced by the destructor call on the callee's side.
You might worry that this extra code will add up in some use cases. But the destructor call has not been added anywhere new! It is called in incr because it is no longer called in the caller. So, in general, the costs and benefits should balance out.
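One way to convince yourself that the destructor call moves rather than multiplies is to count calls. The sketch below is illustrative only: `Counted` is a hypothetical variant of Up1 that increments a counter instead of printing, and the macro guard is an assumption that keeps the code compiling where the attribute is unavailable.

```cpp
#include <cassert>

#if defined(__clang__)
#  if __has_attribute(trivial_abi)
#    define TRIVIAL_ABI __attribute__((trivial_abi))
#  endif
#endif
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI  // attribute unavailable: ordinary calling convention
#endif

// Hypothetical variant of Up1 that counts destructor calls.
struct TRIVIAL_ABI Counted {
    static int destroyed;
    int value;
    explicit Counted(int v) : value(v) {}
    Counted(Counted&& u) noexcept : value(u.value) { u.value = 0; }
    ~Counted() { ++destroyed; }
};
int Counted::destroyed = 0;

template<class T>
T incr(T obj) { obj.value += 1; return obj; }

// Performs one call to incr and returns the total number of destructor
// calls, wherever the compiler chose to emit them.
int destructor_calls_for_one_incr() {
    Counted::destroyed = 0;
    {
        Counted result = incr(Counted(41));
        assert(result.value == 42);
    }
    return Counted::destroyed;
}
```

Under C++17 copy elision there are exactly two objects here, the parameter and `result`, so the function returns 2 with or without the attribute. The attribute only changes which side of the call destroys the parameter.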
Warning #3: destructor order
The destructor of a trivial-ABI parameter is called by the callee, not the caller (see warning #2). Richard Smith points out that this means it is called out of order with respect to the destructors of the other parameters.
    struct TRIVIAL_ABI alpha {
        alpha() { puts("alpha constructed"); }
        ~alpha() { puts("alpha destroyed"); }
    };

    struct beta {
        beta() { puts("beta constructed"); }
        ~beta() { puts("beta destroyed"); }
    };

    void foo(alpha, beta) {}

    int main() {
        foo(alpha{}, beta{});
    }
With TRIVIAL_ABI defined as [[clang::trivial_abi]], this code prints:
    alpha constructed
    beta constructed
    alpha destroyed
    beta destroyed
With TRIVIAL_ABI defined as nothing, it prints:
    alpha constructed
    beta constructed
    beta destroyed
    alpha destroyed
Relation to “trivially relocatable” / “move-relocates”
No relation... right?
As you can see, nothing requires a [[trivial_abi]] class to have any particular semantics for its move constructor, destructor, or default constructor. Any given class will probably be trivially relocatable, simply because most classes are.
But it is easy to write an offset_ptr class that is not trivially relocatable:
    template<class T>
    class TRIVIAL_ABI offset_ptr {
        intptr_t value_;
    public:
        offset_ptr(T *p) : value_((const char*)p - (const char*)this) {}
        offset_ptr(const offset_ptr& rhs) : value_((const char*)rhs.get() - (const char*)this) {}
        T *get() const { return (T *)((const char *)this + value_); }
        offset_ptr& operator=(const offset_ptr& rhs) {
            value_ = ((const char*)rhs.get() - (const char*)this);
            return *this;
        }
        offset_ptr& operator+=(int diff) {
            value_ += (diff * sizeof (T));
            return *this;
        }
    };

    int main() {
        offset_ptr<int> top = &a[4];
        top = incr(top);
        assert(top.get() == &a[5]);
    }
Here is the complete code.
When TRIVIAL_ABI is defined, Clang trunk passes this test at -O0 and -O1, but at -O2 (that is, as soon as it tries to inline the calls to trivial_offset_ptr::operator+= and the copy constructor) the assertion fails.
So, one more warning: if your type does something this crazy with the this pointer, you probably do not want it passed in registers.
Bug 37319 is, in effect, a request for documentation. In this case it turns out that there is no way to make the code do what the programmer wants. We are saying that the value of value_ must depend on the value of the this pointer, but at the boundary between caller and callee the object lives in registers, and no pointer to it exists! So when the callee spills it to memory and gets a this pointer again, how is it supposed to compute the correct value to store in value_? Perhaps the better question is how this works at -O0 at all.
This code should not work at all.
So, if you want to use [[trivial_abi]], you should avoid member functions (not just special member functions, but any member functions) that depend significantly on the object's own address (for some hand-wavy meaning of “significantly”).
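Why the object's own address matters can be shown without any ABI tricks. The sketch below uses a hypothetical, simplified `SelfRelativePtr` (not the offset_ptr above) and simulates a trip through registers by placing the same raw bits into a second object at a different address, with no constructor involved. It uses unsigned integer arithmetic on addresses so that the demonstration itself stays well-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical self-relative pointer: stores the distance from its own
// address to the target (modulo 2^N) instead of the target's address.
class SelfRelativePtr {
    std::uintptr_t offset_;

    static std::uintptr_t addr(const void* p) {
        return reinterpret_cast<std::uintptr_t>(p);
    }

public:
    explicit SelfRelativePtr(const int* target)
        : offset_(addr(target) - addr(this)) {}
    // The copy constructor recomputes the offset for the new location.
    SelfRelativePtr(const SelfRelativePtr& rhs)
        : offset_(rhs.target() - addr(this)) {}

    std::uintptr_t target() const { return addr(this) + offset_; }
    std::uintptr_t raw() const { return offset_; }
    void set_raw(std::uintptr_t bits) { offset_ = bits; }
};

// A real copy still refers to x.
bool copy_constructor_keeps_target() {
    int x = 0;
    SelfRelativePtr original(&x);
    SelfRelativePtr copy(original);
    return copy.target() == reinterpret_cast<std::uintptr_t>(&x);
}

// The same bits at another address no longer refer to x.
bool bitwise_move_keeps_target() {
    int x = 0;
    SelfRelativePtr original(&x);
    SelfRelativePtr elsewhere(nullptr);
    elsewhere.set_raw(original.raw());  // same bits, different address
    return elsewhere.target() == reinterpret_cast<std::uintptr_t>(&x);
}

inline void demo() {
    assert(copy_constructor_keeps_target());
    assert(!bitwise_move_keeps_target());
}
```

The second function models what happens when the representation leaves one memory location and reappears at another: the stored offset is now relative to the wrong address.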
Intuitively, when a class is marked [[trivial_abi]], anywhere you expect a copy you may get a copy plus a memcpy. Similarly, anywhere you expect a move you may actually get a move plus a memcpy.
When a type is “trivially relocatable” (as I defined it at C++Now), then anywhere you expect a copy and a destroy you may actually get a memcpy. Similarly, anywhere you expect a move and a destroy you may actually get a memcpy. In other words, with “trivial relocation” the calls to the special member functions go away, whereas with Clang's [[trivial_abi]] attribute the calls are not lost. You just get (as it were) a memcpy in addition to the calls you expected. That (sort of) memcpy is the price you pay for the faster, register-based calling convention.
Links for further reading:
Akira Hatanaka's cfe-dev thread from November 2017
Official Clang documentation
The unit tests for trivial_abi
Bug 37319: trivial_offset_ptr can't possibly work