Perhaps you missed one of the arguments I mentioned before. Yes, indeed, the compiler is not required to emit a symbol reference and may inline the code instead. But the symbols must still be present, at least so that legacy code and code built against another vendor's implementation keep working. Please refer to the documentation on atomics for gcc and clang. If I'm not mistaken, gcc keeps these symbols in libatomic; I'm not sure where clang keeps them, though I recall reading that clang supports these calls as well. This is also where all the emulated (non-native) atomics live. So a single generic routine that takes a size is insufficient anyway, and you _have_ to keep all these suffixed versions.
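To make the inlining-versus-symbol point concrete, here is a minimal sketch, assuming gcc or clang with the __atomic builtins. The wrapper names (counter_load, counter_add, counter_is_native) are made up for illustration; only the __atomic_* builtins are real. On a target with native 32-bit atomics the compiler is free to inline these operations; where it cannot, my understanding is that it emits a call to a size-suffixed symbol such as __atomic_load_4 instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrappers; only the __atomic_* builtins are real gcc/clang API. */

/* The compiler may inline this load, or (assumption, per the gcc docs as I
 * read them) lower it to a call to a size-suffixed library symbol such as
 * __atomic_load_4 when the target has no native instruction for it. */
uint32_t counter_load(uint32_t *p)
{
    assert(p != NULL);
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}

/* Atomically adds delta and returns the value held before the addition. */
uint32_t counter_add(uint32_t *p, uint32_t delta)
{
    assert(p != NULL);
    return __atomic_fetch_add(p, delta, __ATOMIC_SEQ_CST);
}

/* Nonzero when a 4-byte atomic is always lock-free on this target,
 * i.e. the library fallback is never needed for this size. */
int counter_is_native(void)
{
    return __atomic_always_lock_free(sizeof(uint32_t), 0);
}
```

The caller never sees which path was taken, which is exactly why the suffixed symbols have to stay available for code that was compiled to call them.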
Also, I don't believe you can get clear, self-evident error messages this way; you get the rather unhelpful output that macros usually throw at you. You lose the opportunity to optimize the implementation should that be needed: what if you _have_ to inline the code on some architecture because things won't work otherwise? All the complexity is moved to the user-visible level, and I don't agree that this is a good idea. Things are especially tricky because you suggest a custom-behaved _Generic use, which I have not seen in the wild; so when things go wrong, the end user is inevitably forced to investigate on their own what it is and how it works. Compare this with semantics that are already documented elsewhere and are at least somewhat expected.
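For illustration, a minimal sketch of a _Generic front end as far as I understand the idea, assuming C11 and the gcc/clang __atomic builtins. The my_atomic_load* names are hypothetical and do not come from any real library. Note that the suffixed functions are still there underneath; the macro only picks one of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical suffixed entry points; the names are illustrative only. */
static inline uint16_t my_atomic_load_2(uint16_t *p)
{
    assert(p != NULL);
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}

static inline uint32_t my_atomic_load_4(uint32_t *p)
{
    assert(p != NULL);
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}

/* Type-generic front end: selects a suffixed function from the pointer type.
 * A pointer type that is not listed (e.g. uint64_t *, or int32_t * even
 * though it has the same size as uint32_t) does not compile, and the
 * diagnostic is about the _Generic selection itself rather than about
 * what the caller got wrong. */
#define my_atomic_load(p) _Generic((p),   \
        uint16_t *: my_atomic_load_2,     \
        uint32_t *: my_atomic_load_4)(p)
```

With this in place, my_atomic_load(&x) works for uint16_t and uint32_t objects only; anything else is rejected at compile time with whatever message the compiler produces for a failed _Generic selection.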
So, while the duplication argument is the only one that seems convincing, I don't see other options that cover the needs just as efficiently and offer comparable flexibility. Perhaps we could decide how to reduce the code duplication in the implementation? I admit that I don't see _that_ much duplication, though: it's a function call with more checks and more options (e.g. supporting "compatible" types, reporting the exact problem with the arguments, and so on).