The OP's question was educational for me. I had assumed (rather naively) that RIP-relative relocations on x64 pretty much give position independence even without "-fpic". It turns out that calls and data accesses are covered, but address arithmetic is not.
One of the errors here is due to an address subtraction in crtstuff.c. There is a branch condition that involves the offset of one symbol from another symbol in the same section, and the subtraction is done on absolute addresses. The result would be identical if RIP-relative relocations were used, but the compiler would have to work harder to infer this. Recompiling crtbegin.o with -fpic could fix some of the errors.
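To make the pattern concrete, here is a minimal sketch of a branch condition built on the difference between two addresses. This is not the actual crtstuff.c code: `table_start`, `table_end`, `table_bytes` and `table_is_empty` are hypothetical names I made up for illustration. Compiling it with something like `gcc -O2 -S`, once with and once without `-fpic`, lets you compare how the two addresses are materialized. What you see will depend on the compiler version and on whether your toolchain defaults to PIE.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for two boundary symbols that live in the
   same section; the names are illustrative, not from crtstuff.c. */
char table_start[16];
char *const table_end = table_start + sizeof table_start;

/* Offset of one symbol from the other, computed on integer
   addresses rather than through a RIP-relative access. */
size_t table_bytes(void)
{
    return (size_t)((uintptr_t)table_end - (uintptr_t)table_start);
}

/* Branch condition that depends only on the difference of the
   two addresses, never on their absolute values. */
int table_is_empty(void)
{
    return table_bytes() == 0;
}
```

The difference is the same wherever the image is loaded, which is why the result does not really need absolute addresses. It is only the way each operand gets loaded that can pull in an absolute relocation.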
And another surprise for me. Normally, -fpic uses indirection through the GOT, which is not a significant performance penalty here, but is still redundant. The linker, however, has an optimization that eliminates the indirection in an executable and actually replaces a mov instruction with a lea instruction. It is described in the AMD64 ABI draft that is linked in this LLVM ticket.
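A small sketch of where that GOT indirection shows up, assuming GCC or Clang targeting x86-64 ELF. `shared_counter` and `counter_address` are hypothetical names. The assembly in the comments is what I would typically expect, not something guaranteed for every toolchain version.

```c
/* Hypothetical global with default visibility. Under -fpic the
   compiler has to assume it may be interposed at run time. */
int shared_counter = 42;

/* Taking the address under -fpic typically goes through the GOT:
 *     movq shared_counter@GOTPCREL(%rip), %rax
 * When the symbol turns out to be defined in the executable being
 * linked, the linker can relax this to a direct address computation:
 *     leaq shared_counter(%rip), %rax
 */
int *counter_address(void)
{
    return &shared_counter;
}
```

To check it on your own setup, compare `objdump -dr` on the object file with `objdump -d` on the linked executable. If the relaxation was applied, the mov in the object file shows up as a lea in the executable.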