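GCC's nested functions are implemented with trampolines: when you take the address of a nested function, GCC builds a small piece of code on the stack at run time that loads a pointer to the enclosing function's frame and then jumps to the real function. That has a run-time cost and needs an executable stack. For the small amount of convenience you gain, tying your code to GCC this way seems like a bad idea.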
With hardening such as PaX, which forbids an executable stack, the stack-based trampoline method won't work. gcc can now create thunks instead, which are essentially heap-allocated chunks of memory mapped with PROT_EXEC.
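
For comparison, here is a rough sketch of the portable way to get what a nested function gives you: pass the captured state explicitly through a context pointer. The names `for_each`, `sum_ctx`, `add_to_sum` and `sum_items` are made up for illustration, and the GNU C version appears only in a comment.

```c
#include <stddef.h>

/* Hypothetical callback type: receives one item plus caller-supplied state. */
typedef void (*visit_fn)(int item, void *ctx);

/* Hypothetical helper: calls fn(item, ctx) for each element of items. */
static void for_each(const int *items, size_t n, visit_fn fn, void *ctx)
{
    for (size_t i = 0; i < n; i++)
        fn(items[i], ctx);
}

/* State that a nested function would have captured from its enclosing scope. */
struct sum_ctx {
    long total;
};

/* Ordinary file-scope callback: no trampoline, no executable stack needed. */
static void add_to_sum(int item, void *ctx)
{
    struct sum_ctx *s = ctx;
    s->total += item;
}

/*
 * In GNU C the same thing could be written with a nested function:
 *
 *     long total = 0;
 *     void add(int item) { total += item; }
 *     some_for_each(items, n, add);   // taking add's address needs a trampoline
 *
 * The version below compiles with any standard C compiler.
 */
long sum_items(const int *items, size_t n)
{
    struct sum_ctx s = { 0 };
    for_each(items, n, add_to_sum, &s);
    return s.total;
}
```

Calling `sum_items` on an array holding 1, 2, 3 and 4 returns 10. The extra struct is a little more typing than a nested function, but nothing executable is generated at run time, so it behaves the same with or without PaX.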