I don't understand your complaints. You clearly have some task in mind that you want to perform: why not tell me what it is?
> Please show a code example of changing European to African in this sentence in your language of choice, working on the bytes in any multi-byte encoding:
מהי מהירות האווירית של סנונית ארופאית ללא משא?
I don't see the string 'European' in that sentence; it seems to consist solely of Hebrew characters.
edit to attempt to answer your question:
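    #include <stddef.h>

    typedef size_t pos_t;

    typedef struct {
        pos_t start;  /* byte offset of the first matched byte */
        pos_t end;    /* byte offset one past the last matched byte */
    } match;

    /* Byte-wise search for substr in str. On success fills *m and returns 1. */
    int findsn(const char *str, const char *substr, match *m) {
        for (pos_t c_i = 0; str[c_i] != '\0'; c_i++) {
            pos_t s_i = 0;
            /* Walk both strings while the bytes agree. */
            while (substr[s_i] != '\0' && str[c_i + s_i] == substr[s_i]) {
                s_i++;
            }
            if (substr[s_i] == '\0') {  /* reached the end of the needle: match */
                m->start = c_i;
                m->end = c_i + s_i;
                return 1;
            }
        }
        return 0;
    }

    /* Not shown. Assumed signature: replaces bytes [start, end) of str with rpl. */
    void splicesn(char *str, pos_t start, pos_t end, const char *rpl);

    char *replacesn(char *str, const char *needle, const char *rpl) {
        match m;
        if (findsn(str, needle, &m)) {
            splicesn(str, m.start, m.end, rpl);
        }
        return str;
    }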
splicesn should be obvious, and you normalise your strings before calling replacesn. This is just me crappily re-implementing a fraction of the wchar API without checking MSDN.
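For completeness, here is a rough sketch of the splice step. It is not the splicesn above: to sidestep buffer-capacity questions it returns a newly allocated string instead of editing in place, and the names splice_bytes and replace_first are made up for illustration. It also leans on strstr for the search, since the match is purely byte-wise.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of str in which the
   bytes [start, end) are replaced by rpl. Returns NULL if allocation fails.
   The caller frees the result. */
char *splice_bytes(const char *str, size_t start, size_t end, const char *rpl) {
    size_t len = strlen(str);
    size_t rlen = strlen(rpl);
    assert(start <= end && end <= len);  /* range must lie inside str */

    char *out = malloc(len - (end - start) + rlen + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, str, start);                               /* prefix */
    memcpy(out + start, rpl, rlen);                        /* replacement */
    memcpy(out + start + rlen, str + end, len - end + 1);  /* suffix and '\0' */
    return out;
}

/* Hypothetical helper: replaces the first byte-wise occurrence of needle.
   Assumes str, needle and rpl share one encoding and one normalisation form.
   If needle is absent, returns an unchanged copy of str. */
char *replace_first(const char *str, const char *needle, const char *rpl) {
    const char *hit = strstr(str, needle);
    if (hit == NULL) {
        return splice_bytes(str, 0, 0, "");
    }
    size_t start = (size_t)(hit - str);
    return splice_bytes(str, start, start + strlen(needle), rpl);
}
```

Called on the UTF-8 bytes of the Hebrew sentence with needle "ארופאית" and replacement "אפריקאית", this swaps the word without ever decoding code points, provided all three strings are UTF-8 in the same normalisation form.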
edit 2:
> Is each application to maintain their own dictionary of code points?
No, you use the system/standard library for composing/decomposing/normalising codepoints.
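To show why that normalisation step matters before a byte-wise search, here is a small sketch. The two constants spell the same visible word in UTF-8, once with a precomposed é (U+00E9) and once with e plus a combining acute accent (U+0301). The names are made up for illustration, and no normalisation library is called here; the point is only that the raw bytes differ.

```c
#include <assert.h>
#include <string.h>

/* "café" with a precomposed U+00E9, encoded as UTF-8 (5 bytes). */
static const char CAFE_NFC[] = "caf\xC3\xA9";
/* "café" as "e" + combining acute U+0301, encoded as UTF-8 (6 bytes). */
static const char CAFE_NFD[] = "cafe\xCC\x81";

/* Returns 1 if needle occurs byte-for-byte in haystack, else 0. */
int contains_bytes(const char *haystack, const char *needle) {
    return strstr(haystack, needle) != NULL;
}

/* Documents the behaviour: same text, different bytes, no match. */
void demo_forms(void) {
    assert(contains_bytes(CAFE_NFC, CAFE_NFC));
    assert(!contains_bytes(CAFE_NFD, CAFE_NFC));
}
```

Once both strings have been run through the same normalisation form by whatever the platform provides, the byte-wise comparison becomes reliable again.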
> If the map is to be in a library, then why not have it in the language itself?
Why not indeed? What a great idea.