#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( decode FB_QUIET );

binmode STDIN,  ':bytes';
binmode STDOUT, ':encoding(UTF-8)';

my $out;
while ( <> ) {
    $out = '';
    while ( length ) {
        # Decode as much valid UTF-8 as possible. With FB_QUIET,
        # decode() stops silently at the first invalid byte and
        # removes the successfully decoded prefix from $_.
        $out .= decode( "utf-8", $_, FB_QUIET );
        # Whatever remains starts with a byte that isn't valid UTF-8;
        # reinterpret that single byte as ISO-8859-1. The substr is
        # passed as an lvalue alias, so the byte is consumed from $_.
        $out .= decode( "iso-8859-1", substr( $_, 0, 1 ), FB_QUIET ) if length;
    }
    print $out;
}

The problem is that Perl internally represents strings as sequences of numbers. Not even sequences of bytes, but sequences of numbers that could be either codepoints or the bytes resulting from the encoding of such a sequence of codepoints. As a developer you are perfectly free to make that assumption either way at any given point in your codebase. It's not even clear that one of the two is "preferred" at large or counts as a best practice.
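A minimal sketch of that ambiguity (the variable names are mine, not from the snippet above): the very same sequence of numbers reads as one character or two, depending entirely on the assumption you bring to it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( decode );

# One string: the two numbers 0xC3, 0xA9.
my $s = "\xC3\xA9";

# Assumption 1: they are UTF-8 bytes encoding the single
# codepoint U+00E9 ("é").
my $as_codepoints = decode( 'UTF-8', $s );
print length( $as_codepoints ), "\n";   # 1

# Assumption 2: they are already codepoints, i.e. Latin-1 "Ã©".
print length( $s ), "\n";               # 2
```

Nothing about `$s` itself tells you which reading is intended; that knowledge lives only in the programmer's head.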
To make things worse, there is no way to know which is which, i.e. a string itself is happily ignorant about the assumptions that people will/should make about it. And Perl will happily concatenate strings making different kinds of assumptions, or double- or triple-encode them as you please, or decode something that hasn't been encoded in the first place.
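Both failure modes are easy to reproduce; here is a small sketch (my own example, not from the original post):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode decode );

# "é" once as UTF-8 bytes, once as a decoded codepoint string:
my $bytes = encode( 'UTF-8', "\x{E9}" );   # two numbers: 0xC3, 0xA9
my $chars = decode( 'UTF-8', $bytes );     # one number:  0xE9

# Perl concatenates the two without complaint, even though they
# carry incompatible assumptions -- three numbers meaning nothing:
my $mixed = $bytes . $chars;
print length( $mixed ), "\n";              # 3

# And it just as happily double-encodes: misread the UTF-8 bytes
# as Latin-1, encode again, and you get the classic mojibake.
my $double = encode( 'UTF-8', decode( 'ISO-8859-1', $bytes ) );
printf "%vX\n", $double;                   # C3.83.C2.A9
```

No warning is raised at any point; every individual operation is perfectly legal.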
This leads to jumbles of numbers that aren't anything in particular. They work well enough that sloppy programmers never notice their mistakes, but badly enough that encoding errors are almost guaranteed to crop up on users' screens regularly.
Now, given that this is how the language works, be my guest jumping into a 100k LOC Perl codebase that dozens of programmers have touched over a decade, passing around and munging together strings not just within their own codebase, but also strings stored to and retrieved from elsewhere, in some cases from places where no one knows anymore where they originally came from or where they will ultimately go.
It's not a question of hard problems versus easy problems.
But it's a very different niche. Perl and Ruby scale to mid-sized applications quite well, but beyond that, fault tolerance and QoS become hard.