#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( decode FB_QUIET );

binmode STDIN,  ':bytes';
binmode STDOUT, ':encoding(UTF-8)';

my $out;
while ( <> ) {
    $out = '';
    while ( length ) {
        # Decode as much valid UTF-8 as possible. With FB_QUIET,
        # decode() stops silently at the first invalid byte and
        # removes the successfully decoded prefix from $_.
        $out .= decode( "utf-8", $_, FB_QUIET );
        # Whatever remains starts with a byte that isn't valid UTF-8;
        # reinterpret that single byte as ISO-8859-1. The substr is
        # passed as an lvalue alias, so the byte is consumed from $_.
        $out .= decode( "iso-8859-1", substr( $_, 0, 1 ), FB_QUIET ) if length;
    }
    print $out;
}

The problem is that Perl internally represents strings as sequences of numbers. Not even sequences of bytes, but sequences of numbers that could be either codepoints or the bytes resulting from the encoding of such a sequence of codepoints. As a developer you are perfectly free to make that assumption either way at any given point in your codebase. It's not even clear that one of the two is "preferred" at large or counts as a best practice.
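A minimal sketch of that ambiguity (the variable names are mine, not from the snippet above): the very same sequence of numbers reads as one character or two, depending entirely on the assumption you bring to it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( decode );

# One string: the two numbers 0xC3, 0xA9.
my $s = "\xC3\xA9";

# Assumption 1: they are UTF-8 bytes encoding the single
# codepoint U+00E9 ("é").
my $as_codepoints = decode( 'UTF-8', $s );
print length( $as_codepoints ), "\n";   # 1

# Assumption 2: they are already codepoints, i.e. Latin-1 "Ã©".
print length( $s ), "\n";               # 2
```

Nothing about `$s` itself tells you which reading is intended; that knowledge lives only in the programmer's head.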
To make things worse, there is no way to know which is which, i.e. a string itself is happily ignorant about the assumptions that people will/should make about it. And Perl will happily concatenate strings making different kinds of assumptions, or double- or triple-encode them as you please, or decode something that hasn't been encoded in the first place.
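Both failure modes are easy to reproduce; here is a small sketch (my own example, not from the original post):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode decode );

# "é" once as UTF-8 bytes, once as a decoded codepoint string:
my $bytes = encode( 'UTF-8', "\x{E9}" );   # two numbers: 0xC3, 0xA9
my $chars = decode( 'UTF-8', $bytes );     # one number:  0xE9

# Perl concatenates the two without complaint, even though they
# carry incompatible assumptions -- three numbers meaning nothing:
my $mixed = $bytes . $chars;
print length( $mixed ), "\n";              # 3

# And it just as happily double-encodes: misread the UTF-8 bytes
# as Latin-1, encode again, and you get the classic mojibake.
my $double = encode( 'UTF-8', decode( 'ISO-8859-1', $bytes ) );
printf "%vX\n", $double;                   # C3.83.C2.A9
```

No warning is raised at any point; every individual operation is perfectly legal.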
This leads to jumbles of numbers that aren't anything in particular. They work well enough that sloppy programmers never notice their mistakes, but badly enough that encoding errors are almost guaranteed to crop up on users' screens regularly.
Now, given that this is how the language works, be my guest jumping into a 100k LOC Perl codebase that dozens of programmers have touched over a decade, passing around and munging together strings not just within their own codebase, but also strings stored to and retrieved from elsewhere, in some cases from places where no one knows anymore where they originally came from or where they will ultimately go.
It's not a question of hard problems versus easy problems.
But it's a very different niche. Perl and Ruby scale to mid-sized applications quite well, but beyond that, fault tolerance and QoS become hard.