Go enjoy Python3

(blog.surgut.co.uk)

106 points · crncosta · 10y ago · 66 comments

crawshaw10y ago

There are several ways to solve this in Go. The first that comes to mind, assuming you want to truncate to the first 12 runes, not bytes:

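        package main

        import (
            "fmt"
            "os"
        )

        func main() {
            // Convert to runes so the cut happens on code points, not bytes.
            v := []rune(os.Args[1])
            if len(v) > 12 {
                v = v[:12]
            }
            fmt.Println(string(v))
        }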

Or more in the spirit of the C example in the post:

        func main() {
            // res is zero-filled, so inputs shorter than 12 runes come out padded with NUL runes.
            res := make([]rune, 12)
            copy(res, []rune(os.Args[1]))
            fmt.Println(string(res))
        }

Note that res will stay on the stack, just like C.

I expect the author is trying to say something about Go that I'm not quite getting. Perhaps that it is not an expression-based language, so to make code readable you need to make use of multiple statements. That's by design, but I understand it may be unappealing if you want to program in an expression-heavy style.

jerf10y ago

"I expect the author is trying to say something about Go that I'm not quite getting."

I assume "Go sucks because, look, this one weird case is a bit ugly." (that is, as rhetoric, not dialectic; it is not literally claiming "one case bad" -> "Go is bad" in the logical sense.) A weird case that I've programmed many thousands of lines of Go code in but never once encountered. Taking a slice out of a string blind like that is actually a bit rare; usually in some way it turns out you actually have length information somewhere in the environment. It's hardly like "slice index out of bounds" is some sort of terrible error... it is, at least, arguable that Python is in the wrong here for being so willing to return a string generated by [0:12] that is not 12 bytes/characters in length, which seems like a reasonable assumption to make of such an operation.

Now, if we want to talk about little examples like this, let's talk about sending on something like a channel in Python, to say nothing of Python's implementation of the "go" keyword... oh, yes, I see, suddenly this is an unfair way to compare languages.

Yes, it is.

bsaul10y ago

This post shows two very common issues that programmers have with the Go language when they start using it (that includes me), especially since Go is advertised as a compiled language with the feel of a dynamic one:

A low-level feel when manipulating arrays (or slices), and poor support for generic functions (that would be math.min in this example).

rdtsc10y ago

> , let's talk about sending on something like a channel in Python:

  import queue; q = queue.Queue(); q.put(1)

> to say nothing of Python's implementation of the "go" keyword...

Why would Python have a go keyword? Go doesn't have the "except" keyword that Python has; not sure what the point is?

pekk10y ago

Go is frequently presented as a replacement of Python. When people hear that, it sets up an expectation that Go will have the same pleasant qualities of Python, when it doesn't, any more than Python has goroutines.

Jabbles10y ago

    fmt.Printf("%.12s", os.Args[1])
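
For comparison, Python's format mini-language offers the same precision trick for strings. A minimal sketch, assuming the argument is already a `str`; the helper name `first_12` is made up for illustration:

```python
def first_12(s: str) -> str:
    # A precision of 12 in the format spec keeps at most 12 code points.
    return format(s, ".12")


print(first_12("hello, wonderful world"))
print(first_12("short"))
```

Like the slice in the post, this counts code points, so it shares the combining-character caveat discussed further down the thread.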

johannesboyne10y ago

+1 for simplicity

pjmlp10y ago

I assume it has to do with Unicode support.

masklinn10y ago

> Simple enough, in essence given first argument, print it up to length 12. As an added this also deals with unicode correctly

That's not true, Python 3 uses codepoint-based indexing but it will break if combining characters are involved. For instance:

    > python3 test.py देवनागरीदेवनागरी
    देवनागरीदेवन

because there are no precomposed versions of these multi-codepoint grapheme clusters, some of the 10 user-visible characters take more than a single codepoint, so you end up with 8 user-visible characters rather than the expected 10.

edit: the original version used the input string "ǎěǐǒǔa̐e̐i̐o̐u̐ȃȇȋȏȗ" where clusters turn out to have precomposed versions after all. Replaced it by devanāgarī repeated once (in the devanāgarī script)
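
The effect can be checked with the standard library alone. This is a rough sketch, not a real grapheme segmenter: the helper `visible_count` is hypothetical and simply counts code points that are not combining marks, which happens to match the user-visible characters of this particular input.

```python
import unicodedata

# "devanagari" in the Devanagari script: 8 code points, 5 user-visible characters.
WORD = "\u0926\u0947\u0935\u0928\u093e\u0917\u0930\u0940"
TEXT = WORD * 2


def visible_count(s: str) -> int:
    # Approximation: every code point that is not a combining mark starts a new character.
    return sum(1 for ch in s if not unicodedata.category(ch).startswith("M"))


truncated = TEXT[:12]  # what the Python 3 one-liner from the post does
print(len(TEXT), visible_count(TEXT))
print(len(truncated), visible_count(truncated))
# NFC normalization does not shorten the input, since no precomposed forms exist.
print(len(unicodedata.normalize("NFC", TEXT)))
```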

Veedrac10y ago

The easy Python way:

    import sys
    import regex
    print(regex.match(r"\X{,12}", sys.argv[1]).group())

with the regex[1] package that should be in the stdlib Any Day Now™.

[1]: https://pypi.python.org/pypi/regex

Spiritus10y ago

Interesting, I had no idea the `re` module was getting revamped. Scheduled for 3.5 or later?

stevenbedrick10y ago

Yup. A long time ago, while working on a project with some particularly gnarly Unicode issues, I got in the habit of thinking in terms of grapheme clusters instead of code points (or "characters", for whatever definition of "character" one wishes to use), and it has served me very well. Combining characters pop up in the most interesting places, often where and when you least expect them! ٩(•̃̾●̮̮̃̾•̃̾)۶

Ruby's unicode_utils gem has a nice implementation of the standard grapheme cluster segmentation algorithm, and Python's wrapper around ICU works quite well. Go's concept of runes is certainly an improvement, but it doesn't handle combining characters out of the box...

masklinn10y ago

> Combining characters pop up in the most interesting places, often where and when you least expect them! ٩(•̃̾●̮̮̃̾•̃̾)۶

The good news is Unicode 8 will make them way more frequent! (alternate emoji skin colors are specified via combining characters) much as Unicode 6 made astral characters way more "in your face" (by standardising emoji in the SMP)

hahainternet10y ago

That's a shame, it works as you'd expect in perl6:

  sub MAIN($s) { say $s.substr(0,12) }

  $ perl6 test.p6 ǎěǐǒǔa̐e̐i̐o̐u̐ȃȇȋȏȗ
  ǎěǐǒǔa̐e̐i̐o̐u̐ȃȇ

masklinn10y ago

Turns out there are precomposed versions of these clusters, so your system might just be using these.

Could you retry with the input "देवनागरीदेवनागरी"?

bmn_10y ago

Languages that cannot deal with graphemes are lame. I daresay this solution below should score 20 in OP's imaginary scale.

    $ perl -CADS -E'say $ARGV[0] =~ /(\X{5})/' देवनागरीदेवनागरी
    देवनागरी

Length of input string is: 10 graphemes, 16 codepoints, 48 octets (UTF-8).

Length of output string is: 5 graphemes, 8 codepoints, 24 octets (UTF-8).
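
The same numbers can be reproduced in Python without third-party packages. A sketch under a simplifying assumption: a cluster is taken to be one non-mark code point plus the combining marks that follow it, which is cruder than the full segmentation rules that `\X` implements. The function name `take_clusters` is invented for this example.

```python
import unicodedata

# "devanagari" in the Devanagari script, repeated once, as in the command above.
WORD = "\u0926\u0947\u0935\u0928\u093e\u0917\u0930\u0940"
TEXT = WORD * 2


def take_clusters(s: str, n: int) -> str:
    # Keep the first n clusters, where a cluster is a base code point
    # followed by any combining marks attached to it.
    out = []
    started = 0
    for ch in s:
        if not unicodedata.category(ch).startswith("M"):
            if started == n:
                break
            started += 1
        out.append(ch)
    return "".join(out)


head = take_clusters(TEXT, 5)
print(len(TEXT), len(TEXT.encode("utf-8")))
print(len(head), len(head.encode("utf-8")))
```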

flohofwoe10y ago

Doesn't the C version have a serious bug? If the input string has 12 or more characters, the destination string will not be zero-terminated.

From the strncpy docs:

"No null-character is implicitly appended at the end of destination if source is longer than num. Thus, in this case, destination shall not be considered a null terminated C string (reading it as such would overflow)."

ansible10y ago

I usually add +1 to the storage for any strings for this purpose. So if I want to operate on MAXLEN characters, I'll allocate MAXLEN+1 for the character array.

And oftentimes I'll memset() the destination to all NULs when doing a string copy operation. I'm not real happy with string handling in C... as if that should be surprising to anyone.

Say, is there a nice, small string library for C, suitable for embedded use, that anyone would care to recommend? I just want a nice string type that carries around its length and storage length, handles copies properly, and has the usual utilities. I suppose I could just write one...

rch10y ago

You might look at the one from Redis:

https://github.com/antirez/sds

Ianvdl10y ago

The author awards some arbitrary points to C even though his implementation of the solution is broken. His similarly poor Go implementation receives zero of these arbitrary points.

Why does this deserve the attention of everyone here? The author did not compare languages, he compared his aptitude with these languages, and considered broken implementations to somehow be comparable.

A more meaningful comparison would be to implement simple, efficient, working solutions to these problems and compare them. This, as it stands, does not lead to any useful discussion.

BinaryIdiot10y ago

I'm not sure what the takeaway is from this blog entry. Is it that Python 3 can do substrings easier than the other languages therefore we should use Python 3? That was what I thought it was, anyway.

Seems silly to pick a language based on this single, silly criterion. Otherwise, why not JavaScript, or probably other languages that can make the code even smaller?

    console.log(mystring.substring(0, 12));

So it just seems arbitrary and weak in my opinion.

steeleduncan10y ago

The entire scenario seems to have been constructed to highlight the runtime panic caused by out of bounds slices in Go. Either that or the well-known and well-discussed lack of generics.
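
To make the contrast concrete, here is how the Python side behaves. A small sketch; `truncate_strict` is a hypothetical helper showing what a "fail on short input" policy, closer to Go's bounds check, might look like:

```python
def truncate(s: str, n: int = 12) -> str:
    # Slice bounds are clamped to the string's length, so short inputs come back whole.
    return s[:n]


def truncate_strict(s: str, n: int = 12) -> str:
    # Hypothetical stricter variant: refuse inputs shorter than n instead of clamping.
    if len(s) < n:
        raise ValueError(f"need at least {n} code points, got {len(s)}")
    return s[:n]


print(truncate("abc"))
print(truncate("abcdefghijklmnop"))
```

Which of the two behaviors is the right default is exactly the disagreement in this thread.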

_kst_10y ago

There are at least three major flaws in the 7-line C program, even ignoring character set issues. (main returns int, argv[1] can be null, and strncpy doesn't always null-terminate the target). If you're going to compare languages, you should find someone who knows each of them well.

Daishiman10y ago

The Unicode situation in most languages is dismal.

Honestly though, the lack of generics for that Math.min function makes me happy I'm not programming in Go.

insertnickname10y ago

    if a > b {
        // use a
    } else {
        // use b
    }

ridiculous_fish10y ago

Oh dear. You had one job, min!

Veedrac10y ago

That's actually the wrong way around.
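
For reference, a minimum has to select the smaller operand. A tiny sketch of the intended orientation, applied to the truncation bound; the function name `smaller` is made up, and Python's built-in `min` does the same job:

```python
def smaller(a, b):
    # Return b only when it is strictly smaller; otherwise keep a.
    return b if b < a else a


arg = "hello"
end = smaller(len(arg), 12)  # never slice past the end of the input
print(arg[:end])
```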

BossHogg10y ago

Article content aside, the slide out side menu that covers the scroll bar is incredibly annoying. Is that Blogger? Whatever it is needs to stop. Now.

Sir_Cmpwn10y ago

The C code there fails if the unicode string includes characters whose width is greater than one octet.

zokier10y ago

Which is noted right in the post:

> This treats things as byte-array instead of unicode, thus for unicode test it will end up printing just 車賈滑豈.

rakoo10y ago

Which is useless then, because the output can't safely be considered a string anymore. I don't really see the point of writing the C "equivalent" and giving it any point when it doesn't even do the right thing.
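
The point about byte-level cuts can be illustrated in Python by truncating the encoded form. A sketch using an assumed sample string (not the one from the post) in which every code point takes 3 bytes in UTF-8; `truncate_bytes` is a hypothetical helper:

```python
TEXT = "日本語テキスト"  # assumed sample: 7 code points, 3 UTF-8 bytes each
raw = TEXT.encode("utf-8")


def truncate_bytes(data: bytes, limit: int) -> str:
    # Cut at a byte offset, then drop any incomplete trailing sequence.
    return data[:limit].decode("utf-8", errors="ignore")


print(len(raw))
print(truncate_bytes(raw, 12))  # the cut happens to land on a character boundary
print(truncate_bytes(raw, 10))  # the cut lands mid-character; the partial bytes are dropped

try:
    raw[:10].decode("utf-8")  # a strict decode rejects the mid-character cut
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

A 12-byte cut only yields valid text when it lands on a character boundary, as it does for inputs made entirely of 3-byte characters.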

darkstalker10y ago

Rust version:

    fn main()
    {
        if let Some(arg) = std::env::args().nth(1)
        {
            println!("{}", arg.chars().take(12).collect::<String>()); // chars() iterates over codepoints
        }
    }

Veedrac10y ago

Idiomatic Rust would probably avoid allocations, which means something more like

    fn main() {
        if let Some(arg) = std::env::args().nth(1) {
            println!("{}", {
                match arg.char_indices().nth(12) {
                    Some((idx, _)) => &arg[..idx],
                    None => &*arg
                }
            });
        }
    }

With the `unicode-segmentation` crate[1], you can just swap `char_indices()` with `grapheme_indices(true)`.

[1] https://crates.io/crates/unicode-segmentation

Skunkleton10y ago

How is this on the front page of hacker news? What a shit post.

edofic10y ago

A mandatory smart-ass Haskell response

    import System.Environment (getArgs)
    main = do
      [str] <- getArgs
      putStrLn $ take 12 str

nicolast10y ago

Now with more operators!

    import System.Environment (getArgs)
    main = putStrLn =<< take 12 . head <$> getArgs

;-)

joeyh10y ago

The actual smart-ass Haskell response is simply "take 12". The spec didn't specify this needed to be an impure shell command, so a pure function is obviously better.

coldtea10y ago

Well, for a smart-ass response (and I know you meant it as a joke) it is not very impressive. It doesn't do anything more than the others, and the syntax is not so great either.

Veedrac10y ago

On the contrary, his is the only one that crashes when more arguments than expected are passed. Hooray progress!

_pmf_10y ago

Of course, the C version could be just

    printf("(%.12s)\n", argv[1]);

pjmlp10y ago

Assuming 7-bit ASCII

_kst_10y ago

No, it merely assumes one byte per character. For example, it would work correctly in Latin-1 or EBCDIC.

In any case, the problem statement (though it's a bit vague) requires building a truncated string, not just printing it.

jackielii10y ago

why can't I downvote this!!! erhhhh

IshKebab10y ago

Now try distributing your Python code as a single statically linked exe.

PyComfy10y ago

http://nuitka.net/pages/overview.html

chapium10y ago

Completely off topic, so if you are looking for discussion about the article skip this.

The low contrast ratio and bright colors on this blog are a bit hard to read. I normally switch to readability mode in Safari when I encounter this, but the site's layout prevents this from working.

jofer10y ago

The text is black on white... Am I missing something?

BinaryIdiot10y ago

Hmm, are you referring to something very specific? The contrast ratio is incredibly high (black text on white background). The navigation bar has terrible contrast but that's all I saw.
