mseepgood3y ago

How would Rust help here? Isn't it famous for having too many string types?

imron3y ago

It's the 'too many string types' that helps.

With C++, if you have char*s (because you don't need to own the memory) and you pass one to a function that takes a const std::string& (because it also doesn't want to own the memory), then there will still be an implicit conversion to a temporary std::string (involving an allocation), even though neither the caller nor the callee needs to own any memory.

With Rust, if you have a &str (because you don't need to own the memory) and you pass it to any function that takes a String (or even the unidiomatic &String), then you will get a compile error. There won't be any implicit conversion of types and therefore no implicit allocation. If you really want to pass it, you need to explicitly convert it, making the cost of the allocation explicit.
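
A minimal sketch of that difference, assuming two hypothetical functions (`takes_owned` and `takes_borrowed` are made-up names for illustration): the commented-out call is the one the compiler rejects, and the allocation only happens where `to_owned()` is written out.

```rust
// Hypothetical function that wants to own its argument.
fn takes_owned(s: String) -> usize {
    s.len()
}

// Hypothetical function that only needs to read the text.
fn takes_borrowed(s: &str) -> usize {
    s.len()
}

// Illustrative caller that starts out with a borrowed &str.
fn call_both(borrowed: &str) -> (usize, usize) {
    // takes_owned(borrowed);
    // ^ error[E0308]: mismatched types (expected `String`, found `&str`)

    // The conversion has to be spelled out, so the allocation is visible.
    let owned_len = takes_owned(borrowed.to_owned());

    // Passing the borrow along involves no allocation.
    let borrowed_len = takes_borrowed(borrowed);

    (owned_len, borrowed_len)
}
```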

Rust's "too many strings" model says "there are many different ways in which you can use string-like objects, each with their own performance tradeoffs. Know which one you want to use in your code or I won't compile".

mwcampbell3y ago

This discussion is making me wonder if windows-rs [1], the crate with official Rust bindings for all Windows APIs, is doing something that's not idiomatic Rust. Specifically, for any Windows API function that takes a UTF-16 string as a parameter, the signature for that parameter is something like "impl IntoParam<PCWSTR>". The crate then implements that trait for String and &str, so you can pass a normal Rust UTF-8 string (even a string literal), and it'll be automatically converted to a freshly-allocated, null-terminated UTF-16 string (which gets freed after the function call). That seems like it could lead to the same thoughtless inefficiency as in the story about the Chrome omnibox.

[1]: https://github.com/microsoft/windows-rs

infogulch3y ago

Well, that will be necessary until Windows gets UTF-8 APIs. Probably not soon. Until then there are various optimizations you can do, like caching the UTF-16 conversion alongside the UTF-8 string (good for calling OS APIs frequently with long-lived strings), allocating temporary UTF-16 conversions on the stack (good for infrequent calls with strings up to a certain size), or storing raw UTF-16 strings as opaque bytes in Rust memory (good for providing strings back to the OS that you got from the OS).
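
The first of those options (caching the conversion) could look roughly like the sketch below. `CachedWide` and its methods are hypothetical names, not anything provided by windows-rs, and the sketch assumes the text contains no interior NUL characters.

```rust
/// Hypothetical wrapper that keeps a NUL-terminated UTF-16 copy
/// next to the original UTF-8 text.
pub struct CachedWide {
    utf8: String,
    wide: Vec<u16>,
}

impl CachedWide {
    /// Converts once, up front; later calls reuse the same buffer.
    pub fn new(text: &str) -> Self {
        let wide: Vec<u16> = text
            .encode_utf16()
            .chain(std::iter::once(0)) // trailing NUL for C-style wide APIs
            .collect();
        CachedWide { utf8: text.to_owned(), wide }
    }

    /// The original UTF-8 text.
    pub fn as_str(&self) -> &str {
        &self.utf8
    }

    /// The cached UTF-16 code units, including the trailing NUL.
    pub fn as_wide_with_nul(&self) -> &[u16] {
        &self.wide
    }
}
```

The trade-off is memory: every cached string is stored twice, which is only worth it when the same string crosses the OS boundary repeatedly.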

You should try to avoid calling OS APIs in general and cache the results as much as possible. Who knows what the performance characteristics are of an API that has to serve 7 layers of historical OSes simultaneously. Unless you're directly interfacing with the kernel you shouldn't expect much. Omnibar-like layered calls between your app and the OS are a worst-case scenario regardless of conversions.

bluejekyll3y ago

That might hide it from the caller, but the function that receives that IntoParam type will still need to explicitly call the conversion function.
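
As a rough illustration of that point, here is a sketch with a made-up trait. `IntoWide` and `set_title` are hypothetical stand-ins and are not the windows-rs `IntoParam` API; the only claim is that the conversion call still appears somewhere, just inside the callee.

```rust
/// Hypothetical stand-in for a conversion trait (not the windows-rs API).
trait IntoWide {
    fn into_wide(self) -> Vec<u16>;
}

impl IntoWide for &str {
    fn into_wide(self) -> Vec<u16> {
        // Allocates a fresh NUL-terminated UTF-16 buffer.
        self.encode_utf16().chain(std::iter::once(0)).collect()
    }
}

/// Hypothetical wrapper: the caller can pass a plain &str, but the
/// conversion is still an explicit call inside the function body.
/// Returns the number of UTF-16 code units, including the NUL.
fn set_title(title: impl IntoWide) -> usize {
    let wide = title.into_wide(); // the allocation happens here
    wide.len()
}
```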

flohofwoe3y ago

It would most likely suffer from similar problems when interacting with the C and C++ APIs in the rest of Chrome though (e.g. what to do if you have a Rust String, but the other side wants a const ref to a C++ std::string).

imron3y ago

Use a CxxString: https://cxx.rs/binding/cxxstring.html

At some point there will need to be an allocation when crossing the Rust -> C++ boundary because Rust strings are not null-terminated.

The difference Rust makes is that unlike C++, it is always explicit when the allocation occurs.
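
A small sketch of what that explicit allocation looks like. It uses the standard library's `std::ffi::CString` rather than the cxx crate's `CxxString`, purely to keep the example dependency-free; `to_c_string` is a hypothetical helper name.

```rust
use std::ffi::CString;

/// Builds the NUL-terminated copy that a C-style `const char*`
/// parameter would need.
fn to_c_string(text: &str) -> Option<CString> {
    // CString::new allocates a new buffer and appends the terminator.
    // It fails if `text` already contains an interior NUL byte.
    CString::new(text).ok()
}
```

Nothing is converted unless this function (or `CString::new` itself) is called, which is the sense in which the cost shows up in the source.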

throwaway8943453y ago

My minor, unpolished grievance with Rust's approach is that you have to do this for all kinds of types (e.g., Path vs PathBuf). It's tedious to have to write these pairs all the time, along with all of the trait implementations and so on. It almost feels like it would be nice if the type system could allow us to write `String` or `PathBuf` and automatically generate the corresponding `str` or `Path` types.

josefx3y ago

> With C++, if you have char*'s (because you don't need to own the memory)

If you are using C strings in C++ you are either doing something incredibly low level or don't care about performance at all. C strings require strlen calls or something equivalent for basic operations and you can easily run into code with exploding runtime if you aren't extremely careful.

saagarjha3y ago

> If you are using C strings in C++ you are either doing something incredibly low level or don't care about performance at all.

…or interoperating with C code?

hoseja3y ago

psst... std::string_view

imron3y ago

A step in the right direction if you have a compiler with C++17 support.

Note: Chrome only supported C++17 features in Dec 2021 [0], and whether std::string_view is allowed to be used is still 'to be determined'.

0: https://chromium.googlesource.com/chromium/src/+/HEAD/styleg...

seritools3y ago

Ah yes, the one that still just wraps a raw pointer in the end

https://github.com/isocpp/CppCoreGuidelines/issues/1038

andreidd3y ago

And then someone will convert a std::string_view to a const char* and things will explode...

tialaramex3y ago

A mixture of culture and technology.

Technologically, Rust's only built-in string type, &str, is a reference to a string slice - that is, you can't change it (the reference isn't mutable) and it is both a pointer to the start of some UTF-8 and the length of the UTF-8.

What encoding? Always UTF-8. Only UTF-8. Not "Well, it's kinda UTF-8 but..." it's just always UTF-8. This moves the burden to a single place, your text decoding code, to do things correctly, and great news - the entire world is moving to UTF-8, so you're on a downhill gradient where every week this works better without you lifting a finger.

That reference knowing the length is brilliant. Trimming whitespace off a string? You can just make another immutable reference to the smaller trimmed string. Zero copies. Slicing a URL up into components? You can do that too, zero copies. And yet it's all memory safe.
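
To make the zero-copy claim concrete, here is a deliberately naive sketch. `host_of` is a made-up helper, not a real URL parser, and `is_subslice` only exists to check that the result points into the original buffer.

```rust
/// Returns the host part of a URL-like string as a sub-slice of the input.
/// Deliberately naive: a sketch, not a real URL parser.
fn host_of(url: &str) -> Option<&str> {
    let url = url.trim(); // a narrower view, no copy
    let (_scheme, rest) = url.split_once("://")?;
    let end = rest.find('/').unwrap_or(rest.len());
    Some(&rest[..end])
}

/// True if `inner` lies entirely within the memory of `outer`.
fn is_subslice(outer: &str, inner: &str) -> bool {
    let outer_start = outer.as_ptr() as usize;
    let inner_start = inner.as_ptr() as usize;
    inner_start >= outer_start
        && inner_start + inner.len() <= outer_start + outer.len()
}
```

The returned `&str` carries the same lifetime as the input, so the compiler refuses any use of the host slice after the original string has gone away.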

Now, Chromium is not some raw firmware for a $1 micro-controller, so it has library types like Rust's alloc::string::String (you can just name it "String" in normal Rust code, but that is its full name). As its presence in alloc suggests, it is an allocating string type: you can concatenate them, you can make them by formatting a bunch of other variables, the default ones are empty, the data goes on your heap and so on. But String implements Deref<Target = str> (as well as AsRef<str>), which means that if what you've got is a String, and what you're doing is calling a function that wants &str, you can pass a reference to your String and it costs nothing at runtime. Why? Because that &str is just two of the elements of the String type you had, the pointer into the heap and the length. It's easy.
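
A short sketch of borrowing a String as &str (`count_words` and `borrow_owned_string` are hypothetical names): the borrowed view points at the same heap buffer, so nothing is copied.

```rust
// Hypothetical function that only needs to read the text.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

// Returns the word count and whether the &str view shares the String's buffer.
fn borrow_owned_string() -> (usize, bool) {
    // An owned, heap-allocated String built by formatting.
    let owned: String = format!("{} {}", "hello", "world");

    // `&owned` is a &String; the compiler turns it into &str at the call site.
    let words = count_words(&owned);

    // The &str view is the String's own pointer and length.
    let view: &str = &owned;
    let same_buffer = view.as_ptr() == owned.as_ptr() && view.len() == owned.len();

    (words, same_buffer)
}
```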

Rust has lots of other types for stuff like Foreign Interfaces, like CStr and CString (for the C-style NUL-terminated array of bytes which might be text) but your pure Rust code shouldn't care about those, often it can say (unsafely) "Look, the C++ promises this is UTF-8, we'll take their word for it" or "I only need it to have bytes in it, let's make [u8] and we're done".
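
Both options for the foreign-text case can be sketched with the standard library. `checked_view` and `trusted_view` are hypothetical helper names; the unsafe variant is only sound if the other side's UTF-8 promise actually holds.

```rust
use std::ffi::CStr;

/// Views NUL-terminated bytes as &str, checking that they are valid UTF-8.
fn checked_view(bytes_with_nul: &[u8]) -> Option<&str> {
    let c_str = CStr::from_bytes_with_nul(bytes_with_nul).ok()?;
    c_str.to_str().ok()
}

/// Same view, but taking the other side's word that the bytes are UTF-8.
///
/// # Safety
/// The caller must guarantee that `c_str` holds valid UTF-8.
unsafe fn trusted_view(c_str: &CStr) -> &str {
    // Skips validation entirely; undefined behaviour if the promise is false.
    unsafe { std::str::from_utf8_unchecked(c_str.to_bytes()) }
}
```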

Culturally, Rust programmers write &str when that would do. There's a strong cultural pressure not to write String when you really mean &str, and the compiler won't let you write &str if you needed String. So this results in less thunking of the sort complained about in C++.

cowmoo7283y ago

In C++ when I see `DoX(y)` I have to worry every time about temporary lifetimes, copy vs move operator, and a bunch of other things that are easy to miss during code review. It is so easy to accidentally copy large strings around many times in a performance critical loop.

Rust makes all of that easier to see during code review. It is very explicit about these things.

I'm a Google employee working on Chromium and ChromeOS and have been asking internally about Rust support for over a year now, so it's exciting that it's making progress.

_nalply3y ago

It's a complicated but well-thought-out system which tends to avoid copies by making them explicit in the source code and by preferring references or slices, which are cheap to take.

The string slice, for example, is a Unicode-capable view into the bytes of a string (immutable static bytes compiled into the binary, bytes of fixed length on the stack, or heap-allocated bytes). The aliasing rules are enforced by the compiler, so it is safe to throw around pointers and sizes and not to worry about buffer overflows, as long as it compiles.

vlovich1233y ago

Everyone’s trying to justify the response when the honest truth is that no, Rust doesn’t solve the problem of abstraction layer impedance mismatches causing ownership to be dropped only to be reacquired at the next level. On a sufficiently large/complicated code base, the problem will arise.

As others have mentioned, various kinds of string types are baked into the language, which makes it ergonomic to do “the right thing” from the get-go, but it's hard to say whether that is enough. I would be skeptical of claims that it would make a difference, especially in the interim where you now have an added impedance mismatch between C++, Rust, and C.

tialaramex3y ago

> Rust doesn’t solve the problem of abstraction layer impedance mismatches causing ownership to be dropped only to be reacquired at the next level

The expression of what C++ does here in Rust is awkward and, I think, nobody has proposed it because you'd never write that. Basically C++ char* is a raw pointer. Rust does have those, but you'd never use them in this context.

What you would use is either the borrowed slice reference &str or the owning String type, but in both cases we have an owned object and there's our crucial difference. If you've got the owned String, and I needed an owned String, I should ask for your owned String, and we're done.

In C++ "dropping" ownership as you describe is no big deal, the C++ design doesn't care, but in Rust if you actually drop(foo) it's gone. The references to it can't out-live that, if it's gone then they're gone. If you write code that gives away references and then tries to drop the thing they're references to, Rust will object that this is nonsense, because it is nonsense, you need to ensure those references are gone before dropping the thing they refer to.
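
A minimal sketch of that rule, with hypothetical function names: the drop is accepted only because the borrowed slice is not used afterwards.

```rust
// Returns a slice borrowed from the input (empty if there is no word).
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn borrow_then_drop() -> usize {
    let owned = String::from("omnibox query");
    let word = first_word(&owned); // `word` borrows from `owned`
    let len = word.len();          // last use of the borrow

    drop(owned); // fine: no reference is used after this point

    // println!("{word}");
    // ^ uncommenting this makes the drop above fail with error[E0505]:
    //   cannot move out of `owned` because it is borrowed
    len
}
```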

As a result I feel you're greatly under-estimating the ergonomic difference.

vlovich1233y ago

> In C++ "dropping" ownership as you describe is no big deal, the C++ design doesn't care, but in Rust if you actually drop(foo) it's gone

I think you’ve built a straw man of my argument and then argued with that.

Clearly I meant that it seems possible that a sufficiently complicated call stack could still be set up to jump between needing the owned String type and the borrowed &str type. That’s what I meant by dropping ownership, as that’s what’s happening in the C++ code when you go between char*/string (the API is dropping its need for ownership). The argument of “If you've got the owned String, and I needed an owned String, I should ask for your owned String, and we're done” is weak because that same argument would apply to C++ code, and yet the code still ended up that way when you pasted together components in a very large code base. Now maybe it’s a bit simpler because you have string, string&, const string&, and const char* and doing that antipattern that happened in C++ just wouldn’t be ergonomic in Rust. Maybe. But that feels like a very thin argument and not “this is impossible in Rust”.
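
For what it's worth, the pattern being described can be written in Rust. The sketch below uses made-up layer names and is only meant to show what it would look like: each re-acquisition of ownership is a visible `to_owned()` call, which is the part the two sides of this thread weigh differently.

```rust
// Hypothetical layers that disagree about whether they need ownership.

// Innermost layer insists on an owned String.
fn layer_three(text: String) -> usize {
    text.len()
}

// Only has a borrow, so it must copy to satisfy layer_three.
fn layer_two(text: &str) -> usize {
    layer_three(text.to_owned()) // explicit copy number two
}

// Owns the text but hands out only a borrow: ownership is "dropped" here.
fn layer_one(text: String) -> usize {
    layer_two(&text)
}

// Entry point starts with a borrow and copies to satisfy layer_one.
fn entry(text: &str) -> usize {
    layer_one(text.to_owned()) // explicit copy number one
}
```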
