Oh wow. That is really not very much pain, as described. I have to say, I never ...

antonvs · on April 27, 2022

> Makes me wonder how much of a programming language can be hotswappable.

For a research language that can make a lot of sense, not so much for a language to be used in industry.

The downside is that different libraries will have different string representations, so you can end up being forced to do a lot of conversion if you're using different libraries that have made different choices from each other, or from your own code.

There are at least 5 commonly used string types - String (linked list of Char), ByteString lazy & strict, and Text lazy and strict. ByteString and Text have a good rationale for being different from each other - byte strings are not necessarily text - but, for various reasons, byte strings are often used to represent text anyway.
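To make the conversion overhead concrete, here is a rough sketch of the glue you end up writing between those five types. It assumes the `bytestring` and `text` packages and picks UTF-8 as the encoding; the function names are made up for illustration.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Lazy as TL

-- String <-> strict Text: both hold Unicode characters, so no encoding is involved.
stringToText :: String -> T.Text
stringToText = T.pack

textToString :: T.Text -> String
textToString = T.unpack

-- Strict Text -> strict ByteString: an encoding has to be chosen (UTF-8 is assumed here).
textToBytes :: T.Text -> BS.ByteString
textToBytes = TE.encodeUtf8

-- Bytes are not necessarily text, so decoding can fail; invalid UTF-8 gives Nothing.
bytesToText :: BS.ByteString -> Maybe T.Text
bytesToText = either (const Nothing) Just . TE.decodeUtf8'

-- Strict -> lazy variants of the same type.
textToLazy :: T.Text -> TL.Text
textToLazy = TL.fromStrict

bytesToLazy :: BS.ByteString -> BL.ByteString
bytesToLazy = BL.fromStrict
```

Only the Text/ByteString boundary can fail or change the length of the data; the other conversions just copy the same characters or bytes into a different container.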

These five also have corresponding `readFile` functions - see https://www.snoyman.com/blog/2016/12/beware-of-readfile/ . As Snoyman recommends in that post, it's probably best to "Stick with Data.ByteString.readFile for known-small data, use a streaming package (e.g, conduit) of your choice for large data, and handle the character encoding yourself. And apply this to writeFile and other file-related functions as well."
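For the known-small-data case, that advice might look roughly like the sketch below. The helper names `readUtf8File` and `writeUtf8File` are hypothetical, and UTF-8 is an assumption about the file, not something the library can detect for you.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- Read the whole file strictly as raw bytes, then decode it explicitly.
-- A decoding failure comes back as a Left instead of an exception.
readUtf8File :: FilePath -> IO (Either UnicodeException T.Text)
readUtf8File path = TE.decodeUtf8' <$> BS.readFile path

-- Encode explicitly on the way out too, rather than relying on the locale.
writeUtf8File :: FilePath -> T.Text -> IO ()
writeUtf8File path = BS.writeFile path . TE.encodeUtf8
```

The point is that the bytes-to-text step is visible in the code, so the encoding does not silently depend on the environment the program runs in.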

The first comment on that post starts out with "This problem extends well beyond readFile." Having the string handling more standardized at the language level can make life quite a bit simpler for developers.

jbboehr · on April 28, 2022

> String (linked list of Char)

That... sounds awful for performance. Is that a real thing?

antonvs · on April 30, 2022

It is a real thing, but it's a design decision that dates back to around 1990 when Haskell was very much purely a research language.

Haskell's type class capability (similar to traits or interfaces) was still new/experimental at that time, and one benefit of strings as lists is it allowed them to easily be manipulated using the same syntactic and semantic machinery - pattern matching, recursive processing etc. - as other list data.
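A small sketch of what that buys you (the function names are just for illustration): the same list patterns and the same generic list functions apply to a String with no extra machinery.

```haskell
import Data.Char (toUpper)

-- A String is just [Char], so it can be taken apart with ordinary list patterns.
capitalize :: String -> String
capitalize []       = []
capitalize (c : cs) = toUpper c : cs

-- A generic recursive list function, written once for any element type,
-- works on strings unchanged.
dropEveryOther :: [a] -> [a]
dropEveryOther (x : _ : rest) = x : dropEveryOther rest
dropEveryOther xs             = xs
```

In GHCi, `dropEveryOther "abcde"` and `dropEveryOther [1, 2, 3, 4]` go through exactly the same code path.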

These days real, non-trivial code uses much more optimized string representations, but the original String type still exists and is used in various standard library functions, like `error`. But as the original commenter pointed out, "you can just upgrade strings like any other dependency," so if you want, you can always import some other library that uses e.g. the Text type for error messages, if you care for some reason.

crdrost · on May 1, 2022

Yes and it's even worse than you think... The characters are not even raw characters, they are “boxed” into data structures and then the linked-list code guards them with a “thunk,” so “If I have computed this char return the cached char, else compute it and save it to the cache and then return it.”
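You can observe the thunks from ordinary code. In this minimal sketch (names are illustrative), parts of each string are `undefined`, which would crash if evaluated, yet everything works because those parts are never forced.

```haskell
-- Only the first cell is demanded; the undefined tail stays an unevaluated thunk.
firstOnly :: String
firstOnly = take 1 ('x' : undefined)

-- length walks the list cells but never evaluates the characters themselves,
-- so an undefined Char in the middle is harmless here.
spineLength :: Int
spineLength = length ['a', undefined, 'c']

-- An infinite String is fine as long as only a finite prefix is consumed.
prefix :: String
prefix = take 5 (cycle "ab")
```

That flexibility is what the per-character indirection pays for, and it is also why a String costs so much more memory than a packed array of bytes.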

The performance is good enough for teaching and learning and mocking out examples... But the reason for the user libraries is that real string processing workloads need something way better as their default.

kaba0 · on April 28, 2022

While it is surely bad for performance, it is not that bad in practice because Haskell has many optimizations for dealing with these lazy, possibly infinite sequences.

thaumasiotes · on April 28, 2022

Absolutely; that's how Erlang represents strings. (Technically, a linked list of integers; there's no such thing as a Char.)

It generalizes really well to Unicode; encoding is unnecessary. :p

On the other hand, I think the name of the function that converts an integer to its own string representation, integer_to_list, could have been chosen better.

resoluteteeth · on April 27, 2022

UTF-8 vs UTF-16 as the internal representation of the Unicode string type is mostly just an implementation detail.

This is very different from going from Python 2, which conflated bytes and ASCII strings, to Python 3, which intentionally changed the API to properly distinguish sequences of bytes and strings.