
> Similarly, "ss" and "sz" (very rare) should match "ß" and vice versa, so "Strasse" is the same as "Strasze" and "Straße".

This is unworkable, since "Busse" (several buses) should not match "Buße" (repentance). But by your argument, "Busse" (an incorrect spelling of "Buße" due to character set limitations, or the Swiss German spelling, I believe) should match "Buße".

Anyway, modern Firefox lets you choose whether to match "Apfel" against "Äpfel".



You are right that my argument is somewhat incomplete. One could be stricter and not match "Buße" when searching for "Busse", because that would be a somewhat uncommon near-misspelling in German German.

But: "Busse" is not always an incorrect spelling, it's just an "emergency" one, for when you really can't use "ß" for some reason. One example is all-caps text, where "BUSSE" (buses) and "BUSSE" (repentance) are not distinguishable, although they have different meanings. Also, there is Swiss German, where replacing ß with ss is the normal form and not at all "accepted in an emergency" like in German German: https://www.galaxus.de/de/page/schweizerhochdeutsch-fuer-anf...

Incidentally, this is one of the instances where you really have to know the language and the context to be able to do a lowercase -> uppercase -> lowercase round trip; otherwise you might screw up with "Buße -> BUSSE -> Busse", changing the meaning. Or not, if the current locale is de_CH instead of de_DE.
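To make the round trip concrete, here is a minimal sketch in Python. It assumes Python 3, whose str.upper() and str.lower() use Unicode's default case mappings and ignore the locale, so it cannot tell de_DE from de_CH; the variable names are just for illustration.

```python
# Round trip through uppercase using Unicode's default (locale-independent) mappings.
word = "Buße"                      # "repentance"

upper = word.upper()               # "ß" has no single-character default uppercase, so it expands to "SS"
round_trip = upper.capitalize()    # first letter upper, rest lower

print(upper)        # BUSSE
print(round_trip)   # Busse  (now reads as "buses")

# The information needed to restore "ß" is gone after uppercasing.
lost = round_trip != word
print(lost)         # True
```

Nothing in the uppercased string says whether "SS" came from "ß" or from "ss", which is why the reverse step needs a dictionary or context rather than a character mapping.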

Taken together, and since search is very often case-insensitive, I think that while you may be correct, being so strict as to not match here would not be what the user expects.
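As a rough illustration of that trade-off, a case-insensitive matcher built on Unicode case folding treats "ß" and "ss" as equal whether you want it to or not. This is only a sketch: loose_key and loose_match are hypothetical helper names, not anything a browser actually uses, and the "sz" variant is deliberately not handled.

```python
def loose_key(text: str) -> str:
    """Return a comparison key using Unicode full case folding.

    str.casefold() maps "ß" to "ss", so the key cannot distinguish them.
    """
    return text.casefold()


def loose_match(needle: str, haystack: str) -> bool:
    """Case-insensitive substring test on the folded forms."""
    return loose_key(needle) in loose_key(haystack)


# The match users probably expect:
print(loose_match("strasse", "Hauptstraße 5"))   # True
print(loose_match("STRASSE", "Hauptstraße 5"))   # True

# The collision discussed above: "buses" and "repentance" get the same key.
print(loose_key("Busse") == loose_key("Buße"))   # True
```

A stricter search would have to compare the unfolded text as well, at the cost of missing all-caps and Swiss spellings.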

Yes, again, I18N is hard, and sometimes it is impossible for a machine to do correctly.


Text search implementations do not have the luxury of assuming that the text they are used against is correctly spelled. Indeed, one common use case for find and replace is searching for misspellings.


Well, I think the Germans now have an uppercase "ß" letter (officially adopted in 2017), although there didn't use to be one.


Yes, I know. Just more headaches, because you have one more equivalent variant to worry about (use of ẞ is correct but very rare), and fonts that do not support the new character ẞ yet, or never will (so it will not be standard capitalisation for quite some time, if ever).
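For what it's worth, here is a small sketch of how the extra variant behaves under Unicode's default mappings, again assuming Python 3; the list of spellings is made up for illustration.

```python
capital_eszett = "\u1E9E"   # ẞ, LATIN CAPITAL LETTER SHARP S
small_eszett = "\u00DF"     # ß, LATIN SMALL LETTER SHARP S

# Lowercasing the capital form gives ß, but uppercasing ß still gives "SS",
# so the default mappings never produce ẞ on their own.
print(capital_eszett.lower() == small_eszett)   # True
print(small_eszett.upper())                     # SS

# All four spellings fold to the same key, so a folding search
# has to treat them as equivalent.
variants = ["Straße", "STRASSE", "STRA\u1E9EE", "Strasse"]
keys = {v.casefold() for v in variants}
print(keys)                                     # {'strasse'}
```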



