Filenames with legacy encodings converted to UTF-8

10.02.2016 - 23:24 - 11.02.2016 - 00:24

As previously noted, non-UTF-8 filenames on the department fileserver have caused difficulties, most notably broken directory listings in user and group home directories containing such files.

All filenames in those directories that were not valid UTF-8 have been converted to UTF-8. Filenames containing old DOS codepage 850 Scandinavian characters in addition to ASCII were autodetected as such and converted accordingly. All other non-UTF-8 names were assumed to be ISO-8859-15 (Latin-9), which was the department default before UTF-8. If both the valid and the invalid form of a filename were present in the same directory, the invalid name was nevertheless converted. To avoid loss of data in such cases, a suffix of the form ".autoconverted.XXXXXX", where XXXXXX is a random string, was added to the converted filename.
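
The conversion rules described above can be sketched roughly as follows. This is an illustrative Python sketch, not the script that was actually run. The function names, the codepage 850 heuristic (every non-ASCII byte must be the CP850 code of one of å, ä, ö, ø, æ or their capitals) and the use of six hexadecimal digits as the random suffix are all assumptions made for the example.

```python
import secrets

# Assumed heuristic: the CP850 byte values of common Scandinavian letters.
CP850_SCANDINAVIAN = frozenset("åäöÅÄÖøØæÆ".encode("cp850"))


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def looks_like_cp850(name: bytes) -> bool:
    """Guess CP850 if all non-ASCII bytes are CP850 Scandinavian letters."""
    high = [b for b in name if b >= 0x80]
    return bool(high) and all(b in CP850_SCANDINAVIAN for b in high)


def convert_name(name: bytes) -> bytes:
    """Convert a legacy-encoded filename to UTF-8; leave valid names alone."""
    if is_valid_utf8(name):
        return name
    # Fall back to ISO-8859-15 (Latin-9) when the CP850 guess does not fit.
    codec = "cp850" if looks_like_cp850(name) else "iso8859-15"
    return name.decode(codec).encode("utf-8")


def target_name(name: bytes, existing: set) -> bytes:
    """Pick the new name, adding a random suffix if it would collide."""
    new = convert_name(name)
    if new != name and new in existing:
        # Hypothetical suffix format: six random hexadecimal digits.
        new += b".autoconverted." + secrets.token_hex(3).encode("ascii")
    return new
```

For example, both `convert_name(b"p\x84iv\x84.txt")` (CP850) and `convert_name(b"p\xe4iv\xe4.txt")` (Latin-9) yield the UTF-8 bytes of "päivä.txt". Working on raw bytes rather than decoded strings keeps the sketch independent of the locale of the account running it.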

While we acknowledge that the change might cause problems for some marginal use cases, the problems caused by legacy-encoded filenames were acute and more severe. We apologize for any inconvenience and will provide assistance if you have trouble due to this procedure. We are able to reconstruct the original filenames if necessary, for instance if the legacy character set used was not one of the above.

We will investigate whether users can still create invalid filenames and whether that can be minimized and/or mitigated. If such files slip through the cracks, we will likely repeat this operation.
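
If you want to check your own directories for such names, something along these lines should work. This is a minimal sketch and not a department tool; the function names are made up for the example. Passing the root as bytes makes `os.walk` return raw byte names, so nothing is decoded behind your back.

```python
import os


def name_is_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes are valid UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8_names(root: bytes):
    """Yield the full path (as bytes) of every entry whose name is not UTF-8."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not name_is_utf8(name):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Scan the current user's home directory and print any offenders.
    for path in find_non_utf8_names(os.fsencode(os.path.expanduser("~"))):
        print(repr(path))
```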

In normal usage, naming a file by simply typing the name on department Linux systems in their default configuration will pose no problems of this sort, regardless of any exotic characters you might use. (Configuring one's user account to use a non-UTF-8 encoding is unsupported.) Nevertheless, we remind you that for file interchange (e.g. via web pages), unless you know what you're doing, restricting yourself to US-ASCII remains, sadly, the surest option.

As always, contact us with any questions.

11.02.2016 - 01:01 Mikko J Rauhala
11.02.2016 - 00:56 Mikko J Rauhala