Clean path names for taxonomies and terms

My Danish-English websites about open source and open standards continue to raise i18n issues. To be more precise: the Danish part does!

All the included sites are configured with two taxonomies. And it turns out to be possible to bridge between the languages on the same templates - details about this here.

So far so good.

It now turns out that the Danish taxonomies are transferred directly to their respective folder/path names in the file system. That also applies to taxonomy words and terms with special Danish characters like æ, ø and å.

The generated index urls work fine, even if they contain one of the special letters. But from experience I would rather keep all special characters out of all urls right from the start.

So I started to search for a config setting that replaces special characters in path names for taxonomies and terms, in the same way that RemovePathAccents, preserveTaxonomyNames and disablePathToLower affect paths. But no luck.

Unless I missed something, that kind of function would be really useful. Not only for Scandinavians but also for Germans and many other nationalities whose alphabets have more letters than the 26 of the English/Latin alphabet.

Suggested name in config-file: ReplacePathSpecialCharacters: true/false

To start with the Danish/Norwegian special letters:

æ → ae
ø → oe
å → aa
Æ → Ae
Ø → Oe
Å → Aa

And not to forget all the diacritics. Unfortunately these are almost countless, which may complicate the project. But here are some that first come to mind:

ä → ae
ö → oe
ü → ue
Ä → Ae
Ö → Oe
Ü → Ue

It would be really nice, if the titles on the rendered taxonomy- and term-pages keep their native letters, while only the path-names (and urls) are affected by the conversion.
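
To make the idea concrete, here is a minimal sketch in Go of how such a replacement could behave. It is not existing Hugo code: the function name replacePathSpecialCharacters is hypothetical (borrowed from the suggested config name), the table only contains the letters listed above, and it assumes the input uses precomposed characters. The title is left untouched, only the path is converted.

```go
package main

import (
	"fmt"
	"strings"
)

// pathReplacer holds only the letter pairs listed in this post.
// A real implementation would need a much larger table.
var pathReplacer = strings.NewReplacer(
	"æ", "ae", "ø", "oe", "å", "aa",
	"Æ", "Ae", "Ø", "Oe", "Å", "Aa",
	"ä", "ae", "ö", "oe", "ü", "ue",
	"Ä", "Ae", "Ö", "Oe", "Ü", "Ue",
)

// replacePathSpecialCharacters is a hypothetical helper, not a Hugo function.
// It returns s with the special letters replaced by their ASCII spellings.
func replacePathSpecialCharacters(s string) string {
	return pathReplacer.Replace(s)
}

func main() {
	title := "Åbne standarder" // stays as-is on the rendered page

	// Only the path segment is converted: replace letters, then spaces, then lowercase.
	path := strings.ToLower(strings.ReplaceAll(replacePathSpecialCharacters(title), " ", "-"))

	fmt.Println(title) // Åbne standarder
	fmt.Println(path)  // aabne-standarder
}
```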

Since I was not able to find a simpler solution, my quick fix goes the other way around.

Words containing special characters are a minority in Danish after all, so I have decided to name the Danish taxonomies and terms without special characters - and then replace the index titles for them to make them 100 pct. Danish.

This hack requires a lot of substring replacements using this code. But it does the job for the time being.
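
For readers who want to see the principle of the workaround, here is a rough sketch in Go. It is not my actual template code; the function name displayTitle and the sample terms are made up for illustration. The terms are stored in ASCII-only form, and the Danish spelling is substituted back only when a title is shown.

```go
package main

import (
	"fmt"
	"strings"
)

// titleReplacer maps ASCII-only term names to their Danish spelling.
// The pairs are hypothetical sample data; every affected word needs its own entry.
var titleReplacer = strings.NewReplacer(
	"noegleord", "nøgleord",
	"aabne standarder", "åbne standarder",
)

// displayTitle is a hypothetical helper that restores the Danish letters
// in a title, leaving all other text unchanged.
func displayTitle(term string) string {
	return titleReplacer.Replace(term)
}

func main() {
	fmt.Println(displayTitle("noegleord"))        // nøgleord
	fmt.Println(displayTitle("aabne standarder")) // åbne standarder
	fmt.Println(displayTitle("open source"))      // open source
}
```

Listing whole words rather than replacing every "ae", "oe" or "aa" avoids damaging words where those letter pairs are the correct spelling.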