JavaScript native2ascii - Convert Unicode strings to ASCII-compatible \uxxxx notation

Motivation: convert Unicode text so that it contains only ASCII characters. Why? Diacritics (accents) can cause many problems. Some closed-source browsers process JavaScript files as ISO-8859-1 regardless of the Content-Type header sent by the server, so non-ASCII characters show up as beautiful question marks...

This odd behaviour can be circumvented by writing text constants in '\u'-escaped hexadecimal notation in the source file. A perfect command-line tool for this very task is native2ascii, which is part of the Java SDK, but sometimes you need this functionality on the client side. A bit of inspiration and 15 minutes yielded the following dozen lines of JavaScript.

The code


/*jslint browser: false, white: true, undef: true, nomen: true */

function native2ascii(str) {
	var out = "";
	for (var i = 0; i < str.length; i++) {
		if (str.charCodeAt(i) < 0x80) {
			// 7-bit ASCII passes through unchanged.
			out += str.charAt(i);
		} else {
			// Any other UTF-16 code unit becomes \uXXXX, left-padded to four hex digits.
			var u = str.charCodeAt(i).toString(16);
			out += "\\u" + (u.length === 2 ? "00" + u : u.length === 3 ? "0" + u : u);
		}
	}
	return out;
}
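
For comparison, here is a minimal sketch of the same idea built on a regular expression instead of a character loop. The name escapeNonAscii is hypothetical and this is not the original code, just one way the conversion might be written. It also shows what to expect for characters outside the Basic Multilingual Plane: JavaScript strings are sequences of UTF-16 code units, so such a character comes out as two \uXXXX escapes, which is still a valid string literal.

```javascript
// Hypothetical variant of native2ascii(); the name escapeNonAscii is an
// assumption made for this example.
function escapeNonAscii(str) {
	// Without the "u" flag the pattern matches single UTF-16 code units,
	// so every code unit outside 7-bit ASCII is replaced on its own.
	return str.replace(/[^\x00-\x7f]/g, function (ch) {
		// Left-pad the hex value to four digits, as \uXXXX requires.
		var hex = ch.charCodeAt(0).toString(16);
		return "\\u" + ("0000" + hex).slice(-4);
	});
}

var plain = escapeNonAscii("abc");               // "abc"
var accented = escapeNonAscii("\u00e9t\u00e9");  // "\u00e9t\u00e9" as 13 ASCII characters
var astral = escapeNonAscii("\ud83d\ude00");     // "\ud83d\ude00", one escape per surrogate

// Parsing the escaped text as a string literal restores the original.
// This shortcut only holds for input without quotes, backslashes or
// control characters.
var roundTrip = JSON.parse('"' + accented + '"');
```

The padding trick ("0000" + hex).slice(-4) stands in for the nested conditional and avoids String.prototype.padStart, which older browsers lack.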

© 2003-2020 lithium.io7.org
Content on this site is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.