XML2Ascii: storing XML content using ASCII characters only

The task: Serialize an arbitrary XML structure into an equivalent representation that contains only ASCII characters, so that it remains readable by any standards-compliant XML parser.

While working on some custom Tivoli Security stuff, I had to store a UTF-8 encoded XML document in an LDAP server that supported only ISO-8859-1 characters. For various reasons, converting the charset of the XML or storing it in Base64 was out of scope. Numeric character references (NCRs) came to the rescue for ordinary XML content. CDATA sections, however, required further ideas, because character references are not recognized inside them. The trivial solution could have been an identity transformation, but that does not allow CDATA sections to be detected and preserved, which was indispensable in this situation.
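To make the CDATA problem concrete, here is a minimal hand-written sketch of the two escaping rules involved. The class `NcrSketch` and its methods `toNcr` and `splitCdata` are hypothetical names made up for this illustration and are not part of the tool. The sketch assumes that the text passed to `toNcr` is already valid XML character data (that is, `<` and `&` are escaped), and that the CDATA content passed to `splitCdata` does not contain `]]>`.

```java
public class NcrSketch {

    /** Replaces every non-ASCII code point with a decimal numeric character reference. */
    public static String toNcr(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.append((char) cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    /**
     * Serializes the content of one CDATA section using ASCII only.
     * NCRs are not recognized inside CDATA, so the section is closed before
     * each non-ASCII character and reopened for the next ASCII run.
     */
    public static String splitCdata(String content) {
        StringBuilder sb = new StringBuilder();
        boolean open = false;
        for (int i = 0; i < content.length(); ) {
            int cp = content.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 128) {
                if (!open) {
                    sb.append("<![CDATA[");
                    open = true;
                }
                sb.appendCodePoint(cp);
            } else {
                if (open) {
                    sb.append("]]>");
                    open = false;
                }
                sb.append("&#").append(cp).append(';');
            }
        }
        if (open) {
            sb.append("]]>");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: &#225;rv&#237;z
        System.out.println(toNcr("\u00e1rv\u00edz"));
        // prints: <![CDATA[a < ]]>&#233;<![CDATA[ & b]]>
        System.out.println(splitCdata("a < \u00e9 & b"));
    }
}
```

A parser that reads the split form back sees the same character data as before; only the boundaries of the CDATA sections have moved.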

The following few lines solve the task with minimal modifications to the DOM. The beauty of the solution is in its simplicity.


package hu.lithium.xml;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.PrintStream;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XML2Ascii {
	
	public static final String help = "usage: java -jar xml2ascii.jar <input.xml> [<output.xml>]\nReplace all non-ASCII characters with NCRs while doing minimal modification to the document (e.g. splitting CDATA sections)";

	public static void main(String[] args) {
		if (args.length < 1) {
			System.err.println(help);
			System.exit(1);
		}
		InputStream input = System.in;
		PrintStream output = System.out;
		try {
			input = new FileInputStream(args[0]);
		} catch (FileNotFoundException e1) {
			System.err.println("[error] unable to open input file '"+args[0]+"', using stdin instead.");
		}
		if (args.length > 1) {
			try {
				output = new PrintStream(args[1]);
			} catch (FileNotFoundException e) {
				System.err.println("[error] unable to open output file '"+args[1]+"', using stdout instead.");
			}
		}
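		try {
			// No stylesheet is supplied, so the Transformer only re-serializes the parsed input.
			Transformer t = TransformerFactory.newInstance().newTransformer();
			// With an ASCII output encoding, the serializer has to write non-ASCII characters as NCRs.
			t.setOutputProperty(OutputKeys.ENCODING, "us-ascii");
			// Suppress the generated declaration and print our own; pure ASCII output is also valid UTF-8.
			t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
			output.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
			t.transform(new StreamSource(input), new StreamResult(output));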
		} catch (TransformerException e) {
			System.err.println("[fatal] unable to transform XML");
			e.printStackTrace();
		}
	}
}
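
One way to gain confidence in the result is a quick round-trip check: run a small document through the same transformer settings, confirm that every output byte is ASCII, and confirm that a parser reads the same text back. The sketch below is only an illustration. The class `RoundTripCheck` and its helper methods are hypothetical names, and the exact shape of the serialized output (in particular how CDATA sections are split) depends on the XSLT implementation bundled with your JDK, so the program prints it rather than assuming it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;

public class RoundTripCheck {

    /** Applies the same transformer settings as XML2Ascii to an in-memory document. */
    public static byte[] toAscii(byte[] xml) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "us-ascii");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new ByteArrayInputStream(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    /** Returns true if no byte has its high bit set. */
    public static boolean isAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    /** Parses the document and returns the text content of its root element. */
    public static String textOf(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Sample input: non-ASCII characters both in plain text and inside a CDATA section.
        byte[] source = "<r>\u00e1rv\u00edz<![CDATA[t\u0171r\u0151 <b>]]></r>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] result = toAscii(source);

        System.out.println(new String(result, StandardCharsets.US_ASCII));
        System.out.println("ASCII only: " + isAscii(result));
        System.out.println("same text:  " + textOf(source).equals(textOf(result)));
    }
}
```

Neither the sample input nor the output carries an XML declaration here, so the parser falls back to UTF-8 in both cases, which is safe for pure ASCII bytes.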
© 2003-2020 lithium.io7.org
Content on this site is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.