Importing and Exporting Content

Importing and exporting structured content for bulk editing is one of the common requirements of every project. Check out these tips to make this easier.

The typical scenario: the data opens in Excel, the editor changes and saves it, uploads the Excel file back into CQ5, and everything is there. The usual approach to implement this use case is to render a CSV file, which can be easily created and parsed, so everything should be fine, right?

Excel and CSV 

Your dreaded enemy in this case is called character encoding. CSV files do not provide any means of declaring the character encoding used, so the application has to guess the proper encoding. If you are using UTF-8 (and you should), you can at least add an indicator, the so-called byte order mark (BOM), which the application can use to identify the encoding.

Enter Excel. On Windows, a UTF-8 file with BOM is displayed correctly, but not on a Mac, and apparently the only way to get it right on both systems is to use UTF-16LE with BOM. Which is awful. But even if you get the encoding right, and the field and line separators too, the fun really starts when the editor saves the file. Excel does not save the file as is, maintaining the encoding (or the BOM): it saves the file in the locale of the current computer. So in a global company, you can be pretty sure that whatever the editors upload ends up as gibberish.

So, how to cope with the situation? We have found a not-so-pretty but reliably working approach covering the entire use case. Here's how it goes:
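For illustration, here is a minimal Java sketch (the class and method names are ours, not part of CQ5) of what the UTF-16LE-with-BOM workaround would look like when writing out a file. The BOM for UTF-16LE is the byte sequence 0xFF 0xFE, which has to come before the encoded content:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class BomWriter {

    // UTF-16LE byte order mark: 0xFF 0xFE
    private static final byte[] UTF16LE_BOM = {(byte) 0xFF, (byte) 0xFE};

    /** Prepends the BOM so applications like Excel can identify the encoding. */
    public static byte[] toUtf16LeWithBom(String content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(UTF16LE_BOM);
        out.write(content.getBytes(StandardCharsets.UTF_16LE));
        return out.toByteArray();
    }
}
```

Even with this in place, the round trip still breaks as soon as Excel saves the file in the machine's locale, which is why we abandoned the CSV route altogether.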

Exporting 

To export the file, go with HTML. A minimal header suffices, and the data can go into the table as is. You may even use th tags for the header row, so you get something like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
</head>
<body>
<table>
<tr>
<th>Field 1</th>
<th>Field 2</th>
<th>Field 3</th>
</tr>
<tr>
<td>Value 1</td>
<td>Value 2</td>
<td>Value 3</td>
</tr>
...
</table>
</body>
</html>

This page is then delivered as an .xls file, i.e. the Content-Type and Content-Disposition headers are set accordingly:

response.setHeader("Content-Type", "application/vnd.ms-excel;charset=utf-8");
response.setHeader("Content-Disposition", "attachment;filename=\"" + currentPage.getTitle() + ".xls\"");

This file opens nicely in Excel on all machines we tested (and the headers are even bold and centered, look at that!).
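As a sketch of how the export rendering can be put together (the class and method names are ours, not from the original code), the table can be built with a small helper; the HTML escaping matters because cell values may contain characters like &lt; or &amp;:

```java
import java.util.List;

public class ExcelHtmlExport {

    /** Minimal HTML escaping so cell values cannot break the table markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders the header row and data rows as an HTML document like the one above. */
    static String render(List<String> header, List<List<String>> rows) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head>")
            .append("<meta http-equiv=\"content-type\" content=\"text/html; charset=UTF-8\"/>")
            .append("</head><body><table>\n<tr>");
        for (String h : header) {
            html.append("<th>").append(escape(h)).append("</th>");
        }
        html.append("</tr>\n");
        for (List<String> row : rows) {
            html.append("<tr>");
            for (String cell : row) {
                html.append("<td>").append(escape(cell)).append("</td>");
            }
            html.append("</tr>\n");
        }
        html.append("</table></body></html>");
        return html.toString();
    }
}
```

In a servlet, this string is simply written to the response after setting the two headers shown above.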

Import 

Still, how do you get that information back into CQ5 after the editor has manipulated the data? Saving and uploading the file is tricky, as Excel will create very different output depending on the locale of the editor's machine. The easiest way to enable an import is therefore an input field.

If you provide an input field, such as a textarea in a dialog, and the editor copies the cells over from Excel, they arrive as tab-separated values (TSV). You can easily parse this with a CSV parser, for example the one by Stephen Ostermiller. The character encoding, on the other hand, is taken care of by the browser: since the content is copied from Excel into the browser on the same machine, the clipboard does not mess up the encoding, and the browser submits the form content in the page's encoding, so you receive everything in UTF-8.
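For the simple case, where no cell contains embedded tabs or line breaks, the pasted content can even be split without a library; anything fancier should go through a proper CSV/TSV parser such as the one mentioned above. A minimal sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TsvImport {

    /**
     * Splits the pasted textarea content into rows and cells.
     * Assumes no embedded tabs or newlines inside a cell.
     */
    static List<List<String>> parse(String pasted) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : pasted.split("\r?\n")) {
            if (line.isEmpty()) {
                continue; // skip blank lines, e.g. from a trailing newline
            }
            // limit -1 keeps trailing empty cells, so row lengths stay consistent
            rows.add(Arrays.asList(line.split("\t", -1)));
        }
        return rows;
    }
}
```

Each row can then be mapped back onto the corresponding content nodes.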

Et voilà, you have covered the entire cycle of export, editing, and import in an editor-friendly way that is still pretty stable against the quirks of character encoding on various platforms.

Do you have a simpler solution, or do you see any issues with this approach?

 

Stefan Franck