Tuesday, October 25, 2011

Hiatus!

Hi all!

I have been on a lengthy hiatus entitled "full-time fall semester with 2 jobs and an internship".  Please pardon me.  Fortunately, I have been engaged both in digital preservation and web design/information architecture, so I also have a lot to share.  I fully intend to complete my excursion into 23 Things for Archivists, but during the semester it's not really feasible.  Instead I am happy to share some of my digital preservation work here.  I recently delved into the XENA software from the National Archives of Australia, and have included a brief report below (with an introduction to XENA).  May you find it useful!

XENA

Besides being a leather-clad television warrior princess with a saucy blond sidekick, XENA is also a digital preservation tool created by the National Archives of Australia.  XENA (XML Electronic Normalising for Archives) is free, open source software (yay!) that can be downloaded from Sourceforge.net, bearer of all (okay, many) things open source!  XENA takes certain of your digital/media files and converts them to preservation-friendly formats (open source, well-supported, community-driven).  It then wraps them in metadata and goodies to help render them in the future.  The result is a .xena file that XENA will be able to open in the future, unlike your dusty old MS PowerPoints that you made in Windows 95 (and copyright, and copyright, and copyright, Microsoft).

Here's the rundown, from what I gathered:
PROS:
  • Converts proprietary file formats to a limited number of open formats, the ideas here being that:
      • 1. Open formats have greater longevity and there will always be means to open them
      • 2. Supporting fewer file formats is easier and cheaper for a repository.
  • Fast and simple*     (*once configured.  I'll come back to that).
  • Viewer is included with Xena
    • Even though the original file is wrapped into an XML file, the viewer will allow you to view what's in there.
  • Files can be exported via the viewer.
  • Unsupported files can still be binary normalised ("normalized", if you're not using The Queen's English :) ).
    • You can still create a .xena file, but the file will not be normalized into a preservation-friendly format.
CONS:
  • The list of formats accepted by XENA is decently long, but still limited. 
  • *Installation and configuration can be a bit tricky. 
    • That was the asterisk above.  The program is very easy to run, as long as it is configured properly.  You'll want to read the documentation about configuration carefully.
  • If you're not installing from the Windows .zip file, other downloads are required. 
    • XENA uses several plug-ins to make the conversions successful.  It needs OpenOffice on every platform, and additional plugins if you're working from Mac OS or Linux.  Again, make sure to read the documentation.
  • There are some bugs. 
POTENTIAL CONS:
  • XENA supports .gzip, .jar, .tar, .zip, .mp3, .wav, .aiff, .ogg, .flac, .sql, .csv, .tsv, .ppt, .doc, .pps, .xls, .xlsx, .pptx, .docx, .mpp, .rtf, .sylk, .sxc, .sxi, .sxw, .wpd, .pst, .trim, .bmp, .cur, .gif, .pcx, .psd, .ras, .svg, .tiff, .css, .xslt, etc., etc., etc.... but who supports .xena!?
This was my largest overarching question (and one that a PhD student in the field was also unsure how to answer) about XENA: What is the National Archives of Australia's commitment level to sustaining the longevity of this format?  This format is their baby, and they are going to maintain it.  But the .xena file is a file format in and of itself, and will eventually face preservation issues as well.  Because NAA is currently (as far as I am aware; I could certainly be wrong!) the only major supporter of the .xena format, they will need to make a nearly open-ended support commitment for this software.  I wonder what kind of information is out there on how they see this format branching out to be more widely supported.  Will the NAA support a program that runs .xena files forever?  Just something to think about.

Report on Working with Xena

   The above is a screen shot taken from plugin installation with XENA, with 2 failed conversions in the background.  For a brief report, see below.

Normalization and Encapsulation using XENA

My experience with encapsulation and the Xena software from the National Archives of Australia was generally a good one. The software converts certain files into preservation-friendly formats and bundles them with metadata into a file format called .xena. This process is called “normalization”. The program runs across several popular platforms, is open source, and is written in Java, so it runs wherever a Java Runtime Environment is available. The .xena file format is the result of Base64 encoding a file and wrapping it in XML metadata. The .xena file is plain text and human readable (though the enclosed file will not be) (Xena Help File, National Archives of Australia).
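
To make the "Base64 wrapped in XML" idea concrete, here is a rough sketch in Python. It is only an illustration of the general technique: the element names (wrapper, metadata, source_name, size_bytes, content) are made up for this example and are not the real .xena schema, which I have not reproduced here.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_in_xml(data: bytes, filename: str) -> str:
    """Wrap raw bytes in a small XML envelope.

    The layout is hypothetical and is NOT the actual .xena schema.
    """
    root = ET.Element("wrapper")

    # Human-readable metadata about the wrapped file.
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "source_name").text = filename
    ET.SubElement(meta, "size_bytes").text = str(len(data))

    # The file itself, Base64-encoded so it survives as plain text.
    content = ET.SubElement(root, "content", encoding="base64")
    content.text = base64.b64encode(data).decode("ascii")

    return ET.tostring(root, encoding="unicode")


# Stand-in bytes; a real run would read them from a file opened in "rb" mode.
wrapped = wrap_in_xml(b"\xd0\xcf\x11\xe0 pretend this is a .doc", "report.doc")
print(wrapped)
```

The printed result is plain text you can open in any editor: the metadata is readable, while the content element is a long run of Base64 characters that means nothing to the eye until it is decoded.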

I could certainly see how this program would be helpful for data preservation, but I also had questions about the environment that Xena is using. The identification of files, one of Xena's functions, is of course helpful for preservation. Current or recent files usually have metadata that include at least the program and the date, saved in their Properties. As I learned with Exercise 3, though, slightly older files may have deceptive file extensions, wrong file extensions, or no file extensions. In the case of files that predated any kind of standardization, identification of file type is crucial to understanding how to preserve files. If the file in question was created using spreadsheet software, for example, the preservation process will need to take that into account so that functionality can be salvaged. Likewise, the process that converts these pre-recognized files into preservation-friendly formats is useful for the long-term curation of materials. This process is akin to using The National Archives' DROID and then saving in a preservation-friendly format, all rolled into a single program. I could see how this program could potentially be used to automate a workflow that would facilitate preservation for a repository.
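
For a sense of how a file can be identified when its extension is wrong or missing, here is a toy Python sketch that looks at the first few bytes of a file (its "magic number") instead of its name. This is not how Xena or DROID is actually implemented; DROID works from the much larger PRONOM signature registry. The function names and the handful of signatures below are my own choices for illustration.

```python
# A few well-known leading byte signatures, mapped to a plain-English label.
SIGNATURES = [
    (b"%PDF", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container (also .docx, .xlsx, .odt, .jar)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy .doc, .xls, .ppt)"),
]


def identify(header: bytes) -> str:
    """Guess a format from the leading bytes of a file, ignoring its extension."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read just enough of the file to compare against the signatures above."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


# A file named "budget.txt" that starts with these bytes is really a legacy Office file.
print(identify(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 8))
```

Note that a signature only narrows things down: a ZIP container could be a .docx, an .odt or a plain archive, so a real tool has to look deeper before choosing a conversion.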

The program has some kinks, though. I had trouble getting Xena to register with OpenOffice in order to convert files to open formats. After configuring the base directory of OpenOffice to C:\Program Files\OpenOffice.org 3 (the home directory of OpenOffice on my computer), file normalizations were still not successful. I tried using an older version of OpenOffice (1.1.5), and significantly expanding the sleep time for OpenOffice to open with Xena. I was able to binary normalize the files (wrap them into a .xena file without converting them to an open format). Alternatively, the files could be manually converted to one of the output formats for Xena (in this case, a corresponding Open Document Format like .odt or .ods). This process would certainly be an annoying (or, worse, impossible?) task and work addition for a repository.
After much troubleshooting, I came across this in the documentation: “Normalisation will be more reliable if a single version of OpenOffice.org is installed on your computer. For this reason, it is recommended that you remove any earlier versions of OpenOffice.org from your system, making sure you only have the latest version installed.” I uninstalled the 1.1.5 version of OpenOffice and tried again. I was still unsuccessful. I had also read in the documentation that there was a known bug in which Windows XP and Vista can delay the launch of OpenOffice: http://sourceforge.net/apps/mediawiki/xena/index.php?title=Known_Issues. I was not able to fix this by increasing the sleep time. The only functional workaround I was able to establish was converting the formats myself. I will say that all other plugins loaded successfully, and I had no trouble with any of the non-Office file formats.

In the larger picture, I wonder about Xena itself as it pertains to preservation. The specifications are open, and the project is in the long-term purview of the National Archives of Australia. However, I wonder if the program, and its corresponding unique file type, could themselves become obsolete. What would this mean for Xena as a preservation aid? Would backwards compatibility be an essential development goal? Will there be other ways of “releasing” the original files if Xena and Xena Reader are not available? Presumably, the XML metadata will be human readable, but the files may not be. I would be interested in discussing this further.
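
On the question of "releasing" files without Xena: because the wrapper is XML and the payload is Base64, both of which are openly documented, a few lines of general-purpose code could in principle recover the bytes. The Python sketch below shows the idea against a made-up wrapper. The element names (wrapper, metadata, source_name, content) are hypothetical and are not the real .xena schema, and for a file that Xena has actually normalized, what comes out would be the normalized version rather than the original.

```python
import base64
import xml.etree.ElementTree as ET

# A hand-written stand-in for a wrapped file (hypothetical layout, not real .xena).
SAMPLE = """<wrapper>
  <metadata><source_name>notes.txt</source_name></metadata>
  <content encoding="base64">SGVsbG8sIGFyY2hpdmUh</content>
</wrapper>"""


def unwrap(xml_text: str) -> tuple[str, bytes]:
    """Return (recorded file name, decoded bytes) from the hypothetical wrapper."""
    root = ET.fromstring(xml_text)
    name = root.findtext("metadata/source_name")
    payload = root.findtext("content")
    return name, base64.b64decode(payload)


name, data = unwrap(SAMPLE)
print(name, data)
```

Nothing here depends on Xena or its viewer, only on an XML parser and a Base64 decoder, which suggests the wrapper is the least of the long-term worries. Whether the recovered bytes can still be rendered is a separate question.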

Below is a screen shot from the plugin installation, with 2 failed conversions in the background. I later chose 2 Office files and binary normalized them using Xena.
