XMark — An XML Benchmark Project

xmlgen - The Benchmark Data Generator

xmlgen produces XML documents modeling an auction website, a typical e-commerce application. The highlights of the data generation are:
  • Generation of well-formed, valid, and meaningful XML data.
  • Efficient, scalable generation of XML documents several gigabytes in size.
  • Observance of referential constraints on ID/IDREF pairs.
  • Low, constant memory requirements, independent of the size of the generated document.
The number and type of elements are chosen according to a template and parameterized with certain probability distributions. The words for text paragraphs are taken from Shakespeare's plays.
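
The properties listed above can be illustrated with a minimal Python sketch. It is not xmlgen's implementation: the element names, the stand-in word list, the choice of distribution, and the functions `generate` and `check_references` are all assumptions made for illustration. The sketch emits one element at a time, so memory use does not grow with document size, and it keeps ID/IDREF pairs consistent by giving IDs a predictable form (`person0`, `person1`, ...), so a reference can be drawn as a random index without storing any IDs.

```python
import random
import xml.etree.ElementTree as ET
from typing import Iterator

# Stand-in vocabulary (assumption); the text above says xmlgen draws its
# words from Shakespeare's plays.
WORDS = ["love", "crown", "night", "sword", "honour", "ghost"]


def generate(n_people: int, n_auctions: int, seed: int = 0) -> Iterator[str]:
    """Yield a small auction-like document piece by piece.

    Element names are illustrative, not xmlgen's actual template.
    """
    if n_auctions > 0 and n_people < 1:
        raise ValueError("auctions need at least one person to reference")
    rng = random.Random(seed)
    yield "<site>"
    yield "<people>"
    for i in range(n_people):
        # Text length follows a (capped) exponential distribution.
        n_words = 1 + min(int(rng.expovariate(0.5)), 10)
        text = " ".join(rng.choice(WORDS) for _ in range(n_words))
        yield f'<person id="person{i}"><note>{text}</note></person>'
    yield "</people>"
    yield "<open_auctions>"
    for j in range(n_auctions):
        # IDs have the form person0..person{n_people-1}, so any index in
        # that range is a valid reference; no ID table is kept in memory.
        seller = rng.randrange(n_people)
        yield (f'<open_auction id="open_auction{j}">'
               f'<seller person="person{seller}"/></open_auction>')
    yield "</open_auctions>"
    yield "</site>"


def check_references(xml_text: str) -> bool:
    """Return True if every seller reference points at an existing person ID.

    This parses the whole document, so it is meant for small test runs only.
    """
    root = ET.fromstring(xml_text)
    ids = {p.get("id") for p in root.iter("person")}
    return all(s.get("person") in ids for s in root.iter("seller"))
```

Writing the output with a loop such as `for chunk in generate(1000, 500): out.write(chunk)` holds only one element in memory at a time, regardless of how large the counts are.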

The design ensures reproducibility across platforms, although marginal differences between documents may result from round-off errors. Moreover, the characteristics of a document are fully preserved under scaling, which aids the analysis of bottlenecks and of how they evolve with increasing data volume.
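
As a rough illustration of these two properties, the following Python sketch derives all element counts from a single scale factor and drives generation from a fixed seed. The base counts, the names `BASE_COUNTS`, `scaled_counts`, and `document_digest`, and the use of a hash to compare runs are hypothetical and not taken from xmlgen. Drawing integers rather than floating-point values is one way to avoid round-off differences between platforms.

```python
import hashlib
import random
from typing import Dict

# Hypothetical per-type counts at scale factor 1 (not xmlgen's real values).
BASE_COUNTS = {"person": 255, "item": 217, "open_auction": 120}


def scaled_counts(factor: float) -> Dict[str, int]:
    """Scale every element count by the same factor, preserving proportions."""
    return {name: max(1, round(base * factor))
            for name, base in BASE_COUNTS.items()}


def document_digest(factor: float, seed: int = 42) -> str:
    """Hash a generated element stream without holding it in memory.

    The same factor and seed always yield the same digest, because the
    only source of randomness is a seeded generator using integer draws.
    """
    rng = random.Random(seed)
    h = hashlib.sha256()
    for name, count in scaled_counts(factor).items():
        for i in range(count):
            ref = rng.randrange(count)  # integer draw, no floating point
            h.update(f'<{name} id="{name}{i}" ref="{name}{ref}"/>\n'.encode())
    return h.hexdigest()
```

Comparing `document_digest(1)` across two runs or two machines gives a quick reproducibility check, and `scaled_counts(10)` shows that the ratios between element types are the same as at factor 1.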

In the design of xmlgen, we deliberately reduced the number of parameters to a single one: the size of the document. We believe that the diverse structure of the document captures all important features found in typical XML documents.

