This is the very limited documentation for DTD2CPP.

It works like this: you write a carefully crafted DTD to represent your data structures, and you include in that DTD a number of SGML comments containing keywords that are instructions to the code generator. Using the keywords, you can specify what types each element and its properties will have, as well as inheritance relationships among XML classes (ones generated by this script) and between XML classes and external classes.

For a DTD called "dtd" containing entity "Entity", a class "xml_dtd_Entity" is generated. The class is capable of reading XML conforming to the (limited) DTD as given to the code generator. You can have as many entities as you like; one class will be generated for each.

A simplified DTD is also produced. It has the fancy comments stripped out and is modified to take the polymorphism into account so that your documents validate properly.

The comment fields are as follows (examples are all grouped together at the end).

A TYPE comment for entity EntityName may include as many typespecs as you like. A single typespec is

    propertyname:type:default

where propertyname is a property listed in the tag for the entity EntityName. A propertyname of $ indicates that this is a typespec for the text inside an entity. The supported types are:

    i: long
    I: long long
    u: unsigned long
    U: unsigned long long
    f: float
    s: std::string

Type can have an integer after it, specifying that the value is an array of that type, the length being the specified number. The default value should be a literal of the appropriate type.

The EXTCC comment allows you to insert additional code into the class declaration (header) of the class that is generated for entity EntityName. Typical things to do in here include the insertion of extra methods and members. The special string @CLASS@ will be replaced by the code generator with the actual class name that is generated.
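To make the typespec format above concrete, here is a minimal sketch of how a single propertyname:type:default string could be split and mapped to a C++ type. It is not taken from DTD2CPP itself: TypeSpec, cppTypeFor and parseTypeSpec are hypothetical names, and how several typespecs are separated inside one comment is not stated here, so the sketch handles one typespec only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper type; DTD2CPP's own internals are not shown in this document.
struct TypeSpec {
    std::string property;       // attribute name, or "$" for the text inside the entity
    std::string cppType;        // C++ type named by the type letter
    unsigned long arrayLength;  // 0 means a scalar, otherwise the array length
    std::string defaultValue;   // default literal, kept as text
};

// Map a type letter from the list above to the C++ type it stands for.
inline std::optional<std::string> cppTypeFor(char letter) {
    switch (letter) {
        case 'i': return "long";
        case 'I': return "long long";
        case 'u': return "unsigned long";
        case 'U': return "unsigned long long";
        case 'f': return "float";
        case 's': return "std::string";
        default:  return std::nullopt;
    }
}

// Split "propertyname:type:default" into its three fields.
// Returns std::nullopt if a field is missing or the type is not recognised.
inline std::optional<TypeSpec> parseTypeSpec(const std::string& spec) {
    const auto first = spec.find(':');
    if (first == std::string::npos || first == 0) return std::nullopt;
    const auto second = spec.find(':', first + 1);
    if (second == std::string::npos) return std::nullopt;

    const std::string type = spec.substr(first + 1, second - first - 1);
    if (type.empty()) return std::nullopt;
    const auto cpp = cppTypeFor(type[0]);
    if (!cpp) return std::nullopt;

    // Any digits after the type letter give the array length.
    unsigned long length = 0;
    for (std::size_t i = 1; i < type.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(type[i]))) return std::nullopt;
        length = length * 10 + static_cast<unsigned long>(type[i] - '0');
    }
    return TypeSpec{spec.substr(0, first), *cpp, length, spec.substr(second + 1)};
}

// Usage sketch: "$:f3:0" describes entity text holding an array of three floats.
inline void exampleTypeSpec() {
    const auto spec = parseTypeSpec("$:f3:0");
    assert(spec && spec->cppType == "float" && spec->arrayLength == 3);
}
```

The default is deliberately kept as text here; turning it into a literal of the right type is left to whatever emits the generated class.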
If your inserted code will make the class abstract (i.e. it has a pure virtual method), you must also insert the comment that marks the class as abstract, so that the generated parser does not attempt to instantiate this class; instead it will look for subclasses and try to parse and instantiate them in its place.

Another comment inserts "code" at the end of the constructor for the generated class. There are some restrictions here, notably that you can't use "--" or ">" in your code since it will break the DTD parsing process. Suck it up and put everything in a method and call that method from this section.

A matching comment inserts "code" at the beginning of the class's destructor. The same restrictions apply.

EXTENDSX specifies that the class generated for EntityName (i.e. xml_mydtd_EntityName) is a direct subclass of xml_mydtd_BaseEntityName. It also implies that wherever a BaseEntityName is called for in the DTD, an EntityName may be used and the parser will handle that appropriately. If you do this, you will need to insert a default constructor into BaseEntityName using the EXTCC mechanism. In that default constructor, you will need to initialise everything in the BaseEntityName class, since that won't be done by parsing when a subclass is being parsed.

EXTENDSC specifies that xml_mydtd_EntityName is a direct subclass of some other class called ClassName. You must use the INCLUDES mechanism to make sure that a declaration of ClassName is included.

With INCLUDES, "code" will be inserted at the top of your file; typically this will cause the inclusion of a bunch of headers that contain declarations of base classes mentioned in EXTENDSC or types involved in declarations added with EXTCC. Once again, no "--" or ">" is permitted, which makes it difficult to #include a header in angle brackets. Instead, you can use &lt; and &gt;, which will be appropriately substituted. Or you could collect all that crap in one header file and #include "theheaderfile.h", which will work fine. Vileness.

Finally, "comments" will be inserted immediately before a class declaration, allowing you to document the class with Doxygen.
Documentation of auto-generated members is not supported yet.

ELEMENT is the usual DTD ELEMENT declaration. You're restricted here, though, in what the contents can be. They may be one of:

  - EMPTY (the word "EMPTY"), indicating that the entity has no contents,
  - (#PCDATA), indicating that the entity contains text that is to be parsed according to the rules of a TYPE comment with a typespec for property "$",
  - (Ent1[*], Ent2[*], EntN[*]), indicating that the contents will include Ent1, Ent2 and EntN. If a * is present after an entity name, a std::vector of that type will be read in, otherwise only one. E.g. (foo,bar*,jim) will allow 1 foo, 1 jim and n bars.

(A|B)[*] is NOT PERMITTED, because such things are difficult to represent in C++ unless you have a class hierarchy. So: make a class hierarchy with EXTENDSX, call for the base class in the ELEMENT declaration, and the parser will permit the base class (if not abstract) and any non-abstract child class. The simplified DTD that is output will contain a (A|Y) construct containing all the relevant classes so that libxml2 will correctly parse such things. See the example directory for details.

ATTLIST is the usual ATTLIST DTD declaration. You may define a property as CDATA and #REQUIRED or #IMPLIED. If it's #IMPLIED, the default value specified in the TYPE comment will be used to initialise the value in case it's missing from the XML. You can also list a property as an enumeration, causing a proper C++ enum to be generated, e.g.: (false|true) "false"

All meta-comments should come before their relevant ELEMENT tags, otherwise the code generator will miss them. Code is generated when the ELEMENT tag is found; any relevant comments found after that point are too late. Make sure that the full inheritance hierarchy below entity Base is specified before defining an ELEMENT that contains a Base.

The example is in example.dtd. Run make in the example directory to see what happens with each of the above tags and how to integrate external code.
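As a rough picture of how the content-model and hierarchy rules above fit together, the following hand-written sketch shows the kind of C++ shape they imply for an element declared as (Title, Shape*), with an abstract base Shape and two subclasses. It is not DTD2CPP output: the entity names, the member names, the use of std::unique_ptr, and the plain std::string standing in for Title are all assumptions made for illustration. Only the xml_mydtd_ prefix follows the naming rule given earlier.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hand-written illustration only. The real generator output is not shown in
// this document, so the member names and std::unique_ptr are assumptions.

// Base entity. The pure virtual method makes it abstract, so only its
// non-abstract subclasses can be instantiated.
class xml_mydtd_Shape {
public:
    xml_mydtd_Shape() = default;           // default constructor used by subclasses
    virtual ~xml_mydtd_Shape() = default;
    virtual double area() const = 0;
};

// Subclasses in the EXTENDSX sense: usable wherever a Shape is called for.
class xml_mydtd_Square : public xml_mydtd_Shape {
public:
    explicit xml_mydtd_Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

class xml_mydtd_Rect : public xml_mydtd_Shape {
public:
    xml_mydtd_Rect(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
    double width;
    double height;
};

// Stand-in for an element whose content model is (Title, Shape*).
class xml_mydtd_Drawing {
public:
    std::string title;                                     // the single Title (simplified to text)
    std::vector<std::unique_ptr<xml_mydtd_Shape>> shapes;  // Shape*: any number of shapes

    double totalArea() const {
        double sum = 0.0;
        for (const auto& shape : shapes) sum += shape->area();
        return sum;
    }
};

// Usage sketch: either subclass may appear where a Shape is called for.
inline double exampleTotalArea() {
    xml_mydtd_Drawing drawing;
    drawing.title = "demo";
    drawing.shapes.push_back(std::make_unique<xml_mydtd_Square>(2.0));
    drawing.shapes.push_back(std::make_unique<xml_mydtd_Rect>(2.0, 3.0));
    assert(drawing.shapes.size() == 2);
    return drawing.totalArea();  // 4 + 6
}
```

The point of the sketch is the design choice described above: a starred child becomes a vector, and an alternative such as (A|B) is expressed as a base class plus subclasses rather than as a union-like member.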
All example code is covered by the GPL.

It is highly recommended that you have doxygen and graphviz installed so that you can browse the generated class hierarchy graphically.

William Brodie-Tyrrell
william@brodie-tyrrell.org
20060420