Before getting into the specifics of Define-XML version 2.0, let’s first look at some background on Define-XML itself.
What is Define-XML and why is it important?
CDISC’s Define-XML is a document that describes the structure and content of the data collected during a clinical trial. It is required by the United States Food and Drug Administration (FDA) and the Japanese Pharmaceuticals and Medical Devices Agency (PMDA). The FDA’s Technical Conformance Guide explains that Define-XML is “arguably the most important part of the electronic dataset submission for regulatory review” because it helps reviewers gain familiarity with study data, its origins and derivations.
Define-XML version 2.0
Define-XML v2.0 is a significant update from the previous version. It was developed in response to implementation experience with v1.0, the evolution of the SDTM, SEND and ADaM standards, and best practices identified by SDTM and ADaM metadata experts.
Differences between Define-XML v1.0 and v2.0
Define-XML 2.0 introduces many improvements over the original, which are summarized in Figure 1 below. These changes focus on improving value level metadata, linking to CDISC/NCI Controlled Terminology standards and linking to annotated CRFs. The new version also removes ambiguity and provides improved support for ADaM metadata.
Define-XML: Technical deep dive
Value level metadata re-imagined
Value Level Metadata is needed where values in a table column may have different metadata depending on the row they are in. For example, the content of VSORRES might be DataType="integer" for one value of VSTESTCD, but DataType="float" for another.
Define-XML 1.0 supports this by providing a Value List that defines the content of VSORRES for each test code. However, it does this only “by convention”, with no clear, unambiguous way to know exactly what the Value List is defining. It also doesn’t describe how Value Lists can be used to provide Value Level Metadata for multiple columns, for example, where VSORRES and VSPOS both have different definitions depending on the test code.
Different organizations interpreted the specification in different ways and ended up with incompatible implementations. Define-XML 2.0 removes all this ambiguity by allowing Value Lists to be provided explicitly for each variable in the dataset. This allows full description of the metadata for any value in any variable.
There is a new mechanism to describe the conditions under which each value definition is applicable. Where Clauses define a condition, such as “Where VSTESTCD=SYSBP”. These conditions are linked to the Value Level Metadata so that it is unambiguously known when each definition applies. Compound conditions can be used such as “Where VSTESTCD=SYSBP and VSPOS=SITTING” (see Figure 7).
Figure 7. Compound Where Clause in Define-XML 2.0
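As a rough illustration of how a compound Where Clause could be evaluated, here is a minimal Python sketch. It assumes the condition is serialized as a def:WhereClauseDef containing one RangeCheck per variable, with all checks combined by AND. The OIDs, the ITEM_NAMES lookup and the sample rows are hypothetical, and only a few comparators are handled.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"
DEF = "http://www.cdisc.org/ns/def/v2.0"

# Hypothetical compound condition: VSTESTCD=SYSBP and VSPOS=SITTING
WHERE_CLAUSE_XML = f"""
<def:WhereClauseDef xmlns="{ODM}" xmlns:def="{DEF}" OID="WC.VS.SYSBP.SITTING">
  <RangeCheck SoftHard="Soft" def:ItemOID="IT.VS.VSTESTCD" Comparator="EQ">
    <CheckValue>SYSBP</CheckValue>
  </RangeCheck>
  <RangeCheck SoftHard="Soft" def:ItemOID="IT.VS.VSPOS" Comparator="EQ">
    <CheckValue>SITTING</CheckValue>
  </RangeCheck>
</def:WhereClauseDef>
"""

# Hypothetical lookup from ItemDef OID to dataset variable name
ITEM_NAMES = {"IT.VS.VSTESTCD": "VSTESTCD", "IT.VS.VSPOS": "VSPOS"}

# Only a subset of comparators is sketched here
COMPARATORS = {
    "EQ": lambda value, checks: value == checks[0],
    "NE": lambda value, checks: value != checks[0],
    "IN": lambda value, checks: value in checks,
    "NOTIN": lambda value, checks: value not in checks,
}


def where_clause_applies(where_clause, row):
    """Return True when every RangeCheck in the clause holds for the row."""
    for check in where_clause.findall(f"{{{ODM}}}RangeCheck"):
        variable = ITEM_NAMES[check.get(f"{{{DEF}}}ItemOID")]
        values = [cv.text for cv in check.findall(f"{{{ODM}}}CheckValue")]
        compare = COMPARATORS[check.get("Comparator")]
        if not compare(row.get(variable), values):
            return False
    return True


where_clause = ET.fromstring(WHERE_CLAUSE_XML)
sitting = {"VSTESTCD": "SYSBP", "VSPOS": "SITTING", "VSORRES": "120"}
standing = {"VSTESTCD": "SYSBP", "VSPOS": "STANDING", "VSORRES": "118"}
print(where_clause_applies(where_clause, sitting))   # True
print(where_clause_applies(where_clause, standing))  # False
```

Because each check names the variable it tests, a consumer does not need any convention to work out which rows a value definition applies to.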
These were the most requested features for Define-XML 2.0, and also provide support for ADaM parameter-level metadata.
The Value List mechanism is focused around variables, grouping together the definitions of all the possible values for a given variable. Using the information provided by the new Where Clause mechanism, Value Level Metadata can be displayed in a transposed format that shows how a particular “Slice” of a dataset looks. Rather than seeing how a specific variable looks for each condition, a Slice shows how the whole dataset looks for a given condition.
Slices and Value Lists are simply two different ways of looking at the same underlying metadata. Figure 8 shows Value Level Metadata represented as Value Lists, specifically an example set of Values for the VSORRES and VSORRESU Variables.
Figure 8. Value Level Metadata represented as Value Lists
Viewing this metadata as a Slice presents what the entire Domain looks like for a given condition, as shown in Figure 9.
Figure 9. Value Level Metadata represented as a Slice
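The relationship between the two views can be sketched as a simple transposition. The Python below is an illustration only: the nested dictionary stands in for parsed Value Lists, and the conditions, data types and lengths are made-up values rather than content taken from the figures.

```python
# Value Lists: variable -> condition -> value-level definition (hypothetical values)
value_lists = {
    "VSORRES": {
        "VSTESTCD=SYSBP": {"DataType": "integer", "Length": 3},
        "VSTESTCD=TEMP": {"DataType": "float", "Length": 5},
    },
    "VSORRESU": {
        "VSTESTCD=SYSBP": {"DataType": "text", "Length": 4},
        "VSTESTCD=TEMP": {"DataType": "text", "Length": 1},
    },
}


def to_slices(value_lists):
    """Transpose variable-centric Value Lists into condition-centric Slices."""
    slices = {}
    for variable, definitions in value_lists.items():
        for condition, definition in definitions.items():
            slices.setdefault(condition, {})[variable] = definition
    return slices


slices = to_slices(value_lists)
for variable, definition in slices["VSTESTCD=TEMP"].items():
    print(variable, definition)
```

No metadata is added or lost in either direction; the Slice simply regroups the same definitions by condition instead of by variable.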
In Define-XML 1.0, Controlled Terminology definitions had to have both a coded value and a decode. In most cases these were identical, because SDTM uses controlled lists of values rather than codes and decodes. Define-XML 2.0 leverages the new enumerated items mechanism in ODM 1.3.1, so Controlled Terminology can be defined simply as a list of allowable values if there is no code/decode relationship. Figure 10 shows how this is used to describe a Severity Code List.
Figure 10. Use of Enumerated Items
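To make the difference concrete, the following Python sketch reads one code list built from enumerated items and one built from code/decode pairs. The XML fragment, OIDs and terms are hypothetical and heavily trimmed, so treat it as an illustration rather than a complete Define-XML document.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"

# Hypothetical fragment: one enumerated list, one code/decode list
CODELISTS_XML = f"""
<MetaDataVersion xmlns="{ODM}" OID="MDV.EXAMPLE" Name="Example">
  <CodeList OID="CL.AESEV" Name="Severity" DataType="text">
    <EnumeratedItem CodedValue="MILD"/>
    <EnumeratedItem CodedValue="MODERATE"/>
    <EnumeratedItem CodedValue="SEVERE"/>
  </CodeList>
  <CodeList OID="CL.SEX" Name="Sex" DataType="text">
    <CodeListItem CodedValue="F">
      <Decode><TranslatedText xml:lang="en">Female</TranslatedText></Decode>
    </CodeListItem>
    <CodeListItem CodedValue="M">
      <Decode><TranslatedText xml:lang="en">Male</TranslatedText></Decode>
    </CodeListItem>
  </CodeList>
</MetaDataVersion>
"""


def allowable_values(code_list):
    """Map each coded value to its decode, or to None for enumerated items."""
    values = {}
    for item in code_list:
        if item.tag == f"{{{ODM}}}EnumeratedItem":
            values[item.get("CodedValue")] = None
        elif item.tag == f"{{{ODM}}}CodeListItem":
            decode = item.findtext(f"{{{ODM}}}Decode/{{{ODM}}}TranslatedText")
            values[item.get("CodedValue")] = decode
    return values


root = ET.fromstring(CODELISTS_XML)
code_lists = {
    cl.get("OID"): allowable_values(cl) for cl in root.iter(f"{{{ODM}}}CodeList")
}
print(code_lists["CL.AESEV"])
print(code_lists["CL.SEX"])
```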
Standardized controlled terminology
Define-XML 2.0 allows linking of code lists and even individual codes to the published CDISC and NCI Controlled Terminology standards using Aliases. It also allows codes to be flagged as “extended values” where a sponsor has added additional codes to an extensible code list from those Controlled Terminology standards.
Figure 11 shows how this is used to reference the SCTESTCD codes both at the Code List and the Code List Item level by adding an Alias that points at the standard “C” codes.
Figure 11. Referencing NCI Controlled Terminology
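A consumer could read these links roughly as follows. This Python sketch assumes that the link is an Alias element with Context="nci:ExtCodeID" and that sponsor additions carry def:ExtendedValue="Yes". The coded values are illustrative and the "C" codes are placeholders, not codes looked up in the published terminology.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"
DEF = "http://www.cdisc.org/ns/def/v2.0"
NCI_CONTEXT = "nci:ExtCodeID"

# Hypothetical code list; C00000-style codes are placeholders only
CODELIST_XML = f"""
<CodeList xmlns="{ODM}" xmlns:def="{DEF}"
          OID="CL.SCTESTCD" Name="Subject Characteristic Code" DataType="text">
  <EnumeratedItem CodedValue="EDLEVEL">
    <Alias Context="nci:ExtCodeID" Name="C00001"/>
  </EnumeratedItem>
  <EnumeratedItem CodedValue="MARISTAT">
    <Alias Context="nci:ExtCodeID" Name="C00002"/>
  </EnumeratedItem>
  <EnumeratedItem CodedValue="CUSTOM1" def:ExtendedValue="Yes"/>
  <Alias Context="nci:ExtCodeID" Name="C00000"/>
</CodeList>
"""


def nci_code(element):
    """Return the NCI code from a direct child Alias, or None if absent."""
    for alias in element.findall(f"{{{ODM}}}Alias"):
        if alias.get("Context") == NCI_CONTEXT:
            return alias.get("Name")
    return None


code_list = ET.fromstring(CODELIST_XML)
list_code = nci_code(code_list)  # code for the code list as a whole

items = {}
for item in code_list.findall(f"{{{ODM}}}EnumeratedItem"):
    extended = item.get(f"{{{DEF}}}ExtendedValue") == "Yes"
    items[item.get("CodedValue")] = "sponsor extension" if extended else nci_code(item)

print(list_code)
print(items)
```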
Enhanced data types and data type guidance
The latest release of Define-XML introduces a richer set of data types and defines how these should be used in relation to the SAS Char and Num data types that are used in SDTM. This allows for better specification of the expected data and, as a result, the possibility of better checking of the data against those data types.
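The kind of checking this enables might look like the Python sketch below. The mapping is an assumption made for illustration and is not quoted from the specification: it treats integer and float as SAS Num and everything else, including ISO 8601 date types in SDTM-style datasets, as Char. The conformance check is deliberately loose.

```python
# Assumption for this sketch: only integer and float map to SAS Num
NUMERIC_DATA_TYPES = {"integer", "float"}


def sas_type(data_type):
    """Return the SAS storage type assumed for a Define-XML data type."""
    return "Num" if data_type in NUMERIC_DATA_TYPES else "Char"


def conforms(data_type, raw_value):
    """Loosely check that a raw character value fits the declared data type."""
    try:
        if data_type == "integer":
            int(raw_value)
        elif data_type == "float":
            float(raw_value)
    except ValueError:
        return False
    # Other data types are not checked in this sketch
    return True


print(sas_type("integer"), sas_type("text"))
print(conforms("integer", "120"), conforms("integer", "36.6"))
```

The same VSORRES column can then be checked row by row against whichever data type its value-level definition declares.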
Define-XML 2.0 introduces the ability to link to a specific page or pages in a document. This is used in several places:
- Formedix Origin can now link to the specific page in an annotated CRF that a variable was collected on, as shown in Figure 12
- Comments can now link to sections in supplemental documents that provide information about variables
- Methods can now link to sections in supplemental documents that describe the derivation of a value, as shown in Figure 13.
Figure 12. Linking to pages in an annotated CRF
Figure 13. Linking to sections in a supplemental document
A similar mechanism can be used to link ADaM variables to their predecessors as shown in Figure 14.
Figure 14. Linking to ADaM predecessor variables
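As an illustration of the page links described above, this Python sketch extracts the annotated CRF pages referenced from a variable's origin. It assumes the link is expressed as a def:DocumentRef holding a def:PDFPageRef with a space-separated PageRefs attribute. The leafID, OID and page numbers are hypothetical.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"
DEF = "http://www.cdisc.org/ns/def/v2.0"

# Hypothetical variable definition collected on pages 12 and 13 of the aCRF
ITEM_XML = f"""
<ItemDef xmlns="{ODM}" xmlns:def="{DEF}"
         OID="IT.VS.VSPOS" Name="VSPOS" DataType="text" Length="8">
  <def:Origin Type="CRF">
    <def:DocumentRef leafID="LF.acrf">
      <def:PDFPageRef PageRefs="12 13" Type="PhysicalRef"/>
    </def:DocumentRef>
  </def:Origin>
</ItemDef>
"""


def crf_pages(item_def):
    """Collect physical page numbers referenced from the variable's origin."""
    path = f"{{{DEF}}}Origin/{{{DEF}}}DocumentRef/{{{DEF}}}PDFPageRef"
    pages = []
    for ref in item_def.iterfind(path):
        # Named destinations are ignored in this sketch
        if ref.get("Type") == "PhysicalRef":
            pages.extend(int(page) for page in ref.get("PageRefs", "").split())
    return pages


item_def = ET.fromstring(ITEM_XML)
print(crf_pages(item_def))  # [12, 13]
```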
The old Define-XML 1.0 method of specifying derivations has been replaced by an improved implementation taken from the ODM 1.3 standard. This enables a variable that appears in multiple datasets to have a different derivation for each dataset it is present in.
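One way to picture per-dataset derivations is sketched below in Python. It assumes each dataset's ItemRef carries a MethodOID pointing at a MethodDef, so the same variable definition can resolve to a different method in each dataset. All OIDs, names and derivation texts are hypothetical.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"

# Hypothetical metadata: one variable referenced from two datasets
METADATA_XML = f"""
<MetaDataVersion xmlns="{ODM}" OID="MDV.EXAMPLE" Name="Example">
  <ItemGroupDef OID="IG.ADSL" Name="ADSL" Repeating="No">
    <ItemRef ItemOID="IT.AGEGR1" Mandatory="No" MethodOID="MT.AGEGR1.ADSL"/>
  </ItemGroupDef>
  <ItemGroupDef OID="IG.ADAE" Name="ADAE" Repeating="Yes">
    <ItemRef ItemOID="IT.AGEGR1" Mandatory="No" MethodOID="MT.AGEGR1.ADAE"/>
  </ItemGroupDef>
  <MethodDef OID="MT.AGEGR1.ADSL" Name="Derive AGEGR1" Type="Computation">
    <Description><TranslatedText xml:lang="en">Derived from AGE</TranslatedText></Description>
  </MethodDef>
  <MethodDef OID="MT.AGEGR1.ADAE" Name="Copy AGEGR1" Type="Computation">
    <Description><TranslatedText xml:lang="en">Copied from ADSL.AGEGR1</TranslatedText></Description>
  </MethodDef>
</MetaDataVersion>
"""


def derivations_for(metadata, item_oid):
    """Return {dataset name: derivation text} for one variable definition."""
    text_path = f"{{{ODM}}}Description/{{{ODM}}}TranslatedText"
    methods = {
        method.get("OID"): method.findtext(text_path)
        for method in metadata.iter(f"{{{ODM}}}MethodDef")
    }
    result = {}
    for group in metadata.iter(f"{{{ODM}}}ItemGroupDef"):
        for ref in group.findall(f"{{{ODM}}}ItemRef"):
            if ref.get("ItemOID") == item_oid and ref.get("MethodOID"):
                result[group.get("Name")] = methods[ref.get("MethodOID")]
    return result


metadata = ET.fromstring(METADATA_XML)
derivations = derivations_for(metadata, "IT.AGEGR1")
print(derivations)
```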
Clarification of how to specify split domains
Define-XML 1.0 was released before split domains were introduced in SDTM 1.2, and so it does not define how the various properties of a domain should be used to specify both the core domain code (e.g., “QS”) and the extended split domain code (e.g., “QSCG”). Define-XML 2.0 now properly defines how this information should be specified.
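A small Python sketch shows how a tool might regroup split datasets under their parent domain. It assumes the parent domain code is carried in a Domain attribute and the split dataset code in Name, which is an assumption of this example rather than something stated above. "QSXX" is a made-up second split added for illustration.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"

# Hypothetical dataset definitions; QSXX is an invented split dataset
DATASETS_XML = f"""
<MetaDataVersion xmlns="{ODM}" OID="MDV.EXAMPLE" Name="Example">
  <ItemGroupDef OID="IG.DM" Name="DM" Domain="DM" Repeating="No"/>
  <ItemGroupDef OID="IG.QSCG" Name="QSCG" Domain="QS" Repeating="Yes"/>
  <ItemGroupDef OID="IG.QSXX" Name="QSXX" Domain="QS" Repeating="Yes"/>
</MetaDataVersion>
"""


def datasets_by_domain(metadata):
    """Group dataset names under the domain code they belong to."""
    grouped = {}
    for group in metadata.iter(f"{{{ODM}}}ItemGroupDef"):
        grouped.setdefault(group.get("Domain"), []).append(group.get("Name"))
    return grouped


metadata = ET.fromstring(DATASETS_XML)
grouped = datasets_by_domain(metadata)

# A dataset counts as a split when its name differs from its domain code
splits = {
    domain: names
    for domain, names in grouped.items()
    if any(name != domain for name in names)
}
print(grouped)
print(splits)
```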
Clarification of extending Define-XML
Due to ambiguity in Define-XML 1.0, there have been varying opinions on what the core Define-XML model includes, whether other parts of the underlying ODM model can be used, and what extensions to the model can be used for. Define-XML 2.0 states clearly that:
- Anything not defined in the specification is considered an extension, even if it is part of the underlying ODM model
- Use of extensions from the underlying ODM model is not prohibited; however, they have no meaning with regard to the standard, and their meaning must be agreed between the sender and receiver of the metadata
- Extensions that duplicate functionality in the core Define-XML 2.0 model are not allowed – this is to ensure all users apply the same mechanism for all functionality defined in the model
- Extensions cannot fundamentally change the meaning of the model, i.e. if all extensions are removed, the metadata essentially must have the same meaning as it did with the extensions present.
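The last rule suggests a simple test: strip the extensions and check that what remains still makes sense. The Python sketch below does the stripping for the easy case only, where extensions live in a foreign XML namespace. The vendor namespace, element and attribute are hypothetical. It does not detect ODM constructs that fall outside the Define-XML specification, which the first rule also counts as extensions.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"
DEF = "http://www.cdisc.org/ns/def/v2.0"
XLINK = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
VENDOR = "http://example.com/ns/vendor"  # hypothetical extension namespace

CORE_NAMESPACES = {ODM, DEF, XLINK, XML_NS}

ITEM_XML = f"""
<ItemDef xmlns="{ODM}" xmlns:def="{DEF}" xmlns:vendor="{VENDOR}"
         OID="IT.VS.VSORRES" Name="VSORRES" DataType="text" Length="20"
         vendor:ReviewStatus="Draft">
  <Description><TranslatedText xml:lang="en">Result</TranslatedText></Description>
  <vendor:Note>Internal note</vendor:Note>
  <def:ValueListRef ValueListOID="VL.VS.VSORRES"/>
</ItemDef>
"""


def namespace_of(qualified_name):
    """Return the namespace URI of an ElementTree '{uri}local' name, or None."""
    if qualified_name.startswith("{"):
        return qualified_name[1:].split("}")[0]
    return None


def strip_extensions(element):
    """Remove, in place, child elements and attributes from foreign namespaces."""
    for child in list(element):
        if namespace_of(child.tag) not in CORE_NAMESPACES:
            element.remove(child)
        else:
            strip_extensions(child)
    for name in list(element.attrib):
        namespace = namespace_of(name)
        # Unqualified attributes belong to the element and are kept
        if namespace is not None and namespace not in CORE_NAMESPACES:
            del element.attrib[name]


item_def = ET.fromstring(ITEM_XML)
strip_extensions(item_def)
print([child.tag for child in item_def])
print(sorted(item_def.attrib))
```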
These clarifications were added to prevent fragmented implementations of Define-XML, so that applications can be confident of the meaning of a piece of Define-XML metadata.
Defining a model, not a view
The Define-XML 1.0 specification was intended to define the model that describes a set of datasets; however, due to the way it was presented, it was commonly interpreted to define how dataset metadata should be displayed for viewing.
Define-XML 2.0 tries to make it clear that it is the model that is being defined, not how it should be displayed. Define-XML 2.0 includes a stylesheet that demonstrates how the dataset metadata can be displayed; however, this display format is not part of the standard, and implementers are free to display the dataset metadata in any way that is suitable for the receiver.
Define-XML 2.0 is based on and largely similar to the Define-XML 1.0 model; however, it is not completely backward-compatible with it. Compatibility has been sacrificed in order to produce a cleaner, less ambiguous model. For example, using a value list on the --TESTCD variable to define the contents of the --ORRES or another variable is no longer permitted; a value list now always describes the variable that references it. To provide Value Level Metadata for multiple variables, simply attach a value list to each. Existing Define-XML 1.0 files will require updating to make them Define-XML 2.0 compliant. This updating process is fairly simple and can be automated.
Due to the ambiguities in Define-XML 1.0, it would not be possible to provide a single upgrade routine that would correctly upgrade all files from all systems, so it is left to system implementers to provide upgrade routines for their implementation of Define-XML 1.0 if they feel this is required.
Define-XML: helping to optimize the end-to-end clinical trial process
Define-XML should not be thought of as just a submission deliverable, but as a CDISC model that helps optimize the whole end-to-end clinical trial process. It can be used to establish dataset libraries that promote study-to-study re-use, and to drive efficiencies through expedited study set-up and streamlined dataset conversions in study conduct and analysis. Using Define-XML at the start of a new study design makes it possible to machine-validate dataset deliverables, so that data quality and submission compliance are built in, with less reliance on downstream validation.
Define-XML 2.0 provides a substantially enhanced and more robust mechanism for describing dataset metadata by allowing full specification of value and parameter level metadata for any variable, and improves interoperability and machine readability by removing ambiguity. This will lead to increased opportunities for automation and as such, drive further efficiencies in the study process.