
Using Define-XML for Dataset Design

Apr 14, 2020 12:00:00 AM



Kevin Burges


In the past, sponsors submitting to FDA were required to submit a PDF describing their submission datasets. As we all know, PDF is great for viewing on screen or printing, but the information inside it can’t be interpreted by a computer in any meaningful way. Enter CDISC’s Define-XML model…

Define-XML standardizes how to describe datasets in a machine-readable manner. It can be used to define any tabular dataset structure, though it’s primarily used to describe SDTM, ADaM and SEND datasets for regulatory submission.  FDA now requires that all submissions use Define-XML to describe their datasets, so it’s something we all need to be familiar with.
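To give a feel for what "machine-readable" means here, the following is a minimal sketch that reads dataset and variable definitions out of a Define-XML document using only the Python standard library. The XML fragment is a hypothetical, heavily trimmed example (a real define.xml carries many more required elements and attributes), and the function name `read_datasets` is ours, not part of any CDISC tool.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ODM 1.3 and Define-XML 2.0.
NS = {
    "odm": "http://www.cdisc.org/ns/odm/v1.3",
    "def": "http://www.cdisc.org/ns/def/v2.0",
}

# Hypothetical, trimmed-down fragment for illustration only.
DEFINE_XML = """\
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <Study OID="STUDY1">
    <MetaDataVersion OID="MDV1" Name="Example">
      <ItemGroupDef OID="IG.DM" Name="DM" Repeating="No"
                    def:Structure="One record per subject">
        <ItemRef ItemOID="IT.DM.USUBJID" Mandatory="Yes"/>
        <ItemRef ItemOID="IT.DM.AGE" Mandatory="No"/>
      </ItemGroupDef>
      <ItemDef OID="IT.DM.USUBJID" Name="USUBJID" DataType="text" Length="20"/>
      <ItemDef OID="IT.DM.AGE" Name="AGE" DataType="integer" Length="3"/>
    </MetaDataVersion>
  </Study>
</ODM>
"""


def read_datasets(xml_text):
    """Return {dataset name: {"structure": ..., "variables": [...]}}."""
    root = ET.fromstring(xml_text)
    # Index the variable definitions by OID so ItemRefs can be resolved.
    items = {item.get("OID"): item for item in root.iterfind(".//odm:ItemDef", NS)}
    datasets = {}
    for group in root.iterfind(".//odm:ItemGroupDef", NS):
        variables = []
        for ref in group.iterfind("odm:ItemRef", NS):
            item = items[ref.get("ItemOID")]
            variables.append({
                "name": item.get("Name"),
                "type": item.get("DataType"),
                "mandatory": ref.get("Mandatory") == "Yes",
            })
        datasets[group.get("Name")] = {
            # Namespaced attributes are addressed as {namespace}name.
            "structure": group.get("{%s}Structure" % NS["def"]),
            "variables": variables,
        }
    return datasets


datasets = read_datasets(DEFINE_XML)
for name, info in datasets.items():
    print(name, info["structure"], [v["name"] for v in info["variables"]])
```

Because the definitions are plain structured data, the same few lines work for any dataset described this way, which is exactly what a PDF cannot offer.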




If you’re currently starting your Define-XML at the end of your study, you’re missing many of the benefits it can bring to your end-to-end study process. This post provides an overview of how it can be used throughout your process to drive efficiencies. We won’t be diving into how it actually works.


Define SDTM, ADaM and SEND datasets upfront

If you don’t know where you’re going, how do you know if you’ve arrived at the right place? The answer is, of course, you don’t. However, this is how many organizations still approach clinical trials. They define their CRFs, collect data, then think about converting to SDTM datasets at the end. There are two problems with this process.

  • They don’t know when designing the CRFs if all relevant SDTM data is being collected
  • They don’t have a definition for what they want to submit, so can’t verify if the submission data is what they intended

This can lead to incomplete data, protocol amendments, complex mapping to standardized CDISC datasets, increased QA, and ultimately an elongated study process.

The solution is to define your study, end to end, right at the start. This gives you confidence, before you even start collecting data, that your CRFs are correct and can easily be converted into submission datasets that will satisfy the regulator.




The first step in this is to define your submission datasets upfront, using Define-XML. The right dataset design software will help you rapidly define SDTM, SEND, and ADaM datasets and export their definition as Define-XML.

You can read our blog on How our visual define.xml editor gives you faster define!


Check compliance of Define-XML dataset definitions to SDTM, SEND and ADaM

You can verify the compliance of your submission dataset designs to CDISC standards before collecting any data, by running standard validation tools. Once you have the datasets defined, the next step is to define the mappings to them.


Mapping from EDC to submission datasets

Some EDC systems support exporting data in ODM format that matches your study design. However, most people use tools that are aimed at working with tabular datasets, and datasets are still by far the most popular type of data export from an EDC system. If you have created your study from an ODM study specification, the datasets will bear some resemblance to the ODM, but they’re not the same. When mapping your collected data to SDTM or SEND, you need to know what the datasets coming from your EDC system will look like.

And if CRFs are designed using CDASH, mapping should be a breeze!

A clever study design environment can predict these EDC export datasets and generate a Define-XML describing them. You can then define the mappings to your submission datasets before collecting any data.


Benefits of standardization and re-use

Take a look at our previous blog on using ODM and CDASH for CRF design for details of how you can dramatically decrease your study setup time by standardizing and re-using your ODM and Define-XML study designs in a CDISC clinical metadata repository.


Verify CRO datasets against your Define-XML specification

If you’re using a CRO to generate your submission datasets, how do you know what they’ve delivered is correct? If you have defined your datasets upfront using Define-XML, you can automatically verify whether the delivered data conforms to your original specification. This greatly reduces the amount of QA resources required and will surface any problems much faster. No more having to manually check data against an Excel or PDF specification.
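As a rough illustration of what such an automatic check involves, here is a minimal sketch in Python. It assumes the specification has already been extracted from your Define-XML into a plain dictionary and that the delivered data has been loaded as a list of rows; the names `SPEC`, `check_dataset` and the sample values are hypothetical, and real validation tools check far more than this.

```python
# Hypothetical specification, as it might be extracted from a Define-XML file.
SPEC = {
    "DM": [
        {"name": "USUBJID", "type": "text", "length": 20, "mandatory": True},
        {"name": "AGE", "type": "integer", "length": 3, "mandatory": False},
        {"name": "SEX", "type": "text", "length": 1, "mandatory": True},
    ]
}


def is_integer(text):
    """True if the text can be read as a whole number."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def check_dataset(dataset_name, rows, spec):
    """Return a list of human-readable findings for one delivered dataset."""
    findings = []
    variables = {v["name"]: v for v in spec[dataset_name]}
    delivered = set().union(*(row.keys() for row in rows))

    # Structural checks: variables that are missing or were never specified.
    for name in variables:
        if name not in delivered:
            findings.append(f"{dataset_name}: missing variable {name}")
    for name in sorted(delivered - set(variables)):
        findings.append(f"{dataset_name}: unexpected variable {name}")

    # Value checks: mandatory values, maximum length and data type.
    for index, row in enumerate(rows, start=1):
        for name, var in variables.items():
            if name not in row:
                continue
            value = row[name]
            if value in (None, ""):
                if var["mandatory"]:
                    findings.append(
                        f"{dataset_name} row {index}: {name} is mandatory but empty")
                continue
            text = str(value)
            if len(text) > var["length"]:
                findings.append(
                    f"{dataset_name} row {index}: {name} exceeds length {var['length']}")
            if var["type"] == "integer" and not is_integer(text):
                findings.append(
                    f"{dataset_name} row {index}: {name} value {text!r} is not an integer")
    return findings


# Hypothetical delivery from a CRO.
delivered_rows = [
    {"USUBJID": "STUDY1-001", "AGE": "34", "ARM": "A"},
    {"USUBJID": "", "AGE": "3x", "ARM": "B"},
]

findings = check_dataset("DM", delivered_rows, SPEC)
for finding in findings:
    print(finding)
```

The point of the sketch is that every check is driven by the specification, so the same code can verify any dataset you have defined upfront.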


View as PDF or HTML

Define-XML is great for computers, but it’s not something most people want to look at. Thankfully it can easily be converted into PDF or HTML, making it simple for anyone to understand.
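In practice the conversion is usually done with a stylesheet or a dedicated tool. Purely to show the idea, the sketch below turns the variable definitions from a hypothetical, trimmed Define-XML fragment into a simple HTML table using the Python standard library; `to_html` is an illustrative name, not an existing utility.

```python
import html
import xml.etree.ElementTree as ET

# ODM 1.3 namespace in the {namespace}tag form ElementTree expects.
ODM = "{http://www.cdisc.org/ns/odm/v1.3}"

# Hypothetical, trimmed-down fragment for illustration only.
DEFINE_FRAGMENT = """\
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="STUDY1">
    <MetaDataVersion OID="MDV1" Name="Example">
      <ItemDef OID="IT.DM.USUBJID" Name="USUBJID" DataType="text" Length="20"/>
      <ItemDef OID="IT.DM.AGE" Name="AGE" DataType="integer" Length="3"/>
    </MetaDataVersion>
  </Study>
</ODM>
"""


def to_html(xml_text):
    """Render every ItemDef as one row of an HTML table."""
    root = ET.fromstring(xml_text)
    lines = ["<table>", "<tr><th>Variable</th><th>Type</th><th>Length</th></tr>"]
    for item in root.iter(ODM + "ItemDef"):
        cells = [item.get("Name", ""), item.get("DataType", ""), item.get("Length", "")]
        # Escape values so metadata can never break the generated page.
        lines.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


page = to_html(DEFINE_FRAGMENT)
print(page)
```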


Working with legacy data

Organizations often have lots of legacy data in XPT datasets for which they have no machine-readable metadata. There may not even be something akin to an Excel description of the data, or if there is, it may be incomplete or incorrect. To help make better use of this data, it’s possible to generate Define-XML metadata directly from the XPT datasets. This makes it easy to understand the content of the datasets and make appropriate use of them.
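The sketch below shows the general idea of deriving metadata from data. It assumes the XPT file has already been read into a list of rows (for example with a library such as pandas, whose `read_sas` function can read XPORT files) and only infers a variable name, a basic data type and a length. The function names and OID patterns are hypothetical, and the output is a skeleton to be reviewed and completed, not a valid define.xml.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
# Serialize ODM elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", ODM_NS)


def odm(tag):
    """Qualify a tag name with the ODM namespace."""
    return f"{{{ODM_NS}}}{tag}"


def infer_item(values):
    """Guess a data type and length from the observed values."""
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, int) and not isinstance(v, bool) for v in present):
        data_type = "integer"
    elif present and all(isinstance(v, (int, float)) and not isinstance(v, bool)
                         for v in present):
        data_type = "float"
    else:
        data_type = "text"
    length = max((len(str(v)) for v in present), default=1)
    return data_type, length


def build_metadata(dataset_name, rows):
    """Build a skeleton MetaDataVersion element for one legacy dataset."""
    mdv = ET.Element(odm("MetaDataVersion"), OID="MDV.LEGACY", Name="Legacy metadata")
    group = ET.SubElement(mdv, odm("ItemGroupDef"), OID=f"IG.{dataset_name}",
                          Name=dataset_name, Repeating="Yes")
    for name in rows[0]:
        data_type, length = infer_item([row.get(name) for row in rows])
        oid = f"IT.{dataset_name}.{name}"
        # Assumption: whether a variable is mandatory cannot be known from
        # the data alone, so "No" is only a placeholder for later review.
        ET.SubElement(group, odm("ItemRef"), ItemOID=oid, Mandatory="No")
        ET.SubElement(mdv, odm("ItemDef"), OID=oid, Name=name,
                      DataType=data_type, Length=str(length))
    return mdv


# Hypothetical rows, standing in for the contents of a legacy XPT file.
legacy_rows = [
    {"USUBJID": "STUDY1-001", "AGE": 34},
    {"USUBJID": "STUDY1-002", "AGE": None},
]

metadata = build_metadata("DM", legacy_rows)
print(ET.tostring(metadata, encoding="unicode"))
```

Inferred metadata like this is a starting point: labels, controlled terminology and origins still need to come from someone who knows the data.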

As you can see, Define-XML isn’t just something you should be using because you have to. It can bring real benefits to your study process.
