All you need to know about SDTM.
SDTM stands for Study Data Tabulation Model, and it is a CDISC standard. If you want to know more about CDISC standards you can read our introduction to CDISC standards, or you can read about how CDISC standards fit into the drug development process.
What is SDTM in clinical trials?
CDISC SDTM is the name of the model (or framework) used for organizing data collected in human clinical trials and in nonclinical (animal) studies. The model was developed by CDISC – the Clinical Data Interchange Standards Consortium – a standards development organization for dealing with medical research data.
Once you’ve gathered all the necessary data for your trial, it must be converted into a specific table format optimized for review, to be accepted by the FDA. The study data tabulation model – SDTM – is the name of that structure.
Why is SDTM required?
SDTM is there to give regulatory reviewers – namely the FDA – a clear description of the structure, attributes, and contents of each dataset, and the variables submitted as part of your clinical trial. Before CDISC SDTM was enforced, there were different domain names for each domain, different variables, and different variable names. Nothing was standard. As a result, reviewers spent huge amounts of time trying to get the data into a standard format – figuring out the domain names and names of the variables in each dataset – rather than reviewing the data itself. This ultimately prolonged the clinical trial process.
You can read more about what other CDISC standards are required for regulatory submissions.
The introduction of CDISC SDTM standardized the required data structure end to end. Now we have standard domain names and a standard structure for each domain. There are standard variables and standard names for SDTM datasets. It means that each bit of data collected can now be easily identified. Regulators can review the data much quicker, making the process far more efficient. Plus, it makes all of your studies consistent, because they’re all in the same standard format.
Formalizing the structure of the domains has also led to the development of conformance rules, by both the CDISC SDTM team and regulatory bodies such as the FDA and PMDA. You can find out about the FDA’s clinical trial guidance documents. These rules are programmed into software validation tools to automate the checking of SDTM clinical datasets against the conformance rules.
What is the latest version of SDTM?
CDISC builds the SDTM standards from two important models: the SDTM core model and the implementation guides (SDTM-IG) that build on it.
The core model provides a standardized set of variables, assembled into “classes”, which are refined and built into variable collections for specific use cases (SDTM-IG domains), e.g. Vital Signs observations, Medical History reporting, etc. The SDTM core model also supports the non-human trial standard SEND-IG.
The latest versions are SDTM v2.0 for SDTM-IG v3.4, and SDTM-IG Medical Devices (SDTMIG-MD) v1.1. CDISC is always developing new domains, so it’s important to regularly check the CDISC website for the latest updates.
Read more about planned updates to standards.
What are SDTM domains?
In order to be able to correctly implement the SDTM, it’s important to have a good understanding of its domains and how they’re structured.
SDTM is based on the observations that are collected from subjects taking part in a clinical trial. An observation is a piece of data collected during a study. For example “Subject 12 had a mild headache starting on study day 5”.
Most observations collected should be classified into one of the general observation classes (also known as data classes). These are:
- Interventions.
- Events.
- Findings.
- Findings about Events or Interventions.
“Special purpose” class covers important domains such as Demographics. “Trial design” class documents the structure of the study and “Relationships” is used to link observations together.
Each of the general observation classes has associated domains. A domain is simply a group of observations that share a common topic, such as Medical History or Vital Signs. Within each general class, sub-categories provide a further grouping. The general classes provide a framework for classifying data not covered by a specific domain, while the sub-categories provide a more refined collection of variables for custom domains.
But we’ll get into the various SDTM domains after a bit more on variables.
The example below shows the observations “Nausea”, “Headache” and “Dizziness”. These are part of the Adverse Events domain. And the Adverse Events domain is in the Events observation class.
Domains are identified by a two-character domain code that’s used to prefix variable names and so map a variable to a domain. For example, the domain ‘Medical History’ has the domain code MH. The generic variable --SEQ starts with two hyphens that indicate a domain code is required, so in Medical History it becomes MHSEQ. Likewise, the variable --TESTCD in the Vital Signs domain becomes VSTESTCD. Each domain has a dataset, which is a collection of related data. SDTM datasets are described by a set of named variables. And each of these named variables is categorized by its role.
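As a minimal sketch of the prefixing convention described above, the Python snippet below swaps the two leading hyphens of a generic variable name for a domain code. The function name expand_variable is hypothetical and invented for illustration; it is not part of any CDISC tool.

```python
def expand_variable(domain_code: str, generic_name: str) -> str:
    """Replace the leading '--' placeholder with a two-character domain code."""
    if len(domain_code) != 2:
        raise ValueError("expected a two-character domain code")
    if not generic_name.startswith("--"):
        # Variables such as STUDYID or USUBJID carry no domain prefix.
        return generic_name
    return domain_code.upper() + generic_name[2:]


print(expand_variable("MH", "--SEQ"))     # MHSEQ
print(expand_variable("VS", "--TESTCD"))  # VSTESTCD
```

Variables that do not start with two hyphens are returned unchanged, which matches identifiers that are shared across domains.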
What are SDTM variable roles?
A role category conveys a particular type of information about a variable. And variables can have just one role.
Variable roles have 5 categories:
- Identifier variables allow the study, subject, domain and sequence number of a record to be identified.
- Topic variables describe the focus of an observation.
- Timing variables describe the date, time, and duration of an observation.
- Qualifier variables describe the results of an observation with text or numeric values.
- Rule variables describe algorithms or methods for calculations or looping conditions and are mainly used for the Trial Design domain.
In the example below, variable roles are shown in the top row of the table. The color-coded areas on the second row show the variables that correspond to the variable roles.
Qualifier variables are further categorized as follows:
- Grouping qualifiers group observations together.
- Result qualifiers describe the result for a finding.
- Synonym qualifiers contain another name for the observation.
- Record qualifiers define the supplementary attributes of an observation.
- Variable qualifiers further describe the value of a specific variable within an observation.
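To make the role categories concrete, here is a small Python sketch that tags each variable of one Adverse Events record with a role. It reuses the earlier observation “Subject 12 had a mild headache starting on study day 5”. The study and subject identifiers are made up, and the Role enum and variables_with_role helper are illustrative names only, not part of the standard.

```python
from enum import Enum


class Role(Enum):
    IDENTIFIER = "Identifier"
    TOPIC = "Topic"
    TIMING = "Timing"
    QUALIFIER = "Qualifier"
    RULE = "Rule"


# One Adverse Events record; STUDY01 and the subject ID are invented values.
ae_record = {
    "STUDYID": "STUDY01",
    "DOMAIN": "AE",
    "USUBJID": "STUDY01-012",
    "AESEQ": 1,
    "AETERM": "Headache",
    "AESEV": "MILD",
    "AESTDY": 5,
}

# Each variable carries exactly one role.
variable_roles = {
    "STUDYID": Role.IDENTIFIER,
    "DOMAIN": Role.IDENTIFIER,
    "USUBJID": Role.IDENTIFIER,
    "AESEQ": Role.IDENTIFIER,
    "AETERM": Role.TOPIC,       # the focus of the observation
    "AESEV": Role.QUALIFIER,    # a record qualifier describing severity
    "AESTDY": Role.TIMING,      # study day on which the event started
}


def variables_with_role(record: dict, role: Role) -> list:
    """List the variables in a record that have the given role."""
    return [name for name in record if variable_roles[name] is role]


print(variables_with_role(ae_record, Role.IDENTIFIER))
print(variables_with_role(ae_record, Role.TOPIC))
```

No Rule variable appears here because, as noted above, those are mainly used in Trial Design.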
What are SDTM core variables?
Core variables are a measure of compliance with the specific SDTM-IG domain model. The value of a core variable shows the importance of the variable to the overall domain structure.
Variables are divided into 3 categories:
- Required variables are needed to identify a data record, e.g. STUDYID and USUBJID. Or, they are needed to make a record easily understood, e.g. --TERM and --TEST. They must always be included in the dataset and cannot be null.
- Expected variables are needed to make a record useful within a specific domain. They must always be included in the dataset but they can be null for some records. If no data is collected, a comment must be included to explain why.
- Permissible variables must be included in the dataset if results are collected or derived, but they can be left null or blank.
Variables from the parent class can also be inserted into the domain if required.
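The three core categories translate naturally into simple checks. The sketch below is one possible way to express them in Python, assuming a dataset held as a list of dictionaries. The CORE table is a heavily abbreviated, illustrative subset for a Vital Signs dataset, and check_core is a hypothetical helper; the SDTM-IG for the version in use remains the authoritative source, and real validation tools apply many more rules.

```python
# Illustrative subset of core designations (Req, Exp, Perm) for Vital Signs.
CORE = {
    "STUDYID": "Req",
    "USUBJID": "Req",
    "VSTESTCD": "Req",
    "VSORRES": "Exp",
    "VSPOS": "Perm",
}


def check_core(records: list) -> list:
    """Return findings for missing Req/Exp variables and null Req values."""
    findings = []
    # Every variable name that appears in at least one record.
    columns = set().union(*(r.keys() for r in records))
    for name, core in CORE.items():
        if name not in columns:
            # Permissible variables may be absent; Required and Expected may not.
            if core in ("Req", "Exp"):
                findings.append(f"{name}: {core} variable missing from dataset")
            continue
        if core == "Req":
            for i, record in enumerate(records, start=1):
                if record.get(name) in (None, ""):
                    findings.append(f"{name}: null value in record {i}")
    return findings


rows = [
    {"STUDYID": "S1", "USUBJID": "S1-001", "VSTESTCD": "SYSBP", "VSORRES": "120"},
    {"STUDYID": "S1", "USUBJID": "", "VSTESTCD": "SYSBP", "VSORRES": None},
]
print(check_core(rows))
```

In this sample, the null Expected value in the second row passes, while the empty Required USUBJID is reported.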
What SDTM domains are there?
Currently, there’s a large collection of domains, and CDISC is constantly developing more. Each domain has a name and a two-character abbreviation. For example, Demographics (DM), Subject Visits (SV), Adverse Events (AE), Lab Results (LB), and Vital Signs (VS) to name a few. Each SDTM domain usually consists of a file named after the domain (e.g. AE.xpt).
Most observations that are collected fit into one of the general observation classes:
- Interventions datasets capture treatments and procedures that are given to a subject as specified by the protocol. Examples are Exposure (EX), Concomitant Medications (CM), and Substance Use (SU), e.g. tobacco, caffeine, and alcohol.
- Events datasets capture planned protocol milestones such as randomization and study completion. Unplanned incidents that occur before, or during a study are also captured. Examples are Adverse Events (AE), Disposition (DS), and Medical History (MH).
- Findings capture observations that address specific questions such as observations made during physical examinations, laboratory tests, ECG testing, etc. Findings About is included and captures data related to the Interventions and Events classes. Examples are Vital Signs (VS), Physical Exam (PE), Labs (LB), and Subject Characteristics (SC).
- Findings about Events and Interventions capture more details about e.g. an Adverse Event.
In addition to the general observation classes, there are 4 special case classes:
- Special Purpose datasets can be Demographics (DM), Comments (CO), Subject Elements (SE), and Subject Visits (SV).
- Trial Design has datasets that describe the design of a trial. Examples are Trial Summary (TS), Trial Arms (TA), and Trial Visits (TV).
- Relationship datasets represent the relationships between datasets and records.
- Study Reference datasets provide structures for representing study-specific terminology used in subject data. Examples include Device Identifiers (DI) and Non-host Organism Identifiers (OI).
How to implement SDTM
The following section explains how to map source datasets to SDTM domains, considerations, and other necessary deliverables.
How to do an SDTM mapping
The SDTM-IG extends and refines the SDTM core model with specific domain implementations, business rules, assumptions, and examples. It should be used along with the relevant version of the SDTM. So make sure you have the correct versions of both of these documents.
Here are some basic steps to help keep you on the right track:
- Determine which SDTM domains to create.
- Compare the source metadata to the SDTM metadata and map directly where possible.
- Map the rest of the source datasets to SDTM domains.
- Map variables in the source datasets to the variables in the SDTM domains.
- Decide whether custom domains and SUPPQUAL domains need to be created.
- Perform the data conversion – there are various mapping tools you can use to do this.
- Validate the SDTM datasets.
- Generate and validate Define.xml.
There are a number of different types of SDTM mappings you can do for the second, third, and fourth steps above:
- Directly map to a domain variable without making any changes.
- Rename the source variable name and label without the need to make any other changes.
- Map values to standard units or terminology.
- Change the format of a source variable.
- Combine two or more source variables to make a single domain variable.
- Split a single source variable into two or more domain variables.
- Derive a domain variable from one or more source variables using logic, computation, algorithm or decoding.
And remember, you might need to use more than one type of mapping to create an SDTM variable.
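Several of the mapping types listed above can be seen together in one small conversion. The following Python sketch maps a single source row to a Vital Signs style record. The source column names (SUBJ, TEST, RESULT, UNIT, VISIT_DATE), the study identifier, and the map_vitals and study_day helpers are all assumptions made for illustration, not a prescribed implementation.

```python
from datetime import date, datetime

STUDYID = "STUDY01"  # made-up study identifier

# Map values to standard terminology: source label to test code.
TEST_CODES = {
    "Systolic Blood Pressure": "SYSBP",
    "Diastolic Blood Pressure": "DIABP",
}


def study_day(event: date, reference_start: date) -> int:
    """Derive a study day; there is no day 0, so the reference start is day 1."""
    delta = (event - reference_start).days
    return delta + 1 if delta >= 0 else delta


def map_vitals(source: dict, reference_start: date) -> dict:
    """Map one hypothetical source row to a VS-style record."""
    measured = datetime.strptime(source["VISIT_DATE"], "%d/%m/%Y").date()
    return {
        "STUDYID": STUDYID,
        "DOMAIN": "VS",
        # Combine: study ID and subject number form one variable.
        "USUBJID": f"{STUDYID}-{source['SUBJ']}",
        # Map to terminology, and keep the test name alongside its code.
        "VSTESTCD": TEST_CODES[source["TEST"]],
        "VSTEST": source["TEST"],
        # Rename only: the values are carried over unchanged.
        "VSORRES": source["RESULT"],
        "VSORRESU": source["UNIT"],
        # Change format: DD/MM/YYYY becomes an ISO 8601 date.
        "VSDTC": measured.isoformat(),
        # Derive: study day computed from the reference start date.
        "VSDY": study_day(measured, reference_start),
    }


row = {"SUBJ": "012", "TEST": "Systolic Blood Pressure",
       "RESULT": "120", "UNIT": "mmHg", "VISIT_DATE": "05/03/2020"}
print(map_vitals(row, reference_start=date(2020, 3, 1)))
```

Writing each mapping as one commented line mirrors what a mapping specification records: the source, the transformation, and the target variable.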
SDTM mapping can be a complicated task, so it’s important to plan everything out in advance. A mapping specification records where each piece of data came from, how it is transformed, and where it ends up. Whichever mapping scenarios apply, keep the SDTM model and Implementation Guide to hand during this process. And by using standard processes and tools, you’ll maximize your chances of success.
SDTM mapping specifications should be developed at the same time as annotating CRFs. The mapping specification tells the user how to do a mapping. An annotated CRF is a visual representation of a mapping showing how the source data relates to the SDTM data.
If this sounds like a lot to take on, there’s some neat technology that can help to automate this process. See how SDTM conversion can be much quicker and easier with our SDTM mapping tools and SDTM automation. To find out more, download our free guide to SDTM mapping, typical scenarios and best practices.
SDTM annotated CRFs
A Blank CRF is a collection of pages that is a mandatory deliverable for submission to the FDA. The file is always called blankcrf.pdf. Each question on a form must be manually annotated to show the origin of variables. It links the fields on the form with the variables in the dataset (the source of the data). Annotations help the reviewer find where variables come from in the submitted SDTM datasets. Find out more about the benefits of automating annotated CRFs.
What is SDTM controlled terminology?
SDTM has standard codelists that define the allowable values for particular variables. These values are required for submission to the FDA and PMDA in CDISC-compliant SDTM datasets. You should always use the most up-to-date version of controlled terminology when you start to map your SDTM datasets. Find out more about using NCI controlled terminology for standardizing data.
CDISC and NCI Enterprise Vocabulary Services partnered up to develop a standard controlled terminology. However, the CDISC / NCI controlled terms for Lab tests are not unique. They require additional information for differentiation.
Other medical dictionaries can be used, such as MedDRA and WHODrug.
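In practice, applying controlled terminology means translating collected values into codelist submission values and rejecting anything that cannot be translated. Here is a minimal Python sketch of that idea. The codelist shown is an illustrative subset only, and the RAW_TO_SEX lookup and to_submission_value function are hypothetical; the published CDISC / NCI terminology package for your chosen version is the authoritative source.

```python
# Illustrative subset of submission values for a sex codelist.
SEX_CODELIST = {"F", "M", "U"}

# Hypothetical lookup from raw collected text to submission values.
RAW_TO_SEX = {"female": "F", "male": "M", "unknown": "U"}


def to_submission_value(raw: str) -> str:
    """Translate a collected value to a codelist term, or fail loudly."""
    cleaned = raw.strip()
    if cleaned.upper() in SEX_CODELIST:
        # Already a valid term apart from case or surrounding spaces.
        return cleaned.upper()
    if cleaned.lower() in RAW_TO_SEX:
        return RAW_TO_SEX[cleaned.lower()]
    raise ValueError(f"{raw!r} has no mapping to the codelist")


print(to_submission_value(" Female "))  # F
print(to_submission_value("m"))         # M
```

Raising an error for unmapped values, rather than passing them through, keeps non-conformant terms out of the datasets so they can be resolved in the mapping specification.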
SDTM datasets and LOINC codes
Over the last 25 years, the LOINC project has provided a standard classification for health measurements. Most SDTM programmers will encounter “LOINC Code” information in Lab data. But the classification system has been extended to cover other measurements such as ECG. So what is LOINC? LOINC is an internationally recognized classification system and is often requested in regulatory data submissions to provide context to clinical measurement data, e.g. Labs and ECG. Read more here about LOINC codes and SDTM.
SDTM Define-XML
The FDA requires a Define.xml file to be included for all drug submissions. It describes the content and structure of data collected during the study. The Define.xml file makes the review of study data quicker and easier for the FDA. You can read our blog about using the Define XML standard for dataset design. View our free guide on how to overcome 6 common difficulties complying with Define-XML.
The latest version of the standard is Define-XML 2.1. It describes the content and structure of the study data in terms of domains, variables, methods, controlled terminology, and supporting documents. One of the things that crops up often is how to handle data coming in from multiple sources. You can read our blog on how to describe multiple origins for a value in Define-XML 2.0.
Creating a define.xml requires a lot of programming expertise. It takes a lot of time. That’s why it’s so important to make the process as quick and easy as possible.
How Formedix can help…
There’s a lot to get your head around! But, did you know we’re on the CDISC XML technical team? We were involved in creating the CDISC ODM and Define models. We’ve been in the business for over 20 years. So our CDISC knowledge isn’t too bad! Learn more about how we help you with CDISC Compliance. And, we’re well placed to give real-world, practical CDISC training.
Our clinical metadata repository and clinical trial automation software support all versions of CDISC standards and SDTM automation. We keep our platform updated in line with CDISC and NCI standards. That way your study designs and datasets are always regulatory compliant.
Why not download our free guide on how to overcome SDTM implementation problems?
Author's note: this blog post was originally published in July 2020 and has been updated for accuracy and comprehensiveness.