
All you need to know about NCI, CDISC and SDTM controlled terminology

Oct 11, 2022 3:33:51 PM



Ed Chappell


The use of controlled terminology (CT) is vital to a successful clinical study build. However, understanding the concept and its various subsets can be a challenge.

In clinical trials, CT can be referred to in a number of different ways, such as:

  • NCI controlled terminology
  • CDISC controlled terminology
  • SDTM controlled terminology

 While these terms are related, they don’t all mean exactly the same thing.

In this blog, we’ll explore each of these terms and examine the role of controlled terminology in speeding up and simplifying study build. We’ll also introduce some other significant controlled terminology resources of equal importance to the CDISC NCI project.

Let’s start with the basics.

What is controlled terminology?

The SDTM standard for clinical data submission came about in the early 2000s as a joint project between the Clinical Data Interchange Standards Consortium (CDISC) and the US Food and Drug Administration (FDA). The standard was introduced to give some structure to the clinical data the FDA was receiving.

The standard laid out the specific column headings that should be used to present collected data, in order to make it easier for submissions to be reviewed and approved.

However, CDISC quickly realized they needed to specify not only the column headings, but also the way the data itself should be presented.

‘The yes and no problem’

To explain this, let’s use the example question ‘Did the subject have a headache?’

This question could be used to determine the likelihood of this adverse effect occurring, or to compare the number of subjects who developed a headache with the total number of subjects in the study.

It sounds like an easy task. But, before the SDTM standard came into effect, there was no control over how a study collected the results of this question. For example, possible answers could be:

  • ‘Yes’ or ‘No’
  • ‘Y’ or ‘N’ (or ‘y’ or ‘n’!)
  • ‘True’ or ‘False’
  • ‘1’ or ‘0’

That’s several permutations for one simple yes or no question and, for the reviewer, it would be difficult to visualize the results.

To resolve this problem, industry regulators needed to specify exactly what the values within the columns should say for every submission. For example, they could specify that studies should use only ‘Y’, ‘N’, or ‘UNK’ for unknown.

These approved lists of controlled terminology are called code lists.

The advantage of code lists is that when a list has been specified, you can use that piece of metadata again and again on other questions. So, there’s no need for a unique code list for every column. But more importantly, data is standardized in a way that can be easily reviewed and analyzed by the regulator. This is the essence of what controlled terminology aims to achieve.
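To make this concrete, here is a minimal Python sketch of how the different answers to the headache question could be brought into line with a single code list. It uses the ‘Y’, ‘N’ and ‘UNK’ values from the example above rather than a published NCI code list, and the names RAW_TO_CODED and standardize_yes_no are our own illustrations, not part of any standard or tool.

```python
# Illustrative mapping from raw collected answers to the example code list
# values used in the text ('Y', 'N', 'UNK'). This is not a published code list.
RAW_TO_CODED = {
    "yes": "Y", "y": "Y", "true": "Y", "1": "Y",
    "no": "N", "n": "N", "false": "N", "0": "N",
}


def standardize_yes_no(raw):
    """Return 'Y', 'N' or 'UNK' for a raw collected answer."""
    if raw is None:
        return "UNK"
    # Ignore case and surrounding whitespace before looking the answer up.
    key = str(raw).strip().lower()
    # Assumption: anything unrecognized is treated as unknown.
    return RAW_TO_CODED.get(key, "UNK")


collected = ["Yes", "y", "TRUE", "0", "No", ""]
coded = [standardize_yes_no(answer) for answer in collected]

# Once every answer uses the same code list, counting becomes trivial.
headache_count = coded.count("Y")
```

Falling back to ‘UNK’ keeps the sketch short. In a real study you would more likely flag unrecognized answers for review than convert them silently.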

Recognizing CT as essential to ensuring that CDISC data standards could be applied consistently across all studies, CDISC partnered with the US National Cancer Institute to develop the CDISC NCI controlled terminology project.

The NCI host the controlled terminology code lists on their website and release new versions quarterly. Read more about using NCI controlled terminology for standardizing data.

 So, when people talk about NCI controlled terminology, or CDISC controlled terminology, they’re talking about the same project. Because the project was started to support the SDTM standard, CDISC NCI controlled terminology is commonly referred to as ‘SDTM controlled terminology’.

 However, this term requires its own examination.

What is SDTM controlled terminology?

Put simply, it refers to any controlled terminology used in the SDTM standard. However, that can also include terminology from outside the CDISC NCI project.

There are a couple of scenarios where this might apply.

Scenario 1: Sponsor-driven controlled terminology

Sponsor-driven controlled terminology covers terms that the sponsor requires but that might not be covered by NCI controlled terminology, or that depend on the particular study. For example, in the Adverse Events SDTM dataset, some variables (AEBODSYS, AECAT, AEREL, AETOX, etc.) are required to have a terminology attached, but the content is determined by the sponsor organization.

Scenario 2: New terms

This refers to any new terms not included within the existing code lists. For example, you might discover terms that aren’t in the existing terminologies but that are relevant and important to the specific study you’re running. Any new suggested terms are required to be submitted to the CDISC NCI project for review and inclusion. The project team will check that the term is valid and unique (i.e. not a synonym of an existing term).

Some NCI controlled terminologies are extensible, i.e. you can add your own terms in addition to the existing NCI-defined ones. For example, 'Anatomical Location' can be extended to include study-specific information. Other NCI controlled terminologies, such as 'Action Taken with Study Treatment', are non-extensible.
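The difference between extensible and non-extensible code lists can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the CodeList class and check_values function are hypothetical, and the terms shown are a tiny placeholder subset, not the published content of either code list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeList:
    """A hypothetical, simplified model of a code list."""
    name: str
    terms: frozenset
    extensible: bool


def check_values(code_list, values):
    """Split values into (standard, sponsor_extensions, invalid)."""
    standard, extensions, invalid = [], [], []
    for value in values:
        if value in code_list.terms:
            standard.append(value)
        elif code_list.extensible:
            # Extensible list: a new term is allowed as a sponsor extension.
            extensions.append(value)
        else:
            # Non-extensible list: anything outside the list is an error.
            invalid.append(value)
    return standard, extensions, invalid


# Placeholder terms for illustration only.
location = CodeList(
    "Anatomical Location", frozenset({"HAND", "FINGER"}), extensible=True
)
action = CodeList(
    "Action Taken with Study Treatment",
    frozenset({"DOSE REDUCED", "DRUG WITHDRAWN"}),
    extensible=False,
)

location_result = check_values(location, ["HAND", "STUDY SPECIFIC SITE"])
action_result = check_values(action, ["DOSE REDUCED", "PAUSED"])
```

With these placeholder lists, the unknown location term ends up in the sponsor extensions bucket, while the unknown action term is reported as invalid.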

In summary, SDTM controlled terminology mostly refers to the CDISC NCI CT project, but the project doesn’t cover all of it. As we’ve seen, there can also be terms or entire code lists that are outside the scope of the project.




Note

CDISC produce updates to the NCI list each quarter. These will include any new custom domains or terms that have been assessed and deemed useful for inclusion. It’s best practice to refer to the very latest version of the list for your submission, because the content is more controlled and it makes it easier for the FDA to analyze your data. However, it’s important to know that you don’t have to continuously update any submissions to use the very latest terminology. You can in fact align to an earlier release.

Other important controlled terminology

There are other controlled terminology projects, separate from the CDISC NCI project, that are equally important.

Here are three key projects to be aware of:

  • WHODrug is ‘the most comprehensive and actively used drug reference dictionary in the world’. Through an indexing structure, it groups drugs according to their effects and helps to classify each type of drug and what it does. It’s a reliable resource of controlled terminology that can be used to identify drug names and ‘evaluate medicinal product information, including active ingredients and products’ anatomical and therapeutic classifications, from nearly 150 countries.’

  • MedDRA provides ‘highly specific standardized medical terminology to facilitate sharing of regulatory information internationally for medical products used by humans.’ Essentially, it records the effects of medicines in a standardized way. For example, different studies might refer to ‘stomach ache’ or ‘nausea’, or any number of terms that mean the same thing. However, there’s a specific medical term that should be used for all symptoms. MedDRA classifies these terms with the intention of creating consistency in clinical trials globally.

  • LOINC (Logical Observation Identifiers Names and Codes) is ‘the international standard for identifying health measurements, observations, and documents.’ A LOINC code is a unique identifier for a lab test or other measurement, for example an ECG or vital signs measurement. It classifies which test was done and ties it to the results.

These are just three examples of controlled terminology projects that exist outside of the CDISC NCI project and are essential for analyzing adverse event profiles, medical histories, lab test codes and concomitant medication profiles.


Getting controlled terminology right (the first time!)

Study data that’s collected without controlled terminology in mind can require a lot of additional work to standardize it down the line. This can hold up the submission process and cost you time and money that’s better spent elsewhere. Therefore, it’s important to make sure data is captured in accordance with controlled terminology right from the start of the study design. This means creating case report forms that are set up to allow only the acceptable data to be collected.

For example, when it comes to recording the location of an adverse event, such as ulcers or skin lesions on the hand, it’s important to be as specific as possible. The recorded location must be aligned to the NCI terminology for anatomical locations, which is very extensive: the location can be narrowed down to individual digits or parts of the finger. In this case, instead of a free text field, users could be required to select an acceptable term from a drop-down list. This will make a huge difference to the quality and cleanliness of the SDTM data down the line, which in turn means that analysis is easier and you get clearer, deeper insights.
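As a rough illustration of the drop-down idea, the Python sketch below models a form field that accepts only terms from a fixed list and rejects free text at the point of entry. The ChoiceField class and the terms are hypothetical placeholders of our own; they don’t represent any particular EDC system or the full anatomical location terminology.

```python
class ChoiceField:
    """A hypothetical CRF field that only accepts terms from a fixed list."""

    def __init__(self, label, allowed_terms):
        self.label = label
        self.allowed_terms = tuple(allowed_terms)
        self.value = None

    def select(self, term):
        """Store the term if it is allowed; otherwise reject it."""
        if term not in self.allowed_terms:
            raise ValueError(
                f"{term!r} is not an allowed term for {self.label}"
            )
        self.value = term
        return self.value


# Placeholder terms for illustration only.
ae_location = ChoiceField("Adverse event location", ["HAND", "FINGER", "THUMB"])
ae_location.select("FINGER")
```

Because anything outside the list is rejected when the data is entered, there is nothing left to clean up when the SDTM datasets are built.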

The moral: design your CRFs with controlled terminology in mind to avoid unnecessary programming headaches later on. See our blog on everything you need to know about CRFs to learn the dos and don’ts of CRF design, or download our free guide to common CRF problems and how to fix them.




About the author



Ed Chappell

Solutions Consultant | Formedix


Ed Chappell has been working as a Solutions Consultant with Formedix for over 15 years, and has 22 years’ experience in data programming. He authored and presents our training courses for SEND, SDTM, Define-XML, ODM-XML and Dataset-XML.

Ed was heavily involved in the development of our ryze dataset mapper, and works closely with customers on SDTM dataset mapping. As an expert in clinical data programming, Ed also supports customers with Interim Analysis (IA) SDTM and FDA SDTM clinical trial submissions.


