
Clinical data standards vs legacy data

Dec 14, 2022 5:19:58 PM



Kevin Burges


Data standards are a feature of many regulated industries, and the pharmaceutical industry is no exception. But we didn’t always have standards to help us collect, analyze and submit data. Even today, some organizations are not utilizing industry or company standards to ensure their study data is collected the same way every time - despite the many benefits of standardization.

Why are some organizations slow to adopt change?

Arguably, we’re a risk-averse industry - one that’s constantly adapting to new regulations and often subject to long procurement processes. Change and innovation represent potential risk, something that many companies aren’t willing to take on. This can lead to a situation where it never feels like the right time to put standards in place.

Many organizations find themselves having to decide between what is best strategically and what is best financially; those without a standards management team must spend large amounts of time and money putting one in place. Sometimes, standardization doesn’t seem like the best approach for a particular study, for example one that requires content that is not currently part of the standards. In this case, the study may be delayed by the standardization process.



Given the time pressure for getting studies up and running, some organizations or even some study teams are tempted to bypass the standardization process and instead do it their own way.

There’s no doubt that data standards come with their own set of headaches, particularly for organizations entrenched in traditional processes. But it’s those very standards that have led to the accelerated discovery of new drugs and the delivery of life-enhancing or lifesaving treatments to market.

So, what happens when non-standardized data meets clinical data standards?

Clinical data standards

Before the introduction of CDISC standards, there was no guidance to tell organizations how to collect and format data. It was a ‘free-for-all’, with every company free to create their own study questions and format collected data however they liked.


Example

Take the question ‘Is the patient pregnant?’

This is a common question in clinical trials, but organizations could choose to ask it and collect the answers however they saw fit. For example, the answers could be:

  1. ‘Yes’ or ‘No’
  2. ‘Yes’, ‘No’ or ‘Unknown’
  3. ‘Y’ or ‘N’
  4. ‘1’ or ‘0’

As a result, organizations were doing the same thing but in different ways. This meant there was no easy way for the US Food and Drug Administration (FDA) to quickly analyze the collected data. 

Clinical data standards were developed to ensure that clinical trials were run in a standardized way, from study design and data collection through to analysis.

If you want to find out more about clinical data standards, why not read our blog on CDISC standards used in the clinical research process?  

Collecting data in CDISC format

Without industry-wide clinical data standards to guide them, many organizations also didn’t have their own internal organizational data standards. That meant that even within a single organization, data might be collected in a completely different way for each trial. Using the example above, if different teams were designing different studies, one team might use ‘Yes’ or ‘No’ and another team might use ‘Y’ or ‘N’.

CDISC NCI terminology standards were introduced to tackle this problem by standardizing the various allowable responses to questions. The standards specify exactly what the values within the columns should say for every submission. For example, they could specify that studies should use only ‘Y’, ‘N’, or ‘U’ for unknown.
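To make this concrete, here is a minimal sketch of how the differently collected pregnancy answers from the example above could be normalized to a single ‘Y’, ‘N’ or ‘U’ set. The mapping table and the function name `standardize_response` are our own illustrative assumptions, not part of any CDISC deliverable, and a real mapping would be driven by the controlled terminology your study actually uses.

```python
# Hypothetical lookup from legacy collected values to a Y/N/U response set.
# Keys are lower-cased so that 'Yes', 'YES' and 'yes' are treated alike.
LEGACY_TO_STANDARD = {
    "yes": "Y", "y": "Y", "1": "Y",
    "no": "N", "n": "N", "0": "N",
    "unknown": "U", "u": "U",
}


def standardize_response(raw):
    """Return 'Y', 'N' or 'U' for a legacy value, or None if it is not recognized."""
    key = str(raw).strip().lower()
    return LEGACY_TO_STANDARD.get(key)


# Answers collected by different teams using different conventions.
collected = ["Yes", "N", 1, "Unknown", "0", "maybe"]
standardized = [standardize_response(value) for value in collected]
print(standardized)  # ['Y', 'N', 'Y', 'U', 'N', None]
```

Returning `None` for an unrecognized value, rather than guessing, keeps anything unexpected visible so it can be reviewed by a person.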



Legacy data mapping

Why might organizations not be using clinical data standards?

  1. They might be using old or ‘legacy’ data that was collected before the standards existed. This could be data that’s been collected from a previous trial, or data that is no longer ‘active’ but still has a purpose in modern research. 

  2. They might be using data that’s been collected since the inception of data standards, but wasn’t collected in line with those standards and therefore needs to be mapped appropriately. This also applies to data that’s still being collected. For example, if you’ve started a trial using a particular format or specification, you’re unlikely to go back and change it part-way through just because new standards have been introduced.

  3. A study might be pulling in data from some other external system, for example a lab system or a contract research organization (CRO) that doesn’t use CDISC formats. Increasingly, as more and more studies are outsourced to CROs, pharmaceutical companies are facing the challenge of producing very detailed specifications that address how data should be captured and submitted.

  4. While it isn’t so common nowadays because CDISC standards are so well established, there are still organizations that simply want to continue doing things the way that they’ve always done them. They might copy and paste from a previous spreadsheet at the start of a new study, because they’re used to a familiar format and version. There’s a big problem with this, however: over time, they fall further and further behind current CDISC standards. They’ll be at a higher risk of errors, inconsistencies and misalignment with SDTM for every study. Ultimately, they’ll face the challenge of unpicking the format and mappings, and having to resolve terminology inconsistencies when it comes to standardizing down the line.


Note

Potentially both the content and format of collected data could be non-standard. This is the case if, for example, organizations aren’t using content standards like SDTM or CDASH, or if they’re not using file format standards such as XPT for datasets and Define-XML for dataset metadata, which are mandated by the FDA.

The problem with non-standardized data

If an organization does find themselves in a situation where important data that they’ve already collected is non-standard, they might face some problems when trying to get that data submission-ready.

Problems with mapping terminology

If you’re not collecting data using consistent terminologies that are compatible with CDISC and NCI standards, then you might have trouble mapping the data to those standards.

For example, an organization may have collected data for a question that has 5 possible answer options. But in the standard terminology, only 4 responses are acceptable. The organization will need to consider how this data should be mapped. Is there a clear lineage between the non-standardized and standardized responses? Or are they incompatible? This situation might require the organization to get more information to work out what standardized term it should map to.
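One way to handle the five-into-four situation is to map only the values that have a clear standard equivalent and to set the rest aside for review. The sketch below assumes a made-up four-term codelist and a made-up five-option legacy question; `STANDARD_TERMS`, `LEGACY_MAP` and `map_records` are hypothetical names used purely for illustration.

```python
# Hypothetical standard codelist with four permitted terms.
STANDARD_TERMS = {"MILD", "MODERATE", "SEVERE", "UNKNOWN"}

# Hypothetical mapping for a legacy question with five answer options.
# None marks an option that has no clear standard equivalent.
LEGACY_MAP = {
    "Mild": "MILD",
    "Moderate": "MODERATE",
    "Severe": "SEVERE",
    "Not known": "UNKNOWN",
    "Moderate to severe": None,  # ambiguous: more information is needed
}


def map_records(records):
    """Split (subject_id, legacy_value) pairs into mapped and needs-review lists."""
    mapped, needs_review = [], []
    for subject_id, legacy_value in records:
        term = LEGACY_MAP.get(legacy_value)
        if term in STANDARD_TERMS:
            mapped.append((subject_id, term))
        else:
            # Covers both ambiguous options and values missing from the mapping.
            needs_review.append((subject_id, legacy_value))
    return mapped, needs_review


records = [("001", "Mild"), ("002", "Moderate to severe"), ("003", "Not known")]
mapped, needs_review = map_records(records)
print(mapped)        # [('001', 'MILD'), ('003', 'UNKNOWN')]
print(needs_review)  # [('002', 'Moderate to severe')]
```

The review list is the important part: it records exactly which subjects and values need a decision, which also helps when you later explain the mapping in the submission.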

Problems with data structure

Quite simply, the structure of the collected non-standardized data might not match the SDTM structure. This problem is usually fixable, but it will require work to manipulate the structure of the data to fit the standardized rules. For example, you may have to do horizontal-to-vertical transformations, transposing collected data from wide form (one column per measurement) to long form (one row per measurement), as required by SDTM. Or you might have to combine data from lots of different sources in order to get to the submission structure. Read more about typical mapping scenarios in our blog the SDTM mapping process simplified.
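As a rough illustration of a horizontal-to-vertical transformation, the sketch below turns one row per subject, with a column per measurement, into one row per subject per measurement. The column names (`SUBJID`, `TEST`, `RESULT`) and the function `wide_to_long` are illustrative assumptions rather than a complete SDTM domain, and a real conversion would also need to handle units, visits, timing and so on.

```python
def wide_to_long(rows, id_column, test_columns):
    """Transpose wide rows into one record per subject per test.

    Tests with no collected value (None or an empty string) are skipped.
    """
    long_rows = []
    for row in rows:
        for test in test_columns:
            value = row.get(test)
            if value is None or value == "":
                continue
            long_rows.append(
                {id_column: row[id_column], "TEST": test, "RESULT": value}
            )
    return long_rows


# Wide form: one row per subject, one column per measurement.
wide = [
    {"SUBJID": "001", "SYSBP": 120, "DIABP": 80, "PULSE": 72},
    {"SUBJID": "002", "SYSBP": 135, "DIABP": 85, "PULSE": None},
]

long_rows = wide_to_long(wide, "SUBJID", ["SYSBP", "DIABP", "PULSE"])
for record in long_rows:
    print(record)
```

Subject 002 has no pulse value, so the two wide rows become five long rows rather than six.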

Inconsistencies across studies

Within an organization that doesn’t adhere to internal data standards, comparison or merging of data can be difficult. There’s a risk that the data collection process is inconsistent. Therefore, mistakes can be made, and you may end up collecting data that is incomplete or unusable. In the worst case, you might discover that you haven’t collected all the data you need, and that you now can’t get that information. To rerun the trial would mean more resources, time and money. This is why it’s always better to base study build on pre-agreed, pre-standardized metadata.


Best practice for legacy data conversion

If an organization does find themselves in a situation where they must convert non-standard data into a standardized format, it’s a good idea to consider these things:

  • Plan the process in advance – Take extra steps to ensure the integrity of the data isn’t compromised.
  • Expect inconsistencies – Ensure your timeline allows time for you to analyze and resolve any discrepancies in the data.
  • Be transparent – As part of the submission, include appropriate explanations of the process used, conclusions, inconsistencies, and any special circumstances.
  • Don’t overload the submission – Often, empty files or folders can find their way into the final submission. This might leave the submission open to questioning.
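The last point is easy to check automatically before a submission is packaged. Here is a minimal sketch, assuming the submission is simply a folder on disk; the function name `find_empty_entries` and the file and folder names in the demonstration are made up for illustration.

```python
import tempfile
from pathlib import Path


def find_empty_entries(root):
    """Return (empty_files, empty_dirs) found anywhere under root."""
    empty_files, empty_dirs = [], []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.stat().st_size == 0:
            empty_files.append(path)
        elif path.is_dir() and not any(path.iterdir()):
            empty_dirs.append(path)
    return empty_files, empty_dirs


# Demonstration on a throwaway folder; the names below are invented.
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp)
    (package / "datasets").mkdir()
    (package / "datasets" / "dm.xpt").write_bytes(b"placeholder content")
    (package / "datasets" / "ae.xpt").touch()  # zero-byte file
    (package / "misc").mkdir()                 # folder with nothing in it

    files, dirs = find_empty_entries(package)
    print([p.name for p in files])  # ['ae.xpt']
    print([p.name for p in dirs])   # ['misc']
```

A check like this only finds zero-byte files and empty folders. It says nothing about whether the remaining files are valid, so treat it as one small step in a wider review.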

The benefits of CDISC data standards

According to CDISC, the benefits of implementing CDISC data standards include:

  • Fostered efficiency
  • Complete traceability
  • Enhanced innovation
  • Improved data quality
  • Facilitated data sharing
  • Reduced costs
  • Increased predictability
  • Streamlined processes

By implementing CDISC standards, you can reduce the chance of collecting poor-quality data and instead create consistency across studies. This makes SDTM mapping and any merging of studies a lot simpler, quicker and more cost-effective. It also allows you to do rapid study builds, reducing time and effort on the build itself because you’re reusing your pre-agreed standards.

Need help with clinical data standards?

If you need help mapping non-standard data to SDTM, or are looking to implement manageable, reusable standards, you’ve come to the right place!

ryze allows you to make use of CDISC standards to develop your own organizational standards. When you store and manage your CDISC-compliant organizational standards in our clinical metadata repository, they’re ready to use again and again. Your standardized content will be correct and consistent across all your studies, and there’s less chance of manual errors.

If you want to find out more about CDISC standards, why not download our free guide to the CDISC standards required for regulatory submission?  





About the author



Kevin Burges

Head of Product Management | Formedix


Kevin Burges has been working at Formedix for over 20 years. Over time his role has changed from Developer to Senior Developer, to Technical Director and now Head of Product Management.

Kevin has a strong interest in metadata management and automation as an engine for streamlining clinical trials, and he works closely with customers to evolve the ryze platform with their needs in mind. He has also worked closely with CDISC since 2000, and has won awards for outstanding achievement towards advancing CDISC standards.

Nowadays, he’s part of the Data Exchange Standards team, which includes ODM, Define-XML and Dataset-XML.

