3  SDTM AE Derivation: A Dplyr Guide with Scenario Explanations

This document demonstrates the derivation of an SDTM (Study Data Tabulation Model) Adverse Events (AE) domain using dplyr in R.

It aims to illustrate various scenarios and the corresponding dplyr code required to transform raw adverse event data into a compliant SDTM AE dataset. Each section will focus on specific SDTM AE variables, providing the derivation logic and dplyr code.

Simulate Raw Adverse Event Data

First, let’s simulate some raw adverse event data.

This dataset will serve as our source for deriving the SDTM AE variables. It includes various data quality issues and scenarios that we will address during the derivation process.
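A minimal sketch of what such a raw dataset could look like. The columns AE_END_DATE_RAW, AE_ONGOING_FLAG and AE_SERIOUSNESS_CRITERIA are named in this guide; SUBJECT_ID, AE_VERBATIM and AE_START_DATE_RAW are hypothetical names, and all values are made up for illustration.

```r
library(dplyr)

# Hypothetical raw AE extract with a few deliberate gaps (missing term,
# missing end dates, missing ongoing flag).
raw_ae <- tibble(
  SUBJECT_ID              = c("001", "001", "002", "003"),
  AE_VERBATIM             = c("Headache", "nausea", "Skin rash", NA),
  AE_START_DATE_RAW       = c("2023-01-10", "2023-01-15", "2023-02-01", "2023-02-20"),
  AE_END_DATE_RAW         = c("2023-01-12", NA, "2023-02-05", NA),
  AE_ONGOING_FLAG         = c("N", "Y", "N", NA),
  AE_SERIOUSNESS_CRITERIA = c("NONE", "HOSPITALIZATION", NA, "DEATH")
)

glimpse(raw_ae)
```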

3.0.1 Derivation of Core AE Variables

This section details the derivation of fundamental SDTM AE variables.

3.0.2 STUDYID, DOMAIN, USUBJID, AESEQ

These are foundational variables for any SDTM domain.
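One way these identifiers could be derived is sketched here. The study identifier "STUDY01", the SUBJECT_ID and AE_START_DATE_RAW columns, and the choice to number events by start date within subject are all assumptions for illustration.

```r
library(dplyr)

# Hypothetical raw input, deliberately unsorted.
raw_ae <- tibble(
  SUBJECT_ID        = c("002", "001", "001"),
  AE_START_DATE_RAW = c("2023-02-01", "2023-01-15", "2023-01-10")
)

ae_ids <- raw_ae %>%
  mutate(
    STUDYID = "STUDY01",                                 # assumed study identifier
    DOMAIN  = "AE",
    USUBJID = paste(STUDYID, SUBJECT_ID, sep = "-")      # unique across the study
  ) %>%
  arrange(USUBJID, AE_START_DATE_RAW) %>%                # sort before numbering
  group_by(USUBJID) %>%
  mutate(AESEQ = row_number()) %>%                       # sequence within subject
  ungroup()
```

Sorting before calling row_number() keeps AESEQ reproducible across reruns of the same data.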

3.0.3 AETERM and AEDECOD

AETERM is the verbatim reported term, and AEDECOD is the standardized decoded term (e.g., using MedDRA). For this example, AEDECOD will be a simplified mapping.

Note

Scenario: MedDRA Coding In a real-world scenario, AEDECOD (and other MedDRA variables like AELLT, AEHLT, AEHLGT, AESOC, AEBODSYS) would be derived from a MedDRA dictionary lookup based on AETERM. This typically involves specialized coding software or processes. For this example, we use a simplified case_when statement.
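The simplified mapping could be sketched as follows. AE_VERBATIM is a hypothetical raw column, and the decoded values are illustrative placeholders rather than output from an actual MedDRA dictionary.

```r
library(dplyr)

raw_ae <- tibble(
  AE_VERBATIM = c("Headache", " nausea ", "Skin rash", "Dizzy spell")
)

ae_terms <- raw_ae %>%
  mutate(
    AETERM  = trimws(AE_VERBATIM),             # verbatim term, whitespace trimmed
    AEDECOD = case_when(
      toupper(AETERM) == "HEADACHE"  ~ "Headache",
      toupper(AETERM) == "NAUSEA"    ~ "Nausea",
      toupper(AETERM) == "SKIN RASH" ~ "Rash",
      TRUE                           ~ NA_character_   # unmapped terms stay uncoded
    )
  )
```

Leaving unmapped terms as NA, instead of copying AETERM into AEDECOD, makes uncoded records easy to find and send for coding.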

3.0.3.1 Derivation of Date Variables

Date variables are crucial for timing and sequence. SDTM dates are stored as character strings in ISO 8601 format (e.g., YYYY-MM-DD).

3.0.3.2 AESTDTC and AEENDTC

These are the start and end date/times of the adverse event.
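A sketch of the conversion for complete dates, assuming the raw values arrive as DD/MM/YYYY text. The raw format, AE_START_DATE_RAW and AE_END_DATE_OBJ are assumptions; AE_END_DATE_RAW and AE_START_DATE_OBJ are the names used in this guide.

```r
library(dplyr)

raw_ae <- tibble(
  AE_START_DATE_RAW = c("10/01/2023", "15/01/2023", "01/02/2023"),
  AE_END_DATE_RAW   = c("12/01/2023", NA, "05/02/2023")
)

ae_dates <- raw_ae %>%
  mutate(
    # Parse to Date objects first so later calculations can reuse them.
    AE_START_DATE_OBJ = as.Date(AE_START_DATE_RAW, format = "%d/%m/%Y"),
    AE_END_DATE_OBJ   = as.Date(AE_END_DATE_RAW, format = "%d/%m/%Y"),
    # Format as ISO 8601 text; a missing date stays NA.
    AESTDTC = format(AE_START_DATE_OBJ, "%Y-%m-%d"),
    AEENDTC = format(AE_END_DATE_OBJ, "%Y-%m-%d")
  )
```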

Tip

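Scenario: Partial Dates If raw data contains partial dates (e.g., “2023-01” or “2023”), you would need more sophisticated parsing logic, often using lubridate functions such as ymd_hms, ymd, ym, or yq. Note that SDTM --DTC variables normally keep a partial date at the precision collected, as a truncated ISO 8601 value (e.g., “2023-01”). Imputing the missing parts (e.g., ‘01’ for an unknown day or month) is generally done in analysis datasets or in helper columns, not in the --DTC value itself. The format function returns NA for NA dates, so missing values pass through without errors.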
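A sketch of one way to handle partial dates, assuming the raw values already arrive as YYYY, YYYY-MM or YYYY-MM-DD text. It keeps the collected precision in AESTDTC and puts an imputed date in a separate helper column for calculations that need a full date. AE_START_DATE_RAW and AE_START_DATE_IMP are hypothetical names, and first-of-period imputation is only one possible convention.

```r
library(dplyr)

raw_ae <- tibble(
  AE_START_DATE_RAW = c("2023-01-10", "2023-01", "2023", "unknown", NA)
)

iso_partial <- "^\\d{4}(-\\d{2}(-\\d{2})?)?$"   # YYYY, YYYY-MM or YYYY-MM-DD

ae_partial <- raw_ae %>%
  mutate(
    # Keep valid (possibly partial) ISO 8601 values; anything else becomes NA.
    AESTDTC = if_else(grepl(iso_partial, AE_START_DATE_RAW),
                      AE_START_DATE_RAW, NA_character_),
    # Helper only: fill unknown month/day with "01" to get a full Date.
    AE_START_DATE_IMP = as.Date(case_when(
      nchar(AESTDTC) == 4 ~ paste0(AESTDTC, "-01-01"),
      nchar(AESTDTC) == 7 ~ paste0(AESTDTC, "-01"),
      TRUE                ~ AESTDTC
    ))
  )
```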

3.0.3.3 AEONGO

AEONGO indicates if the adverse event is ongoing at the time of data cutoff or last assessment.

Note

Scenario: Conflicting Information If AE_ONGOING_FLAG is “N” but AE_END_DATE_RAW is missing, the two fields conflict. The usual rule is to set AEONGO to “Y”, since an event without an end date cannot be confirmed as ended. An explicit “Y” flag always takes priority, so in a case_when the AE_ONGOING_FLAG == “Y” check comes first.
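The rule described above can be sketched with a three-branch case_when. The sample values are made up; the column names are the ones used in this guide.

```r
library(dplyr)

raw_ae <- tibble(
  AE_ONGOING_FLAG = c("Y", "N", "N", NA, NA),
  AE_END_DATE_RAW = c(NA, "2023-01-12", NA, "2023-02-05", NA)
)

ae_ongo <- raw_ae %>%
  mutate(
    AEONGO = case_when(
      AE_ONGOING_FLAG == "Y"  ~ "Y",   # explicit flag wins
      is.na(AE_END_DATE_RAW)  ~ "Y",   # no end date: treat as ongoing
      TRUE                    ~ "N"    # end date present and not flagged ongoing
    )
  )
```

Because case_when evaluates branches in order, the third row (flag “N”, no end date) is caught by the second branch and becomes “Y”.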

3.0.4 Derivation of Seriousness Variables

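Seriousness variables indicate whether the adverse event meets any of the regulatory seriousness criteria (such as death or hospitalization). Seriousness is distinct from severity, which is captured in AESEV.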

3.0.4.1 AESER, AESCONG, AESDISAB, AESDTH, AESHLT, AESHOSP, AESLIFE

AESER is derived based on any of the seriousness criteria being met. The individual criteria flags (AESCONG, AESDISAB, etc.) are direct mappings.

Note

Scenario: Multiple Criteria If an event meets multiple seriousness criteria, all applicable flags should be ‘Y’, and AESER will also be ‘Y’.

Tip

Scenario: Missing Seriousness Data If AE_SERIOUSNESS_CRITERIA is missing, all AESxxx flags and AESER should be NA. The if_else function propagates NA: it returns ‘N’ when the condition is FALSE and NA when the condition is NA. AESER behaves differently. A case_when treats an NA condition as unmatched, so when all inputs are NA the row falls through to TRUE ~ “N” and AESER becomes ‘N’ instead of NA. If NA is desired for AESER in that case, add an explicit case_when branch that returns NA when all criteria are NA.
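A sketch of the flag derivation with explicit NA handling, shown for three of the criteria. It assumes AE_SERIOUSNESS_CRITERIA holds semicolon-separated keywords and uses “NONE” for non-serious events; both conventions, and the helper crit_flag, are hypothetical.

```r
library(dplyr)

raw_ae <- tibble(
  AE_SERIOUSNESS_CRITERIA = c(NA, "HOSPITALIZATION", "DEATH;HOSPITALIZATION", "NONE")
)

# Returns "Y"/"N", or NA when the raw criteria field itself is missing.
crit_flag <- function(criteria, keyword) {
  hit <- grepl(keyword, criteria, fixed = TRUE)
  hit[is.na(criteria)] <- NA        # grepl() gives FALSE for NA input; restore NA
  if_else(hit, "Y", "N")            # an NA condition yields NA
}

ae_ser <- raw_ae %>%
  mutate(
    AESDTH  = crit_flag(AE_SERIOUSNESS_CRITERIA, "DEATH"),
    AESHOSP = crit_flag(AE_SERIOUSNESS_CRITERIA, "HOSPITALIZATION"),
    AESLIFE = crit_flag(AE_SERIOUSNESS_CRITERIA, "LIFE THREATENING"),
    AESER = case_when(
      AESDTH == "Y" | AESHOSP == "Y" | AESLIFE == "Y"    ~ "Y",
      is.na(AESDTH) & is.na(AESHOSP) & is.na(AESLIFE)    ~ NA_character_,
      TRUE                                               ~ "N"
    )
  )
```

The second case_when branch is what keeps AESER as NA when every criterion is missing; without it the row would fall through to “N”.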

3.0.5 Derivation of Outcome and Relatedness Variables

3.0.6 AEOUT

AEOUT describes the outcome of the adverse event.

Note

Scenario: Inferring Outcome If the raw outcome is missing, AEOUT can often be inferred from AEONGO and AEENDTC. For example, if AEONGO is ‘Y’, AEOUT is typically ‘NOT RESOLVED’. If AEONGO is ‘N’ and AEENDTC is present, AEOUT is typically ‘RESOLVED’.
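The inference described above might be sketched like this. AE_OUTCOME_RAW is a hypothetical raw column, the “UNKNOWN” fallback is an assumption, and the short outcome values follow the text; a real study would map to its controlled terminology.

```r
library(dplyr)

ae_in <- tibble(
  AE_OUTCOME_RAW = c("Fatal", NA, NA, NA),
  AEONGO         = c("N", "Y", "N", "N"),
  AEENDTC        = c("2023-01-12", NA, "2023-02-05", NA)
)

ae_out <- ae_in %>%
  mutate(
    AEOUT = case_when(
      !is.na(AE_OUTCOME_RAW)          ~ toupper(AE_OUTCOME_RAW),  # reported outcome wins
      AEONGO == "Y"                   ~ "NOT RESOLVED",
      AEONGO == "N" & !is.na(AEENDTC) ~ "RESOLVED",
      TRUE                            ~ "UNKNOWN"                 # nothing to infer from
    )
  )
```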

3.0.7 AEREL

AEREL describes the causality assessment of the adverse event to the study treatment.

Tip

Scenario: Multiple Causality Assessments If multiple causality assessments exist (e.g., by investigator and sponsor), AEREL typically reflects the investigator’s assessment. Other assessments might be stored in a custom variable or a supplemental qualifier.
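A sketch of taking the investigator’s assessment as AEREL while keeping the sponsor’s assessment in a separate column. The raw column names, the allowed values and the AERELSP name are all hypothetical.

```r
library(dplyr)

raw_ae <- tibble(
  AE_CAUSALITY_INV     = c("Related", "not related ", "Possibly Related", NA),
  AE_CAUSALITY_SPONSOR = c("Not Related", "Not Related", "Related", "Related")
)

allowed_rel <- c("RELATED", "POSSIBLY RELATED", "NOT RELATED")  # assumed value list

ae_rel <- raw_ae %>%
  mutate(
    inv_clean = toupper(trimws(AE_CAUSALITY_INV)),
    # AEREL reflects the investigator; unexpected or missing values become NA.
    AEREL   = if_else(inv_clean %in% allowed_rel, inv_clean, NA_character_),
    # Sponsor assessment kept aside, e.g. as a supplemental qualifier candidate.
    AERELSP = toupper(trimws(AE_CAUSALITY_SPONSOR))
  ) %>%
  select(-inv_clean)
```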

3.0.8 Derivation of Severity and Toxicity Grade

3.0.8.1 AESEV

AESEV describes the severity of the adverse event.
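A sketch of a severity mapping, assuming the raw data mixes text labels and numeric codes. AE_SEVERITY_RAW and the 1/2/3 coding are hypothetical.

```r
library(dplyr)

raw_ae <- tibble(
  AE_SEVERITY_RAW = c("mild", "Moderate", "SEVERE", "3", NA)
)

ae_sev <- raw_ae %>%
  mutate(
    sev_clean = toupper(trimws(AE_SEVERITY_RAW)),
    AESEV = case_when(
      sev_clean %in% c("MILD", "1")     ~ "MILD",
      sev_clean %in% c("MODERATE", "2") ~ "MODERATE",
      sev_clean %in% c("SEVERE", "3")   ~ "SEVERE",
      TRUE                              ~ NA_character_   # missing or unexpected
    )
  ) %>%
  select(-sev_clean)
```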

3.0.9 AETOXGR (Simulated)

AETOXGR is the toxicity grade, often based on a standardized grading scale (e.g., CTCAE). For this example, we’ll simulate it based on AESEV.

Note

Scenario: CTCAE Grading In clinical trials, AETOXGR is typically derived directly from the CTCAE (Common Terminology Criteria for Adverse Events) grading, which is a more granular and specific assessment than general severity. The mapping here is a simplification.
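The simulated mapping from AESEV could be sketched as follows. The 1/2/3 assignment is an assumption for illustration only and is not a CTCAE derivation.

```r
library(dplyr)

ae_in <- tibble(AESEV = c("MILD", "MODERATE", "SEVERE", NA))

ae_tox <- ae_in %>%
  mutate(
    # Simulated grade stored as character; a real grade comes from CTCAE review.
    AETOXGR = case_when(
      AESEV == "MILD"     ~ "1",
      AESEV == "MODERATE" ~ "2",
      AESEV == "SEVERE"   ~ "3",
      TRUE                ~ NA_character_
    )
  )
```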

3.0.10 Derivation of Latency

3.0.10.1 AELAT

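In this guide, AELAT represents the latency of the adverse event, usually the time from the start of study drug to the start of the adverse event. Note that the SDTMIG uses the --LAT suffix for laterality and expresses relative timing through study-day variables such as AESTDY, so treat AELAT here as a study-specific variable.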

Tip

Scenario: Missing Dates for Latency If either AE_START_DATE_OBJ or STUDY_DRUG_START_DATE_OBJ is missing, the date subtraction propagates the NA, so AELAT will correctly be NA.

Note

Scenario: Different Reference Dates Latency might be calculated from different reference dates (e.g., first dose of any study treatment, first dose of specific study drug, randomization date). Ensure the correct reference date is used as per study specifications.
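A sketch of the latency calculation in days, using the two date columns named above. The sample values and the choice of days as the unit are assumptions; swap in whichever reference date the study specification calls for.

```r
library(dplyr)

ae_in <- tibble(
  AE_START_DATE_OBJ         = as.Date(c("2023-01-10", NA, "2023-02-01")),
  STUDY_DRUG_START_DATE_OBJ = as.Date(c("2023-01-01", "2023-01-01", "2023-01-01"))
)

ae_lat <- ae_in %>%
  mutate(
    # Subtracting Date objects gives a difference in days; NA propagates.
    AELAT = as.numeric(AE_START_DATE_OBJ - STUDY_DRUG_START_DATE_OBJ)
  )
```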

3.0.10.2 Final SDTM AE Dataset

Now, let’s select and order the final SDTM AE variables according to typical SDTM specifications.
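A sketch of the final selection step on a small stand-in dataset. The variable order in ae_vars is an assumed one, and the study’s own specification should govern the real order. Using any_of means variables that have not been derived are skipped instead of raising an error.

```r
library(dplyr)

# Stand-in for the fully derived data, including a leftover raw column.
ae_derived <- tibble(
  AE_VERBATIM = c("Headache", "Nausea"),
  AESEQ       = c(2L, 1L),
  USUBJID     = "STUDY01-001",
  DOMAIN      = "AE",
  STUDYID     = "STUDY01",
  AETERM      = c("Headache", "Nausea"),
  AESTDTC     = c("2023-01-15", "2023-01-10")
)

# Assumed target order of SDTM AE variables.
ae_vars <- c(
  "STUDYID", "DOMAIN", "USUBJID", "AESEQ", "AETERM", "AEDECOD",
  "AESEV", "AESER", "AEREL", "AEOUT", "AETOXGR", "AESTDTC", "AEENDTC"
)

ae <- ae_derived %>%
  select(any_of(ae_vars)) %>%     # keeps ae_vars order, drops raw helper columns
  arrange(USUBJID, AESEQ)
```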