4  Data analysis and coding

# Chapter 2: Data Analysis and Coding

This chapter outlines the analytical approaches and coding considerations specific to the Occurrence Data Structure (OCCDS). For a programmer, understanding these principles is crucial for correctly deriving and summarizing occurrence data.

4.1 Statistical Analysis

The primary analytical method for OCCDS datasets revolves around summarizing the number of subjects with at least one occurrence of a particular event or term. Unlike the Basic Data Structure (BDS), where analyses often involve statistical measures of AVAL (analysis value) or AVALC (analysis value character), OCCDS focuses on counts of subjects.

Key Programmer Considerations:

  • Subject-Based Counting: SAS programs (or programs in other languages) will primarily implement logic to count distinct subjects (USUBJID) who experienced a specific occurrence. In practice this is COUNT(DISTINCT USUBJID) in PROC SQL, or PROC FREQ run on data first reduced to one record per subject per term (for example with PROC SORT NODUPKEY), since PROC FREQ on the raw occurrence records counts events rather than subjects.

  • Denominator Derivation (Crucial Point): A common pitfall is deriving denominators directly from the OCCDS dataset. Denominators for percentages (e.g., % of subjects with an AE) should almost always come from the ADaM Subject-Level Analysis Dataset (ADSL). The OCCDS dataset only contains records for subjects who had an occurrence; subjects who did not experience the event have no record there, yet they still belong to the study population that forms the denominator.

  • Analysis Types: While OCCDS datasets are ideal for generating simple frequency and percentage tables (e.g., AE tables, concomitant medication summaries), they can also serve as the basis for more complex analyses, such as time-to-event analyses, provided the necessary timing variables are included and derived correctly.
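The counting and denominator points above can be sketched as follows. This is a minimal illustration, not a production macro: the records, subject IDs, and the function name `subject_rates` are hypothetical, and the treatment variable TRT01A is assumed to be the grouping variable. The numerator counts distinct subjects from the occurrence dataset, while the denominator (N) is taken from ADSL.

```python
from collections import defaultdict

# Hypothetical ADSL: one record per subject in the analysis population.
adsl = [
    {"USUBJID": "S-001", "TRT01A": "Drug X"},
    {"USUBJID": "S-002", "TRT01A": "Drug X"},
    {"USUBJID": "S-003", "TRT01A": "Drug X"},
    {"USUBJID": "S-004", "TRT01A": "Placebo"},
    {"USUBJID": "S-005", "TRT01A": "Placebo"},
]

# Hypothetical ADAE: one record per occurrence. S-003 and S-005 have no records.
adae = [
    {"USUBJID": "S-001", "AEDECOD": "Headache"},
    {"USUBJID": "S-001", "AEDECOD": "Headache"},  # repeat event, same subject
    {"USUBJID": "S-002", "AEDECOD": "Nausea"},
    {"USUBJID": "S-004", "AEDECOD": "Headache"},
]


def subject_rates(adsl, occds, term_var="AEDECOD", trt_var="TRT01A"):
    """Return {(treatment, term): (n, N, pct)} with N taken from ADSL."""
    trt_by_subject = {row["USUBJID"]: row[trt_var] for row in adsl}

    # Denominator: every subject in ADSL, whether or not they had an event.
    denominators = defaultdict(int)
    for trt in trt_by_subject.values():
        denominators[trt] += 1

    # Numerator: distinct subjects per treatment and term.
    subjects = defaultdict(set)
    for row in occds:
        trt = trt_by_subject.get(row["USUBJID"])
        if trt is None:
            continue  # subject is not in the analysis population
        subjects[(trt, row[term_var])].add(row["USUBJID"])

    return {
        key: (len(ids), denominators[key[0]],
              round(100 * len(ids) / denominators[key[0]], 1))
        for key, ids in subjects.items()
    }


rates = subject_rates(adsl, adae)
```

Here subject S-001 contributes once to the Headache count despite two records, and the Drug X denominator is 3 even though only two Drug X subjects appear in the occurrence data.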

4.2 Dictionary Coding

Occurrence data, especially from verbatim text collected on CRFs, relies heavily on coding dictionaries such as MedDRA (Medical Dictionary for Regulatory Activities) for adverse events and medical history, and the WHO Drug Dictionary for concomitant medications. These dictionaries provide a hierarchical classification, allowing for analysis at various levels of granularity.

Key Programmer Considerations:

  • Handling Verbatim Text: Your data processing pipelines must include steps for dictionary coding. This involves mapping verbatim terms collected in raw data to standardized terms and their hierarchical classifications (e.g., mapping AE.AETERM to PT, HLT, HLGT, SOC in MedDRA).

  • Consistency in Coding Rules: When integrating data, especially from multiple studies, it is paramount that consistent dictionary coding rules are applied. Programmers may need to implement logic to handle different coding versions or to ensure that terms are mapped uniformly across studies to avoid inconsistencies in summarized data.
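As a rough illustration of the mapping step, the sketch below looks up a verbatim term in a tiny hand-written table and attaches the coded hierarchy. Everything here is an assumption made for illustration: `TOY_DICTIONARY` and `code_term` are hypothetical names, the table is not a licensed MedDRA extract, and real coding is normally done by medical coders in a dedicated coding tool rather than by string matching in analysis programs.

```python
# Hypothetical, hand-written lookup. A real study would use the coded
# values delivered from the coding system, tied to a dictionary version.
TOY_DICTIONARY = {
    "headache": {
        "AEDECOD": "Headache",
        "AEHLT": "Headaches NEC",
        "AEHLGT": "Headaches",
        "AESOC": "Nervous system disorders",
    },
    "nausea": {
        "AEDECOD": "Nausea",
        "AEHLT": "Nausea and vomiting symptoms",
        "AEHLGT": "Gastrointestinal signs and symptoms",
        "AESOC": "Gastrointestinal disorders",
    },
}

CODED_VARS = ("AEDECOD", "AEHLT", "AEHLGT", "AESOC")


def code_term(verbatim):
    """Attach coded hierarchy levels to a verbatim term, if a match exists."""
    key = " ".join(verbatim.lower().split())  # normalize case and whitespace
    coded = TOY_DICTIONARY.get(key)
    if coded is None:
        # Leave uncoded terms visible so they can be queried, not dropped.
        return {"AETERM": verbatim, **{v: None for v in CODED_VARS}, "CODED": False}
    return {"AETERM": verbatim, **coded, "CODED": True}


coded_rows = [code_term(t) for t in ["  HEADACHE ", "Nausea", "feeling odd"]]
uncoded = [row["AETERM"] for row in coded_rows if not row["CODED"]]
```

Keeping the verbatim term alongside the coded levels preserves traceability, and listing the uncoded terms gives a simple check to run before any summary is produced.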

4.2.1 Recoding of Occurrence Data

“Recoding” refers to situations where dictionary-coded terms are re-categorized or re-grouped for specific analytical purposes. This is distinct from the initial coding of verbatim text.

Key Programmer Considerations:

  • Scenarios for Recoding: Common scenarios include:

    • Long-term Safety Reports: Where a consistent view of safety data across a longer period, possibly with updated dictionary versions, is required.

    • Integrated Analyses: Combining data from multiple clinical trials for regulatory submissions often necessitates recoding to ensure comparability.

  • Complexity of Integration: The OCCDS document acknowledges that multi-study data integration and recoding are complex and often require significant programming effort to maintain consistency and traceability. It describes this as an ongoing area of development within the ADaM Team.

4.3 Adverse Events

Adverse Event (AE) data is a core and frequently analyzed type of occurrence data where OCCDS is highly beneficial.

Key Programmer Considerations:

  • Definition: Understand the definition of an Adverse Event (as per ICH E2A guidance) to correctly interpret source data.

  • Key Attributes: Programs deriving AE data for OCCDS must correctly derive and include critical attributes such as:

    • Severity/Intensity: How severe the AE was.

    • Relatedness: Whether the AE was considered related to the study treatment.

    • Seriousness: Whether the AE met criteria for seriousness.

Important
  • Treatment-Emergent Flag (TRTEMFL): A crucial derivation for programmers. An AE is considered treatment-emergent if its onset occurs on or after the first dose of study treatment and up to a specified post-treatment period (or if it worsens during this time). Your programming logic needs to compare the analysis start date (ASTDT, derived from the SDTM character date AESTDTC) with the treatment start date (TRTSDT) and other relevant dates, such as the treatment end date (TRTEDT).
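A minimal sketch of the date comparison behind the treatment-emergent flag (TRTEMFL in ADaM) is shown below. It rests on several assumptions that a real study would take from its analysis plan: the function name `derive_trtemfl` is hypothetical, the 30-day post-treatment window is an arbitrary placeholder, partial or missing start dates simply leave the flag unset instead of being imputed, and the "worsening" criterion is not handled.

```python
from datetime import date, timedelta


def derive_trtemfl(aestdtc, trtsdt, trtedt, window_days=30):
    """Return "Y" if the AE start falls in the treatment-emergent window, else None.

    aestdtc     -- ISO 8601 character start date/time from SDTM (AESTDTC)
    trtsdt      -- treatment start date from ADSL (TRTSDT)
    trtedt      -- treatment end date from ADSL (TRTEDT), or None if ongoing
    window_days -- assumed post-treatment follow-up window (placeholder value)
    """
    # Only complete dates (YYYY-MM-DD) are handled. Partial dates need an
    # imputation rule from the analysis plan, which this sketch does not define.
    if trtsdt is None or not aestdtc or len(aestdtc) < 10:
        return None
    try:
        astdt = date.fromisoformat(aestdtc[:10])  # drop any time component
    except ValueError:
        return None

    if astdt < trtsdt:
        return None  # started before first dose
    if trtedt is not None and astdt > trtedt + timedelta(days=window_days):
        return None  # started after the post-treatment window
    return "Y"


flag_on_treatment = derive_trtemfl("2024-01-10", date(2024, 1, 5), date(2024, 2, 1))
flag_before_dose = derive_trtemfl("2024-01-04", date(2024, 1, 5), date(2024, 2, 1))
flag_partial_date = derive_trtemfl("2024-01", date(2024, 1, 5), date(2024, 2, 1))
```

The function returns "Y" or None to follow the ADaM convention that flag variables are populated with "Y" or left null. Many studies conservatively treat AEs with missing start dates as treatment-emergent, which would change the early-return branch.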

4.4 Concomitant Medications Data

Concomitant medications data is another common use case for OCCDS, often summarized to understand drug exposure and potential interactions.

  • Summarization by Dictionary: Programs will summarize concomitant medications by the medication name (CMDECOD), active ingredient, or within a specific classification system (e.g., ATC codes from the WHO Drug Dictionary). This involves leveraging the hierarchical structure of the WHO Drug Dictionary.
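The hierarchical summary described above can be sketched as a subject count at each level of the hierarchy. The data and the function name `count_by_hierarchy` are hypothetical, and CMCLAS is assumed to hold the medication class; the class and drug names are illustrative values, not an extract from the WHO Drug Dictionary.

```python
from collections import defaultdict

# Hypothetical ADCM records: one record per medication taken.
adcm = [
    {"USUBJID": "S-001", "CMCLAS": "ANALGESICS", "CMDECOD": "PARACETAMOL"},
    {"USUBJID": "S-001", "CMCLAS": "ANALGESICS", "CMDECOD": "ACETYLSALICYLIC ACID"},
    {"USUBJID": "S-002", "CMCLAS": "ANALGESICS", "CMDECOD": "PARACETAMOL"},
    {"USUBJID": "S-002", "CMCLAS": "ANTIHISTAMINES FOR SYSTEMIC USE", "CMDECOD": "CETIRIZINE"},
]


def count_by_hierarchy(records, levels):
    """Count distinct subjects at every level of a dictionary hierarchy.

    Keys are tuples: (class,) for the class level, (class, drug) for the
    drug level within that class.
    """
    subjects = defaultdict(set)
    for row in records:
        for depth in range(1, len(levels) + 1):
            key = tuple(row[var] for var in levels[:depth])
            subjects[key].add(row["USUBJID"])
    return {key: len(ids) for key, ids in subjects.items()}


cm_counts = count_by_hierarchy(adcm, ["CMCLAS", "CMDECOD"])
```

Subject S-001 took two analgesics but is counted once at the class level, which is why class-level counts cannot be obtained by adding up the drug-level counts.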

4.5 Pre-specified Data

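This refers to data collected on CRFs that list specific events or interventions and ask whether each one occurred, typically through categories or checkboxes. In SDTM, --PRESP flags a record as pre-specified and --OCCUR records whether the pre-specified item actually occurred.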

Note
  • Variable Usage: Data from pre-specified occurrences can be summarized using variables like --TERM (the reported term for an event), --TRT (the reported name of a treatment or medication), --CAT (category), and --SCAT (subcategory).

Tip
  • AE Domain Rule: A critical distinction is that in the SDTM Adverse Events (AE) domain, every record must correspond to an actual occurrence. If a pre-specified AE was asked about but did not occur (--OCCUR = "N"), that response is typically stored in a different SDTM domain (e.g., Findings About Events (FA)) and does not lead to an AE record in OCCDS. Programmers must enforce this rule during derivation.

4.6 Combining Spontaneous and Pre-specified Occurrences

While it is statistically and programmatically feasible to combine spontaneous (unsolicited) and pre-specified (solicited) occurrence data, doing so requires careful consideration.

Important

Key Programmer Considerations:

  • Statistical Alignment: Ensure that combining these data types makes statistical sense for your analysis, as their collection methodologies differ.

  • Denominator Management: Be extra vigilant with denominators. If you combine data, ensure your denominator correctly reflects the population at risk for both types of occurrences.

  • Excluding Non-Occurring Data: When combining, make sure to exclude records where the occurrence flag indicates “no occurrence” (e.g., --OCCUR = "N") from your analysis datasets to avoid inflating counts.
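The exclusion rule above might look like the sketch below. The records are hypothetical and use Clinical Events naming (CEPRESP, CEOCCUR, CETERM) purely as an example of the --PRESP and --OCCUR pattern; the helper name `occurred` is also an assumption. Spontaneous records carry no --PRESP or --OCCUR value, so they are always kept, while pre-specified records are kept only when the occurrence was confirmed.

```python
from collections import defaultdict

# Hypothetical combined records: spontaneous rows have no CEPRESP/CEOCCUR.
combined = [
    {"USUBJID": "S-001", "CETERM": "Rash", "CEPRESP": None, "CEOCCUR": None},
    {"USUBJID": "S-002", "CETERM": "Rash", "CEPRESP": "Y", "CEOCCUR": "Y"},
    {"USUBJID": "S-003", "CETERM": "Rash", "CEPRESP": "Y", "CEOCCUR": "N"},
    {"USUBJID": "S-003", "CETERM": "Fever", "CEPRESP": "Y", "CEOCCUR": "Y"},
    {"USUBJID": "S-004", "CETERM": "Fever", "CEPRESP": "Y", "CEOCCUR": "N"},
]


def occurred(row):
    """Keep spontaneous records, and pre-specified records only if confirmed."""
    if row.get("CEPRESP") == "Y":
        return row.get("CEOCCUR") == "Y"
    return True  # spontaneous report: its existence implies an occurrence


analysis_records = [row for row in combined if occurred(row)]

subjects_by_term = defaultdict(set)
for row in analysis_records:
    subjects_by_term[row["CETERM"]].add(row["USUBJID"])
subject_counts = {term: len(ids) for term, ids in subjects_by_term.items()}
```

Without the filter, S-003 would be counted under Rash and S-004 under Fever even though neither event occurred. Requiring "Y" for pre-specified rows also drops unknown or missing responses; whether that is appropriate is a decision for the analysis plan.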

4.7 Other Data

OCCDS is flexible and can be applied to other types of occurrence data beyond just AEs and concomitant medications.

Key Programmer Considerations:

Important
  • Suitable Data Types:
    • Clinical Events (CE): Can be summarized by category.
    • Protocol Deviations (DV): Often summarized by counting subjects with deviations.
    • NCI CTCAE Coded Lab Data: If lab data is graded using NCI CTCAE (National Cancer Institute Common Terminology Criteria for Adverse Events), it can be summarized similarly to AEs, focusing on grading and frequency.
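For graded data, a common summary is the number of subjects by worst (maximum) grade per term. The sketch below is one way to do that under stated assumptions: the records are hypothetical, the variable names AEDECOD and AETOXGR are borrowed from AE conventions for illustration, and `worst_grade_counts` is not a standard function.

```python
from collections import defaultdict

# Hypothetical graded occurrence records.
graded = [
    {"USUBJID": "S-001", "AEDECOD": "Neutrophil count decreased", "AETOXGR": "1"},
    {"USUBJID": "S-001", "AEDECOD": "Neutrophil count decreased", "AETOXGR": "3"},
    {"USUBJID": "S-002", "AEDECOD": "Neutrophil count decreased", "AETOXGR": "2"},
    {"USUBJID": "S-003", "AEDECOD": "Neutrophil count decreased", "AETOXGR": "3"},
    {"USUBJID": "S-003", "AEDECOD": "Anemia", "AETOXGR": "1"},
]


def worst_grade_counts(records, term_var="AEDECOD", grade_var="AETOXGR"):
    """Count subjects by their worst grade for each term."""
    # Step 1: reduce to one worst grade per subject per term.
    worst = {}
    for row in records:
        key = (row["USUBJID"], row[term_var])
        grade = int(row[grade_var])
        worst[key] = max(grade, worst.get(key, 0))

    # Step 2: count subjects at each (term, worst grade) combination.
    counts = defaultdict(int)
    for (_subject, term), grade in worst.items():
        counts[(term, grade)] += 1
    return dict(counts)


grade_counts = worst_grade_counts(graded)
```

Subject S-001 appears only under grade 3 for the neutrophil term, because the grade 1 record is superseded by the worse grade. Percentages would again use denominators from ADSL.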

4.8 General Rule for OCCDS vs. BDS

Important
  • Use OCCDS: When your primary goal is to summarize hierarchical data by counting subjects (e.g., “how many subjects experienced a specific MedDRA Preferred Term?”).
  • Use BDS: When you need to summarize non-hierarchical data using a PARAM and AVAL approach (e.g., “what was the mean change from baseline in a lab parameter?”).