| Title: | Clinical Trial Example Datasets |
|---|---|
| Description: | A collection of clinical trial example datasets from multiple sources including the CDISC Pilot 01 study (CDISC <https://www.cdisc.org/>). All datasets are provided in Parquet format for efficient storage and can be accessed using the 'connector' package. Designed for training, testing, prototyping, and demonstrating clinical data analysis workflows. |
| Authors: | Lovemore Gakava [aut, cre, cph] |
| Maintainer: | Lovemore Gakava |
| License: | Apache License (>= 2) |
| Version: | 0.1.3 |
| Built: | 2026-05-15 08:45:54 UTC |
| Source: | https://github.com/lovemore-gakava/clintrialdata |
Returns the path to the local cache directory where downloaded clinical
trial datasets are stored. The location follows the platform-specific
user data directory convention via tools::R_user_dir().
You can delete any subdirectory here to remove a cached dataset, or clear the entire directory to free disk space.
cache_dir()
A character string with the path to the cache directory.
cache_dir()
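As a rough illustration of the platform-specific location and of removing one cached study, the sketch below calls tools::R_user_dir() directly. The arguments passed to it ("clinTrialData" and which = "data") are assumptions about how cache_dir() is implemented, and the study folder names are hypothetical. The deletion is practised on a throwaway directory so nothing in a real cache is touched.

```r
# Assumption: the package name and `which` value are guesses at how
# cache_dir() calls tools::R_user_dir(); the real arguments may differ.
assumed_cache <- tools::R_user_dir("clinTrialData", which = "data")
print(assumed_cache)

# Practise on a throwaway directory rather than the real cache.
demo_cache <- file.path(tempdir(), "clinTrialData-demo-cache")
unlink(demo_cache, recursive = TRUE)
dir.create(file.path(demo_cache, "study_a"), recursive = TRUE)
dir.create(file.path(demo_cache, "study_b"), recursive = TRUE)

# Remove a single cached study (hypothetical folder name "study_a").
unlink(file.path(demo_cache, "study_a"), recursive = TRUE)
remaining <- basename(list.dirs(demo_cache, recursive = FALSE))
print(remaining)
```

With the package installed, substituting cache_dir() for demo_cache would apply the same unlink() call to the real cache.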
The clinTrialData package contains clinical trial datasets from multiple sources, stored in Parquet format. Data is accessed using connector functions.
The CDISC Pilot 01 study data includes both ADaM and SDTM domains.
ADaM datasets include:
ADSL: Subject-Level Analysis Dataset
ADAE: Adverse Events Analysis Dataset
ADLBC: Laboratory Analysis Dataset (Chemistry)
ADLBH: Laboratory Analysis Dataset (Hematology)
ADLBHY: Laboratory Analysis Dataset (Hy's Law)
ADQSADAS: ADAS-Cog Questionnaire Analysis Dataset
ADQSCIBC: CIBC Questionnaire Analysis Dataset
ADQSNPIX: NPI-X Questionnaire Analysis Dataset
ADTTE: Time-to-Event Analysis Dataset
ADVS: Vital Signs Analysis Dataset
SDTM datasets include:
DM: Demographics
AE: Adverse Events
VS: Vital Signs
LB: Laboratory Test Results
And 18 additional domains (see list_data_sources() for details)
Data sources are discovered by scanning the package directory structure.
List available datasets with list_data_sources().
Access data using the connection function:
# Connect to any data source (e.g., CDISC Pilot data)
db <- connect_clinical_data("cdisc_pilot")
# List available datasets
db$adam$list_content_cnt()
# Read a dataset
adsl <- db$adam$read_cnt("adsl")
# See all available data sources
list_data_sources()
Datasets are stored in Parquet format:
Columnar storage
Fast reads
Compression
Cross-platform compatibility
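To make the columnar-storage point concrete, the following sketch writes a small made-up table to Parquet and reads back only two of its columns. It uses the arrow package directly rather than this package's connector functions, and the data values are invented for illustration.

```r
# Toy subject-level data; values are made up for illustration.
toy <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  AGE = c(63L, 71L, 58L),
  ARM = c("Placebo", "Treatment", "Treatment"),
  stringsAsFactors = FALSE
)

selected_names <- NULL
if (requireNamespace("arrow", quietly = TRUE)) {
  path <- tempfile(fileext = ".parquet")
  arrow::write_parquet(toy, path)

  # Columnar layout: read only the columns that are needed.
  two_cols <- arrow::read_parquet(path, col_select = c("USUBJID", "AGE"))
  selected_names <- names(two_cols)
  print(two_cols)
}
```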
CDISC Pilot 01 Study Data
Various clinical trial data sources
CDISC. Clinical Data Interchange Standards Consortium. https://www.cdisc.org/
Generic connection function that allows access to any data source in the package. Data sources are automatically discovered by scanning the package's example data directory structure.
connect_clinical_data(source = "cdisc_pilot")
source |
Character string specifying the data source.
Use |
A connectors object
if (interactive()) {
  # Connect to CDISC Pilot data
  db <- connect_clinical_data("cdisc_pilot")

  # List available datasets
  db$adam$list_content_cnt()

  # Read a dataset (requires the arrow package)
  if (requireNamespace("arrow", quietly = TRUE)) {
    adsl <- db$adam$read_cnt("adsl")
  }

  # List available sources
  list_data_sources()
}
Fetches and displays metadata for any study available in the
clinTrialData library – without downloading the full dataset. Metadata
includes the study description, available domains and datasets, subject
count, version, and data source attribution.
For studies already downloaded via download_study(), the metadata is read
from the local cache and works offline. For studies not yet downloaded, a
small JSON file (~2KB) is fetched from the GitHub Release.
dataset_info(source, repo = "Lovemore-Gakava/clinTrialData")
source |
Character string. Name of the study (e.g.
|
repo |
GitHub repository in the form |
Invisibly returns the metadata as a named list.
dataset_info("cdisc_pilot")
Downloads a study dataset from a GitHub Release and stores it in the local
cache (see cache_dir()). Once downloaded, the study is available to
connect_clinical_data() without an internet connection.
Requires the piggyback package.
download_study(
  source,
  version = "latest",
  force = FALSE,
  repo = "Lovemore-Gakava/clinTrialData"
)
source |
Character string. The name of the study to download (e.g.
|
version |
Character string. The release tag to download from. Defaults
to |
force |
Logical. If |
repo |
GitHub repository in the form |
Invisibly returns the path to the cached study directory.
if (interactive()) {
  # Download a study not bundled with the package
  download_study("cdisc_pilot_extended")

  # Then connect as usual
  db <- connect_clinical_data("cdisc_pilot_extended")
}
Returns a data frame of all clinical trial studies available as GitHub
Release assets, along with their local cache status. Studies marked as
cached = TRUE are already downloaded and available for use with
connect_clinical_data() without an internet connection.
When GitHub is unreachable, the function falls back to the last
successfully fetched listing (if available) and issues a warning.
The cached column is always recomputed from the local filesystem.
Requires the piggyback package.
list_available_studies(repo = "Lovemore-Gakava/clinTrialData")
repo |
GitHub repository in the form |
A data frame with columns:
Study name (pass this to download_study() or
connect_clinical_data())
Release tag the asset belongs to
Asset size in megabytes
TRUE if the study is already in the local cache
if (interactive()) {
  list_available_studies()
}
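The fallback behaviour described above can be pictured with a minimal base R sketch. It is not the package's implementation. The helper listing_with_fallback(), its arguments, the `source` column name, and the .rds file used to remember the last listing are all hypothetical. Only the idea is taken from the description: use the last good listing with a warning when the fetch fails, and always recompute `cached` from the filesystem.

```r
# Hypothetical helper: try a live fetch, fall back to the last saved listing,
# then recompute the `cached` column from the local filesystem.
listing_with_fallback <- function(fetch_listing, last_listing_file, cache_root) {
  listing <- tryCatch(
    {
      fresh <- fetch_listing()
      saveRDS(fresh, last_listing_file)  # remember the last good result
      fresh
    },
    error = function(e) {
      if (!file.exists(last_listing_file)) stop(e)
      warning("Remote listing unavailable; using last saved listing.",
              call. = FALSE)
      readRDS(last_listing_file)
    }
  )
  # Cache status always comes from the filesystem, never from the saved copy.
  listing$cached <- dir.exists(file.path(cache_root, listing$source))
  listing
}

# Stand-ins for a reachable and an unreachable remote.
online <- function() data.frame(source = "study_a", stringsAsFactors = FALSE)
offline <- function() stop("network unreachable")

last_file <- tempfile(fileext = ".rds")
cache_root <- tempfile("fallback-demo-cache-")
dir.create(cache_root)

first <- listing_with_fallback(online, last_file, cache_root)
dir.create(file.path(cache_root, "study_a"))  # simulate a later download
second <- suppressWarnings(listing_with_fallback(offline, last_file, cache_root))
print(second)
```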
Returns information about all clinical datasets available locally –
both datasets bundled with the package and any datasets previously
downloaded via download_study(). The location column indicates
whether a dataset is "bundled" (shipped with the package) or
"cached" (downloaded to the user cache directory).
To see datasets available for download from GitHub, use
list_available_studies().
list_data_sources()
A data frame with columns:
Dataset name (pass to connect_clinical_data())
Human-readable study description
Comma-separated list of available data domains
(e.g. "adam, sdtm")
Storage format ("parquet")
Either "bundled" or "cached"
list_data_sources()
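Because the result is an ordinary data frame, the `location` column can be used to separate bundled datasets from cached ones. The sketch below works on a mock data frame so that it runs without the package. Only the `location` column name and its two values come from the description above. The `source` column name is an assumption, and the rows are illustrative.

```r
# Mock of the kind of data frame list_data_sources() is documented to return.
# `source` is an assumed column name; `location` is documented above.
sources <- data.frame(
  source = c("cdisc_pilot", "cdisc_pilot_extended"),
  location = c("bundled", "cached"),
  stringsAsFactors = FALSE
)

# Keep only datasets that live in the user cache directory.
cached_only <- sources[sources$location == "cached", , drop = FALSE]
print(cached_only)
```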
S3 method for remove_cnt that checks if the study folder is locked before allowing remove operations.
## S3 method for class 'ConnectorLockedFS'
remove_cnt(connector_object, name, ...)
connector_object: The ConnectorLockedFS object
name: The file name to remove
...: Additional arguments passed to the underlying connector
Invisibly returns connector_object.
S3 method for write_cnt that checks if the study folder is locked before allowing write operations.
## S3 method for class 'ConnectorLockedFS'
write_cnt(connector_object, x, name, overwrite = FALSE, ...)
connector_object: The ConnectorLockedFS object
x: The data to write
name: The file name
overwrite: Whether to overwrite existing files
...: Additional arguments passed to the underlying connector
Invisibly returns connector_object.
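The lock-checking methods above follow a common S3 pattern: a class-specific method checks a precondition, then hands over to the next method. The sketch below shows that pattern with hypothetical names (remove_item(), LockedStore, and a `locked` field) chosen so they do not clash with the connector package. How ConnectorLockedFS actually records a lock is not stated here, so the `locked` field is an assumption.

```r
# Hypothetical generic and class names; not part of 'connector'.
remove_item <- function(store, name, ...) UseMethod("remove_item")

# Default behaviour: delete the file and return the store invisibly.
remove_item.default <- function(store, name, ...) {
  unlink(file.path(store$path, name))
  invisible(store)
}

# Locked class: refuse when the lock is set, otherwise delegate.
remove_item.LockedStore <- function(store, name, ...) {
  if (isTRUE(store$locked)) {
    stop("Study folder is locked; cannot remove '", name, "'.", call. = FALSE)
  }
  NextMethod()
}

demo_path <- tempfile("locked-store-")
dir.create(demo_path)
target <- file.path(demo_path, "adsl.parquet")
file.create(target)

# A locked store rejects the removal and leaves the file in place.
locked <- structure(list(path = demo_path, locked = TRUE), class = "LockedStore")
blocked <- tryCatch(
  {
    remove_item(locked, "adsl.parquet")
    FALSE
  },
  error = function(e) TRUE
)
still_there <- file.exists(target)

# An unlocked store falls through to the default method.
unlocked <- structure(list(path = demo_path, locked = FALSE), class = "LockedStore")
remove_item(unlocked, "adsl.parquet")
removed <- !file.exists(target)
```

A write method would follow the same shape, with the lock check placed before any overwrite handling.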