Sparse Data Rule (sparse_data)

The sparse data rule calculates mean concentrations for sparse sampling designs. The user specifies the grouping variables, the time variable, and the concentration variable. The rule then calculates the mean and standard deviation of the concentration at each time point within each group. The summarized data can then be used for PK analysis. Note that this rule must be the last rule in a data cleaning pipeline.

| Field | Description | Required |
| --- | --- | --- |
| groupingColumns | Array of columns used to group the data for summarization | ✅ Yes |
| timeColumn | Column that contains the time variable | ✅ Yes |
| concentrationColumn | Column that contains the concentration data for summarization. Should contain only numeric values. | ✅ Yes |
| groupIdColumnName | New column in the output that will contain group ID numbers | ✅ Yes |
| carryAlongColumns | Columns whose data will be carried into the output file. Only the first value in each profile is used. | ❌ No |
| uniqueId | Column that contains unique subject identifiers | ✅ Yes |
| includedIdsColumnName | New column in the output that will contain the unique subject identifiers used in the summarization | ✅ Yes |
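
As a quick illustration of the required and optional fields listed above, the following minimal Python sketch checks a rule definition for missing required fields. The helper name `missing_required_fields` is hypothetical and is not part of the tool. How the tool itself validates a rule is not described here, so treat this only as a pre-flight check you might run yourself.

```python
# Required fields as listed in the table above; carryAlongColumns is optional.
REQUIRED_FIELDS = [
    "groupingColumns",
    "timeColumn",
    "concentrationColumn",
    "groupIdColumnName",
    "uniqueId",
    "includedIdsColumnName",
]


def missing_required_fields(config: dict) -> list:
    """Return the required sparse_data fields that are absent from config.

    Hypothetical helper for illustration only.
    """
    return [name for name in REQUIRED_FIELDS if name not in config]


# A rule definition that omits uniqueId and the optional carryAlongColumns.
incomplete_rule = {
    "type": "sparse_data",
    "groupingColumns": ["sex", "dose"],
    "timeColumn": "time",
    "concentrationColumn": "conc",
    "groupIdColumnName": "group_id",
    "includedIdsColumnName": "subjects_included",
}

print(missing_required_fields(incomplete_rule))
```

Only `uniqueId` is reported here, because `carryAlongColumns` is optional and may be left out.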

Example:

```json
{
  "description": "Sparse Data Description",
  "version": "3.0.0",
  "groupingColumns": [
    "sex",
    "dose"
  ],
  "timeColumn": "time",
  "concentrationColumn": "conc",
  "groupIdColumnName": "group_id",
  "carryAlongColumns": [
    "dose"
  ],
  "uniqueId": "subject",
  "includedIdsColumnName": "subjects_included",
  "type": "sparse_data"
}
```

Behavior: Concentration data in conc will be summarized by dose and sex at each timepoint in time. The new group identifiers will be in group_id, and included subjects for each summary time point will be in subjects_included.
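
The summarization described above can be sketched in plain Python to show roughly what the output contains. This is an illustration under stated assumptions, not the rule's actual implementation. The function name `summarize_sparse`, the output column names `mean` and `sd`, the sequential numbering of group IDs, the use of the sample standard deviation, and taking the carried-along value from the first row of each group and time point are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_sparse(rows, config):
    """Summarize concentrations per group and time point (illustrative sketch)."""
    group_cols = config["groupingColumns"]
    time_col = config["timeColumn"]
    conc_col = config["concentrationColumn"]
    id_col = config["uniqueId"]
    carry_cols = config.get("carryAlongColumns", [])

    # Bucket rows by (grouping values, time); dicts keep first-seen order.
    buckets = defaultdict(list)
    for row in rows:
        key = (tuple(row[c] for c in group_cols), row[time_col])
        buckets[key].append(row)

    group_ids = {}  # grouping values -> sequential ID (assumed numbering)
    summary = []
    for (group_key, time), members in buckets.items():
        group_id = group_ids.setdefault(group_key, len(group_ids) + 1)
        concs = [float(r[conc_col]) for r in members]
        out = {
            config["groupIdColumnName"]: group_id,
            time_col: time,
            "mean": mean(concs),
            # Sample SD is undefined for a single observation.
            "sd": stdev(concs) if len(concs) > 1 else None,
            config["includedIdsColumnName"]: [r[id_col] for r in members],
        }
        # Assumption: carry the value from the first row in the bucket.
        for c in carry_cols:
            out[c] = members[0][c]
        summary.append(out)
    return summary


config = {
    "groupingColumns": ["sex", "dose"],
    "timeColumn": "time",
    "concentrationColumn": "conc",
    "groupIdColumnName": "group_id",
    "carryAlongColumns": ["dose"],
    "uniqueId": "subject",
    "includedIdsColumnName": "subjects_included",
}

# Made-up rows purely for demonstration.
rows = [
    {"subject": "S1", "sex": "F", "dose": 10, "time": 1, "conc": 2.0},
    {"subject": "S2", "sex": "F", "dose": 10, "time": 1, "conc": 4.0},
    {"subject": "S3", "sex": "M", "dose": 10, "time": 1, "conc": 6.0},
]

for record in summarize_sparse(rows, config):
    print(record)
```

Each output record corresponds to one group at one time point. Subjects S1 and S2 fall into the same group and are averaged together, while S3 forms its own group with a single observation and therefore has no standard deviation.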