A wrapper around a data.table that holds the data together with metadata about the particular machine learning problem, including which variables are to be used as covariates and outcomes.



R6Class object.



Passes all arguments to the constructor. See documentation for Constructor below.


sl3_Task object


make_sl3_Task(data, covariates, outcome = NULL, outcome_type = NULL, outcome_levels = NULL, id = NULL, weights = NULL, offset = NULL, nodes = NULL, column_names = NULL, row_index = NULL, folds = NULL)


A data.frame or data.table containing the underlying data


A character vector of variable names that define the set of covariates


A character vector of variable names that define the set of outcomes. Usually just one variable, although some learners support multivariate outcomes. Use sl3_list_learners("multivariate_outcome") to find such learners.


A Variable_type object that defines the variable type of the outcome. Alternatively, a character specifying such a type. See variable_type for details on defining variable types.


A vector of levels expected for the outcome variable. If outcome_type is a character, this will be used to construct an appropriate variable_type object.


A character indicating which variable (if any) to be used as an observation "id", for learners that support clustered observations. Use sl3_list_learners("id") to find such learners.


A character indicating which variable (if any) to be used as observation weights, for learners that support weights. Use sl3_list_learners("weights") to find such learners.


A character indicating which variable (if any) to be used as an observation offset, for learners that support offsets. Use sl3_list_learners("offset") to find such learners.


A list of character vectors as nodes. This will override the covariates, outcome, id, weights, and offset arguments if specified, and is an alternative way to specify those arguments.


A named list of characters that maps between column names in data and how those variables are referenced in sl3_Task functions.
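As a minimal sketch of how these constructor arguments fit together, the following builds a task from a small simulated data.frame. It assumes the sl3 package is installed; the data set dat and its columns W1, W2, Y, and wt are hypothetical names made up for illustration.

```r
library(sl3)

# Hypothetical simulated data: two covariates, a continuous outcome,
# and a column of observation weights
set.seed(1)
n <- 100
dat <- data.frame(
  W1 = rnorm(n),
  W2 = rbinom(n, 1, 0.5),
  wt = runif(n, 0.5, 1.5)
)
dat$Y <- dat$W1 + dat$W2 + rnorm(n)

# Name the role of each column; the outcome type is left for sl3 to infer
task <- make_sl3_Task(
  data = dat,
  covariates = c("W1", "W2"),
  outcome = "Y",
  weights = "wt"
)

task$nrow  # number of observations in the task
task$nodes # list recording which columns play which role
```

The same roles could instead be supplied through the nodes argument, which overrides the individual arguments when given.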


add_interactions(interactions, warn_on_existing = TRUE)

Adds interaction terms to the task and returns a task with the interaction terms added to the covariate list.

  • interactions: A list of lists, where each sublist describes one interaction term, listing the variables that comprise it

  • warn_on_existing: If TRUE, produce a warning if there is already a column with a name matching this interaction term
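The call can be sketched as follows, assuming the sl3 package is installed. The data and the column names W1, W2, and Y are hypothetical, and each interaction is written here as a character vector of variable names, which is an assumption about an accepted input form rather than the only one.

```r
library(sl3)

# Hypothetical simulated data with two numeric covariates
set.seed(1)
n <- 100
dat <- data.frame(W1 = rnorm(n), W2 = rbinom(n, 1, 0.5))
dat$Y <- dat$W1 * dat$W2 + rnorm(n)

task <- make_sl3_Task(data = dat, covariates = c("W1", "W2"), outcome = "Y")

# One interaction term made up of W1 and W2
interactions <- list(c("W1", "W2"))
task_int <- task$add_interactions(interactions)

# The returned task lists the interaction among its covariates;
# the original task is left unchanged
task_int$nodes$covariates
task$nodes$covariates
```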

add_columns(fit_uuid, new_data, global_cols=FALSE)

Add columns to internal data, returning an updated vector of column_names

  • fit_uuid: A uuid character that is used to generate unique internal column names. This prevents two added columns with the same name overwriting each other, provided they have different fit_uuid.

  • new_data: A data.table containing the columns to add

  • global_cols: If TRUE, don't use the fit_uuid to make unique column names

next_in_chain(covariates=NULL, outcome=NULL, id=NULL, weights=NULL, offset=NULL, column_names=NULL, new_nodes=NULL, ...)

Used by learner$chain methods to generate a task with the same underlying data, but redefined nodes. Most of the parameter values are passed to the sl3_Task constructor, documented above.

  • covariates: An updated covariates character vector

  • outcome: An updated outcome character vector

  • id: An updated id character value

  • weights: An updated weights character value

  • offset: An updated offset character value

  • column_names: An updated column_names character vector

  • new_nodes: An updated list of node names

  • ...: Other arguments passed to the sl3_Task constructor for the new task
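Although next_in_chain is mostly called by learners, it can also be called directly. The sketch below, which assumes the sl3 package is installed and uses hypothetical data and column names (W1, W2, Y), redefines the covariate node while keeping the same underlying data.

```r
library(sl3)

# Hypothetical simulated data
set.seed(1)
n <- 100
dat <- data.frame(W1 = rnorm(n), W2 = rnorm(n))
dat$Y <- dat$W1 + rnorm(n)

task <- make_sl3_Task(data = dat, covariates = c("W1", "W2"), outcome = "Y")

# New task over the same data, restricted to a single covariate;
# nodes that are not passed keep their definitions from the original task
task_w1 <- task$next_in_chain(covariates = "W1")

task_w1$nodes$covariates
task_w1$nodes$outcome
```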


Returns a task with rows subsetted using the row_index index vector

  • row_index: An index vector defining the subset

get_data(rows, columns)

Returns a data.table containing a subset of task data.

  • rows: An index vector defining the rows to return

  • columns: A character vector of columns to return.
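For illustration, a subset of rows and columns can be pulled out as sketched below. This assumes the sl3 package is installed; the data and the column names W1, W2, and Y are hypothetical.

```r
library(sl3)

# Hypothetical simulated data
set.seed(1)
n <- 100
dat <- data.frame(W1 = rnorm(n), W2 = rnorm(n))
dat$Y <- dat$W1 + rnorm(n)

task <- make_sl3_Task(data = dat, covariates = c("W1", "W2"), outcome = "Y")

# First five rows of one covariate and the outcome, as a data.table
sub <- task$get_data(rows = 1:5, columns = c("W1", "Y"))
sub
```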


Returns TRUE if the node is defined in the task

  • node_name: The name of the node to look for

get_node(node_name, generator_fun=NULL)

Returns a data.table with the requested node's data

  • node_name: The name of the node to look for

  • generator_fun: A function(node_name, n) that can generate the node if it was not specified in the task.



Internal representation of the data


Formatted task data


Number of observations


A list of node variables


A data.table containing the covariates


A data.table containing the covariates and an intercept term


A vector containing the outcomes


A vector containing the offset. Throws an error if the offset was not specified on construction


A vector containing the observation weights. If weights aren't specified on construction, weights will default to 1


A vector containing the observation units. If the ids aren't specified on construction, id will return seq_len(nrow)


An origami fold object, as generated by make_folds, specifying a cross-validation scheme


A unique identifier of this task


The named list mapping variable names to internal column names


A variable_type object specifying the type of the outcome
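The fields described above can be read directly from a task object. The sketch below assumes the sl3 package is installed and uses hypothetical data and column names (W1, W2, Y); because no weights or id are given on construction, it relies on the defaults described above.

```r
library(sl3)

# Hypothetical simulated data; no weights or id columns are specified
set.seed(1)
n <- 100
dat <- data.frame(W1 = rnorm(n), W2 = rnorm(n))
dat$Y <- dat$W1 + rnorm(n)

task <- make_sl3_Task(data = dat, covariates = c("W1", "W2"), outcome = "Y")

dim(task$X)        # covariates as a data.table
length(task$Y)     # outcome vector
head(task$weights) # defaults to 1 when weights are not specified
head(task$id)      # defaults to a row sequence when id is not specified
```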