#' Select the best Box-Cox lambda parameter for a variable
#'
#' @description Searches a specified grid of lambda values for the Box-Cox
#'   transformation that makes the variable most similar to a normal
#'   distribution. This function can also produce supplemental reports:
#'   * Suggested Winsorization: a report of outlier observations and suggested Winsorized values
#'   * Influential subjects: a report of subjects that have extreme variance among repeated observations
#'
#' @section Lambda search:
#'
#'   The best lambda value is the one whose transformation minimizes the sum
#'   of squared errors (SSE) between the actual 1st to 99th percentiles of the
#'   transformed variable and the corresponding expected percentiles of a
#'   normal distribution. Using the 1st to 99th percentiles excludes extreme
#'   values and makes the selection of lambda less susceptible to outliers.
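#'
#'   The search can be sketched as follows. This is an illustration only, not
#'   the package's internal code: it ignores covariates and weights, assumes
#'   the grid is `lambda.start + lambda.increment * (0:(num.lambdas - 1))`, and
#'   assumes the transformed values are standardized before being compared to
#'   standard normal percentiles. The helpers `box_cox()` and
#'   `percentile_sse()` are hypothetical names.
#'
#'   ```r
#'   # Hypothetical helper: Box-Cox transform of a positive vector
#'   box_cox <- function(x, lambda) {
#'     if (lambda == 0) log(x) else (x^lambda - 1) / lambda
#'   }
#'
#'   # Hypothetical helper: SSE between the 1st-99th percentiles of the
#'   # standardized transformed values and standard normal percentiles
#'   percentile_sse <- function(x, lambda) {
#'     probs <- seq(0.01, 0.99, by = 0.01)
#'     z <- as.numeric(scale(box_cox(x, lambda)))
#'     sum((quantile(z, probs, names = FALSE) - qnorm(probs))^2)
#'   }
#'
#'   # Default grid: 101 values from 0 to 1 in steps of 0.01
#'   lambda.grid <- 0 + 0.01 * (0:100)
#'
#'   set.seed(1)
#'   x <- rlnorm(500)  # simulated right-skewed data
#'   sse <- vapply(lambda.grid, function(l) percentile_sse(x, l), numeric(1))
#'   best.lambda <- lambda.grid[which.min(sse)]
#'   ```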
#'
#' @section Suggested Winsorization:
#'
#'   Outlier detection is done on the Box-Cox transformed scale using the
#'   selected lambda value to ensure that the data is as close to normal as
#'   possible. Outliers are observations that lie more than a specified
#'   multiple (default: 3) of the interquartile range below the 25th
#'   percentile or above the 75th percentile.
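#'
#'   A minimal unweighted sketch of the outlier rule is given here. The lambda
#'   value and data are made up, and the Winsorization rule shown (clamp to
#'   the nearest fence on the transformed scale, then back-transform) is an
#'   assumption rather than a description of the package's internal code.
#'
#'   ```r
#'   lambda <- 0.25        # illustrative lambda, not a fitted value
#'   iqr.multiple <- 3
#'   x <- c(1.2, 0.8, 2.5, 1.7, 0.9, 3.1, 1.4, 2.0, 150)  # 150 is extreme
#'
#'   y <- (x^lambda - 1) / lambda  # Box-Cox transform (lambda != 0)
#'   q <- quantile(y, c(0.25, 0.75), names = FALSE)
#'   lower <- q[1] - iqr.multiple * (q[2] - q[1])
#'   upper <- q[2] + iqr.multiple * (q[2] - q[1])
#'
#'   is.outlier <- y < lower | y > upper
#'
#'   # Assumed rule: clamp to the nearest fence, then back-transform
#'   y.clamped <- pmin(pmax(y, lower), upper)
#'   x.winsorized <- (lambda * y.clamped + 1)^(1 / lambda)
#'   ```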
#'
#' @section Influential subjects:
#'
#'   Detection of influential subjects is done on the Box-Cox transformed scale
#'   using the selected lambda value to ensure that the data is as close to
#'   normal as possible. Influential subjects are found by performing an F-test
#'   of the variance of each subject's observations against the mean of the
#'   variances of the other subjects' observations. When each subject has 2
#'   observations, the default alpha for the F-test corresponds to flagging
#'   subjects whose difference between observations lies more than 3 times the
#'   interquartile range of the differences below the 25th percentile or above
#'   the 75th percentile of the differences. Multiple testing correction
#'   (e.g., Bonferroni, Benjamini-Hochberg) is also available.
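#'
#'   One way to picture the test is sketched here, on simulated data that is
#'   already on the transformed scale. The degrees of freedom, the
#'   leave-one-out mean, and the 0.05 threshold are assumptions made for
#'   illustration; weights are ignored and the package's internal code may
#'   differ.
#'
#'   ```r
#'   set.seed(1)
#'   # 50 hypothetical subjects with 2 transformed observations each
#'   y <- matrix(rnorm(100), ncol = 2)
#'   y[1, ] <- c(-6, 6)  # subject 1 made deliberately erratic
#'
#'   n <- nrow(y)
#'   k <- ncol(y)
#'   s2 <- apply(y, 1, var)                 # within-subject variances
#'   s2.others <- (sum(s2) - s2) / (n - 1)  # mean variance of the others
#'
#'   f.stat <- s2 / s2.others
#'   p <- pf(f.stat, df1 = k - 1, df2 = (n - 1) * (k - 1), lower.tail = FALSE)
#'
#'   # Optional multiple testing correction, as in stats::p.adjust()
#'   p.adj <- p.adjust(p, method = "bonferroni")
#'   influential <- which(p.adj < 0.05)
#'   ```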
#'
#'
#' @param input.data A data frame.
#' @param row.subset Logical vector of length `nrow(input.data)` indicating
#'   which rows of `input.data` to use for selecting the best lambda.
#' @param variable Variable to transform.
#' @param id Variable that identifies each subject. Required only for
#'   supplemental reports.
#' @param repeat.obs Variable that distinguishes repeat observations for each
#'   subject. Required only for supplemental reports.
#' @param lambda.start Minimum lambda value in the search grid. (default = `0`)
#' @param lambda.increment Spacing between lambda values in the search grid.
#'   (default = `0.01`)
#' @param num.lambdas Number of lambda values in the search grid. (default =
#'   `101`)
#' @param covariates Vector of covariates used to select the best lambda.
#' @param weight Variable with weighting for each subject.
#' @param do.winsorization Generate suggested Winsorization report? (default =
#'   `FALSE`)
#' @param print.winsorization Print suggested Winsorization report to the
#'   console? (default = `TRUE`)
#' @param is.episodic Is the variable episodic? Episodic variables have a
#'   substantial number of zero observations due to not being continuously
#'   observed. Required only for suggested Winsorization report. (default =
#'   `FALSE`)
#' @param do.influential Generate influential subject report? (default =
#'   `FALSE`)
#' @param print.influential Print influential subject report to the console?
#'   (default = `TRUE`)
#' @param iqr.multiple Multiple of the interquartile range of the Box-Cox
#'   transformed variable. This sets the distance away that an observation must
#'   be from the 25th or 75th percentiles to be considered an outlier. Has no
#'   effect if the suggested Winsorization report is not generated. (default =
#'   `3`)
#' @param influential.alpha The F-test p-value threshold that a subject must be
#'   under to be considered influential. Has no effect if the influential
#'   subject report is not generated. See the "Influential subjects" section
#'   for details. (default = `0.000002342729`)
#' @param multiple.test The type of multiple testing correction used to adjust
#'   `influential.alpha`. The options are the same as the `method` options of
#'   [stats::p.adjust()]. Has no effect if the influential subject report is not
#'   generated. (default = `"none"`)
#'
#' @returns A data frame with the following columns:
#' * variable: Name of the variable that was transformed.
#' * tran_lambda: The value of lambda for the Box-Cox transformation most resembling a normal distribution.
#'
#'   The following attribute is present if `do.winsorization` is `TRUE`:
#'
#' * winsorization.report: A data frame of outlier observations:
#'   * `id`: The unique identifier for each subject.
#'   * `repeat.obs`: Distinguishes repeated observations for the same subject.
#'   * `variable`: The value of the outlier on the original scale.
#'   * `variable`.winsorized: The suggested value to Winsorize the outlier value to.
#'
#'   The following attribute is present if `do.influential` is `TRUE`:
#'
#' * influential.subject.report: A data frame of subjects with influential within-subject variances:
#'   * `id`: The unique identifier for each subject.
#'   * p: The p-value of the F-test that identified the subject's variance as influential.
#'   * `variable`.1 - `variable`.k: One column for each of the k unique values of `repeat.obs` containing values of `variable` for each observation.
#'
#' @export
#'
#' @examples
#' #subset NHANES data
#' nhanes.subset <- nhcvd[nhcvd$SDMVSTRA %in% c(48, 60, 72),]
#'
#' #daily variable
#' boxcox.sodium <- boxcox_survey(input.data=nhanes.subset,
#'                                row.subset=(nhanes.subset$DAY == 1),
#'                                variable="TSODI",
#'                                id="SEQN",
#'                                repeat.obs="DAY",
#'                                weight="WTDRD1",
#'                                do.winsorization=TRUE,
#'                                iqr.multiple=2,
#'                                do.influential=TRUE,
#'                                influential.alpha=0.005)
#' boxcox.sodium
#'
#' #episodic variable
#' boxcox.g.whole <- boxcox_survey(input.data=nhanes.subset,
#'                                 row.subset=(nhanes.subset$DAY == 1),
#'                                 variable="G_WHOLE",
#'                                 is.episodic=TRUE,
#'                                 id="SEQN",
#'                                 repeat.obs="DAY",
#'                                 weight="WTDRD1",
#'                                 do.winsorization=TRUE,
#'                                 iqr.multiple=2,
#'                                 do.influential=TRUE,
#'                                 influential.alpha=0.005)
#' boxcox.g.whole
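#'
#' #the requested reports are stored as attributes of the returned data frame
#' attr(boxcox.g.whole, "winsorization.report")
#' attr(boxcox.g.whole, "influential.subject.report")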
boxcox_survey <- function(input.data,
                          row.subset=NULL,
                          variable,
                          id,
                          repeat.obs,
                          lambda.start=0,
                          lambda.increment=0.01,
                          num.lambdas=101,
                          covariates=NULL,
                          weight=NULL,
                          do.winsorization=FALSE,
                          print.winsorization=TRUE,
                          is.episodic=FALSE,
                          do.influential=FALSE,
                          print.influential=TRUE,
                          iqr.multiple=3,
                          influential.alpha=0.000002342729,
                          multiple.test="none") {

  #1. Find best Box-Cox lambda parameter
  selected.lambda <- find_best_lambda(input.data=input.data,
                                      row.subset=row.subset,
                                      variable=variable,
                                      covariates=covariates,
                                      weight=weight,
                                      lambda.start=lambda.start,
                                      lambda.increment=lambda.increment,
                                      num.lambdas=num.lambdas)

  boxcox.lambda.data <- data.frame(variable=variable, tran_lambda=selected.lambda)

  #2. Create report of suggested Winsorized values for outliers
  if(do.winsorization) {

    winsorization.report <- find_suggested_winsorization(input.data=input.data,
                                                         row.subset=row.subset,
                                                         lambda=selected.lambda,
                                                         variable=variable,
                                                         is.episodic=is.episodic,
                                                         covariates=covariates,
                                                         weight=weight,
                                                         id=id,
                                                         repeat.obs=repeat.obs,
                                                         iqr.multiple=iqr.multiple,
                                                         print.report=print.winsorization)

    attr(boxcox.lambda.data, "winsorization.report") <- winsorization.report
  }

  #3. Create report of subjects with influential within-subject variances
  if(do.influential) {

    influential.subject.report <- find_influential_subjects(input.data=input.data,
                                                            row.subset=row.subset,
                                                            lambda=selected.lambda,
                                                            variable=variable,
                                                            weight=weight,
                                                            id=id,
                                                            repeat.obs=repeat.obs,
                                                            alpha=influential.alpha,
                                                            multiple.test=multiple.test,
                                                            print.report=print.influential)

    attr(boxcox.lambda.data, "influential.subject.report") <- influential.subject.report
  }

  return(boxcox.lambda.data)
}
