Paired Subsampling to enable inference on the generalization error.
Details
The first repeats_in iterations are a standard ResamplingSubsampling
and should be used to obtain a point estimate of the generalization error.
The remaining iterations should be used to estimate the standard error.
Here, the data is divided repeats_out times into two equally sized disjoint subsets,
to each of which a subsampling with repeats_in repetitions is applied.
See the $unflatten(iter) method to map the iterations to this nested structure.
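The nested structure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name paired_subsampling_sketch() is hypothetical, and the ordering of the paired iterations (outer repetition, then partition, then inner repetition) is assumed.

```r
# Illustrative sketch in base R; NOT the mlr3inferr implementation.
# paired_subsampling_sketch() is a hypothetical helper name.
paired_subsampling_sketch = function(n, repeats_in = 15L, repeats_out = 10L, ratio = 0.9) {
  # Draw one subsampling split (train/test) from the given row ids.
  subsample = function(ids) {
    n_train = round(length(ids) * ratio)
    train = ids[sample.int(length(ids), n_train)]
    list(train = train, test = setdiff(ids, train))
  }
  all_ids = seq_len(n)

  # Part 1: standard subsampling on the full data (for the point estimate).
  standard = replicate(repeats_in, subsample(all_ids), simplify = FALSE)

  # Part 2: repeats_out times, split the data into two disjoint halves and
  # subsample repeats_in times within each half (for the standard error).
  # If n is odd, one observation is left out of both halves in this sketch.
  paired = list()
  half = floor(n / 2)
  for (outer in seq_len(repeats_out)) {
    shuffled = sample(all_ids)
    halves = list(shuffled[seq_len(half)], shuffled[half + seq_len(half)])
    for (partition in 1:2) {
      for (inner in seq_len(repeats_in)) {
        paired[[length(paired) + 1L]] = subsample(halves[[partition]])
      }
    }
  }
  c(standard, paired)
}

set.seed(1)
splits = paired_subsampling_sketch(n = 200)
length(splits)  # repeats_in + 2 * repeats_out * repeats_in = 15 + 2 * 10 * 15 = 315
```

With the default parameters this gives 315 iterations in total, which matches the iteration count printed in the Examples section.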
Point Estimation
When calling $aggregate() on a resample result obtained using this resampling method, only the first repeats_in iterations (the standard subsampling part) will be used.
See section "Point Estimation" of MeasureCiConZ.
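As a rough illustration of the point estimate, the following base R sketch aggregates only the scores of the standard subsampling iterations. The scores are simulated purely for illustration, and the mean is assumed as the aggregator; the actual aggregation depends on the measure used.

```r
# Hypothetical per-iteration scores, simulated only for illustration.
repeats_in = 15L
repeats_out = 10L
set.seed(42)
scores = runif(repeats_in + 2L * repeats_out * repeats_in)

# Point estimate: aggregate only the standard subsampling iterations,
# i.e. the first repeats_in scores (mean is an ASSUMED aggregator).
point_estimate = mean(scores[seq_len(repeats_in)])
point_estimate
```

The remaining 2 * repeats_out * repeats_in scores are not part of the point estimate; they serve the standard error estimation described in the Details section.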
Parameters
repeats_in :: integer(1)
The inner repetitions.
repeats_out :: integer(1)
The outer repetitions.
ratio :: numeric(1)
The proportion of data to use for training.
References
Nadeau, Claude, Bengio, Yoshua (1999). “Inference for the generalization error.” Advances in neural information processing systems, 12.
Super class
mlr3::Resampling -> ResamplingPairedSubsampling
Methods
Method unflatten()
Unflatten the resampling iteration into a more informative representation:
inner: The subsampling iteration.
outer: NA for the first repeats_in iterations. Otherwise it indicates the outer iteration of the paired subsamplings.
partition: NA for the first repeats_in iterations. Otherwise it indicates whether the subsampling is applied to the first or second partition of the two disjoint halves.
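The mapping from a flat iteration number to these three fields might look like the base R sketch below. The function unflatten_sketch() is hypothetical and is not the package's method; in particular, the ordering of the paired iterations (outer repetition, then partition, then inner repetition) and the list return type are assumptions made for illustration.

```r
# Hypothetical re-implementation for illustration only. The ordering of the
# paired iterations (outer, then partition, then inner) is an ASSUMPTION.
unflatten_sketch = function(iter, repeats_in = 15L, repeats_out = 10L) {
  total = repeats_in + 2L * repeats_out * repeats_in
  stopifnot(iter >= 1L, iter <= total)
  # The first repeats_in iterations belong to the standard subsampling.
  if (iter <= repeats_in) {
    return(list(inner = as.integer(iter), outer = NA_integer_, partition = NA_integer_))
  }
  # Zero-based index within the paired part.
  k = as.integer(iter) - repeats_in - 1L
  list(
    inner = k %% repeats_in + 1L,
    outer = k %/% (2L * repeats_in) + 1L,
    partition = (k %/% repeats_in) %% 2L + 1L
  )
}

unflatten_sketch(3)    # standard subsampling: outer and partition are NA
unflatten_sketch(16)   # first paired iteration: inner 1, outer 1, partition 1
unflatten_sketch(315)  # last paired iteration: inner 15, outer 10, partition 2
```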
Examples
pw_subs = rsmp("paired_subsampling")
pw_subs
#>
#> ── <ResamplingPairedSubsampling> : Paired Subsampling ──────────────────────────
#> • Iterations: 315
#> • Instantiated: FALSE
#> • Parameters: repeats_in=15, repeats_out=10, ratio=0.9