Paired Subsampling to enable inference on the generalization error.
One should not directly call $aggregate()
with a non-CI measure on a resample result using paired subsampling,
as most of the resampling iterations are only intended to estimate the standard error rather than the point estimate.
Details
The first repeats_in
iterations are a standard ResamplingSubsampling
and should be used to obtain a point estimate of the generalization error.
The remaining iterations should be used to estimate the standard error.
Here, the data is divided repeats_out
times into two equally sized disjoint subsets, and a subsampling with repeats_in
repetitions is applied to each of them.
See the $unflatten(iter)
method to map the iterations to this nested structure.
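As a quick sanity check of the structure described above, the total number of iterations can be worked out by hand. The sketch below uses only base R; the variable names n_point, n_se and n_total are illustrative and not part of the package. It assumes the parameter values repeats_in = 15 and repeats_out = 10.

```r
# Illustrative arithmetic only; these variable names are not part of the package.
repeats_in = 15L
repeats_out = 10L

# Standard subsampling iterations used for the point estimate.
n_point = repeats_in

# Each outer repetition splits the data into two halves and runs
# repeats_in subsampling iterations on each half.
n_se = repeats_out * 2L * repeats_in

n_total = n_point + n_se
n_total
#> [1] 315
```

With these values, only 15 of the 315 iterations contribute to the point estimate, which is why aggregating over all iterations with a non-CI measure is discouraged.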
Parameters
repeats_in :: integer(1)
The inner repetitions.
repeats_out :: integer(1)
The outer repetitions.
ratio :: numeric(1)
The proportion of data to use for training.
References
Nadeau, Claude, Bengio, Yoshua (1999). “Inference for the generalization error.” Advances in neural information processing systems, 12.
Super class
mlr3::Resampling
-> ResamplingPairedSubsampling
Methods
Method unflatten()
Unflatten the resampling iteration into a more informative representation:
inner: The subsampling iteration.
outer: NA for the first repeats_in iterations. Otherwise it indicates the outer iteration of the paired subsamplings.
partition: NA for the first repeats_in iterations. Otherwise it indicates whether the subsampling is applied to the first or second partition of the two disjoint halves.
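To make the nested structure concrete, here is a minimal base R sketch of how a flat iteration index could be mapped to the three fields. The function unflatten_sketch() is hypothetical and is not the package implementation. In particular, the ordering of the paired iterations (outer repetition first, then partition, then inner iteration) is an assumption made for illustration; use the $unflatten(iter) method for the actual mapping.

```r
# Hypothetical helper for illustration; NOT the package's $unflatten() method.
# Assumed ordering of the paired part: outer, then partition, then inner.
unflatten_sketch = function(iter, repeats_in = 15L, repeats_out = 10L) {
  total = repeats_in + 2L * repeats_out * repeats_in
  stopifnot(iter >= 1L, iter <= total)

  # The first repeats_in iterations are the standard subsampling.
  if (iter <= repeats_in) {
    return(list(inner = iter, outer = NA_integer_, partition = NA_integer_))
  }

  # Zero-based position within the paired subsampling iterations.
  k = iter - repeats_in - 1L
  list(
    inner = k %% repeats_in + 1L,
    outer = k %/% (2L * repeats_in) + 1L,
    partition = (k %/% repeats_in) %% 2L + 1L
  )
}

unflatten_sketch(1L)   # point-estimate part: outer and partition are NA
unflatten_sketch(31L)  # under the assumed ordering: inner 1, outer 1, partition 2
```

Under this assumed layout, each outer repetition occupies 2 * repeats_in consecutive iterations, with the first repeats_in belonging to the first half of the data and the rest to the second half.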
Examples
pw_subs = rsmp("paired_subsampling")
pw_subs
#> <ResamplingPairedSubsampling>: Paired Subsampling
#> * Iterations: 315
#> * Instantiated: FALSE
#> * Parameters: repeats_in=15, repeats_out=10, ratio=0.9