Implementation
A standard Active Learning (AL) methodology is applied throughout, encompassing four steps: training the
U-Net model on the initial annotated data (L); computing model uncertainty or representativeness scores
on the unlabelled pool of data (U); selecting the top-ranked batch of K images, obtaining their labels from
the oracle, adding them to L, and removing them from U; and finally retraining the model on the updated L.
These steps are repeated until the chosen number of AL iterations is reached, as sketched below.
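The loop below is a minimal sketch of this four-step procedure; train_unet, score_image, and annotate are hypothetical helpers standing in for the actual training, scoring, and oracle-labelling routines, not the implementation used in this work.

```python
import numpy as np

def active_learning_loop(L, U, n_iterations, K):
    """L: labelled set, U: unlabelled pool, K: query batch size."""
    model = train_unet(L)                                      # step 1: train on initial L
    for _ in range(n_iterations):
        scores = np.array([score_image(model, x) for x in U])  # step 2: score the pool
        top_k = set(np.argsort(scores)[-K:])                   # step 3: top-ranked batch
        L.extend(annotate([U[i] for i in top_k]))              # oracle labels, added to L
        U = [x for i, x in enumerate(U) if i not in top_k]     # queried images leave U
        model = train_unet(L, initial_model=model)             # step 4: retrain on updated L
    return model
```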
Random sampling and a variety of selective sampling approaches are used to
select the next batch of images from the unlabelled pool.
The most common AL selection strategy is uncertainty sampling, where the most uncertain unlabelled images are
queried for annotation. The uncertainty methods considered include pixel-wise classification uncertainty, predictive entropy,
and Monte Carlo dropout ensemble-based methods using entropy, variance (Var), variation ratio (Var_ratio),
standard deviation (STD), coefficient of variation (Coef_var),
and Bayesian active learning by disagreement (BALD).
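As an illustration, these scores can be derived from T stochastic forward passes of a dropout-enabled U-Net. The sketch below assumes a binary segmentation output of shape (T, H, W) and uses the standard definitions, e.g. the predictive entropy of the mean foreground probability p, -p log p - (1 - p) log(1 - p), and BALD as the difference between the predictive entropy and the expected per-pass entropy; all names are illustrative.

```python
import numpy as np

def mc_dropout_scores(probs, eps=1e-8):
    """Image-level uncertainty scores from T MC dropout passes.

    probs: array of shape (T, H, W) holding foreground probabilities
    predicted with dropout kept active at inference time.
    """
    mean_p = probs.mean(axis=0)

    # Predictive entropy of the mean prediction (binary case).
    entropy = -(mean_p * np.log(mean_p + eps)
                + (1 - mean_p) * np.log(1 - mean_p + eps))

    # Expected entropy of the individual stochastic predictions.
    exp_entropy = -(probs * np.log(probs + eps)
                    + (1 - probs) * np.log(1 - probs + eps)).mean(axis=0)

    bald = entropy - exp_entropy        # mutual information (BALD)
    var = probs.var(axis=0)             # variance (Var)
    std = probs.std(axis=0)             # standard deviation (STD)
    coef_var = std / (mean_p + eps)     # coefficient of variation (Coef_var)

    # Variation ratio: fraction of passes disagreeing with the modal class.
    votes = (probs > 0.5).mean(axis=0)
    var_ratio = 1.0 - np.maximum(votes, 1.0 - votes)

    maps = {"entropy": entropy, "var": var, "std": std,
            "coef_var": coef_var, "var_ratio": var_ratio, "bald": bald}
    return {name: float(m.mean()) for name, m in maps.items()}  # average pixel maps
```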
Training settings: the TensorFlow and Keras frameworks were used to develop the DL models, and training was conducted on an Nvidia RTX 3090 GPU. U-Net was trained with binary cross-entropy loss and the Adam optimiser with a learning rate of 0.0001 for 200 epochs. Images were resized to 512×512, and a fixed batch size of 8 was applied.
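A minimal Keras sketch of this configuration is given below; build_unet and the data arrays are placeholders for the actual architecture and data pipeline, not the authors' exact code.

```python
import tensorflow as tf

model = build_unet(input_shape=(512, 512, 3))  # placeholder U-Net constructor
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="binary_crossentropy",
)
# train_images/train_masks stand in for the current labelled set L.
model.fit(train_images, train_masks, batch_size=8, epochs=200)
```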
For the small datasets A and B, we selected 10% of the initial training data as L, with the remaining 90% forming U.
For the larger dataset, we chose 4% as the initial L, and the remainder formed U; the ground-truth labels of U
serve as the oracle when images are queried. Finally, to ensure a fair comparison between methods, we trained the
model on the initially labelled data of
each of the three datasets and reused the same model weights for training in the subsequent AL iterations.
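For illustration, the initial split can be drawn as below, assuming paired images and masks arrays; the 0.10 fraction applies to datasets A and B (0.04 for the larger dataset) and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
idx = rng.permutation(len(images))           # images/masks are placeholder arrays
n_initial = int(0.10 * len(images))          # 10% for A and B; 4% for the larger set
L_idx, U_idx = idx[:n_initial], idx[n_initial:]
# masks[U_idx] are withheld and act as the simulated oracle during AL queries.
```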