Section 5 Running image segmentation optimization

5.1 Introduction

For running the optimization of image segmentation parameters, we will use the function named gaOptimizeSegmentationParams(). For more details about this function and its parameters check out the help files with ?gaOptimizeSegmentationParams.

In short, there are two main steps involved in this process:

  1. Run the gaOptimizeSegmentationParams() to optimize the parameters, and then,

  2. Using the optimized parameters, run a final round to segment, train and predict class labels to the whole image.

When running the optimization in SegOptim setting the segmentation parameters correctly (i.e., minimum and maximum values for each optimizable parameter) is crucial to obtain good results. Also, keep in mind that it may take a while to run depending on the inputs. However, using parallelization (check the parallel parameter), reducing the size of input imagery (i.e., by taking a representative subset of the whole area), and tweaking the popSize, run and maxiter parameters of the genetic algorithm will help a lot to make things faster.

In this example we will use RSGISLib segmentation algorithm (segmentMethod = "RSGISLib_Shep"). For RSGISLib specifically, SegOptim only supports the < 3.5 version of this software, at least for now (updated: May 2019). For classification, we will employ the Random Forest algorithm defined in classificationMethod = "RF". The fitness value will be provided by the Kappa index for the test split(s) as defined in evalMetric = "Kappa". Other performance metrics can be used (even user defined functions can be set up and used by SegOptim). 5-fold cross-validation will be used for assessing classification performance of this example (evalMethod = "5FCV").

The data used in this example can be found in this link (updated 30/01/2020).

Check out the code below and have a good read on the comments to better understand each step, the required inputs as well as the parameters that have to be set up.


5.2 Part 1 - Optimizing parameters

SegOptim can optimize the following RSGISLib parameters:

  1. Number of clusters for running the k-means algorithm (integer);

  2. Minimum segment size, if smaller it will be merged to neighboring segments (in pixels; integer);

  3. Spectral threshold used to merge objects (float), is a value providing the maximum (Euclidean distance) spectral separation for which to merge clumps. Set to a large value to ignore spectral separation and always merge.

Therefore, you need to provide valid minimum and maximum values for each one of these parameters that will be passed to the Genetic Algorithm. These values obviously depend on the target, objectives and the spatial resolution of the inputs so please define them with care.

library(SegOptim)

# Set the working directory which will be used by SegOptim to place temporary files
# Change this according to your system
setwd("C:/myfiles/data")

# Minimum values for nr of clusters, min segment size and spectral threshold
RSGISLib_Shep_min <- c(5, 10, 3)

# Maximum values for nr of clusters, min segment size and spectral threshold
RSGISLib_Shep_max <- c(12, 60, 40)


# Other inputs ------------------------------------------------------------------------------------------------ #

# Classification features (a SpatRaster object from terra package)
rstClassifFeatures <- rast("./CLASSIF_FEAT/WV2_VilarVeiga_smallTestSite.tif")

# Train data (a SpatRaster object/ one band only)
rstTrainData <- rast("./TRAIN_AREAS/trainAreas_Adealbata_VVeiga_WV2_v1.tif")

# A path to the multiband image file used for image segmentation
# This will be used by RSGISLib for image segmentation purposes
segmFeatures <- "./SEGM_FEAT/WV2_b532_VilarVeiga_smallTestSite.tif"

# Also, don't forget to modify pythonPath containing RSGISLib
pyPath <- "C:/Anaconda3/envs/py35"


# Run the optimization procedure ----------------------------------------------------------------------------- #


gaOptim <- gaOptimizeSegmentationParams(  rstFeatures = rstClassifFeatures,
                                          trainData   = rstTrainData,
                                             # // Segmentation parameters ---
                                          segmentMethod = "RSGISLib_Shep",
                                          inputRstPath  = segmFeatures,
                                          pythonPath    = pyPath, 
                                          verbose = FALSE,
                                             # // End segmentation parameters ---
                                          trainThresh   = 0.5,
                                          segmStatsFuns = c("mean"),
                                          classificationMethod = "RF",
                                          classificationMethodParams = NULL,
                                          balanceTrainData = FALSE,
                                          balanceMethod = "ubOver",
                                          evalMethod = "5FCV",
                                          evalMetric = "Kappa",
                                          minTrainCases = 30,
                                          minCasesByClassTrain = 10,
                                          minCasesByClassTest = 10,
                                          minImgSegm = 30,
                                          lower = RSGISLib_Shep_min,
                                          upper = RSGISLib_Shep_max,
                                          popSize = 20, 
                                          pcrossover = 0.8, 
                                          pmutation = 0.2,
                                          maxiter = 100, 
                                          run = 20,
                                          keepBest = TRUE,
                                          parallel = 2) # use two cores for optimization (change to available resources)

For checking the results obtained through optimization let’s inspect the output object named gaOptim:

# Value of evalMetric for the optimal set of parameters found by the genetic algorithm  
#
print(gaOptim@fitnessValue)
    
# Optimized parameters (for RSGISLib) containing three values: 
# number of clusters, min segment size, and, spectral threshold
#
print(gaOptim@solution)

5.3 Part 2 - Running the final image segmentation

After running the optimization procedure we can now use the optimized set of parameters. Let’s start by performing the final image segmentation with optimized params.

[NOTE] If you used a subset of the image data just for optimization purposes, you can now run the entire scene in these final steps to obtain class labels for the whole area.

# Run the final image segmentation
segmObj <- segmentation_RSGISLib_Shep(x             = gaOptim@solution, # Use optimized parameters
                                      inputRstPath  = segmFeatures,
                                      outputSegmRst = "segm_raster.tif", # Output segmented raster (change this!)
                                      pythonPath    = pyPath,
                                      verbose       = TRUE)

# Load the resulting segmented raster
rstSeg <- rast(segmObj$segm)

Now, let’s proceed to load the feature data and populate segments with statistics of it. In this example we will use only the average value of each feature but other statistics can be used as to improve classification.

# Prepare data before classification
# This will populate each segment with statistics of each one of the layers in rstClassifFeatures ------------- #
#
calDataPrep <- prepareCalData( rstSegm     = rstSeg, 
                               trainData   = rstTrainData, 
                               rstFeatures = rstClassifFeatures,
                               thresh      = 0.5,
                               funs        = "mean",
                               verbose     = TRUE)

# Check the generated datasets
head(calDataPrep$calData)

head(calDataPrep$classifFeatData)

This function will generate an object of class SOptim.CalData containing two data frames:

  1. calData - A data frame object containing calibration data for training and evaluating a classifier algorithm. The first column (named “SID”) contains the ID of each segment, and the second column (named “train”) holds the segment class (or label). The following n columns hold the classification features for training;

  2. classifFeatData - A data frame containing all segments and features from inputs. The first column (named “SID”) holds the unique identifier for each image segment. The following n columns are used as classification features. Typically this dataset is used for predicting the target class after calibrating a certain classifier algorithm.

Using these data we can now run the classification step:

# Run the final classification -------------------------------------------------------------------------------- #
#
RFclassif <- calibrateClassifier(calData       = calDataPrep,
                    classificationMethod       = "RF",
                    classificationMethodParams = NULL,
                    balanceTrainData           = FALSE,
                    balanceMethod              = "ubOver",
                    evalMethod                 = "5FCV",
                    evalMetric                 = "Kappa",
                    minTrainCases              = 30,
                    minCasesByClassTrain       = 10,
                    minCasesByClassTest        = 10,
                    runFullCalibration         = TRUE) # This is mandatory - see help files!                    

# Average performance for evalMetric and using test data only (i.e., data not used for training)
RFclassif$AvgPerf

# Performance stats for evalMetric across test rounds
RFclassif$PerfStats

# Calculate other performance scores for the classification (such as Peirce Skill Score, 
# Gerrity Skill Score, and AUC)
# see ?evalPerformanceClassifier for more details
#
evalPerformanceClassifier(RFclassif)

# Confusion matrix for the full dataset
RFclassif$ConfMat$FULL

Finally, we can now predict the label class of each segment for the whole area/scene:

# Predict the label/class of each object --------------------------------------------------------------------- #
#
predictSegments(
        classifierObj = RFclassif, 
                calData       = calDataPrep, 
                rstSegm       = rstSeg, 
                predictFor    = "all",
                filename      = "segm_classified.tif",  # The output file with class labels for each object 
                verbose       = TRUE, 
                na.rm         = TRUE)                           

# load the resulting raster
finalRst <- rast("segm_classified.tif")

plot(finalRst)