This document was last updated 01/07/02 cates@cs.utah.edu.  It describes the
streaming watershed algorithm, some trials we ran with that algorithm, and the
example code contained in this directory.

TABLE OF CONTENTS
I    Introduction
II   The streaming watershed algorithm
III  Implementation
IV   Some results
V    Some comments
VI   Ideas for future work
VII  Using the example applications


I. Introduction

This directory and its subdirectories contain code to perform streaming
watershed segmentation of multidimensional image data.

Directory                  Contents
---------                  ----------------
./                         I/O and streaming watershed application code
./ParameterFileParser/     Parser for parameter files used by the application
                           code
./VolumeChunker/           Code for creating the volume chunks that the
                           application code processes

[ For a description of the watershed algorithm, refer to the documentation of
WatershedImageFilter found inline in Code/Algorithms/itkWatershedImageFilter.h ]

This example implements a streaming version of the Itk watershed segmentation
algorithm (WatershedImageFilter).  In the context of the Insight toolkit,
``streaming'' means that a single, large dataset is processed as a sequence of
smaller sub-datasets. Streaming is useful for processing datasets that are too
large to fit in the core memory of a machine.

Algorithms that map a localized set of data points into a result are easily
converted for streaming.  Examples include pixel-wise operations, convolution
algorithms, and image resamplers.  Sub-dataset regions can simply be padded
with ghost pixels to a width appropriate for the algorithm.  For example, data
can be streamed through a convolution algorithm by padding the sub-dataset
regions to accommodate the size of the convolution kernel.

The watershed algorithm is not so easily streamed.  It is global in nature,
meaning that the result at any pixel may depend on any and all other pixels in
the dataset.  We have devised a way to stream watershed segmentation using a
two-pass algorithm.  The streaming algorithm is described in the next section.


II. The streaming watershed algorithm

Vocabulary:

dataset         Image or volume of pixels.
sub-dataset     A subset of a dataset.
segment         A labeled region of interest in a dataset.  The goal of the
                segmentation algorithm is to produce labeled segments.

[ There are some design documents checked into the InsightDocuments repository.
  See the file "InsightDocuments/Developer/General/Code/
                 Utah/StreamingWatershedSegmentationSegmentationDocuments.pdf".
  These documents may be helpful for the description that follows.
  It may also be helpful to familiarize yourself with the non-streaming watershed 
  segmentation algorithm, as described in the inline documentation of
  Code/Algorithms/itkWatershedImageFilter.h. ]

We have devised a way to stream watershed segmentation for large datasets using
a two-pass algorithm.  Input to this algorithm is a collection of sub-dataset
regions which together comprise a single, contiguous dataset. The first pass of
the algorithm segments the collection of sub-dataset regions in sequence,
storing information about segment connections across the boundaries. The second
pass analyzes the boundary pixel information, and relabels each sub-dataset
region's segments to produce a global result.

(To visualize a volumetric data set that has been broken into subsets, picture a
set of cubes (i.e. six-sided dice), with their faces pressed against one
another to form a larger cube (Rubik's cube).  If each cube is a sub-dataset,
and the collective cube they comprise is the entire dataset, then the
interfaces between cubes are the boundaries of interest.)

The watershed algorithm segments data by tracing pixels values down the path of
steepest gradient descent to local minima.  Each pixel traversed is assigned a
label that matches that of its associated local minimum, thus creating segments
analogous to watershed basins. In the streaming version of the algorithm, the
initial segmentation of each sub-dataset proceeds exactly as described above
until a path of steepest descent crosses a boundary.  Boundary crossings are
noted and resolved in a separate step to generate a table of equivalent
segments among the sub-datasets.  Cross-boundary segments are marked as
equivalent if their paths of steepest descent flow into one another.
Equivalent segments are relabeled in a final pass through all the data.  Each
sub-dataset requires a single pixel overlap with its adjacent sub-dataset for
the algorithm to analyze flow of gradient descent across boundaries.

In a separate step, the watershed segmentation process generates a hierarchy of
merges among segments according to saliency measures at segment boundaries.  We
have not implemented data streaming for this final step.  Because the hierarchy
generation step relies on a summary table of information about the segments, it
has a much smaller memory profile.  For this first implementation, we have
limited ourselves to datasets whose segment information will fit into memory.


III. Implementation

A streaming version of the Itk watershed algorithm can be implemented using the 
same itk component process objects that create the non-streaming watershed
filter itk::WatershedImageFilter.  itk::WatershedImageFilter creates a
non-streaming segmentation pipeline as follows:

 A: CreateInitialSegmentation->GenerateMergeHierarchy->RelabelInitialSegmentation

For streaming, we modify the pipeline as follows:

 B: CreateInitialSegmentation->ResolveBoundaryInformation->RelabelEquivalencies
                ->GenerateMergeHierarchy->RelabelInitialSegmentation

The Itk pipeline mechanism does not directly support the iterative and
streaming capabilities required to directly implement Pipeline B.  To implement
Pipeline B in Itk, we break it into separate processes:

 C: CreateInitialSegmentation (iterates over all sub-datasets)

 D: ResolveBoundaryInformation

 E: RelabelEquivalencies (iterates for all sub-datasets)

 E: GenerateMergeHierarchy

 F: RelabelInitialSegmentation (iterates over all sub-datasets)

Each step C-F is implemented as a separate application.  The applications are
controlled by parameter files and store intermediate data structures on disk.


IV. Some results.

The example application found in this directory was tested on Visible Female
color data.  The largest dataset we processed was 150 slices of the abdomen
section of the Fullcolor dataset at full resolution.  These slices were cropped
(1600x800x150) to remove some of the uninteresting background material at the
edges and then split into 8x4x1 sub-volumes using the vhchunker application
found in Applications/StreamedWatershedSegmentation/VolumeChunker.

The sub-volume chunks were converted to floating point data and padded 13
pixels on each edge.  We then processed each chunk with 13 iterations of the
curvature-based anisotropic diffusion filter
(itkCurvatureAnisotropicDiffusionImageFilter) using a conductance parameter
1.2, time step 0.125, and fixed average gradient magnitude of 15.0.  The fixed
average gradient magnitude was necessary to avoid blocking artifacts in the
final volume which result if the diffusion algorithm is allowed to dynamically
calculate its gradient magnitude for each chunk (the default, see
AnisotropicDiffusionImageFilter documentation for further details).

The diffused volume contained 192M pixels and was 4.6 GBytes in size.  To
generate the input for the watershed algorithm, we streamed the diffused volume 
through itk::VectorGradientMagnitudeImageFilter to produce a scalar edge
image, and padded each chunk with the single overlap pixel required by the
watershed streaming code.

Watershed segmentation was performed on the edge volume in five steps:

(a) Each chunk of the edge volume was first processed with itkSegmenterApp to
produce the initial labeled volume and the boundary information. 

(b) The boundary information from (a) was processed with itkResolverApp to
produce a table of equivalencies among segment labels.

(c) The equivalency table was used by itkEquivalenceRelabelerApp to relabel
each of the sub-volumes from (a).

(d) ItkTreeGeneratorApp was run on the segment table output of step (a) to
produce a merge hierarchy.

(e) Using the results of (d), the volume from (c) can be relabeled in constant
time to produce any segmentation within the hierarchy.

Steps (b), (c), and (e) are order N algorithms whose execution times were
dominated by file I/O.  Times were 2-3 minutes at worst.

The following table gives approximate execution times for the more critical
steps (a) and (e) over variously sized data sets, all prepared as described above 
and processed using the same parameters.  The test machine used a Pentium III
1GHZ processor with 1G RAM.

Number of 1600x800 slices:  30  100  150
--------------------------------------------------------
Step a  (in minutes)         3   14   20
Step e  (in minutes)        10  180 1320


V. Some comments.

The computational complexity of step (e) is clearly a limiting factor for
segmenting large datasets with our technique.  Computation times increase
non-linearly with a linear increase in dataset size.  This is most likely due
to a non-linear increase in connectivity among segments, but there may also be
implementational factors at work.

The size of the table of segment information used by the algorithm in step (e)
is also a limiting factor.  In practice, 200+ slice datasets usually produced
segment tables too large to fit in the core memory of our test machine.

There are advantages to segmenting data as a three-dimensional volume versus
segmenting on a slice-by-slice basis. Volumetric segmentation uses all of the
available connectivity information among pixels and can, therefore, connect
sections of convoluted structures that are disjoint within
any single two dimensional plane.  This could be very important for structures
such as blood vessel trees, whose roots might connect at only a few points far down
into the volume. A two-dimensional segmentation in a near plane cannot see the
connection, and would identify the disjoint vessel regions as separate
structures. 

There are also disadvantages to volumetric segmentation.  As evidenced above,
computational complexity can become prohibitively high for many applications.
Also, because the watershed algorithm is sensitive to noise, larger volumes and
increased connectivity among segments provides more opportunity for segments to
erroneously merge into one another.  How sensitive the algorithm is to noise
largely depends on the saliency measures used on the segment boundaries.  More
robust measures would help to prevent this latter problem.


VI. Ideas for future work.

Ideally, we would like to be able to do watershed segmentations on datasets the 
size of the entire Visible Female.  In order to achieve that goal, we need to
reduce the complexity of the hierarchy generation algorithm (e).  We also need
to come up with a method for compressing and/or streaming the segment table
information used in (e) to overcome memory limits on smaller machines.

To improve the quality of our segmentations, we would like to experiment with
different saliency measures on segment boundaries.  The Itk code could be
modified to accept a class of saliency measures, perhaps as template parameters.


VII. Using the example application

This section describes how to use the applications contained in this directory
to perform streamed segmentations.  The applications are driven by parameter
files, which describe the inputs and outputs to each step in the segmentation
process.

Step 1: Preparing the data
        A recommended procedure for preprocessing the data is described above
        in Section IV.  See the documentation for the VolumeChunker for
        information on how to break data into chunks.  It is important that the
        chunk inputs to Step 2 be padded exactly one pixel on each face that touches
        the face of another chunk (i.e. they should overlap each other by one
        pixel).

Step 2: Run itkSegmenter
        This application performs the initial segmentation of each volumetric
        image chunk.  It outputs boundary data for use in Step 3, labeled
        image volumes, and a table of segment information.  A sample parameter
        file is given below: 

// BEGIN SAMPLE PARAMETER FILE FOR itkSegmenter

(input_chunk_file  "chunkset.record") // This is the name of the header file
                                      // for the chunks to process.  Automatically
                                      // generated by vhchunker or chunker

(output_chunk_file "chunkset.record")  // Name of the header file to output.
                                       // Header information may be changed by
                                       // the Segmenter application 

(chunk_prefix "cd2_edge.chunk.chunk") // Prefix of the input data filenames.
                                      // The prefix will be appended with an
                                      // integer (1, 2, 3, ...).

(output_prefix     "labeled/cdab") // Prefix to use for the names of the
                                   // labeled output chunks.  This prefix will
                                   // be appended with .segmented.1,
                                   // .segmented.2, etc. 

(segment_file_name "segfile.seg") // Filename of the segment information table
                                  // that this application with generate.

(boundary_analysis 1) // This parameter should be set to 1 to enable analysis
                      // of boundaries.  Used for debugging purposes.

(threshold 0.03) // This parameter sets the threshold value of
                 // itk::WatershedSegmenter (see itkWatershedSegmenter.h for
                 // more information).

(maximum_flood_level 1.0) // This parameter can be used to limit the size of the
                          // resulting segment table if you know in advance the
                          // value you will use for flood_level in Step 4.  A
                          // safe setting is always 1.0.  Valid settings are
                          // 0.0 - 1.0 (percentage).  You can set it at or
                          // above the value you use for flood_level in Step 5.

(prune_value 0)  // This parameter is used (but expected).

(prune_as_we_go 0) // This parameter is not used (but expected).

// END SAMPLE PARAMETER FILE FOR itkSegmenter


Step 3: Run itkResolverApp
        This application resolves region equivalencies across volume chunk
        boundaries, using input from Step 1.  Its output is a table of
        equivalencies which is one of the inputs to steps 4 and 5.  Here is a
        sample parameter file.

// BEGIN SAMPLE PARAMETER FILE FOR itkResolverApp

(segment_file_name     "segfile.seg")  // Name of the segment table file
                                       // generated in Step 2.

(boundary_file_prefix  "cdab.segmented.boundary") // Name of the boundary 
                                                  // data file generated in
                                                  // Step 2.

(equivalency_file_name "equivalencyfile.eq") // Name of the equivalency table
                                             // this application will generate.

(chunk_dimensions 8 4 1) // The number of chunks in each dimension of the 
                         // volume segmented in step 2.

// END SAMPLE PARAMETER FILE FOR itkResolverApp


Step 4: Run itkEquivalenceRelabelerApp
        This application relabels the initial segmentation from Step 2 using
        the equivalencies generated in Step 3.

// BEGIN SAMPLE PARAMETER FILE FOR itkEquivalenceRelabelerApp

(chunk_file  "chunkset.record")  // Name of the header file generated in Step 2.

(chunk_prefix "cdab.segmented") // Prefix of the labeled sub-volumes generated
                                //  in Step 2

(equivalency_file_name "equivalencyfile.eq") // Filename of the equivalency
                                            // table generated in Step 3.

// END SAMPLE PARAMETER FILE FOR itkEquivalenceRelabelerApp


Step 5: Run itkTreeGeneratorApp
        This application generates a segmentation hierarchy from the segment
        information output in Step 1.  See documentation of
        itk::WatershedSegmentTreeGenerator for more information.

// BEGIN SAMPLE PARAMETER FILE FOR itkTreeGeneratorApp

(segment_file_name     "segfile.seg") // Filename of the segment tree output in 
                                      // Step 2.

(tree_file_name        "treefile.tree") // Filename of the merge hierarchy that 
                                        // will be generated in this step.

(equivalency_file_name "equivalencyfile.eq") // Filename of the equivalency
                                             // table generated in Step 3.

(max_floodlevel 0.15) // Merges are calculated until this maximum saliency
                      // level is encountered.  This value is a percentage
                      // (0.0-1.0) of the maximum level in the merge tree. 
                      // See itk::WatershedSegmentTreeGenerator for more
                      // information. 

(merge_equivalencies 1) // Set this value to 1.  Used for debugging.

// END SAMPLE PARAMETER FILE FOR itkTreeGeneratorApp
        
Step 6: Run itkRelabelerApp 
        This application relabels the segmented volume from Step 2 to produce
        another labeled volume in the hierarchy (from Step 5).

// BEGIN SAMPLE PARAMETER FILE FOR itkRelabelerApp

(chunk_file        "chunkset.record") // Header file generated in Step 2

(chunk_prefix      "cdab.segmented.equivalencyrelabeled") // Prefix of volume
                                                          // chunk files
                                                          // generated in Step 4 

(segment_tree_name "treefile.tree") // Name of segment tree hierarchy generated 
                                    // in Step 5.

(flood_level 0.08) // Level in the hierarchy to generate.  Must be less than
                  // the max_floodlevel specified in Step 5.

// END SAMPLE PARAMETER FILE FOR itkRelabelerApp 
