Raster Statistics

Statistics can be computed for each band and component in a raster. They can also be computed for a single event or over a range of events, and at the base resolution level or at a specified resolution level. Beyond that, you can compute statistics for a polygonal subset of a raster.

The first use of statistics is to gain an understanding of the data in the raster. They can give you information about outliers and bad values, skewness, and the shape of the distribution.

Statistics are frequently required for raster rendering. Rendering often requires transforming data values to a color index and, by default, the data transform is constructed using statistics as input. The input may be as simple as the minimum and maximum values, or the transform may use the frequency distribution (histogram) to generate an equal-area (histogram-equalised) data-color transform.
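
As an illustration of the histogram-equalised case, the following is a minimal sketch in Python. The function name equalised_lookup and the bin counts are hypothetical and are not part of the MapInfo Pro API. It assigns each histogram bin a color index in proportion to the number of cells that fall below that bin.

```python
def equalised_lookup(bin_counts, n_colors):
    """Map each histogram bin to a color index using the cumulative cell count.

    Hypothetical helper for illustration only: a bin's color index is the
    fraction of cells below the bin, scaled to the number of colors.
    """
    total = sum(bin_counts)
    if total == 0:
        return [0] * len(bin_counts)
    lookup = []
    cells_below = 0
    for count in bin_counts:
        # Integer arithmetic keeps the result exact; clamp to the last color.
        lookup.append(min(n_colors - 1, (cells_below * n_colors) // total))
        cells_below += count
    return lookup


# A skewed histogram: half of all cells fall in the first bin.
lookup = equalised_lookup([50, 25, 10, 5, 4, 3, 2, 1], n_colors=4)
print(lookup)
```

With a linear minimum-to-maximum transform, each pair of adjacent bins here would share a color. The equalised lookup instead gives the heavily populated low bins their own colors and lets the sparse high bins share the final color.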

Statistics can be used for simple data analysis. For classified rasters, the statistical count of cells of a particular value can be used to compute the area covered by each class.
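
The area calculation can be sketched as follows. This is an illustrative example only: the function name class_areas, the sample grid, and the 10 m cell size are assumptions, and invalid cells are represented here by None.

```python
from collections import Counter


def class_areas(cells, cell_width, cell_height, null_value=None):
    """Return the area covered by each class value in a classified grid.

    cells is a list of rows; cells equal to null_value are skipped.
    """
    counts = Counter(v for row in cells for v in row if v != null_value)
    cell_area = cell_width * cell_height
    return {cls: n * cell_area for cls, n in counts.items()}


# A 3 x 3 classified raster with 10 m x 10 m cells and one invalid cell.
grid = [
    [1, 1, 2],
    [2, 2, None],
    [3, 1, 1],
]
areas = class_areas(grid, cell_width=10.0, cell_height=10.0)
print(areas)  # areas in square metres, keyed by class value
```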

The same kind of analysis can be focussed on subsets of the data by clipping the raster to a polygon via a virtual raster intermediary and computing the statistics of the virtual raster.

Statistics should be computed from the data at the base resolution level. This is the unmodified and uninterpolated raster source data. For a large raster, this may take a significant amount of time. When you create an MRR, the statistics computation is performed as a standard part of the raster finalisation.

If statistics are not available and are required immediately, they can be computed on the fly from an overview resolution level. If an appropriate level is targeted, the computation will be rapid. However, the resulting statistics will only approximate the base level statistics.
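
To see why overview statistics are only an approximation, consider the sketch below. It assumes the overview is built by averaging each 2 x 2 block of base cells, which is one common approach and not necessarily how a given raster format builds its overviews. The function names and the sample grid are hypothetical.

```python
def overview(grid):
    """Halve the resolution by averaging each 2 x 2 block of cells.

    Assumes the grid has an even number of rows and columns.
    """
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4
            for c in range(0, len(grid[0]), 2)
        ]
        for r in range(0, len(grid), 2)
    ]


def min_max_mean(grid):
    """Return (minimum, maximum, mean) over all cells of the grid."""
    flat = [v for row in grid for v in row]
    return min(flat), max(flat), sum(flat) / len(flat)


base = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [10, 0, 1, 1],
    [0, 2, 1, 97],
]
base_stats = min_max_mean(base)
overview_stats = min_max_mean(overview(base))
print(base_stats, overview_stats)
```

In this example the mean survives the averaging, but the overview reports a much narrower data range than the base level because the single extreme cell has been averaged with its neighbours.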

An MRR stores statistics in the file so that they remain available afterwards. In the MapInfo raster system, other rasters can have their statistics stored in the GHX file. Statistics are only saved into the GHX file if they have been computed for the base resolution level. This may not happen automatically, but you can force it via the statistics dialog.

In the MapInfo Pro raster system, the type of statistics that is computed depends on the field and band data types. The statistics levels, in ascending order, are Count, Summary, Distribution, and Spatial. By default, the highest supported level of statistics is computed.

“Count” statistics simply count all the valid and invalid cells in the raster. A valid cell may hold a valid value or, for floating point data, an invalid value. An invalid cell may be either Empty or Null. The distinction between Empty and Null is used in events and editing.

“Summary” statistics record the data range, including the minimum and maximum valid values, together with the mean, variance, standard deviation, and signal-to-noise ratio.
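
The summary measures can be sketched with the Python standard library as shown below. Two points are assumptions, because this document does not define them: the variance is taken as the population variance, and the signal-to-noise ratio is taken as the mean divided by the standard deviation. The function name summary_statistics is hypothetical.

```python
import statistics


def summary_statistics(values):
    """Return summary measures for a sequence of valid cell values.

    Assumptions: population variance, and SNR defined as mean / std dev.
    """
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": mean,
        "variance": statistics.pvariance(values),
        "std_dev": std_dev,
        # Guard against constant data, where the standard deviation is zero.
        "snr": mean / std_dev if std_dev else None,
    }


summary = summary_statistics([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```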

“Distribution” statistics compute a data distribution histogram, from which the quartiles, the median, and the mode can be derived.
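
One way to derive these measures from a histogram is sketched below. It assumes the histogram is held as a list of bin edges and a list of bin counts, interpolates linearly within a bin, and reports the mode as the centre of the fullest bin. These choices, and the function names, are illustrative assumptions rather than a description of the MapInfo Pro implementation.

```python
def histogram_quantile(edges, counts, q):
    """Estimate the q-th quantile (0 <= q <= 1) from a binned histogram.

    edges has len(counts) + 1 entries. Values are assumed to be spread
    evenly within each bin, so the estimate is interpolated linearly.
    """
    target = q * sum(counts)
    seen = 0
    for i, count in enumerate(counts):
        if count and seen + count >= target:
            fraction = (target - seen) / count
            return edges[i] + fraction * (edges[i + 1] - edges[i])
        seen += count
    return edges[-1]


def histogram_mode(edges, counts):
    """Return the centre of the bin that holds the most cells."""
    i = max(range(len(counts)), key=counts.__getitem__)
    return (edges[i] + edges[i + 1]) / 2


edges = [0, 10, 20, 30, 40]
counts = [10, 20, 50, 20]
lower_quartile = histogram_quantile(edges, counts, 0.25)
median = histogram_quantile(edges, counts, 0.50)
upper_quartile = histogram_quantile(edges, counts, 0.75)
mode = histogram_mode(edges, counts)
print(lower_quartile, median, upper_quartile, mode)
```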

“Spatial” statistics record the differences in data values from one cell to the next, including the minimum and maximum differences and the mean, variance, and standard deviation of the differences. This information is used in hill shade rendering.
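
A rough sketch of this kind of measurement follows. This document does not say which neighbours are compared or whether the differences are signed, so the example assumes absolute differences between each cell and its right-hand and lower neighbours. The function name spatial_statistics is hypothetical.

```python
import statistics


def spatial_statistics(grid):
    """Summarise absolute differences between adjacent cells.

    Assumption: each cell is compared with the cell to its right and the
    cell below it, so every adjacent pair is counted exactly once.
    """
    diffs = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if c + 1 < len(row):
                diffs.append(abs(row[c + 1] - value))
            if r + 1 < len(grid):
                diffs.append(abs(grid[r + 1][c] - value))
    return {
        "minimum": min(diffs),
        "maximum": max(diffs),
        "mean": statistics.mean(diffs),
        "variance": statistics.pvariance(diffs),
        "std_dev": statistics.pstdev(diffs),
    }


spatial = spatial_statistics([[1, 2], [4, 8]])
print(spatial)
```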

The MapInfo raster engine uses a single pass to compute all statistics. For a large raster, using a single pass improves performance considerably because the bulk of the time required is consumed reading the data tiles from storage and decompressing the data stream. However, computing a histogram in a single pass is challenging. Traditionally, you would compute summary statistics in the first pass, then (knowing the data range) bin the values in a second pass to build the histogram. Instead, the histogram is “accumulated” in a single pass. This approach has an advantage of its own: it tends to isolate outliers, so the quality of the histogram is not influenced by them.
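
This document does not describe how the MapInfo raster engine accumulates its histogram. Purely to illustrate that a single pass is feasible, the sketch below combines two well-known streaming techniques: Welford's running update for the mean and variance, and a centroid-merging histogram that keeps a fixed number of bins by merging the two closest bins whenever the limit is exceeded. The function name and the max_bins parameter are hypothetical.

```python
from bisect import insort


def single_pass_statistics(values, max_bins=16):
    """Compute summary statistics and an adaptive histogram in one pass.

    Illustrative only: not the MapInfo algorithm. Each histogram bin is a
    [centroid, count] pair, and the bin list is kept sorted by centroid.
    """
    count, mean, m2 = 0, 0.0, 0.0
    lowest = highest = None
    bins = []
    for value in values:
        # Welford's update: no second pass is needed for the variance.
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
        lowest = value if lowest is None else min(lowest, value)
        highest = value if highest is None else max(highest, value)
        # Add the value as its own bin, then merge the two closest bins
        # if the bin limit has been exceeded.
        insort(bins, [value, 1])
        if len(bins) > max_bins:
            i = min(range(len(bins) - 1),
                    key=lambda k: bins[k + 1][0] - bins[k][0])
            (v1, c1), (v2, c2) = bins[i], bins[i + 1]
            bins[i:i + 2] = [[(v1 * c1 + v2 * c2) / (c1 + c2), c1 + c2]]
    return {
        "count": count,
        "minimum": lowest,
        "maximum": highest,
        "mean": mean,
        "variance": m2 / count if count else 0.0,
        "bins": bins,
    }


result = single_pass_statistics([1, 2, 3, 4, 5, 1000], max_bins=3)
print(result["bins"])
```

Because only the closest bins are merged, a distant outlier such as 1000 remains in a bin of its own and does not stretch the bins that describe the bulk of the data. A fixed-range histogram built from the minimum and maximum would instead place the five small values in a single bin.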

The type of statistics computed depends firstly on the data field type. For a continuous field, we compute spatial statistics. For image, classified, and image palette fields, we compute distribution statistics.

However, the type of statistics computed also depends on the band data type. For decimal and integer data, we can compute spatial statistics. For multi-component color (like RGB), we only compute count statistics, but for single-component color we compute distribution statistics. For other data types, such as complex numbers, strings, and time, we generally only compute count statistics.
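
The two rules can be combined as in the sketch below. It assumes that the level actually computed is the lower of the level allowed by the field type and the level allowed by the band data type, which is an interpretation of the text above rather than documented behaviour. The type names used as dictionary keys are hypothetical labels, not MapInfo Pro identifiers.

```python
# Statistics levels in ascending order.
LEVELS = ["Count", "Summary", "Distribution", "Spatial"]

# Highest level suggested by the field type (hypothetical key names).
FIELD_LEVEL = {
    "continuous": "Spatial",
    "image": "Distribution",
    "image_palette": "Distribution",
    "classified": "Distribution",
}

# Highest level suggested by the band data type (hypothetical key names).
BAND_LEVEL = {
    "decimal": "Spatial",
    "integer": "Spatial",
    "color_single_component": "Distribution",
    "color_multi_component": "Count",
    "complex": "Count",
    "string": "Count",
    "time": "Count",
}


def statistics_level(field_type, band_type):
    """Return the lower of the two levels (assumed combination rule)."""
    return min(FIELD_LEVEL[field_type], BAND_LEVEL[band_type],
               key=LEVELS.index)


print(statistics_level("continuous", "decimal"))
print(statistics_level("image", "color_multi_component"))
print(statistics_level("classified", "integer"))
```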