Raster Size Concepts Explained – How big is my raster?
A raster in MRR format can have a virtually unlimited size. What do we mean by this? How do we measure size? We can measure it in two ways – File Size and Grid Size.
The file size is the sum, expressed as the number of bytes, of all the files that are used to store the raster. In other words, it is the amount of storage space used on your HDD, SSD or Cloud Drive to store a raster.
The grid size is the number of cells in a raster, usually expressed as the product of the number of columns and rows. In an MRR we can extend this concept, as MRR does not have a simple rectangular structure.
The file size of a raster used to be easy to calculate. You would compute the grid size (the product of the number of columns and rows) and multiply this by the size in bytes of the raster data type. So, the file size was directly related to the grid size. This rule does not apply to an MRR, and computing the file size of an MRR is far more complicated for the following reasons –
- The grid size is not a good indication of the number of cells in an MRR.
- An MRR may have multiple fields and multiple bands in each field.
- A field may contain multiple events, each of which can contain a different amount of data.
- Each cell contains a validity flag (which may be stored per band or per field).
- There will be a full set of overview levels.
- There will be a variable amount of metadata.
- All data, including metadata, is likely to be stored in a compressed state.
The last point is the most important. MRR uses compression codecs to compress all data prior to storing it. So, the file size of an MRR depends more on the amount of incompressible information in the data than on the amount of data. Consequently, two rasters in MRR format that are the same grid size will almost certainly have different file sizes.
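The effect of compression on file size can be illustrated with a short Python sketch. This is not how MRR stores data. It simply packs two hypothetical 256 x 256 rasters of 4-byte reals and compresses each with the LZMA codec from the Python standard library. The names `legacy_size` and `compressed_size` are invented for this illustration.

```python
import lzma
import random
import struct

COLS, ROWS = 256, 256
BYTES_PER_CELL = 4  # a 4-byte real number per cell


def legacy_size(cols, rows, bytes_per_cell):
    """Uncompressed size rule: grid size multiplied by the data type size."""
    return cols * rows * bytes_per_cell


def compressed_size(cells):
    """Pack the cells as 4-byte reals and return the LZMA-compressed length."""
    raw = struct.pack(f"<{len(cells)}f", *cells)
    return len(lzma.compress(raw))


# Two rasters with an identical grid size but very different information content.
flat = [0.0] * (COLS * ROWS)
rng = random.Random(0)  # fixed seed so the example is repeatable
noisy = [rng.random() for _ in range(COLS * ROWS)]

uncompressed = legacy_size(COLS, ROWS, BYTES_PER_CELL)
flat_size = compressed_size(flat)
noisy_size = compressed_size(noisy)

print("uncompressed bytes:", uncompressed)
print("flat raster, compressed bytes:", flat_size)
print("noisy raster, compressed bytes:", noisy_size)
```

Both rasters have the same grid size, yet the constant raster compresses to a small fraction of the size of the noisy one, because it contains almost no information.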
A common goal, when creating an MRR, is to minimise the file size. Smaller files are less expensive to store and transport and smaller data streams are faster to transfer across the internet. The following issues ought to be considered –
- Use the most efficient and most appropriate field type for the data. An ImagePalette field may result in a smaller file than an Image field. A Classified field may result in a smaller file than a Continuous field.
- Use the most compact data type for each band. For example, it may be sufficient to use a 4-byte real number rather than a higher precision 8-byte real number.
- For Continuous multi-banded fields, decide whether you need a validity flag per band or whether one for all bands will suffice.
- When using real numbers, restrict the decimal accuracy where appropriate. Do you really need to have five decimals of precision or are one or two sufficient? Restricting the decimal accuracy reduces the amount of information in the data stream and results in higher levels of compression.
- Use predictive encoding (which is applied prior to compression) to make the data stream more compressible. Predictive encoding, which is suitable for integer data types, replaces the data stream with a set of forward prediction values. It is a perfectly reversible process.
- Use a transformation (offset and scale) to transform real numbers to integer numbers for storage.
- Choose a compression codec for either speed or space. Compression rates vary between codecs, but all codecs are fast to decompress. When you are creating an MRR that will be written once but read many times, you might use a codec such as LZMA(9) to minimise the file size. On the other hand, if the MRR will be written once and read once, you might use a codec like LZ4 to minimise processing time.
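Several of these ideas can be combined in a small Python sketch. It is an illustration under stated assumptions, not the MRR implementation: the offset and scale values are hypothetical, a simple first-order difference stands in for predictive encoding (the predictor MRR actually uses is not described here), and the standard library LZMA codec stands in for the MRR codecs.

```python
import lzma
import math
import struct
from itertools import accumulate

OFFSET = 0.0  # hypothetical transformation offset
SCALE = 0.01  # hypothetical scale: keeps two decimals of precision


def to_integers(values, offset=OFFSET, scale=SCALE):
    """Transform real numbers to integers for storage (lossy beyond `scale`)."""
    return [round((v - offset) / scale) for v in values]


def to_reals(ints, offset=OFFSET, scale=SCALE):
    """Invert the transformation, to within half a scale step."""
    return [offset + i * scale for i in ints]


def delta_encode(ints):
    """Keep the first value, then store each value as a difference from the last."""
    if not ints:
        return []
    return [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]


def delta_decode(deltas):
    """Exactly reverse delta_encode with a running sum."""
    return list(accumulate(deltas))


def compressed_size(fmt, values):
    """Pack values with the given struct format code and LZMA-compress them."""
    raw = struct.pack(f"<{len(values)}{fmt}", *values)
    return len(lzma.compress(raw, preset=9))


# A smooth synthetic profile standing in for one row of elevation data.
heights = [100.0 + 50.0 * math.sin(i / 200.0) for i in range(10000)]

ints = to_integers(heights)
deltas = delta_encode(ints)

real_size = compressed_size("d", heights)  # 8-byte reals, full precision
int_size = compressed_size("i", ints)      # 4-byte integers, two decimals
delta_size = compressed_size("i", deltas)  # 4-byte integers, delta encoded

print("8-byte reals:", real_size)
print("scaled integers:", int_size)
print("scaled integers with delta encoding:", delta_size)
```

The delta step is perfectly reversible, whereas the transformation to integers deliberately discards precision beyond the chosen scale. On smooth data like this, the delta-encoded stream compresses to far fewer bytes than the full-precision reals.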
A raster in a legacy format like Vertical Mapper GRD will have a grid size defined by M columns and N rows. The size of the raster is fixed and the number of cells in the raster is easily computable.
A raster in MRR format is different. An MRR contains a sparse array of tiles. The tiles in a particular resolution level are all the same size, M columns by N rows, but the grid size is defined by the width and height of the bounding box of the extant tiles. A tile normally contains 1024 columns by 1024 rows of cells. In principle a tile can be any size, but in practice it is always square, and the number of rows and columns is a power of 2 that ranges from 64 to 2048.
What this means is that the grid size of an MRR is not fixed. It can grow as you add tiles (it never shrinks as you cannot remove tiles, although you can invalidate all the cells in a tile). There is almost no limit to how many tiles you can define, or where they are placed. When you execute a processing operation that outputs an MRR, the operation enjoys a key advantage in that it does not need to determine the grid size in advance – you can just add tiles as you process them.
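A sparse array of tiles can be sketched in Python as a dictionary keyed by tile coordinate. The class `SparseRaster` and its methods are hypothetical names for this illustration and do not reflect the MRR API. The tile size of 16 is chosen only to keep the numbers small.

```python
class SparseRaster:
    """A toy sparse raster: tiles exist only once a cell in them has been set."""

    def __init__(self, tile_size=16):
        self.tile_size = tile_size
        self.tiles = {}  # (tile_x, tile_y) -> flat list of cell values

    def set_cell(self, x, y, value):
        """Set a cell, creating its tile on demand. None marks an invalid cell."""
        ts = self.tile_size
        # Floor division gives negative tile coordinates for negative cells.
        key = (x // ts, y // ts)
        tile = self.tiles.setdefault(key, [None] * (ts * ts))
        tile[(y % ts) * ts + (x % ts)] = value

    def cell_range(self):
        """Bounding box of the extant tiles as (min_x, min_y, width, height)."""
        if not self.tiles:
            return None
        xs = [tx for tx, _ in self.tiles]
        ys = [ty for _, ty in self.tiles]
        ts = self.tile_size
        return (
            min(xs) * ts,
            min(ys) * ts,
            (max(xs) - min(xs) + 1) * ts,
            (max(ys) - min(ys) + 1) * ts,
        )


raster = SparseRaster()
raster.set_cell(5, 5, 1.0)    # creates tile (0, 0)
first_range = raster.cell_range()
raster.set_cell(-1, 40, 2.0)  # creates tile (-1, 2); the cell range grows
second_range = raster.cell_range()
print(first_range, second_range)
```

No grid size is declared up front. The cell range is derived from whichever tiles exist, so it grows as tiles are added.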
Whilst grid size is not a useful concept in an MRR, there are other measures including the Cell Range, Valid Cell Range and Extant Tile Boundary that do provide useful information. The following diagrams illustrate concepts around grid size in an MRR.
Figure 1 shows a stylised raster. The blue polygon shows the coverage of the valid cells in the raster. The X and Y axes show cell coordinates and range from about 0 to 96 on the X axis and 0 to 80 on the Y axis. The tile structure is also illustrated, shown by the solid axis lines. In this example, the tile size is 16 x 16 cells (this is an unrealistic tile size and is used for illustration only). The tile origin is shown in the bottom left corner and has the cell coordinate (0, 0). Note that tiles can exist in the positive XY quadrant or in any other quadrant, where they will have negative tile coordinates. You can find out the tile size of an MRR in MapInfo Pro Advanced via the Raster Info dialog where it is reported as the “Base level Tile Size”.
In Figure 2 the black shaded polygon illustrates the extant tile boundary for this raster. In an MRR, tiles only exist where cells have been populated. In other words, tiles are created on demand and no demand exists until the user tries to set a cell value in a tile that has not yet been created. Note that this usually occurs when the user tries to define a valid cell, but it also occurs when the user tries to define an invalid cell.
The tile boundary polygon in this case is quite simple. However, in a larger, more complex raster, the boundary polygon may contain multiple rings incorporating holes and islands. The boundary polygon is not accessible from MapInfo Pro Advanced, but it can be acquired via the APIs.
We can compute the number of cells in this MRR by counting the extant tiles and multiplying by the number of cells in each tile. In this case there are 17 tiles of 16 x 16 = 256 cells each, and so there are 17 * 256 = 4352 cells. This does not tell us how many cells are valid and how many are invalid. To compute this, we need to acquire raster statistics. In MapInfo Pro Advanced on the Statistics dialog you can see both the number of cells in all the extant tiles (“Total Cell Count”) and the number of valid cells (“Valid Cells”). You can deduce the number of invalid cells by computing the difference.
In Figure 3 the black shaded rectangle represents the cell range. In an MRR, the cell range is the bounding box of the extant tiles in the raster. By this measure, the size of the raster is 6*5*256 = 7680 cells. When you use the convert operation in MapInfo Pro Advanced to convert an MRR to any other raster format, this is the raster size that is used. In MapInfo Pro Advanced you can see the cell range width and height in the Raster Info dialog reported as “Raster Size”.
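The two counts above can be reproduced with a few lines of Python. The tile layout below is hypothetical: it is one arrangement of 17 tiles that spans 6 tile columns and 5 tile rows, and is not a transcription of the figures.

```python
TILE_SIZE = 16  # illustration only, as in the figures

# Hypothetical set of 17 extant tiles, given as (tile_x, tile_y) coordinates.
extant_tiles = {
    (1, 0), (2, 0), (3, 0),
    (0, 1), (1, 1), (2, 1), (3, 1), (4, 1),
    (1, 2), (2, 2), (3, 2), (4, 2), (5, 2),
    (2, 3), (3, 3), (4, 3),
    (3, 4),
}

cells_per_tile = TILE_SIZE * TILE_SIZE

# Cells in extant tiles: the number of tiles times the cells in each tile.
extant_cells = len(extant_tiles) * cells_per_tile

# Cell range: the bounding box of the extant tiles, measured in cells.
xs = [tx for tx, _ in extant_tiles]
ys = [ty for _, ty in extant_tiles]
range_width = (max(xs) - min(xs) + 1) * TILE_SIZE
range_height = (max(ys) - min(ys) + 1) * TILE_SIZE
range_cells = range_width * range_height

print(extant_cells)               # 17 * 256
print(range_width, range_height)  # 6 * 16 and 5 * 16
print(range_cells)                # 6 * 5 * 256
```

The difference between the two counts is the number of cells that lie inside the cell range but belong to tiles that were never created.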
In Figure 4, the dotted rectangle represents the valid cell range. This bounding box is computed when you compute statistics for a raster and is the smallest bounding box that encloses all the valid data in the raster. In MapInfo Pro Advanced the valid cell range is reported in the Statistics dialog as the “Valid Cell Extent”.
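As a minimal sketch of the idea, the valid cell range of a small grid can be found by scanning for valid cells and tracking their extremes. The function name `valid_cell_range` and the use of `None` to mark invalid cells are assumptions made for this example.

```python
def valid_cell_range(grid):
    """Return (min_col, min_row, max_col, max_row) of the valid cells, or None.

    `grid` is a list of rows and None marks an invalid cell.
    """
    valid = [
        (col, row)
        for row, cells in enumerate(grid)
        for col, value in enumerate(cells)
        if value is not None
    ]
    if not valid:
        return None
    cols = [c for c, _ in valid]
    rows = [r for _, r in valid]
    return (min(cols), min(rows), max(cols), max(rows))


# A 5 x 4 grid with a border of invalid cells around the valid data.
grid = [
    [None, None, None, None, None],
    [None, 1.0, 2.0, None, None],
    [None, None, 3.0, 4.0, None],
    [None, None, None, None, None],
]

extent = valid_cell_range(grid)
total_cells = sum(len(row) for row in grid)
valid_cells = sum(v is not None for row in grid for v in row)
invalid_cells = total_cells - valid_cells
print(extent, total_cells, valid_cells, invalid_cells)
```

Here the valid cell range is 3 columns by 2 rows, even though the grid itself is 5 columns by 4 rows.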
In Figure 5 the black dotted rectangle represents one possible grid size for this raster. The valid cell range bounding box is the smallest possible grid size. However, rasters will typically contain some invalid cells about the edge to provide some padding and consequently, the grid size bounding box is often slightly larger than the valid cell range. In an MRR, the grid size is equivalent to the cell range. In MapInfo Pro Advanced you can see the grid size reported in the Raster Info dialog as the “Raster Size”.
With this in mind, let us consider what happens when we convert a raster in a legacy format to MRR format in MapInfo Pro Advanced. The source raster grid size is almost never an integer multiple of the MRR tile size, so the cell range of the MRR will almost always extend further to the right and upwards. In other words, the grid size of the output raster almost always increases. This has little or no impact on the file size because there is little information in the empty additional rows and columns and so they compress away to almost nothing.
When you convert an MRR to a legacy format, the convert operation finds the bounds of the extant tiles and uses them as the grid size of the output raster. So, if you take a legacy format raster and use MapInfo Pro Advanced to convert it to an MRR, then convert that MRR back to a legacy format, you will almost certainly find that the grid size of the output raster is larger than the original. The data is all there and in the same place, but new rows and columns will have been added containing empty cells.
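The growth in grid size is simple arithmetic: each dimension is rounded up to the next multiple of the tile size. The sketch below assumes that the raster is anchored at the tile origin and that every tile in the range is actually written. The function name `round_trip_grid_size` is hypothetical.

```python
def round_trip_grid_size(cols, rows, tile_size=1024):
    """Grid size after legacy -> MRR -> legacy, rounded up to whole tiles."""
    tiles_across = -(-cols // tile_size)  # ceiling division
    tiles_up = -(-rows // tile_size)
    return (tiles_across * tile_size, tiles_up * tile_size)


original = (1500, 1100)
converted = round_trip_grid_size(*original)
added_cells = converted[0] * converted[1] - original[0] * original[1]
print(original, "->", converted, "adding", added_cells, "empty cells")
```

A raster whose dimensions are already exact multiples of the tile size, such as 2048 x 1024, would come back unchanged under these assumptions.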
When you load a raster into MapInfo Pro Advanced, the raster engine will access the raster by tile – regardless of the format of the raster or whether it actually is tiled. So, in a sense, all rasters look like an MRR format raster to the raster engine. However, it remembers the original grid size. So, if you convert a legacy format raster into another legacy format (for example from Vertical Mapper GRD to GeoTIFF), then the grid size is preserved exactly.
By default, when you convert a raster to MRR format, the operation will not write cells that are invalid. In this scenario, if a tile contains no valid cells, then it will never be called into existence. So, the convert operation may result in a smaller raster because tiles that contain no valid cells have been clipped out.
This issue could be resolved in MapInfo Pro Advanced by allowing users to use the valid cell range when running a convert operation, or by allowing them to enter the origin and exact grid size. For now, the following rules apply for these conversion operations –
MRR to MRR
- Empty tiles will be clipped out and the grid size may reduce.
MRR to Legacy
- The grid size extends to the edge of the extant tile boundary.
- Clip to valid cell range is currently unsupported in the user interface.
- Output to specified cell range currently unsupported in the user interface.
Legacy to MRR
- The original grid size is permanently lost, and the grid size almost always increases.
Legacy to Legacy
- Grid size is preserved.
- Clip to valid cell range is currently unsupported in the user interface.
- Output to specified cell range currently unsupported in the user interface.