Pakistan Flood 2022 : Computing the flooded area

From June 2022 to October 2022, Pakistan experienced significant flooding in the Indus Valley that killed 1739 people. In this article, I describe how I used ProRaster Scientific to compute the area flooded based on Landsat 8 & 9 OLI_TIRS Earth Observation Imagery analysis. For more information see “2022 Pakistan floods” in Wikipedia. The image below is a part of the flooded region, showing innundation in blue colors integrated over multiple months.

The area flooded is typically described as “one third of Pakistan”, which would be close to 300,000 square kilometres. This appears to be attributable to Sherry Rehman who was Minister for Climate Change at the time. Alternative estimates have been made by UNOSAT (United Nations Satellite Centre) of 75,000 square kilometres and USAID of 85,000 square kilometres.

This problem (computing the flooded area) was set as a challenge to three researchers who showed how to go about computing this using Google Earth Engine, Microsoft Planetary Computer, and QGIS. You can find this video on the Minds Behind Maps channel on YouTube. The estimates from these presenters ranged from 6000 to 14000 square kilometres. I responded to this with my own video where I tackled the problem using ProRaster Scientific and then posted a revised video where I used new techniques I had implemented to perform the analysis. Here are the links to all of these videos and resources.

Minds Behind Maps

Comparing Google Earth Engine, QGIS & Microsoft Planetary Computer on the same analysis

Using ProRaster Scientific to map the 2022 Indus Valley Flood, Pakistan

Revised: Using ProRaster Scientific to map the 2022 Indus Valley Flood, Pakistan

I downloaded Landsat 8 & 9 scenes on Paths 151 and 152 from row 40 to 43. All available data was downloaded from June to October 2022, regardless of the amount of cloud cover. Cloud cover was, in some scenes, approaching 100%. Collection 2, Level 2 data was acquired which has been corrected to Surface Reflectance (SR) / Bottom of Atmosphere (BOA). With two adjacent paths, the area is surveyed over two days. This is repeated every 8 days.

I generated a mosaic sequence to check the quality of the data and the cloud coverage. I found that several scenes were unusable, not only due to high cloud cover, but because that cloud cover was not correctly classified. Areas of cloud were incorrectly identified as snow & ice, or as land. Masking these scenes using the supplied QA data yielded invalid results. I did not use these scenes in the data analysis.

The flood is a moving target, and no single acquisition event can measure the full extent of the flood waters. I created a composite mosaic that combined all the usable scenes over the flood period. The order of the scenes was immaterial. Each scene was masked to remove all cloud-compromised pixels. I used an external function rule – seeking the maximum value of NDWI in all pixels.

NDWI (Normalised Difference Water Index) is computed as (Green-NIR)/(Green+NIR).

This produced an imagery dataset that spanned the eight scenes and composited the data over five months. Each pixel in the dataset is assigned a value from one source scene. That scene is chosen if the NDWI index computed for that scene is higher than all other scenes at that pixel location (excluding all scenes for which that pixel value is masked). In other words, I integrated all the flooded pixels into a pseudo-image that shows all standing water over five months.

I went through the same process for a smaller set of scenes from the previous year to establish a baseline against which the 2022 data could be compared.

The image below shows the extent of flooding  (in blue) over a Google Maps background. On the left is a single acquisition event in September 2022. On the right is the integration of all data over the flood event. The dataset on the right was used to compute the flooded area.  

To my composite mosaic products, I added an NDWI Spectral Index operation, to which I applied a Calculator operation to reject all pixels with values less than 0 (land). I applied a Polygon Clip operation to clip the imagery to the Pakistan border. I executed a Statistics operation at base resolution (30 metre pixels) for this product to acquire the flooded area.

I found the flooded area in 2022 was 30986 square kilometres and in 2021 was 3264 square kilometres. The difference was 27772 square kilometres. This corrects for the measurement of lakes and other standing water, irrigation channels, and the river channel itself. However, it is probably an underestimate of the true flooded area and is certainly an underestimate of the area affected by flooding, for several reasons –

  • There was no coverage north of 30 degrees. The upper reaches of the Indus Valley were not included.
  • Urban infrastructure and housing may be flooded, but the satellite is still imaging the building rooftops, and these are not counted as water.
  • Imagery coverage is limited by the eight-day temporal repeat frequency and by significant cloud cover throughout the event.
  • The final area is dependent on which spectral index is chosen to identify water and on the cut-off level between land and water. This is completely arbitrary.
  • The spectral index may incorrectly identify water as land or vice-versa.

The area estimates acquired from Google Earth Engine and Microsoft Planetary Computer are both considerably too small. This is due to data preparation. Although they made use of all the imagery available over the flood event, they effectively used a single snapshot of the event and did not integrate over the event.

After completing the analysis, I found that the Landsat OLI Tasselled Cap Wetness index may be better at distinguishing between land-water than NDWI. The image below is a higher resolution version of the flooded extent.