Earth Observation (EO) data is crucial for understanding, managing, and conserving our planet's ecosystem and its natural resources. It enables humanity to monitor environmental changes such as natural disasters, urban growth, and climate shifts, supporting informed decisions and proactive measures. Early EO relied heavily on statistical methods and expert domain knowledge, but the advent of machine learning (ML) has revolutionized EO data processing, improving both efficiency and accuracy. Conventional supervised ML models, however, require expensive and labor-intensive data labeling. In contrast, unsupervised ML techniques learn features from data without manual labeling, making the process more efficient and cost-effective.
This thesis presents a UCL approach that uses advanced deep learning (DL) models to classify EO data, referred to as UCL4EO. The approach eliminates the need for manual data labeling when training the DL model. The UCL framework comprises i) a DL model tailored for feature extraction from image data, ii) a clustering method to group the deep features, and iii) a selection operation to capture representative samples from these clusters. A convolutional neural network (CNN) extracts meaningful features from the images, and a clustering algorithm groups these features to create pseudo-labels. Once the initial clusters are identified, the UCL selection operation chooses representative samples from each cluster, which are then used to fine-tune the feature extractor. This process is repeated iteratively until convergence. By using pseudo-labels, the proposed UCL approach progressively learns and incorporates salient data features in an unsupervised manner.
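One round of the cluster-then-select loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: plain k-means stands in for the unspecified clustering method, "closest to the cluster centroid" stands in for the unspecified selection criterion, and the names `kmeans`, `select_representatives`, `ucl_round`, and the `fraction` parameter are hypothetical. The feature vectors are assumed to come from the CNN, and the fine-tuning step that would consume the selected pseudo-labelled samples is omitted.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means (assumed stand-in for the clustering method).

    Returns (labels, centroids); the labels act as pseudo-labels.
    """
    # Farthest-first initialisation keeps the sketch deterministic.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: the nearest centroid gives the pseudo-label.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


def select_representatives(points, labels, centroids, fraction=0.5):
    """Keep, per cluster, the given fraction of samples nearest the centroid.

    Distance to the centroid is an assumed selection criterion.
    """
    chosen = []
    for c in range(len(centroids)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        idx.sort(key=lambda i: math.dist(points[i], centroids[c]))
        keep = max(1, math.ceil(fraction * len(idx))) if idx else 0
        chosen.extend(idx[:keep])
    return sorted(chosen)


def ucl_round(features, k, fraction):
    """One round: cluster features, then return (index, pseudo-label) pairs."""
    labels, centroids = kmeans(features, k)
    keep = select_representatives(features, labels, centroids, fraction)
    return [(i, labels[i]) for i in keep]


# Toy 2-D "features" forming two well-separated groups.
features = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(ucl_round(features, k=2, fraction=0.3))  # -> [(0, 0), (3, 1)]
```

In a full loop, the returned pairs would be used to fine-tune the feature extractor, after which features are re-extracted and the round is repeated until the cluster assignments stop changing.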
UCL began as a proof of concept demonstrating the viability of the method for binary classification on remote sensing (RS) and aerial imagery. Specifically, the UCL framework is employed to identify water bodies using three RGB datasets, encompassing both low- and high-resolution RS and aerial imagery. Beyond RGB imagery, UCL is adapted to benefit from the enhanced capabilities of multi-spectral satellite imagery; this adaptation enables it to generalize to Sentinel-2 imagery and detect forest fires in Australia. UCL is then further improved and investigated for identifying utility poles in high-resolution unmanned aerial vehicle (UAV) images. These gray-scale images pose computer vision challenges such as occlusion and cropping, where most of the image is background and only a small part of the utility pole is visible. Extensive experimentation on these tasks showcases UCL's adaptive learning capabilities and produces promising results. The achieved accuracy surpasses that of supervised methods in cross-domain adaptation on similar tasks, underscoring the effectiveness of the proposed algorithm.
The scope of UCL is then extended to multi-class classification of RS data, referred to as Multi-class UCL. Multi-class UCL progressively acquires knowledge about various categories at multiple resolution scales. To investigate Multi-class UCL, we use four publicly available datasets of Sentinel-2 and aerial imagery: EuroSAT, SAT-6, UCMerced, and RSSCN7. Comprehensive experiments on these datasets reveal better cross-domain adaptation capabilities than supervised methods, demonstrating the effectiveness of Multi-class UCL.
In these investigations, two datasets are generated from Sentinel-2 satellite imagery: one for water bodies (PakSAT) and one for Australian forest fires. Cloud cover poses a significant challenge because it obstructs the satellite's view of the Earth's surface. To address this issue, available cloud masking techniques are employed to filter out images affected by cloud cover, ensuring the datasets contain only clear and usable data. The thesis then examines cloud detection and Cloud Optical Thickness (COT) estimation from Sentinel-2 imagery. The machine-learning techniques employed achieve better performance on cloud cover tasks than the Scene Classification Layer (SCL) produced by the European Space Agency (ESA).
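The cloud filtering step used during dataset generation can be illustrated with a small sketch. The text does not say which masking technique was applied, so this example assumes the Sentinel-2 Level-2A SCL band as the mask; the choice of which SCL classes count as cloudy, the 5% threshold, and the function names `cloud_fraction` and `keep_clear_tiles` are all hypothetical.

```python
# SCL class codes treated as cloud-contaminated in this sketch (an assumption):
# 3 = cloud shadow, 8 = cloud medium probability,
# 9 = cloud high probability, 10 = thin cirrus.
CLOUDY_SCL_CLASSES = {3, 8, 9, 10}


def cloud_fraction(scl_tile):
    """Fraction of pixels in a 2-D SCL grid (list of rows) flagged as cloudy."""
    pixels = [value for row in scl_tile for value in row]
    return sum(value in CLOUDY_SCL_CLASSES for value in pixels) / len(pixels)


def keep_clear_tiles(tiles, max_cloud_fraction=0.05):
    """Return names of tiles whose cloud fraction does not exceed the threshold.

    `tiles` maps a tile name to its SCL grid; the default threshold is
    illustrative only.
    """
    return [name for name, scl in tiles.items()
            if cloud_fraction(scl) <= max_cloud_fraction]


# Toy 2x2 SCL grids: 4 = vegetation, 5 = not vegetated, 6 = water.
tiles = {
    "clear": [[4, 4], [6, 5]],
    "cloudy": [[9, 9], [4, 8]],
}
print(keep_clear_tiles(tiles))  # -> ['clear']
```

A per-tile threshold of this kind trades dataset size against label noise: a stricter threshold discards more imagery but leaves fewer cloud-affected samples.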
In addition to RS data, UCL has been investigated in other domains of EO, such as undersea imagery. It has also been applied to natural scene classification, medical imaging, and document analysis, demonstrating its versatility and broad applicability. Future work on UCL could improve the generation of pseudo-labels through deep learning techniques.