====== ComBat harmonization ======

{{rss>https://pubmed.ncbi.nlm.nih.gov/rss/search/1VCF4_1oOVawbKqAgZ5UEILSEHfRndOGSuqSD-uB2rbMLgOk7M/?limit=15&utm_campaign=pubmed-2&fc=20240405061308}}

ComBat harmonization is a statistical method for correcting batch effects in high-dimensional biological data, such as genomic or imaging data. Batch effects are systematic variations that are unrelated to the biological factors of interest and instead arise from technical sources, such as differences in experimental conditions, instruments, or operators. If they are not accounted for, batch effects can obscure true biological signals and lead to spurious results.

ComBat was originally developed for [[gene expression microarray data]] but has since been adapted to other types of high-dimensional data, including imaging data from modalities such as positron emission tomography (PET) and magnetic resonance imaging (MRI).

The ComBat algorithm models batch effects as additive (location) and multiplicative (scale) factors that are specific to each batch and each feature (e.g., a gene or an imaging feature). It estimates and removes these factors while preserving the biological variability in the data. Estimation uses an empirical Bayes framework that borrows information across features, which improves accuracy particularly when batches contain few samples.

The main steps of the ComBat harmonization process typically include:

  - Data preprocessing: each feature is standardized so that all features have a comparable mean and variance.
  - Batch effect estimation: the additive and multiplicative batch effects are estimated for each batch and feature from the observed data, with empirical Bayes shrinkage across features.
  - Batch effect removal: the data are adjusted by subtracting the estimated additive effect and dividing by the estimated multiplicative effect.
  - Biological signal preservation: covariates of interest can be included in the model so that the true biological variability is retained during batch effect removal.
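The location and scale adjustment described above can be illustrated with a minimal Python sketch. It is a deliberate simplification, not the full ComBat algorithm: it handles one feature at a time, ignores covariates, and omits the empirical Bayes shrinkage across features. The function name ''location_scale_harmonize'' and the toy data are hypothetical and serve only to show how the additive and multiplicative batch terms are estimated and removed.

```python
from collections import defaultdict
from math import sqrt
from statistics import fmean


def location_scale_harmonize(values, batches):
    """Align one feature across batches with additive and multiplicative terms.

    Simplification (assumption): no covariates and no empirical Bayes shrinkage.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")

    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)

    grand_mean = fmean(values)
    batch_mean = {b: fmean(v) for b, v in by_batch.items()}

    # Pooled within-batch standard deviation (batch means removed first).
    residuals = [v - batch_mean[b] for v, b in zip(values, batches)]
    pooled_sd = sqrt(fmean([r * r for r in residuals]))
    if pooled_sd == 0:
        raise ValueError("feature has no within-batch variation")

    # Standardize, then estimate per-batch location (gamma) and scale (delta).
    z = [(v - grand_mean) / pooled_sd for v in values]
    z_by_batch = defaultdict(list)
    for z_value, batch in zip(z, batches):
        z_by_batch[batch].append(z_value)
    gamma = {b: fmean(zs) for b, zs in z_by_batch.items()}
    delta = {
        b: sqrt(fmean([(x - gamma[b]) ** 2 for x in zs]))
        for b, zs in z_by_batch.items()
    }
    if any(d == 0 for d in delta.values()):
        raise ValueError("every batch needs at least two distinct values")

    # Remove the batch terms and return to the original scale.
    return [
        pooled_sd * (z_value - gamma[b]) / delta[b] + grand_mean
        for z_value, b in zip(z, batches)
    ]


# Toy data: batch "B" is shifted and more spread out than batch "A".
values = [1.0, 2.0, 3.0, 11.0, 14.0, 17.0]
batches = ["A", "A", "A", "B", "B", "B"]
harmonized = location_scale_harmonize(values, batches)
print([round(v, 3) for v in harmonized])
```

After the adjustment, both batches share the same mean and spread for this feature. In a radiomics setting such as the study summarized on this page, the batch label would correspond to the segmentation method, and a full ComBat implementation would additionally shrink the per-batch estimates toward values pooled across all features.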
ComBat harmonization has been widely used in biomedical research, including cancer genomics, neuroimaging, and radiomics. It has been shown to reduce batch effects and to improve the reproducibility and interpretability of results in studies involving high-dimensional data.

In the study summarized below, ComBat harmonization was applied to PET radiomic features extracted from patients with non-small cell lung cancer (NSCLC) in order to standardize the features across different segmentation methods. Mitigating the variation introduced by the segmentation technique allowed more reliable comparisons and predictions, which improves the robustness and generalizability of the findings.

----

Hosseini et al. aimed to examine the robustness of positron emission tomography (PET) [[radiomic features]] extracted via different segmentation methods, before and after ComBat harmonization, in patients with non-small cell lung cancer (NSCLC).

Methods: The study included 120 patients (46 with recurrence and 74 without) who were referred for PET scanning as a routine part of their care. All patients had biopsy-proven NSCLC. Nine segmentation methods were applied to each image: manual delineation, K-means (KM), watershed, fuzzy C-means, region growing, local active contour (LAC), and iterative thresholding (IT) with 40%, 45%, and 50% thresholds. Diverse image discretizations, both without a filter and with different wavelet decompositions, were applied to the PET images. Overall, 6741 radiomic features were extracted from each image (749 radiomic features from each of the nine segmented areas). Non-parametric empirical Bayes (NPEB) ComBat harmonization was used to harmonize the features.
A linear support vector classifier (LinearSVC) with L1 regularization was used for feature selection, and a support vector machine (SVM) classifier was evaluated with fivefold nested cross-validation (StratifiedKFold with 'n_splits' set to 5) to predict recurrence in NSCLC patients and to assess the impact of ComBat harmonization on the outcome.

Results: Of the 749 extracted radiomic features, 206 (27%) and 389 (51%) showed excellent reliability (intraclass correlation coefficient, ICC ≥ 0.90) against segmentation method variation before and after NPEB ComBat harmonization, respectively. The number of features with poor reliability fell from 39 to 10 after ComBat harmonization. The 64 fixed bin width (without any filter) and wavelet (LLL)-based radiomic feature sets achieved the best robustness against diverse segmentation techniques, both before and after ComBat harmonization. The first-order and GLRLM (gray-level run-length matrix) feature families showed the largest number of robust features before harmonization, and the first-order and NGTDM (neighborhood gray-tone difference matrix) families did so after harmonization. For predicting recurrence in NSCLC, the findings indicate that ComBat harmonization can significantly enhance machine learning outcomes, particularly by improving the accuracy obtained with watershed segmentation, which initially had fewer reliable features than manual contouring. After ComBat harmonization, the majority of cases showed a substantial increase in sensitivity and specificity.

Conclusion: Radiomic features are vulnerable to the choice of segmentation method. ComBat harmonization might be considered a solution to overcome the poor reliability of radiomic features ((Hosseini SA, Shiri I, Ghaffarian P, Hajianfar G, Avval AH, Seyfi M, Servaes S, Rosa-Neto P, Zaidi H, Ay MR. The effect of harmonization on the variability of PET radiomic features extracted using various segmentation methods. Ann Nucl Med. 2024 Apr 4. doi: 10.1007/s12149-024-01923-7. Epub ahead of print. PMID: 38575814.))
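The classification setup described in the Methods (L1-regularized LinearSVC for feature selection, followed by an SVM evaluated with fivefold nested cross-validation using StratifiedKFold) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic, and the RBF kernel, the hyperparameter grid, the accuracy metric, the scaling step, and the random seeds are all choices made here for the example. The ComBat step is not included.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for a radiomic feature matrix (assumption: 120 samples,
# 50 features, roughly the class balance reported in the study).
X, y = make_classification(
    n_samples=120,
    n_features=50,
    n_informative=8,
    weights=[0.62, 0.38],
    random_state=0,
)

# Feature selection sits inside the pipeline so that it is refit on every
# training fold and never sees the held-out data.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        (
            "select",
            SelectFromModel(
                LinearSVC(penalty="l1", dual=False, C=1.0, max_iter=10000)
            ),
        ),
        ("clf", SVC(kernel="rbf")),  # kernel choice is an assumption
    ]
)

# Inner loop tunes hyperparameters; outer loop estimates performance.
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},  # hypothetical grid
    cv=inner_cv,
    scoring="accuracy",
)
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print("Outer-fold accuracy:", outer_scores.round(3), "mean:", outer_scores.mean().round(3))
```

Keeping selection and tuning inside the inner loop is the point of nesting: the outer-fold scores then estimate how the whole procedure performs on unseen patients. To compare outcomes before and after harmonization, the same procedure would be run once on the raw features and once on the harmonized features.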