Liquid biopsy-derived RNA sequencing
Liquid biopsy-derived RNA sequencing (lbRNA-seq) shows significant promise for clinical cancer diagnosis because it is non-invasive and easily repeated.
Liquid biopsy is an emerging approach to diagnosis and patient follow-up in multiple types of cancer. Fragments of circulating nucleic acids are collected in various forms from bodily fluids such as serum, urine, or cerebrospinal fluid, and their quality and quantity are then measured. Several classes of nucleic acids can be analyzed this way. Circulating cell-free DNA, mitochondrial DNA, and the more stable long and small non-coding RNAs, circular RNAs, and microRNAs can be identified and quantified with novel PCR- and next-generation sequencing-based methods. These markers allow tumor-associated alterations to be detected in a minimally invasive way. They can also be used to distinguish patients with poorer prognoses from those with better ones, or to identify patients who do not respond to therapy.
Liquid biopsies collect and analyze tumor components in body fluids, and there is increasing interest in investigating liquid biopsies as a surrogate for tumor tissue in the management of both primary and secondary brain tumors 1).
It is widely accepted that the capture, enumeration and identification of circulating tumor cells (CTCs) hold significant promise for early cancer screening, diagnosis and prognosis. These cells originate from primary tumors and disseminate to distant sites via the bloodstream 2) 3) 4).
Despite substantial advances, obstacles such as technical artefacts and a lack of process standardisation impede seamless clinical integration. Technical issues remain, including the normalisation of fluctuating low-input material and the establishment of a standardised clinical workflow. Beyond these, the lack of result validation on independent datasets remains a critical factor behind the often low reproducibility of liquid biopsy-detected biomarkers. To address these drawbacks, the authors set out to establish a workflow designed to:
1. Harness the rich diversity of biological features accessible through lbRNA-seq data, covering a broad range of molecular and functional attributes, and integrate them in a Machine Learning-based Ensemble Classification framework that analyses the information encoded in the data in a unified way.
2. Implement and rigorously benchmark intra-sample normalisation methods to increase their relevance in clinical settings.
3. Thoroughly assess performance on independent test sets to establish robustness and potential utility.
Using ten datasets from several studies, covering three different sources of biological material, they first show that the best-performing normalisation method depends strongly on the dataset and on the coupled Machine Learning method. Even so, the relatively simple Counts Per Million (CPM) method is generally very robust, performing comparably to cross-sample methods. They then demonstrate that the novel biofeature types introduced in the study, such as the Fraction of Canonical Transcript, carry complementary information. As a result, including them consistently improves predictive power compared with models that rely solely on gene expression-based biofeatures. Finally, they show that the workflow remains robust on completely independent datasets, which generally come from different laboratories and/or use different protocols.
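The two feature-level ideas above, intra-sample normalisation and the Fraction of Canonical Transcript, together with the ensemble step, can be illustrated with a minimal sketch. The sketch rests on several assumptions. CPM is computed in its standard form (each count divided by the sample's own total count, times one million). The Fraction of Canonical Transcript is *assumed* to mean the canonical transcript's count divided by the summed count of all transcripts of the same gene; the exact definition used in the cited study may differ. The ensemble is illustrated with simple weighted soft voting over per-biofeature-type model probabilities, a swapped-in combination rule rather than the study's actual framework. All function names and example identifiers (`GENE_A`, `TX1`, etc.) are hypothetical.

```python
from typing import Dict, Optional, Sequence


def counts_per_million(counts: Dict[str, float]) -> Dict[str, float]:
    """Intra-sample CPM: scale each feature by this sample's own library size.

    No other samples are needed, so a single new clinical sample can be
    normalised without re-processing a reference cohort.
    """
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no counts")
    return {feature: c * 1e6 / library_size for feature, c in counts.items()}


def fraction_canonical_transcript(
    transcript_counts: Dict[str, float],
    transcript_to_gene: Dict[str, str],
    canonical_transcript: Dict[str, str],
) -> Dict[str, float]:
    """ASSUMED definition: canonical-transcript count / all-transcript count, per gene."""
    gene_totals: Dict[str, float] = {}
    for tx, c in transcript_counts.items():
        gene = transcript_to_gene[tx]
        gene_totals[gene] = gene_totals.get(gene, 0.0) + c
    fct: Dict[str, float] = {}
    for gene, total in gene_totals.items():
        canon_count = transcript_counts.get(canonical_transcript[gene], 0.0)
        # Genes with zero total expression get a fraction of 0 to avoid division by zero.
        fct[gene] = canon_count / total if total > 0 else 0.0
    return fct


def soft_vote(
    probabilities: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Weighted mean of class probabilities from models trained on different biofeature types."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("one weight per model is required")
    total_weight = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total_weight


if __name__ == "__main__":
    # Hypothetical gene-level counts for one sample.
    print(counts_per_million({"GENE_A": 300, "GENE_B": 700}))
    # Hypothetical transcript-level counts: TX1 and TX2 belong to GENE_A, TX3 to GENE_B.
    print(fraction_canonical_transcript(
        {"TX1": 80, "TX2": 20, "TX3": 50},
        {"TX1": "GENE_A", "TX2": "GENE_A", "TX3": "GENE_B"},
        {"GENE_A": "TX1", "GENE_B": "TX3"},
    ))
    # Hypothetical cancer probabilities from expression-, FCT- and other-feature models.
    print(soft_vote([0.9, 0.6, 0.3]))
```

CPM values for a sample always sum to one million, so they depend only on that sample. This matches the motivation for preferring intra-sample methods in a clinical setting, where samples arrive one at a time. Soft voting is used here only because it is the simplest way to show how complementary biofeature types can contribute to a single prediction.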
Taken together, the presented workflow outperforms commonly employed methods in prediction accuracy. Because it is built around intra-sample normalisation and validation on independent datasets, it may hold potential for application in clinical diagnostics 5).