Visualizing and interpreting cancer genomics data via the Xena platform

MJ Goldman, B Craft, M Hastie, K Repečka, F McDade, A Kamath, A Banerjee, Y Luo
Nature Biotechnology, 2020
To the Editor: There is a great need for easy-to-use cancer genomics visualization tools, both for large public data resources such as TCGA (The Cancer Genome Atlas) [1] and the GDC (Genomic Data Commons) [2] and for smaller-scale datasets generated by individual labs. Commonly used interactive visualization tools are either web-based portals or desktop applications. Data portals have a dedicated back end and are a powerful means of viewing centrally hosted resource datasets (for example, Xena's predecessor, the University of California, Santa Cruz (UCSC) Cancer Browser (currently retired) [3], cBioPortal [4], the ICGC (International Cancer Genome Consortium) Data Portal [5] and the GDC Data Portal [2]). However, researchers wishing to use a data portal to explore their own data have to either redeploy the entire platform, a difficult task even for bioinformaticians, or upload private data to a server outside the user's control (for example, MAGI (Mutation Annotation and Genome Interpretation) [6], WebMeV [7] or Ordino [8]), a non-starter for protected patient data such as germline variants. Desktop tools can view a user's own data securely (for example, Integrative Genomics Viewer (IGV) [9] and Gitools [10]), but lack well-maintained, prebuilt files for the ever-evolving and expanding public data resources. This dichotomy between data portals and desktop tools highlights the challenge of using a single platform for both large public data and smaller-scale datasets generated by individual labs.
Complicating this dichotomy is the expanding amount, and complexity, of cancer genomics data resulting from numerous technological advances, including lower-cost high-throughput sequencing and single-cell-based technologies. Cancer genomics datasets are now being generated using new assays, such as whole-genome sequencing [11], DNA methylation whole-genome bisulfite sequencing [12] and ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) [13]. Visualizing and exploring these diverse data modalities is important but challenging, especially as many tools have traditionally specialized in only one or perhaps a few data types. And although these complex datasets generate insights individually, integration with other omics datasets is crucial