The published website showing all the code and results can be accessed here
- The directories tcga_luad_expression, tcga_skcm_expression, and tcga_brca_expression contain mutation, total expression, and exon-level expression data for the TCGA lung cancer (LUAD), melanoma (SKCM), and breast cancer (BRCA) cohorts. Fig. 3A, B, and Fig S3 use these data.
- alkati_growthcurvedata_popdoublings.csv and alkati_growthcurvedata.csv contain growth tracking data for Fig 4A
- alkati_baf3_ic50s_heatmap.csv contains the crizotinib and brigatinib IC50 data from Fig 4C and Fig S4D
- alkati_growthcurvedata_f1174mutants_raw.xlsx and alkati_growthcurvedata_popdoublings_f1174mutants.csv contain the growth tracking data from Fig S4C
- alkati_melanoma_vemurafenib_figure_data.csv contains the dose-response data from Fig 5A
- all_data_skcm.csv and skcm_alk_exon_expression.csv are generated by tcga_skcm_data_parser.rmd. They contain the BRAF and NRAS mutational status of 351 melanoma patients, as well as the ALK expression metrics used to decide which patients had ALKATI-like expression. These data are used in Fig 2.
- luad_alk_exon_expression.csv and all_data_luad.csv are generated by TCGA_luad_data_parser.rmd. They summarize ALK exon expression imbalance in lung cancer patients and are used in Fig S3A.
- luad_egfr_exon_expression.csv is generated by TCGA_luad_data_parser.rmd and contains exon expression imbalance numbers for lung cancer patients, used in Fig S3B.
The Code directory contains .R files that define the functions used by the Rmd files in the analysis folder.
- contab_downsampler.R takes a contingency table and a GOI (gene of interest) frequency, and corrects the frequency of the positive control 1 gene in the contingency table so that it equals the GOI frequency. Refer to the Star Methods section titled frequency correction in gene pairs and Algorithm 1 of the pseudo-code for details.
- contab_simulator.R takes a contingency table of frequencies and simulates N cohorts of count data centered around the probabilities in the contingency table. N refers to the number of contingency tables generated. Refer to Star Methods section titled Pairwise Comparisons of gene pairs and pseudo-code Algorithm 2 for details.
- mut_excl_genes_generator.R generates a single contingency table given a cohort size, incidence of the gene of interest, and the odds ratios of the two gene pairs. Refer to the generating simulated cohorts section of the Star Methods and Algorithm 3 of the pseudo-code for details.
- alldata_compiler.R is a data-parsing function that generates count data of mutations in genes, given the gene name and the name of the mutation of interest. This function can be used directly with the .mut files.
- contab_maker.R makes a 2x2 contingency table from count data.
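
To illustrate the general idea behind building a contingency table and simulating cohorts from it, here is a minimal sketch in base R. It is not the code in this repository: the function names `make_contab_sketch` and `simulate_contabs_sketch`, their arguments, and the toy mutation vectors are all hypothetical, and multinomial sampling is assumed here as one plausible way to draw count data centered on the table's probabilities. Refer to the Star Methods and the pseudo-code for the actual algorithms.

```r
# Hypothetical helpers for illustration only; these are NOT the functions
# defined in contab_maker.R or contab_simulator.R.

# Build a 2x2 contingency table from two logical vectors
# (one entry per patient: TRUE = mutated, FALSE = wild type).
make_contab_sketch <- function(gene_a_mutated, gene_b_mutated) {
  stopifnot(length(gene_a_mutated) == length(gene_b_mutated))
  table(
    gene_a = factor(gene_a_mutated, levels = c(TRUE, FALSE)),
    gene_b = factor(gene_b_mutated, levels = c(TRUE, FALSE))
  )
}

# Simulate n_cohorts 2x2 tables of counts, each with cohort_size patients,
# using the cell proportions of prob_table as multinomial probabilities
# (an assumption made for this sketch).
simulate_contabs_sketch <- function(prob_table, cohort_size, n_cohorts) {
  probs <- as.vector(prob_table) / sum(prob_table)
  draws <- rmultinom(n_cohorts, size = cohort_size, prob = probs)
  lapply(seq_len(n_cohorts), function(i) {
    matrix(draws[, i], nrow = 2, dimnames = dimnames(prob_table))
  })
}

# Toy data: six made-up patients, two made-up genes.
set.seed(1)
gene_a <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)
gene_b <- c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)

observed <- make_contab_sketch(gene_a, gene_b)
simulated <- simulate_contabs_sketch(observed, cohort_size = 100, n_cohorts = 5)
```

Each element of `simulated` is a 2x2 matrix whose counts sum to `cohort_size`, so it can be passed to the same downstream tests as an observed table.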
The Analysis directory contains the Rmarkdown files that run the analyses using functions in the Code directory.
- pairwisecomparisons_simulateddata.Rmd contains all analyses in Fig 1.
- tcga_skcm_data_parser.Rmd and alkati_subsampling_simulations_2.Rmd contain all analyses in Fig 2
- tcga_luad_data_parser_egfr.Rmd, TCGA_luad_data_parser.Rmd, ALK_ExonImbalance_SKCM_Analysis.Rmd, alk_luad_mutation_bias.Rmd, and ALKATI_Filter_Cutoff_Analysis.Rmd contain analyses used in Fig 2, Fig 3, and Fig S3
- baf3_alkati_transformations.Rmd contains analyses used in Fig 4A
- Alkati_ccle_depmap_sensitivity.Rmd contains analyses used in Fig 5D, E and Fig S5A, B.
A workflowr project.