TCGA cancer projects:
Project    Samples  Cancer type
TCGA-BRCA  1079     Breast Invasive Carcinoma
TCGA-OV    571      Ovarian Serous Cystadenocarcinoma
TCGA-LUAD  563      Lung Adenocarcinoma
TCGA-UCEC  542      Uterine Corpus Endometrial Carcinoma
TCGA-HNSC  523      Head and Neck Squamous Cell Carcinoma
TCGA-KIRC  523      Kidney Renal Clear Cell Carcinoma
TCGA-GBM   522      Glioblastoma Multiforme
TCGA-LGG   509      Brain Lower Grade Glioma
TCGA-LUSC  501      Lung Squamous Cell Carcinoma
TCGA-THCA  473      Thyroid Carcinoma
TCGA-PRAD  469      Prostate Adenocarcinoma
TCGA-SKCM  469      Skin Cutaneous Melanoma
TCGA-COAD  458      Colon Adenocarcinoma
TCGA-STAD  437      Stomach Adenocarcinoma
TCGA-BLCA  408      Bladder Urothelial Carcinoma
TCGA-LIHC  375      Liver Hepatocellular Carcinoma
TCGA-CESC  305      Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma
TCGA-KIRP  289      Kidney Renal Papillary Cell Carcinoma
TCGA-TGCT  261      Testicular Germ Cell Tumors
TCGA-SARC  255      Sarcoma
TCGA-ESCA  183      Esophageal Carcinoma
TCGA-PAAD  173      Pancreatic Adenocarcinoma
TCGA-READ  170      Rectum Adenocarcinoma
TCGA-PCPG  169      Pheochromocytoma and Paraganglioma
TCGA-LAML  135      Acute Myeloid Leukemia
TCGA-THYM  97       Thymoma
TCGA-ACC   92       Adrenocortical Carcinoma
TCGA-MESO  85       Mesothelioma
TCGA-UVM   80       Uveal Melanoma
TCGA-KICH  66       Kidney Chromophobe
TCGA-UCS   57       Uterine Carcinosarcoma
TCGA-CHOL  50       Cholangiocarcinoma
TCGA-DLBC  47       Lymphoid Neoplasm Diffuse Large B-cell Lymphoma
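Set the project name and run the download. The code below fetches the STAR - Counts transcriptome quantification results; any other data type has to be specified explicitly in the query: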
library(TCGAbiolinks)
library(dplyr)
library(SummarizedExperiment)
library(msigdbr) # not used in this snippet

# Select the project
project <- "TCGA-READ"

# Download the data
query <- GDCquery(
  project = project,
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts"
)
GDCdownload(query = query)
data <- GDCprepare(query = query)

# Create the output directory if it does not exist yet
out_dir <- file.path(".", project)
if (!dir.exists(out_dir)) {
  dir.create(out_dir)
}

Exp <- assay(data) %>% as.data.frame() # extract the expression matrix (first assay)
ann <- rowRanges(data) # extract the gene annotation
ann <- as.data.frame(ann)
rownames(ann) <- ann$gene_id
ann <- ann[rownames(Exp), ] # put the annotation in the same gene order as Exp
write.csv(ann, file.path(out_dir, "ann.csv"), row.names = FALSE) # gene annotation
Exp <- cbind(data.frame(Gene = ann$gene_name), Exp) # prepend gene names as a column
write.csv(Exp, file.path(out_dir, "exp.csv"), row.names = FALSE) # expression matrix
clinical <- GDCquery_clinic(project = project, type = "clinical") # extract the clinical information
write.csv(clinical, file.path(out_dir, "clinical.csv"), row.names = FALSE) # clinical annotation
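Several Ensembl gene IDs can map to the same gene name, so the Gene column of exp.csv may contain duplicates. The following is a minimal sketch of one common way to collapse them, keeping the row with the highest mean count for each name. It runs on a small made-up table rather than on the downloaded data; the helper name `collapse_by_symbol` is hypothetical, and whether to keep the maximum, sum the rows, or stay with Ensembl IDs depends on the downstream analysis.

```r
# Toy stand-in for exp.csv: a Gene column followed by one column per sample.
exp_toy <- data.frame(
  Gene = c("TP53", "GAPDH", "TP53", "ACTB"),
  S1   = c(10, 500, 40, 300),
  S2   = c(20, 700, 60, 100),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each gene name keep the row with the highest mean count.
collapse_by_symbol <- function(exp_df, gene_col = "Gene") {
  counts <- as.matrix(exp_df[, setdiff(names(exp_df), gene_col), drop = FALSE])
  genes <- exp_df[[gene_col]]
  ord <- order(rowMeans(counts), decreasing = TRUE) # highest-expressed rows first
  keep <- sort(ord[!duplicated(genes[ord])])        # first hit per name, original order
  out <- counts[keep, , drop = FALSE]
  rownames(out) <- genes[keep]
  out
}

collapsed <- collapse_by_symbol(exp_toy)
print(collapsed)
```

The result is a numeric matrix with unique gene names as row names, which is the shape most downstream count-based tools expect.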
The results are as follows:

▲ Figure: count expression matrix (exp.csv)
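The sample columns of the expression matrix are named by TCGA barcode, and characters 14-15 of a barcode hold the sample-type code (01-09 tumor, 10-19 normal, 20-29 control). As a rough sketch, tumor and normal samples can be separated with plain string operations. The barcodes below are invented for illustration; only their layout follows the TCGA convention.

```r
# Made-up barcodes in the standard TCGA layout.
barcodes <- c(
  "TCGA-AA-0001-01A-11R-0001-07", # 01 = primary solid tumor
  "TCGA-AA-0001-11A-11R-0001-07", # 11 = solid tissue normal
  "TCGA-BB-0002-01A-11R-0002-07",
  "TCGA-CC-0003-06A-11R-0003-07"  # 06 = metastatic
)

# Characters 14-15 hold the sample-type code.
sample_code <- as.integer(substr(barcodes, 14, 15))
group <- ifelse(sample_code < 10, "tumor",
                ifelse(sample_code < 20, "normal", "control"))

# Characters 1-12 identify the patient.
patient_id <- substr(barcodes, 1, 12)

sample_info <- data.frame(barcode = barcodes, patient_id, sample_code, group,
                          stringsAsFactors = FALSE)
print(sample_info)
print(table(sample_info$group))
```

With the real data, the barcodes would come from the column names of the expression matrix, excluding the Gene column. When reading exp.csv back in, `read.csv(..., check.names = FALSE)` keeps the hyphens in the barcodes instead of converting them to dots.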

▲ Figure: sample clinical and survival information (clinical.csv)
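The clinical table has one row per patient, while the expression matrix has one column per sample, so the two are usually linked through the first 12 characters of the sample barcode. The sketch below shows the idea on toy data. It assumes the clinical table carries the patient barcode in a `submitter_id` column and survival fields named `vital_status`, `days_to_death`, and `days_to_last_follow_up`; check `names(clinical)` on the real table, since available columns can differ between projects and package versions.

```r
# Toy clinical table; the column names are assumptions modeled on GDC clinical fields.
clinical_toy <- data.frame(
  submitter_id = c("TCGA-AA-0001", "TCGA-BB-0002", "TCGA-DD-0004"),
  vital_status = c("Dead", "Alive", "Alive"),
  days_to_death = c(420, NA, NA),
  days_to_last_follow_up = c(NA, 900, 150),
  stringsAsFactors = FALSE
)

# Toy sample barcodes, as they would appear in the expression matrix columns.
samples <- data.frame(
  barcode = c("TCGA-AA-0001-01A-11R-0001-07",
              "TCGA-BB-0002-01A-11R-0002-07",
              "TCGA-CC-0003-01A-11R-0003-07"),
  stringsAsFactors = FALSE
)
samples$submitter_id <- substr(samples$barcode, 1, 12) # patient-level ID

# Inner join: keep only samples that have a clinical record.
merged <- merge(samples, clinical_toy, by = "submitter_id")

# Overall survival: time to death if dead, otherwise time to last follow-up.
merged$OS_status <- as.integer(merged$vital_status == "Dead")
merged$OS_time <- ifelse(merged$OS_status == 1,
                         merged$days_to_death,
                         merged$days_to_last_follow_up)
print(merged[, c("barcode", "OS_status", "OS_time")])
```

Samples without a clinical record and patients without an expression sample drop out of the inner join; use `all.x = TRUE` in `merge()` if unmatched samples should be kept.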

▲ Figure: gene annotation (ann.csv)
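The gene IDs in the annotation are versioned Ensembl IDs (for example, an ID ending in `.18`), which often need the version suffix removed before they can be matched against other resources. A small sketch under stated assumptions: the table below is a toy stand-in for ann.csv, the version numbers are illustrative, and the `gene_type` column is assumed to be present in the real annotation.

```r
# Toy stand-in for ann.csv; version suffixes are illustrative.
ann_toy <- data.frame(
  gene_id   = c("ENSG00000141510.18", "ENSG00000111640.15", "ENSG00000251562.8"),
  gene_name = c("TP53", "GAPDH", "MALAT1"),
  gene_type = c("protein_coding", "protein_coding", "lncRNA"),
  stringsAsFactors = FALSE
)

# Strip the trailing ".<version>" to get the stable Ensembl gene ID.
ann_toy$ensembl_id <- sub("\\.[0-9]+$", "", ann_toy$gene_id)

# Keep only protein-coding genes, a common filter before differential analysis.
coding <- ann_toy[ann_toy$gene_type == "protein_coding", ]
print(coding[, c("ensembl_id", "gene_name")])
```

Because the annotation rows are in the same order as the expression matrix, the same logical filter can be applied to the rows of the expression matrix.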
