Contents
I. Experiment Introduction
1. Paper: Clustering of single-cell multi-omics data with a multimodal deep learning method
Abstract
2. GitHub link
II. Experiment Environment
0. Author's requirements
1. Environment reproduction
Experiment 1
Experiment 2 (this experiment)
2. Library versions
Experiment 1
Experiment 2
3. IDE
III. Experiment Content
1. Usage
2. Output
3. Parameters
4. run_scMDC
Set hyperparameters
Preprocess the scRNA-seq data
Build the scMultiCluster model
Pretraining
Determine the number of clusters k with KMeans
Fine-tune the model and train the clustering layer
Save predictions and embeddings to file
Output predictions y_pred and compute AMI, NMI, and ARI
5. Clustering result visualization
Visualization result
Appendix: run_scMDC training log

I. Experiment Introduction
1. Paper: Clustering of single-cell multi-omics data with a multimodal deep learning method
Clustering of single-cell multi-omics data with a multimodal deep learning method | Nature Communications
https://www.nature.com/articles/s41467-022-35031-9
Abstract

Single-cell multimodal sequencing technologies are developed to simultaneously profile different modalities of data in the same cell. It provides a unique opportunity to jointly analyze multimodal data at the single-cell level for the identification of distinct cell types. A correct clustering result is essential for the downstream complex biological functional studies. However, combining different data sources for clustering analysis of single-cell multimodal data remains a statistical and computational challenge. Here, we develop a novel multimodal deep learning method, scMDC, for single-cell multi-omics data clustering analysis. scMDC is an end-to-end deep model that explicitly characterizes different data sources and jointly learns latent features of deep embedding for clustering analysis. Extensive simulation and real-data experiments reveal that scMDC outperforms existing single-cell single-modal and multimodal clustering methods on different single-cell multimodal datasets. The linear scalability of running time makes scMDC a promising method for analyzing large multimodal datasets.

2. GitHub link
GitHub - xianglin226/scMDC: Deep clustering of single-cell multi-omics data
https://github.com/xianglin226/scMDC

II. Experiment Environment
0. Author's requirements
Python 3.8.1, PyTorch 1.6.0, Scanpy 1.6.0, scikit-learn 0.22.1, NumPy 1.18.1, h5py 2.9.0. All scMDC experiments in the paper were run on an Nvidia Tesla P100 (16 GB) GPU. The authors suggest installing in a conda environment (conda create -n scMDC). Clustering a dataset of about 5,000 cells takes scMDC roughly 3 minutes.

1. Environment reproduction

I did not configure a new environment; I kept using the environment from the earlier posts in this deep learning series (which turned out to work fine):
conda create -n DL python=3.7
conda activate DL
pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
conda install matplotlib
conda install scikit-learn
Experiment 1
conda install pandas
conda install seaborn
conda install networkx
conda install statsmodels
pip install pyHSICLasso
Note: I installed these libraries in the order listed above. Installing them all at once might also work, but I have not tested it.
Experiment 2 (this experiment)
pip install scanpy

2. Library versions

Versions in my own Python 3.7 environment:
Package        This experiment   Latest (at time of writing)
matplotlib     3.5.3             3.8.0
numpy          1.21.6            1.26.0
python         3.7.16            -
scikit-learn   0.22.1            1.3.0
torch          1.8.1+cu102       2.0.1
torchaudio     0.8.1             2.0.2
torchvision    0.9.1+cu102       0.15.2
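To double-check that your own environment matches the table above, a small snippet of my own (not part of scMDC) prints the installed versions:

import importlib

# packages from the version tables; "sklearn" is the import name of scikit-learn
for pkg in ["matplotlib", "numpy", "sklearn", "torch", "torchaudio", "torchvision", "scanpy", "h5py"]:
    try:
        mod = importlib.import_module(pkg)
        print(pkg, getattr(mod, "__version__", "unknown"))
    except ImportError:
        print(pkg, "not installed")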
Experiment 1
Package        This experiment   Latest (at time of writing)
networkx       2.6.3             3.1
pandas         1.2.3             2.1.1
pyHSICLasso    1.4.2             1.4.2
seaborn        0.12.2            0.13.0
statsmodels    0.13.5            0.14.0
Experiment 2

Installing collected packages: stdlib_list, natsort, llvmlite, h5py, session-info, numba, pynndescent, anndata, umap-learn, scanpy
Successfully installed anndata-0.8.0 h5py-3.8.0 llvmlite-0.39.1 natsort-8.4.0 numba-0.56.4 pynndescent-0.5.10 scanpy-1.9.3 session-info-1.0.0 stdlib_list-0.9.0 umap-learn-0.5.4

3. IDE

PyCharm is recommended:
Installing Anaconda 2022.10 and PyCharm 2022.3/2023.1.4 on Windows 11 and configuring a virtual environment (QomolangmaH, CSDN blog)
https://blog.csdn.net/m0_63834988/article/details/128693741

III. Experiment Content
1. Usage

- Prepare the input data in h5 format (see the README in the "datasets" folder); a minimal sketch of assembling such a file follows after the Output section below.
- Run scMDC following the run scripts in the "scripts" folder. Pay attention to the parameter settings if you work with mRNA+ATAC data, and use run_scMDC_batch.py for multi-batch data clustering.
- Based on a well-trained scMDC model, run DE analysis via run_LRP.py (see the LRP run scripts in the "scripts" folder).

2. Output

- scMDC outputs a latent representation of the data, which can be used for further downstream analyses and visualized by t-SNE or UMAP; multi-batch scMDC outputs a latent representation of the integrated datasets on which the batch effects are corrected.
- LRP outputs a gene rank that indicates the importance of each gene for a given cluster and can be used for pathway analysis.
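For reference, the h5 input mentioned in the usage notes can be assembled with h5py. This is only a sketch: the dataset names X1/X2/Y follow the structure described in the parameter list below, the file name my_input.h5 is made up, and the arrays are random placeholders standing in for real count matrices and labels.

import h5py
import numpy as np

# placeholder matrices: rows are cells, columns are features of each modality
x1 = np.random.poisson(1.0, size=(500, 2000)).astype(np.float64)  # e.g. scRNA-seq counts
x2 = np.random.poisson(1.0, size=(500, 100)).astype(np.float64)   # e.g. ADT or ATAC counts
y = np.random.randint(0, 5, size=500)                             # cell-type labels, if available

with h5py.File("my_input.h5", "w") as f:
    f.create_dataset("X1", data=x1)
    f.create_dataset("X2", data=x2)
    f.create_dataset("Y", data=y)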
3. Parameters

--n_clusters: number of clusters K; if set to -1, scMDC estimates K.
--cutoff: ratio of epochs before which the model trains only the low-level autoencoders.
--batch_size: batch size.
--data_file: path of the input data. Format: h5. Structure: X1 (RNA), X2 (ADT or ATAC), Y (labels, if they exist), Batch (batch indicator for multi-batch data clustering).
--maxiter: maximum number of training epochs. Default: 10000.
--pretrain_epochs: number of pretraining epochs. Default: 400.
--gamma: coefficient of the clustering loss. Default: 0.1.
--phi1 and --phi2: coefficients of the KL loss in the pretraining and clustering stages. Default: 0.001 for CITE-Seq; 0.005 for SMAGE-Seq*.
--update_interval: interval for checking performance. Default: 1.
--tol: stopping condition, i.e., the percentage of labels that change between checks. Default: 0.001.
--ae_weights: path of the weight file.
--save_dir: directory for storing the outputs.
--ae_weight_file: path for storing the autoencoder weights.
--resolution: resolution parameter used when estimating K. Default: 0.2.
--n_neighbors: n_neighbors parameter used when estimating K. Default: 30.
--embedding_file: whether to save the embedding file. Default: No.
--prediction_file: whether to save the prediction file. Default: No.
--encodeLayer: low-level encoder layers for RNA. Default: [256,64,32,16] for CITE-Seq; [256,128,64] for SMAGE-seq.
--decodeLayer1: low-level decoder layers for ADT. Default: [16,64,256] for CITE-Seq; [64,128,256] for SMAGE-seq.
--decodeLayer2: layers of the high-level decoder. Default: [16,20] for CITE-Seq; [64,128,256] for SMAGE-seq.
--sigma1: noise on the RNA data. Default: 2.5.
--sigma2: noise on the ADT/ATAC data. Default: 1.5 for CITE-Seq; 2.5 for SMAGE-Seq.
--filter1: whether to do feature selection on genes. Default: No.
--filter2: whether to do feature selection on ATAC. Default: No.
--f1: number of highly variable genes kept in X1 for clustering if feature selection is done. Default: 2000.
--f2: number of highly variable features from ATAC kept in X2 for clustering if feature selection is done. Default: 2000.
*For convenience, the 10X single-cell Multiome ATAC + Gene Expression technology is denoted as SMAGE-seq.

4. run_scMDC

run_scMDC uses the scMultiCluster model to perform unsupervised clustering of cell types from scRNA-seq data jointly with another modality, and evaluates the clustering quality.
Set hyperparameters

Read the data: the dataset contains two modalities, the scRNA-seq read counts matrix X1 and another genomic modality X2 (e.g. ADT/ATAC), plus the labels y.

import argparse

parser = argparse.ArgumentParser(description='train',
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--n_clusters', default=12, type=int)
parser.add_argument('--cutoff', default=0.5, type=float,
                    help='Start to train combined layer after what ratio of epoch')
parser.add_argument('--batch_size', default=256, type=int)
# parser.add_argument('--data_file', default=f'../datasets/SMAGESeq_10X_pbmc_10k_granulocyte_plus.h5')
parser.add_argument('--data_file', default=f'../datasets/output500.h5')
parser.add_argument('--maxiter', default=5000, type=int)
parser.add_argument('--pretrain_epochs', default=200, type=int)
parser.add_argument('--gamma', default=.1, type=float,
                    help='coefficient of clustering loss')
parser.add_argument('--tau', default=.1, type=float,
                    help='fuzziness of clustering loss')
parser.add_argument('--phi1', default=0.005, type=float,
                    help='coefficient of KL loss in pretraining stage')
parser.add_argument('--phi2', default=0.005, type=float,
                    help='coefficient of KL loss in clustering stage')
parser.add_argument('--update_interval', default=1, type=int)
parser.add_argument('--tol', default=0.001, type=float)
parser.add_argument('--lr', default=1., type=float)
parser.add_argument('--ae_weights', default=None)
parser.add_argument('--save_dir', default='atac_pbmc10k')
parser.add_argument('--ae_weight_file', default='AE_weights_pbmc10k.pth.tar')
parser.add_argument('--resolution', default=0.2, type=float)
parser.add_argument('--n_neighbors', default=30, type=int)
parser.add_argument('--embedding_file', default=True)
parser.add_argument('--prediction_file', default=True)
parser.add_argument('-el', '--encodeLayer', nargs='+', default=[256, 128, 64])
parser.add_argument('-dl1', '--decodeLayer1', nargs='+', default=[64, 128, 256])
parser.add_argument('-dl2', '--decodeLayer2', nargs='+', default=[64, 128, 256])
parser.add_argument('--sigma1', default=2.5, type=float)
parser.add_argument('--sigma2', default=2.5, type=float)
parser.add_argument('--f1', default=1000, type=float, help='Number of mRNA after feature selection')
parser.add_argument('--f2', default=1000, type=float, help='Number of ADT/ATAC after feature selection')
parser.add_argument('--filter1', default=True, help='Do mRNA selection')
parser.add_argument('--filter2', default=True, help='Do ADT/ATAC selection')
parser.add_argument('--run', default=1, type=int)
parser.add_argument('--device', default='cpu')

args = parser.parse_args()
print(args)
Output:
Namespace(ae_weight_file='AE_weights_pbmc10k.pth.tar', ae_weights=None, batch_size=256, cutoff=0.5, data_file='../datasets/output500.h5', decodeLayer1=[64, 128, 256], decodeLayer2=[64, 128, 256], device='cpu', embedding_file=True, encodeLayer=[256, 128, 64], f1=1000, f2=1000, filter1=True, filter2=True, gamma=0.1, lr=1.0, maxiter=5000, n_clusters=12, n_neighbors=30, phi1=0.005, phi2=0.005, prediction_file=True, pretrain_epochs=200, resolution=0.2, run=1, save_dir='atac_pbmc10k', sigma1=2.5, sigma2=2.5, tau=0.1, tol=0.001, update_interval=1)

Preprocess the scRNA-seq data

Normalization, batch-effect correction, log transformation, etc.

# (h5py, numpy as np, scanpy as sc and the read_dataset/normalize helpers are imported at the top of run_scMDC.py)
data_mat = h5py.File(args.data_file)
x1 = np.array(data_mat['X1'])
x2 = np.array(data_mat['X2'])
y = np.array(data_mat['Y'])
data_mat.close()

# # Gene filter
# if args.filter1:
#     importantGenes = geneSelection(x1, n=args.f1, plot=False)
#     x1 = x1[:, importantGenes]
# if args.filter2:
#     importantGenes = geneSelection(x2, n=args.f2, plot=False)
#     x2 = x2[:, importantGenes]

print("------------------------------------")
print(x1.shape)
print(x2.shape)

# preprocessing scRNA-seq read counts matrix
adata1 = sc.AnnData(x1)
adata1.obs['Group'] = y
adata1 = read_dataset(adata1, transpose=False, test_split=False, copy=True)
adata1 = normalize(adata1, size_factors=True, normalize_input=True,
                   filter_min_counts=False, logtrans_input=True, nor=0)

adata2 = sc.AnnData(x2)
adata2.obs['Group'] = y
adata2 = read_dataset(adata2, transpose=False, test_split=False, copy=True)
adata2 = normalize(adata2, size_factors=True, normalize_input=True, logtrans_input=True)
# adata2 = clr_normalize_each_cell(adata2)

# adata1.write_h5ad("adata1.h5ad")  # save adata1 as an h5ad file
# adata2.write_h5ad("adata2.h5ad")  # save adata2 as an h5ad file
# print("------------------------------------")
# print(adata1.X.shape)
# print(adata2.X.shape)

input_size1 = adata1.n_vars
input_size2 = adata2.n_vars

print(args)
Output:
(11020, 500)
(11020, 500)
Namespace(ae_weight_file='AE_weights_pbmc10k.pth.tar', ae_weights=None, batch_size=256, cutoff=0.5, data_file='../datasets/output500.h5', decodeLayer1=[64, 128, 256], decodeLayer2=[64, 128, 256], device='cpu', embedding_file=True, encodeLayer=[256, 128, 64], f1=1000, f2=1000, filter1=True, filter2=True, gamma=0.1, lr=1.0, maxiter=5000, n_clusters=12, n_neighbors=30, phi1=0.005, phi2=0.005, prediction_file=True, pretrain_epochs=200, resolution=0.2, run=1, save_dir='atac_pbmc10k', sigma1=2.5, sigma2=2.5, tau=0.1, tol=0.001, update_interval=1)
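The read_dataset and normalize helpers used above come from preprocess.py in the scMDC repository and are not reproduced in this post. As a rough picture of what this step does, a scanpy-style approximation (size factors, log transform, scaling of the model input; my own sketch under that assumption, not the repository code) could look like:

import numpy as np
import scanpy as sc

def normalize_sketch(adata):
    # keep the raw counts, which the ZINB reconstruction loss needs later
    adata.raw = adata.copy()
    # per-cell size factors relative to the median library size
    counts_per_cell = np.asarray(adata.X.sum(axis=1)).ravel()
    adata.obs["size_factors"] = counts_per_cell / np.median(counts_per_cell)
    # library-size normalization, log transform, and scaling of the model input
    sc.pp.normalize_total(adata)
    sc.pp.log1p(adata)
    sc.pp.scale(adata)
    return adata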
Build the scMultiCluster model

The model consists of one shared encoder and two private decoders: the encoder combines the data from the different modalities, and the two decoders decode the data of each modality separately. The ZINB loss is used as the reconstruction loss because it characterizes highly sparse data with many dropouts well; in practice it has been found to fit scRNA-seq data effectively and to improve representation learning and clustering. The KL loss attracts similar cells and separates dissimilar ones, which helps produce a better clustering. Adding the clustering loss optimizes the model directly for the clustering result (a condensed sketch of the ZINB likelihood is given after the printed model architecture below).

encodeLayer = list(map(int, args.encodeLayer))
decodeLayer1 = list(map(int, args.decodeLayer1))
decodeLayer2 = list(map(int, args.decodeLayer2))

model = scMultiCluster(input_dim1=input_size1, input_dim2=input_size2, tau=args.tau,
                       encodeLayer=encodeLayer, decodeLayer1=decodeLayer1, decodeLayer2=decodeLayer2,
                       activation='elu', sigma1=args.sigma1, sigma2=args.sigma2, gamma=args.gamma,
                       cutoff=args.cutoff, phi1=args.phi1, phi2=args.phi2, device=args.device).to(args.device)
print(str(model))

Output:
scMultiCluster(
  (encoder): Sequential(
    (0): Linear(in_features=1000, out_features=256, bias=True)
    (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ELU(alpha=1.0)
    (3): Linear(in_features=256, out_features=128, bias=True)
    (4): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ELU(alpha=1.0)
    (6): Linear(in_features=128, out_features=64, bias=True)
    (7): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ELU(alpha=1.0)
  )
  (decoder1): Sequential(
    (0): Linear(in_features=64, out_features=128, bias=True)
    (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ELU(alpha=1.0)
    (3): Linear(in_features=128, out_features=256, bias=True)
    (4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ELU(alpha=1.0)
  )
  (decoder2): Sequential(
    (0): Linear(in_features=64, out_features=128, bias=True)
    (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ELU(alpha=1.0)
    (3): Linear(in_features=128, out_features=256, bias=True)
    (4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ELU(alpha=1.0)
  )
  (dec_mean1): Sequential(
    (0): Linear(in_features=256, out_features=500, bias=True)
    (1): MeanAct()
  )
  (dec_disp1): Sequential(
    (0): Linear(in_features=256, out_features=500, bias=True)
    (1): DispAct()
  )
  (dec_mean2): Sequential(
    (0): Linear(in_features=256, out_features=500, bias=True)
    (1): MeanAct()
  )
  (dec_disp2): Sequential(
    (0): Linear(in_features=256, out_features=500, bias=True)
    (1): DispAct()
  )
  (dec_pi1): Sequential(
    (0): Linear(in_features=256, out_features=500, bias=True)
    (1): Sigmoid()
  )
  (dec_pi2): Sequential(
    (0): Linear(in_features=256, out_features=500, bias=True)
    (1): Sigmoid()
  )
  (zinb_loss): ZINBLoss()
)
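As described above, the ZINB loss is the reconstruction loss: each raw count is modeled by a zero-inflated negative binomial whose mean, dispersion, and dropout probability come from the dec_mean/dec_disp/dec_pi heads shown in the architecture. The real implementation is the ZINBLoss module from the repository; the following is only a condensed sketch of the standard ZINB negative log-likelihood, written from my own understanding of that formulation.

import torch

def zinb_nll(x, mean, disp, pi, eps=1e-10):
    # x: raw counts; mean, disp: negative binomial mean and dispersion; pi: dropout probability
    # negative binomial negative log-likelihood
    t1 = torch.lgamma(disp + eps) + torch.lgamma(x + 1.0) - torch.lgamma(x + disp + eps)
    t2 = (disp + x) * torch.log(1.0 + mean / (disp + eps)) + x * (torch.log(disp + eps) - torch.log(mean + eps))
    nb_case = t1 + t2 - torch.log(1.0 - pi + eps)
    # a zero can come either from the NB component or from the dropout component
    zero_nb = torch.pow(disp / (disp + mean + eps), disp)
    zero_case = -torch.log(pi + (1.0 - pi) * zero_nb + eps)
    return torch.where(x < 1e-8, zero_case, nb_case).mean()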
Pretraining

In this stage only the ZINB reconstruction loss and the KL loss are used.

# (os, time, torch, numpy as np, the sklearn metrics module, and best_map are imported at the top of run_scMDC.py)
if not os.path.exists(args.save_dir):
    os.makedirs(args.save_dir)

t0 = time()
if args.ae_weights is None:
    model.pretrain_autoencoder(X1=adata1.X, X_raw1=adata1.raw.X, sf1=adata1.obs.size_factors,
                               X2=adata2.X, X_raw2=adata2.raw.X, sf2=adata2.obs.size_factors,
                               batch_size=args.batch_size, epochs=args.pretrain_epochs,
                               ae_weights=args.ae_weight_file)
else:
    if os.path.isfile(args.ae_weights):
        print("loading checkpoint {}".format(args.ae_weights))
        checkpoint = torch.load(args.ae_weights)
        model.load_state_dict(checkpoint['ae_state_dict'])
    else:
        print("no checkpoint found at {}".format(args.ae_weights))
        raise ValueError

print('Pretraining time: %d seconds.' % int(time() - t0))

Determine the number of clusters k with KMeans

# get k
latent = model.encodeBatch(torch.tensor(adata1.X).to(args.device), torch.tensor(adata2.X).to(args.device))
latent = latent.cpu().numpy()
if args.n_clusters == -1:
    n_clusters = GetCluster(latent, res=args.resolution, n=args.n_neighbors)
else:
    print("n_cluster is defined as " + str(args.n_clusters))
    n_clusters = args.n_clusters

Output:

n_cluster is defined as 12

Fine-tune the model and train the clustering layer

This is the actual training stage: the clustering loss is now added on top of the reconstruction terms, so the model is optimized directly toward a good clustering (a sketch of the soft-assignment mechanism behind this loss follows after the code).

y_pred, _, _, _, _ = model.fit(X1=adata1.X, X_raw1=adata1.raw.X, sf1=adata1.obs.size_factors,
                               X2=adata2.X, X_raw2=adata2.raw.X, sf2=adata2.obs.size_factors, y=y,
                               n_clusters=n_clusters, batch_size=args.batch_size, num_epochs=args.maxiter,
                               update_interval=args.update_interval, tol=args.tol, lr=args.lr,
                               save_dir=args.save_dir)
print('Total time: %d seconds.' % int(time() - t0))
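The clustering layer trained by model.fit follows the usual deep-embedded-clustering recipe: each cell receives a soft assignment to the cluster centroids through a Student's t kernel, and the assignments are pulled toward a sharpened target distribution with a KL term, on top of the losses used in pretraining. The sketch below is my own illustration of that mechanism, not the exact code inside scMultiCluster.

import torch

def soft_assign(z, mu, alpha=1.0):
    # q[i, j]: similarity of embedding z[i] to centroid mu[j] under a Student's t kernel
    dist = torch.sum((z.unsqueeze(1) - mu) ** 2, dim=2)
    q = (1.0 + dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    # p[i, j]: sharpened targets that emphasize high-confidence assignments
    weight = q ** 2 / q.sum(dim=0)
    return (weight.t() / weight.sum(dim=1)).t()

def clustering_loss(q, p):
    # KL(P || Q), the extra loss optimized during the fine-tuning stage
    return torch.mean(torch.sum(p * torch.log(p / (q + 1e-10) + 1e-10), dim=1))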
Save predictions and embeddings to file

if args.prediction_file:
    y_pred_ = best_map(y, y_pred) - 1
    np.savetxt(args.save_dir + "/" + str(args.run) + "_pred.csv", y_pred_, delimiter=",")

if args.embedding_file:
    final_latent = model.encodeBatch(torch.tensor(adata1.X).to(args.device), torch.tensor(adata2.X).to(args.device))
    final_latent = final_latent.cpu().numpy()
    np.savetxt(args.save_dir + "/" + str(args.run) + "_embedding.csv", final_latent, delimiter=",")

Output predictions y_pred and compute AMI, NMI, and ARI

y_pred_ = best_map(y, y_pred)
ami = np.round(metrics.adjusted_mutual_info_score(y, y_pred), 5)
nmi = np.round(metrics.normalized_mutual_info_score(y, y_pred), 5)
ari = np.round(metrics.adjusted_rand_score(y, y_pred), 5)
print('Final: AMI %.4f, NMI %.4f, ARI %.4f' % (ami, nmi, ari))

Result:
Final: AMI 0.6565, NMI 0.6575, ARI 0.6813

5. Clustering result visualization
import umap
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

umap_model = umap.UMAP()
# read the CSV files produced by run_scMDC
encoded_data = pd.read_csv('./atac_pbmc10k/1_embedding.csv', header=None)
cluster_labels = pd.read_csv('./atac_pbmc10k/1_pred.csv', header=None)
unique_labels = np.unique(cluster_labels.astype(int))

# apply UMAP to the encoded data
umap_data = umap_model.fit_transform(encoded_data)

fig, ax = plt.subplots()

# scatter plot of the UMAP result with Matplotlib
for label in unique_labels:
    # indices of the data points belonging to the current cluster
    indices = np.where(cluster_labels == label)[0]
    # UMAP coordinates of the points in the current cluster
    cluster_points = umap_data[indices, :]
    # draw the current cluster's points in a distinct color
    ax.scatter(cluster_points[:, 0], cluster_points[:, 1], label=f'Cluster {label}')
ax.set_xlabel('UMAP Dimension 1')
ax.set_ylabel('UMAP Dimension 2')
ax.set_title('UMAP Visualization with Cluster Colors')

# show the legend and place it at the upper-right of the plot
ax.legend(bbox_to_anchor=(1.02, 1), loc='upper left')
x_min, x_max = np.min(umap_data[:, 0]), np.max(umap_data[:, 0])
y_min, y_max = np.min(umap_data[:, 1]), np.max(umap_data[:, 1])
x_margin = (x_max - x_min) * 0.1
y_margin = (y_max - y_min) * 0.1
plt.xlim(x_min - x_margin, x_max + x_margin)
plt.ylim(y_min - y_margin, y_max + y_margin)
# adjust the subplot layout to shrink the overall size of the UMAP figure
plt.tight_layout()
# save as an image
plt.savefig('umap_visualization.png')
plt.show()
plt.close()

Visualization result

Appendix: run_scMDC training log

### Autoencoder: Successfully preprocessed 500 genes and 11020 cells.
F:\Programming\PycharmProjects\DeepLearning\bio\class2\scMDC-master\scMDC-master\src\run_scMDC.py:106: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass AnnData(X, dtype=X.dtype, ...) to get the future behavour.
  adata2 = sc.AnnData(x2)
### Autoencoder: Successfully preprocessed 500 genes and 11020 cells.
Namespace(ae_weight_file='AE_weights_pbmc10k.pth.tar', ae_weights=None, batch_size=256, cutoff=0.5, data_file='../datasets/output500.h5', decodeLayer1=[64, 128, 256], decodeLayer2=[64, 128, 256], device='cpu', embedding_file=True, encodeLayer=[256, 128, 64], f1=1000, f2=1000, filter1=True, filter2=True, gamma=0.1, lr=1.0, maxiter=5000, n_clusters=12, n_neighbors=30, phi1=0.005, phi2=0.005, prediction_file=True, pretrain_epochs=200, resolution=0.2, run=1, save_dir='atac_pbmc10k', sigma1=2.5, sigma2=2.5, tau=0.1, tol=0.001, update_interval=1)
scMultiCluster(
  (encoder): Sequential(
    (0): Linear(in_features=1000, out_features=256, bias=True)
    (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ELU(alpha=1.0)
    (3): Linear(in_features=256, out_features=128, bias=True)
    (4): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ELU(alpha=1.0)
    (6): Linear(in_features=128, out_features=64, bias=True)
    (7): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ELU(alpha=1.0)
  )
  (decoder1): Sequential(
    (0): Linear(in_features=64, out_features=128, bias=True)
    (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ELU(alpha=1.0)
    (3): Linear(in_features=128, out_features=256, bias=True)
    (4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ELU(alpha=1.0)
  )
  (decoder2): Sequential(
    (0): Linear(in_features=64, out_features=128, bias=True)
    (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ELU(alpha=1.0)
    (3): Linear(in_features=128, out_features=256, bias=True)
    (4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ELU(alpha=1.0)
  )
  (dec_mean1): Sequential(
    (0): Linear(in_features=256, out_features=500, bias=True)
    (1): MeanAct()
  )
  (dec_disp1): Sequential(
    (0): Linear(in_features=256, out_features=500, bias=True)
    (1): DispAct()
  )
  (dec_mean2): Sequential(
    (0): Linear(in_features=256, out_features=500, bias=True)
    (1): MeanAct()
  )
  (dec_disp2): Sequential(
    (0): Linear(in_features=256, out_features=500, bias=True)
    (1): DispAct()
  )
  (dec_pi1): Sequential(
    (0): Linear(in_features=256, out_features=500, bias=True)
    (1): Sigmoid()
  )
  (dec_pi2): Sequential(
    (0): Linear(in_features=256, out_features=500, bias=True)
    (1): Sigmoid()
  )
  (zinb_loss): ZINBLoss()
)
Pretraining stage
Pretrain epoch 1, Total loss:1.782213, ZINB loss1:0.803113, ZINB loss2:0.979099, KL loss:0.000000
Pretrain epoch 11, Total loss:1.526047, ZINB loss1:0.686576, ZINB loss2:0.839471, KL loss:0.000000
Pretrain epoch 21, Total loss:1.516047, ZINB loss1:0.680581, ZINB loss2:0.835466, KL loss:0.000000
Pretrain epoch 31, Total loss:1.513802, ZINB loss1:0.679482, ZINB loss2:0.834320, KL loss:0.000000
Pretrain epoch 41, Total loss:1.510133, ZINB loss1:0.677538, ZINB loss2:0.832595, KL loss:0.000000
Pretrain epoch 51, Total loss:1.509518, ZINB loss1:0.676798, ZINB loss2:0.832721, KL loss:0.000000
Pretrain epoch 61, Total loss:1.505354, ZINB loss1:0.675105, ZINB loss2:0.830248, KL loss:0.000000
Pretrain epoch 71, Total loss:1.503640, ZINB loss1:0.673856, ZINB loss2:0.829784, KL loss:0.000000
Pretrain epoch 81, Total loss:1.501737, ZINB loss1:0.672963, ZINB loss2:0.828775, KL loss:0.000000
Pretrain epoch 91, Total loss:1.500950, ZINB loss1:0.672635, ZINB loss2:0.828315, KL loss:0.000000
Pretrain epoch 101, Total loss:1.501092, ZINB loss1:0.671503, ZINB loss2:0.827678, KL loss:0.382204
Pretrain epoch 111, Total loss:1.500101, ZINB loss1:0.671465, ZINB loss2:0.826861, KL loss:0.355161
Pretrain epoch 121, Total loss:1.500591, ZINB loss1:0.671512, ZINB loss2:0.827387, KL loss:0.338605
Pretrain epoch 131, Total loss:1.500021, ZINB loss1:0.671240, ZINB loss2:0.827154, KL loss:0.325242
Pretrain epoch 141, Total loss:1.497220, ZINB loss1:0.669683, ZINB loss2:0.825961, KL loss:0.315069
Pretrain epoch 151, Total loss:1.496627, ZINB loss1:0.669643, ZINB loss2:0.825458, KL loss:0.305160
Pretrain epoch 161, Total loss:1.495756, ZINB loss1:0.669362, ZINB loss2:0.824900, KL loss:0.298926
Pretrain epoch 171, Total loss:1.495618, ZINB loss1:0.669167, ZINB loss2:0.824976, KL loss:0.295155
Pretrain epoch 181, Total loss:1.495205, ZINB loss1:0.669032, ZINB loss2:0.824735, KL loss:0.287623
Pretrain epoch 191, Total loss:1.495886, ZINB loss1:0.669239, ZINB loss2:0.825226, KL loss:0.284036
Pretraining time: 1238 seconds.
n_cluster is defined as 12
Clustering stage
Initializing cluster centers with kmeans.
Initializing k-means: AMI 0.6138, NMI 0.6150, ARI 0.5139
Clustering 1: AMI 0.6138, NMI 0.6150, ARI 0.5139 acc 0.6455
#Epoch 1: Total: 2.262678 Clustering Loss: 0.513242 ZINB Loss1: 0.666067 ZINB Loss2: 0.822290 KL Loss: 0.261079
Clustering 2: AMI 0.6151, NMI 0.6163, ARI 0.5216 acc 0.6529
Clustering 3: AMI 0.6148, NMI 0.6160, ARI 0.5194 acc 0.6439
Clustering 4: AMI 0.6161, NMI 0.6173, ARI 0.5207 acc 0.6378
Clustering 5: AMI 0.6162, NMI 0.6174, ARI 0.5217 acc 0.6398
Clustering 6: AMI 0.6169, NMI 0.6181, ARI 0.5223 acc 0.6431
Clustering 7: AMI 0.6165, NMI 0.6177, ARI 0.5244 acc 0.6524
Clustering 8: AMI 0.6174, NMI 0.6186, ARI 0.5276 acc 0.6597
Clustering 9: AMI 0.6172, NMI 0.6184, ARI 0.5238 acc 0.6427
Clustering 10: AMI 0.6177, NMI 0.6188, ARI 0.5240 acc 0.6410
Clustering 11: AMI 0.6174, NMI 0.6186, ARI 0.5234 acc 0.6428
#Epoch 11: Total: 1.948029 Clustering Loss: 0.232234 ZINB Loss1: 0.667498 ZINB Loss2: 0.823401 KL Loss: 0.224896
Clustering 12: AMI 0.6161, NMI 0.6173, ARI 0.5236 acc 0.6477
Clustering 13: AMI 0.6163, NMI 0.6175, ARI 0.5221 acc 0.6398
Clustering 14: AMI 0.6162, NMI 0.6174, ARI 0.5246 acc 0.6507
Clustering 15: AMI 0.6162, NMI 0.6174, ARI 0.5252 acc 0.6567
Clustering 16: AMI 0.6160, NMI 0.6172, ARI 0.5253 acc 0.6532
Clustering 17: AMI 0.6171, NMI 0.6183, ARI 0.5364 acc 0.6773
Clustering 18: AMI 0.6163, NMI 0.6176, ARI 0.5235 acc 0.6440
Clustering 19: AMI 0.6179, NMI 0.6191, ARI 0.5318 acc 0.6679
Clustering 20: AMI 0.6161, NMI 0.6173, ARI 0.5235 acc 0.6448
Clustering 21: AMI 0.6174, NMI 0.6186, ARI 0.5292 acc 0.6605
#Epoch 21: Total: 1.904330 Clustering Loss: 0.193283 ZINB Loss1: 0.667779 ZINB Loss2: 0.823706 KL Loss: 0.219561
Clustering 22: AMI 0.6159, NMI 0.6171, ARI 0.5233 acc 0.6420
Clustering 23: AMI 0.6151, NMI 0.6163, ARI 0.5231 acc 0.6491
Clustering 24: AMI 0.6157, NMI 0.6169, ARI 0.5229 acc 0.6378
Clustering 25: AMI 0.6172, NMI 0.6184, ARI 0.5334 acc 0.6711
Clustering 26: AMI 0.6158, NMI 0.6170, ARI 0.5265 acc 0.6700
Clustering 27: AMI 0.6181, NMI 0.6193, ARI 0.5386 acc 0.6788
Clustering 28: AMI 0.6158, NMI 0.6170, ARI 0.5239 acc 0.6397
Clustering 29: AMI 0.6188, NMI 0.6200, ARI 0.5383 acc 0.6785
Clustering 30: AMI 0.6171, NMI 0.6183, ARI 0.5255 acc 0.6495
Clustering 31: AMI 0.6174, NMI 0.6186, ARI 0.5264 acc 0.6532
#Epoch 31: Total: 1.881909 Clustering Loss: 0.174528 ZINB Loss1: 0.667321 ZINB Loss2: 0.823386 KL Loss: 0.216674
Clustering 32: AMI 0.6159, NMI 0.6171, ARI 0.5226 acc 0.6438
Clustering 33: AMI 0.6182, NMI 0.6194, ARI 0.5309 acc 0.6648
Clustering 34: AMI 0.6159, NMI 0.6171, ARI 0.5229 acc 0.6410
Clustering 35: AMI 0.6166, NMI 0.6178, ARI 0.5244 acc 0.6443
Clustering 36: AMI 0.6157, NMI 0.6169, ARI 0.5232 acc 0.6404
Clustering 37: AMI 0.6179, NMI 0.6190, ARI 0.5260 acc 0.6488
Clustering 38: AMI 0.6183, NMI 0.6195, ARI 0.5290 acc 0.6600
Clustering 39: AMI 0.6179, NMI 0.6191, ARI 0.5312 acc 0.6658
Clustering 40: AMI 0.6181, NMI 0.6193, ARI 0.5320 acc 0.6681
Clustering 41: AMI 0.6186, NMI 0.6198, ARI 0.5366 acc 0.6754
#Epoch 41: Total: 1.867074 Clustering Loss: 0.161755 ZINB Loss1: 0.667553 ZINB Loss2: 0.823085 KL Loss: 0.214682
Clustering 42: AMI 0.6179, NMI 0.6191, ARI 0.5297 acc 0.6630
Clustering 43: AMI 0.6173, NMI 0.6185, ARI 0.5256 acc 0.6487
Clustering 44: AMI 0.6150, NMI 0.6162, ARI 0.5231 acc 0.6488
Clustering 45: AMI 0.6175, NMI 0.6187, ARI 0.5279 acc 0.6546
Clustering 46: AMI 0.6186, NMI 0.6198, ARI 0.5315 acc 0.6661
Clustering 47: AMI 0.6180, NMI 0.6192, ARI 0.5317 acc 0.6664
Clustering 48: AMI 0.6171, NMI 0.6183, ARI 0.5326 acc 0.6680
Clustering 49: AMI 0.6186, NMI 0.6198, ARI 0.5321 acc 0.6668
Clustering 50: AMI 0.6171, NMI 0.6183, ARI 0.5286 acc 0.6609
Clustering 51: AMI 0.6186, NMI 0.6198, ARI 0.5317 acc 0.6655
#Epoch 51: Total: 1.854746 Clustering Loss: 0.151936 ZINB Loss1: 0.667286 ZINB Loss2: 0.822858 KL Loss: 0.212667
Clustering 52: AMI 0.6165, NMI 0.6177, ARI 0.5271 acc 0.6545
Clustering 53: AMI 0.6151, NMI 0.6163, ARI 0.5232 acc 0.6374
Clustering 54: AMI 0.6173, NMI 0.6185, ARI 0.5273 acc 0.6571
Clustering 55: AMI 0.6160, NMI 0.6172, ARI 0.5227 acc 0.6375
Clustering 56: AMI 0.6151, NMI 0.6163, ARI 0.5230 acc 0.6504
Clustering 57: AMI 0.6190, NMI 0.6202, ARI 0.5405 acc 0.6825
Clustering 58: AMI 0.6180, NMI 0.6192, ARI 0.5296 acc 0.6625
Clustering 59: AMI 0.6159, NMI 0.6171, ARI 0.5231 acc 0.6391
Clustering 60: AMI 0.6174, NMI 0.6186, ARI 0.5297 acc 0.6619
Clustering 61: AMI 0.6176, NMI 0.6188, ARI 0.5277 acc 0.6547
#Epoch 61: Total: 1.844312 Clustering Loss: 0.143666 ZINB Loss1: 0.667285 ZINB Loss2: 0.822651 KL Loss: 0.210710
Clustering 62: AMI 0.6178, NMI 0.6190, ARI 0.5306 acc 0.6644
Clustering 63: AMI 0.6197, NMI 0.6209, ARI 0.5392 acc 0.6800
Clustering 64: AMI 0.6153, NMI 0.6165, ARI 0.5229 acc 0.6436
Clustering 65: AMI 0.6148, NMI 0.6160, ARI 0.5224 acc 0.6383
Clustering 66: AMI 0.6150, NMI 0.6162, ARI 0.5228 acc 0.6380
Clustering 67: AMI 0.6173, NMI 0.6186, ARI 0.5262 acc 0.6549
Clustering 68: AMI 0.6163, NMI 0.6175, ARI 0.5235 acc 0.6450
Clustering 69: AMI 0.6155, NMI 0.6167, ARI 0.5227 acc 0.6414
Clustering 70: AMI 0.6182, NMI 0.6194, ARI 0.5315 acc 0.6672
Clustering 71: AMI 0.6177, NMI 0.6189, ARI 0.5292 acc 0.6623
#Epoch 71: Total: 1.837503 Clustering Loss: 0.137466 ZINB Loss1: 0.667154 ZINB Loss2: 0.823006 KL Loss: 0.209877
Clustering 72: AMI 0.6193, NMI 0.6205, ARI 0.5354 acc 0.6741
Clustering 73: AMI 0.6178, NMI 0.6190, ARI 0.5271 acc 0.6565
Clustering 74: AMI 0.6164, NMI 0.6176, ARI 0.5283 acc 0.6595
Clustering 75: AMI 0.6205, NMI 0.6217, ARI 0.5467 acc 0.6907
Clustering 76: AMI 0.6184, NMI 0.6196, ARI 0.5365 acc 0.6762
Clustering 77: AMI 0.6162, NMI 0.6175, ARI 0.5233 acc 0.6411
Clustering 78: AMI 0.6165, NMI 0.6177, ARI 0.5250 acc 0.6468
Clustering 79: AMI 0.6182, NMI 0.6194, ARI 0.5325 acc 0.6694
Clustering 80: AMI 0.6148, NMI 0.6160, ARI 0.5219 acc 0.6413
Clustering 81: AMI 0.6155, NMI 0.6167, ARI 0.5226 acc 0.6381
#Epoch 81: Total: 1.829174 Clustering Loss: 0.131217 ZINB Loss1: 0.666992 ZINB Loss2: 0.822744 KL Loss: 0.208221
Clustering 82: AMI 0.6169, NMI 0.6181, ARI 0.5292 acc 0.6621
Clustering 83: AMI 0.6161, NMI 0.6173, ARI 0.5231 acc 0.6398
Clustering 84: AMI 0.6174, NMI 0.6186, ARI 0.5304 acc 0.6645
Clustering 85: AMI 0.6143, NMI 0.6155, ARI 0.5223 acc 0.6382
Clustering 86: AMI 0.6182, NMI 0.6194, ARI 0.5368 acc 0.6767
Clustering 87: AMI 0.6189, NMI 0.6201, ARI 0.5410 acc 0.6833
Clustering 88: AMI 0.6162, NMI 0.6174, ARI 0.5242 acc 0.6465
Clustering 89: AMI 0.6157, NMI 0.6169, ARI 0.5253 acc 0.6517
Clustering 90: AMI 0.6170, NMI 0.6182, ARI 0.5278 acc 0.6595
Clustering 91: AMI 0.6166, NMI 0.6178, ARI 0.5306 acc 0.6647
#Epoch 91: Total: 1.822761 Clustering Loss: 0.126242 ZINB Loss1: 0.666825 ZINB Loss2: 0.822552 KL Loss: 0.207142
Clustering 92: AMI 0.6182, NMI 0.6194, ARI 0.5399 acc 0.6824
Clustering 93: AMI 0.6136, NMI 0.6148, ARI 0.5246 acc 0.6499
Clustering 94: AMI 0.6169, NMI 0.6182, ARI 0.5400 acc 0.6822
Clustering 95: AMI 0.6180, NMI 0.6192, ARI 0.5478 acc 0.6946
Clustering 96: AMI 0.6080, NMI 0.6093, ARI 0.5229 acc 0.6466
Clustering 97: AMI 0.6151, NMI 0.6163, ARI 0.5506 acc 0.6985
Clustering 98: AMI 0.6151, NMI 0.6163, ARI 0.5461 acc 0.6935
Clustering 99: AMI 0.6134, NMI 0.6146, ARI 0.5449 acc 0.6916
Clustering 100: AMI 0.6119, NMI 0.6131, ARI 0.5476 acc 0.6966
Clustering 101: AMI 0.6103, NMI 0.6115, ARI 0.5355 acc 0.6808
#Epoch 101: Total: 1.817155 Clustering Loss: 0.122029 ZINB Loss1: 0.666804 ZINB Loss2: 0.822372 KL Loss: 0.205949
Clustering 102: AMI 0.6162, NMI 0.6174, ARI 0.5599 acc 0.7123
Clustering 103: AMI 0.6121, NMI 0.6134, ARI 0.5448 acc 0.6955
Clustering 104: AMI 0.6194, NMI 0.6207, ARI 0.5718 acc 0.7265
Clustering 105: AMI 0.6128, NMI 0.6140, ARI 0.5524 acc 0.7064
Clustering 106: AMI 0.6159, NMI 0.6171, ARI 0.5718 acc 0.7266
Clustering 107: AMI 0.6322, NMI 0.6334, ARI 0.6308 acc 0.7708
Clustering 108: AMI 0.6300, NMI 0.6312, ARI 0.6223 acc 0.7665
Clustering 109: AMI 0.6290, NMI 0.6302, ARI 0.6208 acc 0.7643
Clustering 110: AMI 0.6233, NMI 0.6245, ARI 0.6049 acc 0.7538
Clustering 111: AMI 0.6494, NMI 0.6506, ARI 0.6700 acc 0.7955
#Epoch 111: Total: 1.812602 Clustering Loss: 0.118209 ZINB Loss1: 0.666840 ZINB Loss2: 0.822511 KL Loss: 0.205043
Clustering 112: AMI 0.6432, NMI 0.6444, ARI 0.6577 acc 0.7883
Clustering 113: AMI 0.6419, NMI 0.6431, ARI 0.6555 acc 0.7860
Clustering 114: AMI 0.6212, NMI 0.6224, ARI 0.5924 acc 0.7449
Clustering 115: AMI 0.6342, NMI 0.6354, ARI 0.6385 acc 0.7747
Clustering 116: AMI 0.6132, NMI 0.6144, ARI 0.5697 acc 0.7258
Clustering 117: AMI 0.6266, NMI 0.6278, ARI 0.6163 acc 0.7622
Clustering 118: AMI 0.6493, NMI 0.6505, ARI 0.6698 acc 0.7952
Clustering 119: AMI 0.6267, NMI 0.6279, ARI 0.6201 acc 0.7635
Clustering 120: AMI 0.6365, NMI 0.6378, ARI 0.6473 acc 0.7811
Clustering 121: AMI 0.6457, NMI 0.6469, ARI 0.6624 acc 0.7910
#Epoch 121: Total: 1.808842 Clustering Loss: 0.114941 ZINB Loss1: 0.666752 ZINB Loss2: 0.822904 KL Loss: 0.204244
Clustering 122: AMI 0.6381, NMI 0.6392, ARI 0.6429 acc 0.7790
Clustering 123: AMI 0.6523, NMI 0.6534, ARI 0.6751 acc 0.7985
Clustering 124: AMI 0.6448, NMI 0.6460, ARI 0.6590 acc 0.7888
Clustering 125: AMI 0.6446, NMI 0.6458, ARI 0.6620 acc 0.7908
Clustering 126: AMI 0.6444, NMI 0.6456, ARI 0.6567 acc 0.7877
Clustering 127: AMI 0.6531, NMI 0.6542, ARI 0.6764 acc 0.7994
Clustering 128: AMI 0.6526, NMI 0.6538, ARI 0.6750 acc 0.7986
Clustering 129: AMI 0.6348, NMI 0.6360, ARI 0.6342 acc 0.7741
Clustering 130: AMI 0.6513, NMI 0.6524, ARI 0.6723 acc 0.7970
Clustering 131: AMI 0.6371, NMI 0.6383, ARI 0.6385 acc 0.7770
#Epoch 131: Total: 1.803789 Clustering Loss: 0.111295 ZINB Loss1: 0.666789 ZINB Loss2: 0.822329 KL Loss: 0.203376
Clustering 132: AMI 0.6534, NMI 0.6545, ARI 0.6764 acc 0.7994
Clustering 133: AMI 0.6560, NMI 0.6570, ARI 0.6818 acc 0.8019
Clustering 134: AMI 0.6565, NMI 0.6576, ARI 0.6813 acc 0.8022
Clustering 135: AMI 0.6565, NMI 0.6575, ARI 0.6813 acc 0.8021
delta_label 0.0006352087114337568 tol 0.001
Reach tolerance threshold. Stopping training.
Total time: 1696 seconds.
Final: AMI 0.6565, NMI 0.6575, ARI 0.6813