
Compared with disease biomarkers in blood and urine, biomarkers in saliva have distinct advantages: sample collection is noninvasive, and the samples can be examined with convenient clinical tests. Identifying human saliva-secretory proteins, and thereby detecting protein biomarkers in saliva, is therefore of significant clinical value. Few prediction methods currently exist for identifying saliva-secretory proteins, and all of them depend heavily on annotated protein features. Unlike conventional machine learning algorithms, deep learning algorithms can learn feature representations automatically from data. We present a novel end-to-end model based on a multilane capsule network (CapsNet) with differently sized convolution kernels that determines which human proteins are saliva-secretory proteins solely from their amino acid sequences. Validation results show that the proposed model outperforms existing methods based on conventional machine learning algorithms, and that it also performs better than state-of-the-art deep learning architectures used to analyze biological sequences. When the model's predictions are compared with human saliva-secretory proteins reported in existing studies and with known salivary protein biomarkers of cancer, the model achieves a satisfactory true positive rate. The main contributions of this study are as follows: (1) an end-to-end CapsNet-based model is proposed that identifies saliva-secretory proteins solely from amino acid sequence information; (2) the proposed model achieves good performance and outperforms existing models; and (3) the saliva-secretory protein predictions are statistically significant with respect to existing cancer biomarkers in saliva.
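To make the idea of a "multilane" CapsNet with differently sized convolution kernels more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the sequence encoding, kernel sizes, or capsule dimensions, so this sketch assumes one-hot encoding over the 20 standard amino acids, hypothetical kernel sizes of 3, 5, and 7, and a hypothetical capsule dimension of 4. It uses untrained random weights and omits dynamic routing and the final classification layer. It only illustrates how parallel lanes with different kernel widths turn one sequence into squashed primary-capsule vectors, using the standard capsule squash nonlinearity.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a protein sequence as an L x 20 list of one-hot rows (assumed encoding)."""
    rows = []
    for aa in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in INDEX:  # non-standard residues (e.g. 'X') stay all-zero
            row[INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(x, kernel):
    """'Valid' 1D cross-correlation of an L x C input with a K x C kernel, then ReLU."""
    k, channels = len(kernel), len(x[0])
    out = []
    for start in range(len(x) - k + 1):
        s = sum(kernel[j][c] * x[start + j][c]
                for j in range(k) for c in range(channels))
        out.append(max(0.0, s))
    return out


def squash(v):
    """Capsule squash: keep v's direction, rescale its length to |v|^2 / (1 + |v|^2)."""
    norm_sq = sum(t * t for t in v)
    if norm_sq == 0.0:
        return [0.0] * len(v)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * t for t in v]


def multilane_primary_capsules(seq, kernel_sizes=(3, 5, 7), capsule_dim=4, seed=0):
    """Hypothetical sketch: one lane per kernel size; each lane yields one squashed
    capsule per sequence position (no routing, no training)."""
    rng = random.Random(seed)  # untrained random weights, for illustration only
    x = one_hot(seq)
    lanes = []
    for k in kernel_sizes:
        # capsule_dim filters per lane; their outputs at one position form one capsule vector
        kernels = [[[rng.uniform(-0.5, 0.5) for _ in AMINO_ACIDS] for _ in range(k)]
                   for _ in range(capsule_dim)]
        maps = [conv1d_relu(x, kern) for kern in kernels]
        positions = len(maps[0])
        lanes.append([squash([m[p] for m in maps]) for p in range(positions)])
    return lanes


if __name__ == "__main__":
    # An arbitrary example sequence, used only to exercise the sketch.
    lanes = multilane_primary_capsules("MKWVTFISLLFLFSSAYS")
    for k, caps in zip((3, 5, 7), lanes):
        longest = max(math.sqrt(sum(t * t for t in c)) for c in caps)
        print(f"kernel {k}: {len(caps)} capsules, longest length {longest:.3f}")
```

Because each lane uses a different kernel width, the lanes look at sequence motifs of different lengths, and the squash step keeps every capsule's length below 1 so that it can be read as an activation strength. In a full CapsNet, these primary capsules would feed a routing step and class capsules trained on labeled proteins.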