
Identifying secretory proteins in blood, saliva or other body fluids has become an effective approach to disease diagnosis. Existing secretory protein prediction methods are mainly based on conventional machine learning algorithms and depend heavily on the set of features extracted from each protein. Consequently, bias in feature selection may degrade the final prediction. In contrast, deep learning methods can take amino acid sequences directly as input and adaptively learn better feature representations, thereby avoiding the effect of feature selection bias. In this article, we propose SecProCT, a deep learning model based on the Capsule Network and Transformer architectures, to predict secretory proteins. Validated on an independent test set, the proposed model achieves accuracies of 0.917 and 0.905 for predicting blood-secretory and saliva-secretory proteins, respectively, outperforming conventional machine learning methods and other deep learning methods for biological sequence analysis. On experimentally verified blood-secretory and saliva-secretory proteins, the model achieves true positive rates of 0.909 and 0.935, respectively. On existing cancer protein biomarkers in blood and saliva, it achieves true positive rates of 0.884 and 0.946, respectively. The main contributions of this article are as follows: (1) a deep learning model based on the Capsule Network and Transformer architectures is proposed to predict secretory proteins using only amino acid sequences; (2) the model outperforms existing conventional machine learning methods and deep learning methods for biological sequence analysis; (3) the model accurately predicts most experimentally verified secretory proteins and cancer protein biomarkers in blood and saliva.
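To illustrate what "using amino acid sequences directly as input" can mean in practice, the following is a minimal sketch of one common way to turn a protein sequence into a fixed-size numeric matrix that a deep model can consume. The abstract does not state how SecProCT encodes its input, so the one-hot scheme, the 20-letter alphabet ordering, the hypothetical `one_hot_encode` helper and its `max_len` parameter are all illustrative assumptions, not the authors' preprocessing.

```python
from typing import List

# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(seq: str, max_len: int = 1000) -> List[List[int]]:
    """Hypothetical helper: encode a protein as a max_len x 20 one-hot matrix.

    - Sequences longer than max_len are truncated.
    - Positions past the end of the sequence are all-zero padding rows.
    - Non-standard residues (e.g. X, U, B) are encoded as all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq.upper()[:max_len]):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix
```

For example, `one_hot_encode("MKV", max_len=5)` yields a 5 x 20 matrix whose first three rows each contain a single 1 (at the columns for M, K and V) and whose last two rows are padding. A learned embedding layer is a common alternative to one-hot input; either way, the model receives the raw sequence rather than a hand-selected feature set.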