Phage Host Prediction Using Deep Neural Network with Multi-source Protein Language Models and Squeeze-and-Excitation Attention Mechanism
Abstract
With the rise of antimicrobial resistance, phage therapy (PT) has become a promising alternative for treating bacterial infections. PT uses phages that bind to specific receptors on bacterial surfaces through receptor-binding proteins (RBPs), enabling precise destruction of the targeted hosts. A key problem in PT is phage host prediction (PHP), which aims to match therapeutic phages to pathogenic hosts. However, traditional PHP relies on time-consuming and expensive wet-lab experiments, while recent computational methods neglect the evolutionary diversity and local feature patterns of RBPs. In this article, we propose PHPRBP, a novel deep neural network for PHP based on phage RBPs. PHPRBP first uses pre-trained protein language models (ESM2 and ProtT5) to learn multi-source embedding representations of the RBPs, capturing diverse and complementary features. It then applies an adaptive synthetic sampling technique to augment minority-class samples, mitigating data scarcity. Next, a deep neural network uses a convolutional neural network to capture local sequence features and a squeeze-and-excitation attention mechanism to emphasize the most informative features. Finally, a fully connected network performs host prediction. Experimental results show that PHPRBP outperforms state-of-the-art methods in host prediction at both the genus and species levels. The data and code of PHPRBP are available at https://github.com/a1678019300/PHPRBP.
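The abstract does not give layer sizes, the embedding-fusion scheme, or the exact squeeze-and-excitation (SE) configuration. The following minimal NumPy forward pass is therefore only a sketch of the general CNN + SE + fully connected pattern under stated assumptions. It assumes that per-protein ESM2 and ProtT5 embeddings are concatenated into a single 1D signal, that there is one ReLU convolution layer, that the SE block uses a bottleneck with reduction ratio `R`, and that a softmax head sits on top. All function names, toy dimensions, and random weights are hypothetical and are not taken from PHPRBP's code. The minority-class oversampling step is omitted.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1D convolution (cross-correlation) followed by ReLU.
    x: (C_in, L), kernels: (C_out, C_in, K), bias: (C_out,) -> (C_out, L - K + 1)"""
    c_out, c_in, k = kernels.shape
    length = x.shape[1] - k + 1
    # Sliding windows over the sequence axis: (C_in, length, K)
    windows = np.stack([x[:, i:i + k] for i in range(length)], axis=1)
    out = np.einsum("clk,ock->ol", windows, kernels) + bias[:, None]
    return np.maximum(out, 0.0)

def se_block(feat, w1, w2):
    """Squeeze-and-excitation over channels.
    feat: (C, L); w1: (C // r, C); w2: (C, C // r)."""
    z = feat.mean(axis=1)                # squeeze: global average pooling -> (C,)
    h = np.maximum(w1 @ z, 0.0)          # excitation bottleneck + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))  # sigmoid gate per channel, in (0, 1)
    return feat * s[:, None], s          # reweight each channel by its gate

def predict(embedding, params):
    """Hypothetical forward pass: conv -> SE -> flatten -> dense softmax."""
    x = embedding[None, :]               # treat the fused embedding as 1 input channel
    feat = conv1d(x, params["kernels"], params["conv_bias"])
    feat, _ = se_block(feat, params["w1"], params["w2"])
    logits = params["w_fc"] @ feat.reshape(-1) + params["b_fc"]
    logits = logits - logits.max()       # numerically stable softmax over hosts
    return np.exp(logits) / np.exp(logits).sum()

rng = np.random.default_rng(0)
esm_vec = rng.normal(size=32)            # stand-in for an ESM2 per-protein embedding
t5_vec = rng.normal(size=24)             # stand-in for a ProtT5 per-protein embedding
embedding = np.concatenate([esm_vec, t5_vec])  # fusion by concatenation (assumed)

C, K, R, N_HOSTS = 8, 5, 4, 3            # toy channel count, kernel size, SE ratio, hosts
L_out = embedding.size - K + 1
params = {
    "kernels": rng.normal(scale=0.1, size=(C, 1, K)),
    "conv_bias": np.zeros(C),
    "w1": rng.normal(scale=0.1, size=(C // R, C)),
    "w2": rng.normal(scale=0.1, size=(C, C // R)),
    "w_fc": rng.normal(scale=0.1, size=(N_HOSTS, C * L_out)),
    "b_fc": np.zeros(N_HOSTS),
}
probs = predict(embedding, params)
print(probs.shape, probs.sum())
```

The SE gate reweights whole convolutional channels rather than individual positions, so channels that respond to informative local motifs can be amplified relative to the others. In a trained model these weights would be learned, not drawn at random.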