Genome-scale annotation of protein binding sites via language model and geometric deep learning

  1. Qianmu Yuan
  2. Chong Tian
  3. Yuedong Yang (corresponding author)
  1. School of Computer Science and Engineering, Sun Yat-sen University, China
17 figures, 14 tables and 1 additional file

Figures

The overview of GPSite.

The protein sequence is input to the pre-trained language model ProtTrans and the folding model ESMFold to generate the sequence embedding and predicted structure, respectively. According to the structure, a protein radius graph is constructed where residues constitute the nodes and adjacent nodes are connected by edges. In addition to the pre-computed residue features of ProtTrans embedding and DSSP structural properties, a comprehensive, end-to-end geometric featurizer is employed to extract the geometric node features including distance, direction and angle, as well as geometric edge features between residues including distance, direction and orientation. Here, the R group denotes the centroid of the heavy sidechain atoms. The resulting geometric-aware attributed graph is input to a shared GNN to perform edge-enhanced message passing for capturing the common binding-relevant characteristics among different molecules. Finally, 10 ligand-specific MLPs are adopted to learn the binding patterns of particular molecules in a multi-task manner. Examples of the applications of GPSite include binding site identification and protein-level Gene Ontology (GO; Ashburner et al., 2000) function prediction.
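The radius-graph step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `build_radius_graph`, the use of one representative coordinate per residue (for example Cα), and the 15 Å cutoff are assumptions that the caption does not state.

```python
import math

def build_radius_graph(coords, radius=15.0):
    """Return directed edges (i, j) for residue pairs within `radius` angstroms.

    `coords` holds one 3D point per residue. The default cutoff is an
    assumed value for illustration only.
    """
    edges = []
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            if i != j and math.dist(a, b) <= radius:
                edges.append((i, j))
    return edges

# Hypothetical coordinates: three residues close together, one far away.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
edges = build_radius_graph(ca_coords, radius=15.0)
print(edges)
```

The quadratic loop is adequate for a single protein chain; a spatial index would be the usual choice for larger inputs.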

The performance of GPSite and the state-of-the-art methods.

(A) The ROC and precision-recall curves of GPSite on the 10 binding site test sets. The numbers in the legends are areas under the curves. (B–C) The AUPR values of the top-performing methods in each test set. The methods marked with * denote evaluations using the ESMFold-predicted structures as input.
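For readers unfamiliar with the two area metrics, the sketch below computes them for per-residue binding scores. It is an illustration under stated assumptions: AUC is computed as the rank (Mann-Whitney) statistic and AUPR as average precision, which may differ slightly from the integration scheme used to produce the figure. The function names and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Probability that a random binding residue outscores a random non-binding one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical residue labels (1 = binding) and predicted scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)
print(round(auc, 3), round(aupr, 3))
```

Because binding residues are a small minority in most of the test sets, AUPR is the more sensitive of the two metrics to false positives.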

The performance of GPSite on low-quality predicted structures.

(A) The performance of GPSite on structures of different qualities, compared with the best experimental structure-based methods in the test sets of DNA, RNA, and peptide. Experimental structure-based methods given ESMFold-predicted structures as input are marked with *. (B) Distributions of the TM-scores between native and predicted structures in the DNA, RNA and peptide datasets. (C) The correlations between the prediction quality of ESMFold and the performance of GPSite and GraphBind on the RNA-binding site test set when TM-score <0.5. The scatter points denote the average TM-score and AUPR for each bin after sorting the proteins by TM-score and dividing them evenly into 20 bins. The lines are fit to the original data (without binning) using linear regression. (D) The glucocorticoid receptor (GR) in complex with DNA, a coactivator peptide, and Zn2+ ions (PDB: 7PRW). The ESMFold-predicted protein structure (gray) is superimposed onto the native structure (cyan) using US-align (TM-score=0.72). The ligands are colored in orange. (E) Superposition of the native (cyan) and predicted (gray) DNA-binding domains of GR (TM-score=0.96). (F–H) The Zn2+, DNA and peptide binding site predictions by GPSite for the predicted GR structure in cartoon or surface view. True positives, false positives and false negatives are colored in green, red and yellow, respectively. The ligands in orange were added afterwards from the native complex structure to show the quality of the predictions by GPSite.
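The binning and regression procedure in panel C can be sketched as below. This is a minimal illustration with hypothetical helper names (`bin_means`, `fit_line`) and synthetic data; it assumes equal-count bins over the sorted proteins and an ordinary least-squares fit on the unbinned points, as the caption describes.

```python
def bin_means(tm_scores, auprs, n_bins=20):
    """Sort proteins by TM-score, split into equal-count bins, average each bin."""
    pairs = sorted(zip(tm_scores, auprs))
    size = len(pairs) // n_bins
    points = []
    for b in range(n_bins):
        start = b * size
        end = (b + 1) * size if b < n_bins - 1 else len(pairs)
        chunk = pairs[start:end]
        points.append((sum(t for t, _ in chunk) / len(chunk),
                       sum(a for _, a in chunk) / len(chunk)))
    return points

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic per-protein values (TM-score < 0.5), for illustration only.
tm = [0.2 + 0.0075 * i for i in range(40)]
aupr = [0.1 + 0.5 * t for t in tm]
points = bin_means(tm, aupr, n_bins=20)   # scatter points
slope, intercept = fit_line(tm, aupr)     # line fit to the unbinned data
print(len(points), round(slope, 3), round(intercept, 3))
```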

The effects of protein features and model designs.

(A) Ablation studies on sequence and structure information in the DNA, RNA, and peptide test sets. The average performance of the 10 test sets is also shown. (B) Performance comparison between GPSite and the baseline model using MSA profile for proteins with different Neff values in the combined test set of the 10 ligands. (C) Performance boosts in AUPR using GPSite compared to the single-task baseline. (D) Visualization of the distributions of residues encoded by raw feature vectors (left) or hidden embedding vectors from the pre-trained shared network in GPSite (right) for the unseen carbohydrate-binding site dataset using t-SNE. The binding and non-binding residues are colored in red and gray, respectively. (E) The performance when using the hidden embeddings from GPSite as input features to train an MLP for carbohydrate-binding site prediction, and its comparisons with other methods.

Analyses of Swiss-Prot based on the binding site annotations by GPSite.

(A) The distributions of the binding scores assigned by GPSite for proteins with or without a given ligand-binding molecular function in GO. (B) The ROC curves when using the GPSite binding scores to distinguish between binding and non-binding proteins for various ligands. (C) The percentage of proteins predicted by GPSite as DNA- or RNA-binding that are annotated with a given biological process in Swiss-Prot. Only specific biological process terms with depth ≥8 in the GO directed acyclic graph are considered, among which the 15 terms with the highest percentages are displayed. (D) The percentage of surface pathogenic or benign natural variant sites within GPSite-predicted interfaces. The baseline is the probability of a random surface residue being annotated as an interface residue. (E) The pathogenic probabilities of variants located in non-binding sites or in different types of binding sites predicted by GPSite.

Appendix 3—figure 1
Runtime comparison of the GPSite webserver with other top-performing servers.

Five protein chains (i.e. 8HN4_B, 8USJ_A, 8C1U_A, 8K3V_A, and 8EXO_A) comprising 100, 300, 500, 700, and 900 residues, respectively, were selected for testing, and the average runtime is reported for each method. Note that a significant portion of GPSite’s runtime (75 s, indicated in orange) is allocated to structure prediction using ESMFold.

Appendix 3—figure 2
The performance of GPSite when using native or predicted structures as input during the test phase.
Appendix 3—figure 3
Distributions of the TM-scores between native and predicted structures in the protein, ATP, HEM, Zn2+, Ca2+, Mg2+, and Mn2+ datasets.
Appendix 3—figure 4
The performance of GPSite on structures of different qualities, and the comparisons with the best experimental structure-based methods in the test sets of protein, ATP, HEM, Zn2+, Ca2+, Mg2+, and Mn2+.

Experimental structure-based methods given ESMFold-predicted structures as input are marked with *. Since the HEM and Mn2+ test sets each contain only 5 proteins with TM-score ≤0.7 (details shown in Appendix 2—table 5), the corresponding results may not be statistically robust.

Appendix 3—figure 5
The prediction results of GPSite and GraphBind for the ribosome biogenesis protein ERB1.

(A) The state E2 nucleolar 60S ribosome biogenesis intermediate (PDB: 7R6Q). The ribosome biogenesis protein ERB1 (chain m) is highlighted in blue, while other protein chains are colored in gray. The RNA chains are shown in orange. (B) The RNA-binding sites on ERB1 (colored in red). (C) The ESMFold-predicted structure of ERB1 (TM-score=0.24). The RNA-binding sites are also mapped onto this predicted structure (colored in red). (D–G) The prediction results of GPSite and GraphBind for the predicted and native ERB1 structures. The confidence of the predictions is represented with a gradient of color from blue for non-binding to red for binding.

Appendix 3—figure 6
The run time of ESMFold with respect to the sequence length in Swiss-Prot evaluated on an NVIDIA A100 GPU.

The run time is presented as mean ± standard deviation per range of number of residues (range size equals 100).
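The per-range summary described above can be sketched as follows. This is illustrative only: the function name `runtime_by_length`, the half-open ranges starting at multiples of 100, the use of the sample standard deviation, and the toy records are all assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev

def runtime_by_length(records, bin_size=100):
    """Group (length, seconds) records into length ranges and summarize each.

    Returns {range_start: (mean, standard deviation)}. A range with a single
    record gets a deviation of 0.0.
    """
    groups = defaultdict(list)
    for length, seconds in records:
        groups[(length // bin_size) * bin_size].append(seconds)
    return {start: (mean(v), stdev(v) if len(v) > 1 else 0.0)
            for start, v in sorted(groups.items())}

# Hypothetical (sequence length, runtime in seconds) measurements.
records = [(50, 2.0), (80, 4.0), (150, 6.0), (199, 10.0), (420, 30.0)]
summary = runtime_by_length(records)
for start, (m, sd) in summary.items():
    print(f"{start}-{start + 99} residues: {m:.1f} ± {sd:.1f} s")
```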

Appendix 3—figure 7
The univariate and bivariate distributions of the protein length and the pTM estimated by ESMFold of the Swiss-Prot sequences.

The probability density curves are fit using kernel density estimation. The darker region in the bivariate heatmap corresponds to a higher number of samples.
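As a reminder of what a kernel density estimate computes, here is a minimal one-dimensional sketch with a Gaussian kernel. The kernel choice, the fixed bandwidth, and the pTM values are assumptions for illustration; the caption does not specify which estimator settings were used.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function that averages Gaussian bumps centered on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density

# Hypothetical pTM values for a handful of sequences.
ptm = [0.35, 0.42, 0.78, 0.81, 0.85, 0.88, 0.90]
density = gaussian_kde(ptm, bandwidth=0.05)
print(density(0.85) > density(0.60))
```

A smaller bandwidth follows the samples more closely, while a larger one gives a smoother curve.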

Appendix 3—figure 8
The percentage of proteins predicted by GPSite as binding to peptide, protein, ATP, HEM, Zn2+, Ca2+, Mg2+ or Mn2+ that are annotated with a given biological process in Swiss-Prot.

Only the specific biological process terms with depth ≥8 in the GO directed acyclic graph are considered, among which the 15 terms with the highest percentage are displayed.

Appendix 3—figure 9
The percentage of pathogenic or benign natural variant sites within GPSite-predicted interfaces.

The baseline is the probability of a random residue being annotated as an interface residue.

Author response image 1
The structures of 4XQK (A) and 4KYW (B) in PDB.
Author response image 2
The prediction results of GPSite and GraphBind for the ribosome biogenesis protein ERB1.

(A) The state E2 nucleolar 60S ribosome biogenesis intermediate (PDB: 7R6Q). The ribosome biogenesis protein ERB1 (chain m) is highlighted in blue, while other protein chains are colored in gray. The RNA chains are shown in orange. (B) The RNA-binding sites on ERB1 (colored in red). (C) The ESMFold-predicted structure of ERB1 (TM-score = 0.24). The RNA-binding sites are also mapped onto this predicted structure (colored in red). (D-G) The prediction results of GPSite and GraphBind for the predicted and native ERB1 structures. The confidence of the predictions is represented with a gradient of color from blue for non-binding to red for binding.

Author response image 3
Runtime comparison of the GPSite webserver with other top-performing servers.

Five protein chains (i.e., 8HN4_B, 8USJ_A, 8C1U_A, 8K3V_A and 8EXO_A) comprising 100, 300, 500, 700, and 900 residues, respectively, were selected for testing, and the average runtime is reported for each method. Note that a significant portion of GPSite’s runtime (75 s, indicated in orange) is allocated to structure prediction using ESMFold.

Tables

Appendix 2—table 1
Statistics of the 10 binding site benchmark datasets used in this study.
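| Molecule type | Training set: sequences | Training set: residues | Training set: % of binding residues | Test set: sequences | Test set: residues | Test set: % of binding residues |
|---|---|---|---|---|---|---|
| DNA | 661 | 185,796 | 8.06 | 146 | 57,914 | 5.75 |
| RNA | 689 | 205,648 | 10.55 | 346 | 105,230 | 9.78 |
| Peptide | 1251 | 348,370 | 5.39 | 235 | 74,788 | 4.50 |
| Protein | 335 | 66,366 | 15.63 | 375 | 78,475 | 14.57 |
| ATP | 347 | 130,655 | 3.91 | 79 | 39,459 | 3.12 |
| HEM | 176 | 47,063 | 8.55 | 48 | 15,618 | 6.21 |
| Zn2+ | 1646 | 474,855 | 1.63 | 211 | 56,020 | 1.85 |
| Ca2+ | 1554 | 504,146 | 1.67 | 183 | 66,854 | 1.55 |
| Mg2+ | 1729 | 575,732 | 1.10 | 235 | 88,806 | 1.01 |
| Mn2+ | 547 | 181,699 | 1.41 | 57 | 20,419 | 1.10 |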
  1. Note: We combined the two test sets (Test_60 and Test_315) from Yuan et al., 2021 to establish our final protein-protein binding site test set.

Appendix 2—table 2
The performance of GPSite on the five-fold cross-validation and independent test sets.
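| Molecule type | Five-fold cross-validation AUC | Five-fold cross-validation AUPR | Test set AUC | Test set AUPR |
|---|---|---|---|---|
| DNA | 0.933 | 0.620 | 0.921 | 0.516 |
| RNA | 0.910 | 0.615 | 0.899 | 0.573 |
| Peptide | 0.858 | 0.406 | 0.836 | 0.345 |
| Protein | 0.819 | 0.491 | 0.836 | 0.484 |
| ATP | 0.960 | 0.688 | 0.975 | 0.714 |
| HEM | 0.963 | 0.778 | 0.971 | 0.802 |
| Zn2+ | 0.984 | 0.808 | 0.981 | 0.859 |
| Ca2+ | 0.901 | 0.515 | 0.921 | 0.565 |
| Mg2+ | 0.889 | 0.379 | 0.892 | 0.370 |
| Mn2+ | 0.964 | 0.734 | 0.974 | 0.709 |
| Average | 0.918 | 0.603 | 0.921 | 0.594 |
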
Appendix 2—table 3
Performance comparison of GPSite with state-of-the-art methods on the 10 binding site test sets.
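| Test set | Method | Rec | Pre | Acc | F1 | MCC | AUC | AUPR |
|---|---|---|---|---|---|---|---|---|
| DNA | DRNApred | 0.258 | 0.159 | 0.879 | 0.197 | 0.140 | 0.698 | 0.129 |
| | COACH-D | 0.247 | 0.315 | 0.926 | 0.277 | 0.241 | 0.674 | 0.197 |
| | NCBRPred | 0.225 | 0.316 | 0.927 | 0.263 | 0.230 | 0.763 | 0.229 |
| | SVMnuc | 0.319 | 0.319 | 0.922 | 0.319 | 0.277 | 0.806 | 0.259 |
| | NucBind | 0.333 | 0.329 | 0.923 | 0.331 | 0.290 | 0.806 | 0.264 |
| | GraphBind | 0.607 | 0.355 | 0.914 | 0.448 | 0.422 | 0.884 | 0.424 |
| | GeoBind* | 0.481 | 0.427 | 0.933 | 0.452 | 0.417 | 0.891 | 0.416 |
| | GeoBind | 0.520 | 0.442 | 0.935 | 0.478 | 0.445 | 0.896 | 0.443 |
| | GraphSite | 0.493 | 0.450 | 0.936 | 0.470 | 0.437 | 0.910 | 0.455 |
| | GPSite | 0.463 | 0.525 | 0.945 | 0.492 | 0.464 | 0.921 | 0.516 |
| RNA | COACH-D | 0.073 | 0.210 | 0.882 | 0.108 | 0.071 | 0.463 | 0.111 |
| | DRNApred | 0.092 | 0.236 | 0.882 | 0.133 | 0.093 | 0.530 | 0.142 |
| | NucBind | 0.185 | 0.344 | 0.886 | 0.241 | 0.195 | 0.649 | 0.226 |
| | SVMnuc | 0.227 | 0.371 | 0.887 | 0.282 | 0.232 | 0.742 | 0.275 |
| | NCBRPred | 0.234 | 0.471 | 0.899 | 0.312 | 0.284 | 0.660 | 0.302 |
| | aaRNA | 0.422 | 0.360 | 0.870 | 0.389 | 0.318 | 0.803 | 0.359 |
| | GeoBind | 0.562 | 0.455 | 0.891 | 0.503 | 0.446 | 0.804 | 0.459 |
| | GraphBind* | 0.576 | 0.342 | 0.850 | 0.429 | 0.365 | 0.828 | 0.433 |
| | GraphBind | 0.633 | 0.400 | 0.871 | 0.491 | 0.436 | 0.861 | 0.506 |
| | GPSite | 0.557 | 0.541 | 0.910 | 0.549 | 0.499 | 0.899 | 0.573 |
| Peptide | PepNN-Seq | 0.289 | 0.153 | 0.896 | 0.200 | 0.158 | 0.729 | 0.128 |
| | PepBind | 0.062 | 0.576 | 0.956 | 0.112 | 0.178 | 0.655 | 0.148 |
| | PepNN-Struct* | 0.351 | 0.180 | 0.899 | 0.238 | 0.202 | 0.765 | 0.163 |
| | PepNN-Struct | 0.337 | 0.210 | 0.913 | 0.259 | 0.222 | 0.783 | 0.187 |
| | PepBCL | 0.168 | 0.389 | 0.951 | 0.234 | 0.233 | 0.758 | 0.222 |
| | GPSite | 0.257 | 0.481 | 0.954 | 0.335 | 0.330 | 0.836 | 0.345 |
| Protein | DeepPPISP | 0.607 | 0.211 | 0.612 | 0.314 | 0.157 | 0.657 | 0.258 |
| | SPPIDER | 0.603 | 0.309 | 0.746 | 0.409 | 0.292 | 0.778 | 0.375 |
| | MaSIF-site | 0.584 | 0.330 | 0.767 | 0.421 | 0.308 | 0.777 | 0.384 |
| | GraphPPIS | 0.670 | 0.320 | 0.745 | 0.434 | 0.328 | 0.794 | 0.422 |
| | ScanNet* | 0.551 | 0.361 | 0.792 | 0.436 | 0.326 | 0.788 | 0.399 |
| | ScanNet | 0.568 | 0.442 | 0.832 | 0.497 | 0.403 | 0.832 | 0.476 |
| | GPSite | 0.490 | 0.473 | 0.846 | 0.481 | 0.391 | 0.836 | 0.484 |
| ATP | TargetS | 0.451 | 0.549 | 0.971 | 0.495 | 0.483 | 0.855 | 0.447 |
| | GraphBind | 0.529 | 0.473 | 0.967 | 0.499 | 0.483 | 0.901 | 0.503 |
| | GeoBind | 0.614 | 0.479 | 0.967 | 0.538 | 0.526 | 0.927 | 0.534 |
| | DELIA* | 0.452 | 0.669 | 0.976 | 0.539 | 0.538 | 0.914 | 0.545 |
| | DELIA | 0.453 | 0.689 | 0.977 | 0.547 | 0.548 | 0.918 | 0.559 |
| | GPSite | 0.618 | 0.742 | 0.981 | 0.675 | 0.668 | 0.975 | 0.714 |
| HEM | TargetS | 0.504 | 0.756 | 0.959 | 0.605 | 0.598 | 0.892 | 0.581 |
| | GraphBind | 0.733 | 0.505 | 0.939 | 0.598 | 0.578 | 0.926 | 0.638 |
| | DELIA | 0.604 | 0.670 | 0.957 | 0.636 | 0.614 | 0.928 | 0.664 |
| | GeoBind* | 0.646 | 0.625 | 0.954 | 0.635 | 0.611 | 0.920 | 0.659 |
| | GeoBind | 0.707 | 0.710 | 0.964 | 0.709 | 0.689 | 0.932 | 0.724 |
| | GPSite | 0.715 | 0.762 | 0.968 | 0.738 | 0.722 | 0.971 | 0.802 |
| Zn2+ | MIB | 0.744 | 0.219 | 0.946 | 0.339 | 0.385 | 0.935 | 0.394 |
| | TargetS | 0.454 | 0.749 | 0.987 | 0.566 | 0.578 | 0.874 | 0.593 |
| | IonCom* | 0.849 | 0.145 | 0.904 | 0.248 | 0.327 | 0.939 | 0.676 |
| | IonCom | 0.852 | 0.137 | 0.898 | 0.236 | 0.317 | 0.937 | 0.671 |
| | LMetalSite | 0.681 | 0.859 | 0.992 | 0.760 | 0.761 | 0.976 | 0.803 |
| | GPSite | 0.700 | 0.914 | 0.993 | 0.793 | 0.797 | 0.981 | 0.859 |
| Ca2+ | MIB | 0.338 | 0.078 | 0.928 | 0.126 | 0.135 | 0.775 | 0.103 |
| | TargetS | 0.121 | 0.490 | 0.984 | 0.194 | 0.238 | 0.776 | 0.163 |
| | IonCom | 0.297 | 0.247 | 0.975 | 0.269 | 0.258 | 0.698 | 0.166 |
| | DELIA | 0.172 | 0.633 | 0.986 | 0.271 | 0.325 | 0.785 | 0.248 |
| | GeoBind | 0.279 | 0.515 | 0.985 | 0.362 | 0.372 | 0.895 | 0.348 |
| | GraphBind* | 0.290 | 0.537 | 0.985 | 0.377 | 0.388 | 0.836 | 0.335 |
| | GraphBind | 0.371 | 0.623 | 0.987 | 0.465 | 0.475 | 0.888 | 0.430 |
| | LMetalSite | 0.413 | 0.724 | 0.988 | 0.526 | 0.542 | 0.905 | 0.492 |
| | GPSite | 0.435 | 0.820 | 0.990 | 0.569 | 0.593 | 0.921 | 0.565 |
| Mg2+ | MIB | 0.246 | 0.043 | 0.938 | 0.074 | 0.082 | 0.675 | 0.053 |
| | TargetS | 0.118 | 0.491 | 0.990 | 0.190 | 0.237 | 0.724 | 0.148 |
| | IonCom | 0.240 | 0.250 | 0.985 | 0.245 | 0.237 | 0.688 | 0.184 |
| | DELIA | 0.129 | 0.650 | 0.991 | 0.215 | 0.287 | 0.744 | 0.198 |
| | GeoBind | 0.181 | 0.475 | 0.990 | 0.263 | 0.289 | 0.840 | 0.227 |
| | GraphBind* | 0.246 | 0.205 | 0.983 | 0.224 | 0.216 | 0.750 | 0.136 |
| | GraphBind | 0.273 | 0.414 | 0.989 | 0.329 | 0.331 | 0.776 | 0.231 |
| | LMetalSite | 0.245 | 0.728 | 0.991 | 0.367 | 0.419 | 0.865 | 0.316 |
| | GPSite | 0.303 | 0.644 | 0.991 | 0.412 | 0.438 | 0.892 | 0.370 |
| Mn2+ | MIB | 0.462 | 0.096 | 0.946 | 0.159 | 0.193 | 0.856 | 0.168 |
| | IonCom | 0.511 | 0.245 | 0.977 | 0.331 | 0.344 | 0.833 | 0.304 |
| | TargetS | 0.271 | 0.496 | 0.989 | 0.351 | 0.362 | 0.864 | 0.322 |
| | GeoBind | 0.569 | 0.479 | 0.988 | 0.520 | 0.516 | 0.938 | 0.454 |
| | DELIA | 0.502 | 0.665 | 0.992 | 0.572 | 0.574 | 0.902 | 0.489 |
| | GraphBind* | 0.378 | 0.644 | 0.991 | 0.476 | 0.489 | 0.928 | 0.473 |
| | GraphBind | 0.427 | 0.706 | 0.992 | 0.532 | 0.545 | 0.930 | 0.555 |
| | LMetalSite | 0.613 | 0.719 | 0.993 | 0.662 | 0.661 | 0.966 | 0.625 |
| | GPSite | 0.613 | 0.807 | 0.994 | 0.697 | 0.701 | 0.974 | 0.709 |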
  1. Note: The best/second-best AUC and AUPR values are indicated by bold/underlined fonts. For the best experimental structure-based method (measured by AUPR) in each test set, its corresponding result when using ESMFold-predicted structures as input is denoted with *.
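The threshold-dependent columns (Rec, Pre, Acc, F1, MCC) all derive from a confusion matrix over residues. The sketch below shows the standard definitions. The function name `binary_metrics`, the 0.5 cutoff, and the toy data are assumptions; this section does not state how each method's threshold was chosen.

```python
import math

def binary_metrics(labels, scores, threshold=0.5):
    """Compute recall, precision, accuracy, F1 and MCC at a score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    rec = tp / (tp + fn)
    pre = tp / (tp + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * pre * rec / (pre + rec)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"Rec": rec, "Pre": pre, "Acc": acc, "F1": f1, "MCC": mcc}

# Hypothetical residue labels (1 = binding) and predicted scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
metrics = binary_metrics(labels, scores)
print({k: round(v, 3) for k, v in metrics.items()})
```

As a consistency check on the table, the GPSite row for DNA gives 2 × 0.463 × 0.525 / (0.463 + 0.525) ≈ 0.492, which matches the reported F1.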

Appendix 2—table 4
Performance comparison of GPSite with ScanNet and PeSTo on the protein-protein binding site test set from PeSTo (Krapp et al., 2023b).
| Method | AUPR | AUC | MCC |
|---|---|---|---|
| ScanNet | 0.720 | 0.897 | 0.510 |
| PeSTo* | 0.691 | 0.886 | 0.451 |
| PeSTo | 0.797 | 0.929 | 0.636 |
| GPSite | 0.824 | 0.942 | 0.637 |
  1. Note: The performance values of ScanNet and PeSTo were obtained directly from Krapp et al., 2023b. PeSTo* denotes evaluation using the ESMFold-predicted structures as input. The metrics provided are the median AUPR, median AUC and median MCC. The best/second-best results are indicated by bold/underlined fonts.

Appendix 2—table 5
The numbers of proteins with TM-score >0.7 or ≤0.7 between native and ESMFold-predicted structures in the 10 binding site datasets.
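| Molecule type | Training set: >0.7 | Training set: ≤0.7 | Test set: >0.7 | Test set: ≤0.7 |
|---|---|---|---|---|
| DNA | 520 | 141 | 104 | 42 |
| RNA | 428 | 261 | 175 | 171 |
| Peptide | 1074 | 177 | 175 | 60 |
| Protein | 293 | 42 | 321 | 54 |
| ATP | 314 | 33 | 62 | 17 |
| HEM | 159 | 17 | 43 | 5 |
| Zn2+ | 1428 | 218 | 160 | 51 |
| Ca2+ | 1377 | 177 | 150 | 33 |
| Mg2+ | 1565 | 164 | 195 | 40 |
| Mn2+ | 512 | 35 | 52 | 5 |
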
Appendix 2—table 6
The prediction quality of ESMFold measured by TM-score between native and predicted structures in the 10 binding site datasets.
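| Molecule type | Training set: median | Training set: mean | Test set: median | Test set: mean | Total: median | Total: mean |
|---|---|---|---|---|---|---|
| DNA | 0.90 | 0.82 | 0.88 | 0.79 | 0.89 | 0.82 |
| RNA | 0.79 | 0.73 | 0.70 | 0.65 | 0.76 | 0.70 |
| Peptide | 0.93 | 0.86 | 0.88 | 0.78 | 0.93 | 0.85 |
| Protein | 0.94 | 0.87 | 0.93 | 0.85 | 0.93 | 0.86 |
| ATP | 0.95 | 0.89 | 0.90 | 0.83 | 0.94 | 0.88 |
| HEM | 0.95 | 0.89 | 0.94 | 0.87 | 0.94 | 0.88 |
| Zn2+ | 0.94 | 0.87 | 0.91 | 0.82 | 0.93 | 0.86 |
| Ca2+ | 0.95 | 0.88 | 0.93 | 0.85 | 0.94 | 0.88 |
| Mg2+ | 0.95 | 0.90 | 0.93 | 0.86 | 0.95 | 0.89 |
| Mn2+ | 0.96 | 0.92 | 0.95 | 0.91 | 0.96 | 0.91 |
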
Appendix 2—table 7
The ablation studies on protein features and model designs in the 10 binding site test sets.
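| Method | DNA | RNA | Pep | Pro | ATP | HEM | Zn2+ | Ca2+ | Mg2+ | Mn2+ | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| w/o sequence | 0.389 | 0.473 | 0.251 | 0.396 | 0.646 | 0.726 | 0.791 | 0.503 | 0.338 | 0.646 | 0.516 |
| One-hot | 0.429 | 0.506 | 0.254 | 0.427 | 0.645 | 0.755 | 0.840 | 0.564 | 0.359 | 0.673 | 0.545 |
| MSA profile | 0.507 | 0.557 | 0.281 | 0.463 | 0.671 | 0.791 | 0.814 | 0.540 | 0.369 | 0.683 | 0.568 |
| w/o structure | 0.437 | 0.503 | 0.242 | 0.394 | 0.544 | 0.565 | 0.793 | 0.468 | 0.288 | 0.607 | 0.484 |
| w/o geometry | 0.484 | 0.539 | 0.318 | 0.439 | 0.631 | 0.670 | 0.813 | 0.489 | 0.313 | 0.638 | 0.533 |
| Single-task | 0.506 | 0.549 | 0.338 | 0.455 | 0.669 | 0.716 | 0.843 | 0.557 | 0.326 | 0.632 | 0.559 |
| GPSite | 0.516 | 0.573 | 0.345 | 0.484 | 0.714 | 0.802 | 0.859 | 0.565 | 0.370 | 0.709 | 0.594 |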
  1. Note: The numbers in this table are AUPR values. Bold fonts indicate the best results. ‘Pep’ and ‘Pro’ denote peptide and protein, respectively. ‘Avg’ means the average AUPR values among the 10 test sets. ‘One-hot’ denotes replacing the ProtTrans embedding with one-hot sequence encoding. The generation of the MSA profile (PSSM and HMM) is detailed in Generation of the evolutionary features from MSA. ‘w/o structure’ means using a transformer model only input with the ProtTrans sequence features. ‘w/o geometry’ means removing the geometric featurizer in GPSite.
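For the 'One-hot' baseline, a one-hot sequence encoding can be sketched as follows. This is a generic illustration, not the authors' code: the alphabet ordering, the all-zero row for non-standard residues, and the function name `one_hot` are assumptions.

```python
# The 20 standard amino acids; the ordering is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector.

    Residues outside the standard alphabet (e.g. 'X') map to an all-zero row.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in sequence:
        row = [0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1
        rows.append(row)
    return rows

encoded = one_hot("ACX")
print(len(encoded), len(encoded[0]))
```

Unlike a language model embedding, this representation carries no information about a residue's sequence context, which is the contrast the ablation measures.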

Appendix 2—table 8
Performance comparison between GPSite and the baseline model using MSA profile for proteins with different Neff values in the combined test set of the 10 ligands.
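| Neff | Sequences | Residues | MSA AUC | GPSite AUC | p-value |
|---|---|---|---|---|---|
| [1, 2) | 67 | 18,236 | 0.818 | 0.850 | 4.3×10⁻⁸ |
| [2, 3) | 32 | 9395 | 0.856 | 0.854 | 0.72 |
| [3, 4) | 71 | 18,328 | 0.895 | 0.894 | 0.13 |
| [4, 5) | 133 | 30,392 | 0.901 | 0.896 | 4.0×10⁻⁴ |
| [5, 6) | 182 | 39,858 | 0.909 | 0.916 | 9.8×10⁻⁴ |
| [6, 7) | 226 | 60,128 | 0.915 | 0.913 | 0.10 |
| [7, 8) | 257 | 92,791 | 0.920 | 0.931 | 1.1×10⁻⁹ |
| [8, +∞) | 947 | 334,455 | 0.919 | 0.935 | 7.0×10⁻¹⁰ |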
  1. Note: Significance tests are performed following the procedure in Yan and Kurgan, 2017; Xia et al., 2021. If p-value <0.05, the difference between the performance is considered statistically significant.

Appendix 2—table 9
Performance comparison on the 10 binding site test sets under different training and evaluation settings.
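| Setting | DNA | RNA | Pep | Pro | ATP | HEM | Zn2+ | Ca2+ | Mg2+ | Mn2+ | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Train: native; Test: native | 0.587 | 0.634 | 0.368 | 0.552 | 0.746 | 0.846 | 0.905 | 0.705 | 0.428 | 0.786 | 0.656 |
| Train: native; Test: predicted | 0.497 | 0.554 | 0.311 | 0.459 | 0.704 | 0.784 | 0.826 | 0.546 | 0.352 | 0.694 | 0.573 |
| Train: predicted; Test: native | 0.554 | 0.610 | 0.371 | 0.529 | 0.733 | 0.844 | 0.890 | 0.660 | 0.415 | 0.761 | 0.637 |
| Train: predicted; Test: predicted (GPSite) | 0.516 | 0.573 | 0.345 | 0.484 | 0.714 | 0.802 | 0.859 | 0.565 | 0.370 | 0.709 | 0.594 |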
  1. Note: The numbers in this table are AUPR values. ‘Pep’ and ‘Pro’ denote peptide and protein, respectively. ‘Avg’ means the average AUPR values among the 10 test sets. ‘native’ and ‘predicted’ denote applying native and predicted structures as input, respectively.
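The 'Avg' column can be reproduced from the per-ligand values, as in the short check below. The AUPR values are copied from this table; the dictionary layout and variable names are illustrative choices.

```python
# Per-ligand AUPR values from this table, in the order
# DNA, RNA, Pep, Pro, ATP, HEM, Zn2+, Ca2+, Mg2+, Mn2+.
aupr = {
    ("native", "native"): [0.587, 0.634, 0.368, 0.552, 0.746,
                           0.846, 0.905, 0.705, 0.428, 0.786],
    ("native", "predicted"): [0.497, 0.554, 0.311, 0.459, 0.704,
                              0.784, 0.826, 0.546, 0.352, 0.694],
    ("predicted", "native"): [0.554, 0.610, 0.371, 0.529, 0.733,
                              0.844, 0.890, 0.660, 0.415, 0.761],
    ("predicted", "predicted"): [0.516, 0.573, 0.345, 0.484, 0.714,
                                 0.802, 0.859, 0.565, 0.370, 0.709],
}

# Keys are (training structures, test structures).
averages = {k: round(sum(v) / len(v), 3) for k, v in aupr.items()}

# Gain on predicted test structures from also training on predicted structures.
gain = round(averages[("predicted", "predicted")]
             - averages[("native", "predicted")], 3)
print(averages, gain)
```

On predicted test structures, training on predicted rather than native structures raises the average AUPR from 0.573 to 0.594.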

Appendix 2—table 10
Cross-type performance by applying different ligand-specific MLPs in GPSite for the test sets of different ligands.
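Rows: ligand-specific MLP. Columns: ligand-binding site test set.

| Ligand-specific MLP | DNA | RNA | Pep | Pro | ATP | HEM | Zn2+ | Ca2+ | Mg2+ | Mn2+ |
|---|---|---|---|---|---|---|---|---|---|---|
| DNA | 0.516 | 0.461 | 0.158 | 0.327 | 0.123 | 0.425 | 0.032 | 0.033 | 0.028 | 0.072 |
| RNA | 0.381 | 0.573 | 0.170 | 0.332 | 0.189 | 0.549 | 0.038 | 0.049 | 0.037 | 0.093 |
| Pep | 0.170 | 0.199 | 0.345 | 0.410 | 0.089 | 0.479 | 0.046 | 0.027 | 0.028 | 0.080 |
| Pro | 0.187 | 0.214 | 0.201 | 0.484 | 0.031 | 0.117 | 0.030 | 0.026 | 0.015 | 0.025 |
| ATP | 0.193 | 0.319 | 0.165 | 0.296 | 0.714 | 0.762 | 0.036 | 0.076 | 0.062 | 0.138 |
| HEM | 0.231 | 0.316 | 0.236 | 0.321 | 0.544 | 0.802 | 0.073 | 0.026 | 0.040 | 0.086 |
| Zn2+ | 0.076 | 0.164 | 0.069 | 0.197 | 0.077 | 0.115 | 0.859 | 0.136 | 0.111 | 0.622 |
| Ca2+ | 0.091 | 0.197 | 0.079 | 0.234 | 0.151 | 0.074 | 0.114 | 0.565 | 0.317 | 0.460 |
| Mg2+ | 0.117 | 0.206 | 0.091 | 0.232 | 0.265 | 0.208 | 0.192 | 0.468 | 0.370 | 0.597 |
| Mn2+ | 0.108 | 0.196 | 0.095 | 0.226 | 0.245 | 0.237 | 0.627 | 0.390 | 0.321 | 0.709 |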
  1. Note: ‘Pep’ and ‘Pro’ denote peptide and protein, respectively. The numbers in this table are AUPR values. The best/second-best result in each test set is indicated by bold/underlined font.

Author response table 1
Performance comparison of GPSite with ScanNet and PeSTo on the protein-protein binding site test set from PeSTo (Krapp et al., 2023b).
| Method | AUPR | AUC | MCC |
|---|---|---|---|
| ScanNet | 0.720 | 0.897 | 0.510 |
| PeSTo* | 0.691 | 0.886 | 0.451 |
| PeSTo | 0.797 | 0.929 | 0.636 |
| GPSite | 0.824 | 0.942 | 0.637 |
  1. Note: The performance values of ScanNet and PeSTo were obtained directly from Krapp et al., 2023b. PeSTo* denotes evaluation using the ESMFold-predicted structures as input. The metrics provided are the median AUPR, median AUC and median MCC. The best/second-best results are indicated by bold/underlined fonts.

Author response table 2
Performance comparison on the ten binding site test sets under different training and evaluation settings.
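| Setting | DNA | RNA | Pep | Pro | ATP | HEM | Zn2+ | Ca2+ | Mg2+ | Mn2+ | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Train: native; Test: native | 0.587 | 0.634 | 0.368 | 0.552 | 0.746 | 0.846 | 0.905 | 0.705 | 0.428 | 0.786 | 0.656 |
| Train: native; Test: predicted | 0.497 | 0.554 | 0.311 | 0.459 | 0.704 | 0.784 | 0.826 | 0.546 | 0.352 | 0.694 | 0.573 |
| Train: predicted; Test: native | 0.554 | 0.610 | 0.371 | 0.529 | 0.733 | 0.844 | 0.890 | 0.660 | 0.415 | 0.761 | 0.637 |
| Train: predicted; Test: predicted (GPSite) | 0.516 | 0.573 | 0.345 | 0.484 | 0.714 | 0.802 | 0.859 | 0.565 | 0.370 | 0.709 | 0.594 |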
  1. Note: The numbers in this table are AUPR values. “Pep” and “Pro” denote peptide and protein, respectively. “Avg” means the average AUPR values among the ten test sets. “native” and “predicted” denote applying native and predicted structures as input, respectively.

Author response table 3
Performance comparison between GPSite and the baseline model using MSA profile for proteins with different Neff values in the combined test set of the ten ligands.
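| Neff | Sequences | Residues | MSA AUC | GPSite AUC | P-value |
|---|---|---|---|---|---|
| [1, 2) | 67 | 18,236 | 0.818 | 0.850 | 4.3×10⁻⁸ |
| [2, 3) | 32 | 9395 | 0.856 | 0.854 | 0.72 |
| [3, 4) | 71 | 18,328 | 0.895 | 0.894 | 0.13 |
| [4, 5) | 133 | 30,392 | 0.901 | 0.896 | 4.0×10⁻⁴ |
| [5, 6) | 182 | 39,858 | 0.909 | 0.916 | 9.8×10⁻⁴ |
| [6, 7) | 226 | 60,128 | 0.915 | 0.913 | 0.10 |
| [7, 8) | 257 | 92,791 | 0.920 | 0.931 | 1.1×10⁻⁹ |
| [8, +∞) | 947 | 334,455 | 0.919 | 0.935 | 7.0×10⁻¹⁰ |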
  1. Note: Significance tests are performed following the procedure in Yan and Kurgan, 2017; Xia et al., 2021. If P-value < 0.05, the difference between the performance is considered statistically significant.

Author response table 4
Cross-type performance by applying different ligand-specific MLPs in GPSite for the test sets of different ligands.
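Rows: ligand-specific MLP. Columns: ligand-binding site test set.

| Ligand-specific MLP | DNA | RNA | Pep | Pro | ATP | HEM | Zn2+ | Ca2+ | Mg2+ | Mn2+ |
|---|---|---|---|---|---|---|---|---|---|---|
| DNA | 0.516 | 0.461 | 0.158 | 0.327 | 0.123 | 0.425 | 0.032 | 0.033 | 0.028 | 0.072 |
| RNA | 0.381 | 0.573 | 0.170 | 0.332 | 0.189 | 0.549 | 0.038 | 0.049 | 0.037 | 0.093 |
| Pep | 0.170 | 0.199 | 0.345 | 0.410 | 0.089 | 0.479 | 0.046 | 0.027 | 0.028 | 0.080 |
| Pro | 0.187 | 0.214 | 0.201 | 0.484 | 0.031 | 0.117 | 0.030 | 0.026 | 0.015 | 0.025 |
| ATP | 0.193 | 0.319 | 0.165 | 0.296 | 0.714 | 0.762 | 0.036 | 0.076 | 0.062 | 0.138 |
| HEM | 0.231 | 0.316 | 0.236 | 0.321 | 0.544 | 0.802 | 0.073 | 0.026 | 0.040 | 0.086 |
| Zn2+ | 0.076 | 0.164 | 0.069 | 0.197 | 0.077 | 0.115 | 0.859 | 0.136 | 0.111 | 0.622 |
| Ca2+ | 0.091 | 0.197 | 0.079 | 0.234 | 0.151 | 0.074 | 0.114 | 0.565 | 0.317 | 0.460 |
| Mg2+ | 0.117 | 0.206 | 0.091 | 0.232 | 0.265 | 0.208 | 0.192 | 0.468 | 0.370 | 0.597 |
| Mn2+ | 0.108 | 0.196 | 0.095 | 0.226 | 0.245 | 0.237 | 0.627 | 0.390 | 0.321 | 0.709 |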
  1. Note: “Pep” and “Pro” denote peptide and protein, respectively. The numbers in this table are AUPR values. The best/second-best result in each test set is indicated by bold/underlined font.

  1. Qianmu Yuan
  2. Chong Tian
  3. Yuedong Yang
(2024)
Genome-scale annotation of protein binding sites via language model and geometric deep learning
eLife 13:RP93695.
https://doi.org/10.7554/eLife.93695.3