Amy L. Bauer, William S. Hlavacek, Pat J. Unkefer and Fangping Mu (Submitted) Using Sequence-specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites.
Computational results for 54 TFs with five or more known binding sites documented in RegulonDB. For the BvH, Match, and MATRIX SEARCH methods, the cutoff value is defined as the mean score of the N positive training examples. For the QPMEME method, the cutoff value is set to -1, and for the SiteSleuth method, it is set to 0. For MATRIX SEARCH and QPMEME, the background frequencies are defined as A: 0.25007276777571225, T: 0.2507150646759991, G: 0.23820269415412798, and C: 0.26100947339416075. The cross-validation score V, given in parentheses, is defined as the fraction of positive test examples predicted to be true binding sites. In 3-fold cross-validation, the available positive examples are divided into non-overlapping training and testing sets, as described in the main text. Models are built from the training set and tested on the remaining positive examples. Each model (derived through any of the five methods considered here) is built with the cutoff value set as defined above.
| TF Name | Training Set Size | BvH | Match | MATRIX SEARCH | QPMEME | SiteSleuth |
|---|---|---|---|---|---|---|
| AgaR | 11 | 465 (0.09) | 2236 (0.09) | 360 (0) | 1884 (0.05) | 167 (0.27) |
| AraC | 20 | 33412 (0.25) | 52734 (0.2) | 33390 (0.35) | 102457 (0.18) | 2442 (0.3) |
| ArcA | 91 | 73895 (0.23) | 76283 (0.29) | 72860 (0.23) | 132264 (0.31) | 25229 (0.27) |
| ArgR | 24 | 671 (0.5) | 1686 (0.5) | 652 (0.46) | 18912 (0.4) | 2171 (0.58) |
| CpxR | 33 | 33101 (0.33) | 55743 (0.39) | 33221 (0.33) | 368930 (0.41) | 13964 (0.39) |
| CRP | 260 | 7639 (0.63) | 11030 (0.64) | 7580 (0.63) | 1069283 (0.8) | 10189 (0.67) |
| CysB | 8 | 278 (0) | 1551 (0) | 146 (0) | 729 (0) | 33 (0) |
| CytR | 14 | 202 (0.29) | 1505 (0.21) | 175 (0.43) | 2599 (0.16) | 133 (0.43) |
| DeoR | 7 | 6957 (0.14) | 16652 (0) | 6191 (0) | 20923 (0.07) | 467 (0.14) |
| DgsA | 8 | 17 (0) | 224 (0.25) | 13 (0) | 84 (0.1) | 142 (0.38) |
| DnaA | 10 | 48972 (0) | 108426 (0.1) | 47570 (0.1) | 76486 (0.19) | 2724 (0.6) |
| FadR | 10 | 4238 (0.3) | 14726 (0.3) | 3920 (0.3) | 33838 (0.1) | 754 (0.2) |
| Fis | 133 | 202096 (0.44) | 262224 (0.47) | 199966 (0.43) | 1506632 (0.61) | 129150 (0.33) |
| FlhDC | 20 | 162411 (0.25) | 234753 (0) | 163533 (0.2) | 492630 (0.09) | 5688 (0.15) |
| FNR | 85 | 4900 (0.51) | 8543 (0.56) | 4928 (0.51) | 340882 (0.67) | 2463 (0.53) |
| FruR | 13 | 21 (0.23) | 97 (0.38) | 21 (0.15) | 263 (0.22) | 661 (0.69) |
| Fur | 54 | 25843 (0.54) | 37387 (0.54) | 25838 (0.48) | 275020 (0.54) | 24684 (0.59) |
| GadE | 5 | 54 (0.2) | 368 (0) | 29 (0) | 255 (0.16) | 12 (0.4) |
| GalR | 10 | 244 (0.4) | 996 (0.4) | 206 (0.4) | 3769 (0.25) | 198 (0.7) |
| GalS | 9 | 149 (0.22) | 528 (0.56) | 126 (0.22) | 2591 (0.2) | 395 (0.67) |
| GcvA | 5 | 14 (0) | 141 (0) | 11 (0) | 49 (0) | 22 (0) |
| GlpR | 23 | 10195 (0.09) | 27198 (0.22) | 10063 (0.04) | 31268 (0.17) | 3661 (0.3) |
| GntR | 17 | 2358 (0.24) | 7342 (0.29) | 2317 (0.24) | 23530 (0.27) | 1455 (0.59) |
| H-NS | 34 | 218778 (0.32) | 303046 (0.24) | 218934 (0.18) | 597416 (0.28) | 31154 (0.06) |
| IclR | 10 | 232538 (0) | 319893 (0.1) | 235576 (0.1) | 294673 (0.1) | 11357 (0.1) |
| IHF | 87 | 167137 (0.41) | 316148 (0.47) | 167122 (0.41) | 1050684 (0.63) | 16056 (0.18) |
| IscR | 8 | 7 (0) | 50 (0) | 5 (0) | 30 (0) | 104 (0.25) |
| LexA | 24 | 259 (0.54) | 1745 (0.67) | 254 (0.46) | 30290 (0.35) | 1578 (0.71) |
| Lrp | 84 | 541474 (0.31) | 625587 (0.3) | 539556 (0.3) | 1319479 (0.56) | 3196 (0.3) |
| MalT | 20 | 35743 (0.2) | 55431 (0.3) | 35261 (0.2) | 125337 (0.34) | 6468 (0.55) |
| MarA | 16 | 39156 (0.13) | 85183 (0.13) | 38597 (0.06) | 137897 (0.06) | 3808 (0.06) |
| MelR | 8 | 146 (0) | 680 (0.25) | 129 (0.25) | 1114 (0.14) | 30 (0.5) |
| MetJ | 27 | 634694 (0.26) | 1137063 (0.37) | 621589 (0.41) | 1449353 (0.31) | 27334 (0.33) |
| MetR | 6 | 174 (0) | 800 (0) | 114 (0) | 778 (0.07) | 4 (0.33) |
| ModE | 8 | 95 (0) | 765 (0.25) | 73 (0) | 724 (0.14) | 29 (0.5) |
| Nac | 10 | 22752 (0) | 48020 (0) | 21363 (0.1) | 53086 (0.08) | 2382 (0.1) |
| NagC | 14 | 27 (0.14) | 110 (0.21) | 24 (0) | 100 (0.17) | 843 (0.5) |
| NanR | 6 | 973 (0.5) | 973 (0.5) | 973 (0.67) | 15678 (0.18) | 664 (0.83) |
| NarL | 91 | 426754 (0.37) | 472550 (0.46) | 431808 (0.46) | 1682666 (0.72) | 92451 (0.65) |
| NarP | 16 | 115566 (0.19) | 129839 (0.19) | 115658 (0.06) | 216367 (0.33) | 2823 (0.44) |
| NtrC | 22 | 1248 (0.41) | 6264 (0.5) | 1200 (0.41) | 14721 (0.36) | 2843 (0.73) |
| OmpR | 20 | 11612 (0.15) | 18694 (0.1) | 11637 (0.15) | 34207 (0.16) | 3532 (0.3) |
| OxyR | 9 | 1029 (0) | 3870 (0) | 768 (0) | 2397 (0) | 67 (0) |
| PhoB | 14 | 2579 (0.29) | 9954 (0.21) | 2537 (0.21) | 35604 (0.13) | 1211 (0.36) |
| PhoP | 22 | 470 (0.5) | 1778 (0.5) | 437 (0.41) | 6042 (0.31) | 2781 (0.55) |
| PspF | 5 | 1280 (0) | 4727 (0) | 1116 (0) | 2671 (0) | 122 (0) |
| PurR | 18 | 20483 (0.17) | 81537 (0.39) | 19980 (0.11) | 118837 (0.23) | 3956 (0.39) |
| RcsAB | 5 | 442 (0) | 1011 (0) | 382 (0) | 910 (0) | 230 (0) |
| Rob | 6 | 4712 (0) | 17587 (0) | 4019 (0) | 9418 (0) | 177 (0) |
| SoxS | 18 | 170007 (0) | 403160 (0.06) | 168981 (0.06) | 265797 (0.02) | 5940 (0) |
| TorR | 8 | 2060 (0.13) | 6429 (0.5) | 2027 (0) | 9394 (0.28) | 1520 (0.63) |
| TrpR | 10 | 10 (0.2) | 16 (0.2) | 9 (0.1) | 68 (0.19) | 78 (0.7) |
| TyrR | 19 | 15695 (0.11) | 40645 (0.16) | 15431 (0) | 54411 (0.19) | 1551 (0.32) |
| UxuR | 5 | 5 (0) | 7 (0) | 5 (0) | 8 (0.1) | 84 (0.6) |
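
The cutoff and cross-validation definitions in the caption can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`three_fold_splits`, `build_consensus_scorer`, `cross_validation_score`), the consensus-match scorer, the interleaved fold assignment, the `>=` comparison, and the pooling of test predictions across folds are all assumptions made for illustration; none of the five methods in the table is reproduced here. The sketch only shows how a mean-score cutoff and the fraction V fit together.

```python
from collections import Counter
from statistics import mean


def three_fold_splits(sites):
    """Yield (train, test) pairs; each site lands in exactly one test set.

    Assumption: folds are formed by interleaving; the main text may differ.
    """
    folds = [sites[i::3] for i in range(3)]
    for k in range(3):
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield train, folds[k]


def build_consensus_scorer(train):
    """Hypothetical stand-in model: score = number of matches to the consensus."""
    consensus = [Counter(col).most_common(1)[0][0] for col in zip(*train)]

    def score(site):
        return sum(a == b for a, b in zip(site, consensus))

    return score


def cross_validation_score(sites, build_model=build_consensus_scorer):
    """Fraction of held-out positive examples scoring at or above the cutoff."""
    hits = total = 0
    for train, test in three_fold_splits(sites):
        score = build_model(train)
        # Cutoff = mean score of the positive training examples,
        # as the caption defines for BvH, Match, and MATRIX SEARCH.
        cutoff = mean(score(s) for s in train)
        hits += sum(score(s) >= cutoff for s in test)
        total += len(test)
    return hits / total


if __name__ == "__main__":
    # Toy aligned sites, invented for this example (not from RegulonDB).
    toy_sites = ["TGTGA", "TGTGA", "TGTGC", "TGTGA", "AGTGA", "TGTGA"]
    print(cross_validation_score(toy_sites))
```

For a method with a fixed cutoff, such as the -1 and 0 values stated for QPMEME and SiteSleuth, the `mean(...)` line would be replaced by that constant.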