Align cellobiose phosphorylase (EC 2.4.1.20) (characterized)
to candidate GFF2709 Psest_2763 Cellobiose phosphorylase
Query= BRENDA::Q9X2G3 (813 letters)

>FitnessBrowser__psRCH2:GFF2709
Length = 2843

Score = 332 bits (850), Expect = 2e-94
Identities = 255/841 (30%), Positives = 394/841 (46%), Gaps = 84/841 (9%)

Query: 4    GYFDDVNREYVITTP---QTPYPWINYLGTEDFFSIISHMAGGYCFYKDARLRRITRFRY 60
            G FD REYV TP PWIN + F +S GY + +++R ++T +
Sbjct: 2047 GGFDKDGREYVTLLDAGANTPAPWINVIANPQFGFQVSAQGSGYTWAENSRENQLTPWS- 2105

Query: 61   NNVPTDAGGRYFYIREEN-GDFWTPTWMPVRKDLSFFEARHGLGYTKITGERNGLRATIT 119
            N+ TD G FY+R+E+ G ++PT P+R D + ARHG GY++ + +G+ +
Sbjct: 2106 NDPVTDPCGEAFYVRDEDSGALFSPTAQPIR-DTGLYVARHGHGYSRFEHQADGIAMDLL 2164

Query: 120  YFVPRHFTGEVHYLVLENKAEKPRKIKLFSFIEFCLWNALDDMTNF---QRNYSTGEVEI 176
            +VP ++ L L N + PR++ + + E+ L A F +R+ S G
Sbjct: 2165 EYVPLADPIKISRLTLRNLSAVPRRLSVTRYSEWVLGTARGANAPFIITERDESCG---- 2220

Query: 177  EGSVIYHKTEYRER-RNHYAFYSVNQPIDGFDTDRESFIGLYSGFEAPQAVVEGKP-RNS 234
            ++ +T + AF + + DR +G SG P A++ G P +
Sbjct: 2221 ---MLLARTPWSSAFPGRVAFADLGGRQTAWTADRRELLGRNSGPATPAALLTGAPLTGA 2277

Query: 235  VASGWAPIASHYLEIELAPSEKKELIFILGYVENPEEEKWEKPGVINKKRAKEMIEKFKT 294
            +G P A+ +ELA E E+I +G + + A+ ++E+++
Sbjct: 2278 TGAGMDPCAALQTRVELAAGESIEIIAFIGQCPSADA-------------ARALVERYRQ 2324

Query: 295  GEDVEHALKELREYWDDLLGRIQVETHDEKLNRMVNIWNQYQCMVTFNISRSASYFESGI 354
            D++ L E+ E+W LG +QV+T D ++ M+N W YQ + +RSA Y SG
Sbjct: 2325 -TDLDAVLLEVTEHWRSALGAVQVKTPDRAMDIMLNGWLLYQTLACRIWARSAFYQASG- 2382

Query: 355  SRGIGFRDSNQDILGFVHMIPEKARQRILDLASIQFEDGSTYHQFQPLTKKGNNEIGGGF 414
            GFRD QD + PE R IL AS QF +G H + P + +G +
Sbjct: 2383 --AYGFRDQLQDGMALTFSRPEATRSHILRAASRQFPEGDVQHWWLPHSGQG---VRTRI 2437

Query: 415  NDDPLWLILSTSAYIKETGDWSILGEEVPFDNDPNKK----------------ASLFEHL 458
            +DD +WL +T+ YI+ GD +IL E V F P K A LFEH
Sbjct: 2438 SDDRVWLAFATATYIQVAGDATILDEPVTFLEGPLLKPGEHDAFFQPMMAGDAAPLFEHC 2497

Query: 459  KRSFYFTVNNLGPHGLPLIGRADWNDCLNLNCFSKNPDESFQTTVNALDGRVAESVFIAG 518
            R + G GLPLIG DWND +N DG+ ESV++
Sbjct: 2498 ARGLDQCLELTGELGLPLIGGGDWNDGMNRV---------------GEDGK-GESVWLGW 2541

Query: 519  LFVLAGKEFVEICKRRGLEEEAREAE---KHVNKMIETTLKYGWDGEWFLRAYDAFGRKV 575
            L + + F + +RG + + AE KH + ++ + WDGEW+ RA G +
Sbjct: 2542 LLLRTIELFAPLADQRGTVADVQRAERWRKHAQALADSLEEKAWDGEWYRRATFDDGTWL 2601

Query: 576  GSKECEEGKIFIEPQGMCVMAGIGVDNGYAEKALDSVKKYLDTPY-GLVLQ-QPAYSRYY 633
            GSK+ +E +I Q V++G D A++A+ SV+++L GL L P + +
Sbjct: 2602 GSKDSDECRIDSIAQSWAVLSG-AADPARAKQAMASVRQHLIREQDGLALLFTPPFDKTE 2660

Query: 634  IELGEISSYPPGYKENAGIFCHNNPWVAIAETVIGRGD---RAFEIYRKITPAYLEDISE 690
            E G I YPPG +EN G + H W +A +G GD R F + I A + S
Sbjct: 2661 KEPGYIKGYPPGLRENGGQYSHAAMWAMLAFAKLGDGDAACRMFSLLNPINHALTPEGSR 2720

Query: 691  IHRTEPYVYAQMVAGKDAPRHGEAKNSWLTGTAAWSFVAITQHILGIRPTYDSLVVDPCI 750
            ++ EPYV A V G AP G +W TG A W A + ILGIR + L+VDPCI
Sbjct: 2721 RYKVEPYVVAADVYGV-APHKGRGGWTWYTGAAGWMHRAGVEGILGIRREGEWLIVDPCI 2779

Query: 751  PKEWEGFRITRKFRGSIYDITVKNPSHVSKGVKEIIVDGKKIE---GQVLPVFEDGKVHR 807
            +W GF T + Y I ++NP+ ++G++ +D + ++ G V ++G+ H+
Sbjct: 2780 SSQWPGFEATITLGETRYAIRLENPTQANRGIQHAQLDERSLDCTNGWVRLALDEGQ-HQ 2838

Query: 808  V 808
            V
Sbjct: 2839 V 2839

Lambda K H
0.320 0.139 0.431

Gapped
Lambda K H
0.267 0.0410 0.140

Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 4892
Number of extensions: 251
Number of successful extensions: 12
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 813
Length of database: 2843
Length adjustment: 50
Effective length of query: 763
Effective length of database: 2793
Effective search space: 2131059
Effective search space used: 2131059
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.8 bits)
S2: 60 (27.7 bits)
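The summary figures in the report above can be checked with a little arithmetic. The Python sketch below recomputes the identity, positive, and gap percentages, the fraction of each sequence spanned by the alignment, and the effective search space, using only numbers printed in the report. The function names are illustrative and are not part of BLAST or GapMind. Two points are assumptions: coverage is taken as the aligned span divided by the sequence length, and percentages are truncated rather than rounded, which is what matches the printed values (84/841 is 9.99%, reported as 9%).

```python
def truncated_percent(part, whole):
    """Integer percentage, truncated toward zero (assumed to match the report)."""
    return (100 * part) // whole


def alignment_summary():
    """Recompute the report's summary statistics from its printed numbers."""
    columns = 841                      # alignment length, including gap columns
    identities, positives, gaps = 255, 394, 84
    query_start, query_end, query_length = 4, 808, 813
    subject_start, subject_end, subject_length = 2047, 2839, 2843
    length_adjustment = 50             # "Length adjustment" line in the report

    query_span = query_end - query_start + 1
    subject_span = subject_end - subject_start + 1
    return {
        "identity_pct": truncated_percent(identities, columns),
        "positive_pct": truncated_percent(positives, columns),
        "gap_pct": truncated_percent(gaps, columns),
        # Coverage as aligned span / full length (an assumed definition).
        "query_coverage_pct": round(100 * query_span / query_length, 1),
        "subject_coverage_pct": round(100 * subject_span / subject_length, 1),
        # Effective lengths are the raw lengths minus the length adjustment.
        "effective_search_space": (query_length - length_adjustment)
        * (subject_length - length_adjustment),
    }


if __name__ == "__main__":
    for name, value in alignment_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions the alignment spans about 99% of the 813-residue query but only about 28% of the 2843-residue candidate, and the product of the two effective lengths (763 and 2793) reproduces the reported search space of 2131059.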
This GapMind analysis is from Sep 17 2021. The underlying query database was built on Sep 17 2021.
Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.
A candidate for a step is "high confidence" if either:
Otherwise, a candidate is "medium confidence" if either:
Other blast hits with at least 50% coverage are "low confidence."
Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:
GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
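To illustrate what searching the six-frame translation involves, here is a minimal Python sketch that translates a DNA sequence in all three forward and three reverse-complement reading frames. It is not GapMind's or Curated BLAST's code; the function names are hypothetical, the standard genetic code is assumed, and any codon containing a character other than A, C, G, or T is translated as X.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")   # X for codons with ambiguous bases
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels (+1..+3, -1..-3) to the translated protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCC").items():
        print(frame, protein)
```

A protein that gene prediction missed would appear as an open stretch (no `*`) in one of the six frames, which is why a search against these translations can recover candidates that are absent from the predicted protein set.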
For more information, see:
If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.
by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory