# Chorismate is the starting point for the biosynthesis of the aromatic amino acids
# phenylalanine, tryptophan, and tyrosine.
# Chorismate biosynthesis in GapMind is based on
# MetaCyc pathways chorismate biosynthesis I (metacyc:ARO-PWY), from
# D-erythrose-4-phosphate and phosphoenolpyruvate, or
# II (metacyc:PWY-6165), from D-glyceraldehyde-3-phosphate and L-aspartate.
# Both pathways are identical after they reach 3-dehydroquinate.

# This is also known as DAHP (3-deoxy-D-arabino-heptulosonate 7-phosphate) synthase.
# Add CA265_RS11635 (a fusion with chorismate mutase) because it is diverged,
# is confirmed by cofitness, and is essential in other Bacteroidetes.
# uniprot:P39912 (Bacillus subtilis aroA) is annotated in BRENDA as chorismate mutase but it also has this activity
# (PMC1198938).
# CH_123440 is annotated as this but without the EC number.
# 4.1.2.15 is an obsolete EC number, but it appears in a few entries, so is ignored.
# uniprot:C9K7C8 is annotated as this in SwissProt but its function is uncertain, so it is ignored.
aroG 3-deoxy-7-phosphoheptulonate synthase EC:2.5.1.54 curated:BRENDA::P39912 curated:CharProtDB::CH_123440 curated:reanno::Pedo557:CA265_RS11635 ignore_other:EC 4.1.2.15 ignore:SwissProt::C9K7C8

# (There's also the obsolete EC number 4.6.1.3, no longer used.)
aroB 3-dehydroquinate synthase EC:4.2.3.4

# uniprot:B9CK59 may be misannotated in BRENDA. Several plant shikimate dehydrogenases may also be
# 3-dehydroquinate dehydratases (fusion proteins), so any similarity to shikimate dehydrogenase
# (EC 1.1.1.24 or 1.1.1.25 or 1.1.1.282) is ignored.
# PYRFU_RS04235 (uniprot:G0EDV3) from Pyrolobus fumarii is diverged, especially at the C terminal part,
# but has conserved functional residues, and a homolog from Sedimentisphaera salicampi is fused to
# aroE, so it is probably the missing aroD.
aroD 3-dehydroquinate dehydratase EC:4.2.1.10 ignore:BRENDA::B9CK59 ignore_other:EC 1.1.1.24 ignore_other:EC 1.1.1.25 ignore_other:EC 1.1.1.282 predicted:G0EDV3

# EC:1.1.1.282 is with NAD(P)H instead of NADPH.
# BT4215 from Bacteroides thetaiotaomicron (uniprot:Q8A006_BACTN) is diverged, is the only good candidate,
# and is essential in various Bacteroidetes.
# Ga0059261_2194 / BDW16_RS10815 (uniprot:A0A2M8WD96) has auxotrophic phenotypes in RB-TnSeq data and
# can complement an aroE- strain of E. coli (Bradley Biggs).
# CH_122204 in CharProtDB is ignored because it is probably quinate dehydrogenase, not shikimate dehydrogenase.
aroE shikimate dehydrogenase EC:1.1.1.25 EC:1.1.1.282 uniprot:Q8A006_BACTN uniprot:A0A2M8WD96 ignore:CharProtDB::CH_122204

# In E. coli, AroL and AroK are isozymes.
# In Bacillus subtilis, this gene was known as AroI, and it was cloned by complementation
# (see A. Nakane et al, J. Fermentation and Bioengineering 1994, 77:312-314.)
# That sequence is identical to uniprot:AROK_BACSU.
# Manually add BT3393 (uniprot:AROK_BACTN) from B. thetaiotaomicron because it is diverged, is the only good candidate,
# and is essential in various Bacteroidetes.
# Similarly for Echvi_0140 (uniprot:L0FT15_ECHVK) from Echinicola vietnamensis.
# And DVU0892 (uniprot:AROK_DESVH) from D. vulgaris Hildenborough is confirmed by cofitness
# CCNA_03103 (uniprot:AROK_CAUVN) is confirmed by cofitness and similar proteins
# such as SMc00695 (uniprot:AROK_RHIME) and PGA1_c14090 (uniprot:A0A135IJ25) are essential.
aroL shikimate kinase EC:2.7.1.71 uniprot:AROK_BACSU uniprot:AROK_BACTN uniprot:L0FT15_ECHVK uniprot:AROK_DESVH uniprot:AROK_CAUVN uniprot:AROK_RHIME uniprot:A0A135IJ25

# Add AroA from Desulfovibrio vulgaris (DVU0463) because it is a bit diverged, is conserved essential,
# and clusters with aromatic amino acid biosynthesis genes.
# HMPREF1058_RS13970 (uniprot:I9A2E3) is cofit with chorismate synthase (Surya Tripathi); also, it is 71% identical to
# BT2186 / BT_RS11065, which can complement an aroA- strain of E. coli (Bradley Biggs).
# PYRFU_RS00635 (uniprot:G0EEF0) from Pyrolobus fumarii is from this family, has similar active site residues
# (alignment to Q83E11 shows three conserved residues, and D315 => R331), and
# its closest homologs are in chorismate synthesis operons (but, these are under 30% identity);
# it is probably the missing aroA.
aroA 3-phosphoshikimate 1-carboxyvinyltransferase EC:2.5.1.19 uniprot:Q72EV5_DESVH uniprot:I9A2E3 uniprot:L0FR45 predicted:G0EEF0

# (There's also obsolete EC 4.6.1.4, no longer used)
aroC chorismate synthase EC:4.2.3.5

# The triose-phosphate isomerase tpiA is also thought to convert D-glyceraldehyde 3-phosphate to enolaldehyde, which
# spontaneously converts to methylglyoxal.
# (Alternatively, methylglyoxal might be formed by methylglyoxal synthase, EC 4.2.3.3?)
# Ignore uniprot:P00941, which is misannotated in BRENDA.
tpiA D-glyceraldehyde-3-phosphate phospholyase EC:5.3.1.1 ignore:BRENDA::P00941

# 6-deoxy-5-ketofructose-1-phosphate synthase is an activity of some fructose-bisphosphate aldolases
# (which are usually annotated as 4.1.2.13). To find the fbp in Desulfovibrio vulgaris Hildenborough
# and Miyazaki F, it is necessary to match more broadly.
# MetaCyc reports that AroA' from Methanococcus jannaschii (metacyc:MONOMER-14592)
# also has activity as a fructose-bisphosphate aldolase, but it's not clear that
# it carries out this reaction.
# The bifunctional fructose-1,6-bisphosphate aldolase/phosphatases are ignored because it
# is not obvious that they would carry out this reaction; this includes
# uniprot:Q980K6, uniprot:A3MSD2, uniprot:Q8NKR9, uniprot:A0RV30, uniprot:A4YIZ5, uniprot:A8A9E4,
# uniprot:B1YAL1, uniprot:B6YTP6, uniprot:D9PUH5, uniprot:F9VMT6, uniprot:Q2RG86, uniprot:Q72K02.
# And ignore CharProtDB items with incorrect EC and very short SwissProt entries.
fbp 6-deoxy-5-ketofructose 1-phosphate synthase EC:2.2.1.11 EC:4.1.2.13 ignore:metacyc::MONOMER-14592 ignore_other:fructose%bisphosphate aldolase ignore:SwissProt::P84722 ignore:SwissProt::P86979 ignore:SwissProt::P86980 ignore:BRENDA::Q980K6 ignore:BRENDA::A3MSD2 ignore:SwissProt::Q8NKR9 ignore:SwissProt::A0RV30 ignore:SwissProt::A4YIZ5 ignore:SwissProt::A8A9E4 ignore:SwissProt::B1YAL1 ignore:SwissProt::B6YTP6 ignore:SwissProt::D9PUH5 ignore:SwissProt::F9VMT6 ignore:SwissProt::Q2RG86 ignore:SwissProt::Q72K02

import met.steps:aspartate-semialdehyde

# aroA' condenses 6-deoxy-5-ketofructose 1-phosphate with L-aspartate 4-semialdehyde
aroA' 2-amino-3,7-dideoxy-D-threo-hept-6-ulosonate synthase EC:2.2.1.10

# Ignore uniprot:P81230, which is misannotated as this in BRENDA
aroB' dehydroquinate synthase II EC:1.4.1.24 ignore:BRENDA::P81230

# Pathway I uses aroG and aroB, while pathway II uses non-canonical activities of triose-phosphate
# isomerase (tpiA) and fructose-bisphosphate aldolase (fbp)
# to form 6-deoxy-5-ketofructose 1-phosphate. AroA' condenses this with aspartate semialdehyde
# to 2-amino-3,7-dideoxy-D-threo-hept-6-ulosonate,
# and AroB' cyclizes it to 3-dehydroquinate.
3-dehydroquinate: aroG aroB
3-dehydroquinate: tpiA fbp aspartate-semialdehyde aroA' aroB'
all: 3-dehydroquinate aroD aroE aroL aroA aroC
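The three rules at the end of the definition can be read as a small grammar: a name with several rules has several alternatives, and "all" is the whole pathway. The Python sketch below is an illustration only, not GapMind's own code. It expands the rules into flat lists of steps; the function names `parse_rules` and `expand` are hypothetical, and the imported `aspartate-semialdehyde` rule (defined in met.steps, not shown here) is left as a single unexpanded item.

```python
from itertools import product

# Rules copied from the definition above; each line is "name: part part ...".
RULES_TEXT = """\
3-dehydroquinate: aroG aroB
3-dehydroquinate: tpiA fbp aspartate-semialdehyde aroA' aroB'
all: 3-dehydroquinate aroD aroE aroL aroA aroC
"""


def parse_rules(text):
    """Map each rule name to its list of alternatives (each a list of parts)."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, parts = line.partition(":")
        rules.setdefault(name.strip(), []).append(parts.split())
    return rules


def expand(name, rules):
    """Return every way to satisfy `name`, each as a flat list of parts.

    A part with no rule of its own is treated as a step. This includes
    aspartate-semialdehyde, whose rule lives in another file (assumption:
    it is left unexpanded here).
    """
    if name not in rules:
        return [[name]]
    paths = []
    for alternative in rules[name]:
        expanded_parts = [expand(part, rules) for part in alternative]
        # Combine one expansion of each part, in order.
        for combo in product(*expanded_parts):
            paths.append([step for chunk in combo for step in chunk])
    return paths


rules = parse_rules(RULES_TEXT)
paths = expand("all", rules)
for path in paths:
    print(" ".join(path))
# aroG aroB aroD aroE aroL aroA aroC
# tpiA fbp aspartate-semialdehyde aroA' aroB' aroD aroE aroL aroA aroC
```

The two printed lines correspond to pathway I and pathway II, which share every step after 3-dehydroquinate.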
Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.
A candidate for a step is "high confidence" if either:
Otherwise, a candidate is "medium confidence" if either:
Other blast hits with at least 50% coverage are "low confidence."
Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:
GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
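The definition of a gap given above (a step with no high- or medium-confidence candidate) can be sketched in a few lines of Python. This is a minimal illustration, not GapMind's implementation: the function name `find_gaps` is hypothetical, and the confidence values in `best_confidence` are made up for the example rather than taken from any real genome.

```python
# Confidence levels that count as a usable candidate for a step.
CONFIDENT = {"high", "medium"}


def find_gaps(path, best_confidence):
    """Return the steps in `path` that lack a high- or medium-confidence candidate.

    `best_confidence` maps a step name to the best level found for it;
    a step that is absent from the mapping has no candidate at all.
    """
    return [step for step in path if best_confidence.get(step) not in CONFIDENT]


# Steps of chorismate biosynthesis pathway I, from the rules in the definition.
path_i = ["aroG", "aroB", "aroD", "aroE", "aroL", "aroA", "aroC"]

# Made-up results for illustration: aroD has only a low-confidence hit
# and aroC has no candidate.
best_confidence = {
    "aroG": "high",
    "aroB": "high",
    "aroD": "low",
    "aroE": "medium",
    "aroL": "high",
    "aroA": "high",
}

gaps = find_gaps(path_i, best_confidence)
print(gaps)  # ['aroD', 'aroC']
```

A step that turns up as a gap here could still be encoded in the genome, for example in a region with no predicted protein, which is where the six-frame search through Curated BLAST helps.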
For more information, see:
If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.
by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory