Definition of chorismate biosynthesis
Rules
Overview: Chorismate is the starting point for the biosynthesis of the aromatic amino acids phenylalanine, tryptophan, and tyrosine. Chorismate biosynthesis in GapMind is based on two MetaCyc pathways: chorismate biosynthesis I, from D-erythrose-4-phosphate and phosphoenolpyruvate, and chorismate biosynthesis II, from D-glyceraldehyde-3-phosphate and L-aspartate. Both pathways are identical after they reach 3-dehydroquinate.
- all: 3-dehydroquinate, aroD, aroE, aroL, aroA and aroC
- 3-dehydroquinate:
- aroG and aroB
- or tpiA, fbp, aspartate-semialdehyde, aroA' and aroB'
- Comment: Pathway I uses aroG and aroB, while pathway II uses non-canonical activities of triose-phosphate isomerase (tpiA) and fructose-bisphosphate aldolase (fbp) to form 6-deoxy-5-ketofructose 1-phosphate. AroA' condenses this with aspartate semialdehyde to form 2-amino-3,7-dideoxy-D-threo-hept-6-ulosonate, and AroB' cyclizes it to 3-dehydroquinate.
- aspartate-semialdehyde: asp-kinase and asd
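The rules above form a small AND/OR tree: "all" needs every listed part, while "3-dehydroquinate" can be reached by either of two alternatives. The following is a minimal sketch of checking whether a set of found steps completes the pathway. The encoding (`RULES`, `satisfied`) is hypothetical and is not GapMind's own code; it only tests completeness and ignores confidence levels.

```python
# Hypothetical encoding of the chorismate rules: each rule maps to a list of
# alternatives, and each alternative is a list of parts that must all be satisfied.
# A part is either another rule name or an individual step name.
RULES = {
    "all": [["3-dehydroquinate", "aroD", "aroE", "aroL", "aroA", "aroC"]],
    "3-dehydroquinate": [
        ["aroG", "aroB"],                                           # pathway I
        ["tpiA", "fbp", "aspartate-semialdehyde", "aroA'", "aroB'"],  # pathway II
    ],
    "aspartate-semialdehyde": [["asp-kinase", "asd"]],
}


def satisfied(name: str, found_steps: set[str]) -> bool:
    """Return True if a rule or step is satisfied by the steps that were found."""
    if name not in RULES:  # a leaf: an individual step
        return name in found_steps
    # A rule is satisfied if any one alternative has all of its parts satisfied.
    return any(all(satisfied(part, found_steps) for part in alt) for alt in RULES[name])


pathway_i = {"aroG", "aroB", "aroD", "aroE", "aroL", "aroA", "aroC"}
print(satisfied("all", pathway_i))             # True
print(satisfied("all", pathway_i - {"aroB"}))  # False
```

Representing each rule as a list of alternatives keeps the pathway I / pathway II choice explicit, so adding another route would only mean appending one more alternative.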
Steps
aroG: 3-deoxy-7-phosphoheptulonate synthase
- Curated proteins or TIGRFams with EC 2.5.1.54
- Curated sequence P39912: 3-deoxy-7-phosphoheptulonate synthase (EC 2.5.1.54); chorismate mutase (EC 5.4.99.5)
- Curated sequence CH_123440: 3-Deoxy-D-arabinoheptulosonate-7-phosphate synthase
- Curated sequence CA265_RS11635: chorismate mutase (EC 5.4.99.5)
- Ignore hits to items matching EC 4.1.2.15 when looking for 'other' hits
- Ignore hits to C9K7C8 when looking for 'other' hits (Phospho-2-dehydro-3-deoxyheptonate aldolase AMT16; AM-toxin biosynthesis protein 16; EC 2.5.1.54)
- Comment: This is also known as DAHP (3-deoxy-D-arabino-heptulosonate 7-phosphate) synthase. Add CA265_RS11635 (a fusion with chorismate mutase) because it is diverged, is confirmed by cofitness, and is essential in other Bacteroidetes. P39912 (Bacillus subtilis aroA) is annotated in BRENDA as chorismate mutase, but it also has this activity (PMC1198938). CH_123440 is annotated with this function but without the EC number. EC 4.1.2.15 is an obsolete number that still appears in a few entries, so hits to those entries are ignored. C9K7C8 is annotated with this function in SwissProt, but its function is uncertain, so it is ignored.
- Total: 3 HMMs and 23 characterized proteins
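The "Ignore hits to ..." lines change which curated protein counts as the best "other" hit, that is, the best hit to a protein that is not annotated as performing this step. A minimal sketch follows, assuming that "annotated as performing this step" can be reduced to sharing one of the step's EC numbers. The `Hit` fields, `best_other_hit`, and the `HYPO_*` identifiers are hypothetical and used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hit:
    """One ublast hit from a candidate protein to a curated protein (hypothetical fields)."""
    subject_id: str
    ec_numbers: set[str] = field(default_factory=set)
    bits: float = 0.0


def best_other_hit(hits: list[Hit], step_ecs: set[str],
                   ignore_ids: set[str], ignore_ecs: set[str]) -> Optional[Hit]:
    """Return the highest-scoring hit to a protein that is neither annotated with
    the step's EC numbers nor on the step's ignore lists, or None if none remain."""
    others = [
        h for h in hits
        if not (h.ec_numbers & step_ecs)      # annotated as performing this step
        and h.subject_id not in ignore_ids    # ignored by accession, e.g. C9K7C8
        and not (h.ec_numbers & ignore_ecs)   # ignored by EC, e.g. obsolete 4.1.2.15
    ]
    return max(others, key=lambda h: h.bits, default=None)


hits = [
    Hit("HYPO_1", {"2.5.1.54"}, 310.0),  # same EC as the step: not an "other" hit
    Hit("HYPO_2", {"4.1.2.15"}, 290.0),  # obsolete EC on the ignore list
    Hit("HYPO_3", {"5.4.99.5"}, 95.0),   # a chorismate mutase-only entry
]
other = best_other_hit(hits, {"2.5.1.54"}, {"C9K7C8"}, {"4.1.2.15"})
print(other.subject_id)  # HYPO_3
```

Without the EC 4.1.2.15 rule, HYPO_2 would become the "other" hit with a high bit score and could keep a good aroG candidate from reaching high confidence.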
aroB: 3-dehydroquinate synthase
aroD: 3-dehydroquinate dehydratase
- Curated proteins or TIGRFams with EC 4.2.1.10
- Ignore hits to B9CK59 when looking for 'other' hits (3-dehydroquinate dehydratase (EC 4.2.1.10))
- Ignore hits to items matching EC 1.1.1.24 when looking for 'other' hits
- Ignore hits to items matching EC 1.1.1.25 when looking for 'other' hits
- Ignore hits to items matching EC 1.1.1.282 when looking for 'other' hits
- Predicted: UniProt sequence G0EDV3: RecName: Full=3-dehydroquinate dehydratase {ECO:0000256|ARBA:ARBA00012060}; EC=4.2.1.10 {ECO:0000256|ARBA:ARBA00012060};
- Comment: B9CK59 may be misannotated in BRENDA. Several plant shikimate dehydrogenases may also be 3-dehydroquinate dehydratases (fusion proteins), so any similarity to shikimate or quinate dehydrogenases (EC 1.1.1.24, 1.1.1.25, or 1.1.1.282) is ignored. PYRFU_RS04235 (G0EDV3) from Pyrolobus fumarii is diverged, especially in its C-terminal region, but it has conserved functional residues, and a homolog from Sedimentisphaera salicampi is fused to aroE, so it is probably the missing aroD.
- Total: 2 HMMs and 49 characterized proteins
aroE: shikimate dehydrogenase
- Curated proteins or TIGRFams with EC 1.1.1.25
- Curated proteins or TIGRFams with EC 1.1.1.282
- UniProt sequence Q8A006_BACTN: SubName: Full=Shikimate 5-dehydrogenase {ECO:0000313|EMBL:AAO79320.1};
- UniProt sequence A0A2M8WD96: RecName: Full=Shikimate dehydrogenase (NADP(+)) {ECO:0000256|HAMAP-Rule:MF_00222}; Short=SDH {ECO:0000256|HAMAP-Rule:MF_00222}; EC=1.1.1.25 {ECO:0000256|HAMAP-Rule:MF_00222};
- Ignore hits to CH_122204 when looking for 'other' hits (quinate dehydrogenase; EC 1.1.1.24. quinate dehydrogenase (EC 1.1.1.282))
- Comment: EC 1.1.1.282 uses NAD(P)H rather than NADPH. BT4215 from Bacteroides thetaiotaomicron (Q8A006_BACTN) is diverged, is the only good candidate, and is essential in various Bacteroidetes. Ga0059261_2194 / BDW16_RS10815 (A0A2M8WD96) has auxotrophic phenotypes in RB-TnSeq data and can complement an aroE- strain of E. coli (Bradley Biggs). CH_122204 in CharProtDB is ignored because it is probably a quinate dehydrogenase, not a shikimate dehydrogenase.
- Total: 2 HMMs and 47 characterized proteins
aroL: shikimate kinase
- Curated proteins or TIGRFams with EC 2.7.1.71
- UniProt sequence AROK_BACSU: RecName: Full=Shikimate kinase {ECO:0000255|HAMAP-Rule:MF_00109}; Short=SK {ECO:0000255|HAMAP-Rule:MF_00109}; EC=2.7.1.71 {ECO:0000255|HAMAP-Rule:MF_00109};
- UniProt sequence AROK_BACTN: RecName: Full=Shikimate kinase {ECO:0000255|HAMAP-Rule:MF_00109}; Short=SK {ECO:0000255|HAMAP-Rule:MF_00109}; EC=2.7.1.71 {ECO:0000255|HAMAP-Rule:MF_00109};
- UniProt sequence L0FT15_ECHVK: RecName: Full=Shikimate kinase {ECO:0000256|HAMAP-Rule:MF_00109}; Short=SK {ECO:0000256|HAMAP-Rule:MF_00109}; EC=2.7.1.71 {ECO:0000256|HAMAP-Rule:MF_00109};
- UniProt sequence AROK_DESVH: RecName: Full=Shikimate kinase {ECO:0000255|HAMAP-Rule:MF_00109}; Short=SK {ECO:0000255|HAMAP-Rule:MF_00109}; EC=2.7.1.71 {ECO:0000255|HAMAP-Rule:MF_00109};
- UniProt sequence AROK_CAUVN: RecName: Full=Shikimate kinase {ECO:0000255|HAMAP-Rule:MF_00109}; Short=SK {ECO:0000255|HAMAP-Rule:MF_00109}; EC=2.7.1.71 {ECO:0000255|HAMAP-Rule:MF_00109};
- UniProt sequence AROK_RHIME: RecName: Full=Shikimate kinase {ECO:0000255|HAMAP-Rule:MF_00109}; Short=SK {ECO:0000255|HAMAP-Rule:MF_00109}; EC=2.7.1.71 {ECO:0000255|HAMAP-Rule:MF_00109};
- UniProt sequence A0A135IJ25: RecName: Full=Shikimate kinase {ECO:0000256|ARBA:ARBA00012154, ECO:0000256|HAMAP-Rule:MF_00109}; Short=SK {ECO:0000256|HAMAP-Rule:MF_00109}; EC=2.7.1.71 {ECO:0000256|ARBA:ARBA00012154, ECO:0000256|HAMAP-Rule:MF_00109};
- Comment: In E. coli, AroL and AroK are isozymes. In Bacillus subtilis, this gene was known as AroI, and it was cloned by complementation (see A. Nakane et al., J. Fermentation and Bioengineering 1994, 77:312-314). That sequence is identical to AROK_BACSU. Manually add BT3393 (AROK_BACTN) from B. thetaiotaomicron because it is diverged, is the only good candidate, and is essential in various Bacteroidetes. The same applies to Echvi_0140 (L0FT15_ECHVK) from Echinicola vietnamensis. DVU0892 (AROK_DESVH) from D. vulgaris Hildenborough is confirmed by cofitness. CCNA_03103 (AROK_CAUVN) is also confirmed by cofitness, and similar proteins such as SMc00695 (AROK_RHIME) and PGA1_c14090 (A0A135IJ25) are essential.
- Total: 1 HMM and 24 characterized proteins
aroA: 3-phosphoshikimate 1-carboxyvinyltransferase
- Curated proteins or TIGRFams with EC 2.5.1.19
- UniProt sequence Q72EV5_DESVH: RecName: Full=3-phosphoshikimate 1-carboxyvinyltransferase {ECO:0000256|HAMAP-Rule:MF_00210}; EC=2.5.1.19 {ECO:0000256|HAMAP-Rule:MF_00210}; AltName: Full=5-enolpyruvylshikimate-3-phosphate synthase {ECO:0000256|HAMAP-Rule:MF_00210}; Short=EPSP synthase {ECO:0000256|HAMAP-Rule:MF_00210}; Short=EPSPS {ECO:0000256|HAMAP-Rule:MF_00210};
- UniProt sequence I9A2E3: RecName: Full=3-phosphoshikimate 1-carboxyvinyltransferase {ECO:0000256|HAMAP-Rule:MF_00210}; EC=2.5.1.19 {ECO:0000256|HAMAP-Rule:MF_00210}; AltName: Full=5-enolpyruvylshikimate-3-phosphate synthase {ECO:0000256|HAMAP-Rule:MF_00210}; Short=EPSP synthase {ECO:0000256|HAMAP-Rule:MF_00210}; Short=EPSPS {ECO:0000256|HAMAP-Rule:MF_00210};
- UniProt sequence L0FR45: RecName: Full=3-phosphoshikimate 1-carboxyvinyltransferase {ECO:0000256|ARBA:ARBA00012450}; EC=2.5.1.19 {ECO:0000256|ARBA:ARBA00012450}; AltName: Full=5-enolpyruvylshikimate-3-phosphate synthase {ECO:0000256|ARBA:ARBA00030046};
- Predicted: UniProt sequence G0EEF0: SubName: Full=EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase) {ECO:0000313|EMBL:AEM37991.1};
- Comment: Add AroA from Desulfovibrio vulgaris (DVU0463) because it is somewhat diverged, is conserved essential, and clusters with aromatic amino acid biosynthesis genes. HMPREF1058_RS13970 (I9A2E3) is cofit with chorismate synthase (Surya Tripathi); also, it is 71% identical to BT2186 / BT_RS11065, which can complement an aroA- strain of E. coli (Bradley Biggs). PYRFU_RS00635 (G0EEF0) from Pyrolobus fumarii belongs to this family and has similar active-site residues (an alignment to Q83E11 shows three conserved residues, and D315 => R331). Its closest homologs are in chorismate synthesis operons (although they are under 30% identical), so it is probably the missing aroA.
- Total: 1 HMM and 44 characterized proteins
aroC: chorismate synthase
tpiA: D-glyceraldehyde-3-phosphate phospholyase
- Curated proteins or TIGRFams with EC 5.3.1.1
- Ignore hits to P00941 when looking for 'other' hits (purine-nucleoside phosphorylase (EC 2.4.2.1))
- Comment: The triose-phosphate isomerase TpiA is also thought to convert D-glyceraldehyde 3-phosphate to enolaldehyde, which spontaneously converts to methylglyoxal. (Alternatively, methylglyoxal might be formed by methylglyoxal synthase, EC 4.2.3.3.) Ignore P00941, which is misannotated in BRENDA.
- Total: 1 HMM and 57 characterized proteins
fbp: 6-deoxy-5-ketofructose 1-phosphate synthase
- Curated proteins or TIGRFams with EC 2.2.1.11
- Curated proteins or TIGRFams with EC 4.1.2.13
- Ignore hits to MONOMER-14592 when looking for 'other' hits (2-amino-3,7-dideoxy-D-threo-hept-6-ulosonate synthase (EC 2.2.1.10). 2-amino-3,7-dideoxy-D-threo-hept-6-ulosonate synthase; ADH synthase; ADHS; ADTH synthase; Transaldolase-like ADHS; EC 2.2.1.10. 2-amino-3,7-dideoxy-D-threo-hept-6-ulosonate synthase monomer (EC 4.1.2.13; EC 2.2.1.10))
- Ignore hits to items matching fructose%bisphosphate aldolase when looking for 'other' hits
- Ignore hits to P84722 when looking for 'other' hits (Putative fructose-bisphosphate aldolase, chloroplastic; PS6; EC 4.1.2.13)
- Ignore hits to P86979 when looking for 'other' hits (Fructose-bisphosphate aldolase A; Muscle-type aldolase; Allergen Thu a 3.0101; EC 4.1.2.13)
- Ignore hits to P86980 when looking for 'other' hits (Fructose-bisphosphate aldolase A; Muscle-type aldolase; Allergen Gad m 3.0101; EC 4.1.2.13)
- Ignore hits to Q980K6 when looking for 'other' hits (fructose-bisphosphate aldolase (EC 4.1.2.13))
- Ignore hits to A3MSD2 when looking for 'other' hits (fructose-bisphosphatase (EC 3.1.3.11); fructose-bisphosphate aldolase (EC 4.1.2.13))
- Ignore hits to Q8NKR9 when looking for 'other' hits (fructose-bisphosphatase (EC 3.1.3.11). Fructose-1,6-bisphosphate aldolase/phosphatase; FBP A/P; FBP aldolase/phosphatase; Fructose-1,6-bisphosphatase; FBPase; EC 3.1.3.11; EC 4.1.2.13)
- Ignore hits to A0RV30 when looking for 'other' hits (Fructose-1,6-bisphosphate aldolase/phosphatase; FBP A/P; FBP aldolase/phosphatase; EC 3.1.3.11; EC 4.1.2.13)
- Ignore hits to A4YIZ5 when looking for 'other' hits (Fructose-1,6-bisphosphate aldolase/phosphatase; FBP A/P; FBP aldolase/phosphatase; EC 3.1.3.11; EC 4.1.2.13)
- Ignore hits to A8A9E4 when looking for 'other' hits (Fructose-1,6-bisphosphate aldolase/phosphatase; FBP A/P; FBP aldolase/phosphatase; EC 3.1.3.11; EC 4.1.2.13. fructose-1,6-bisphosphate aldolase/phosphatase (EC 4.1.2.13; EC 3.1.3.11))
- Ignore hits to B1YAL1 when looking for 'other' hits (Fructose-1,6-bisphosphate aldolase/phosphatase; FBP A/P; FBP aldolase/phosphatase; EC 3.1.3.11; EC 4.1.2.13. fructose-bisphosphatase (EC 3.1.3.11); fructose-bisphosphate aldolase (EC 4.1.2.13))
- Ignore hits to B6YTP6 when looking for 'other' hits (Fructose-1,6-bisphosphate aldolase/phosphatase; FBP A/P; FBP aldolase/phosphatase; EC 3.1.3.11; EC 4.1.2.13)
- Ignore hits to D9PUH5 when looking for 'other' hits (Fructose-1,6-bisphosphate aldolase/phosphatase; FBP A/P; FBP aldolase/phosphatase; EC 3.1.3.11; EC 4.1.2.13)
- Ignore hits to F9VMT6 when looking for 'other' hits (Fructose-1,6-bisphosphate aldolase/phosphatase; FBP A/P; FBP aldolase/phosphatase; Fructose-1,6-bisphosphatase; FBPase; EC 3.1.3.11; EC 4.1.2.13. fructose-bisphosphatase (EC 3.1.3.11); fructose-bisphosphate aldolase (EC 4.1.2.13))
- Ignore hits to Q2RG86 when looking for 'other' hits (Fructose-1,6-bisphosphate aldolase/phosphatase; FBP A/P; FBP aldolase/phosphatase; EC 3.1.3.11; EC 4.1.2.13)
- Ignore hits to Q72K02 when looking for 'other' hits (Fructose-1,6-bisphosphate aldolase/phosphatase; FBP A/P; FBP aldolase/phosphatase; EC 3.1.3.11; EC 4.1.2.13)
- Comment: 6-deoxy-5-ketofructose-1-phosphate synthase is an activity of some fructose-bisphosphate aldolases (which are usually annotated as EC 4.1.2.13). To find the fbp in Desulfovibrio vulgaris Hildenborough and Miyazaki F, it is necessary to match more broadly. MetaCyc reports that AroA' from Methanococcus jannaschii also has activity as a fructose-bisphosphate aldolase, but it is not clear that it carries out this reaction. The bifunctional fructose-1,6-bisphosphate aldolase/phosphatases are ignored because it is not obvious that they would carry out this reaction; these include Q980K6, A3MSD2, Q8NKR9, A0RV30, A4YIZ5, A8A9E4, B1YAL1, B6YTP6, D9PUH5, F9VMT6, Q2RG86, and Q72K02. CharProtDB items with incorrect EC numbers and very short SwissProt entries are also ignored.
- Total: 3 HMMs and 76 characterized proteins
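The rule "Ignore hits to items matching fructose%bisphosphate aldolase" keeps any fructose-bisphosphate aldolase from counting as a competing "other" hit for fbp. A minimal sketch of such description matching follows, assuming (this is an assumption about the notation, not something the page states) that `%` is a wildcard for any run of characters as in SQL LIKE, that matching is case-insensitive, and that the pattern may match anywhere in the description. `wildcard_matches` is a hypothetical helper.

```python
import re


def wildcard_matches(pattern: str, description: str) -> bool:
    """Return True if the pattern, with '%' read as 'any run of characters',
    occurs anywhere in the description (case-insensitive)."""
    # Escape the literal pieces and join them with '.*' for each '%'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("%"))
    return re.search(regex, description, re.IGNORECASE) is not None


pattern = "fructose%bisphosphate aldolase"
print(wildcard_matches(pattern, "Fructose-bisphosphate aldolase A; Muscle-type aldolase"))  # True
print(wildcard_matches(pattern, "Fructose-1,6-bisphosphate aldolase/phosphatase"))          # True
print(wildcard_matches(pattern, "fructose-bisphosphatase (EC 3.1.3.11)"))                   # False
```

The wildcard lets one pattern cover both "fructose-bisphosphate" and "fructose-1,6-bisphosphate" spellings, while a phosphatase-only description does not match.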
asp-kinase: aspartate kinase
- Curated proteins or TIGRFams with EC 2.7.2.4
- Ignore hits to O63067 when looking for 'other' hits (homoserine dehydrogenase (EC 1.1.1.3))
- Ignore hits to Q46133 when looking for 'other' hits (aspartate kinase (EC 2.7.2.4))
- Comment: For BRENDA::O63067, the paper describes a monofunctional homoserine dehydrogenase (hom), but the O63067 sequence is much longer and contains a region closely homologous to functional aspartate kinases (perhaps due to alternative splicing?). In Corynebacterium, aspartate kinase has two subunits, which are apparently encoded by the same gene using different start codons (PMID:1956296). Q46133 is the shorter regulatory subunit and lacks the catalytic domain, so it does not suffice for activity and is ignored.
- Total: 3 HMMs and 36 characterized proteins
asd: aspartate-semialdehyde dehydrogenase
aroA': 2-amino-3,7-dideoxy-D-threo-hept-6-ulosonate synthase
- Curated proteins or TIGRFams with EC 2.2.1.10
- Comment: aroA' condenses 6-deoxy-5-ketofructose 1-phosphate with L-aspartate 4-semialdehyde.
- Total: 2 characterized proteins
aroB': dehydroquinate synthase II
- Curated proteins or TIGRFams with EC 1.4.1.24
- Ignore hits to P81230 when looking for 'other' hits (3-dehydroquinate synthase II (EC 1.4.1.24))
- Comment: Ignore P81230, which is misannotated as this in BRENDA.
- Total: 2 characterized proteins
About GapMind
Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.
A candidate for a step is "high confidence" if either:
- ublast finds a hit to a characterized protein at above 40% identity and 80% coverage, and bits >= other bits+10.
- (Hits to curated proteins without experimental data as to their function are never considered high confidence.)
- HMMer finds a hit with 80% coverage of the model, and either other identity < 40% or other coverage < 75%.
where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").
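The high-confidence criteria can be written as two small predicates, one for ublast hits and one for HMMer hits. This is a minimal sketch, not GapMind's implementation: the `BlastHit` fields and function names are hypothetical, identity is in percent, coverage is a fraction (assumed here to be measured over the curated protein), "above 40%" is read as `> 40` and "80% coverage" as `>= 0.80`, and a missing "other" hit is treated as 0 bits with nothing competing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlastHit:
    identity: float      # percent identity, 0-100
    coverage: float      # fraction covered, 0-1 (assumed: of the curated protein)
    bits: float
    characterized: bool  # False for curated proteins without experimental data


def ublast_high(hit: BlastHit, other: Optional[BlastHit]) -> bool:
    """ublast rule: characterized target, >40% identity, >=80% coverage,
    and at least 10 more bits than the best 'other' hit."""
    other_bits = other.bits if other else 0.0
    return (hit.characterized and hit.identity > 40
            and hit.coverage >= 0.80 and hit.bits >= other_bits + 10)


def hmm_high(model_coverage: float, other: Optional[BlastHit]) -> bool:
    """HMMer rule: >=80% model coverage, and any 'other' hit is weak."""
    if model_coverage < 0.80:
        return False
    # With no 'other' hit at all, nothing competes with the HMM hit.
    return other is None or other.identity < 40 or other.coverage < 0.75


def high_confidence(blast_hit: Optional[BlastHit], hmm_coverage: Optional[float],
                    other: Optional[BlastHit]) -> bool:
    """A candidate is high confidence if either rule holds."""
    ublast_ok = blast_hit is not None and ublast_high(blast_hit, other)
    hmm_ok = hmm_coverage is not None and hmm_high(hmm_coverage, other)
    return ublast_ok or hmm_ok


good = BlastHit(identity=55, coverage=0.92, bits=400, characterized=True)
rival = BlastHit(identity=35, coverage=0.90, bits=250, characterized=True)
print(ublast_high(good, rival))                       # True: 400 >= 250 + 10
print(hmm_high(0.85, BlastHit(45, 0.95, 390, True)))  # False: a close 'other' hit
```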
Otherwise, a candidate is "medium confidence" if either:
- ublast finds a hit at above 40% identity and 70% coverage (ignoring other bits).
- ublast finds a hit at above 30% identity and 80% coverage, and bits >= other bits.
- HMMer finds a hit (regardless of coverage or other bits).
Other blast hits with at least 50% coverage are "low confidence."
Steps with no high- or medium-confidence candidates may be considered "gaps."
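For a candidate that has already failed the high-confidence test described above, the medium and low tiers and the notion of a gap can be sketched as follows. The function names and the tuple layout `(identity %, coverage fraction, bits)` are hypothetical; thresholds follow the text, reading "above X%" as `> X` and coverage as `>=`, and treating a missing "other" hit as 0 bits.

```python
from typing import Optional


def lower_tier(blast: Optional[tuple[float, float, float]],
               other_bits: float = 0.0, hmm_hit: bool = False) -> Optional[str]:
    """Return 'medium', 'low', or None for a candidate that is not high confidence.
    `blast` is (identity in percent, coverage as a fraction, bits) or None."""
    if hmm_hit:
        return "medium"  # any HMMer hit, regardless of coverage or other bits
    if blast is None:
        return None
    identity, coverage, bits = blast
    if identity > 40 and coverage >= 0.70:
        return "medium"  # other bits are not considered for this rule
    if identity > 30 and coverage >= 0.80 and bits >= other_bits:
        return "medium"
    if coverage >= 0.50:
        return "low"
    return None


def find_gaps(step_confidences: dict[str, list[str]]) -> list[str]:
    """Steps with no high- or medium-confidence candidate."""
    return sorted(step for step, levels in step_confidences.items()
                  if not {"high", "medium"} & set(levels))


print(lower_tier((35.0, 0.85, 120.0), other_bits=150.0))          # low
print(find_gaps({"aroA": ["high"], "aroL": ["low"], "aroB": []}))  # ['aroB', 'aroL']
```

In the first example the hit passes the identity and coverage thresholds of the second medium rule but scores fewer bits than the "other" hit, so it falls through to low confidence.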
For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways.
For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time.
Gaps may be due to:
- our ignorance of proteins' functions,
- omissions in the gene models,
- frame-shift errors in the genome sequence, or
- the organism lacking the pathway.
GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
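GapMind itself does not search the six-frame translation. For readers unfamiliar with the term, this minimal sketch shows what a six-frame translation is: the three reading frames of the given strand plus the three of its reverse complement, translated with the standard genetic code (NCBI table 1). The helper names `translate` and `six_frame_translation` are hypothetical; stop codons appear as `*`, and codons containing ambiguous bases such as N become `X`.

```python
# Standard genetic code; codons are enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from the start of `dna`; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> list[str]:
    """Return translations of frames +1, +2, +3 and then -1, -2, -3."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    return [translate(strand[offset:]) for strand in (dna, reverse) for offset in range(3)]


print(six_frame_translation("ATGGCCTAA"))  # ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
```

Searching all six frames can reveal a gene that is missing from the predicted proteins, for example because of a frame-shift error in the genome sequence.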
For more information, see:
If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.
by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory