Definition of L-cysteine biosynthesis
Rules
Overview: Cysteine biosynthesis in GapMind is based on MetaCyc pathways L-cysteine biosynthesis I (from serine and sulfide), II (tRNA-dependent), III (from serine and homocysteine), V (protein-bound thiocarboxylates), VIII (via serine kinase), and IX (via phosphoserine). There is no pathway IV. Pathway VI (from serine and methionine) is not included because it is not found in prototrophic bacteria. (It is found in H. pylori, which cannot synthesize homocysteine or methionine; also, it is a superset of the reactions in pathway III, from serine and homocysteine.) Pathway VII is not included because it requires sulfocysteine, an uncommon precursor. GapMind also describes cysteine biosynthesis with O-succinylserine as an intermediate (PMID:28581482), instead of O-acetylserine (as in pathway I).
- all:
- from-serine
- or serA, serC and from-phosphoserine
- or from-serine-homocysteine
- from-phosphoserine:
- sepS and pscS
- or cysO, moeZ, Mt_cysM and mec
- or PSSH
- Comment: Phosphoserine can be converted to cysteine by the tRNA-dependent pathway II (sepS and pscS), the protein-bound thiocarboxylate pathway V with the carrier protein cysO, or by direct sulfhydrylation (PSSH) as in pathway IX.
- from-serine-homocysteine: CBS and CGL
- Comment: In many organisms, the sulfhydryl group of cysteine is used to form homocysteine and methionine, but this pathway can also run in reverse. GapMind uses a pathway requirement to warn if an organism is modeled as synthesizing cysteine from methionine and also methionine from cysteine.
- from-serine:
- cysE and cysK
- or SST and cysK
- or serK and PSSH
- Comment: Cysteine can be formed from serine via O-acetylserine as in pathway I (cysE and cysK), via O-succinylserine (SST and cysK), or via serine kinase (serK) as in pathway VIII. For the O-succinylserine pathway, the identity of the O-succinylserine sulfhydrylase is not proven, but it is expected to be similar to cysK.
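The rules above form an AND/OR tree: each alternative lists parts that must all be present, and a rule is met if any one of its alternatives is. The Python sketch below shows that logic, assuming a step counts as present whenever it has an acceptable candidate. The `RULES` dictionary and the `satisfied` function are hypothetical names; GapMind itself ranks alternatives by candidate confidence rather than returning a plain yes/no.

```python
from __future__ import annotations

# Rules transcribed from the definition above. Each rule maps to a list of
# alternatives ("or"); each alternative lists parts that must all hold ("and").
# A part is either another rule name or a step name.
RULES: dict[str, list[list[str]]] = {
    "all": [
        ["from-serine"],
        ["serA", "serC", "from-phosphoserine"],
        ["from-serine-homocysteine"],
    ],
    "from-phosphoserine": [
        ["sepS", "pscS"],
        ["cysO", "moeZ", "Mt_cysM", "mec"],
        ["PSSH"],
    ],
    "from-serine-homocysteine": [["CBS", "CGL"]],
    "from-serine": [
        ["cysE", "cysK"],
        ["SST", "cysK"],
        ["serK", "PSSH"],
    ],
}


def satisfied(part: str, found_steps: set[str]) -> bool:
    """True if a step was found, or if a rule has an alternative whose parts all hold."""
    if part in RULES:
        return any(
            all(satisfied(p, found_steps) for p in alternative)
            for alternative in RULES[part]
        )
    # Not a rule name, so treat it as a step.
    return part in found_steps


print(satisfied("all", {"serA", "serC", "sepS", "pscS"}))  # True
print(satisfied("all", {"serA", "PSSH"}))  # False
```

In the second call, PSSH alone cannot complete any route: the serine-kinase route also needs serK, and the phosphoserine route also needs serC.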
Steps
cysE: serine acetyltransferase
- Curated proteins or TIGRFams with EC 2.3.1.30
- UniProt sequence Q72EB6_DESVH: RecName: Full=Serine O-acetyltransferase {ECO:0000256|ARBA:ARBA00013266}; EC=2.3.1.30 {ECO:0000256|ARBA:ARBA00013266};
- UniProt sequence B8DIT5_DESVM: RecName: Full=Serine O-acetyltransferase {ECO:0000256|ARBA:ARBA00013266}; EC=2.3.1.30 {ECO:0000256|ARBA:ARBA00013266};
- Comment: Desulfovibrios have a somewhat diverged serine O-acetyltransferase. DVU0662 (Q72EB6_DESVH) and DvMF_2657 (B8DIT5_DESVM) are both essential, which suggests that they are correctly annotated.
- Total: 1 HMM and 30 characterized proteins
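A step definition like the one for cysE combines curated proteins selected by EC number with explicitly listed UniProt sequences. The sketch below shows that union under assumed data structures: `StepDef` and `reference_ids` are hypothetical names, the curated database is reduced to a map from protein ID to EC numbers, and the HMM (TIGRFam) side of the definition is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StepDef:
    """Hypothetical container for one step's sequence-based evidence."""
    name: str
    ec_numbers: set[str]
    extra_ids: set[str] = field(default_factory=set)  # explicitly listed sequences


def reference_ids(step: StepDef, curated: dict[str, set[str]]) -> set[str]:
    """Curated proteins sharing an EC number with the step, plus the step's extra IDs."""
    by_ec = {pid for pid, ecs in curated.items() if ecs & step.ec_numbers}
    return by_ec | step.extra_ids


cysE = StepDef("cysE", {"2.3.1.30"}, {"Q72EB6_DESVH", "B8DIT5_DESVM"})
# protA and protB are placeholder IDs in a toy curated database.
toy_db = {"protA": {"2.3.1.30"}, "protB": {"2.5.1.47"}}
print(sorted(reference_ids(cysE, toy_db)))  # ['B8DIT5_DESVM', 'Q72EB6_DESVH', 'protA']
```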
cysK: O-acetylserine sulfhydrylase
SST: serine O-succinyltransferase
CBS: cystathionine beta-synthase
CGL: cystathionine gamma-lyase
sepS: O-phosphoseryl-tRNA ligase
pscS: Sep-tRNA:Cys-tRNA synthase
Mt_cysM: CysO-thiocarboxylate-dependent cysteine synthase
mec: [CysO sulfur-carrier protein]-S-L-cysteine hydrolase
moeZ: [sulfur carrier protein CysO]--sulfur ligase
- Curated sequence G185E-7476-MONOMER: Probable adenylyltransferase/sulfurtransferase MoeZ; EC 2.7.7.-; EC 2.8.1.-. [sulfur carrier protein CysO] adenylyltransferase/sulfurtransferase
- Total: 1 characterized protein
cysO: sulfur carrier protein CysO
- Curated sequence P9WP33: Sulfur carrier protein CysO; 9.5 kDa culture filtrate antigen cfp10A. CysO sulfur carrier protein
- Total: 1 characterized protein
PSSH: O-phosphoserine sulfhydrylase
serK: serine kinase (ADP-dependent)
- UniProt sequence Q5JD03: RecName: Full=L-serine kinase SerK {ECO:0000305}; EC=2.7.1.226 {ECO:0000269|PubMed:27857065, ECO:0000269|PubMed:28358477};
- Total: 1 characterized protein
serA: 3-phosphoglycerate dehydrogenase
- Curated proteins or TIGRFams with EC 1.1.1.95
- UniProt sequence A0A1X9ZCD3: SubName: Full=3-phosphoglycerate dehydrogenase {ECO:0000313|EMBL:ARS42937.1};
- Ignore hits to Q4JDI4 when looking for 'other' hits (phosphoglycerate dehydrogenase (EC 1.1.1.95))
- Comment: BRENDA::Q4JDI4 is misannotated as 3-phosphoglycerate dehydrogenase instead of 3-phosphoglycerate kinase. (The curators were notified and report that they have corrected this.) CA265_RS09010 (A0A1X9ZCD3) from Pedobacter sp. GW460-11-11-14-LB5 is annotated as 3-phosphoglycerate dehydrogenase and has auxotrophic phenotypes: mutants are partially rescued by glycine or serine. It is also adjacent to the putative serC.
- Total: 1 HMM and 20 characterized proteins
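An "ignore" entry such as the one for Q4JDI4 matters because the best hit to a protein that is not annotated for the step (the "other" hit) can block a high-confidence call. The sketch below picks that hit while skipping ignored IDs. It assumes each hit records whether its subject is annotated for the step; `Hit` and `best_other_hit` are hypothetical names, protA/protB/protC are placeholder IDs with illustrative scores, and treating coverage as a fraction of the curated protein is an assumption.

```python
from __future__ import annotations

from typing import NamedTuple


class Hit(NamedTuple):
    subject: str              # curated protein ID
    identity: float           # percent identity
    coverage: float           # assumed: fraction of the curated protein aligned
    bits: float
    annotated_for_step: bool  # is the subject annotated as performing this step?


def best_other_hit(hits: list[Hit], ignored: set[str]) -> Hit | None:
    """Highest-bit-score hit to a protein neither annotated for the step nor ignored."""
    others = [h for h in hits if not h.annotated_for_step and h.subject not in ignored]
    return max(others, key=lambda h: h.bits, default=None)


# Illustrative values only.
hits = [
    Hit("protA", 62.0, 0.95, 410.0, True),
    Hit("protB", 45.0, 0.90, 250.0, False),  # pretend this one is misannotated
    Hit("protC", 33.0, 0.80, 120.0, False),
]
print(best_other_hit(hits, ignored={"protB"}).subject)  # protC
```

Without the ignore entry, protB (250 bits) would be the "other" hit and could prevent protA's hit from reaching high confidence under an "other bits + 10" style margin.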
serC: 3-phosphoserine aminotransferase
About GapMind
Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.
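When a curated protein's alignment is split across two predicted proteins (for example, if a frameshift breaks a gene in two), neither piece alone may reach the coverage thresholds, but together they can. The sketch below computes coverage over the union of both alignments. It assumes coverage means the fraction of the curated protein's residues spanned; `merged_coverage` is a hypothetical helper, and the rules GapMind uses to decide which pairs of proteins may be combined are not described here.

```python
from __future__ import annotations


def merged_coverage(intervals: list[tuple[int, int]], ref_length: int) -> float:
    """Fraction of a curated protein covered by the union of alignment intervals.

    Each interval is a 1-based, inclusive (start, end) range on the curated
    protein, e.g. one from each half of a split gene.
    """
    covered = 0
    last_end = 0  # rightmost position already counted
    for start, end in sorted(intervals):
        start = max(start, last_end + 1)  # do not count overlap twice
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / ref_length


# A 400-residue curated protein hit by the two halves of a split gene.
print(merged_coverage([(1, 200)], 400))              # 0.5
print(merged_coverage([(1, 200), (191, 400)], 400))  # 1.0
```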
A candidate for a step is "high confidence" if either:
- ublast finds a hit to a characterized protein at above 40% identity and 80% coverage, and bits >= other bits+10.
- (Hits to curated proteins without experimental data as to their function are never considered high confidence.)
- HMMer finds a hit with 80% coverage of the model, and either other identity < 40% or other coverage < 75%.
where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").
Otherwise, a candidate is "medium confidence" if either:
- ublast finds a hit at above 40% identity and 70% coverage (ignoring other bits).
- ublast finds a hit at above 30% identity and 80% coverage, and bits >= other bits.
- HMMer finds a hit (regardless of coverage or other bits).
Other blast hits with at least 50% coverage are "low confidence."
Steps with no high- or medium-confidence candidates may be considered "gaps."
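The confidence tiers above can be collected into a single function. This sketch assumes "above" means strictly greater, "N% coverage" means at least N%, and a missing "other" hit counts as zero identity, coverage, and bits. `BlastHit`, `OtherHit`, `confidence`, and `is_gap` are hypothetical names, and GapMind's own code may differ in these details.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BlastHit:
    identity: float             # percent identity (0-100)
    coverage: float             # fraction covered (0-1)
    bits: float
    characterized: bool = True  # does the curated protein have experimental data?


@dataclass
class OtherHit:
    identity: float = 0.0  # assumed defaults when there is no "other" hit
    coverage: float = 0.0
    bits: float = 0.0


def confidence(blast: BlastHit | None, hmm_coverage: float | None,
               other: OtherHit | None = None) -> str:
    """Classify a candidate as "high", "medium", "low", or "none"."""
    other = other if other is not None else OtherHit()
    # High: characterized ublast hit that clearly beats the best other hit.
    if (blast is not None and blast.characterized and blast.identity > 40
            and blast.coverage >= 0.8 and blast.bits >= other.bits + 10):
        return "high"
    # High: HMM hit covering most of the model, with no strong other hit.
    if (hmm_coverage is not None and hmm_coverage >= 0.8
            and (other.identity < 40 or other.coverage < 0.75)):
        return "high"
    # Medium: weaker ublast hits, or any HMM hit.
    if blast is not None and blast.identity > 40 and blast.coverage >= 0.7:
        return "medium"
    if (blast is not None and blast.identity > 30 and blast.coverage >= 0.8
            and blast.bits >= other.bits):
        return "medium"
    if hmm_coverage is not None:
        return "medium"
    # Low: any remaining ublast hit with at least 50% coverage.
    if blast is not None and blast.coverage >= 0.5:
        return "low"
    return "none"


def is_gap(candidate_levels: list[str]) -> bool:
    """A step may be a gap if none of its candidates is high or medium confidence."""
    return not any(level in ("high", "medium") for level in candidate_levels)
```

For example, a 55%-identity hit covering 90% of a curated protein is "high" if that protein is characterized, but only "medium" if it lacks experimental data.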
For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways.
For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time.
Gaps may be due to:
- our ignorance of proteins' functions,
- omissions in the gene models,
- frame-shift errors in the genome sequence, or
- the absence of the pathway from the organism.
GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
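A six-frame translation reads the genome in all three codon offsets on each strand, so it can reveal protein-coding sequence that the gene models missed. The sketch below is a generic illustration using the standard genetic code (NCBI translation table 1); it is not GapMind's or Curated BLAST's code, and the "+1" to "-3" frame labels are a hypothetical convention.

```python
from __future__ import annotations

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first base varying slowest, each position in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from position 0; stops become '*', unknown codons 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Translate the three forward frames and the three reverse-complement frames."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
print(frames["+1"])  # MA*
print(frames["-1"])  # LGH
```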
For more information, see:
If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.
by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory