The Journal of Biological Chemistry DOI:10.1074/jbc.M111.233734
The GreenCut2 Resource, a Phylogenomically Derived Inventory of Proteins Specific to the Plant Lineage
Steven J. Karpowicz, Simon E. Prochnik, Arthur R. Grossman and Sabeeha S. Merchant
The plastid is a defining structure of photosynthetic eukaryotes and houses many plant-specific processes, including the light reactions, carbon fixation, pigment synthesis, and other primary metabolic processes. Identifying proteins associated with catalytic, structural, and regulatory functions that are unique to plastid-containing organisms is necessary to fully define the scope of plant biochemistry. Here, we performed phylogenomics on 20 genomes to compile a new inventory of 597 nucleus-encoded proteins conserved in plants and green algae but not in non-photosynthetic organisms. Of these proteins, 286 are of known function, whereas 311 are not characterized. This inventory was validated as applicable and relevant to diverse photosynthetic eukaryotes using an additional eight genomes from distantly related plants (including Micromonas, Selaginella, and soybean). Manual curation of the known proteins in the inventory established its importance to plastid biochemistry. To predict functions for the proteins of unknown function, which make up 52% of the inventory, we used sequence motifs, subcellular localization, co-expression analysis, and RNA abundance data. We demonstrate that 18% of the proteins in the inventory have functions outside the plastid and/or beyond green tissues. Although 32% of proteins in the inventory have homologs in all cyanobacteria, unexpectedly, 30% are eukaryote-specific. Finally, 8% of the proteins of unknown function share no similarity to any characterized protein and are plant lineage-specific. We present this annotated inventory of 597 proteins as a resource for functional analyses of plant-specific biochemistry.
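The selection principle stated in the abstract (conserved in plants and green algae, absent from non-photosynthetic organisms) can be illustrated with a minimal presence/absence filter. This is only a sketch under simplifying assumptions: the abstract does not give the actual orthology criteria, so the rule used here (present in every green-lineage genome, absent from every non-photosynthetic genome) is an assumption, and all group IDs and genome labels are hypothetical toy data. The last lines simply check the counts reported in the abstract.

```python
# Toy presence/absence filter. All group IDs and genome labels are hypothetical,
# and the strict "all green, no non-photosynthetic" rule is an assumption.
GREEN = {"plant_A", "plant_B", "green_alga_A"}
NON_PHOTOSYNTHETIC = {"fungus_A", "animal_A"}


def green_specific(presence):
    """Return sorted IDs of groups found in every green genome and in no
    non-photosynthetic genome."""
    return sorted(
        group
        for group, genomes in presence.items()
        if GREEN <= genomes and not (genomes & NON_PHOTOSYNTHETIC)
    )


presence = {
    # kept: in all green genomes, no non-photosynthetic homolog
    "group_1": {"plant_A", "plant_B", "green_alga_A"},
    # dropped: also has a homolog in a non-photosynthetic genome
    "group_2": {"plant_A", "plant_B", "green_alga_A", "fungus_A"},
    # dropped: missing from one green genome
    "group_3": {"plant_A", "green_alga_A"},
}

selected = green_specific(presence)

# Consistency check on the counts reported in the abstract.
known, unknown = 286, 311
total = known + unknown
percent_unknown = round(100 * unknown / total)
```

With the toy data, only group_1 passes the filter. The reported counts are internally consistent: 286 known plus 311 uncharacterized proteins give the inventory of 597, and the uncharacterized fraction rounds to 52%.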