Vanacek P, Sebestova E, Babkova P, Bidmanova S, Daniel L, Dvorak P, Stepankova V, Chaloupkova R, Brezovsky J, Prokop Z, Damborsky J, 2018: Exploration of Enzyme Diversity by Integrating Bioinformatics with Expression Analysis and Biochemical Characterization. ACS Catalysis 8: 2402–2412.
Millions of protein sequences are being discovered at an incredible pace, representing an inexhaustible source of biocatalysts. Here, we describe an integrated system for automated in silico screening and systematic characterization of diverse members of an enzyme family. The workflow, which consists of (i) identification and computational characterization of relevant genes by sequence and structural bioinformatics, (ii) expression analysis and activity screening of selected proteins, and (iii) complete biochemical and biophysical characterization, was validated on the haloalkane dehalogenase family. The sequence-based search identified 658 potential dehalogenases. Subsequent structural bioinformatics prioritized and selected 20 candidates for exploration of protein functional diversity. Of these 20, expression analysis and robotic screening of enzymatic activity yielded 8 soluble proteins with dehalogenase activity. The discovered enzymes originated from genetically unrelated Bacteria and Eukaryota and, for the first time, from Archaea. Overall, the integrated system provided biocatalysts with broad catalytic diversity and unique substrate specificity profiles, with optimal operating temperatures ranging from 20 to 70 °C and an unusually broad pH range from 5.7 to 10. We obtained the most catalytically proficient native haloalkane dehalogenase reported to date (kcat/K0.5 = 96.8 mM⁻¹ s⁻¹), the most thermostable enzyme, with a melting temperature of 71 °C, three different cold-adapted enzymes showing dehalogenase activity at near-zero temperatures, and a biocatalyst that degrades the chemical warfare agent sulfur mustard. The established strategy can be adapted to other enzyme families to explore their biocatalytic diversity in a sequence space that continues to grow with the use of next-generation sequencing technologies.