Welcome to the pPIC9 program for evaluating and designing high-level expression of foreign genes in the pPIC9 vector

Even though many heterologous proteins have been successfully expressed in the methylotrophic yeast Pichia pastoris, the expression level of a heterologous gene still cannot be predicted before the experiment, because many factors affect it. Here we report a mathematical model for high-level expression of foreign genes in the pPIC9 vector. First, we collected 40 heterologous genes expressed in the pPIC9 vector and classified them into a high-level expression group (expression level > 100 mg/L, 12 genes) and a low-level expression group (expression level < 100 mg/L, 28 genes). Then, both Fisher's and Naïve Bayes classification methods were used to construct discriminant functions, with the RNA secondary structure profile of the 3'-UTR of the foreign genes as features. The related program is the Tclass classification system. The classification accuracy from leave-one-out cross-validation was 100%. Finally, another 5 genes collected from the literature, none of them used to build the model, were used to test the discriminant functions. Four of these five genes were predicted correctly, giving a classification accuracy of 80% on this independent gene set. In addition, the model was verified experimentally by expressing the human neutrophil gelatinase-associated lipocalin (NGAL) gene, which reached an expression level of more than 100 mg/L. The model introduced here can therefore be used to predict the expression level of heterologous genes before experiments and to optimize experimental designs for high-level expression.

The following figure shows the relationship between the classification accuracy and the number of intervals, based on stability analysis. The maximum classification accuracy was obtained with 6 intervals using the Bayes method. The pPIC9 program was therefore developed based on these six intervals.
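The validation scheme described above (leave-one-out cross-validation on the training genes, then a check on an independent gene set) can be illustrated with a minimal Python sketch. It is not the Tclass system or the pPIC9 program: it assumes a plain Gaussian Naïve Bayes classifier, and the feature values in `X` are made-up toy numbers, not real 3'-UTR secondary structure profiles. The function names `fit_gaussian_nb`, `predict`, and `loocv_accuracy` are hypothetical and chosen for this illustration only.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate, per class, the prior and the mean/variance of each feature."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero on tiny samples.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-6)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest Gaussian log-posterior for sample x."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loocv_accuracy(X, y):
    """Leave-one-out cross-validation: hold out each sample once, train on the rest."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit_gaussian_nb(X_train, y_train)
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


# Toy data (assumption): two made-up features per gene, NOT real profiles.
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
     [0.90, 0.80], [0.80, 0.90], [0.85, 0.95]]
y = ["low", "low", "low", "high", "high", "high"]

print("LOOCV accuracy on toy data:", loocv_accuracy(X, y))

# Accuracy on the independent set as reported in the text: 4 of 5 genes correct.
independent_accuracy = 4 / 5
print("Independent-set accuracy:", independent_accuracy)
```

Leave-one-out is a natural choice for a set as small as 40 genes, since each model is trained on all but one sample. The independent set gives a separate check on genes that played no part in building the model.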