1、hfq regulates acid tolerance and virulence by responding to acid stress in Shigella flexneri
Guang Yang, Ligui Wang, Yong Wang, Peng Li, Jiangong Zhu, Shaofu Qiu, Rongzhang Hao, Zhihao Wu, Wuju Li, Hongbin Song
Research in Microbiology 2015;166(6):476-485
Abstract：Shigella flexneri is an important etiological agent of bacillary dysentery in developing countries. The Hfq protein is thought to play a major regulatory role in various cellular processes in this organism. However, the roles of Hfq in stress tolerance and virulence in S. flexneri in response to environmental stress have not been fully studied. In this study, hfq was highly expressed when S. flexneri was exposed to low pH. Growth retardation was observed in the hfq deletion mutant at pH values ranging from 5.0 to 7.0 and the survival rate of the mutant strain was reduced by 60% in acidic conditions (pH 3.0) compared with the wild-type strain. Additionally, competitive invasion assays in HeLa cells and lung invasion assays showed that the virulence of the hfq deletion mutant was significantly decreased. An evaluation of the mechanism revealed that, along with the expression of the Type III secretion system genes, acid resistance genes were also increased with acid stress. Interestingly, a statistically strong linear correlation was observed between the expression of hfq and Type III secretion system genes, as well as between hfq and acid resistance genes, under various pH conditions. In this study, we provide evidence that Hfq regulates genes related to acid resistance for survival under acid stress and controls virulence through the positive regulation of Type III secretion systems. Importantly, we propose that hfq is a key factor in maximal adaptation to host acid stress during infection, regulating acid stress tolerance and virulence in response to acid stress in S. flexneri.
2、Predicting linear B-cell epitopes using amino acid anchoring pair composition
Weike Shen, Yuan Cao, Lei Cha, Xufei Zhang, Xiaomin Ying, Wei Zhang, Kun Ge, Wuju Li and Li Zhong
BioData Mining. 2015 Apr 29;8:14.
Accurate identification of linear B-cell epitopes plays an important role in peptide vaccine design, immunodiagnosis, and antibody production. Although several prediction methods have been reported, unsatisfactory accuracy has limited their broad use in linear B-cell epitope prediction. Therefore, developing a reliable model with significantly improved prediction accuracy is highly desirable.
In this study, we developed a novel model for prediction of linear B-cell epitopes, APCpred, which was derived from the combination of amino acid anchoring pair composition (APC) and Support Vector Machine (SVM) methods. Systematic comparisons with the existing prediction models demonstrated that APCpred method significantly improved the prediction accuracy both in fivefold cross-validation of training datasets and in independent blind datasets. In the fivefold cross-validation test with Chen872 dataset at window size of 20, APCpred achieved AUC of 0.809 and accuracy of 72.94%, which was much more accurate than the existing models, e.g., Bayesb, Chen's AAP methods and the enhanced combination method of AAP with five AP scales. For the fivefold cross-validation test with ABC16 dataset, APCpred achieved an improved AUC of 0.794 and ACC of 73.00% at window size of 16, and attained an AUC of 0.748 and ACC of 67.96% on Blind387 dataset after being trained with ABC16 dataset. Trained with Lbtope_Confirm dataset, APCpred achieved an increased Acc of 55.09% on FBC934 dataset. Within sequence window sizes from 12 to 20, APCpred final model on homology-reduced dataset achieved an optimal AUC of 0.748 and ACC of 68.43% in fivefold cross-validation at the window size of 20.
The APCpred model demonstrated a significant improvement in predicting linear B-cell epitopes using the features of amino acid anchoring pair composition (APC). Based on our study, a web server has been developed for online prediction of linear B-cell epitopes, which is freely accessible at: http://ccb.bmi.ac.cn/APCpred/.
3、Large-Scale Brain Network Coupling Predicts Total Sleep Deprivation Effects on Cognitive Capacity
Yu Lei, Yongcong Shao, Lubin Wang, Tianye Zhai, Feng Zou, Enmao Ye, Xiao Jin, Wuju Li, Jianlin Qi, Zheng Yang
PLoS One. 2015 July 28; 10(7):e0133959
Abstract：Interactions between large-scale brain networks have received most attention in the study of cognitive dysfunction of human brain. In this paper, we aimed to test the hypothesis that the coupling strength of large-scale brain networks will reflect the pressure for sleep and will predict cognitive performance, referred to as sleep pressure index (SPI). Fourteen healthy subjects underwent this within-subject functional magnetic resonance imaging (fMRI) study during rested wakefulness (RW) and after 36 h of total sleep deprivation (TSD). Self-reported scores of sleepiness were higher for TSD than for RW. A subsequent working memory (WM) task showed that WM performance was lower after 36 h of TSD. Moreover, SPI was developed based on the coupling strength of salience network (SN) and default mode network (DMN). Significant increase of SPI was observed after 36 h of TSD, suggesting stronger pressure for sleep. In addition, SPI was significantly correlated with both the visual analogue scale score of sleepiness and the WM performance. These results showed that alterations in SN-DMN coupling might be critical in cognitive alterations that underlie the lapse after TSD. Further studies may validate the SPI as a potential clinical biomarker to assess the impact of sleep deprivation.
4、Altered Superficial Amygdala–Cortical Functional Link in Resting State After 36 Hours of Total Sleep Deprivation
Yu Lei, Yongcong Shao, Lubin Wang, Enmao Ye, Xiao Jin, Feng Zou, Tianye Zhai, Wuju Li, and Zheng Yang
Journal of Neuroscience Research 2015 Dec;93(12):1795-1803
Abstract：The superficial amygdala (SFA) is important in human emotion/affective processing via its strong connection with other limbic and cerebral cortex for receptive and expressive emotion processing. Few studies have investigated the functional connectivity changes of the SFA under extreme conditions, such as prolonged sleep loss, although the SFA showed a distinct functional connectivity pattern throughout the brain. In this study, resting-state functional magnetic resonance imaging (rs-fMRI) was employed to investigate the changes of SFA-cortical functional connectivity after 36 hr of total sleep deprivation (TSD). Fourteen healthy male volunteers aged 25.9 ± 2.3 years (range 18-28 years) enrolled in this within-subject crossover study. We found that the right SFA showed increased functional connectivity with the right medial prefrontal cortex (mPFC) and decreased functional connectivity with the right dorsal posterior cingulate cortex (dPCC) in the resting brain after TSD compared with that during rested wakefulness. For the left SFA, decreased connectivity with the right dorsal anterior cingulate cortex (dACC) and right dPCC was found. Further regression analysis indicated that the functional link between mPFC and SFA significantly correlated with the Profile of Mood State scores. Our results suggest that the amygdala cannot be treated as a single unit in human neuroimaging studies and that TSD may alter the functional connectivity pattern of the SFA, which in turn disrupts emotional regulation.
5、Different Effects of p52SHC1 and p52SHC3 on the Cell Cycle of Neurons and Neural Stem Cells
NING TANG, DAN LYU, TAO LIU, FANGJIN CHEN, SHUQIAN JING, TIANYU HAO, AND SHAOJUN LIU
J Cellular Physiology 2016 Jan;231(1):172-180. Epub 28 SEP 2015.
Abstract：SHC3 is exclusively expressed in postmitotic neurons, while SHC1 is found in neural stem cells and neural precursor cells but absent in mature neurons. In this study, we discovered that suppression of p52SHC1 expression by RNA interference resulted in proliferation defects in neural stem cells, along with significantly reduced protein levels of cyclin E and cyclin A. At the same time, p52SHC3 RNAi caused cell cycle re-entry (9.54% in S phase and 5.70% in G2-M phase) in primary neurons with significantly up-regulated expression of cyclin D1, cyclin E, cyclin A, CDK2, and phosphorylated CDK2. When p52SHC3 was overexpressed, the cell cycle of neural stem cells was arrested with reduced protein levels of cyclin D1, cyclin E, and cyclin A, while overexpression of p52SHC1 did not result in significant changes in postmitotic neurons. Our results indicate that p52SHC3 plays an important role in maintaining the mitotic quiescence of neurons, while p52SHC1 regulates the proliferation of neural stem cells.
6、sRNATarBase 3.0: an updated database for sRNA-target interactions in bacteria.
Jiang Wang, Tao Liu, Bo Zhao, Qixuan Lu, Zheng Wang, Yuan Cao, and Wuju Li
Nucleic Acids Res 2015 Oct 25. pii: gkv1127.
Abstract：Bacterial sRNAs are a class of small regulatory RNAs of about 40-500 nt in length; they play multiple biological roles through binding to their target mRNAs or proteins. Therefore, elucidating sRNA targets is very important. However, only targets of a few sRNAs have been described. To facilitate sRNA functional studies such as developing sRNA target prediction models, we updated the sRNATarBase database, which was initially developed in 2010. The new version (recently moved to http://ccb1.bmi.ac.cn/srnatarbase/) contains 771 sRNA-target entries manually collected from 213 papers, and 23 290 and 11 750 predicted targets from sRNATarget and sTarPicker, respectively. Among the 771 entries, 475 and 17 were involved in validated sRNA-mRNA and sRNA-protein interactions, respectively, while 279 had no reported interactions. We also presented detailed information for 316 binding regions of sRNA-target mRNA interactions and related mutation experiments, as well as new features, including NCBI sequence viewer, sRNA regulatory network, target prediction-based GO and pathway annotations, and error report system. The new version provides a comprehensive annotation of validated sRNA-target interactions, and will be a useful resource for bacterial sRNA studies.
1、Identification of a Tumor Suppressive Human Specific MicroRNA within the FHIT Tumor Suppressor Gene
Baocheng Hu, Xiaomin Ying, Jian Wang, Jittima Piriyapongsa, I. King Jordan, Jipo Sheng, Fang Yu, Po Zhao, Yazhuo Li, Hongyan Wang, Wooi Loon Ng, Shuofeng Hu, Xiang Wang, Chenguang Wang, Xiaofei Zheng, Wuju Li, Walter J. Curran, and Ya Wang
Cancer research,2014; doi: 10.1158/0008-5472.CAN-13-3279
Abstract：Loss or attenuated expression of the tumor-suppressor gene FHIT is associated paradoxically with poor progression of human tumors. Fhit promotes apoptosis and regulates reactive oxygen species; however, the mechanism by which Fhit inhibits tumor growth in animals remains unclear. In this study, we used a multidisciplinary approach based on bioinformatics, small RNA library screening, human tissue analysis, and a xenograft mouse model to identify a novel member of the miR-548 family in the fourth intron of the human FHIT gene. Characterization of this human-specific microRNA illustrates the importance of this class of microRNAs in tumor suppression and may influence interpretation of Fhit action in human cancer.
2、The potential biomarker panels for identification of Major Depressive Disorder (MDD) patients with and without early life stress (ELS) by metabonomic analysis.
Xinghua Ding, Shuguang Yang, Wuju Li, Yong Liu, Zhiguo Li, Yan Zhang, Lingjiang Li,Shaojun Liu
PLoS ONE,2014,9(5): e97479.
The lack of the disease biomarker to support objective laboratory tests still constitutes a bottleneck in the clinical diagnosis and evaluation of major depressive disorder (MDD) and its subtypes. We used metabonomic techniques to screen the diagnostic biomarker panels from the plasma of MDD patients with and without early life stress (ELS) experience.
Plasma samples were collected from 25 healthy adults and 46 patients with MDD, including 23 patients with ELS and 23 patients without ELS. Furthermore, gas chromatography/mass spectrometry (GC/MS) coupled with multivariate statistical analysis was used to identify the differences in global plasma metabolites among the 3 groups.
The distinctive metabolic profiles exist either between healthy subjects and MDD patients or between the MDD patients with ELS experience (ELS/MDD patients) and the MDD patients without it (non-ELS/MDD patients), and some diagnostic panels of feature metabolites' combination have higher predictive potential than the diagnostic panels of differential metabolites.
These findings in this study have high potential of being used as novel laboratory diagnostic tool for MDD patients and it with ELS or not in clinical application.
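Abstract: Objective: To conduct a preliminary genome-wide study of protein-RNA interactions (PRI) in Escherichia coli. Methods: Bacterial lysates were digested with RNase, the RNA fragments interacting with proteins were extracted, a cDNA library was constructed and subjected to high-throughput sequencing, and the protein-bound transcripts were identified by bioinformatics analysis. Results: A total of 3193 protein-bound transcripts were obtained, involving 2234 mRNAs, 47 sRNAs (small regulatory RNAs), 39 tRNAs, 11 rRNAs, and 862 intergenic regions (IGRs). Conclusion: Preliminary information on transcripts that interact with proteins in Escherichia coli was obtained, providing support for further PRI studies.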
1、Optimisation of reverse transcription loop-mediated isothermal amplification assay for the rapid detection of pandemic (H1N1) 2009 virus
Xin Cai, Zha Lei, Wen-liang Fu, Zhe-yi Zhu, Min-ji Zou, Jie Gao, Yuan-yuan Wang, Min Hong, Jia-xi Wang, Wu-ju Li, Dong-gang Xu
Afr. J. Microbiol. Res. 2013，7：3919-3925
Abstract：Conventional reverse transcriptase polymerase chain reaction (RT-PCR) and an optimized closed-tube reverse-transcription loop-mediated isothermal amplification (RT-LAMP) assay were used to detect pandemic (H1N1) 2009 virus, and the optimized closed-tube RT-LAMP method was compared with conventional RT-PCR with respect to specificity and sensitivity. In this study, the optimized RT-LAMP detected 2 copies of target RNA by visual detection with a modified dye. Reaction time, temperature and the quantity of each reagent were optimised for the detection of the virus. The sensitivity of the optimised RT-LAMP was 100 times that of conventional RT-PCR. Amplification of DNA can be identified by visualization with the modified dye, which reduces the cross-contamination caused by opening the tube. The sensitivity of visual detection was equivalent to that of electrophoresis analysis. Additionally, the method was specific, as no cross-reaction was observed among samples from human blood, Escherichia coli and other related viruses including human seasonal influenza A subtypes H1N1, H1N2 and H3N2. These results demonstrate that the optimized RT-LAMP assay for pandemic (H1N1) 2009 virus RNA is a simple, rapid and specific tool that is well suited to the screening and surveillance of influenza in developing countries.
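Abstract: Objective: To assess, by comparing in vitro and in vivo RNA-RNA interactions (RRI), how reliably in vivo RRI can be inferred from in vitro RRI. Methods: Perl scripts were written to analyze transcriptome-wide in vitro RNA secondary structure information in yeast and to derive putative in vitro RRIs, which were then compared with the in vivo small nucleolar RNA (snoRNA)-rRNA interactions in yeast. Results: The overlap between in vitro and in vivo snoRNA-rRNA interactions was only 23.42% (26/111); the overlap between snoRNA double-stranded segments determined in vitro and snoRNA segments involved in RRI determined in vivo was 38.78% (19/49); and the overlap between rRNA double-stranded segments determined in vitro and rRNA segments involved in RRI determined in vivo was 80.70% (46/57). Conclusion: snoRNA-rRNA interactions differ greatly between in vitro and in vivo conditions, suggesting that snoRNA-rRNA interactions measured in vitro do not faithfully reflect their interactions in vivo.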
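The three overlap rates quoted in this abstract follow directly from the counts given in parentheses. The short Python sketch below recomputes them; the function name `overlap_rate` and the dictionary labels are illustrative choices, not part of the original Perl analysis, and the abstract does not say which set each denominator refers to.

```python
def overlap_rate(shared, total):
    """Return the overlap as a percentage: shared items divided by the reference total."""
    return 100.0 * shared / total


# Counts exactly as quoted in the abstract (shared, total).
reported = {
    "snoRNA-rRNA interactions": (26, 111),
    "snoRNA segments": (19, 49),
    "rRNA segments": (46, 57),
}

if __name__ == "__main__":
    for label, (shared, total) in reported.items():
        print(f"{label}: {overlap_rate(shared, total):.2f}%")
```

Running the script reproduces the quoted 23.42%, 38.78%, and 80.70%.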
1、Computational tools for predicting sRNA targets
Wuju Li, Xiaomin Ying, Lei Cha
Regulatory RNAs in prokaryotes，2011，ISBN 978-3-7091-0217-6：165-177
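Abstract: Objective: Loop-mediated isothermal amplification (LAMP) is a novel isothermal nucleic acid amplification method. Because it is simple, efficient, highly specific, and inexpensive, it has been widely used to detect epidemic diseases such as influenza A (H1N1) and tuberculosis. The key initial step in LAMP is designing suitable primer sequences. To make primer design more convenient and efficient, we developed the LAMP primer design software BioSunLAMP. Methods: A user-friendly, easy-to-use software system was developed in the Delphi programming language. Results: In experimental validation with influenza A (H1N1) virus and Mycobacterium tuberculosis, the primers designed by BioSunLAMP achieved the expected results. In addition, compared with similar software, BioSunLAMP has the following features: (1) it integrates primer design with primer specificity analysis, and primer specificity can be checked against a local database or by remotely querying the relevant NCBI databases; (2) it supports the design of universal and specific primers for multiple sequences. Conclusion: The development of BioSunLAMP provides good bioinformatics support for the wider adoption of LAMP.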
4、Predicting sRNAs and their targets in bacteria
Wuju Li, Xiaomin Ying, Qixuan Lu, Linxi Chen
Genomics Proteomics Bioinformatics, 2012,10:276-284
Abstract：Bacterial small RNAs (sRNAs) are an emerging class of regulatory RNAs of about 40-500 nucleotides in length and, by binding to their target mRNAs or proteins, get involved in many biological processes such as sensing environmental changes and regulating gene expression. Thus, identification of bacterial sRNAs and their targets has become an important part of sRNA biology. Current strategies for discovery of sRNAs and their targets usually involve bioinformatics prediction followed by experimental validation, emphasizing a key role for bioinformatics prediction. Here, therefore, we provided an overview on prediction methods, focusing on the merits and limitations of each class of models. Finally, we will present our thinking on developing related bioinformatics models in future.
5、Identification and Expression of Small Non-Coding RNA,L10-Leader, in Different Growth Phases of Streptococcus mutans
Li Xia, Wei Xia, Shaohua Li,Wuju Li,..., Ningsheng Shao, and Bingfeng Chu
Nucleic Acid Therapeutics, 2012,22(3):177-186
Abstract:Streptococcus mutans is one of the major cariogenic bacteria in the oral environment. Small non-coding RNAs (sRNAs) play important roles in the regulation of bacterial growth, stress tolerance, and virulence. In this study, we experimentally verified the existence of sRNA, L10-Leader, in S. mutans for the first time. Our results show that the expression level of L10-Leader was growth-phase dependent in S. mutans and varied among different clinical strains of S. mutans. The level of L10-Leader in S. mutans UA159 was closely related to the pH value, but not to the concentrations of glucose and sucrose in culture medium. We predicted target mRNAs of L10-Leader bioinformatically and found that some of these mRNAs were related to growth and stress response. Five predicted mRNA targets were selected and detected by quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR), and we found that the expression levels of these mRNAs were closely related to the level of L10-Leader at different growth phases of the bacteria. Our results indicate that L10-Leader may play an important role in the regulation of responses in S. mutans, especially during its growth phase and acid adaption response.
6、A modified visual loop-mediated isothermal amplification method for diagnosis and differentiation of main pathogens from Mycobacterium tuberculosis complex
Ming Hong, Lei Zha, Wenliang Fu, Minji Zou, Wuju Li, Donggang Xu
World J Microbiol Biotechnol (2012) 28:523–531
Abstract:This study was aimed to rapidly identify and differentiate two main pathogens of the Mycobacterium tuberculosis complex: Mycobacterium tuberculosis subsp. tuberculosis and Mycobacterium bovis by a modified loop-mediated isothermal amplification (LAMP) assay. The reaction results could be evaluated by naked eye with two optimized closed tube detection methods as follows: adding the modified fluorescence dye in advance into the reaction mix so as to observe the color changes or putting a tinfoil in the tube and adding the SYBR Green I dye on it, then making the dye drop into the bottom of the tube by centrifuge after reaction. The results showed that the two groups of primers used jointly in this assay could successfully identify and differentiate Mycobacterium tuberculosis subsp. tuberculosis and Mycobacterium tuberculosis bovis. Sensitivity test displayed that the modified LAMP assay with the closed tube system could determine the minimal template concentration of 1 copy/μl, which was more sensitive than that of routine PCR. The advantages of this LAMP method for detection of the Mycobacterium tuberculosis complex included high specificity, high sensitivity, simplicity, and superiority in avoidance of aerosol contamination. The modified LAMP assay would provide a potential for clinical diagnosis and therapy of tuberculosis in the developing countries and the resource-limited areas.
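Qian Liu, Xiaomin Ying, Jiayao Wu, Lei Cha, Wuju Li
Abstract: Bacterial sRNAs are a class of regulatory RNAs 40-500 nt in length that play important roles in the interaction between bacteria and their environment, so identifying bacterial sRNAs is of great importance. However, unlike protein-coding genes, which have easily recognizable features, bacterial sRNAs remain rather difficult to identify. This method introduces a prediction strategy that identifies sRNAs using a base frequency matrix built from the transcription termini of known bacterial sRNAs, and applies it to sRNA prediction in Escherichia coli K-12 MG1655. The results show that the model has high specificity and a high positive detection rate on an independent test set; therefore,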
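As an illustration of the general idea of a base frequency matrix, the Python sketch below builds a position-specific frequency matrix from aligned windows and scores candidate windows by log-odds against a uniform background. This is a generic position weight matrix, not the authors' model: the training windows are invented toy sequences, and the pseudocount, background frequency, and function names are all assumptions.

```python
import math

BASES = "ACGU"


def frequency_matrix(windows, pseudocount=1.0):
    """Build per-position base frequencies from equal-length aligned RNA windows."""
    width = len(windows[0])
    matrix = []
    for pos in range(width):
        counts = {base: pseudocount for base in BASES}
        for window in windows:
            counts[window[pos]] += 1
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in BASES})
    return matrix


def log_odds_score(matrix, window, background=0.25):
    """Sum of log2(frequency / background) over the positions of one window."""
    return sum(math.log2(matrix[pos][base] / background)
               for pos, base in enumerate(window))


def scan(matrix, sequence):
    """Score every window of the matrix width along a sequence."""
    width = len(matrix)
    return [(start, log_odds_score(matrix, sequence[start:start + width]))
            for start in range(len(sequence) - width + 1)]


if __name__ == "__main__":
    # Hypothetical aligned windows around known termini (toy data only).
    toy_windows = ["GCUUUU", "GCUUUU", "CCUUUU", "GGUUUU"]
    pwm = frequency_matrix(toy_windows)
    best_start, best_score = max(scan(pwm, "AACGGCUUUUAC"), key=lambda hit: hit[1])
    print(best_start, round(best_score, 2))
```

In practice a score threshold would be chosen on held-out data; the abstract does not state how the published model sets it.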
2、Generate gene expression profile from high-throughput sequencing data
Hui LIU, Zhichao JIANG, Xiangzhong FANG, Hanjiang FU, Xiaofei ZHENG, Lei CHA, Wuju LI
Front. Math. China 2011, 6(6): 1131–1145
Abstract: This work presents two methods, the least-squares method and the Bayesian method, to solve the multiple mapping problem in extracting gene expression profiles through next-generation sequencing. We parallel the tag sequences to the genome, and partition them to improve the methods' efficiency. The essential feature of these methods is that they can solve the multiple mapping problem between genes and short reads, while generating almost the same estimation in the single-mapping situation as the traditional approaches. These two methods are compared by simulation and a real example, which was generated from radiation-induced lung cancer cells (A549), through mapping short reads to a human ncRNA database. The results show that the Bayesian method, as realized by a Gibbs sampler, is more efficient and robust than the
3、sTarPicker: A Method for Efficient Prediction of Bacterial sRNA Targets Based on a Two-Step Model for Hybridization
Xiaomin Ying, Yuan Cao, Jiayao Wu, Qian Liu, Lei Cha, Wuju Li
PLoS ONE 6(7): e22705.
Background: Bacterial sRNAs are a class of small regulatory RNAs involved in regulation of expression of a variety of genes.
Most sRNAs act in trans via base-pairing with target mRNAs, leading to repression or activation of translation or mRNA
degradation. To date, more than 1,000 sRNAs have been identified. However, direct targets have been identified for only
approximately 50 of these sRNAs. Computational predictions can provide candidates for target validation, thereby
increasing the speed of sRNA target identification. Although several methods have been developed, target prediction for
bacterial sRNAs remains challenging.
Results: Here, we propose a novel method for sRNA target prediction, termed sTarPicker, which was based on a two-step
model for hybridization between an sRNA and an mRNA target. This method first selects stable duplexes after screening all
possible duplexes between the sRNA and the potential mRNA target. Next, hybridization between the sRNA and the target
is extended to span the entire binding site. Finally, quantitative predictions are produced with an ensemble classifier
generated using machine-learning methods. In calculations to determine the hybridization energies of seed regions and
binding regions, both thermodynamic stability and site accessibility of the sRNAs and targets were considered. Comparisons
with the existing methods showed that sTarPicker performed best in both performance of target prediction and accuracy of
the predicted binding sites.
Conclusions: sTarPicker can predict bacterial sRNA targets with higher efficiency and determine the exact locations of the
interactions with a higher accuracy than competing programs. sTarPicker is available at http://ccb.bmi.ac.cn/starpicker/.
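The two-step idea in this abstract (pick a stable seed duplex, then extend it across the binding site) can be illustrated with a toy Python sketch. It is not the sTarPicker implementation: it accepts only ungapped Watson-Crick and G-U wobble pairing, ignores hybridization energies, site accessibility, and the ensemble classifier, and every function name and sequence here is hypothetical.

```python
# Toy seed-and-extend duplex search; a simplification, NOT the sTarPicker algorithm.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if RNA bases a and b form a canonical or G-U wobble pair."""
    return (a, b) in WATSON_CRICK or (a, b) in WOBBLE


def extend(srna, rev, i, k, seed_len):
    """Step 2: grow a seed at srna[i:], rev[k:] in both directions without gaps."""
    start_i, start_k, length = i, k, seed_len
    while start_i > 0 and start_k > 0 and can_pair(srna[start_i - 1], rev[start_k - 1]):
        start_i -= 1
        start_k -= 1
        length += 1
    while (start_i + length < len(srna) and start_k + length < len(rev)
           and can_pair(srna[start_i + length], rev[start_k + length])):
        length += 1
    return start_i, start_k, length


def find_duplexes(srna, mrna, seed_len=5):
    """Return (srna_start, mrna_start, length) for each distinct duplex, longest first."""
    rev = mrna[::-1]  # reversing the mRNA turns antiparallel pairing into a parallel scan
    found = set()
    for i in range(len(srna) - seed_len + 1):
        for k in range(len(rev) - seed_len + 1):
            # Step 1: keep only positions where seed_len consecutive bases pair.
            if all(can_pair(srna[i + t], rev[k + t]) for t in range(seed_len)):
                s, r, n = extend(srna, rev, i, k, seed_len)
                found.add((s, len(mrna) - (r + n), n))  # convert back to mRNA coordinates
    return sorted(found, key=lambda d: (-d[2], d[0], d[1]))


if __name__ == "__main__":
    # Invented toy sequences: srna[2:9] is complementary to mrna[3:10].
    print(find_duplexes("ACGGCUAGCA", "AAAGCUAGCCAAA"))
```

For the toy input the script prints `[(2, 3, 7)]`: a 7-nt duplex starting at sRNA position 2 and mRNA position 3 (0-based). A realistic predictor would rank candidates by energy rather than by length.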
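Xiaomin Ying, Juanjuan Zhu, Xiaolei Wang, Dongsheng Zhao, Hanjiang Fu, Xiaofei Zheng, Wuju Li
Scientia Sinica Vitae, 2011, 41(10): 958-964
Abstract: microRNAs (miRNAs) are a class of small non-protein-coding regulatory RNAs that play broad and important regulatory roles in eukaryotes. Because miRNA expression is temporally and spatially specific, computational prediction followed by targeted experimental validation is an important route to miRNA discovery. Reducing the false-positive rate is a major challenge for miRNA prediction methods. In this study, an ensemble learning approach was used to build SVMbagging, a classifier for predicting miRNA precursors. Results on the training set, test set, and independent test set show that the method is robust, has a low false-positive rate, and generalizes well; in particular, at a threshold of 0.9 the specificity reaches 99.90% and the sensitivity is above 26%, making it suitable for genome-wide prediction. Applying SVMbagging to the whole human genome at a threshold of 0.9 yielded 14933 candidate miRNA precursors. Comparison with high-throughput small RNA sequencing data showed that 4481 of these precursors had perfectly matching small RNA sequences, very close to the theoretically estimated number of true positives. Finally, 32 candidate miRNAs were tested experimentally, and 2 of them were confirmed as genuine miRNAs.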
A comprehensive database of bacterial sRNA targets verified by
Yuan Cao, Jiayao Wu, Qian Liu, Yalin Zhao, Xiaomin Ying, Lei Cha, Ligui Wang, and Wuju Li
RNA (2010), 16:2051–2057
ABSTRACT：Bacterial sRNAs are an emerging class of small regulatory RNAs, 40–500 nt in length, which play a variety of important roles in many biological processes through binding to their mRNA or protein targets. A comprehensive database of experimentally confirmed sRNA targets would be helpful in understanding sRNA functions systematically and provide support for developing prediction models. Here we report on such a database, sRNATarBase. The database holds 138 sRNA–target interactions and 252 noninteraction entries, which were manually collected from peer-reviewed papers. The detailed information for each entry, such as supporting experimental protocols, BLAST-based phylogenetic analysis of sRNA–mRNA target interaction in closely related bacteria, predicted secondary structures for both sRNAs and their targets, and available binding regions, is provided as accurately as possible. This database also provides hyperlinks to other databases including GenBank, SWISS-PROT, and MPIDB. The database is available from the web page
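Lei Zha, Jie Gao, Ming Hong, Minji Zou, Wuju Li, Donggang Xu, Kuiwu Wu
Abstract: Objective: To compare several methods of reading out RT-LAMP results for the detection of influenza A (H1N1) virus nucleic acid, and to optimize the detection method. Methods: The detection sensitivities of electrophoresis, direct observation, addition of SYBR Green I nucleic acid dye, and an optimized pre-added nucleic acid dye were compared. Results: Electrophoresis was the most sensitive, addition of SYBR Green I dye was slightly less sensitive, and direct naked-eye observation was about two orders of magnitude less sensitive. The optimized pre-added dye gave a detection sensitivity comparable to that of SYBR Green I nucleic acid dye. Conclusion:
raises the sensitivity of visual determination of reaction results, enhances the specificity of the reaction, and can reduce aerosol contamination.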
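Lei Zha, Wenliang Fu, Minji Zou, Wuju Li, Kuiwu Wu, Donggang Xu
Abstract: Objective: To rapidly diagnose and differentiate the two main pathogens within the Mycobacterium tuberculosis complex. Methods: A method based on loop-mediated isothermal amplification (LAMP) was established for the rapid detection and differentiation of the main pathogens within the Mycobacterium tuberculosis complex. The method was used for specificity testing on relevant clinical isolate samples, and its sensitivity was analyzed using 1:10 serial dilutions of DNA templates from known strains. After the reaction, results were read by electrophoresis or by naked eye after adding DNA dye to the reaction tube. Results: The method successfully detected the main tuberculosis pathogens, the human and bovine types of Mycobacterium tuberculosis, with no nonspecific cross-reaction with the other related strains, including BCG. The detection sensitivity reached 100 copies/μl, higher than that of conventional PCR. Conclusion: The method is sensitive, specific, low-cost, and rapid; it can detect and differentiate human-type and bovine-type Mycobacterium tuberculosis and can exclude interference from BCG in diagnosis.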
a plug-in-based software for the management of patients information and the analysis of peptide profiles from mass spectrometry
Yuan Cao, Na Wang, Xiaomin Ying, Ailing Li, Hengsha Wang, Xuemin Zhang and Wuju Li
BMC Medical Informatics and Decision Making 2009, 9:13. doi:10.1186/1472-6947-9-13
With wide applications of matrix-assisted laser
desorption/ionization time-of-flight mass spectrometry (MALDI-TOF
MS) and surface-enhanced laser desorption/ionization time-of-flight
mass spectrometry (SELDI-TOF MS), statistical comparison of serum
peptide profiles and management of patients information play an
important role in clinical studies, such as early diagnosis,
personalized medicine and biomarker discovery. However, current
available software tools mainly focused on data analysis rather than
providing a flexible platform for both the management of patients
information and mass spectrometry (MS) data analysis.
Here we presented a plug-in-based software, BioSunMS, for both the
management of patients information and serum peptide profiles-based
statistical analysis. By integrating all functions into a
user-friendly desktop application, BioSunMS provided a comprehensive
solution for clinical researchers without any knowledge in
programming, as well as a plug-in architecture platform with the
possibility for developers to add or modify functions without need
to recompile the entire application.
BioSunMS provides a plug-in-based solution for managing, analyzing,
and sharing high volumes of MALDI-TOF or SELDI-TOF MS data. The
software is freely distributed under GNU General Public License (GPL)
and can be downloaded from
a web server for prediction of bacterial sRNA targets
Yuan Cao, Yalin Zhao, Lei Cha, Xiaomin Ying, Ligui Wang, Ningsheng Shao, Wuju Li*
there exist some small non-coding RNAs (sRNAs) of 40–500 nucleotides in length. Most of them function as post-transcriptional regulators of gene expression through binding to their target mRNAs, in which the Hfq protein acts as an RNA chaperone. With the increase of identified sRNA genes in the bacterium, prediction of sRNA targets plays a more important role in determining sRNA functions. However, there are few available computational tools for predicting sRNA targets at present. Here we introduced a web server, sRNATarget, for genome-scale prediction of bacterial sRNA targets. The server is based on a recently published model which uses the Naive Bayes method as the supervised method and takes the RNA secondary structure profile as the feature. The prediction results will be returned to the users through E-mail.
a batch platform for large-scale genomic analysis of mammalian small
Xiaomin Ying, You Jung Kim, Yiqing Mao, Ming Liu, Yanyan Hou, Hua Li, Xiaolei Wang, Yalin Zhao, Dongsheng Zhao, Jignesh M. Patel, Wuju Li*
increasing number of small RNAs have been discovered in mammals.
However, their primary transcripts and upstream regulatory networks
remain largely to be determined. Genomic analysis of small RNAs
facilitates identification of their primary transcripts, and hence
contributes to researches of their upstream regulatory networks. We here
report a batch platform, BatchGenAna, which is specifically designed for
large-scale genomic analysis of mammalian small RNAs. It can map and
annotate for as many as 1000 small RNAs or 10,000 genomic loci of small
RNAs at a time. It provides genomic features including RefSeq genes,
mRNAs, ESTs and repeat elements in tabular and graphical results. It
also allows extracting flanking sequences of submitted queries,
specified genomic regions and host transcripts, which facilitates
subsequent analysis such as scanning transcription factor binding sites
in upstream sequences and poly(A) signals in downstream sequences.
Besides small RNA fields, BatchGenAna can also be applied to other
research fields, e.g. in silico analysis of target genes of
Publisher: Military Medical Science Press ISBN: 9787802450752
1. Construction of two mathematical models for prediction of bacterial sRNA targets
Yalin Zhao, Hua Li, Yanyan Hou, Lei Cha, Yuan Cao, Ligui Wang, Xiaomin Ying, Wuju Li
Biophysical Research Communications,2008, 372:346–350
Accurate prediction of sRNA targets plays a key role in determining sRNA functions. Here we introduced two mathematical models, sRNATargetNB and sRNATargetSVM, for prediction of sRNA targets using the Naïve Bayes method and support vector machines (SVM), respectively. The training dataset was composed of 46 positive samples (real sRNA–target interactions) and 86 negative samples (no interaction between sRNA and targets). The leave-one-out cross-validation (LOOCV) classification accuracy was 91.67% for sRNATargetNB, and 100.00% for sRNATargetSVM. To evaluate the performance of the models, an independent test dataset was used, which contained 22 positive samples and 1700 randomly generated negative samples. The results showed that the classification accuracy, sensitivity, and specificity were 93.03%, 40.90%, and 93.71% for sRNATargetNB and 80.55%, 72.73%, and 80.65% for sRNATargetSVM, respectively. Therefore, the presented models provide support for experimental identification of sRNA targets. The related software and supplementary materials can be downloaded from webpage
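The accuracy, sensitivity, and specificity figures above are linked by simple confusion-matrix arithmetic. The Python sketch below shows one set of counts that reproduces them on the 22-positive, 1700-negative test set. The counts are inferred from the reported percentages and are an assumption, since the abstract does not list them; the function name is illustrative.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as percentages."""
    total = tp + fn + tn + fp
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)   # true positives among all real positives
    specificity = 100.0 * tn / (tn + fp)   # true negatives among all real negatives
    return accuracy, sensitivity, specificity


# Assumed counts, back-calculated from the reported percentages (not given in the abstract).
ASSUMED_COUNTS = {
    "sRNATargetNB": dict(tp=9, fn=13, tn=1593, fp=107),
    "sRNATargetSVM": dict(tp=16, fn=6, tn=1371, fp=329),
}

if __name__ == "__main__":
    for model, counts in ASSUMED_COUNTS.items():
        acc, sens, spec = confusion_metrics(**counts)
        print(f"{model}: accuracy={acc:.2f}% sensitivity={sens:.2f}% specificity={spec:.2f}%")
```

With these counts every figure matches the abstract except the sRNATargetNB sensitivity, where 9/22 rounds to 40.91% and the abstract gives 40.90%, which looks like truncation rather than rounding. The example also shows why accuracy alone is misleading on such an unbalanced test set: sRNATargetNB has the higher accuracy but recovers far fewer true targets.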
2. Identification and verification of microRNA in wheat
Jin, Nannan Li, Bin Zhang, Fangli Wu, Wuju Li, Aiguang Guo,
J Plant Res, 2008, 121:351–355
MicroRNAs (miRNAs) are small, endogenous RNAs that regulate gene expression in both plants and animals. A large number of miRNAs have been identified from various animals and model plant species such as Arabidopsis thaliana and rice (Oryza sativa); however, characteristics of wheat (Triticum aestivum) miRNAs are poorly understood. Here, computational identification of miRNAs from wheat EST sequences was performed by using the in-house program GenomicSVM, a prediction model for miRNAs. This study resulted in the discovery of 79 miRNA candidates. Nine out of 22 miRNA representatives randomly selected from the 79 candidates were experimentally validated with Northern blotting, indicating that prediction accuracy is about 40%. For the 9 validated miRNAs, 59 wheat ESTs were predicted as their putative
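33%, with a specificity of 100%; on the remaining 91 human pre-miRNAs and 91 pseudo pre-miRNAs from 3′UTRs, the sensitivity and specificity reached 91.21% (83/91) and 93.41% (85/91), respectively.
Among 1 353 pre-miRNAs from 20 animal species other than human and from viruses, MiRscreen correctly identified 1 192, for a sensitivity of 88.
pseudo pre-miRNAs and 797 randomly selected pseudo pre-miRNAs folded from human chromosome 19 (1 353 mixed negative samples in total), MiRscreen achieved a specificity of 85.14% (1 152/1 353)
reached 86.62%, more than 6% higher than the other methods; the AUC of MiRscreen reached 0.938, also clearly higher than the other methods.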
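MicroRNAs (miRNAs) are a class of endogenous small non-coding RNAs of ~21 nt discovered in recent years
that play important and wide-ranging regulatory roles in plants and animals. They are discovered mainly by two routes: cDNA cloning and sequencing, and computational discovery. Because cDNA cloning and sequencing is affected by the temporal and tissue specificity
of miRNA expression and by expression level, whereas computational discovery can compensate for these shortcomings, computational methods
for miRNA discovery have received wide attention. This article reviews recent progress in the computational discovery of miRNAs. According to the nature of the computational methods,
Keywords: microRNA; computational discovery; homology search; comparative genomics; targets; machine learning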
Teratogenesis, Mutagenesis, 2008, 20: 85-88
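Abstract: To establish a gene chip method for detecting different serotypes and genotypes of foot-and-mouth
disease virus (FMDV),
specific probes were designed for 8 genotypes of type O, 3 genotypes of type A, and type Asia 1. A total of 547 VP1 gene sequences of type O, type A, and type Asia 1 FMDV were downloaded from GenBank (USA)
and the gene database of the World Reference Laboratory for Foot-and-Mouth Disease (UK). For each serotype, the sequences were multiply aligned with the ClustalW program of the DNA
Star software, followed by phylogenetic analysis and genotyping. Using the biological software BioSun 2.0
specific probes. Target sequence templates corresponding to each type-specific probe were serially diluted 10-fold and amplified by PCR, and the amplification products were hybridized with the probes to verify the sensitivity of each probe. The sensitivity of each probe for the four type O
genotypes SEA, Euro-SA, ME-SA, and WA was tested; these probes were able to detect 102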
of mathematical model for high-level expression of foreign genes in
pPIC9 vector and its verification
Bingli Wu, Lei Cha, Zepeng
Du, Xiaomin Ying, Hua Li, Liyan Xu, Xiaofei Zheng, Enmin Li, Wuju Li
Biochemical and Biophysical Research Communications, 2007, 354:498–504
In this report, we introduce a mathematical model for high-level
expression of foreign genes in the pPIC9 vector. First, we collected 40
heterologous genes expressed in the pPIC9 vector and classified them
into a high-level expression group (expression level >100 mg/L,
12 genes) and a low-level expression group (expression level <100 mg/L, 28
genes). Then, the Naive Bayes method was used to construct the model
with the RNA secondary structure profile of the 3'-end of the foreign genes as
features. The classification accuracy from leave-one-out
cross-validation was 100%. Finally, another five genes collected from
the literature were used to test the model; four of them were
correctly predicted. In addition,
the model was also verified by expressing the human neutrophil gelatinase-associated
lipocalin (NGAL) gene at an expression level above 100 mg/L.
Therefore, we propose that the model can be used to predict the
expression level of heterologous genes before experiments and to optimize
experimental designs to obtain high-level expression. Furthermore,
we have developed a web server for evaluation and design for high-level
expression of foreign genes, which is accessible at
Full Text Download:
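Leave-one-out cross-validation, as used for the model above, holds out each sample in turn, trains on the rest, and counts how often the held-out sample is predicted correctly. The sketch below illustrates the procedure only. It uses a nearest-centroid classifier as a simple stand-in (not the Naive Bayes model of the paper), and the feature vectors and labels are invented toy data.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(xs, ys):
    """Fraction of samples predicted correctly when each is held out once."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]   # all samples except the i-th
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Hypothetical two-feature profiles for "low" and "high" expression genes.
xs = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.9, 0.8], [1.0, 0.9], [0.8, 1.0]]
ys = ["low", "low", "low", "high", "high", "high"]
print(loocv_accuracy(xs, ys))   # 1.0 for this well-separated toy data
```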
W. Li and L. Cha
Cell. Mol. Life Sci., 2007, 64:1785–1792
Since the identification of RNA-mediated interference (RNAi) in 1998, RNAi has
become an effective tool to inhibit gene expression. The inhibition
mechanism is triggered by introducing a short interfering
double-stranded RNA (siRNA, 19~27 bp) into the cytoplasm, where the guide
strand of the siRNA (usually the antisense strand) binds to its target messenger
RNA and the expression of the target gene is blocked. RNAi has been
widely applied in gene functional analysis, and as a potential
therapeutic strategy in viral diseases, drug target discovery, and
cancer therapy. Among the factors which may compromise inhibition
efficiency, how to design siRNAs with high efficiency and high
specificity for their target genes is critical. Although many algorithms
have been developed for this purpose, it is still difficult to design
such siRNAs. In this review, we will briefly discuss prediction methods
for siRNA efficiency and the problems of present approaches.
Full Text Download:
Lei Cha, Xiaomin Ying, Yuan Cao, Hua Li, Wuju Li
MAX 2.05, DNAStar 5.0, Vector NTI 9.1, and BioEdit 7.0
Full Text Download:
3. Mprobe 2.0: Computer-Aided Probe Design for Oligonucleotide Microarray
Wuju Li, Xiaomin Ying
bioinformatics, 2006, 5:181-186
DNA chips have proven to be effective tools in detecting gene expression
levels. Compared with DNA chips using complementary DNA as probes,
oligonucleotide microarrays using oligonucleotides as probes have
attracted great attention because of their well known advantages. The
design of gene-specific probes for each target is essential to the
development of oligonucleotide microarrays. We have previously reported
the development of a probe design software termed Mprobe 1.0. Here, we
present a new version of this software, termed Mprobe 2.0. Several new
features are included in Mprobe 2.0. Firstly, a Paradox-based sequence
database management system has been developed and integrated into the
software, which consequently allows interoperability with sequences in
GenBank, EMBL, and FASTA formats. Secondly, in contrast to setting a
fixed threshold for the secondary structure of probes in Mprobe 1.0 and
other related software, Mprobe 2.0 employs a different method. After
parameters such as GC type, probe melting temperature, and GC content
have been evaluated, candidate probes are sorted by free energy from
high to low and then subjected to specificity analysis. Thirdly, Mprobe 2.0
provides users with substantial parameter options in the visual mode.
Mprobe 2.0 possesses an easier interface for users to manage sequences
annotated in different formats and design the optimal probes for
oligonucleotide microarrays and other applications. AVAILABILITY: The
program is free for non-commercial users and can be downloaded from the
Full Text Download:
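The ranking step described above (filter on sequence parameters, then sort by free energy from high to low) can be sketched as follows. This is a minimal illustration, not Mprobe's implementation: the GC-content window, the candidate sequences, and the free-energy values are hypothetical, the free energies are assumed to come from an external folding tool, and the melting-temperature and specificity checks are omitted.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def rank_candidates(candidates, gc_min=0.40, gc_max=0.60):
    """Keep probes whose GC content lies in [gc_min, gc_max], then sort them
    by folding free energy from high to low (least stable structure first).

    `candidates` is a list of (sequence, free_energy_kcal_per_mol) pairs.
    """
    kept = [(s, dg) for s, dg in candidates if gc_min <= gc_content(s) <= gc_max]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Hypothetical candidates with assumed free energies (kcal/mol).
candidates = [
    ("ATGCGTACGTTAGCCTAGGA", -1.2),
    ("GGGGCCCCGGGGCCCCGGGG", -9.5),   # GC content 1.0, filtered out
    ("ATCGATCGTAGCTAGCTAGC", -0.3),
    ("TTTTAAAATTTTAAAATTTT", 0.0),    # GC content 0.0, filtered out
]
for seq, dg in rank_candidates(candidates):
    print(seq, dg)
```

Sorting instead of applying a fixed free-energy cutoff means that every target still yields a best available probe, even when all of its candidates form some secondary structure.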
1. How many genes are needed for early detection of breast cancer, based on
gene expression patterns in peripheral blood cells?
Breast Cancer Research,
2005, vol. 7 (5): E5.
Full Text Download:
In their recent report,
Sharma and coworkers explore the early detection of
breast cancer. They analyzed a gene expression data
set (1368 genes in 62 normal and 40 tumour samples,
including sample duplication in different batches)
using the nearest shrunken centroid method. They
identified a panel of 37 genes that permitted early
detection, with the classification accuracy being
about 82%. This is a typical problem with sample
classification based on gene expression profiling.
The objective is to achieve high prediction accuracy
with as few genes as possible, and so feature
selection plays an important role; examination of a
large number of genes will increase the
dimensionality, computational complexity, and
clinical cost. According to our previous study of
data sets from patients with colon cancer, leukaemia
and breast cancer,
we estimated that five or six genes, rather than 37,
would be sufficient for the early detection of
breast cancer.
So how many genes are indeed needed? In order to
address this question, we evaluated the data
presented by Sharma and coworkers using the Tclass
system. In the Tclass system, Fisher's linear discriminant analysis
and a step-wise optimization procedure for feature
selection are used to analyze the batch-adjusted data
in two ways. The first is to take the prediction
accuracy from the training set as the objective
function. The second way is to take the
classification accuracy from the leave-one-out
cross-validation as the objective function. For the
former, the selected optimal feature sets are
evaluated by randomly dividing all tissue samples
into a training set (e.g. 50%, 67%, or 85% of
samples) and a test set 200 times. The relationship
between the prediction accuracy and the number of
genes is illustrated in Fig.
1, which shows that the greatest prediction
accuracy was achieved using six genes (Fig.
1a); other peaks in accuracy occurred when 10,
13, or 15 genes were used (Fig.
1b). Furthermore, two genes – the 481st
(BC009696) and the 801st (BC000514) – permitted
classification accuracy as high as 86%, which is
greater than the 82% achieved by Sharma and
coworkers with the selected 37 genes.
2. An approach to studying lung cancer-related proteins in human blood
Ting Xiao, Wantao Ying,
Lei Li, Zhi Hu, Ying Ma, Liyan Jiao,
Jinfang Ma, Yun Cai, Dongmei Lin, Suping Guo,
Naijun Han, Xuebing Di, Min Li, Dechao Zhang, Kai Su,
Jinsong Yuan, Hongwei Zheng, Meixia Gao, Jie He,
Susheng Shi, Wuju Li, Ningzhi Xu, Husheng Zhang,
Yan Liu, Kaitai Zhang, Yanning Gao, Xiaohong Qian,
and Shujun Cheng
Molecular & Cellular Proteomics, 2005, published online.
Early-stage lung cancer detection is the first step towards successful
clinical therapy and increased patient survival. Clinicians monitor
cancer progression by profiling tumor cell proteins in the blood plasma
of afflicted patients. Blood plasma, however, is a difficult medium
for cancer protein assessment, because it is rich in albumins and
heterogeneous protein species. We report herein a method to detect the
proteins released into the circulatory system by tumor cells. Initially,
we analyzed the protein components in the conditioned medium (CM) of
lung cancer primary cell or organ cultures, and in the adjacent normal
bronchus using 1-D PAGE and nano-ESI-MS/MS. We identified 299 proteins
involved in key cellular processes such as cell growth, organogenesis and
signal transduction. We selected 13 interesting proteins from this list,
and analyzed them in 628 blood plasma samples using ELISA. We detected
11 of these 13 proteins in the plasma of lung cancer patients and
non-patient controls. Our results showed that plasma MMP1 levels were
elevated significantly in late-stage lung cancer patients, and that the
plasma levels of 14-3-3 sigma, beta and eta in the lung cancer patients
were significantly lower than those in the control subjects. To our
knowledge, this is the first time that fascin, ezrin, CD98, annexin A4,
14-3-3 sigma, 14-3-3 beta and 14-3-3 eta proteins have been detected in
human plasma by ELISA. The preliminary results showed that a combination
of CD98, fascin, PIGR/SC and 14-3-3 eta had a higher sensitivity and
specificity than any single marker. In conclusion, we report a method to
detect proteins released into blood by lung cancer. This pilot approach
may lead to the identification of novel protein markers in blood and
provide a new method of identifying tumor biomarker profiles for guiding
both early detection and therapy of human cancer.
Full Text Download:
Wuju Li, Tao Liu, Xiaomin Ying, and Ming Fan
Molecular & Cellular Proteomics, 2004, vol.3 (10): S79.
With genomic sequences from the three domains of life becoming increasingly
available, the relationships between the amino acid composition (AAC) and the genome classes
(organisms' phenotype) have been widely studied in the following two
aspects. The first aspect is to concentrate on the difference of AAC of
proteins from particular type or whole proteomes in different genome
classes. The second aspect is to study the issue of genome class
prediction based on the AAC. The purpose of the above two aspects is to
explain why certain organisms can live in extreme conditions of
temperature, salinity, or pressure. Here we want to emphasize whether
there is a possibility to predict the genome classes as accurately as
possible using small subsets of amino acids. In order to investigate the
issues systematically, the Fisher linear discriminant analysis (FLDA)
was applied to the following four data sets DOMAIN, LIFE, HTHAB, and
ARCHAEA. The DOMAIN is about the three domains of life (16 archaea, 75
bacteria, and 6 eukaryotic genomes). The LIFE is about the three
lifestyles (13 HTH, 4 TH, and 79 MES). The HTHAB includes 10 HTH in
archaea and 3 HTH in bacteria. The ARCHAEA is about the three lifestyles
in archaea (10 HTH, 3 TH, and 3 MES). By using the feature selection
method of all possible combinations of features (amino acids), we found
that the cross-validation accuracies for the above four data sets could
reach 94.8%, 97.9%, 100.0%, and 100.0% using the compositions of only
four (A, I, K, and Q), five (I, K, P, V, and Y), two (E and Q), and two
(M and Q) amino acids, respectively. The average cross-validation
accuracy reaches 98.2%. Therefore, AAC from the proteomes provides an
alternative way to determine genome classes such as the lifestyle or
the domain of life. To our knowledge, correspondence analysis,
principal component analysis (PCA), and hierarchical cluster analysis
have been applied to study the distinction between genome classes using
the AAC, but classification methods have not been used. Our work
therefore represents a first attempt in this direction.
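The two ingredients of this analysis, the amino acid composition of a sequence and the exhaustive enumeration of amino acid subsets, can be sketched as follows. The peptide is a made-up example, and the classifier that would score each subset (FLDA in the abstract) is left out.

```python
from collections import Counter
from itertools import combinations
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues

def amino_acid_composition(protein):
    """Return the fraction of each standard amino acid in `protein`."""
    counts = Counter(protein.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    return {a: counts[a] / total for a in AMINO_ACIDS}

# Hypothetical peptide, for illustration only.
aac = amino_acid_composition("MKKLLPTAAAGLLLLAAQPAMA")
print(round(aac["A"], 3), round(aac["L"], 3))

# Exhaustive feature selection evaluates every subset of a given size.
subsets_of_two = list(combinations(AMINO_ACIDS, 2))
print(len(subsets_of_two))          # 190 pairs such as ('E', 'Q')
print(comb(20, 4), comb(20, 5))     # 4845 and 15504 candidate subsets
```

Because only 20 features exist, the search space stays small enough (at most 2^20 - 1 subsets) for every combination to be scored by cross-validation.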
Xiaomin Ying, Hong Luo,
Jingchu Luo and Wuju Li
Nucleic Acids Research,
2004, vol.32: W150-W153.
Abstract: Prediction of
RNA secondary structure is important in the functional analysis of RNA
molecules. The RDfolder web server described in this paper provides two
methods for prediction of RNA secondary structure: random stacking of
helical regions and helical regions distribution. The random stacking
method predicts secondary structure by Monte Carlo simulations. The
method of helical regions distribution predicts secondary structure
based on the helices that appear most frequently in the set of
structures, which are generated by the random stacking method. The
RDfolder web server can be accessed at
Full Text Download:
vol. 28(5): 401-404
Wuju Li, Ming Fan and Momiao Xiong
2003, vol.19: 811-817
Feature (gene) selection can dramatically improve the accuracy of gene
expression profile based sample class prediction. Many statistical
methods for feature (gene) selection such as stepwise optimization
and Monte Carlo simulation have been developed for tissue sample
classification. In contrast to class prediction, few statistical and
computational methods for feature selection have been applied to
clustering algorithms for pattern discovery.
Results: An integrated scheme and corresponding
program SamCluster for automatic discovery of sample classes based
on gene expression profile is presented in this report. The scheme
incorporates the feature selection algorithms based on the
calculation of CV (coefficient of variation) and t-test into
hierarchical clustering and proceeds as follows. At first, the genes
with their CV greater than the pre-specified threshold are selected
for cluster analysis, which results in two putative sample classes.
Then, genes that are significantly differentially expressed between the two
putative sample classes at t-test p-value thresholds of 0.01, 0.05, or 0.1
are selected for further cluster analysis. The above processes are
iterated until two stable sample classes are found. Finally,
the consensus sample classes are constructed from the putative
classes that are derived from the different CV thresholds, and the
best putative sample classes that have the minimum distance between
the consensus classes and the putative classes are identified. To
evaluate the performance of the feature selection for cluster
analysis, the proposed scheme was applied to four expression
datasets COLON, LEUKEMIA72, LEUKEMIA38, and OVARIAN. The results
show that there are only 5, 1, 0, and 0 samples that have been
misclassified, respectively. We conclude that the proposed scheme,
SamCluster, is an efficient method for discovery of sample classes
using gene expression profile.
Availability: The related program SamCluster is
available upon request or from the web page
Full Text Download:
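The first filtering step of the scheme, keeping genes whose coefficient of variation (CV) exceeds a threshold, can be sketched as below. The gene names, expression values, and threshold are hypothetical, and the subsequent clustering and t-test steps are not shown.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (mean assumed positive)."""
    return stdev(values) / mean(values)

def select_genes_by_cv(expression, threshold):
    """Return genes whose CV across samples is greater than `threshold`.

    `expression` maps a gene name to its expression levels across samples.
    """
    return [gene for gene, levels in expression.items()
            if coefficient_of_variation(levels) > threshold]

# Hypothetical expression levels for three genes across four samples.
expression = {
    "geneA": [10.0, 10.5, 9.5, 10.2],   # nearly constant, low CV
    "geneB": [2.0, 20.0, 3.0, 25.0],    # varies strongly between samples
    "geneC": [5.0, 5.1, 4.9, 5.0],      # nearly constant, low CV
}
print(select_genes_by_cv(expression, threshold=0.5))   # ['geneB']
```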
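Tao Liu, Wuju Li, Ming Fan.
Yumei Zhang (张玉梅), Changkai Sun (孙长凯), Ming Fan, Wuju Li.
Shuhong Liu (刘淑红), Jie Zhao (赵杰), Dayue Han (韩大跃), Jiaxi Wang (王嘉玺).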
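methyl-D-aspartate, NMDA) receptor main subunit M3-M4
loop target fragment, to serve as an immunogen for further studies of immunogenicity and related applications. Total RNA was extracted from human glioma tissue, and RT-PCR was used to amplify the prokaryotic expression vector of the human NMDA receptor main subunit M3-M4
loop (named pBV NR1L3); high-level expression was achieved through gene optimization. Gel scanning analysis showed that the expressed product accounted for about 29% of total bacterial protein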
Li Wuju and Xiong Momiao
2002, vol.18: 325-326
A method that incorporates feature selection into Fisher’s linear
discriminant analysis for gene expression based tumor classification and
a corresponding program Tclass were developed. The proposed method was
applied to a public gene expression data set for colon cancer that
consists of 22 normal and 40 tumor colon tissue samples to evaluate its
performance for classification. Preliminary results demonstrated that
using only a subset of genes ranging from 3 to 10 can achieve high
Availability: The program is written in Matlab and is
being rewritten in the Java language. The source code is available upon
Full Text Download:
Wuju Li, Jian Huang, Ming
Fan, Shengqi Wang
The present work describes a complete probe design software system for
oligonucleotide microarrays based on Kane’s research on probe
sensitivity and specificity (Kane’s rule). Combining Kane’s rule and
traditional criteria for probe design we constructed MProbe, the
software system for oligonucleotide microarrays using Java. The general
criteria for probe design are: (1) probes may have different lengths
that range from 20 to 100 bases; (2) they should have a similar melting
temperature (Tm) or GC content; (3) they should not contain stable
secondary structures; and (4) they abide by Kane’s rule.
Changkai Sun (孙长凯), Jie Zhao (赵杰), Wuju Li, Jiannan Feng (冯健男).
1. Feature (gene) selection in gene expression-based tumor classification
Xiong M, Li W, Zhao J, Jin L, Boerwinkle E.
Mol Genet Metab.
There is increasing interest in changing the emphasis of tumor
classification from morphologic to molecular. Gene expression profiles
may offer more information than morphology and provide an alternative to
morphology-based tumor classification systems. Gene selection involves a
search for gene subsets that are able to discriminate tumor tissue from
normal tissue, and may have either clear biological interpretation or
some implication in the molecular mechanism of the tumorigenesis. Gene
selection is a fundamental issue in gene expression-based tumor
classification. In the formation of a discriminant rule, the number of
genes is large relative to the number of tissue samples. Too many genes
can harm the performance of the tumor classification system and increase
the cost as well. In this report, we discuss criteria and illustrate
techniques for reducing the number of genes and selecting an optimal (or
near optimal) subset of genes from an initial set of genes for tumor
classification. The practical advantages of gene selection over other
methods of reducing the dimensionality (e.g., principal components),
include its simplicity, future cost savings, and higher likelihood of
being adopted in a clinical setting. We analyze the expression profiles
of 2000 genes in 22 normal and 40 colon tumor tissues, 5776 sequences in
14 human mammary epithelial cells and 13 breast tumors, and 6817 genes
in 47 acute lymphoblastic leukemia and 25 acute myeloid leukemia
samples. Through these three examples, we show that using 2 or 3 genes
can achieve more than 90% accuracy of classification. This result
implies that after initial investigation of tumor classification using
microarrays, a small number of selected genes may be used as biomarkers
for tumor classification, or may have some relevance in tumor
development and serve as a potential drug target. In this report we also
show that stepwise Fisher's linear discriminant function is a
practicable method for gene expression-based tumor classification.
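A stepwise Fisher discriminant can be sketched as a forward search: start with no genes, repeatedly add the gene that most improves the accuracy of Fisher's linear discriminant, and stop when no gene helps. The code below is an illustrative sketch under several assumptions. It uses training-set accuracy as the objective, adds a small ridge term (1e-6) to keep the scatter matrix invertible, and runs on invented toy data rather than the data sets analyzed in the paper.

```python
def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def within_scatter(rows, mu):
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [a - b for a, b in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fisher_accuracy(x0, x1, genes):
    """Training accuracy of Fisher's discriminant using only `genes`."""
    a = [[row[g] for g in genes] for row in x0]
    b = [[row[g] for g in genes] for row in x1]
    m0, m1 = mean_vector(a), mean_vector(b)
    s0, s1 = within_scatter(a, m0), within_scatter(b, m1)
    d = len(genes)
    # Pooled within-class scatter plus a small ridge term (an assumption).
    sw = [[s0[i][j] + s1[i][j] + (1e-6 if i == j else 0.0) for j in range(d)]
          for i in range(d)]
    w = solve(sw, [p - q for p, q in zip(m1, m0)])   # w = Sw^-1 (m1 - m0)
    cut = sum(wi * (p + q) / 2 for wi, p, q in zip(w, m0, m1))
    score = lambda row: sum(wi * v for wi, v in zip(w, row))
    hits = sum(score(r) < cut for r in a) + sum(score(r) >= cut for r in b)
    return hits / (len(a) + len(b))

def forward_select(x0, x1, max_genes):
    """Greedily add the gene that most improves accuracy; stop when none does."""
    chosen, best = [], 0.0
    while len(chosen) < max_genes:
        trials = [(fisher_accuracy(x0, x1, chosen + [g]), g)
                  for g in range(len(x0[0])) if g not in chosen]
        acc, g = max(trials, key=lambda t: (t[0], -t[1]))
        if acc <= best:
            break
        chosen.append(g)
        best = acc
    return chosen, best

# Toy data: rows are samples, columns are genes; only gene 1 separates classes.
x0 = [[5.0, 1.0, 3.0], [4.0, 1.2, 7.0], [6.0, 0.8, 5.0], [5.5, 1.1, 4.0]]  # normal
x1 = [[5.2, 3.0, 6.0], [4.5, 3.3, 3.5], [5.8, 2.9, 5.5], [5.1, 3.1, 4.5]]  # tumor
print(forward_select(x0, x1, max_genes=2))   # ([1], 1.0)
```

In practice the objective would be a cross-validated accuracy rather than the training accuracy, since training accuracy favors overfitted gene sets when genes far outnumber samples.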
methods for gene expression-based tumor classification
Xiong M, Jin L, Li W,
Gene expression profiles may offer more or additional information than
classic morphologic- and histologic-based tumor classification systems.
Because the number of tissue samples examined is usually much smaller
than the number of genes examined, efficient data reduction and analysis
methods are critical. In this report, we propose a principal component
and discriminant analysis method of tumor classification using gene
expression profile data. Expression of 2000 genes in 40 tumor and 22
normal colon tissue samples is used to examine the feasibility of gene
expression-based tumor classification systems. Using this method, the
percentage of correctly classified normal and tumor tissue was 87.0%.
The combined approach using principal components and discriminant
analysis provided superior sensitivity and specificity compared to an
approach using simple differences in the expression levels of individual
Li Wu Ju, Lei Hong Xing, Pei Wu Hong and Wu Jia Jin
1998, vol.14: 884-885.
Based on the mathematical model of high-level expression of heterologous
genes in prokaryotic vector pBV220, we developed a program GeneDn for
high-level expression design of natural and synthetic genes.
AVAILABILITY: The program is written in Turbo Pascal 7.0. The source
code and related material are available upon request.
Full Text Download:
Li Wuju and Wu Jiajin
1998, vol.14: 700-706.
RNAs play an important role in many biological processes and knowing
their structure is important in understanding their function. Due to
difficulties in the experimental determination of RNA secondary
structure, the methods of theoretical prediction for known sequences are
often used. Although many different algorithms for such predictions have
been developed, this problem has not yet been solved. It is thus
necessary to develop new methods for predicting RNA secondary structure.
The most widely used at present is Zuker's algorithm, which can be used to
determine the minimum free energy secondary structure. However, many RNA
secondary structures verified by experiments are not consistent with the
minimum free energy secondary structures. In order to solve this
problem, a method used to search a group of secondary structures whose
free energy is close to the global minimum free energy was developed by
Zuker in 1989. When considering a group of secondary structures, if
there is no experimental data, we cannot tell which one is better than
the others. This case also occurs in combinatorial and heuristic
methods. These two kinds of methods have several weaknesses. Here we
show how the central limit theorem can be used to solve these problems.
RESULTS: An algorithm
for predicting RNA secondary structure based on helical regions
distribution is presented, which can be used to find the most probable
secondary structure for a given RNA sequence. It consists of three
steps. First, list all possible helical regions. Second, according to
the central limit theorem, estimate the occurrence probability of every
helical region based on the Monte Carlo simulation. Third, add the
helical region with the biggest probability to the current structure and
eliminate the helical regions incompatible with the current structure.
The above processes can be repeated until no more helical regions can be
added. Take the current structure as the final RNA secondary structure.
To demonstrate the reliability of the program, it was tested on three
RNA sequences: tRNAPhe, Pre-tRNATyr, and the Tetrahymena ribosomal RNA
intervening sequence.
AVAILABILITY: The program is written in Turbo Pascal 7.0. The source
code is available upon request.
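The three steps of the algorithm can be sketched in a few dozen lines. This is a simplified illustration, not the published program: it allows only Watson-Crick pairs, samples random structures uniformly (no energy weighting), estimates helix frequencies once instead of after every addition, and excludes pseudoknots. The minimum helix length, minimum loop size, sample count, and example sequence are all assumptions.

```python
import random

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}   # Watson-Crick only

def helical_regions(seq, min_len=3, min_loop=3):
    """Step 1: list maximal helices as (i, j, length), 0-based, where
    bases i..i+length-1 pair with bases j..j-length+1."""
    n, regions = len(seq), []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Skip pairs that merely continue a helix starting further out.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            while ((j - length) - (i + length) > min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                regions.append((i, j, length))
    return regions

def pairs_of(helix):
    i, j, length = helix
    return [(i + k, j - k) for k in range(length)]

def compatible(h1, h2):
    """Two helices are compatible if they share no base and do not cross."""
    for a, b in pairs_of(h1):
        for c, d in pairs_of(h2):
            if len({a, b, c, d}) < 4:
                return False
            if a < c < b < d or c < a < d < b:
                return False
    return True

def random_structure(regions, rng):
    """Random stacking: add helices in random order, skipping clashes."""
    order = regions[:]
    rng.shuffle(order)
    chosen = []
    for h in order:
        if all(compatible(h, c) for c in chosen):
            chosen.append(h)
    return chosen

def predict(seq, n_samples=200, seed=0):
    rng = random.Random(seed)
    regions = helical_regions(seq)
    # Step 2: estimate how often each helix occurs in random structures.
    counts = {h: 0 for h in regions}
    for _ in range(n_samples):
        for h in random_structure(regions, rng):
            counts[h] += 1
    # Step 3: repeatedly add the most frequent helix, drop incompatible ones.
    structure, remaining = [], regions[:]
    while remaining:
        best = max(remaining, key=lambda h: (counts[h], h[2]))
        structure.append(best)
        remaining = [h for h in remaining if h != best and compatible(h, best)]
    return structure

def dot_bracket(n, structure):
    s = ["."] * n
    for h in structure:
        for a, b in pairs_of(h):
            s[a], s[b] = "(", ")"
    return "".join(s)

seq = "GGGAAAUCCC"   # toy hairpin, not one of the sequences tested in the paper
print(dot_bracket(len(seq), predict(seq)))   # (((....)))
```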
Full Text Download: