On occurrences of indels and their important roles in protein functions

Dustin Montgomery asked 3 days ago

Although indels occur in many proteins and play important roles in protein function, there are currently no bioinformatics resources that archive structural and sequence information on indel sites derived from sequence alignments of similar proteins. Early studies identified some common features shared by indels in limited datasets [10-15], but our understanding of indels can be improved by exploiting the large amount of structural data accumulated in the Protein Data Bank (PDB) [16]. We therefore present Indel PDB, a structural database of insertion and deletion sites extracted from aligned protein sequences in PDB. The goal of Indel PDB is to provide a resource of indel 3D structures that enables a range of bioinformatics analyses, including primary sequence composition, secondary structure assignment, solvent accessibility, length distribution, protein domain association, homology modeling and other comprehensive structural studies. Several such applications of Indel PDB are reported in this paper. Indel PDB differs from loop databases, whose scope is limited to protein loops that lack regular secondary structure [17,18]. For instance, ArchDB [18], one of the most comprehensive loop databases available online, defines loops as regions that connect regular secondary structures, extracted from 9587 protein structures. ArchDB classified a total of 58,664 loops (ArchDB95, 13-6-2007) by their structural similarity with respect to the flanking secondary structures. Indel PDB, on the other hand, is not limited to loops: it includes all gaps (insertions or deletions) present in sequence alignments among closely related proteins in PDB, so its indel sites can adopt any secondary structure.
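Because Indel PDB treats every gap in an alignment of closely related proteins as an indel site, the basic extraction step can be illustrated on a single pairwise alignment. The Python sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the alignment is already available as two equal-length strings with `-` marking gaps, and it reports each maximal gap run, the residues opposite it, and a length tally of the kind used for a length distribution. The names `IndelSite` and `find_indels` are hypothetical. Whether a gap run counts as an insertion or a deletion depends on which sequence is taken as the reference, and mapping alignment columns back to PDB residue numbers (needed to retrieve 3D coordinates) is omitted.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class IndelSite:
    """One maximal gap run in a pairwise alignment (hypothetical record type)."""
    gapped_seq: int  # which aligned sequence carries the gap: 0 or 1
    start: int       # 0-based alignment column where the gap run begins
    length: int      # number of consecutive gap columns
    residues: str    # residues in the other sequence opposite the gap run


def find_indels(aln_a: str, aln_b: str, gap: str = "-") -> list[IndelSite]:
    """Return every maximal run of gap characters in either aligned sequence."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    sites = []
    for which, (gapped, other) in enumerate(((aln_a, aln_b), (aln_b, aln_a))):
        i = 0
        while i < len(gapped):
            if gapped[i] != gap:
                i += 1
                continue
            # Extend to the end of this contiguous gap run.
            j = i
            while j < len(gapped) and gapped[j] == gap:
                j += 1
            # Columns gapped in both sequences contribute no residues.
            opposite = other[i:j].replace(gap, "")
            sites.append(IndelSite(which, i, j - i, opposite))
            i = j
    return sites


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    sites = find_indels("MKT--LLVAG-QE", "MKTAALLV-GSQE")
    for site in sites:
        print(site)
    # Tally of gap-run lengths, i.e. a tiny length distribution.
    print(Counter(site.length for site in sites))
```

On the toy alignment this finds a two-residue gap (`AA`) and a one-residue gap (`S`) in the first sequence and a one-residue gap (`A`) in the second. A database-scale version would repeat this over many alignments and attach structural annotations such as secondary structure and solvent accessibility to each site.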
Although some overlap between Indel PDB and loop databases is expected, Indel PDB contains more indel sequences with secondary structures, including alpha-helices and beta-sheets, in addition to loops. In fact, our analyses showed that many indels had recognizable secondary (2D) structures, in contrast to previous studies showing that most indels were loops or lacked defined structure [13]. To further distinguish indels from loops, we compared them in three respects: sequence composition, length distribution, and solvent accessibility. In addition, Indel PDB is a larger structural database than ArchDB: it consists of 117,266 non-redundant indel structures extracted from 11,294 indel-containing proteins. Both the indel structural data and the analysis results are freely accessible through the Indel PDB website [19]. We believe the data in Indel PDB will not only enable future functional studies of indels but also facilitate protein modeling of indels and the identification of novel drug-binding sites against infectious diseases. Potential users of Indel PDB therefore include 1) molecular biologists who wish to study the functions of particular indel sites by integrating information on protein domains, 2) structural biologists who wish to improve protein homology models or to perform comprehensive indel structural analyses based on the available indel 3D coordinates, and 3) computational chemists searching for potential compound-binding sites for new drug leads using the comprehensive indel search engine on the Indel PDB website.

Construction and content

Building Indel PDB