- How to download InParanoid?
- What is InParanoid?
- How does it work? How does InParanoid detect orthologs?
- Why should I use it?
- How can I search for orthologs of my favorite protein?
- What are "Clusters", "Inparalog scores" and "bootstrapping"?
- How can I cite InParanoid?
1. How to download InParanoid?
The InParanoid program is available here
2. What is InParanoid?
The InParanoid program was developed at the Center for Genomics and Bioinformatics to address the need to identify orthologs. Homologs that originate from a speciation event are called orthologs, and homologs that originate from a gene duplication event are called paralogs. If a duplication event predates the speciation event, the paralogs are called outparalogs, and they can be present in different species. If instead an ortholog undergoes one or several duplication events after the speciation, the resulting paralogs are called inparalogs, and they are co-orthologs to one or more orthologs in another species. Since an outparalog pair ought to have more diversified functions than inparalogs, it is useful to distinguish between the two. Furthermore, clustering inparalogs together allows proper identification of both one-to-one and many-to-many orthology cases. More in-depth information on this subject, the InParanoid program and its applications has been previously published.
3. How does it work? How does InParanoid detect orthologs?
The InParanoid program constructs orthology groups from pairwise similarity scores between two complete proteomes, calculated using NCBI-Blast. An orthology group is initially composed of two so-called seed orthologs, which are found as two-way best hits between the two proteomes. More sequences are added to the group if there are sequences in the two proteomes that are closer to the corresponding seed ortholog than to any sequence in the other proteome. These added members of an orthology group are called inparalogs. A confidence value is provided for each inparalog that shows how closely related it is to its seed ortholog.
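The two steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the InParanoid source code: the function names, the dictionary layout of the score tables and the toy scores are all assumptions made for this example, and details of the real program (score cutoffs, overlap criteria, merging of groups) are left out.

```python
# Illustrative sketch only; names and data layout are assumptions,
# not part of the InParanoid program.

def best_hits(scores):
    """Return {query: (hit, score)} with the highest-scoring hit per query."""
    best = {}
    for (query, hit), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (hit, score)
    return best


def seed_orthologs(a_vs_b, b_vs_a):
    """Find two-way best hits between proteome A and proteome B."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    seeds = []
    for a, (b, score) in best_ab.items():
        # A and B are seed orthologs only if each is the other's best hit.
        if b in best_ba and best_ba[b][0] == a:
            seeds.append((a, b, score))
    return sorted(seeds)


def inparalogs(seed, within_scores, cross_best):
    """Same-proteome sequences that are closer to the seed than to any
    sequence in the other proteome.

    within_scores: {(seed, other): score} inside one proteome.
    cross_best:    best_hits() of that proteome against the other one.
    """
    members = []
    for (query, other), score in within_scores.items():
        if query != seed or other == seed:
            continue
        best_cross = cross_best.get(other, (None, 0.0))[1]
        if score > best_cross:
            members.append(other)
    return sorted(members)


# Toy similarity scores (hypothetical values).
a_vs_b = {("A1", "B1"): 500, ("A1", "B2"): 200,
          ("A2", "B1"): 300, ("A3", "B2"): 400}
b_vs_a = {("B1", "A1"): 500, ("B1", "A2"): 300,
          ("B2", "A3"): 400, ("B2", "A1"): 200}
a_vs_a = {("A1", "A2"): 450, ("A1", "A3"): 150}

seeds = seed_orthologs(a_vs_b, b_vs_a)
group_a1 = inparalogs("A1", a_vs_a, best_hits(a_vs_b))
```

In this toy data set A1/B1 and A3/B2 are two-way best hits, and A2 joins the A1/B1 group because it scores higher against A1 than against its best match in proteome B.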
4. Why should I use it?
By definition, orthologs between two species have evolved from one single gene in their common ancestor. Thus, orthologs are likely to have the same function in both species. Another way to detect orthologs would be from phylogenetic trees. This approach is widely used for single gene families, but tree-based methods are slow and difficult to automate. Moreover, preliminary steps are needed, such as clustering genes into homologous families and creating multiple alignments. Also, the topology of the phylogenetic tree is strongly dependent on the choice of tree-building method.
Automatic clustering methods based on two-way best genome-wide matches, on the other hand, have so far not effectively separated in-paralogs from out-paralogs. The problem of in-paralog clustering is more important for analyzing eukaryotic genomes, because eukaryotic genes form large homologous families that cannot be classified by simple best-best hit methods. InParanoid is a fully automatic method for finding orthologs and in-paralogs between TWO species. Ortholog clusters in InParanoid are seeded with a two-way best pairwise match, after which an algorithm for adding in-paralogs is applied. The method bypasses multiple alignments and phylogenetic trees, which can be slow and error-prone steps in classical ortholog detection. Still, it robustly detects complex orthologous relationships and assigns confidence values to in-paralogs.
5. How can I search for orthologs of my favorite protein?
If you are interested in a specific protein, you can search by gene identifier, protein identifier, or by a Blast search against our protein dataset.
6. What are Clusters, Inparalog scores and bootstrapping?
An InParanoid cluster is an ortholog group. It is seeded by a reciprocally best-matching ortholog pair, around which inparalogs are gathered independently, while outparalogs are excluded. Here, seed-ortholog pair refers to the two seed members that are orthologous to each other, around which their inparalogs are clustered. Each of them is referred to as the seed-inparalog when compared against inparalogs in its own genome. Each member of the cluster receives an inparalog score, which reflects the relative distance to the seed-inparalog (1.0 = identical to the seed-inparalog; 0.0 = at the same distance from the seed-inparalog as the distance between the seed-ortholog pair). The confidence that the original seed-ortholog pair are true orthologs is estimated by a bootstrapping procedure that samples how often the pair is found as reciprocally best matches. Bootstrap values were generated by counting how many times the seed-pair genes were each other's best match in a sampling-with-replacement procedure applied to the original Blast alignment. In summary, an InParanoid ortholog cluster contains a seed-ortholog pair with bootstrap confidence values, and a list of inparalogs with inparalog scores.
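As a rough illustration of these two numbers, the Python sketch below computes an inparalog score and a bootstrap value under simplifying assumptions. The linear formula is an assumption chosen only because it reproduces the two endpoints given above (1.0 and 0.0); the bootstrap function is likewise a simplified, one-directional reading of "sampling with replacement" that takes hypothetical per-column alignment scores as input. Neither function is taken from the InParanoid program, and all names are made up for this example.

```python
import random

# Illustrative sketch only; formula and inputs are assumptions,
# not the InParanoid implementation.

def inparalog_score(seed_vs_candidate, seed_vs_self, seed_vs_ortholog):
    """Linear scale between the two endpoints described in the text.

    1.0 when the candidate scores like the seed-inparalog itself,
    0.0 when it scores like the seed's ortholog in the other species.
    """
    span = seed_vs_self - seed_vs_ortholog
    if span <= 0:
        raise ValueError("self score must exceed the seed-ortholog score")
    return (seed_vs_candidate - seed_vs_ortholog) / span


def bootstrap_support(best_columns, runner_up_columns, replicates=1000, seed=0):
    """Fraction of resamplings in which the best match stays the best.

    best_columns / runner_up_columns: hypothetical per-column scores of the
    alignments between one seed and its best and second-best match.
    Columns are resampled with replacement in every replicate.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    wins = 0
    for _ in range(replicates):
        best_total = sum(rng.choices(best_columns, k=len(best_columns)))
        runner_total = sum(rng.choices(runner_up_columns, k=len(runner_up_columns)))
        if best_total > runner_total:
            wins += 1
    return wins / replicates


# Hypothetical scores: seed against itself, a candidate, and its ortholog.
score = inparalog_score(seed_vs_candidate=700, seed_vs_self=1000, seed_vs_ortholog=400)

# A best match that clearly outscores the runner-up in every column.
support = bootstrap_support([5] * 50, [1] * 50)
```

With these toy numbers the candidate lies halfway between the two endpoints, and the seed pair is recovered in every replicate. In a real analysis the support would have to be checked in both directions, since the seed pair must be reciprocally best.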
7. How can I cite InParanoid?
"InParanoid 8: orthology analysis between 273 proteomes, mostly eukaryotic"
Erik L. L. Sonnhammer and Gabriel Östlund
Nucleic Acids Res. 43:D234-D239 (2015)
"InParanoid 7: new algorithms and tools for eukaryotic orthology analysis"
Ostlund G, Schmitt T, Forslund K, Kostler T, Messina DN, Roopra S, Frings O and Sonnhammer ELL
Nucleic Acids Res. 38:D196-D203 (2010)
"InParanoid 6: eukaryotic ortholog clusters with inparalogs"
Berglund AC, Sjolund E, Ostlund G and Sonnhammer ELL
Nucleic Acids Res. 36:D263-D266 (2008)
"InParanoid: A Comprehensive Database of Eukaryotic Orthologs"
O'Brien Kevin P, Remm Maido and Sonnhammer Erik L.L
Nucleic Acids Res. 33:D476-D480 (2005)
"Automatic clustering of orthologs and in-paralogs from pairwise species comparisons"
Maido Remm, Christian E. V. Storm, and Erik L. L. Sonnhammer
J. Mol. Biol. 314:1041-1052 (2001)