Gene Families

Two methods were used to cluster proteins into families. The input gene set excludes genes classified as transposable elements (TEs).

TribeMCL builds families by first running an all-versus-all BLAST search and then applying Markov clustering (MCL) to a matrix of the pairwise similarity scores. The ten largest families are:

Family ID   Num. Family Members   Family Name
1           861                   Receptor-like protein kinase
2           840                   Disease resistance-like protein
3           732                   Unknown protein/Cytochrome P450
4           564                   Leucine Rich Repeat family protein
5           335                   Helicase-like protein
6           331                   Cytochrome P450
7           325                   NBS-LRR disease resistance protein
8           317                   Pentatricopeptide repeat-containing protein
9           283                   Pentatricopeptide repeat-containing protein
10          237                   1-aminocyclopropane-1-carboxylate oxidase
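
The clustering step can be illustrated with a small, self-contained sketch of Markov clustering. This is not the TribeMCL code itself: the protein IDs, the scores, the use of raw scores as edge weights, the self-loop rule, and the parameter values (inflation, iterations, eps) are all assumptions chosen for illustration.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(hits, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster proteins from (query, subject, score) similarity hits."""
    ids = sorted({p for a, b, _ in hits for p in (a, b)})
    index = {p: i for i, p in enumerate(ids)}
    n = len(ids)
    m = [[0.0] * n for _ in range(n)]
    for a, b, score in hits:
        i, j = index[a], index[b]
        # Symmetrize: keep the best score seen in either direction.
        m[i][j] = max(m[i][j], score)
        m[j][i] = max(m[j][i], score)
    for i in range(n):
        # Assumed self-loop rule: weight equal to the protein's best hit.
        m[i][i] = max(m[i])
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Rows that still carry flow belong to attractors; the columns they
    # reach form one family.
    clusters = set()
    for i in range(n):
        members = frozenset(ids[j] for j in range(n) if m[i][j] > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical all-versus-all hits: (query, subject, similarity score).
EXAMPLE_HITS = [
    ("protA", "protB", 200.0),
    ("protB", "protC", 180.0),
    ("protA", "protC", 150.0),
    ("protD", "protE", 120.0),
]

families = mcl(EXAMPLE_HITS)
```

Here the five hypothetical proteins fall into two families, one of three members and one of two. On real data a higher inflation value generally yields smaller, tighter families, so the reported family sizes depend on that setting.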

The JCVI Paralogous Families pipeline clusters proteins into families based on domain composition. Domains are identified first by HMM search and then by BLAST homology. All members of a family share the same domain architecture. The ten largest families are:

Family ID   Num. Family Members   Family Name
1           562                   Unknown protein
2           304                   Receptor-like protein kinase
3           247                   Unknown protein
4           168                   F-box/kelch-repeat protein
5           155                   Helicase-like protein
6           150                   Unknown protein
7           150                   CCP
8           132                   Unknown protein
9           128                   Pentatricopeptide repeat-containing protein
10          121                   Pentatricopeptide repeat-containing protein
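
Grouping by domain architecture can be sketched as follows. This is a minimal illustration, not the JCVI pipeline: it assumes the domain assignments have already been made, treats an architecture as the ordered tuple of domain names, and uses hypothetical protein IDs with illustrative domain names.

```python
from collections import defaultdict


def families_by_architecture(domains_by_protein):
    """Group proteins whose ordered domain lists are identical.

    Returns (architecture, members) pairs, largest family first.
    """
    groups = defaultdict(list)
    for protein, domains in domains_by_protein.items():
        # Assumption: order matters, so the key is the ordered tuple.
        groups[tuple(domains)].append(protein)
    return sorted(
        ((arch, sorted(members)) for arch, members in groups.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Hypothetical domain assignments: protein ID -> domains in sequence order.
EXAMPLE_DOMAINS = {
    "p1": ["Pkinase"],
    "p2": ["LRR_1", "Pkinase"],
    "p3": ["Pkinase"],
    "p4": ["F-box", "Kelch_1"],
    "p5": ["LRR_1", "Pkinase"],
    "p6": ["Pkinase"],
}

architecture_families = families_by_architecture(EXAMPLE_DOMAINS)
```

Because membership requires an identical architecture, this approach tends to produce more and smaller families than similarity-based clustering, which is consistent with the smaller family sizes in the second table.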