Which is the Phage receptor?
This is my question, but first some information about the Smith-Waterman algorithm.
The Smith-Waterman algorithm is a dynamic programming method for determining similarity between nucleotide or protein sequences.
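To make the idea concrete, here is a minimal sketch of the Smith-Waterman scoring recurrence in Python. The scoring values (match +2, mismatch -1, linear gap -1) are assumptions chosen for illustration only. They are not the parameters BugView uses, which are not stated here, so the numbers it produces cannot be compared with the scores reported below.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Illustrative scoring only: a simple match/mismatch score and a
    linear gap penalty (assumed values, not BugView's).
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere: this is what
            # makes the comparison local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Toy nucleotide strings that share the local region "ACGT".
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

The score is the maximum over all cells of the matrix, so a short, well-conserved region can give a high score even when the rest of the two sequences is unrelated.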
On NCBI I have now found this cluster (Cluster VOGp0048, Uncharacterized). I had already come across this protein cluster with BugView, without realizing it, while I was searching for the phage receptor.
My idea here is to compare each gene of this cluster with the M. ulcerans genome.
Using BugView, I compared the genome of M. ulcerans with each Mycobacterium phage genome in the list.
First comparison: D29 phage genome
The first comparison is with the D29 phage genome:
1- This is the result of the local (Smith-Waterman) comparison between the whole D29 sequence and the proteins of Mycobacterium ulcerans Agy99:
By BugView:
4208 comparisons for user's query sequence and Mycobacterium ulcerans Agy99. Finished in 4107 seconds.
Highest score = 1581 for cds_3629 (118619508)
Second score = 1141 for cds_2549 (118618448)
Third score = 1044 for cds_3641 (118619519)
2- I compared, one by one, the M. ulcerans CDSs indicated above with the D29 genome:
1°cds_3629
PE-PGRS protein family (Mycobacterium ulcerans Agy99)
Other Aliases: MUL_4367
Annotation: NC_008611.1 (4845511..4848744, complement)
GeneID: 4550406
PE and PPE protein families
Proteins of the PE family all contain an amino-terminal region of about 110 amino acids.
The carboxyl termini of this family are variable and fall into several classes. The largest class of PE proteins is the highly repetitive PGRS class, which has a high glycine content. The function of these proteins is uncertain, but it has been suggested that they may be related to antigenic variation of Mycobacterium tuberculosis.
These glycine-rich gene families code for PE and PPE proteins.
The PE family has two subfamilies, PE and PE-PGRS.
All 99 members of the PE family have a highly conserved N-terminal domain of 110 amino acid residues, whereas the C-terminal domain shows marked heterogeneity, varying in size, sequence and repeat copy number.
The members of the PE-PGRS subfamily have a polyglycine-rich sequence at the C-terminus, along with the conserved amino terminus.
The C-terminal extension is characterized by the presence of multiple tandem repetitions of Gly–Gly–Ala or Gly–Gly–Asn encoded by the PGRS motif.
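As a rough illustration of what "tandem repetitions of Gly–Gly–Ala or Gly–Gly–Asn" means at the sequence level, the sketch below counts such triplets in a one-letter protein string. The function name `pgrs_repeat_summary` and the toy sequence are hypothetical. The sequence is not a real M. ulcerans or D29 protein, and the pattern is a deliberate simplification, not a validated PGRS detector.

```python
import re


def pgrs_repeat_summary(protein):
    """Summarize GGA / GGN triplets in a one-letter protein sequence."""
    seq = protein.upper()
    # Each run is one or more back-to-back GGA or GGN triplets.
    runs = [m.group(0) for m in re.finditer(r"(?:GG[AN])+", seq)]
    triplets_per_run = [len(run) // 3 for run in runs]
    return {
        "total_triplets": sum(triplets_per_run),
        "longest_tandem_run": max(triplets_per_run, default=0),
        "glycine_fraction": seq.count("G") / len(seq) if seq else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    toy = "MSFVGGAGGNGGAKLTGGN"
    print(pgrs_repeat_summary(toy))
```

A high glycine fraction together with long tandem runs is what one would expect to see in the C-terminal part of a PE-PGRS protein, and such low-complexity regions can also inflate local alignment scores.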
The PPE family consists of 68 members and has a conserved N-terminal domain of 180 amino acid residues with varying carboxy-terminal domains. The polymorphism of these two gene families is the major source of variation in the M. tuberculosis complex, which is otherwise genetically homogeneous. Although the subcellular localization of these proteins is still unclear, a few PE-PGRS proteins have been considered possible virulence factors in M. marinum, and some are cell surface constituents involved in the interaction between mycobacteria and macrophages.
By BugView:
84 comparisons for cds_3629 (118619508)
Finished in 1 seconds.
Highest score = 244 for cds_34 (9630416)
Second score = 126 for cds_30 (9630412)
Third score = 90 for cds_13 (9630395)
cds_34
gp32 [Mycobacterium phage D29]
Other Aliases: D29p30
Annotation: NC_001900.1 (24421..25092)
GeneID: 1261559
Accession: NP_046848.1 | GeneID: 1261559
Why is there a relation between gp32 and collagen protein?
I compared cds_34, indicated above, with the M. ulcerans genome:
By BugView:
4208 comparisons for cds_34 (9630416)
Finished in 37 seconds.
Highest score = 263 for cds_2549 (118618448)
Second score = 258 for cds_3048 (118618940)
Third score = 258 for cds_3654 (118619532)
2°cds_2549
By BugView:
84 comparisons for cds_2549 (118618448)
Finished in 1 seconds.
Highest score = 263 for cds_34 (9630416)
Second score = 133 for cds_12 (9630394)
Third score = 112 for cds_30 (9630412)
3°cds_3641
84 comparisons for cds_3641 (118619519)
Finished in 1 seconds.
Highest score = 223 for cds_34 (9630416)
Second score = 111 for cds_11 (9630393)
Third score = 97 for cds_30 (9630412)
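One way to read the BugView results above is to ask which pairs are each other's top hit (a "reciprocal best hit" check). This check is my addition, not something BugView reports. The sketch below uses only the scores quoted in this post, and the dictionary and function names are hypothetical. Note that a reverse search is available only for cds_34, so no other D29 CDS can be tested.

```python
from collections import Counter

# Top-three hits copied from the BugView output quoted in this post.
# M. ulcerans CDS -> hits in the D29 genome (CDS, score).
ULCERANS_TO_D29 = {
    "cds_3629": [("cds_34", 244), ("cds_30", 126), ("cds_13", 90)],
    "cds_2549": [("cds_34", 263), ("cds_12", 133), ("cds_30", 112)],
    "cds_3641": [("cds_34", 223), ("cds_11", 111), ("cds_30", 97)],
}
# D29 CDS -> hits in the M. ulcerans genome (only cds_34 was searched).
D29_TO_ULCERANS = {
    "cds_34": [("cds_2549", 263), ("cds_3048", 258), ("cds_3654", 258)],
}


def best_hit(hits):
    """Return the name of the highest-scoring hit."""
    return max(hits, key=lambda hit: hit[1])[0]


def reciprocal_best_hits(forward, reverse):
    """Return (query, target) pairs that are each other's top hit."""
    pairs = []
    for query, hits in forward.items():
        target = best_hit(hits)
        back = reverse.get(target)  # None if no reverse search was run
        if back and best_hit(back) == query:
            pairs.append((query, target))
    return pairs


if __name__ == "__main__":
    top_targets = Counter(best_hit(h) for h in ULCERANS_TO_D29.values())
    print(top_targets.most_common(1))  # [('cds_34', 3)]
    print(reciprocal_best_hits(ULCERANS_TO_D29, D29_TO_ULCERANS))
    # [('cds_2549', 'cds_34')]
```

With these numbers, all three M. ulcerans CDSs have D29 cds_34 (gp32) as their top hit, and cds_2549 is the only one that cds_34 returns as its own top hit, with the same score of 263 in both directions.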