Identify and plot CpG islands in nucleotide sequence(s).
CpG site
Description
CpGplot identifies CpG islands in one or more nucleotide sequences. The ratio of the observed to the expected number of CpG dinucleotides is calculated over a window (sequence region) that is moved along the sequence. The calculated ratios are plotted, together with the regions that match this program's definition of a "CpG island" (a region rich in CG dinucleotides). A report file is written giving the input sequence name, the CpG island parameters and data on any CpG islands that are found.
The ratio of the observed to the expected number of CpG dinucleotides is calculated over a window of user-specified size (-window parameter). The window is slid along the sequence and the ratio is recalculated until the end of the sequence is reached.
By default, CpGplot defines a CpG island as a region where, over an average of 10 windows and not less than 200 bases, the calculated (%G + %C) content is over 50% and the calculated Observed/Expected ratio is over 0.6. These conditions can be modified by setting the values of the appropriate parameters.
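The default criteria can be illustrated with a short Python sketch. This is a simplified reading of the definition above, not CpGplot's actual implementation: it assumes the per-window Observed/Expected ratios and (%G + %C) values have already been computed, one per sequence position, and the names moving_average and find_islands are hypothetical.

```python
def moving_average(values, span):
    """Average each value with up to span-1 of the values that follow it."""
    averaged = []
    for i in range(len(values)):
        chunk = values[i:i + span]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def find_islands(obs_exp, pct_gc, min_ratio=0.6, min_pct=50.0,
                 min_len=200, avg_span=10):
    """Return (start, end) index pairs of runs that meet the default criteria.

    Assumption: obs_exp and pct_gc hold one value per position, and a run
    qualifies when both averaged values exceed their thresholds for at
    least min_len consecutive positions.
    """
    ratio_avg = moving_average(obs_exp, avg_span)
    gc_avg = moving_average(pct_gc, avg_span)
    islands = []
    start = None
    for i, (ratio, gc) in enumerate(zip(ratio_avg, gc_avg)):
        passing = ratio > min_ratio and gc > min_pct
        if passing and start is None:
            start = i                      # a candidate run begins here
        elif not passing and start is not None:
            if i - start >= min_len:       # keep only runs that are long enough
                islands.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the data.
    if start is not None and len(ratio_avg) - start >= min_len:
        islands.append((start, len(ratio_avg) - 1))
    return islands
```

Changing min_ratio, min_pct, min_len or avg_span in this sketch plays the same role as adjusting the corresponding CpGplot parameters.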
The Observed number of CpG patterns in a window is simply the number of times a 'C' is found followed immediately by a 'G'.
The Expected number of CpG patterns is calculated for each window as the number of CpG dinucleotides you would expect to see in that window, given the number of C's and G's it contains. Thus, the Expected number of CpGs in a window is the number of C's in the window multiplied by the number of G's in the window, divided by the window length.
Expected = (number of C's * number of G's) / window length
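As a minimal sketch of the calculation described above, the following Python computes the Observed/Expected ratio for one window and for windows slid along a sequence. The function names, the default window size and the step argument are assumptions made for illustration; they are not taken from CpGplot itself.

```python
def obs_exp_ratio(window):
    """Observed/Expected CpG ratio for a single window of sequence."""
    window = window.upper()
    if not window:
        return 0.0
    observed = window.count("CG")  # a 'C' followed immediately by a 'G'
    expected = window.count("C") * window.count("G") / len(window)
    # A window with no C or no G has an expected count of zero; report 0.0.
    return observed / expected if expected > 0 else 0.0


def sliding_ratios(seq, window=100, step=1):
    """Ratios for each full-length window, moving step bases at a time."""
    return [obs_exp_ratio(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]
```

For example, the window "CGCGCGCG" contains 4 CpGs, 4 C's and 4 G's, so Expected = (4 * 4) / 8 = 2 and the ratio is 4 / 2 = 2.0.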
D29
CPGPLOT islands of unusual CG composition
D29\sequenza_fasta\NC_001900_d29.txt from 1 to 49176
Observed/Expected ratio > 0.60
Percent C + Percent G > 50.00
Length > 200
Length 1760 (256..2015)
Length 2484 (2030..4513)
Length 2360 (4566..6925)
Length 8631 (7010..15640)
Length 9625 (15697..25321)
Length 1232 (25393..26624)
Length 758 (26635..27392)
Length 3235 (27449..30683)
Length 4885 (30690..35574)
Length 1372 (35594..36965)
Length 1875 (37033..38907)
Length 522 (38915..39436)
Length 4649 (39438..44086)
Length 2705 (44096..46800)
Length 2314 (46807..49120)
L5
Bxz2
TM4