The Pairwise Conservation Scores - An Algorithm to Identify Conserved K-mers

The program was designed by Jin Gu. This page was last updated on Oct. 5, 2008.

Application 1. Identification of conserved 7-mers in 3'-UTRs and microRNAs in Drosophila (BMC Bioinformatics 2007, 8:432)

Online Supplementary materials

1) The predicted pre-miRNAs and corresponding miRNAs using triplet-svm (Download)

2) The predicted pre-miRNAs and corresponding miRNAs using RNAmicro (Download)

Application 2. Identification of de novo motifs in conserved regulatory regions (in progress)


News: PCS-1.5 has been released. A few bugs in the shuffling script have been fixed. (Oct. 5, 2008)


Version 1.0 (Release Nov. 11, 2006)

The Perl script to compute PCSs. (Download)
Usage: perl pcs-1.0.pl -b {base_seq} -a {align_seq} [-k k] [-s 0|1|2]
Sample input files:
  3'-UTRs sequences of D. melanogaster
  Aligned 3'-UTR sequences of D. melanogaster and D. pseudoobscura
  Note: the {base_seq} file must be in single-line FASTA format (each sequence on one line after its header).
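
If a FASTA file wraps its sequences over several lines, it can be flattened before being passed as {base_seq}. The following is a minimal Python sketch of that conversion, not part of the PCS distribution; the function name and the toy records are made up for illustration.

```python
def to_single_line_fasta(lines):
    """Join wrapped sequence lines so that each record becomes
    one header line followed by exactly one sequence line."""
    out = []
    seq_parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq_parts:  # flush the previous record's sequence
                out.append("".join(seq_parts))
                seq_parts = []
            out.append(line)
        else:
            seq_parts.append(line)
    if seq_parts:  # flush the last record
        out.append("".join(seq_parts))
    return out


# Toy input (hypothetical record names), wrapped over several lines.
wrapped = [">CG0001", "ACGTAC", "GTTA", ">CG0002", "GGCC", "AATT"]
single = to_single_line_fasta(wrapped)
print("\n".join(single))
```

To convert a real file, read its lines with open(path) and write the returned list back out, one entry per line.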

 

Version 1.1 (Release Dec. 17, 2006)

The new PCS program can compute the PCSs of gapped and degenerate k-mers. (Download)

Usage: perl pcs-1.1.pl -b <base_seq.fa> -a <align_seq.net.axt> -o <outputfile> [-k k] [-s 0|1|2] [-d 0|1|2|3] [-m mode]
  -s default:0
      0: only scan the positive strand
      1: only scan the negative strand
      2: scan both strands
  -d which degenerate codes will be used, default 0
      0: A,C,G,T
      1: R(AG),Y(CT),M(AC),K(GT),S(GC),W(AT)
      2: B(TGC),D(ATG),H(ATC),V(AGC)
      3: N(ACGT)
  -m the mode that defines the motif search space, default NNNNNNN
      "N" means a nucleotide, defined by the degenerate codes
      "-" means a gap

New sample files:

  The promoter sequences of D. melanogaster (-1000~+500 at TSSs)

  Aligned promoter sequences of D. melanogaster and D. pseudoobscura (-1000~+500 at TSSs)
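
The -d levels and the -m mode of version 1.1 can be illustrated with a short Python sketch. It is not taken from pcs-1.1.pl. It assumes that the -d levels are cumulative (level 1 adds the two-base codes to A, C, G, T, and so on), which the option description does not state, and the helper names are hypothetical.

```python
# IUPAC codes grouped by the -d level that introduces them.
IUPAC_BY_LEVEL = {
    0: {"A": "A", "C": "C", "G": "G", "T": "T"},
    1: {"R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "GC", "W": "AT"},
    2: {"B": "TGC", "D": "ATG", "H": "ATC", "V": "AGC"},
    3: {"N": "ACGT"},
}


def alphabet(d):
    """Codes available at level d (ASSUMPTION: levels are cumulative)."""
    codes = {}
    for level in range(d + 1):
        codes.update(IUPAC_BY_LEVEL[level])
    return codes


def matches(motif, window, codes):
    """True if every non-gap motif position is compatible with the window.
    A "-" in the motif is a gap and matches any base."""
    if len(motif) != len(window):
        return False
    for m, base in zip(motif, window.upper()):
        if m == "-":
            continue
        if base not in codes.get(m, ""):
            return False
    return True


codes = alphabet(1)
print(len(codes))                              # number of usable codes
print(matches("ACG-TRA", "ACGCTGA", codes))    # R covers G
print(matches("ACG-TRA", "ACGCTCA", codes))    # R does not cover C
```

With -d 0 the motif "ACG-TRA" could not be used at all, because R is only introduced at level 1.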

 

Version 1.3 (Release May 17, 2007)

Gapped-PCS program, which computes PCSs for gapped k-mers in one or more k-mer modes listed in a user-specified mode file. (Download)

Usage: perl pcs.pl -b <base_seq.fa> -a <align_seq.net.axt> [-k k] [-s 0|1|2] -m <modefile>
  -k default:6
      The length of k-mer used in computation.
  -s default:0
      0: only scan the positive strand
      1: only scan the negative strand
      2: scan both strands
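
The effect of -k and -s can be pictured with a small Python sketch. It only enumerates the k-mers that a scan would visit and does not compute any score; it is an illustration under the assumption that the negative strand means the reverse complement of the input sequence, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k=6, strand=0):
    """Yield every k-mer of seq for strand 0 (positive),
    1 (negative, i.e. reverse complement) or 2 (both)."""
    strands = []
    if strand in (0, 2):
        strands.append(seq)
    if strand in (1, 2):
        strands.append(reverse_complement(seq))
    for s in strands:
        for i in range(len(s) - k + 1):
            yield s[i:i + k]


print(list(kmers("AAACCGT", k=6, strand=0)))
print(list(kmers("AAACCGT", k=6, strand=1)))
```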

The format of mode files:

  Each line in the mode file describes one gapped mode to be computed by the program.

  A mode is either the single parameter "-1", which means non-gapped k-mers,

    or two parameters: the gap position (0-based) and the number of gaps.

    "2,3" inserts three gaps (the second parameter) after the third nucleotide (0-based position 2, the first parameter): NNN---NNN

    "2,1": NNN-NNN

  Example mode file: (Download)
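
As an illustration of the mode syntax above, the following Python sketch turns one mode line into its N/- pattern. It is not the parser used by pcs.pl. It assumes that the two parameters are separated by a comma, as in the examples, and that k is the number of nucleotides excluding gaps; the function name is hypothetical.

```python
def mode_to_pattern(mode_line, k=6):
    """Translate a mode line into a pattern of N (nucleotide) and - (gap).

    "-1"      -> non-gapped k-mer
    "pos,gap" -> `gap` gaps inserted after 0-based position `pos`
    """
    fields = [f.strip() for f in mode_line.split(",")]
    if fields == ["-1"]:
        return "N" * k
    if len(fields) != 2:
        raise ValueError("unrecognized mode line: %r" % mode_line)
    pos, gaps = int(fields[0]), int(fields[1])
    # The gap must fall between two nucleotides and contain at least one position.
    if not 0 <= pos < k - 1 or gaps < 1:
        raise ValueError("mode out of range for k=%d: %r" % (k, mode_line))
    return "N" * (pos + 1) + "-" * gaps + "N" * (k - pos - 1)


for line in ["-1", "2,3", "2,1"]:
    print(line, mode_to_pattern(line, k=6))
```

With the default k of 6 this reproduces the two patterns given above, NNN---NNN for "2,3" and NNN-NNN for "2,1".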

 

Version 1.4 (Release Jun. 26, 2007)

Revised Gapped-PCS program, which computes PCSs for gapped k-mers in one or more k-mer modes listed in a user-specified mode file. (Download)

Usage: perl pcs.pl -i <input_align_seq.net.axt> [-k k] [-s 0|1|2] -m <modefile>
  -k default:6
      The length of k-mer used in computation.
  -s default:0
      0: only scan the positive strand
      1: only scan the negative strand
      2: scan both strands

The format of mode files:

  Each line in the mode file describes one gapped mode to be computed by the program.

  A mode is either the single parameter "-1", which means non-gapped k-mers,

    or two parameters: the gap position (0-based) and the number of gaps.

    "2,3" inserts three gaps (the second parameter) after the third nucleotide (0-based position 2, the first parameter): NNN---NNN

    "2,1": NNN-NNN

  Example mode file: (Download)

  Example input pairwise alignment: (Download)

A tool (Additional Tool 2) that evaluates the significance of conservation by shuffling the input sequences is also provided.

 

Version 1.5 (Release Oct. 5, 2008)

Revised Gapped-PCS program, which computes PCSs for gapped k-mers in one or more k-mer modes listed in a user-specified mode file. (Download)

Usage: perl pcs-1.5.pl -i <input_align_seq.net.axt> [-k k] [-s 0|1|2] [-c 0|1] -m <modefile>
  -k default:6
      The length of k-mer used in computation.
  -s default:0
      0: only scan the positive strand
      1: only scan the negative strand
      2: scan both strands

  -c default:0

      0: case-insensitive (the lower-case letters will be included)

      1: case-sensitive (the lower-case letters in the sequence will be masked)
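
The two -c settings can be pictured with the Python sketch below. How pcs-1.5.pl masks lower-case letters internally is not documented on this page; replacing them with N is an assumption made only for this illustration, and the function name is hypothetical.

```python
def prepare_sequence(seq, case_sensitive=0):
    """Mimic the documented effect of -c on one sequence.

    0: lower-case letters are treated like upper-case ones (included).
    1: lower-case letters are masked (ASSUMPTION: replaced by 'N').
    Alignment gap characters ('-') are left untouched in both cases.
    """
    if case_sensitive == 0:
        return seq.upper()
    return "".join("N" if ch.islower() else ch for ch in seq)


print(prepare_sequence("ACgtTA-ca", 0))
print(prepare_sequence("ACgtTA-ca", 1))
```

Lower-case letters in genome alignments often mark soft-masked repeats, so -c 1 is the setting to try when such regions should not contribute k-mers.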

The format of mode files:

  Each line in the mode file describes one gapped mode to be computed by the program.

  A mode is either the single parameter "-1", which means non-gapped k-mers,

    or two parameters: the gap position (0-based) and the number of gaps.

    "2,3" inserts three gaps (the second parameter) after the third nucleotide (0-based position 2, the first parameter): NNN---NNN

    "2,1": NNN-NNN

  Example mode file: (Download)

  Example input pairwise alignment: (Download)

A tool (Additional Tool 2, version 1.5) that evaluates the significance of conservation by shuffling the input sequences is also provided.


Additional Tool 1 (Release May 21, 2007)

A program that clusters the identified conserved motifs by sequence similarity. (Download)

Usage: perl pcs_gap_clust.pl <pcs_result_file>

Additional Tool 2 (Release Jun. 26, 2007, Updated Oct. 5, 2008)

A program that evaluates the significance of conservation by shuffling the input sequences. (Download)

Latest version 1.5 (pcs-1.5.pl is required) (Download)

    perl pcs_shuffle_evaluate-1.5.pl <input_align_file> <mode_file> <strand, 0|1|2> <length of kmer> <number of shuffles> <case> <outfile>

    perl pcs_shuffle_evaluate-1.5.pl dm2droVir1_3utr_070113.fa mode_test 0 7 10 0 pcs.out
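
This page does not describe how the shuffled scores are summarized. One common way to judge a k-mer's score against scores obtained from shuffled input is a z-score together with an empirical p-value, sketched below in Python. This is a generic illustration, not the method implemented in pcs_shuffle_evaluate-1.5.pl; the function name and all numbers are made up.

```python
import math
import statistics


def shuffle_significance(observed, shuffled_scores):
    """Compare an observed score with scores from shuffled input.

    Returns (z, p): z is the z-score against the shuffled distribution,
    p is the add-one empirical p-value for scores >= observed.
    """
    if len(shuffled_scores) < 2:
        raise ValueError("need at least two shuffled scores")
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    z = (observed - mean) / sd if sd > 0 else math.nan
    n_ge = sum(1 for s in shuffled_scores if s >= observed)
    p = (n_ge + 1) / (len(shuffled_scores) + 1)
    return z, p


# Made-up scores for one k-mer: one observed value and 10 shuffles.
shuffled = [3.0, 4.0, 5.0, 4.0, 3.0, 5.0, 4.0, 4.0, 3.0, 5.0]
z, p = shuffle_significance(12.0, shuffled)
print(round(z, 2), round(p, 4))
```

With only 10 shuffles, as in the example command above, the empirical p-value cannot drop below 1/11, so more shuffles are needed to resolve small p-values.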

A third-party Perl script is needed for the computation. (Download)

Washietl S: Alifoldz algorithm [http://www.tbi.univie.ac.at/papers/SUPPLEMENTS/Alifoldz/]

***Warning: please cite Washietl's work if you use this tool***

Usage: place the two Perl scripts in the same directory as pcs-1.4.pl (earlier versions are not supported).

    perl pcs_shuffle_evaluate.pl <input_align_file> <mode_file> <strand, 0|1|2> <length of kmer> <number of shuffles> <outfile>

    All options must be given.

    perl pcs_shuffle_evaluate.pl dm2droVir1_3utr_070113.fa mode_test 0 7 10 pcs.out