How many bp would be transcribed if we adopted an aggressive strategy?

	   	       			  	  	      Jan 25, 2006


To estimate the transcribed regions aggressively, we prepared
additional results that include multiple mappings (two or more equally
best mappings) and the longest transcribed regions.

Notice: These mapping results differ from the Science dataset.


* GIS & GSC

 gis_longest.txt; GIS
 gsc_longest.txt; GSC

When one end mapped to only one site but the other end mapped to
multiple sites, the conservative approach would be to choose the
shortest one. Here, however, we used the "longest" (but less than 2.5M)
strategy.

     - tag_sequence
     - chr
     - start
     - stop


* mRNA (full-length sequenced) & EST (end sequenced)

 mRNA_all_best.txt; FANTOM3 & GenBank mRNAs
 EST_all_best.txt;  RIKEN & GenBank ESTs

There are two extra columns in addition to the psl-style format.

     - best; number of equally best-scoring mappings for this sequence;
             always 1 or 2 (sequences with more than 2 equally best
             mappings are not included in the file)
     - id; a unique numeric id for each mapping
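A minimal sketch of reading one line of these files. It assumes the
lines are tab-separated and that "best" and "id" are the last two fields
of each line; neither is stated above, so check against the actual files
before relying on it. The function name is hypothetical.

```python
def parse_mapping_line(line):
    """Split one line into the psl-style fields and the two extra columns.

    Assumption: tab-separated, with 'best' and 'id' as the final two fields.
    """
    fields = line.rstrip("\n").split("\t")
    *psl_fields, best, mapping_id = fields
    best = int(best)
    if best not in (1, 2):
        # The description above says 'best' is always 1 or 2.
        raise ValueError(f"unexpected 'best' value: {best}")
    return {"psl": psl_fields, "best": best, "id": int(mapping_id)}


# Example with placeholder psl fields followed by best=2 and id=1234.
example = "\t".join(["x"] * 21 + ["2", "1234"])
record = parse_mapping_line(example)
```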


* CAGE

 cage_multiple_best.txt; CAGE

This dataset contains only multiply mapped tags.

    - rep_tag_id
    - strand
    - chr
    - genome_start_pos
    - genome_stop_pos
    - tag_start_pos
    - tag_stop_pos
    - identity_value
    - hits_no
    - total_hits
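The column list above can be used as field names when reading the file.
The sketch below assumes a tab-separated file with no header line and
columns in the order listed; only the genome coordinates are converted
to integers, and the other fields are left as text because their exact
formats are not described here.

```python
import csv
import io

CAGE_COLUMNS = [
    "rep_tag_id", "strand", "chr",
    "genome_start_pos", "genome_stop_pos",
    "tag_start_pos", "tag_stop_pos",
    "identity_value", "hits_no", "total_hits",
]


def read_cage(handle):
    """Yield one dict per line, keyed by the column names listed above."""
    reader = csv.DictReader(handle, fieldnames=CAGE_COLUMNS, delimiter="\t")
    for row in reader:
        row["genome_start_pos"] = int(row["genome_start_pos"])
        row["genome_stop_pos"] = int(row["genome_stop_pos"])
        yield row


# Example with one made-up line (values are illustrative, not real data).
sample = io.StringIO("T1\t+\tchr1\t100\t120\t1\t21\t100.0\t1\t2\n")
rows = list(read_cage(sample))
```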


* Forrest, in the aggressive case

 forrest.txt

Using these datasets, 1,772,938,427 bp are transcribed; excluding gap
regions, 1,717,998,114 bp are transcribed, about 69.0% of mm5.

    - chr
    - start
    - stop


** Contributors
  Akira Hasegawa, Par Engstrom and Shintaro Katayama