Background
CAGE captures the 5′ ends of capped RNAs to profile transcription start sites (TSSs) at single‑base resolution[1][2][3][4][5]. This implementation retains the cap‑trapper chemistry and single‑stranded cDNA library construction, and adds UDI adapters and a double‑strand conversion/size‑selection step to support patterned flow cells and paired‑end sequencing[6].
Workflow Overview

Figure 1: Workflow overview of direct cDNA CAGE.[6]
Materials
- Total RNA (3–5 µg per library; A260/230 > 1.8; A260/280 > 1.8; RIN > 7)
- Reverse transcriptase (SuperScript III or equivalent)
- Random N6+TCT primer
- Agencourt RNA Clean XP and AMPure XP magnetic beads
- Sodium periodate (NaIO₄), biotin (long‑arm) hydrazide
- RNase I, RNase H
- Streptavidin magnetic beads
- UDI adapters (5′ adapter with i5, 3′ adapter with i7) compatible with Illumina paired‑end chemistry
- LongAmp Taq (or equivalent) for second‑strand conversion
- Exonuclease I
- SPRIselect beads and standard buffers
- QA/QC reagents (KAPA qPCR, fragment analyzer)
Notes: Select UDI sets validated for patterned flow cells. Maintain RNase/DNase‑free conditions throughout.
Procedure
Bench‑top
A. RNA Preparation & QC (∼1 day)
- Extract total RNA using your preferred method (e.g., Maxwell, column or phenol/chloroform).
- Quantify and assess integrity (Nanodrop/Qubit; BioAnalyzer/TapeStation).
- Proceed only if A260/230 > 1.8, A260/280 > 1.8, and RIN > 7.
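The acceptance criteria above can be written as a small gate, which is convenient when many samples are screened at once. This is a minimal sketch: the `RnaQc` record and the function names are hypothetical and not part of the protocol; only the thresholds (both ratios > 1.8, RIN > 7) come from the text.

```python
from dataclasses import dataclass


@dataclass
class RnaQc:
    """QC readings for one RNA sample (hypothetical record layout)."""
    sample: str
    a260_230: float
    a260_280: float
    rin: float


def failed_criteria(qc, min_ratio=1.8, min_rin=7.0):
    """Return the names of the criteria that a sample does not meet.

    All comparisons are strict, matching the protocol's "> 1.8" and "> 7".
    """
    failed = []
    if not qc.a260_230 > min_ratio:
        failed.append("A260/230")
    if not qc.a260_280 > min_ratio:
        failed.append("A260/280")
    if not qc.rin > min_rin:
        failed.append("RIN")
    return failed


def passes_qc(qc, min_ratio=1.8, min_rin=7.0):
    """True only when every criterion is met."""
    return not failed_criteria(qc, min_ratio, min_rin)
```

A sample sitting exactly on a threshold (for example A260/230 = 1.8) is rejected, because the protocol states the limits as strict inequalities.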
B. First-Strand cDNA Synthesis
- Prime 3–5 µg RNA with Random N6+TCT primer; synthesize cDNA (SuperScript III).
- Clean up to obtain RNA/ss-cDNA hybrids.
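The 3–5 µg input translates into a pipetting volume once the RNA concentration is known. The helper below is an illustrative sketch; the function name and the choice of ng/µL as the concentration unit are assumptions, not part of the protocol.

```python
def rna_volume_ul(target_ug, conc_ng_per_ul):
    """Volume (µL) of RNA stock that contains target_ug micrograms.

    conc_ng_per_ul is the measured stock concentration in ng/µL
    (assumed unit, as typically reported by Nanodrop/Qubit).
    """
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    # 1 µg = 1000 ng
    return target_ug * 1000.0 / conc_ng_per_ul
```

For example, `rna_volume_ul(5, 1000)` gives the volume needed for 5 µg from a 1000 ng/µL stock.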
C. Cap Trapping
- Oxidize RNA cap with NaIO₄; biotinylate with long-arm biotin hydrazide.
- Digest single-stranded RNA with RNase I.
- Capture biotinylated RNA/cDNA hybrids on streptavidin beads; wash thoroughly.
- Treat with RNase H and RNase I; recover ss-cDNA.
D. UDI Adapter Ligations
- Ligate 5′ adapter harboring a unique i5 index to ss-cDNA; wash to remove unbound adapters.
- Ligate 3′ adapter harboring a unique i7 index; wash to remove unbound adapters.
E. ds-Conversion & Cleanup
- Perform second-strand synthesis (polymerase-mediated) to overcome adapter-dimer competition on patterned flow cells.
- Digest residual primers/ss-DNA with Exonuclease I.
- Wash and perform SPRIselect size selection to remove adapter dimers and enrich library fragments.
F. Library QC
- Quantify libraries (e.g., KAPA qPCR) and assess size distribution (fragment analyzer).
- Pool 9–12 libraries per run as a starting point; adjust by platform/read length and target depth.
Sequencing (patterned flow cells)
- Platform example: NextSeq 2000 (P2 flow cell).
- Mandatory read configuration: paired‑end, e.g., 2×50 bp or 2×100 bp.
- Use unique dual indexes (UDIs; both i5 and i7 unique per sample) to minimize index hopping and ensure strict demultiplexing.
- Save demultiplexed FASTQs (R1/R2; I1 = i7, I2 = i5).
Typical yield: 350–500 M reads/run for a pool of 9–12 libraries (∼30–45 M reads/library), depending on kit and flow cell.
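The relationship between run yield, pool size, and per-library depth is simple division, sketched below. The sketch assumes reads are distributed evenly across the pool, which real runs only approximate; the function names are hypothetical.

```python
def reads_per_library(run_yield_m, n_libraries):
    """Expected reads per library (millions), assuming an even pool."""
    if n_libraries < 1:
        raise ValueError("need at least one library")
    return run_yield_m / n_libraries


def max_libraries(run_yield_m, target_depth_m):
    """Largest pool size that still reaches target_depth_m per library,
    again assuming reads are split evenly."""
    if target_depth_m <= 0:
        raise ValueError("target depth must be positive")
    return int(run_yield_m // target_depth_m)
```

Under this assumption, a run at the low end of the typical yield (350 M reads) supports 11 libraries at 30 M reads each, so pool size should be planned against the expected yield rather than the best case.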
Data Processing[7]
Use the dscage‑pe2 pipeline[7] (Docker/Singularity available) for paired‑end direct cDNA CAGE. The pipeline performs quality control, mapping (hg38 and mm10 by default; other genomes can be added*), and CTSS calling suitable for TSS/enhancer analyses.
- Inputs: demultiplexed paired‑end FASTQs (R1/R2)
- Outputs: mapped reads, CTSSs, QC reports, and files prepared for downstream CAGE analysis/visualization.
*Extend to other genomes by adding corresponding annotations to the pipeline configuration.
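Because the pipeline expects paired‑end input, it can help to confirm that every sample has both an R1 and an R2 file before starting. The sketch below only inspects file names and does not call the pipeline. The naming pattern (`<sample>_R1...fastq.gz` / `<sample>_R2...fastq.gz`) is an assumption about how the demultiplexed files are named, and `pair_fastqs` is a hypothetical helper; adjust the pattern to your own file names.

```python
import re
from pathlib import Path

# Assumed naming: <sample>_R1[_NNN].fastq[.gz] and <sample>_R2[_NNN].fastq[.gz]
_READ_RE = re.compile(
    r"^(?P<sample>.+?)_(?P<read>R[12])(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$"
)


def pair_fastqs(filenames):
    """Group FASTQ paths into {sample: (r1_path, r2_path)}.

    Files that do not look like R1/R2 reads (e.g., I1/I2 index reads)
    are ignored. Raises ValueError if any sample lacks a mate.
    """
    pairs = {}
    for name in filenames:
        match = _READ_RE.match(Path(name).name)
        if match is None:
            continue  # not an R1/R2 file under the assumed naming
        pairs.setdefault(match["sample"], {})[match["read"]] = name

    incomplete = sorted(s for s, reads in pairs.items() if set(reads) != {"R1", "R2"})
    if incomplete:
        raise ValueError("missing R1 or R2 for: " + ", ".join(incomplete))

    return {s: (reads["R1"], reads["R2"]) for s, reads in sorted(pairs.items())}
```

Failing early on an unpaired sample is a deliberate choice here, since the read mode is paired‑end only.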
Notes & Troubleshooting
- Index hopping: Always use unique i5+i7 combinations (UDI). Non‑unique dual indexes are not recommended on patterned flow cells.
- Low input: If < 3 µg RNA is unavoidable, consider LQ‑ssCAGE[8], adapted to retain the dual indexes and ds‑conversion step from this protocol, before sequencing on patterned flow cells.
- Pooling: For NextSeq 2000 P2, avoid pooling >12 libraries unless using higher‑output flow cells (e.g., P3) to maintain depth.
Compatibility
- Sequencers: NextSeq 1000/2000, NovaSeq series (patterned flow cells).
- Indexing: UDI plates/sets (96×96 commonly used) with matched i7/i5 pairs.
- Read mode: Paired‑end only (pipeline assumes PE).
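The UDI requirement (each i7 and each i5 used by exactly one sample in the pool) can be checked on a sample list before loading the run. This is a minimal sketch: the input layout (`{sample: (i7, i5)}`) and the function name `check_udi` are assumptions, and the index sequences in the usage note are placeholders rather than real adapter sequences.

```python
from collections import Counter


def check_udi(samples):
    """Return a list of problems for a pool given as {sample: (i7, i5)}.

    An empty list means every i7 and every i5 sequence is used by
    exactly one sample, i.e. the pool is uniquely dual-indexed.
    """
    i7_counts = Counter(i7.upper() for i7, _ in samples.values())
    i5_counts = Counter(i5.upper() for _, i5 in samples.values())

    problems = []
    for name, (i7, i5) in samples.items():
        if i7_counts[i7.upper()] > 1:
            problems.append(f"{name}: i7 {i7} is shared with another sample")
        if i5_counts[i5.upper()] > 1:
            problems.append(f"{name}: i5 {i5} is shared with another sample")
    return problems
```

For instance, `check_udi({"a": ("AAAA", "CCCC"), "b": ("AAAA", "TTTT")})` reports both samples, because they share an i7 even though the i7+i5 combinations differ. Such combinatorial layouts are the non‑unique dual indexes that this protocol advises against on patterned flow cells.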
Safety
Follow institutional guidelines for handling hazardous chemicals (e.g., sodium periodate) and RNase‑free techniques.
References
- [1] Kodzius, R. et al. CAGE: cap analysis of gene expression. Nat Methods 3, 211-222 (2006), doi: 10.1038/nmeth0306-211
- [2] Takahashi, H. et al. 5′ end–centered expression profiling using cap-analysis gene expression and next-generation sequencing. Nat Protoc. 7, 542-561 (2012), doi: 10.1038/nprot.2012.005
- [3] Kanamori‑Katayama, M. et al. Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res. 21(7), 1150-1159 (2011), doi: 10.1101/gr.115469.110
- [4] FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507(7493), 462-470 (2014), doi: 10.1038/nature13182
- [5] Murata, M. et al. Detecting expressed genes using CAGE. Methods Mol Biol. 1164, 67-85 (2014), doi: 10.1007/978-1-4939-0805-9_7
- [6] Delobel, D. et al. Protocol for direct cDNA cap analysis of gene expression for paired-end patterned flow cell sequencing. STAR Protocols. (2025), doi: 10.1016/j.xpro.2024.103594
- [7] GitHub: dscage-pe2
- [8] Takahashi, H. et al. Low Quantity Single Strand CAGE (LQ-ssCAGE) Maps Regulatory Enhancers and Promoters. Methods Mol Biol. 2351, 67-90 (2021), doi: 10.1007/978-1-0716-1597-3_4