PAREsnip User Guide Page 1
PAREsnip User Guide
A tool for rapid genome-wide discovery of small RNA/target interactions evidenced through degradome sequencing
Version 2.3
Leighton Folkes, Simon Moxon, Hugh Woolfenden, Mathew
Stocks, Gyorgy Szittya, Tamas Dalmay, Vincent Moulton.
06/03/2011
Contents
Introduction
System Requirements
Launching PAREsnip (GUI)
Loading Data
PAREsnip Parameters
Description of Parameters
Start an Analysis
Viewing Results
T-Plots (VisSR)
Saving Results
Command Line Operation
Appendix 1
Glossary
Introduction
Next generation sequencing has become a de facto standard for the analysis of small RNA (sRNA)
samples in recent years. Typically, a single experiment will produce millions of sRNA reads (the
sRNAome), thereby capturing a snapshot of an organism's sRNA expression profile. An essential first
step in understanding their function is to confidently identify sRNA targets. In plants, several classes of
sRNAs have been shown to bind with near-perfect complementarity to their messenger RNA (mRNA)
targets, generally leading to cleavage of the mRNA. Recently, a high-throughput technique known
as Parallel Analysis of RNA Ends (PARE) has made it possible to sequence mRNA cleavage products
on a large scale. A library of cleavage products obtained using the PARE technique (the
degradome) can be used to provide evidence of the interactions between sRNAs and their
complementary mRNA targets. PAREsnip takes the sRNAome, degradome and an mRNA dataset (or
transcriptome) as input, and outputs the potential sRNA/target duplexes evidenced through the
degradome data.
System Requirements
Required:
Java SE 6. (available from http://java.com/en/download/manual.jsp)
Recommended:
Intel i7 CPU. 16 GB RAM.
Launching PAREsnip (GUI)
PAREsnip is a tool within the UEA sRNA Workbench. In order to run
the sRNA Workbench in GUI mode, simply extract all the files from
your downloaded zip archive to a new directory, then launch the
sRNAWorkbenchStartup.jar. This will scan your system for available
memory and allocate a portion of it to the Workbench. You will be
presented with the main UEA sRNA Workbench window. Go to
Tools->PAREsnip to launch the tool.
Note: If you would prefer to use the command line, please see the
Command Line Operation section.
Loading Data
The inputs for PAREsnip are:
mRNA dataset (transcriptome),
transcript degradation fragments obtained from a PARE experiment (degradome),
small RNA dataset (sRNAome), and
genome.
The first three inputs are required but the genome is optional. When included, sRNAs not mapping
exactly to the genome sequence are discarded. All of the inputs must be in FASTA format and must
only contain the characters ‘A’, ‘C’, ‘G’, ‘T’ and ‘U’. Sequences containing unknown characters
and ambiguity codes are discarded as they cannot be accurately aligned.
To load the input data, go to File->Open.
Clicking Open shows the Choose Data dialog, where you
may browse your computer and select your input files. It is
important to tell PAREsnip which files contain the transcripts, the
degradome, the small RNAs and the genome. This is done by
making sure the correct radio button on the left-hand side is
selected.
Example FASTA format:
>"unique identifier"
AGCTAGCTAGCTAGCTACTACCC
>"unique identifier"
AAGCTTAGCTTTACGTTAGTTAC
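The input rules described above can be checked before the files are loaded. The following is a minimal Python sketch, not PAREsnip's own code: the function names `read_fasta` and `keep_valid` are hypothetical, and it assumes the check is case-sensitive (upper-case A, C, G, T and U only), which the guide does not state.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Characters the guide says an input sequence may contain.
ALLOWED = set("ACGTU")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_valid(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep only records whose sequence uses the allowed characters."""
    return {name: seq for name, seq in records if seq and set(seq) <= ALLOWED}


# Hypothetical input: seq2 contains the ambiguity code N and is dropped.
example = """>seq1
AGCTAGCTAGCTAGCTACTACCC
>seq2
AAGCTTAGCNTTACGTTAGTTAC
""".splitlines()

valid = keep_valid(read_fasta(example))
```

Running a check like this first shows how many sequences would be discarded for containing unknown characters or ambiguity codes.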
PAREsnip Parameters
The parameter settings are user configurable within the Parameters Browser. To adjust the
parameters, ensure PAREsnip is the topmost window within the Workbench. Go to Tools->Show
Parameter Browser.
The Parameters Browser shows the possible settings for PAREsnip. Each parameter is labeled in the
illustration below (1a - 7b). There are two default parameter settings which can be used according
to the type of analysis required. The first default setting is for a low throughput analysis. This uses more
liberal settings and is useful for finding targets of a small number of sRNAs, such as known miRNAs for a
particular organism. The second default setting is for high throughput analysis. This uses tighter
parameters and is useful for finding targets of a large number of sRNAs, such as millions of sRNAs
obtained from a high-throughput sequencing experiment.
[Figure: the Parameters Browser, with each parameter labeled 1a to 7b.]
Description of Parameters
The following list describes the parameters within the Parameters Browser:
1a) Parameter settings may be saved and reloaded at a later date using these buttons.
1b) The stringency of the search parameters may be quickly changed from low stringency to
high stringency using these buttons. See appendix 1.
2a) Targets will not be sought for any sRNA with a raw abundance less than the specified
minimum.
2b) One sRNA sequence which hits a transcript cleavage site may also have several sRNA
subsequences. A subsequence matches its longer parent sequence but lacks one or more
nucleotides at the 3' end. For example:
Parent sequence: AGCTAGCTAGCTAGCTAGCTAGCT
Subsequence 1: AGCTAGCTAGCTAGCTAGCTAGC
Subsequence 2: AGCTAGCTAGCTAGCTAGCTAG
Subsequence 3: AGCTAGCTAGCTAGCTAGCTA
As subsequences 1, 2 and 3 can target the same cleavage site as the parent sequence, we
call the subsequences secondary hits. It is not always required to independently report
secondary hits as this unnecessarily increases the number of results. Ticking this check-box
ensures subsequences of hits are recorded as secondary hits, but not reported within the
main output.
2c) Secondary hits (2b) are recorded and may be output to file using the File->Save/Save As
menu item.
3a) Categories can be calculated using either raw degradation fragment abundance or
weighted degradation fragment abundance. Weighted fragment abundance is the raw
fragment abundance divided by the number of times the fragment aligned to the
transcriptome. Ticking this check-box tells PAREsnip to calculate categories using weighted
fragment abundance.
3b) Peaks identified as category 0 will be included as potential sRNA cleavage sites.
3c) Peaks identified as category 1 will be included as potential sRNA cleavage sites.
3d) Peaks identified as category 2 will be included as potential sRNA cleavage sites.
3e) Peaks identified as category 3 will be included as potential sRNA cleavage sites.
3f) Peaks identified as category 4 will be included as potential sRNA cleavage sites.
4a) Any sRNA which has a full length match to t/rRNA will be discarded from the search.
4b) Candidate targets are removed if of low complexity. A low complexity sequence contains 2
or fewer unique nucleotides.
4c) Sequences within the sRNAome are discarded if the composition of the sequence is of low
complexity. A low complexity sequence contains 2 or fewer unique nucleotides.
4d) Any degradation fragment having fewer nucleotides than this threshold will be discarded.
4e) Any degradation fragment having more nucleotides than this threshold will be discarded.
4f) Any sRNA having fewer nucleotides than this threshold will be discarded.
4g) Any sRNA having more nucleotides than this threshold will be discarded.
5a) An sRNA/target duplex may contain a single nucleotide gap.
5b) An sRNA/target duplex may contain a mismatch at position 11 (5' sRNA).
5c) An sRNA/target duplex may contain more than 2 adjacent mismatches after position 12 (3'
sRNA).
5d) The maximum number of mismatches permitted within an sRNA/target duplex (G-U pairs =
0.5 mismatches).
6a) PAREsnip will calculate p-values for each interaction reported.
6b) The number of dinucleotide shuffles to be used for p-value calculation.
6c) The p-value threshold. A p-value calculation will not continue past this threshold.
6d) Interactions with a p-value exceeding the threshold will not be reported.
7a) The number of CPU cores available to PAREsnip.
7b) The number of threads PAREsnip should use to perform the analysis. More threads reduce
the time taken to complete an analysis.
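Several of the parameters above describe simple calculations. The following Python sketch illustrates four of them: secondary hits (2b), weighted fragment abundance (3a), low complexity (4b, 4c) and mismatch scoring with G-U pairs counted as 0.5 (5d). It is an illustration under stated assumptions rather than PAREsnip's implementation. All function names are hypothetical, the hits passed to `split_secondary_hits` are assumed to target the same cleavage site, and `mismatch_score` assumes an ungapped duplex of equal-length sequences, so it does not model gaps (5a), position 11 (5b) or adjacent mismatches (5c).

```python
from typing import List, Tuple


def split_secondary_hits(hits: List[str]) -> Tuple[List[str], List[str]]:
    """2b: separate parent hits from their 3'-truncated subsequences."""
    primary, secondary = [], []
    # Longest first, so each parent is seen before its subsequences.
    for seq in sorted(set(hits), key=lambda s: (-len(s), s)):
        if any(parent.startswith(seq) for parent in primary):
            secondary.append(seq)
        else:
            primary.append(seq)
    return primary, secondary


def weighted_abundance(raw_abundance: int, alignment_count: int) -> float:
    """3a: raw abundance divided by the number of transcriptome alignments."""
    if alignment_count < 1:
        raise ValueError("a fragment must align at least once")
    return raw_abundance / alignment_count


def is_low_complexity(sequence: str) -> bool:
    """4b, 4c: a sequence with 2 or fewer unique nucleotides."""
    return len(set(sequence)) <= 2


WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def mismatch_score(srna: str, target_site: str) -> float:
    """5d: total mismatches in an ungapped duplex, G-U pairs counting 0.5.

    Both sequences are written 5' to 3'; the target is reversed so that
    position 1 of the sRNA pairs with the last nucleotide of the site.
    """
    s = srna.upper().replace("T", "U")
    t = target_site.upper().replace("T", "U")[::-1]
    if len(s) != len(t):
        raise ValueError("this sketch only handles equal-length sequences")
    score = 0.0
    for pair in zip(s, t):
        if pair in WATSON_CRICK:
            continue
        score += 0.5 if pair in WOBBLE else 1.0
    return score


# The parent sequence and subsequences from the example in 2b.
parent = "AGCTAGCTAGCTAGCTAGCTAGCT"
primary, secondary = split_secondary_hits(
    [parent, parent[:23], parent[:22], parent[:21]]
)
```

A duplex would then pass the 5d check when `mismatch_score(...)` does not exceed the configured maximum, for example 4.5 or 4.0.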
Start an Analysis
Three mandatory inputs must be provided: a transcriptome, an sRNAome and a degradome. Once the
required inputs are provided, the Start button becomes activated. The user should review the settings
within the Parameter Browser (see PAREsnip Parameters) and make any changes as required.
PAREsnip is now ready to perform the analysis. Click on Start.
Viewing Results
Small RNA/target interactions are shown within a table. The
table may be saved in CSV format. Additional information
and statistics for each of the input data sets are provided in
the information area (top left box). This information includes
the number of sequences and sequence length
distributions.
Messages that guide you through the analysis, along with
the estimated time remaining to complete processing of the
data, are shown in the messages area (top right box).
Helpful hint: The table columns can be sorted by clicking on the column header. They can also be
picked up and moved around within the table using drag and drop.
[Figure: the results window, showing statistics relating to the input data set, the table of results, and helpful messages.]
T-Plots (VisSR)
Once an analysis is complete, you may view the results in
the form of t-plots. To do this go to:
T-Plots->View t-plots in VisSR.
This will launch the VisSR tool and load the sRNA/target
interactions.
Saving Results
The table of results can be saved in CSV format by going to File->Save. T-Plots shown in VisSR can be
saved in PDF format by clicking on the save buttons.
Command Line Operation
In order to execute the sRNA Workbench and PAREsnip from the command line, navigate to the
directory that you extracted the sRNA Workbench files to. Then type:
java -jar Workbench.jar -tool paresnip
If no options are entered, the usage instructions will be printed to the command line. An example of
a complete instruction is given below:
java -jar Workbench.jar [-verbose] -tool paresnip -srna_file <srna-file>
-deg_file <degradome-file-path> -tran_file <transcriptome-file-path> -out_file
<output-file> [-genome_file <genome-file>] [-params <params-file>]
Note: parameters in square brackets are optional.
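When PAREsnip is run from a script, the instruction above can be assembled programmatically. The following Python sketch builds the argument list using only the options shown in this guide. The function name `build_paresnip_command` and the file names in the example are hypothetical.

```python
from typing import List, Optional


def build_paresnip_command(
    srna_file: str,
    deg_file: str,
    tran_file: str,
    out_file: str,
    genome_file: Optional[str] = None,
    params_file: Optional[str] = None,
    verbose: bool = False,
    jar: str = "Workbench.jar",
) -> List[str]:
    """Assemble the command line, adding optional arguments only when given."""
    command = ["java", "-jar", jar]
    if verbose:
        command.append("-verbose")
    command += [
        "-tool", "paresnip",
        "-srna_file", srna_file,
        "-deg_file", deg_file,
        "-tran_file", tran_file,
        "-out_file", out_file,
    ]
    if genome_file:
        command += ["-genome_file", genome_file]
    if params_file:
        command += ["-params", params_file]
    return command


# Hypothetical file names, for illustration only.
command = build_paresnip_command(
    "srnas.fa", "degradome.fa", "transcripts.fa", "results.csv",
    params_file="paresnip.params",
)
```

The resulting list could be passed to `subprocess.run(command, check=True)` from within the directory containing the extracted Workbench files.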
The optional -params <params-file> argument names a file containing a list of parameters to be used by PAREsnip.
An example of the content and format of a parameters file:
Content ("true" = on/yes)                     Description (see Description of Parameters)
min_sRNA_abundance=1                          2a
subsequences_are_secondary_hits=false         2b
output_secondary_hits_to_file=false           2c
use_weighted_fragments_abundance=true         3a
category_0=true                               3b
category_1=true                               3c
category_2=true                               3d
category_3=true                               3e
category_4=true                               3f
discard_tr_rna=true                           4a
discard_low_complexity_srnas=true             4c
discard_low_complexity_candidates=true        4b
min_fragment_length=20                        4d
max_fragment_length=21                        4e
min_sRNA_length=19                            4f
max_sRNA_length=24                            4g
allow_single_nt_gap=true                      5a
allow_mismatch_position_11=true               5b
allow_adjacent_mismatches=true                5c
max_mismatches=4.5                            5d
calculate_pvalues=true                        6a
number_of_shuffles=100                        6b
pvalue_cutoff=0.05                            6c
do_not_include_if_greater_than_cutoff=true    6d
number_of_threads=7                           7b
auto_output_tplot_pdf=false                   <not available at present>
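A parameters file in this format can be read back for inspection or record keeping. The following Python sketch assumes the file holds only the `key=value` entries from the Content column, one per line, without the description labels. The function names `convert` and `parse_params` are hypothetical and are not part of the Workbench.

```python
from typing import Dict, Union

Value = Union[bool, int, float, str]


def convert(value: str) -> Value:
    """Turn "true"/"false" into bool and numeric text into int or float."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_params(text: str) -> Dict[str, Value]:
    """Parse key=value lines, skipping blank lines and lines without '='."""
    params: Dict[str, Value] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = convert(value.strip())
    return params


# A short excerpt of the example parameters file shown above.
sample = """min_sRNA_abundance=1
allow_mismatch_position_11=true
max_mismatches=4.5
pvalue_cutoff=0.05
"""
params = parse_params(sample)
```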
Appendix 1
Low stringency:
Allow mismatches at position 11 (5’ of sRNA)
Allow 2 or more adjacent mismatches (3’ sRNA)
Allow 4.5 mismatches in total
High stringency:
Do not allow mismatches at position 11 (5’ of sRNA)
Do not allow 2 or more adjacent mismatches (3’ sRNA)
Allow 4.0 mismatches in total
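For command line use, the two stringency levels can be written as parameters file entries. The following Python sketch assumes that the three settings above correspond to the keys `allow_mismatch_position_11`, `allow_adjacent_mismatches` and `max_mismatches` from the example parameters file (labels 5b, 5c and 5d). The guide does not state this mapping explicitly, and the names `STRINGENCY_PRESETS` and `preset_lines` are hypothetical.

```python
from typing import Dict, List, Union

# Assumed mapping of the Appendix 1 settings onto parameters file keys.
STRINGENCY_PRESETS: Dict[str, Dict[str, Union[bool, float]]] = {
    "low": {
        "allow_mismatch_position_11": True,
        "allow_adjacent_mismatches": True,
        "max_mismatches": 4.5,
    },
    "high": {
        "allow_mismatch_position_11": False,
        "allow_adjacent_mismatches": False,
        "max_mismatches": 4.0,
    },
}


def preset_lines(level: str) -> List[str]:
    """Render one preset as key=value lines in the parameters file format."""
    lines = []
    for key, value in STRINGENCY_PRESETS[level].items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{key}={text}")
    return lines


high = preset_lines("high")
```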
Glossary
CPU: Central Processing Unit.
Degradome: A collection of degraded transcript fragments obtained from a high-throughput
sequencing experiment using the Parallel Analysis of RNA Ends (PARE) protocol.
miRNA: MicroRNA.
sRNA(s): Small RNA(s).
sRNAome: A collection of sRNAs.
Thread: A unit of execution upon the CPU.
Transcriptome: A collection of transcripts.