PAREsnip User Guide

A tool for rapid genome-wide discovery of small RNA/target interactions evidenced through degradome sequencing

Guide Version 2.3

Leighton Folkes, Simon Moxon, Hugh Woolfenden, Mathew Stocks, Gyorgy Szittya, Tamas Dalmay, Vincent Moulton.

06/03/2011



Contents

Introduction
System Requirements
Launching PAREsnip (GUI)
Loading Data
PAREsnip Parameters
Description of Parameters
Start an Analysis
Viewing Results
T-Plots (VisSR)
Saving Results
Command Line Operation
Appendix 1
Glossary


Introduction

Next generation sequencing has become a de facto standard for the analysis of small RNA (sRNA) samples in recent years. Typically, a single experiment will produce millions of sRNA reads (the sRNAome), thereby capturing a snapshot of an organism's sRNA expression profile. An essential first step in understanding the function of these sRNAs is to confidently identify their targets. In plants, several classes of sRNAs have been shown to bind with near-perfect complementarity to their messenger RNA (mRNA) targets, generally leading to cleavage of the mRNA. Recently, a high-throughput technique known as Parallel Analysis of RNA Ends (PARE) has made it possible to sequence mRNA cleavage products on a large scale. A library of cleavage products obtained using the PARE technique (the degradome) can be used to provide evidence of the interactions between sRNAs and their complementary mRNA targets. PAREsnip takes the sRNAome, degradome and an mRNA dataset (or transcriptome) as input, and outputs the potential sRNA/target duplexes evidenced through the degradome data.

System Requirements

Required: Java SE 6 (available from http://java.com/en/download/manual.jsp).

Recommended: Intel i7 CPU, 16 GB RAM.

Launching PAREsnip (GUI)

PAREsnip is a tool within the UEA sRNA Workbench. To run the sRNA Workbench in GUI mode, extract all the files from your downloaded zip archive to a new directory, then launch sRNAWorkbenchStartup.jar. The launcher scans your system for available memory and allocates a portion of it to the Workbench. You will then be presented with the main UEA sRNA Workbench window. Go to Tools->PAREsnip to launch the tool.

Note: If you would prefer to use the command line, please see the section Command Line Operation.


Loading Data

The inputs for PAREsnip are:

mRNA dataset (transcriptome),
transcript degradation fragments obtained from a PARE experiment (degradome),
small RNA dataset (sRNAome), and
genome.

The first three inputs are required; the genome is optional. When a genome is included, sRNAs that do not map exactly to the genome sequence are discarded. All of the inputs must be in FASTA format and must contain only the characters 'A', 'C', 'G', 'T' and 'U'. Sequences containing unknown characters or ambiguity codes are discarded because they cannot be accurately aligned.

To load the input data, go to File->Open. Clicking Open shows the Choose Data dialog, where you may browse your computer and select your input files. It is important to tell PAREsnip which files contain the transcripts, the degradome, the small RNAs and the genome. This is done by making sure the correct radio button on the left-hand side of the dialog is selected.

Example FASTA format:

>"unique identifier"
AGCTAGCTAGCTAGCTACTACCC
>"unique identifier"
AAGCTTAGCTTTACGTTAGTTAC
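To check input files before loading them, the character rule described above can be sketched in a few lines of Python. This is only an illustration, not PAREsnip's own code: the function names are hypothetical, and the sketch assumes upper-case sequences only, since the guide does not say how lower-case characters are treated.

```python
import io

# Characters the guide lists as permitted in input sequences.
ALLOWED = set("ACGTU")

def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            identifier, chunks = line[1:], []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)

def keep_valid(records):
    """Keep records whose sequence is non-empty and uses only A, C, G, T, U."""
    return [(name, seq) for name, seq in records if seq and set(seq) <= ALLOWED]

# Hypothetical two-record input; the second sequence contains an 'N'.
example = io.StringIO(
    ">seq1\nAGCTAGCTAGCTAGCTACTACCC\n"
    ">seq2\nAAGCTTNGCTTTACGTTAGTTAC\n"
)
valid = keep_valid(read_fasta(example))
```

Here `valid` keeps only the first record, because the second contains a character outside the permitted set.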


PAREsnip Parameters

The parameter settings are user configurable within the Parameters Browser. To adjust the parameters, ensure PAREsnip is the topmost window within the Workbench, then go to Tools->Show Parameter Browser.

The Parameters Browser shows the possible settings for PAREsnip. Each parameter is labeled in the illustration below (1a - 7b). There are two default parameter settings, which can be chosen according to the type of analysis required. The first default setting is for a low-throughput analysis. This uses more liberal settings and is useful for finding targets of a small number of sRNAs, such as the known miRNAs of a particular organism. The second default setting is for a high-throughput analysis. This uses tighter parameters and is useful for finding targets of a large number of sRNAs, such as the millions of sRNAs obtained from a high-throughput sequencing experiment.

[Figure: the Parameters Browser, with each parameter labeled 1a to 7b.]


Description of Parameters

The following list describes the parameters within the Parameters Browser:

1a) Parameter settings may be saved and reloaded at a later date using these buttons.

1b) The stringency of the search parameters may be quickly switched between low stringency and high stringency using these buttons. See Appendix 1.

2a) Targets will not be sought for any sRNA with a raw abundance less than the specified minimum.

2b) An sRNA sequence that hits a transcript cleavage site may also have several sRNA subsequences. A subsequence is identical to its longer parent sequence except that it has one or more fewer nucleotides at the 3' end. For example:

Parent sequence: AGCTAGCTAGCTAGCTAGCTAGCT
Subsequence 1: AGCTAGCTAGCTAGCTAGCTAGC
Subsequence 2: AGCTAGCTAGCTAGCTAGCTAG
Subsequence 3: AGCTAGCTAGCTAGCTAGCTA

As subsequences 1, 2 and 3 can target the same cleavage site as the parent sequence, we call the subsequences secondary hits. It is not always necessary to report secondary hits independently, as this unnecessarily increases the number of results. Ticking this check-box ensures subsequences of hits are recorded as secondary hits, but not reported within the main output.

2c) Secondary hits (2b) are recorded and may be output to file using the File->Save/Save As menu item.

3a) Categories can be calculated using either raw degradation fragment abundance or weighted degradation fragment abundance. Weighted fragment abundance is the raw fragment abundance divided by the number of times the fragment aligned to the transcriptome. Ticking this check-box tells PAREsnip to calculate categories using weighted fragment abundance.

3b) Peaks identified as category 0 will be included as potential sRNA cleavage sites.

3c) Peaks identified as category 1 will be included as potential sRNA cleavage sites.

3d) Peaks identified as category 2 will be included as potential sRNA cleavage sites.

3e) Peaks identified as category 3 will be included as potential sRNA cleavage sites.

3f) Peaks identified as category 4 will be included as potential sRNA cleavage sites.

4a) Any sRNA which has a full-length match to t/rRNA will be discarded from the search.

4b) Candidate targets are removed if they are of low complexity. A low complexity sequence contains 2 or fewer unique nucleotides.


4c) Sequences within the sRNAome are discarded if the composition of the sequence is of low complexity. A low complexity sequence contains 2 or fewer unique nucleotides.

4d) Any degradation fragment having fewer nucleotides than this threshold will be discarded.

4e) Any degradation fragment having more nucleotides than this threshold will be discarded.

4f) Any sRNA having fewer nucleotides than this threshold will be discarded.

4g) Any sRNA having more nucleotides than this threshold will be discarded.

5a) An sRNA/target duplex may contain a single nucleotide gap.

5b) An sRNA/target duplex may contain a mismatch at position 11 (counted from the 5' end of the sRNA).

5c) An sRNA/target duplex may contain more than 2 adjacent mismatches after position 12 (towards the 3' end of the sRNA).

5d) The maximum number of mismatches permitted within an sRNA/target duplex (G-U pairs = 0.5 mismatches).

6a) PAREsnip will calculate p-values for each interaction reported.

6b) The number of dinucleotide shuffles to be used for p-value calculation.

6c) The p-value threshold. A p-value calculation will not continue past this threshold.

6d) Interactions with a p-value exceeding the threshold will not be reported.

7a) The number of CPU cores available to PAREsnip.

7b) The number of threads PAREsnip should use to perform the analysis. More threads reduce the time taken to complete an analysis.
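Several of the rules in the list above are simple enough to express directly. The Python sketch below illustrates parameters 2b, 3a, 4b/4c and 5d as this guide describes them; it is not taken from PAREsnip itself. All function names are hypothetical, and the mismatch score assumes an ungapped duplex of equal-length strings, with the target site written 3' to 5' so that character i of the sRNA pairs with character i of the target.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

def to_rna(seq):
    """Normalise a sequence to upper-case RNA letters (T becomes U)."""
    return seq.upper().replace("T", "U")

def is_secondary_hit(parent, candidate):
    """2b: candidate equals the parent minus one or more 3' nucleotides."""
    return 0 < len(candidate) < len(parent) and parent.startswith(candidate)

def weighted_abundance(raw_abundance, alignment_count):
    """3a: raw abundance divided by the number of transcriptome alignments."""
    return raw_abundance / alignment_count

def is_low_complexity(seq):
    """4b/4c: a sequence with 2 or fewer unique nucleotides."""
    return len(set(seq)) <= 2

def mismatch_score(srna, target_3_to_5):
    """5d: 0 per Watson-Crick pair, 0.5 per G-U pair, 1 per other mismatch.

    Assumption: no gaps, and both strings have the same length.
    """
    s, t = to_rna(srna), to_rna(target_3_to_5)
    if len(s) != len(t):
        raise ValueError("this sketch only scores ungapped, equal-length duplexes")
    score = 0.0
    for pair in zip(s, t):
        if pair in WATSON_CRICK:
            continue
        score += 0.5 if pair in WOBBLE else 1.0
    return score

parent = "AGCTAGCTAGCTAGCTAGCTAGCT"
print(is_secondary_hit(parent, "AGCTAGCTAGCTAGCTAGCTAGC"))  # subsequence 1 from 2b
print(weighted_abundance(10, 4))
print(is_low_complexity("ATATATATATATATATATAT"))
print(mismatch_score("GAUC", "CUGG"))  # one G-U pair, otherwise complementary
```

A duplex would pass the check in 5d when its score does not exceed the configured maximum (for example 4.5 or 4.0, as listed in Appendix 1).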

Start an Analysis

Three mandatory inputs must be provided: a transcriptome, an sRNAome and a degradome. Once the required inputs are provided, the Start button becomes active. The user should review the settings within the Parameter Browser (see PAREsnip Parameters) and make any changes as required. PAREsnip is then ready to perform the analysis. Click on Start.


Viewing Results

Small RNA/target interactions are shown within a table. The table may be saved in CSV format. Additional information and statistics for each of the input data sets are provided in the information area (top left box). This information includes the number of sequences and the sequence length distributions.

Messages that guide you through the analysis, along with the estimated time remaining to complete processing of the data, are shown in the messages area (top right box).

Helpful hint: The table columns can be sorted by clicking on the column header. They can also be picked up and moved around within the table using drag and drop.

[Figure: results window showing statistics relating to the input data set, the table of results, and helpful messages.]


T-Plots (VisSR)

Once an analysis is complete, you may view the results in the form of t-plots. To do this go to T-Plots->View t-plots in VisSR. This will launch the VisSR tool and load the sRNA/target interactions.

Saving Results

The table of results can be saved in CSV format by going to File->Save. T-plots shown in VisSR can be saved in PDF format by clicking on the save buttons.


Command Line Operation

To execute the sRNA Workbench and PAREsnip from the command line, navigate to the directory into which you extracted the sRNA Workbench files. Then type:

java -jar Workbench.jar -tool paresnip

If no options are entered, the usage instructions will be printed to the command line. An example of a complete instruction is given below:

java -jar Workbench.jar [-verbose] -tool paresnip -srna_file <srna-file> -deg_file <degradome-file-path> -tran_file <transcriptome-file-path> -out_file <output-file> [-genome_file <genome-file>] [-params <params-file>]

Note: parameters in square brackets are optional.
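When PAREsnip is run from a script, it can help to assemble the command as an argument list so that the optional flags are only added when needed. The Python sketch below uses only the options shown in the instruction above; the function name and all file names are hypothetical placeholders, and nothing is executed.

```python
import shlex

def build_paresnip_command(srna_file, deg_file, tran_file, out_file,
                           genome_file=None, params_file=None, verbose=False):
    """Assemble the PAREsnip command line as a list of arguments."""
    command = ["java", "-jar", "Workbench.jar"]
    if verbose:
        command.append("-verbose")
    command += [
        "-tool", "paresnip",
        "-srna_file", srna_file,
        "-deg_file", deg_file,
        "-tran_file", tran_file,
        "-out_file", out_file,
    ]
    # The genome and the parameters file are optional.
    if genome_file:
        command += ["-genome_file", genome_file]
    if params_file:
        command += ["-params", params_file]
    return command

# Hypothetical file names, for illustration only.
command = build_paresnip_command(
    "srnas.fa", "degradome.fa", "transcripts.fa", "paresnip_output.txt",
    params_file="high.cfg",
)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` from within the directory containing Workbench.jar.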

The optional -params <params-file> contains a list of parameters to be used by PAREsnip.

An example of the content and format of a parameters file:

Content ("true" = on/yes)                     Description (see Description of Parameters)
min_sRNA_abundance=1                          2a
subsequences_are_secondary_hits=false         2b
output_secondary_hits_to_file=false           2c
use_weighted_fragments_abundance=true         3a
category_0=true                               3b
category_1=true                               3c
category_2=true                               3d
category_3=true                               3e
category_4=true                               3f
discard_tr_rna=true                           4a
discard_low_complexity_srnas=true             4c
discard_low_complexity_candidates=true        4b
min_fragment_length=20                        4d
max_fragment_length=21                        4e
min_sRNA_length=19                            4f
max_sRNA_length=24                            4g
allow_single_nt_gap=true                      5a
allow_mismatch_position_11=true               5b
allow_adjacent_mismatches=true                5c
max_mismatches=4.5                            5d
calculate_pvalues=true                        6a
number_of_shuffles=100                        6b
pvalue_cutoff=0.05                            6c
do_not_include_if_greater_than_cutoff=true    6d
number_of_threads=7                           7b
auto_output_tplot_pdf=false                   <not available at present>
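A parameters file in this format can be read back with a few lines of Python, for example to check the settings before starting a long run. The sketch assumes the file holds only key=value lines and that the labels in the right-hand column above (2a, 2b and so on) are annotations for this guide rather than file content. The function name is hypothetical.

```python
def parse_params(text):
    """Parse key=value lines into a dict, converting booleans and numbers."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = (part.strip() for part in line.split("=", 1))
        if value.lower() in ("true", "false"):
            params[key] = value.lower() == "true"
            continue
        try:
            params[key] = int(value)
        except ValueError:
            try:
                params[key] = float(value)
            except ValueError:
                params[key] = value  # keep anything else as plain text
    return params

sample = """\
min_sRNA_abundance=1
category_0=true
min_fragment_length=20
max_fragment_length=21
max_mismatches=4.5
"""
params = parse_params(sample)

# A simple sanity check on the length thresholds (4d and 4e).
assert params["min_fragment_length"] <= params["max_fragment_length"]
```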


Appendix 1

Low stringency:

Allow mismatches at position 11 (5' of sRNA)
Allow 2 or more adjacent mismatches (3' sRNA)
Allow 4.5 mismatches in total

High stringency:

Do not allow mismatches at position 11 (5' of sRNA)
Do not allow 2 or more adjacent mismatches (3' sRNA)
Allow 4.0 mismatches in total
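For command line use, the two presets above can be written out as parameters-file lines. The Python sketch below assumes that the three rules correspond to the keys allow_mismatch_position_11, allow_adjacent_mismatches and max_mismatches from the example parameters file; that mapping is an inference from the key names, not something this guide states. The dictionary and function names are hypothetical.

```python
# Assumed mapping from the Appendix 1 presets to parameters-file keys.
STRINGENCY_PRESETS = {
    "low": {
        "allow_mismatch_position_11": True,
        "allow_adjacent_mismatches": True,
        "max_mismatches": 4.5,
    },
    "high": {
        "allow_mismatch_position_11": False,
        "allow_adjacent_mismatches": False,
        "max_mismatches": 4.0,
    },
}

def format_params(settings):
    """Render settings as key=value lines, with booleans as true/false."""
    lines = []
    for key, value in settings.items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{key}={text}")
    return "\n".join(lines)

high_text = format_params(STRINGENCY_PRESETS["high"])
print(high_text)
```

These lines would be combined with the remaining settings from the example parameters file before being passed to -params.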

Glossary

CPU: Central Processing Unit.

Degradome: A collection of degraded transcript fragments obtained from a high-throughput sequencing experiment using the Parallel Analysis of RNA Ends (PARE) protocol.

miRNA: MicroRNA.

sRNA(s): Small RNA(s).

sRNAome: A collection of sRNAs.

Thread: A unit of execution upon the CPU.

Transcriptome: A collection of transcripts.