Visualising errors in animal pedigree genotype data

VISUALISING ERRORS IN ANIMAL PEDIGREE GENOTYPE DATA

Martin Graham, Jessie Kennedy, Trevor Paterson & Andy Law

Edinburgh Napier University & The Roslin Institute, Univ of Edinburgh, UK

Upload: martinjgraham

Post on 07-Jul-2015


DESCRIPTION

Presentation I gave at EuroVis 2011 on the VIPER project

TRANSCRIPT

Page 1: Visualising errors in animal pedigree genotype data

VISUALISING ERRORS IN ANIMAL PEDIGREE GENOTYPE DATA

Martin Graham, Jessie Kennedy, Trevor Paterson & Andy Law

Edinburgh Napier University & The Roslin Institute, Univ of Edinburgh, UK

Page 2: Visualising errors in animal pedigree genotype data

Pedigrees

Animal pedigrees are their family trees: who’s whose father, mother, etc.

In animal breeding these pedigrees are strictly controlled to maximise traits of value or suppress unwanted ones

Page 3: Visualising errors in animal pedigree genotype data

Pedigree Genotypes

A genotype is the genetic make-up of an animal

Pedigree + genotype = pedigree genotype

Not the whole genotype; sets of markers are used

Marker type: SNP (Single Nucleotide Polymorphism)

Each SNP has 2 alleles, one inherited from each parent

Example individual:

Marker   Values
M1       C|T
M2       A|A
M3       A|G
...      ...

Page 4: Visualising errors in animal pedigree genotype data

But...

However, most large datasets have errors

Errors when recording pedigree

Technical errors e.g. wrongly detected marker

Misassigned samples

Also incomplete data

These errors make the data genetically inconsistent

This makes them unusable for most downstream analyses

Page 5: Visualising errors in animal pedigree genotype data

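Example

[Figure: pedigree of Mum (A | A), Dad (G | G) and their recorded offspring Junior (A | C), with an unidentified individual "?" (C | ?)]

Junior’s C allele is not present in either recorded parent

Various possibilities here

Dad is Junior’s father but the genotyping is incorrect

Dad isn’t Junior’s father and the genotypes are correct

Need to find/isolate/clean such data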
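The inconsistency in the Mum/Dad/Junior example can be checked mechanically. The following is a minimal sketch, not the VIPER implementation: the function names (`parse`, `mendelian_consistent`, `shares_allele`) are hypothetical, genotypes are assumed to be written as `X | Y`, and missing values such as `?` are not handled.

```python
from itertools import product

def parse(genotype):
    """Turn a string such as 'A | C' into an allele pair ('A', 'C')."""
    return tuple(allele.strip() for allele in genotype.split("|"))

def mendelian_consistent(sire, dam, child):
    """True if the child's genotype can be formed from one sire allele and one dam allele."""
    possible = {tuple(sorted(pair)) for pair in product(sire, dam)}
    return tuple(sorted(child)) in possible

def shares_allele(parent, child):
    """True if the child carries at least one allele also seen in this parent."""
    return bool(set(parent) & set(child))

mum, dad, junior = parse("A | A"), parse("G | G"), parse("A | C")

print(mendelian_consistent(dad, mum, junior))  # False: A|C cannot come from G|G x A|A
print(shares_allele(mum, junior))              # True: the A allele could come from Mum
print(shares_allele(dad, junior))              # False: neither A nor C is seen in Dad
```

The check only reports that the trio is inconsistent. It cannot say whether the recorded pedigree or the genotyping is at fault, which is why the surrounding family context matters.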

Page 6: Visualising errors in animal pedigree genotype data

Table Viewer

Current table-based viewer

Grid of markers x individuals; genotype values in cells

Universally ‘bad’ markers or individuals stand out

Page 7: Visualising errors in animal pedigree genotype data

Table Viewer

Expert biologists are needed to pinpoint the source of reported errors

But without a pedigree context to anchor the errors in, it’s impossible to do this

Page 8: Visualising errors in animal pedigree genotype data

Previous Work

Multitude of pedigree viewers, but all have issues with scalability or handling extra (genotype) data

Page 9: Visualising errors in animal pedigree genotype data

Voyage of Discovery

Mainly discovering representations that didn’t work

Iterated through a number of different representation styles that failed for various reasons

Page 10: Visualising errors in animal pedigree genotype data

Node-Link View

Can see that the pedigree clusters around a few males

But hard to follow edges/directions, loss of generational context

Page 11: Visualising errors in animal pedigree genotype data

Hierarchical Node-Link View

Regain visual generation structure of pedigree

But plagued with more edge crossings than before

Page 12: Visualising errors in animal pedigree genotype data

Matrix View

Matrices are the main alternative to drawing node-link diagrams for relational information

We rejected having one overall matrix due to sparsity

Page 13: Visualising errors in animal pedigree genotype data

Matrix View

One matrix per generation ‘gap’ (parents to offspring)

Rows and columns are sires v dams rather than sources v sinks, with offspring in the cells

Allows sorting of parent genders by properties
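The per-generation sire v dam matrix can be sketched as a sparse mapping from (sire, dam) pairs to the offspring in that cell. This is an illustrative sketch only; the record layout `(individual, sire, dam, generation)` and the function name `sire_dam_matrix` are assumptions, not part of the original tool.

```python
from collections import defaultdict

# Hypothetical pedigree records: (individual, sire, dam, generation of the individual)
pedigree = [
    ("o1", "s1", "d1", 1),
    ("o2", "s1", "d1", 1),
    ("o3", "s1", "d2", 1),
    ("o4", "s2", "d3", 1),
]

def sire_dam_matrix(records, generation):
    """Collect the offspring of one generation into cells keyed by (sire, dam)."""
    cells = defaultdict(list)
    for individual, sire, dam, gen in records:
        if gen == generation:
            cells[(sire, dam)].append(individual)
    return dict(cells)

matrix = sire_dam_matrix(pedigree, 1)

# Print the matrix with sires as rows and dams as columns; "." marks an empty cell.
sires = sorted({sire for sire, _ in matrix})
dams = sorted({dam for _, dam in matrix})
for sire in sires:
    row = [",".join(matrix.get((sire, dam), [])) or "." for dam in dams]
    print(sire, row)
```

Because rows and columns are plain sorted lists here, either parent axis could be reordered by any property (for example, error count) before printing.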

Page 14: Visualising errors in animal pedigree genotype data

Sandwich View

Realised that in these matrices, either the rows or columns will only have one filled cell each if one of the parent genders is monogamous

In animal experiments this tends to be the case: a female breeds with only one male per generation

Each matrix can thus be replaced with a compressed view
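The compression step relies on each dam appearing with only one sire. A minimal sketch of that check follows, assuming the matrix is held as a dict from (sire, dam) to offspring; the name `sandwich_layout` and the idea of returning the dams that would need repeating are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical sparse sire v dam matrix: (sire, dam) -> offspring in that cell
matrix = {
    ("s1", "d1"): ["o1", "o2"],
    ("s1", "d2"): ["o3"],
    ("s2", "d3"): ["o4"],
}

def sandwich_layout(matrix):
    """Flatten the matrix into sire/offspring/dam columns, grouped by sire.

    Also returns the dams that bred with more than one sire, since those
    would need a repeated node in a compressed layout.
    """
    sires_per_dam = defaultdict(set)
    for sire, dam in matrix:
        sires_per_dam[dam].add(sire)
    repeated_dams = sorted(dam for dam, sires in sires_per_dam.items() if len(sires) > 1)
    # Sorting by (sire, dam) keeps each sire's families next to each other.
    columns = [(sire, dam, matrix[(sire, dam)]) for sire, dam in sorted(matrix)]
    return columns, repeated_dams

columns, repeated_dams = sandwich_layout(matrix)
for sire, dam, offspring in columns:
    print(sire, offspring, dam)
print("dams needing repeated nodes:", repeated_dams)
```

When `repeated_dams` is empty, every dam column has exactly one filled cell, which is the condition under which the matrix collapses without loss.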

Page 15: Visualising errors in animal pedigree genotype data

Sandwich View

The sandwich view is a specialised view of the bipartite graph between two generations

The top (parent) layer is split into males and females, and the females are pushed beneath the bottom (offspring) layer

[Figure: a two-layer Parents / Offspring graph alongside the three-layer Sires / Offspring / Dams sandwich, with connectors to repeated node representations if necessary]

Page 16: Visualising errors in animal pedigree genotype data

Sandwich View

Sandwich view of the relationships between two adjacent generations

All the other pedigree views of full generations involved tracing paths between parents/offspring

[Figure labels: Sires (Male Parents); Offspring; Dams (Female Parents); 1 male has children with multiple females]

Page 17: Visualising errors in animal pedigree genotype data

Sandwich View

Page 18: Visualising errors in animal pedigree genotype data

Error Information

Colour is used to convey an individual’s error status over all the markers in a data set

More errors = higher saturation

Parent – coloured by overall error count

Offspring drawn as hexagonal glyphs

‘Up’ triangle – incompatibilities with sire

‘Down’ triangle – incompatibilities with dam

Middle portion – markers exist that are not present in either parent
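The colour encoding above can be sketched as a mapping from error count to saturation. This is a sketch under stated assumptions: the linear scaling, the red hue and the names `error_colour` and `offspring_glyph` are illustrative choices; the slides only state that more errors means higher saturation.

```python
import colorsys

def error_colour(errors, max_errors, hue=0.0):
    """Return an RGB triple whose saturation grows linearly with the error count.

    hue=0.0 (red) is an assumed choice; zero errors gives white.
    """
    saturation = 0.0 if max_errors == 0 else min(errors / max_errors, 1.0)
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)

def offspring_glyph(sire_errors, dam_errors, novel_errors, max_errors):
    """Colours for the three parts of the hexagonal offspring glyph."""
    return {
        "up_triangle": error_colour(sire_errors, max_errors),    # incompatibilities with sire
        "middle": error_colour(novel_errors, max_errors),        # not present in either parent
        "down_triangle": error_colour(dam_errors, max_errors),   # incompatibilities with dam
    }

glyph = offspring_glyph(sire_errors=10, dam_errors=0, novel_errors=5, max_errors=10)
for part, rgb in glyph.items():
    print(part, rgb)
```

A parent node would use `error_colour` directly with its overall error count.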

Page 19: Visualising errors in animal pedigree genotype data

Error Information

Aggregating offspring

Groups of siblings who share the same parents can be aggregated under one glyph

Colouring now represents errors in all markers over a group of individuals

Troublesome families & parents can be clearly seen
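Aggregating siblings amounts to grouping offspring by their (sire, dam) pair and summing the error counts that feed the glyph. A minimal sketch, assuming a hypothetical row layout of `(offspring, sire, dam, sire_errors, dam_errors, novel_errors)`:

```python
from collections import defaultdict

# Hypothetical per-offspring error counts
rows = [
    ("o1", "s1", "d1", 0, 2, 0),
    ("o2", "s1", "d1", 1, 3, 0),
    ("o3", "s1", "d2", 0, 0, 0),
]

def aggregate_families(rows):
    """Sum error counts over siblings that share the same sire and dam."""
    families = defaultdict(lambda: {"size": 0, "sire": 0, "dam": 0, "novel": 0})
    for _, sire, dam, sire_err, dam_err, novel_err in rows:
        family = families[(sire, dam)]
        family["size"] += 1
        family["sire"] += sire_err
        family["dam"] += dam_err
        family["novel"] += novel_err
    return dict(families)

families = aggregate_families(rows)
for parents, totals in sorted(families.items()):
    print(parents, totals)
```

In this made-up data the (s1, d1) family accumulates most of its errors against the dam, the kind of pattern that would point at a troublesome parent.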

Page 20: Visualising errors in animal pedigree genotype data

Filtering

Error Filtering

The table view clearly showed rogue markers and individuals, and these can be filtered by a user in that application

To the sandwich view we add two complementary histograms that serve the same purpose

Page 21: Visualising errors in animal pedigree genotype data

Filtering

Error Filtering

Each histogram shows number of errors along the X axis

Number of individuals/markers with that number of errors on the Y axis

Typical pattern: A few individuals / markers have lots of errors, and the majority have a few or no errors

Mantra is to discard bad markers and look at bad individuals
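The histogram described above (error count on the X axis, number of individuals or markers on the Y axis) can be sketched with a counter. The data, the threshold and the names `error_histogram` and `at_or_above` are made up for illustration.

```python
from collections import Counter

# Hypothetical error counts per individual (the same idea applies per marker)
errors_per_individual = {"i1": 0, "i2": 0, "i3": 1, "i4": 0, "i5": 7}

def error_histogram(errors):
    """Map each error count to the number of items that have that many errors."""
    return Counter(errors.values())

def at_or_above(errors, threshold):
    """Names of items whose error count reaches the filter threshold."""
    return sorted(name for name, count in errors.items() if count >= threshold)

hist = error_histogram(errors_per_individual)

# Text rendering: one row per error count, one '#' per individual with that count.
for n_errors in range(max(hist) + 1):
    print(f"{n_errors:2d} | {'#' * hist[n_errors]}")

print("individuals to inspect:", at_or_above(errors_per_individual, 5))
```

The long tail (here a single individual with 7 errors) is what a user would select with the filter: discarded if it is a marker, inspected if it is an individual.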

Page 22: Visualising errors in animal pedigree genotype data

Sandwich view

Pic/Vid of full view (To Do)

Page 23: Visualising errors in animal pedigree genotype data

Video

Page 24: Visualising errors in animal pedigree genotype data

Conclusion

Developed new style of pedigree visualisation

Shows detailed errors at a family level

Shows overview of errors in an entire pedigree

Keeps offspring close to their parents for a family-centric view

Page 25: Visualising errors in animal pedigree genotype data

Future Work

Single marker views of errors

Making the sandwich into a club sandwich

Split the middle layer into multiple layers

e.g. by gender, to spot sex-related marker errors

Page 26: Visualising errors in animal pedigree genotype data

Acknowledgements

Reviewers

BBSRC funded project