BayesiaLab satisfaction poll analysis

Data analysis: satisfaction poll

This part shows how to characterize global satisfaction and how to view all the interactions between variables.

The data is contained in a text file (CSV). There is a title line, and the separator is a semicolon.

The import wizard automatically detects the file separators and the title line.
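For readers who want to inspect such a file outside BayesiaLab, the layout described above (a title line and a semicolon separator) can be read with Python's standard `csv` module. This is only a sketch: the sample content and the column names `ID`, `Satisfaction` and `SE1` are assumptions made for illustration, not the actual poll file.

```python
import csv
import io

# Hypothetical sample: a title line, a semicolon separator, an identifier column.
SAMPLE = "ID;Satisfaction;SE1\n1;8;7\n2;5;\n3;9;6\n"

def read_poll(text):
    """Parse semicolon-separated text whose first line holds the column names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)

rows = read_poll(SAMPLE)
print(rows[0]["Satisfaction"])  # first respondent's mark, still a string
print(rows[1]["SE1"] == "")     # an empty field marks a missing value
```

Values are returned as strings, so marks must be converted to numbers before any computation, and empty strings must be handled as missing data.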

The first column is an identifier. Since this information is not useful for the analysis, the column becomes grey: it is unused.

The file contains missing data. In each column, any missing value is replaced by the average of the values that are present.
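The replacement rule described above can be sketched in a few lines of Python. The function name `impute_mean` and the sample marks are illustrative assumptions; the sketch shows the rule itself, not BayesiaLab's internal implementation.

```python
def impute_mean(column):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in column if v is not None]
    if not present:
        return list(column)  # nothing to average; leave the column unchanged
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]

# Hypothetical column of marks with one missing answer.
marks = [8.0, None, 6.0, 10.0]
filled = impute_mean(marks)
print(filled)
```

Mean imputation keeps the column average unchanged but reduces its variance, which is worth keeping in mind when many values are missing.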

Data information is displayed here. 711 poll responses are gathered in this dataset.

Discretizing continuous values

The variables represent evaluation marks from 1 to 10. Manual discretization displays the cumulative distribution function of the selected continuous variable.

Generating an equal-distance discretization with three intervals leads to this graph.
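As an illustration of equal-distance discretization, the sketch below splits a range into intervals of equal width. It assumes the marks span exactly 1 to 10; the function names are hypothetical. Under that assumption the two thresholds fall at 4 and 7, which matches the modality names `<=4`, `<=7` and `>7` that appear later in the exported dictionary.

```python
def equal_width_edges(low, high, bins):
    """Return the interior thresholds splitting [low, high] into equal-width bins."""
    width = (high - low) / bins
    return [low + i * width for i in range(1, bins)]

def discretize(value, edges):
    """Return the index of the first interval whose upper threshold is >= value."""
    for index, edge in enumerate(edges):
        if value <= edge:
            return index
    return len(edges)  # value lies above the last threshold

edges = equal_width_edges(1, 10, 3)
labels = ["<=4", "<=7", ">7"]
for mark in (3, 7, 9):
    print(mark, labels[discretize(mark, edges)])
```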

Since the discretization is adequate, it can be applied to all variables.

To transfer the discretization mode to the other variables, press Ctrl + A to apply the discretization to all variables.

The Bayesian network is created with one node per column.

To characterize global satisfaction, the first step is to use the search function to find the “Satisfaction” node.

The search function: * and % can be used to simplify the search.

Clicking on the line causes the node to blink.

This node is the target variable of the analysis. We are interested in the >7 satisfaction value.

The augmented Markov blanket is used to characterize the target variable. It finds the minimal set of variables that characterize global satisfaction.
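To make the notion concrete: in a directed acyclic graph, the Markov blanket of a node consists of its parents, its children and the other parents of its children. The sketch below applies that standard definition to a toy graph. It does not reproduce BayesiaLab's augmented Markov blanket learning, which discovers the structure from data; the node names `A`, `B` and `C` and the arcs are made up for illustration.

```python
def markov_blanket(edges, target):
    """Return parents, children and co-parents of target.

    edges is a list of (parent, child) pairs describing a DAG.
    """
    parents = {p for p, c in edges if c == target}
    children = {c for p, c in edges if p == target}
    co_parents = {p for p, c in edges if c in children and p != target}
    return parents | children | co_parents

# Hypothetical toy graph, not the network learned in this tutorial.
toy_edges = [
    ("SE1", "Satisfaction"),
    ("Satisfaction", "A"),
    ("B", "A"),
    ("C", "B"),
]
blanket = markov_blanket(toy_edges, "Satisfaction")
print(sorted(blanket))
```

Node `C` is left out because, once the blanket is known, it brings no further information about the target.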

Zoom in and zoom out tools are available for better graph visualization.

The force-directed layout algorithm organizes the nodes on the workspace.

When switching to validation mode, note that only 15 nodes out of 215 are selected as relevant by the network.

To highlight the important relationships between variables, use the arc force tool.

An arc’s thickness is proportional to its relevance with regard to the target variable. The SE1 variable is the most important for global satisfaction.
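The tutorial does not state how the relevance of an arc is computed. As a purely illustrative assumption, one common way to quantify how strongly two discrete variables are related is their mutual information, sketched below on made-up data.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    total = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        total += p_xy * log2(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return total

# Hypothetical discretized answers.
satisfaction = ["high", "high", "low", "low"]
copy_of_it = ["high", "high", "low", "low"]    # fully informative
unrelated = ["a", "b", "a", "b"]               # carries no information
print(mutual_information(satisfaction, copy_of_it))
print(mutual_information(satisfaction, unrelated))
```

A variable that fully determines the target scores highest, while an independent variable scores zero, which is the kind of ranking the arc thickness conveys visually.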

Unconnected nodes become transparent.

BayesiaLab can generate reports.

The SE1 node is in first position: it is the most important variable of this analysis.

The probabilistic profile of the polls with a global satisfaction mark >7 is also reported.

After closing the report, note that all the correlations between variables can be monitored by right-clicking in the right side of the screen.

The monitors display the probability distribution of each variable and allow its value to be changed.

The target variable has a red background.

As the most important variable, SE1 appears in first position.

Monitors can be used to find the probabilistic profile of the polls with a high satisfaction mark.

When clicking on this modality, the probabilities are propagated throughout the network. The probabilistic profile becomes readable.
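The simplest case of such a propagation is Bayes' rule on a two-node network. The sketch below assumes a single arc from SE1 to Satisfaction and uses made-up probabilities; it only shows how observing the target changes the distribution of another variable, not how BayesiaLab performs inference on the full network.

```python
def posterior(prior, likelihood, observed):
    """P(cause | effect = observed) by Bayes' rule for a network cause -> effect."""
    joint = {c: prior[c] * likelihood[c][observed] for c in prior}
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()}

# Hypothetical numbers, chosen only for illustration.
prior_se1 = {"<=4": 0.2, "<=7": 0.5, ">7": 0.3}
p_satisfaction_given_se1 = {
    "<=4": {"<=4": 0.60, "<=7": 0.30, ">7": 0.10},
    "<=7": {"<=4": 0.20, "<=7": 0.50, ">7": 0.30},
    ">7": {"<=4": 0.05, "<=7": 0.25, ">7": 0.70},
}

# Observe Satisfaction = ">7" and read the updated SE1 distribution.
profile = posterior(prior_se1, p_satisfaction_given_se1, ">7")
for modality, probability in profile.items():
    print(modality, round(probability, 3))
```

With these numbers the probability of a high SE1 mark rises above its prior once high satisfaction is observed, which is the kind of shift the monitors display.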

The same technique can be applied to other modalities and variables. The results are automatically propagated to the remaining variables.

The effect of a poor SE1 mark is reported on all monitors.

After the characterization of the target variable, the second part of this tutorial explores the relationships between all the variables of the poll.

In modelization mode, delete all arcs.

The SopLEQ algorithm is appropriate for discovering associations between variables.

After some computation time, SopLEQ learning finds a complex network.

By using the positioning and zoom tools, the graph becomes more reader-friendly.

In this case, where the graph is large but has average connectivity, symmetric positioning is adequate.

To increase network readability, a comments dictionary can be linked with the graph. In this file, the name of each node is supplemented with comments.

Once this is done, hints indicate which nodes have comments.

Clicking this button displays or hides the comments for the selected nodes.

A modality dictionary can also be designed interactively. This is done by double-clicking on a node and opening the “modality name” sheet.

Give a name to each modality.

Once the modality labels are validated, the dictionary can be exported as a text file.

The file is defined only for the SK5 node.

#Wed Oct 11 14:28:27 CEST 2006

SK5.<\=7=Average

SK5.<\=4=Poor

SK5.>7=Very good

With a simple modification (removing the node name prefix), it becomes valid for all nodes of the graph.

#Wed Oct 11 14:28:27 CEST 2006

<\=7=Average

<\=4=Poor

>7=Very good

The dictionary can now be associated with all nodes of the graph.
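The modification of the exported dictionary, which removes the `SK5.` prefix from every entry, can also be scripted when the file is long. The following Python sketch works on the lines shown in the tutorial; the function name `strip_node_prefix` is hypothetical.

```python
def strip_node_prefix(lines, node):
    """Remove the leading 'node.' from each entry so it applies to every node."""
    prefix = node + "."
    result = []
    for line in lines:
        if line.startswith(prefix):
            result.append(line[len(prefix):])
        else:
            result.append(line)  # comment lines and other entries are kept as-is
    return result

# Lines of the dictionary exported for the SK5 node.
exported = [
    "#Wed Oct 11 14:28:27 CEST 2006",
    r"SK5.<\=7=Average",
    r"SK5.<\=4=Poor",
    "SK5.>7=Very good",
]
generic = strip_node_prefix(exported, "SK5")
print("\n".join(generic))
```

The backslash before the first `=` is left untouched: it escapes the equals sign that belongs to the modality name, so that only the last `=` separates the key from the label.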

The monitors in validation mode become easier to read.

The same process can be applied to assign values to modalities and to generate a modality values dictionary.

This is done in modelization mode, by double-clicking on a node and opening the “values” sheet.

A poor modality scores 0 points, an average one 10 points and a very good one 20 points.

The same process (exporting the dictionary, modifying the text file and importing it back) can be applied to assign values to the modalities of all nodes.

The total and average values of the graph modalities are calculated.

The values are also computed according to the probability distribution.
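A value that depends on the probability distribution can be read as an expected value: each modality value is weighted by its probability. The sketch below uses the 0, 10 and 20 points given above with a made-up distribution. Reading the displayed value as this weighted sum is an assumption about what the tool shows.

```python
def expected_value(distribution, values):
    """Weight each modality value by its probability and sum the results."""
    return sum(p * values[m] for m, p in distribution.items())

# Points taken from the tutorial; the distribution is hypothetical.
values = {"Poor": 0, "Average": 10, "Very good": 20}
distribution = {"Poor": 0.25, "Average": 0.5, "Very good": 0.25}

score = expected_value(distribution, values)
print(score)
```

When evidence is set on a monitor, the distribution changes and the same weighted sum yields an updated value.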

Every question is related to a theme. For instance, this poll has 36 themes. The class concept in BayesiaLab is useful for associating themes with nodes.

The themes dictionary is contained in a text file.

Clicking on the icon that has just appeared at the bottom right of the window opens the class editor.

It becomes possible to apply modifications to classes instead of applying them to individual nodes.

Opens the class editor

Readability can be increased by applying automatic class colours. This is done by selecting all the classes with Ctrl + A and clicking the “color” button.

Note that the nodes are broadly grouped by colour. This provides useful information about inter-theme and intra-theme links. In this case, it also indicates a well-designed poll.

When closing the “Edit classes” window, the nodes become coloured depending on their class.

The comments are also coloured depending on the class.

A “colours dictionary” can also be saved as a text file.

In this example, the themes have been created based on expert knowledge. Nevertheless, BayesiaLab provides tools for automatic theme design by grouping semantically close variables.

In validation mode, variable clustering is based on the association rules discovered in the network.

Once the clustering is applied, new colours are assigned to the nodes.

BayesiaLab identified 48 groups of nodes.

Moving this cursor sets the number of groups.

The node colours change accordingly.

There are two other new icons in the clustering toolbar.

Exiting the clustering mode.

This is for validating the current clustering.

BayesiaLab is able to build latent variables from the clustering that has just been performed.

When validating, a confirmation is requested.

In modelization mode, multiple clustering clusters the individuals separately for each group of variables.

This wizard configures the multiple clusterings to be performed (one per identified cluster).

Data is saved in this directory

Specifying the number of classes for each new latent variable.

As with data clustering, an HTML report is created for each clustering. These reports are useful for renaming the new variables and their modalities.

Once the clusterings are performed, a new network is created with one node per latent variable (keeping the initial colour).

An internal database is created. It contains the most probable cluster value for each line of the initial file. This database can be saved in a separate file with the “data” menu.
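Keeping the most probable cluster value for each line amounts to taking the cluster with the highest probability in that line's distribution. The minimal sketch below assumes each line comes with a dictionary of cluster probabilities; the cluster names and numbers are made up.

```python
def most_probable_cluster(probabilities):
    """Return the cluster name with the highest probability."""
    return max(probabilities, key=probabilities.get)

# Hypothetical cluster probabilities for three lines of the initial file.
lines = [
    {"C1": 0.7, "C2": 0.2, "C3": 0.1},
    {"C1": 0.1, "C2": 0.3, "C3": 0.6},
    {"C1": 0.2, "C2": 0.5, "C3": 0.3},
]
assigned = [most_probable_cluster(p) for p in lines]
print(assigned)
```

This hard assignment discards the uncertainty of each line, which matters when two clusters are nearly equally probable.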

Probabilistic relationships between the nodes of this new network can be discovered with the SopLEQ algorithm.

After computation and automatic node positioning, the resulting network presents 51 nodes representing the latent variables of the initial dataset.
