statistica


Upload: brenden-kramer

Post on 15-Sep-2015

  • Student’s Laboratory, Department of Biophysics

    Statistica instruction

    1. Data import

    2. Basic data processing

    3. Data recalculation

    4. Descriptive statistics

    5. Histogram

    6. Plot creating

    7. Plot formatting

    8. Plot points deleting

    9. Plot data reading

    10. Multiple plots of data contained in different worksheets

    11. Fitting data with standard models

    12. Fitting data with non-standard models


    1. Data import

    Fig. 1. Statistica program main window. Two example worksheets are presented.

    All data managed in Statistica are presented in the form of data worksheets (Fig. 1). Data can be entered manually, pasted through the clipboard or imported from a disk file.

    In the labs, data are most often imported from disk files created by the software used in the particular exercise, in the form of DAT or TXT text files. Use the File → Open option to import the data. A dialog window appears in which the location and name of the imported file are defined. The default file type filter in the open dialog does not show text files, so the All files (*.*) filter has to be chosen in the Pliki typu (File types) box. Once the file location and name have been defined, use the Otwórz (Open) button, which opens the next dialog window, titled Importing file. In all cases relevant to the labs, both the Import as Spreadsheet and Delimited options should be checked.

    The next window, called Import Delimited Text Files (Fig. 2), contains many options, but only a few are used in the labs. If any other option is needed, detailed guidance will be given in the exercise instruction.


    Only one problem needs to be pointed out here. Most problems with data import occur when the decimal separator used in the data (comma or dot) differs from the default decimal separator defined in the operating system.

    The decimal separator used in the data is easy to identify, because a preview of the file is visible in the lower part of the dialog. Check the Decimal separator character option and enter the separator seen in the preview into the edit box next to the option. Then use the Refresh View button. If the data columns are well separated and the Format String consists only of R letters, without any T letters or numbers, the import will be performed properly. In the example shown in Fig. 2 the decimal separator was not defined properly. The string T10 T5 visible in the Format String field means that the program recognized the data as a two-column table whose columns are text variables of 10 and 5 characters.

    If the chosen options do not work well, use the Reset button and start again with another combination of options.
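
    The decimal separator problem is not specific to Statistica. The following Python sketch, which is only an illustration and not part of the Statistica workflow, parses tab-delimited text with an explicitly stated decimal separator. The function name read_delimited and the sample data are assumptions made for this example. When the wrong separator is declared the values cannot be read as numbers, which corresponds to the columns being recognized as text (T) instead of numbers (R).

```python
import csv
import io


def read_delimited(text, delimiter="\t", decimal=","):
    """Parse delimited text into rows of floats, honoring the decimal separator."""
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not record:
            continue  # skip blank lines
        # Replace the declared decimal separator with a dot before conversion.
        rows.append([float(cell.strip().replace(decimal, ".")) for cell in record])
    return rows


# Hypothetical two-column file content using a comma as the decimal separator.
sample = "0,80\t1,25\n0,85\t1,20\n"
data = read_delimited(sample, delimiter="\t", decimal=",")
print(data)

# Declaring the wrong separator leaves "0,80" unconverted, so float() fails.
try:
    read_delimited(sample, delimiter="\t", decimal=".")
except ValueError as error:
    print("import failed:", error)
```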

    Fig. 2. Data import dialog window.

    2. Basic data processing

    Operations on data in Statistica worksheets resemble data processing in MS Excel or Open Office Calc.

    To select a whole data column, click the gray button in the column header labeled with the column number and name. For instance, to select the third column from the left in the example shown in Fig. 1, click the button labeled 3 Var3. Var3 is the name of the variable represented by the column, and 3 is the column number. Column names and numbers are important because they can be used in formulas for data recalculation.

    How to create an empty worksheet?

    When a new worksheet is created (File → New option, Spreadsheet tab), the number of columns (Number of variables:) and the number of rows (Number of cases:) have to be defined. The other options are usually left unchanged. In the created worksheet the defined columns and rows are shown in white, while the rest of the worksheet is shown in gray. The gray part of the worksheet is not accessible until the number of columns/rows is redefined.

    How to change the data in the worksheet?

    Click the appropriate worksheet cell and enter the new value. The change is confirmed by pressing the ENTER key or by moving the focus to another cell.

    How to delete the cell contents?

    Click the cell and press the DELETE key.

    How to add new columns/rows at the end of the worksheet?

    Double-click the gray worksheet area. The Add cases and/or variables dialog appears, in which the number of new columns/rows is defined. The default values in the dialog edit boxes depend on the position that was double-clicked. For instance, to add two new columns and three new rows to the worksheet shown in Fig. 1, double-click the position two columns to the right of and three rows below the bottom-right active (white) cell.

    How to add a new column/row in any worksheet position?

    The Insert option allows new columns/rows to be added. Choose Add variables or Add cases as needed. When inserting columns, fill in the How many:, After: and Name: fields. When inserting rows, only two edit boxes are filled in: How many: and Insert after case:.

    How to delete a piece of worksheet?

    Whole rows or columns can be removed from the worksheet. To do so, select the area to be removed and use the Edit → Delete → Variables or Edit → Delete → Cases option.


    How to change variable (column) properties?

    Some column properties can be defined by double-clicking the column header. The most important feature of the dialog that appears (Fig. 3) is the ability to change the variable name. The column name is very important in Statistica, because it can be used in data recalculations and plotting.

    One of the most useful options of this dialog is the calculation of the mean value, standard deviation and number of valid cases in the chosen column. Use the Values/Stats button to do that.

    Fig. 3. The dialog window used for the variable (column) properties definition.

    3. Data recalculation

    There are a few different ways of performing data recalculations. Most often the result of a recalculation is stored in a new column and is computed from variables already stored in the worksheet.

    The simplest way is to use the dialog window for column properties definition (Fig. 3). It is opened by double-clicking the column header. The formula used for the recalculation should be entered in the Long name (label or formula with Functions): edit box. For instance, if column Var3 is to contain the sum of columns Var1 and Var2, enter the following formula in the edit box: =Var1+Var2. Note the Functions button above the edit box. It can be used to choose among the functions implemented in Statistica.

    Another possibility is the Data → Batch Transformation Formulas option. It opens the dialog window shown in Fig. 4. The edit box in this dialog allows the definition of very complicated formulas using numerous functions, operators and variables. Many variables can be recalculated at the same time with separate formulas.

    In the example shown in Fig. 4, the sum of variables Var1 and Var2 will be stored in variable Var3. At the same time the values of column Var4 will be calculated as Var1 multiplied by the square root of variable Var2.

    Fig. 4. Batch transformation Formulas dialog window.

    By default, calculations performed in Statistica on the basis of user-defined formulas are not refreshed automatically. This means that changes to data used earlier in calculations do not automatically affect the calculation results. To enable automatic updates, use Data → Recalculate Spreadsheet formulas and then check the Auto-recalculate when the data change option in the dialog window that appears. From then on all formulas will be recalculated automatically. This rule applies only to formulas defined in the dialog window for column properties definition (Fig. 3), not to batch transformations.

    Batch transformations can be updated only by using the Batch Transformation Formulas dialog.
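
    The two formulas from the example (Var3 as the sum of Var1 and Var2, Var4 as Var1 times the square root of Var2) can be written out as a short Python sketch. It is only an illustration of the arithmetic and of why results become stale without recalculation; the dictionary layout, the sample numbers and the function name recalculate are assumptions made for this example, not anything provided by Statistica.

```python
import math

# Hypothetical worksheet: each key is a column (variable) name.
sheet = {
    "Var1": [1.0, 2.0, 3.0],
    "Var2": [4.0, 9.0, 16.0],
}


def recalculate(sheet):
    """Recompute the derived columns row by row from Var1 and Var2."""
    pairs = list(zip(sheet["Var1"], sheet["Var2"]))
    sheet["Var3"] = [a + b for a, b in pairs]             # Var3 = Var1 + Var2
    sheet["Var4"] = [a * math.sqrt(b) for a, b in pairs]  # Var4 = Var1 * sqrt(Var2)


recalculate(sheet)

# Changing an input does not update the derived columns by itself ...
sheet["Var1"][0] = 10.0
stale_var3 = list(sheet["Var3"])

# ... they change only after the formulas are applied again.
recalculate(sheet)
fresh_var3 = sheet["Var3"]
print(stale_var3, fresh_var3)
```

    The stale values correspond to a worksheet in which automatic recalculation is switched off, and the second call corresponds to recalculating the formulas by hand.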


    4. Descriptive statistics

    There are three methods of calculating statistical parameters for variables. The three methods are not equivalent: they provide different amounts of information.

    Method 1: Double-click the column header of the desired variable and choose the Values/Stats button in the dialog window that appears (Fig. 3). The mean, the standard deviation and the number of valid cases are displayed (Fig. 5). The statistics can be copied to the clipboard with the button provided in that window.

    Method 2: Select the part of the worksheet for which the statistics are to be calculated. Right-click once to display the context menu (Fig. 6) and choose Statistics of Block Data → Block Columns. Then pick the statistics to be produced, e.g. All. The calculated results are placed in a new worksheet.

    Fig. 5. The dialog window displaying statistics of variable.

    Fig. 6. Descriptive statistics calculations for a chosen part of the worksheet.


    Method 3: Use the Statistics → Basic Statistics/Tables option. Then choose the Descriptive statistics option in the window that appears and confirm the choice with the OK button. The next dialog contains many options; use the Summary: Statistics or Summary buttons to calculate the statistics. Calculations can be done for selected variables or for all variables. Select the desired variables before using this option, or use the Variables button in the last dialog window described. The results are displayed as a new worksheet.
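
    The three quantities reported by Method 1 can be reproduced with a few lines of Python, which may help when checking results by hand. This is a sketch under two assumptions that are not stated in this instruction: empty cells are represented by None, and the standard deviation is the sample standard deviation (divisor n - 1). The function name describe is hypothetical.

```python
import statistics


def describe(column):
    """Return valid N, mean and sample standard deviation of a column."""
    valid = [value for value in column if value is not None]  # drop empty cells
    return {
        "valid_n": len(valid),
        "mean": statistics.mean(valid),
        "std_dev": statistics.stdev(valid),  # sample standard deviation (n - 1)
    }


# Hypothetical column with one empty cell.
var1 = [2.0, 4.0, 4.0, None, 6.0]
stats = describe(var1)
print(stats)
```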

    5. Histogram

    Use the Graphs → Histograms option to create a histogram of data stored in the worksheet. The option opens the dialog window shown in Fig. 7. The Variables button displays a form for choosing the variables to be analyzed (Fig. 8).

    Fig. 7. 2D Histograms dialog window.

    Fig. 8. Select Variables for Histogram window allowing the choice of columns for

    histogram analysis.


    The 2D Histograms window contains an Advanced tab (Fig. 9) which offers some useful features. The created histogram can be fitted with one of many statistical distributions. The choice is made in the Fit type list.

    Some statistics are calculated as the analysis is performed. Four check boxes in the Statistics panel determine which statistics are calculated.

    Another useful feature is the Boundaries option. If the radio button titled Boundaries is checked, the Specify Boundaries button becomes active. This button allows the data range taken into account in the histogram to be redefined (Fig. 10). The width of the bins can also be set in the Interval Step edit box. All elements of the histogram plot, as with any other plot type, can be edited by double-clicking the element that needs editing. An example of a histogram with descriptive statistics and a Gaussian fit is presented in Fig. 11.
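
    The effect of the boundaries and the interval step can be illustrated with a small Python sketch that counts values in bins of equal width. It is not the procedure Statistica uses internally. The function name histogram, the sample values and the convention that each bin includes its lower edge and excludes its upper edge are assumptions made for this example.

```python
def histogram(values, minimum, maximum, step):
    """Count values in bins of width `step` covering [minimum, maximum)."""
    n_bins = round((maximum - minimum) / step)
    counts = [0] * n_bins
    for value in values:
        if minimum <= value < maximum:  # values outside the range are ignored
            index = min(int((value - minimum) / step), n_bins - 1)
            counts[index] += 1
    return counts


# Hypothetical data: the value 5.0 lies outside the chosen range 0..4.
values = [0.5, 1.5, 1.7, 2.2, 3.9, 5.0]
counts = histogram(values, minimum=0.0, maximum=4.0, step=1.0)
print(counts)
```

    Narrowing the range or changing the step changes the counts, which is why the same data can produce visibly different histograms.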

    Fig. 9. Advanced tab in 2D Histograms dialog window. The callouts in the figure mark: the type of fitted statistical distribution; the statistics calculated in the histogram analysis procedure; the button allowing redefinition of the analysis range and bin width.


    Fig. 10. The dialog box used for the histogram range redefinition.

    Fig. 11. An example of a histogram created in Statistica. The histogram is fitted with a normal (Gaussian) distribution. The normal distribution parameters are displayed in the plot header. Descriptive statistics are displayed in the bottom part of the plot because the Descriptive Statistics check box was checked in the Advanced tab of the 2D Histograms dialog (Fig. 9).

    6. Plot creating

    A plot is a set of points marked in a coordinate system and usually illustrates a dependence between two quantities, e.g. pressure as a function of time. Every point is characterized by two coordinates: Y, called the dependent variable, and X, called the independent variable. In this nomenclature Y is a function of X.

    The first thing to do before creating a plot is to choose the independent and dependent variables. Variables are represented in Statistica by data columns. The program allows many dependences to be shown on a single plot. To create a plot, one independent (X) variable (column) has to be chosen, while many dependent variables (Y) can be used.


    Most plots in the labs, apart from the histograms described in the previous section, can be created with the Graphs → Scatterplots option (Fig. 12), which displays the 2D Scatterplots dialog window (Fig. 13).

    Fig. 12. The main program menu used for plot creation.

    Fig. 13. 2D Scatterplots dialog window.

    The variables used for the plot can be chosen in the dialog box (Fig. 14) opened with the Variables button located on the Quick tab of the 2D Scatterplots dialog window. The Select Variables for Scatterplot dialog box that appears contains two lists of the variables defined in the active worksheet. The example shown in Fig. 14 contains four variables: Var1, Var2, Var3 and Var4. Var2 and Var3 were chosen as the dependent variables, while Var1 was defined as the independent one.

    Choosing many dependent variables at the same time makes it possible to show multiple dependences on a single plot, but the Graph type option in the 2D Scatterplots window has to be set to Multiple. If the Regular option is set, only one dependence will be plotted. An example of a scatter plot created in Statistica is shown in Fig. 15.


    Fig. 14. The dialog box allowing the choice of variables used for plot preparation.

    Fig. 15. An example of a regular scatter plot created in Statistica program. Plot title: Scatterplot of Var2 against Var1 (RR-Proba 2v*218c); Var1 is on the horizontal axis and Var2 on the vertical axis, both running from 0,80 to 1,25.

    7. Plot formatting

    To modify any plot element, double-click it. The dialog that appears depends on the element to be altered. For example, double-clicking the Var1 axis title in the example shown in Fig. 15 opens a dialog window for editing the axis title (Fig. 16a).

    Whenever a graph element is chosen by double-clicking, the Graph Options dialog window is displayed, but it opens at a different position of the options tree depending on the element. The options tree is shown on the left side of the window (Fig. 16a,b). The Graph Options window can also always be opened from the right mouse button context menu.


    a) b)

    Fig. 16. Graph Options dialog window.

    One of the most frequently used plot formatting options is axis scaling. The default procedure usually does not work well and plots need manual rescaling. Choose the Scaling option in the options tree visible on the left side of the Graph Options window (Fig. 16b) and then change the Mode option from Auto to Manual. Afterwards redefine the axis range in the Minimum and Maximum edit boxes.

    The options shown in Fig. 16b also allow different scale types to be chosen (Scale type panel). Of the five available types, Linear and Logarithmic are used most often.

    All choices made in the dialog shown in Fig. 16b apply to the axis selected in the Axis box in the upper part of the dialog.

    8. Plot points deleting

    Sometimes mistakes and errors produce wrong data points. If necessary, they can be excluded from the analysis using the brushing function. To exclude data points, right-click the plot and pick the Show Brushing option (Fig. 17a). The cursor changes to a magnifying glass and a tool box titled 2D Brushing appears. From then on the unwanted data points can be marked and eliminated. First mark the data with the left mouse button. Marked data are displayed on the plot in black. To mark more than one point, mark subsequent points while holding the CTRL key. The marked points can be removed by right-clicking them and choosing the Brushing Off option in the context menu that appears (Fig. 17b).

    To switch off the brushing mode, press the ESC key on the keyboard.

    a)

    b)

    Fig. 17. The use of brushing function.

    9. Plot data reading

    Data values are very easy to read from the plot. To display the coordinates and the case (row) number of a particular data point, simply point at it with the mouse cursor without clicking. The description of the point is displayed in a small box next to the point, as shown in Fig. 18.


    Fig. 18. Data point values reading.

    10. Multiple plots of data contained in different worksheets

    Creating multiple plots is easy when the data are contained in one worksheet. Use the same method as presented in Chapter 6. Set the Graph type option to Multiple in the 2D Scatterplots dialog (Fig. 13) and then use the Variables button to define the dependent and independent variables.

    When the data to be presented on a single plot belong to different worksheets, the worksheets have to be merged before plotting.

    It is useful first to give specific names to the variables (columns) in both worksheets to be merged. This helps to identify the variables properly and to avoid mistakes. Double-click the column header to open the dialog for changing the column name (Fig. 3).

    Use the Data → Merge option (Fig. 19) to merge two separate worksheets. A Merge Options dialog appears (Fig. 20).

    Fig. 19. Data → Merge option.


    Fig. 20. Merge Options dialog window.

    The most important step is to choose the worksheets to be merged using the File 1 and File 2 buttons. The names of the chosen worksheets are shown next to these buttons, and the OK button creates a new merged worksheet. Then use the Graphs → Scatterplots option and follow the standard procedure for creating plots described in Chapter 6.
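
    The idea of merging can be illustrated with a short Python sketch. It assumes, which this instruction does not state explicitly, that the merge places the variables (columns) of the second worksheet next to those of the first and that a shorter worksheet is padded with empty cells. The function name merge_variables and the column names are hypothetical. The sketch also shows why distinct variable names matter: identical names in both worksheets would be ambiguous.

```python
def merge_variables(sheet1, sheet2):
    """Place the columns of two worksheets side by side in a new worksheet."""
    merged = {name: list(column) for name, column in sheet1.items()}
    for name, column in sheet2.items():
        if name in merged:
            raise ValueError(f"duplicate variable name: {name}")
        merged[name] = list(column)
    # Pad shorter columns with None (empty cells) so all have the same length.
    n_cases = max(len(column) for column in merged.values())
    return {
        name: column + [None] * (n_cases - len(column))
        for name, column in merged.items()
    }


# Hypothetical worksheets with specific, distinct variable names.
series_a = {"Time_A": [1, 2, 3], "Pressure_A": [10, 20, 30]}
series_b = {"Time_B": [1, 2], "Pressure_B": [5, 6]}
merged = merge_variables(series_a, series_b)
print(merged)
```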

    11. Fitting data with standard models

    Statistica can fit mathematical models to data. The most popular models (linear, exponential, logarithmic etc.) can be fitted to the data when the data are plotted. This method is described in this section. It has to be pointed out that this approach cannot calculate the errors of the fitted model parameters. Only the model parameters and some simple statistics are calculated. If parameter errors are needed, refer to the next section of the instruction.

    To create the plot, either of the following two options can be used: Graphs → Scatterplots or Graphs → 2D Graphs → Scatterplots. The first tab of the window that appears, called Quick, was described earlier. The second tab, called Advanced, makes it possible to fit mathematical models to experimental data (Fig. 21).


    Fig. 21. 2D Scatterplots window, Advanced tab.

    After choosing the dependent and independent variables for plotting and fitting with the Variables button, use the Fit panel to choose the fitting model. Of the 8 models implemented here only three will be used in the labs: Linear ($Y = A + BX$), Logarithmic ($Y = A + B\log(X)$) and Exponential ($Y = Ae^{BX}$). Finally, decide in the Statistics panel which statistical data are to be included in the plot.

    An example of a plot of data fitted with a linear function is shown in Fig. 22. Variable Var2 was plotted as a function of variable Var1. The plot was fitted with a linear function whose equation is Var2 = 26.4067 - 2.2257x (see the plot header). Statistical data, including the function equation, are also shown in the lower left corner.
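
    For readers who want to check fitted parameters by hand, the following Python sketch computes an ordinary least-squares line $Y = A + BX$ in closed form. The exponential fit shown here uses a straight-line fit to $\ln Y$, which is a common approach but an assumption on our part; this instruction does not say how Statistica fits the exponential model, so small differences from its results are possible. The function names and sample data are hypothetical.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least-squares fit of Y = A + B*X; returns (A, B)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_exponential(xs, ys):
    """Fit Y = A*exp(B*X) via a straight-line fit to ln(Y); requires Y > 0."""
    ln_a, b = fit_linear(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


# Hypothetical data lying exactly on Y = 1 + 2X.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_linear(xs, ys)
print(a, b)

# Hypothetical data lying exactly on Y = 2*exp(0.5*X).
exp_a, exp_b = fit_exponential(xs, [2.0 * math.exp(0.5 * x) for x in xs])
print(exp_a, exp_b)
```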

    In many cases a mathematical model needs to be fitted only to part of the data. In such a case prepare the plot with fitting in the normal way, i.e. fit the whole data range, and then limit the fitting range in the Graph Options window. The window opens after double-clicking the fitted curve. Make sure that the Fitting branch is active in the options tree visible on the left side of the options window. If it is not, click it. The Graph Options window should look like the one shown in Fig. 23. The default value in the Range list box is Full range. Change it to Axis range to limit the fitting range to the data range visible on the plot. If Custom range is set, the beginning and the end of the data range to be fitted have to be defined manually in the Min and Max edit boxes.


    Fig. 22. Example of a plot with linear model fitted to the experimental data.

    Fig. 23. Graph Options window opened in Fitting section. The range list box allows

    for the redefinition of the data range taken into consideration when the mathematical

    model is fitted.


    12. Fitting data with non-standard models

    The procedure described here can be used in two cases: (1) when data need to be fitted with a non-standard model, i.e. a model not implemented in the 2D Scatterplots window, or (2) when a standard model is used but the model parameters have to be calculated together with their errors.

    Use the Statistics → Advanced Linear/Nonlinear Models → Nonlinear Estimation option (Fig. 24), which opens the Nonlinear Estimation dialog (Fig. 25). Pick User-specified regression, least squares and confirm the choice with the OK button. Click the Function to be estimated button in the next dialog window, which allows any mathematical model to be defined for data fitting.

    Fig. 24. Nonlinear Estimation option allows fitting of any mathematical model to the

    experimental data.

    Fig. 25. A dialog window used for the estimation method choice.


    The dialog window shown in Fig. 26 allows the mathematical model used for data fitting to be defined. Some examples of the syntax are shown at the bottom of the window.

    Variable names, operators and Statistica functions can be used in the model function definition. All characters and character strings not recognized as belonging to one of these categories are treated as model parameters.

    The following string was defined in the example shown in Fig. 26: PrzY=A+B*Exp(-C*X). It means that the function $y = A + Be^{-Cx}$ will be used. The PrzY variable (column 4 in the worksheet) plays the role of the dependent variable (y), while X (column 1) is the independent variable (x). A, B and C are the model parameters, which will be calculated in the estimation (fitting) procedure.

    Fig. 26. Dialog window used for definition of the mathematical model.

    After the model function is set, use the OK button to return to the User-Specified Regression, Least Squares dialog. The next stage of the procedure starts after using the OK button. The dialog window shown in Fig. 27 allows the fitting procedure to be started (OK button). The fitting procedure is based on minimization. The computer looks for the best parameter values, starting from some initial values. The default initial values of all parameters are always the same and equal 0.1. This default choice is usually not appropriate and sometimes causes errors when trying to fit the model (e.g. Predictors are probably very redundant; estimates suspect). In other cases the model may be fitted without errors, but the fitted curve will not match the data well. In both cases the solution is to choose new initial parameter values.


    Use the Advanced tab to do that (Fig. 27). The button titled Start values: allows the initial values of the parameters to be defined (Fig. 28). Choosing the initial values is not easy and takes some experience. The simplest approach is to set all parameters to 0. If this does not work, the physical interpretation of the experimental data and of the mathematical model has to be considered in order to guess the initial values. Another approach is to fit a simpler, standard model to the experimental data and use its fitted parameters as the initial values for the non-standard model.
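
    The role of starting values can be explored outside Statistica with a small Python sketch for the model $y = A + Be^{-Cx}$ used above. It is not the algorithm Statistica uses. The sketch relies on the observation that for any fixed trial value of C the best A and B follow from an ordinary straight-line fit of y against $e^{-Cx}$, so only C has to be searched for. A coarse scan over trial values gives a starting estimate, and a golden-section search refines it. All function names, the list of trial values and the noise-free sample data are assumptions made for this illustration.

```python
import math


def linear_fit(us, ys):
    """Closed-form least-squares line ys = a + b*us; returns (a, b)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    b = suy / suu
    return mean_y - b * mean_u, b


def sse_for_c(c, xs, ys):
    """Best A and B for a fixed C, plus the sum of squared residuals."""
    us = [math.exp(-c * x) for x in xs]
    a, b = linear_fit(us, ys)
    sse = sum((y - (a + b * u)) ** 2 for u, y in zip(us, ys))
    return sse, a, b


def fit_decay(xs, ys, c_candidates, iterations=60):
    """Fit y = A + B*exp(-C*x); c_candidates must be positive and ascending."""
    # Step 1: coarse scan over trial C values to obtain a starting estimate.
    best = min(range(len(c_candidates)),
               key=lambda i: sse_for_c(c_candidates[i], xs, ys)[0])
    lo = c_candidates[max(best - 1, 0)]
    hi = c_candidates[min(best + 1, len(c_candidates) - 1)]
    # Step 2: golden-section search refines C inside the bracket [lo, hi].
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(iterations):
        m1 = hi - ratio * (hi - lo)
        m2 = lo + ratio * (hi - lo)
        if sse_for_c(m1, xs, ys)[0] < sse_for_c(m2, xs, ys)[0]:
            hi = m2
        else:
            lo = m1
    c = (lo + hi) / 2
    _, a, b = sse_for_c(c, xs, ys)
    return a, b, c


# Hypothetical noise-free data generated from A = 2, B = 3, C = 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2.0 + 3.0 * math.exp(-0.5 * x) for x in xs]
c_candidates = [0.05 * k for k in range(1, 41)]  # trial values 0.05 .. 2.0
a, b, c = fit_decay(xs, ys, c_candidates)
print(a, b, c)
```

    Values obtained this way are only rough estimates, but they can be typed into the Start values: dialog as initial parameters.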

    After choosing the starting parameter values (if needed), use the OK button in the window shown in Fig. 27. If the procedure succeeds, the results window appears (Fig. 29). Two buttons deserve attention here: Summary and Fitted 2D function & observed vals.

    The Summary button displays a worksheet with the results of the fitting procedure (Fig. 30). It contains some statistical data, the most important of which are the parameter values (first column) and their errors (second column).

    The Fitted 2D function & observed vals button creates a plot with the fitted curve. The equation of the estimated function is also visible in the plot header.

    Fig. 27. The window used for data fitting with non-standard models. The Advanced tab with the Start values: button allows the initial parameter values to be defined.


    Fig. 28. The dialog window used for initial values definition.

    Fig. 29. The dialog that appears if the fitting procedure succeeds.

    Fig. 30. Fitting procedure Summary.