ROBUST CLINICAL PREDICTION. International Symposium of Urology, FUT-UROLOGY 2008. Luigi Salmaso, Associate Professor of Statistics, University of Padova. Research Group for the Bladder Cancer multicentric study: PF. Bassi, C. Brombin, L. Corain, M. Racioppi, L. Salmaso

Post on 26-Dec-2015

TRANSCRIPT

  • Slide 1
  • ROBUST CLINICAL PREDICTION
      • International Symposium of Urology, FUT-UROLOGY 2008
      • Luigi Salmaso, Associate Professor of Statistics, University of Padova
      • Research Group for the Bladder Cancer multicentric study: PF. Bassi, C. Brombin, L. Corain, M. Racioppi, L. Salmaso
  • Slide 2
  • Topics
      • Some considerations on DATA COLLECTION and STATISTICAL METHODS most frequently used in UROLOGY
      • Case study: INVASIVE BLADDER CANCER
      • Application and results of several statistical methods to the case study
      • Robust clinical prediction using the NonParametric Combination of Dependent Permutation Tests (NPC Test)
      • Conclusions and practical suggestions
  • Slide 3
  • Necessary steps for optimal statistical predictions
      • Study design
      • Collecting data using a Web-based database
      • Study protocol
      • . . .
      • Robust statistical analysis by suitable statistical methods (e.g. nonparametric permutation methods)
      • Individual predictions based, e.g., on nomograms or other techniques
  • Slide 4
  • Some considerations on DATA COLLECTION and STATISTICAL METHODS most frequently used in UROLOGY
      • An electronic database can improve the quality and completeness of the collected data, in particular by reducing the number of missing values and the risk of imputation errors.
      • Accurately defining the nature of the study (observational, randomized, ...) and its endpoints leads to a better choice of the sample size and of the subsequent statistical analysis.
  • Slide 5
  • ELECTRONIC DATABASE: an example [screenshots: Web-based database; variables coding]
  • Slide 6
  • STATISTICAL ANALYSIS: standard methods and recent advances
      • Survival analysis
      • Univariate tests (Student t test, Wilcoxon)
      • Multivariate methods (logistic regression, ...)
      • Complex classification methods (neural networks, artificial intelligence, ...)
      • NonParametric Combination of Dependent Permutation Tests (NPC Test)
  • Slide 7
  • Case study: INVASIVE BLADDER CANCER
      • Italian multicentric observational study (from Jan 2001 to Dec 2006). Reference: prof. PF. Bassi (Univ. Cattolica, Rome)
      • Aim of the study: detecting the variables (factors) that best predict the outcome (DEAD or ALIVE) after a BLADDER CANCER DIAGNOSIS
      • Total sample size: 1,003 subjects
          • 469 subjects: DOD (Dead of Disease) and AWD (Alive with Disease, i.e. treated statistically as dead) patients
          • 534 subjects: NED (No Evidence of Disease) patients
      • Lost patients and DOC (Dead of Other Causes) patients were excluded
  • Slide 8
  • Case study: INVASIVE BLADDER CANCER
      • The TNM Classification of bladder cancer was used, according to Wittekind & Sobin (2002); the original variables were therefore transformed into ordinal variables.
      • 30 endpoints were considered relevant for the statistical analysis.
      • Endpoints were collected at three phases of the study:
          • I Phase (First symptom / Diagnosis): patient state of health at the first medical visit
          • II Phase (Diagnosis): patient condition after bladder cancer diagnosis
          • III Phase (Surgery): patient state after surgery (histopathological variables were examined)
      • In particular, the interest is in evaluating the importance of the endpoints, collected at the three phases of the study, in predicting the outcome.
  • Slide 9
  • Results of Kaplan-Meier (survival analysis) (artificial example)
  • Slide 10
  • Results of univariate tests
  • Slide 11
  • Results of Logistic Regression
      • The logistic regression model was applied to the same dataset, but very poor results were obtained (only two significant predictors: TNM stage at Phases I and II).
      • The main problems in its application:
          • the inability of logistic regression to handle missing values (missing data are present in 522 of the 1,003 subjects);
          • the high number of coefficients to be estimated, so that the iterative estimation algorithm does not converge (after 1000 iterations).
      • Note that when convergence is not achieved for the parameter estimates, results may be unreliable.
  • Slide 12
  • Results of Logistic Regression
  • Slide 13
  • Results of Logistic Regression: number and % of missing values by variable
      • Mean (missing values): 85.9
      • % mean (missing values): 9%
      • Subjects with at least one missing value: 522 (52%)
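To make the missing-data figures concrete, here is a minimal sketch of how such a summary could be computed, and of what a complete-case analysis would keep. The toy records and the variable names `stage_tnm` and `grade` are hypothetical and are not the study data; only the counts 522 and 1,003 come from the slides.

```python
def missing_summary(records):
    """Return (n_incomplete, share_incomplete, per-variable missing share)."""
    n = len(records)
    variables = sorted({key for rec in records for key in rec})
    # A subject is incomplete if at least one variable is missing (None).
    incomplete = sum(any(rec.get(v) is None for v in variables) for rec in records)
    per_var = {v: sum(rec.get(v) is None for rec in records) / n for v in variables}
    return incomplete, incomplete / n, per_var


# Hypothetical toy records (NOT the study data); None marks a missing value.
toy = [
    {"stage_tnm": 2, "grade": 3},
    {"stage_tnm": None, "grade": 2},
    {"stage_tnm": 3, "grade": None},
    {"stage_tnm": 1, "grade": 1},
]
n_incomplete, share, per_var = missing_summary(toy)

# Figures reported on the slides: 522 of 1,003 subjects have a missing value.
reported_share = 522 / 1003   # about 0.52
complete_cases = 1003 - 522   # subjects left if incomplete rows are dropped
```

Under these figures, a method that requires complete records would be fitted on fewer than half of the subjects.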
  • Slide 14
  • Robust statistical prediction using NPC Test: PERMUTATION APPROACH FOR HYPOTHESIS TESTING
      • The multivariate permutation approach for hypothesis testing by NonParametric Combination (NPC) offers the following advantages:
          • No need to specify the dependence structure among variables
          • Exact solutions
          • Powerful tests
          • Treatment of missing values (missing completely at random, MCAR, or not missing at random, not-MAR)
      • It also deals with stratification and multivariate categorical variables.
      • It handles mixed variables and multivariate restricted alternatives.
      • NPC Test implements methods and algorithms presented in several international papers by prof. L. Salmaso and prof. F. Pesarin. L. Salmaso leads an internationally recognised research group in theoretical and applied nonparametric statistics.
      • NPC TEST is a unique statistical method (and software) that provides researchers with powerful and innovative solutions in the field of hypothesis testing.
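The idea of combining dependent permutation tests can be sketched in a few lines. This is an illustrative two-sample version under stated assumptions, not the NPC Test software: the partial statistic (absolute difference of means), the Fisher combining function, the function name `npc_two_sample` and the simulated data are all choices made here for illustration, and missing values are not handled. Whole subjects are permuted, so the dependence among variables never has to be modelled.

```python
import math
import random


def npc_two_sample(x, y, n_perm=500, seed=0):
    """x, y: lists of rows (one row per subject, one column per variable)."""
    rng = random.Random(seed)
    pooled = x + y
    n1, k = len(x), len(x[0])

    def stats(rows):
        # Partial statistics: |difference of group means| for each variable.
        g1, g2 = rows[:n1], rows[n1:]
        return [
            abs(sum(r[j] for r in g1) / len(g1) - sum(r[j] for r in g2) / len(g2))
            for j in range(k)
        ]

    # Row 0 holds the observed statistics; the others come from permutations.
    # Whole rows are shuffled, which preserves dependence among variables.
    table = [stats(pooled)]
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        table.append(stats(shuffled))
    total = len(table)

    def pvalues(row):
        # Permutation p-value of each partial statistic in `row`.
        return [sum(t[j] >= row[j] for t in table) / total for j in range(k)]

    # Fisher combining function on observed and permuted partial p-values.
    combined = [-2 * sum(math.log(p) for p in pvalues(row)) for row in table]
    global_p = sum(c >= combined[0] for c in combined) / total
    return pvalues(table[0]), global_p


# Simulated toy data (assumption): the second group is shifted on both variables.
_rng = random.Random(1)
x = [[_rng.gauss(0, 1), _rng.gauss(0, 1)] for _ in range(20)]
y = [[_rng.gauss(2, 1), _rng.gauss(2, 1)] for _ in range(20)]
partial_p, global_p = npc_two_sample(x, y)
```

Because each row of the permutation table is compared against the same table, every partial p-value is at least 1/(n_perm + 1), so the logarithm in the Fisher function is always defined.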
  • Slide 15
  • Robust statistical prediction using NPC Test: FEATURES OF STATISTICAL SOFTWARE NPC TEST 2.0
      • NPC TEST allows us to perform hypothesis testing in the case of:
          • Two and C samples with dependent or independent variables
          • Two and C samples with repeated measures
          • Stratified analysis
      • NPC TEST also provides:
          • Powerful test statistics for the treatment of missing values
          • One- or two-tailed tests
      • Data (including mixed variables): categorical, ordered categorical, numeric or continuous, binary
  • Slide 16
  • Robust statistical prediction using NPC Test: FEATURES OF STATISTICAL SOFTWARE NPC TEST 2.0
      • Elementary partial test statistics include: t statistic, ANOVA, difference of means, test statistics for missing values, Anderson-Darling, Cramér-von Mises, Chi-square, modified Chi-square, likelihood ratio
      • Combining functions for intermediate tests include: Fisher, Liptak, Tippett, Direct
      • An innovation of NPC TEST with respect to existing methods is that it can perform any combination of tests: starting from an appropriate set of elementary tests, the NPC methodology leads to a multivariate or multistrata overall global test.
      • NPC TEST supports the standard functions of statistical software (data import, data manipulation) and produces an effective report that can easily be integrated and customized with a text editor.
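The p-value based combining functions named on this slide can be written down compactly. The sketch below uses the usual textbook definitions (Fisher: minus twice the sum of log p-values; Liptak: sum of standard normal quantiles of 1 - p; Tippett: maximum of 1 - p), with large values counting as evidence against the null hypothesis. The function names are chosen here for illustration and are not the NPC Test software's API. The Direct function is omitted because, as usually described, it combines the partial statistics themselves rather than their p-values.

```python
import math
from statistics import NormalDist


def fisher(pvals):
    # Fisher: -2 * sum(log p_i).
    return -2 * sum(math.log(p) for p in pvals)


def liptak(pvals):
    # Liptak: sum of standard normal quantiles of (1 - p_i).
    # Requires 0 < p_i < 1, since inv_cdf is undefined at 0 and 1.
    std_normal = NormalDist()
    return sum(std_normal.inv_cdf(1 - p) for p in pvals)


def tippett(pvals):
    # Tippett: driven by the single smallest p-value.
    return max(1 - p for p in pvals)


example = [0.01, 0.20, 0.50]   # hypothetical partial p-values
scores = {f.__name__: f(example) for f in (fisher, liptak, tippett)}
```

In an NPC analysis the combined value is not referred to a theoretical distribution; its significance comes from applying the same function to every permutation, which is what makes the choice of combining function largely a matter of power against different alternatives.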
  • Slide 17
  • Robust statistical prediction using NPC Test
  • Slide 18
  • Robust statistical prediction using NPC Test
      • After processing the variables and obtaining p-values with NPC methods, we also controlled the familywise error rate (FWE).
      • The need for multiplicity control arises whenever a problem is structured into two or more experimental hypotheses (Finos and Salmaso, 2006).
      • To make inference on all the hypotheses defining the multivariate problem, it is necessary to control the probability of erroneously rejecting at least one univariate (elementary) hypothesis; this is called the multivariate type I error or familywise error rate (FWE) (Marcus et al., 1976).
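As an illustration of FWE control, the sketch below computes Holm step-down adjusted p-values. This is a swapped-in, simpler technique: the slides refer to closed testing (Marcus et al., 1976) on NPC p-values, and Holm's procedure is the well-known shortcut of closed testing when each intersection hypothesis is tested with Bonferroni. It is not claimed to be the procedure the authors ran, and the input p-values are hypothetical.

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps adjusted p-values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]          # hypothetical elementary p-values
adj = holm_adjust(raw)            # approximately [0.03, 0.06, 0.06]
rejected = [p <= 0.05 for p in adj]
```

Rejecting the hypotheses whose adjusted p-value is at most 0.05 keeps the probability of at least one false rejection at or below 5%, whatever the dependence among the tests.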
  • Slide 19
  • Robust statistical prediction using NPC Test: CLOSED TESTING GRAPHICAL REPRESENTATION
  • Slide 20
  • Results of NPC Test
  • Slide 21
  • Results of NPC Test
  • Slide 22
  • Results of NPC Test
  • Slide 23
  • Conclusions and practical suggestions
      • The NPC method can offer a significant contribution to successful research in biomedical studies with several endpoints.
      • The advantages of NPC Test stem from its flexibility in handling any type of variable.
      • We recommend using this methodology whenever the normality assumption is hard to justify, in the presence of missing values, and when the number of variables is higher than the number of subjects.
  • Slide 24
  • REFERENCES
      • Bassi P.F., Pagano F. (2007). Invasive Bladder Cancer. Springer.
      • Corain L., Salmaso L. (2007). A critical review and a comparative study on conditional permutation tests for two-way ANOVA. Communications in Statistics - Simulation and Computation, 36, 791-805.
      • Finos L., Salmaso L. (2006). Weighted methods controlling the multiplicity when the number of variables is much higher than the number of observations. Journal of Nonparametric Statistics, 18, 245-261.
      • Finos L., Salmaso L. (2006). FDR- and FWE-controlling methods using data-driven weights. Journal of Statistical Planning and Inference, 137, 3859-3870.
      • Finos L., Salmaso L., Solari A. (2007). Conditional inference under simultaneous stochastic ordering constraints. Journal of Statistical Planning and Inference, 137, 2633-2641.
      • Marcus R., Peritz E., Gabriel K.R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63, 655-660.
      • Marozzi M., Salmaso L. (2006). Multivariate bi-aspect testing for two-sample location problem. Communications in Statistics - Theory and Methods, 35, 477-488.
      • Salmaso L., Solari A. (2005). Multiple aspect testing for case-control designs. Metrika, 62, 331-340.
      • Wittekind C., Sobin L.H. (2002). TNM Classification of Malignant Tumours, UICC International Union Against Cancer (6th ed.). Wiley-Liss, New York.
      • http://www.gest.unipd.it/~salmaso/NPC_TEST.htm
  • Slide 25
  • Results of Neural Networks
      • We applied a neural network model (Multilayer Perceptron) to the same dataset.
      • Using k-fold cross-validation, we obtained a correct classification rate of 75.3% for DOD+AWD and of 60.5% for NED.
      • Using the subset of variables identified by the univariate analysis, we obtained very similar performance (74.5% and 62.4%).
      • The main problems of neural networks are:
          • Neural networks work as black boxes, so it is not possible to convert the network structure into a known model structure.
          • All input fields must be numeric (in this study the variables are not numerical but ordinal categorical).
          • Neural networks can suffer from a problem called interference (i.e. forgetting some of what was learned on older data).
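The per-class rates of correct classification quoted above come from k-fold cross-validation. The following sketch shows that bookkeeping only. The classifier is a deliberately simple stand-in (a threshold on one ordinal feature), NOT a multilayer perceptron, and the toy data, the coding 1 = DOD+AWD and 0 = NED, and all function names are assumptions made for illustration.

```python
import random


def kfold_indices(n, k, seed=0):
    """Split shuffled indices 0..n-1 into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def per_class_rates(y_true, y_pred):
    """Share of correctly classified subjects within each true class."""
    rates = {}
    for c in set(y_true):
        members = [i for i, t in enumerate(y_true) if t == c]
        rates[c] = sum(y_pred[i] == c for i in members) / len(members)
    return rates


def cross_validate(xs, ys, fit, k=5, seed=0):
    """Predict every subject with a model trained on the other folds."""
    preds = [None] * len(xs)
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    return per_class_rates(ys, preds)


def fit_threshold(xs, ys):
    # Stand-in classifier (not a neural network): cut one ordinal feature
    # at the midpoint between the two class means.
    def class_mean(c):
        values = [x for x, y in zip(xs, ys) if y == c]
        return sum(values) / len(values)

    cut = (class_mean(0) + class_mean(1)) / 2
    high = 1 if class_mean(1) > class_mean(0) else 0
    return lambda x: high if x > cut else 1 - high


# Hypothetical toy data: an ordinal stage (1-4) and a binary outcome.
stage = [1, 2, 3, 4] * 10
outcome = [0, 0, 1, 1] * 10   # assumed coding: 1 = DOD+AWD, 0 = NED
rates = cross_validate(stage, outcome, fit_threshold, k=5)
```

Reporting the rate separately for each class, as the slide does, matters when the classes behave differently: an overall accuracy would hide the gap between 75.3% and 60.5%.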