Machine Learning in PHP, PHP Con Poland

Machine Learning in PHP
Poland, Warsaw, October 2016
"Learn, someday this pain will be useful to you"

Upload: damien-seguy-

Posted on 26-Jan-2017

Agenda

• How to teach tricks to your PHP

• Application: searching for code in comments

• Complex learning

Speaker

• Damien Seguy

• Exakat CTO

• Static analysis of PHP code

Machine Learning

• Teaching the machine

• Supervised learning: learn first, then apply

• The application builds its own model: training phase

• It applies that model to real cases: application phase

Applications

• Play Go, chess and tic-tac-toe, and beat everyone else

• Fraud detection and risk analysis

• Automated translation or automated transcription

• OCR and face recognition

• Medical diagnostics

• Walk, welcome guests at hotels, play football

• Finding good PHP code

PHP Applications

• Recommendation systems

• Predicting user behavior

• Spam detection

• Converting users into customers

• ETA (estimated time of arrival) prediction

• Detect code in comments

Real Use Case

• Identify code in comments

• Classic problem

• Good problem for machine learning

• Complex, no simple solution

• A lot of data and expertise are available

Supervised Training

History data → Training → Model

Real data → Model → Results

The FANN Extension

• ext/fann (https://pecl.php.net/package/fann)

• Fast Artificial Neural Network

• http://leenissen.dk/fann/wp/

• Neural networks in PHP

• Works on PHP 7, thanks to the hard work of Jakub Zelenka

• https://github.com/bukka/php-fann

Neural Networks

• Imitation of nature

• Input layer

• Output layer

• Intermediate (hidden) layers
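To make the three kinds of layers concrete, here is a minimal pure-PHP sketch of what a trained network computes: each neuron sums its weighted inputs plus a bias, then passes the result through an activation function. It assumes a symmetric sigmoid implemented with PHP's tanh(), which shares the -1 to 1 range of FANN's FANN_SIGMOID_SYMMETRIC. The helper names layerOutput() and forwardPass() and all the weights are hypothetical; this is an illustration, not ext/fann's code.

```php
<?php
// Hypothetical helper: one fully connected layer.
// Each entry of $weights is one neuron: one weight per input, then a bias.
function layerOutput(array $inputs, array $weights): array
{
    $outputs = [];
    foreach ($weights as $neuron) {
        $sum = $neuron[count($inputs)]; // bias is stored after the input weights
        foreach ($inputs as $i => $value) {
            $sum += $neuron[$i] * $value;
        }
        $outputs[] = tanh($sum); // symmetric sigmoid: result in (-1, 1)
    }
    return $outputs;
}

// Hypothetical helper: feed the input layer through each layer in turn.
function forwardPass(array $inputs, array $layers): array
{
    foreach ($layers as $weights) {
        $inputs = layerOutput($inputs, $weights);
    }
    return $inputs;
}

// 5 inputs -> 3 hidden neurons -> 1 output, with made-up weights.
$hidden = [
    [0.1, 0.2, -0.1, 0.0, 0.3, 0.0],
    [-0.2, 0.1, 0.4, 0.1, 0.0, 0.1],
    [0.0, -0.3, 0.2, 0.2, 0.1, -0.1],
];
$output = [[0.5, -0.4, 0.3, 0.0]];

$result = forwardPass([0.37, 0.2, 0.0, 0.0, 0.0], [$hidden, $output]);
printf("score: %.4f\n", $result[0]);
```

Training, as done by FANN, is the search for weights that make this output match the expected answers.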

Initialisation

<?php

// Network shape: 5 inputs, one hidden layer of 3 neurons, 1 output.
// $num_layers counts every layer, input and output included: 3 here.
$num_layers         = 3;
$num_input          = 5;
$num_neurons_hidden = 3;
$num_output         = 1;

$ann = fann_create_standard($num_layers, $num_input,
                            $num_neurons_hidden, $num_output);

// Activation function
fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC);
fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC);

Preparing Data

Raw data → Extract → Filter → Human review → FANN-ready data

• Extract data from the raw source

• Remove useless data from the extract

• Have a human review the filtered data

• Format the data for FANN
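The Extract and Filter steps can be sketched with PHP's own tokenizer: token_get_all() splits source code into tokens, and comments come out as T_COMMENT or T_DOC_COMMENT tokens. This is an assumption about how such an extraction could work, not the speaker's pipeline; the function names and the filtering rule (drop empty comments and exact duplicates) are hypothetical. The human review step stays manual.

```php
<?php
// Hypothetical "Extract" step: collect every comment from PHP source code.
function extractComments(string $source): array
{
    $comments = [];
    foreach (token_get_all($source) as $token) {
        // Single-character tokens are plain strings; comments are arrays.
        if (is_array($token) && in_array($token[0], [T_COMMENT, T_DOC_COMMENT], true)) {
            $comments[] = $token[1];
        }
    }
    return $comments;
}

// Hypothetical "Filter" step: drop empty comments and exact duplicates.
function filterComments(array $comments): array
{
    $kept = [];
    foreach ($comments as $comment) {
        $text = trim($comment); // some PHP versions keep the trailing newline
        if ($text === '' || $text === '//' || $text === '#') {
            continue;
        }
        $kept[$text] = true; // keys deduplicate, in first-seen order
    }
    return array_keys($kept);
}

$source = <<<'CODE'
<?php
// $a = 1;
$b = 2; # note
/** Doc block */
// $a = 1;
//
CODE;

$comments = filterComments(extractComments($source));
print_r($comments);
```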

Expert At Work

// Test if the if is in a compressed format

// nie mowie po polsku   (Polish for "I don't speak Polish")

// There is a parser specified in `Parser::$KEYWORD_PARSERS`

// $result should exist, regardless of $_message

// TODO : fix this; var_dump($var);

// $a && $b and multidimensional

// numGlyphs + 1

//$annots .= ' /StructParent ';

// $cfg['Servers'][$i]['controlpass'] = 'pmapass';

// if(ob_get_clean()){

Input Vector

• 'length': size of the comment

• 'countDollar': number of $

• 'countEqual': number of =

• 'countObjectOperator': number of -> operators ($o->p)

• 'countSemicolon': number of semicolons ;
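The application code later calls makeVector(), which the slides do not show. A minimal version, assuming each characteristic is a plain character count, could look like this. The body is a hypothetical reconstruction, not the speaker's code, although on the sample comment below it does reproduce the vector 61 2 1 3 1 that appears in the input data.

```php
<?php
// Hypothetical makeVector(): the 5 characteristics, in the order listed above.
function makeVector(string $comment): array
{
    return [
        strlen($comment),             // length: size of the comment, in bytes
        substr_count($comment, '$'),  // countDollar
        substr_count($comment, '='),  // countEqual (== counts as 2, => as 1)
        substr_count($comment, '->'), // countObjectOperator
        substr_count($comment, ';'),  // countSemicolon
    ];
}

$comment = '//$this->errors[] = $this->language->get(\'error_permission\');';
print implode(' ', makeVector($comment)), "\n"; // prints "61 2 1 3 1"
```

Plain counts keep the vector cheap to compute, which matters when scanning thousands of comments.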

Input Data

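47 5 1
825 0 0 0 1
0
37 2 0 0 0
0
55 2 2 0 1
1
61 2 1 3 1
1
...

The first line gives the number of training cases (47), the number of inputs per case (5) and the number of outputs per case (1). Each case is then an input line followed by its expected output: 1 for code, 0 for a plain comment.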

 * (at your option) any later version.
 *
 * Exakat is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU Affero General Public License for more details.
 *
 * You should have received a copy of the GNU Affero General Public License
 * along with Exakat.  If not, see <http://www.gnu.org/licenses/>.
 *
 * The latest code can be found at <http://exakat.io/>.
 *
 */

// $x[3] or $x[] and multidimensional 

//if ($round == 3) { die('Round '.$round);} 

//$this->errors[] = $this->language->get('error_permission'); 
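Producing that file from labelled comments is mostly string formatting. The sketch below assumes each sample is a pair [input vector, expected output], with 1 for code and 0 for a plain comment; the function name toFannTrainingData() is hypothetical. The result can be saved as incoming.data for the training step.

```php
<?php
// Hypothetical helper: format labelled vectors as a FANN training file.
// Line 1: number of cases, inputs per case, outputs per case.
// Then, for each case: one line of inputs, one line of expected outputs.
// Assumes at least one sample, all of the same size.
function toFannTrainingData(array $samples): string
{
    $numInput  = count($samples[0][0]);
    $numOutput = count($samples[0][1]);
    $lines     = [count($samples) . " $numInput $numOutput"];
    foreach ($samples as [$input, $output]) {
        $lines[] = implode(' ', $input);
        $lines[] = implode(' ', $output);
    }
    return implode("\n", $lines) . "\n";
}

// The first four cases from the input data above.
$samples = [
    [[825, 0, 0, 0, 1], [0]],
    [[37, 2, 0, 0, 0], [0]],
    [[55, 2, 2, 0, 1], [1]],
    [[61, 2, 1, 3, 1], [1]],
];
print toFannTrainingData($samples);
// To train: file_put_contents('incoming.data', toFannTrainingData($samples));
```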

Black Magic

// $x[3] or $x[] and multidimensional

→ 1 5 1
  37 2 0 0 0
  0

→ ext/fann

→ It's a comment

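Training

<?php

$max_epochs             = 500000;
$epochs_between_reports = 1000;   // not shown on the slide: assumed value
$desired_error          = 0.001;

// The actual training: runs until $desired_error is reached or
// $max_epochs have passed, then saves the model to disk.
if (fann_train_on_file($ann,
                       'incoming.data',
                       $max_epochs,
                       $epochs_between_reports,
                       $desired_error)) {
    fann_save($ann, 'model.out');
}
fann_destroy($ann);
?>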

Training

• 47 cases

• 5 characteristics

• 3 hidden neurons

• plus 5 input neurons and 1 output neuron

• Duration: 5.711 s

Application

History data → Training → Model

Real data → Model → Results

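Application

<?php

$ann = fann_create_from_file('model.out');

$comment = '//$gvars = $this->getGraphicVars();';

// makeVector() computes the 5 characteristics of the comment
// (helper not shown on the slides)
$input   = makeVector($comment);
$results = fann_run($ann, $input);

// Report the comment as code above a 0.8 score
if ($results[0] > 0.8) {
    print "\"$comment\" -> $results[0] \n";
}
fann_destroy($ann);

?>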

Results > 0.8

• The expected answer is between 0 and 1

• Actual values range from -14 to 0.999

• The closer to 1, the safer it is to call it code; the closer to 0, the safer it is to call it a plain comment

• Is this a percentage? A count of carrots?

• It's a mix of counts…

Scores Distribution

[Chart: distribution of the scores returned by FANN]

Real Cases

• Tested on 14093 comments

• Duration: 68.01 ms

• Found 1960 issues (14%)

0.99999893 // $cfg['Servers'][$i]['controlhost'] = '';    

0.99999928 //$_SESSION['Import_message'] = $message->getDisplay();    

0.99999928 /* if (defined('SESSIONUPLOAD')) {
    // write sessionupload back into the loaded PMA session
    $sessionupload = unserialize(SESSIONUPLOAD);
    foreach ($sessionupload as $key => $value) {
        $_SESSION[$key] = $value;
    }

    // remove session upload data that are not set anymore
    foreach ($_SESSION as $key => $value) {
        if (mb_substr($key, 0, mb_strlen(UPLOAD_PREFIX))
            == UPLOAD_PREFIX
            && ! isset($sessionupload[$key])
        ) {

0.98780382 //LEAD_OFFSET = (0xD800 - (0x10000 >> 10)) = 55232    

0.99361396 // We have server(s) => apply default configuration

0.98383027 // Duration = as configured

0.99999928 // original -> translation mapping    

0.97590065 // = (   59 x 84   ) mm  = (  2.32 x 3.31  ) in 

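Found by FANN (machine learning) vs. target (expert work):

• True positive: found by FANN, and code according to the expert

• False positive: found by FANN, but not code according to the expert

• True negative: not found by FANN, and not code according to the expert

• False negative: not found by FANN, but code according to the expert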

True positive    0.99999923 // $cfg['Servers'][$i]['table_coords'] = 'pma__table_coords';

False negative   0.73295981 //(isset($attribs['height'])?$attribs['height']: 1);

False positive   0.99999851 // if ($key != null) did not work for index "0"

True negative    0.2104115  // the PASSWORD() function
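Counting the four cells is straightforward once each comment has both a FANN score and an expert verdict. This sketch assumes a comment counts as "found" when its score is above 0.8, as in the application code, and labels the first two sample comments on this slide as code and the last two as plain comments; that labelling is our reading of the slide. The function name confusionMatrix() is hypothetical.

```php
<?php
// Hypothetical helper: compare FANN's verdicts with the expert's labels.
function confusionMatrix(array $scores, array $isCode, float $threshold = 0.8): array
{
    $m = ['tp' => 0, 'fp' => 0, 'tn' => 0, 'fn' => 0];
    foreach ($scores as $i => $score) {
        $found = $score > $threshold; // reported by FANN
        if ($found && $isCode[$i]) {
            $m['tp']++;
        } elseif ($found) {
            $m['fp']++;
        } elseif ($isCode[$i]) {
            $m['fn']++;
        } else {
            $m['tn']++;
        }
    }
    return $m;
}

$scores = [0.99999923, 0.73295981, 0.99999851, 0.2104115];
$isCode = [true, true, false, false]; // assumed expert verdicts
$m = confusionMatrix($scores, $isCode);

// Precision: share of reported comments that really are code.
$precision = $m['tp'] / max(1, $m['tp'] + $m['fp']);
// Recall: share of commented-out code that was reported.
$recall = $m['tp'] / max(1, $m['tp'] + $m['fn']);
printf("TP %d, FP %d, TN %d, FN %d, precision %.2f, recall %.2f\n",
    $m['tp'], $m['fp'], $m['tn'], $m['fn'], $precision, $recall);
```

Over 50% false positives among the reported issues means a precision under 0.5.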

Results

• 1960 issues

• More than 50% false positives

• After a simple cleanup, 822 issues reported

• 14k comments analyzed in 68 ms (367 ms on PHP 5)

• Total coding time: 27 minutes

// = (   59 x 84   ) mm  = (  2.32 x 3.31  ) in

/* vim: set expandtab sw=4 ts=4 sts=4: */

Learn Better, Not Harder

• Better training data

• Improve characteristics

• Configure the neural network

• Change algorithm

• Automate learning

• Update constantly

History data → Training → Model

Real data → Model → Results → Feedback → History data

Better Training Data

• More data, more data, more data

• Varied, real-world situations

• Include specific cases

• Experience is capital

• Pedro Domingos, "A Few Useful Things to Know About Machine Learning": https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf

Improve Characteristics

• Add new characteristics

• Remove the ones that are less useful

• Find the right set of characteristics

Network Configuration

• Input vector

• Intermediate neurons

• Activation function

• Output vector

[Chart: time of training (ms), x axis from 1 to 10, one series each for 1, 2, 3 and 4 layers]

Change Algorithm

• Add more data first, before changing the algorithm

• Try FANN's Cascade2 algorithm

• 0.6 => 0 found

• 0.5 => 2 found

• Not found by the first algorithm

• Ant colony optimization, genetic algorithms, gravitational search, artificial immune systems, nie mowie po polsku (Polish for "I don't speak Polish"), simulated annealing, harmony search, interior point methods, tabu search

Finding The Best

• Test with 2-4 layers, 10 neurons

• Measure results

[Chart: measured results, x axis from 1 to 13, one series each for 1, 2, 3 and 4 layers]
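The search over configurations can be automated with two nested loops. In this sketch, $evaluate is a hypothetical callback: in practice it would build a network with fann_create_standard(), train it as on the Training slide and return a quality score measured on test data. Here a toy scoring function stands in for it so the loop runs on its own; findBest() is also a hypothetical name.

```php
<?php
// Hypothetical helper: try every (layers, neurons) pair, keep the best score.
function findBest(array $layerCounts, array $neuronCounts, callable $evaluate): array
{
    $best = null;
    foreach ($layerCounts as $layers) {
        foreach ($neuronCounts as $neurons) {
            $score = $evaluate($layers, $neurons);
            if ($best === null || $score > $best['score']) {
                $best = ['layers' => $layers, 'neurons' => $neurons, 'score' => $score];
            }
        }
    }
    return $best;
}

// Toy evaluator for illustration only: pretends 3 layers and 6 neurons is ideal.
$best = findBest([2, 3, 4], range(1, 10), function (int $layers, int $neurons): float {
    return -abs($layers - 3) - abs($neurons - 6) / 10;
});
printf("best: %d layers, %d neurons\n", $best['layers'], $best['neurons']);
```

Each real evaluation means a full training run, so the grid should stay small, as on the slide.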

Deep Learning

• Chaining neural networks

• Translators, scorers, auto-encoders

• Unsupervised Learning

Other Tools

• PHP ext/fann

• The R language

• https://github.com/kachkaev/php-r

• Scikit-learn

• https://github.com/scikit-learn/scikit-learn

• Mahout

• https://mahout.apache.org/

Conclusion

• Machine learning is about data, not code

• There are tools to use it with PHP

• Fast to try: quick results, or a quick failure

• Use it for complex problems that can tolerate errors

http://www.exakat.io

@exakat

http://www.slideshare.net/dseguy/

PHP 7.1 Preparation Workshop

Dzięki czemu (Polish, intended as "thank you")