Predicting Electricity Distribution Feeder Failures using Machine Learning
Marta Arias 1, Hila Becker 1,2
1 Center for Computational Learning Systems, 2 Computer Science, Columbia University

TRANSCRIPT

Slide 1
Predicting Electricity Distribution Feeder Failures using Machine Learning
Marta Arias 1, Hila Becker 1,2
1 Center for Computational Learning Systems, 2 Computer Science, Columbia University
LEARNING 06

Slide 2
Overview of the Talk
- Introduction to the Electricity Distribution Network of New York City: what are we doing and why?
- Early solution using MartiRank, a boosting-like algorithm for ranking
- Current solution using online learning
- Related projects

Slide 3
Overview of the Talk
- Introduction to the Electricity Distribution Network of New York City: what are we doing and why?
- Early solution using MartiRank, a boosting-like algorithm for ranking
- Current solution using online learning
- Related projects

Slide 4
The Electrical System
1. Generation
2. Transmission
3. Primary Distribution
4. Secondary Distribution

Slide 5
Electricity Distribution: Feeders

Slide 6
Problem
- Distribution feeder failures result in automatic feeder shutdowns, called "Open Autos" or O/As
- O/As stress networks, control centers, and field crews
- O/As are expensive ($ millions annually)
- Proactive replacement is much cheaper and safer than reactive repair

Slide 7
Our Solution: Machine Learning
- Leverage Con Edison's domain knowledge and resources
- Learn to rank feeders based on susceptibility to failure
- How? Assemble data, train a model on past data, and re-rank frequently by applying the model to current data

Slide 8
New York City

Slide 9
Some facts about feeders and failures
- About 950 feeders: 568 in Manhattan, 164 in Brooklyn, 115 in Queens, 94 in the Bronx

Slide 10
Some facts about feeders and failures
- About 60% of feeders failed at least once
- On average, feeders failed 4.4 times (between June 2005 and August 2006)

Slide 11
Some facts about feeders and failures
- Mostly 0-5 failures per day, with more in the summer: strong seasonality effects

Slide 12
Feeder data
- Static data: compositional/structural and electrical
- Dynamic data: outage history (updated daily) and load measurements (updated every 5 minutes)
- Roughly 200 attributes for each feeder; new ones are still being added (see the sketch below)
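
A minimal sketch of how such a feeder record might be represented, with static and dynamic attributes merged into one feature vector. The class and field names are illustrative assumptions, not the deployed system's actual schema:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class FeederSnapshot:
    """One feeder's feature vector at ranking time (names are illustrative)."""
    feeder_id: str
    # Static attributes: compositional/structural and electrical characteristics.
    static: Dict[str, float] = field(default_factory=dict)
    # Dynamic attributes: aggregates of daily outage history and 5-minute load readings.
    dynamic: Dict[str, float] = field(default_factory=dict)

    def features(self) -> Dict[str, float]:
        # Roughly 200 attributes per feeder in the deployed system.
        return {**self.static, **self.dynamic}
```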

Slide 13
Feeder Ranking Application
- Goal: rank feeders according to likelihood of failure (high-risk feeders placed near the top)
- The application needs to integrate all types of data
- The application needs to react and adapt to incoming dynamic data
- Hence, the feeder ranking is updated every 15 minutes

Slide 14
Application Structure
- Static data, outage data, transformer (Xfmr) stress data, and feeder load data feed a SQL Server database
- The ML engine maintains the ML models and produces rankings
- Rankings drive the decision-support application (GUI, action driver, action tracker)

Slide 15
Goal: rank feeders according to likelihood of failure

Slide 16
Overview of the Talk
- Introduction to the Electricity Distribution Network of New York City: what are we doing and why?
- Early solution using MartiRank, a boosting-like algorithm for ranking: pseudo-ROC and pseudo-AUC, MartiRank, performance metric, early results
- Current solution using online learning
- Related projects

Slide 17
(pseudo) ROC
[Figure: feeders sorted by score, each labeled with its outage count, e.g. 0 0 0 1 2 1 3]

Slide 18
(pseudo) ROC
[Figure: pseudo-ROC curve; x-axis: number of feeders (941), y-axis: number of outages (210)]

Slide 19
(pseudo) ROC
[Figure: normalized pseudo-ROC; x-axis: fraction of feeders, y-axis: fraction of outages, both running from 0 to 1; the shaded region is the area under the curve]

Slide 20
Some observations about the (p)ROC
- Adapted to positive labels (outage counts, not just 0/1)
- The best pAUC is not always 1 (in fact, it almost never is)
- E.g., a ranking that places feeders with 1, 0, 2, 0, 0 outages in positions 1-5 has pAUC = 11/15 ≈ 0.73; the best pAUC achievable with this data is 14/15 ≈ 0.93, corresponding to the ranking 2 1 0 0 0 (see the code sketch after slide 27)

Slide 21
MartiRank
- Boosting-like algorithm by [Long & Servedio, 2005]
- Greedy: maximizes the pAUC at each round
- Adapted to ranking: the weak learners are sorting rules, and each attribute is a sorting rule
- Attributes are numerical only; categorical attributes are converted to indicator vectors of 0/1

Slide 22
MartiRank
- The feeder list begins in random order
- Round 1: sort the list by the best variable
- Round 2: divide the list in two, splitting the outages evenly; choose a separate best variable for each part and sort it
- Round 3: divide the list in three, splitting the outages evenly; choose a separate best variable for each part and sort it
- ... and so on (see the sketch after slide 27)

Slide 23
MartiRank
- Advantages: fast, easy to implement, interpretable; only one tuning parameter, the number of rounds
- Disadvantages: that one tuning parameter, the number of rounds, was set to 4 manually

Slide 24
Using MartiRank for real-time ranking of feeders
- MartiRank is a batch algorithm, so it must deal with a changing system by:
  - Continually generating new datasets with the latest data
  - Using data within a window, aggregating the dynamic data within that period in various ways (quantiles, counts, sums, averages, etc.)
  - Re-training a new model and throwing out the old one (seasonality effects are not taken into account)
  - Using the newest model to generate the ranking
- Training strategies must be implemented: re-train daily, weekly, every 2 weeks, monthly, ...

Slide 25
Performance Metric
- Normalized average rank of failed feeders
- Closely related to the (pseudo) area under the ROC curve when labels are 0/1: avgRank = pAUC + 1/#examples
- Essentially, the difference comes from moving from the 0-based pAUC to 1-based ranks

Slide 26
Performance Metric Example
- 8 feeders, of which the ones ranked 2nd, 3rd and 5th each fail once; pAUC = 17/24 ≈ 0.71 (see the sketch after slide 27)

Slide 27
How to measure performance over time
- Every ~15 minutes, generate a new ranking based on the current model and the latest data
- Whenever there is a failure, look up its rank in the latest ranking issued before the failure
- After a whole day, compute the normalized average rank
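
The pAUC convention below is an assumption that reproduces the numbers on slide 20: a step curve over the ranked list whose area is the mean cumulative fraction of outages seen after each position. The function name is ours, and the two example calls correspond to the slide's 11/15 and 14/15 figures:

```python
def pseudo_auc(outages_in_ranked_order):
    """Pseudo-AUC of a ranking. Entry i is the outage count of the feeder placed
    at position i (position 0 = top of the ranking). The curve steps right by
    1/#feeders per feeder and up by that feeder's share of the total outages;
    the area is the mean cumulative outage fraction."""
    total = sum(outages_in_ranked_order)
    n = len(outages_in_ranked_order)
    area, cum = 0.0, 0
    for outages in outages_in_ranked_order:
        cum += outages
        area += cum / total
    return area / n

# Slide 20's example: 5 feeders carrying 3, 2 and 1 outages between them.
print(pseudo_auc([1, 0, 2, 0, 0]))  # 11/15 ≈ 0.73
print(pseudo_auc([2, 1, 0, 0, 0]))  # best achievable here: 14/15 ≈ 0.93
```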
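
A simplified MartiRank sketch following the per-round structure on slides 21-24: start from a random order; in round r, split the current list into r parts with the outages divided evenly, then greedily pick the sorting attribute and direction that maximize the pAUC of each part. The data representation and the handling of edge cases are assumptions, and this is not the Long & Servedio reference implementation:

```python
import random

def pauc(outages):
    """Pseudo-AUC of a ranked list of outage counts (same convention as above)."""
    total, n, cum, area = sum(outages), len(outages), 0, 0.0
    if total == 0:
        return 0.0
    for o in outages:
        cum += o
        area += cum / total
    return area / n

def martirank(rows, labels, attributes, rounds=4):
    """Sketch of MartiRank. `rows` is a list of dicts mapping attribute name to
    numeric value; `labels` holds the aligned outage counts."""
    order = list(range(len(rows)))
    random.shuffle(order)                      # the feeder list begins in random order
    model = []
    for r in range(1, rounds + 1):
        # Divide the current list into r contiguous parts, splitting outages evenly.
        target = sum(labels) / r
        parts, part, part_outages = [], [], 0
        for idx in order:
            part.append(idx)
            part_outages += labels[idx]
            if part_outages >= target and len(parts) < r - 1:
                parts.append(part)
                part, part_outages = [], 0
        parts.append(part)
        # For each part, greedily choose the sorting attribute (and direction)
        # that maximizes that part's pAUC, then sort the part by it.
        round_rules, new_order = [], []
        for part in parts:
            attr, desc = max(
                ((a, d) for a in attributes for d in (False, True)),
                key=lambda rule: pauc([labels[i] for i in
                    sorted(part, key=lambda i: rows[i][rule[0]], reverse=rule[1])]))
            round_rules.append((attr, desc))
            new_order.extend(sorted(part, key=lambda i: rows[i][attr], reverse=desc))
        order = new_order
        model.append(round_rules)
    return model, order   # per-round sorting rules, and the final ranking (top first)
```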
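
A minimal sketch of the performance metric under one plausible reading: ranks are counted from the top (rank 1 = most susceptible) and normalized by the number of feeders, so lower values are better, which matches the "loss in [0,1]" usage later in the talk; the 0-based versus 1-based convention is what produces the 1/#examples offset mentioned on slide 25. In the deployed system each failure is looked up in the latest ranking issued before it; here a single ranking stands in for that:

```python
def normalized_average_rank(ranking, failed_feeders):
    """Average the 1-based ranks of the feeders that failed and normalize by the
    number of feeders, giving a value in [0, 1] (lower = failures nearer the top)."""
    ranks = [ranking.index(f) + 1 for f in failed_feeders]
    return sum(ranks) / (len(ranks) * len(ranking))

# Slide 26's example: 8 feeders, with the feeders ranked 2nd, 3rd and 5th failing once each.
ranking = ["F1", "F2", "F3", "F4", "F5", "F6", "F7", "F8"]
print(normalized_average_rank(ranking, ["F2", "F3", "F5"]))  # (2 + 3 + 5) / (3 * 8) ≈ 0.42
```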

Slide 28
[Figure: MartiRank comparison, re-training every 2 weeks]

Slide 29
Using MartiRank for real-time ranking of feeders
- MartiRank seems to work well, but the user decides when to re-train, how much data to use for re-training, and other things such as setting parameters and selecting algorithms
- We want to make the system 100% automatic
- Idea: keep using MartiRank, since it works well with this data, but keep and re-use all models

Slide 30
Overview of the Talk
- Introduction to the Electricity Distribution Network of New York City: what are we doing and why?
- Early solution using MartiRank, a boosting-like algorithm for ranking
- Current solution using online learning: overview of learning from expert advice and the Weighted Majority Algorithm; new challenges in our setting and our solution; results
- Related projects

Slide 31
Learning from expert advice
- Consider each model as an expert
- Each expert has an associated weight (or score); experts with good/bad predictions are rewarded/penalized
- The weight is a measure of confidence in the expert's prediction
- Predict using a weighted average of the top-scoring experts

Slide 32
Learning from expert advice
- Advantages: fully automatic, no human intervention needed; adaptive, since changes in the system are learned as it runs; can use many types of underlying learning algorithms; good performance guarantees from learning theory (performance is never too far off from the best expert in hindsight)
- Disadvantages: computational cost (many models must be tracked in parallel); models are harder to interpret

Slide 33
Weighted Majority Algorithm [Littlestone & Warmuth '88]
- Introduced for binary classification; experts make predictions in [0,1] and incur losses in [0,1]
- The learning rate β in (0,1] is the main parameter
- Pseudocode: there are N experts, each with initial weight 1. For t = 1, 2, 3, ...: predict using the weighted average of the experts' predictions; obtain the true label, with each expert i incurring loss l_i; update the experts' weights as w_{i,t+1} = w_{i,t} · β^{l_i} (see the code sketch after slide 37)

Slide 34
In our case, WM cannot be used directly
- We use ranking as opposed to binary classification
- More importantly, we do not have a fixed set of experts

Slide 35
Dealing with ranking vs. binary classification
- The ranking loss is the normalized average rank of failures, as seen before, so the loss is in [0,1]
- To combine rankings, use a weighted average of the feeders' ranks (sketched after slide 37)

Slide 36
Dealing with a moving set of experts
- Introduce new parameters: B, the budget (maximum number of models), set to 100; p, the new model's weight percentile, in [0,100]; γ, the age penalty, in (0,1]
- When new models are trained, they are added to the set of models with a weight corresponding to the p-th percentile of the current weights
- If there are too many models (more than B), drop the models with a poor q-score, where q_i = w_i · γ^(age_i); that is, γ is the rate of exponential decay (see the sketch after slide 37)

Slide 37
Other parameters
- How often do we train and add new models? Hand-tuned over the course of the summer to every 7 days; this seems to balance generating new models to adapt to changing conditions against overflowing the system. Alternatively, one could train when the observed performance drops; this has not been used yet.
- How much data do we use to train models? Based on observed performance and early experiments: one week's worth and two weeks' worth of data.
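
A runnable sketch of the Weighted Majority update from slide 33; the value β = 0.5, the expert predictions, and the losses below are illustrative, not values from the deployed system:

```python
def weighted_majority_update(weights, losses, beta=0.5):
    """One Weighted Majority round (Littlestone & Warmuth): each expert's weight
    is multiplied by beta ** loss, with losses in [0, 1] and beta in (0, 1]."""
    return [w * (beta ** l) for w, l in zip(weights, losses)]

def weighted_prediction(weights, predictions):
    """Predict with the weighted average of the experts' predictions in [0, 1]."""
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

# Three experts, each starting with weight 1.
weights = [1.0, 1.0, 1.0]
print(weighted_prediction(weights, [0.9, 0.2, 0.6]))          # 0.57 (rounded)
weights = weighted_majority_update(weights, [0.1, 0.8, 0.4])  # lossy experts shrink fastest
print(weights)                                                # ~[0.93, 0.57, 0.76]
```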
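
A sketch of rank combination as described on slide 35: each feeder is scored by the weighted average of its rank across the experts' rankings, and the feeders are re-sorted by that score. The data layout and tie-breaking are assumptions, not the deployed system's exact rule:

```python
def combine_rankings(rankings, weights):
    """Combine experts' rankings into one: lower weighted-average rank = nearer the top."""
    total = sum(weights)
    feeders = rankings[0]
    avg_rank = {
        f: sum(w * (r.index(f) + 1) for w, r in zip(weights, rankings)) / total
        for f in feeders
    }
    return sorted(feeders, key=avg_rank.get)

experts = [["A", "B", "C", "D"],   # each expert's ranking, most susceptible first
           ["B", "A", "D", "C"]]
print(combine_rankings(experts, weights=[0.7, 0.3]))  # ['A', 'B', 'C', 'D']
```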
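
A sketch of the moving expert set from slide 36; the dictionary representation of an expert and the default values for p and γ are assumptions (only the budget B = 100 is stated on the slide):

```python
import numpy as np

def add_and_prune(experts, new_model, budget=100, p=50, gamma=0.9):
    """Add a newly trained model with a weight at the p-th percentile of the
    current weights; if the pool then exceeds the budget, drop the models with
    the worst q-score q_i = w_i * gamma**age_i (gamma = rate of exponential decay)."""
    weights = [e["weight"] for e in experts]
    new_weight = float(np.percentile(weights, p)) if weights else 1.0
    pool = experts + [{"model": new_model, "weight": new_weight, "age": 0}]
    if len(pool) > budget:
        pool.sort(key=lambda e: e["weight"] * gamma ** e["age"], reverse=True)
        pool = pool[:budget]
    return pool
```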

Slide 38
Performance
[Figure]

Slide 39
Failures' rank distribution
[Figure]

Slide 40
Daily average rank of failures
[Figure]

Slide 41
Other things that I have not talked about but that took a significant amount of time
- Data: it is spread over many repositories; it is difficult to identify useful data and to arrange access to it; the volume of data (gigabytes accumulated on a daily basis) required an optimized database layout and the addition of a preprocessing stage; we had to gain an understanding of the data semantics
- Software engineering (this is a deployed application)

Slide 42
Current Status
- Summer 2006: the system has been debugged, fine-tuned, tested, and deployed
- It is now fully operational and ready to be used next summer (in test mode)
- After this summer, we are going to do systematic studies of parameter sensitivity and comparisons to other approaches

Slide 43
Related work in progress
- Online learning: fancier weight updates with better guaranteed performance in changing environments; explore direct online ranking strategies (e.g. the ranking perceptron)
- Data-mining project: aims to exploit seasonality by learning a mapping from environmental conditions to the characteristics of well-performing experts; when the same conditions arise in the future, the weights of experts with those characteristics are increased; we hope to learn the mapping as the system runs, updating it continually
- MartiRank: in the presence of repeated or missing values, sorting is non-deterministic and the pAUC takes different values depending on the permutation of the data; use statistics of the pAUC to improve the basic learning algorithm; instead of taking the number of rounds as input, stop when the pAUC increase is not significant; use better estimators of the pAUC that are not sensitive to permutations of the data

Slide 44
Other related projects within the collaboration with Con Edison
- Finer-grained component analysis: ranking of transformers, cable sections, and cable joints
- Merging of all the systems into one
- Mixing machine learning and survival analysis

Slide 45
Acknowledgments
- Con Edison: Matthew Koenig, Mark Mastrocinque, William Fairechio, John A. Johnson, Serena Lee, Charles Lawson, Frank Doherty, Arthur Kressner, Matt Sniffen, Elie Chebli, George Murray, Bill McGarrigle, and the Van Nest team
- Columbia CCLS: Wei Chu, Martin Jansche, Ansaf Salleb, Albert Boulanger, David Waltz, Philip M. Long (now at Google), Roger Anderson
- Columbia Computer Science: Philip Gross, Rocco Servedio, Gail Kaiser, Samit Jain, John Ioannidis, Sergey Sigelman, Luis Alonso, Joey Fortuna, Chris Murphy
- Columbia Statistics: Samantha Cook