# Predicting Electricity Distribution Feeder Failures using Machine Learning

Post on 13-Jan-2016


Marta Arias¹, Hila Becker¹,²
¹Center for Computational Learning Systems
²Computer Science
Columbia University

LEARNING 06

Overview of the Talk
- Introduction to the Electricity Distribution Network of New York City
- What are we doing and why?
- Early solution using MartiRank, a boosting-like algorithm for ranking
- Current solution using online learning
- Related projects


The Electrical System

Electricity Distribution: Feeders

Problem
- Distribution feeder failures result in automatic feeder shutdowns, called "Open Autos" or O/As
- O/As stress networks, control centers, and field crews
- O/As are expensive ($ millions annually)
- Proactive replacement is much cheaper and safer than reactive repair

Our Solution: Machine Learning
- Leverage Con Edison's domain knowledge and resources
- Learn to rank feeders based on susceptibility to failure
- How?
  - Assemble data
  - Train a model based on past data
  - Re-rank frequently by applying the model to current data

New York City

Some facts about feeders and failures
- About 950 feeders:
  - 568 in Manhattan
  - 164 in Brooklyn
  - 115 in Queens
  - 94 in the Bronx

Some facts about feeders and failures
- About 60% of feeders failed at least once
- On average, feeders failed 4.4 times (between June 2005 and August 2006)

Some facts about feeders and failures
- Mostly 0–5 failures per day
- More in the summer
- Strong seasonality effects

Feeder data
- Static data
  - Compositional/structural
  - Electrical
- Dynamic data
  - Outage history (updated daily)
  - Load measurements (updated every 5 minutes)
- Roughly 200 attributes for each feeder; new ones are still being added

Feeder Ranking Application
- Goal: rank feeders according to likelihood of failure (if high risk, place near the top)
- The application needs to integrate all types of data
- The application needs to react and adapt to incoming dynamic data
- Hence, the feeder ranking is updated every 15 minutes

Application Structure

Goal: rank feeders according to likelihood of failure

Overview of the Talk
- Introduction to the Electricity Distribution Network of New York City
- What are we doing and why?
- Early solution using MartiRank, a boosting-like algorithm for ranking
  - Pseudo-ROC and pseudo-AUC
  - MartiRank
  - Performance metric
  - Early results
- Current solution using online learning
- Related projects

(pseudo) ROC
[Figure: pseudo-ROC curve. Feeders are sorted by score; the curve plots the fraction of outages (y-axis) against the fraction of feeders (x-axis). The pseudo-AUC is the area under this curve.]

Some observations about the (p)ROC
- Adapted to positive labels (not just 0/1)
- The best pAUC is not always 1 (in fact, it almost never is)

[Figure: worked example with 5 feeders and 3 outages; outage counts down the ranking are (1, 0, 2, 0, 0).]
E.g., pAUC = 11/15 ≈ 0.73. The best pAUC achievable with this data is 14/15 ≈ 0.93, corresponding to the ranking with outage counts (2, 1, 0, 0, 0).
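The pseudo-AUC in the worked example above can be computed with the following minimal sketch, assuming the curve steps up by each feeder's outage count and right by one feeder at a time (the function name `pseudo_auc` is ours):

```python
def pseudo_auc(outage_counts):
    """Pseudo-AUC of a ranking, given each feeder's outage count
    in ranked order (index 0 = top of the ranking).

    The pseudo-ROC steps up by c_i/C outages, then right by 1/n feeders,
    so the area is the average cumulative outage fraction over positions.
    """
    n = len(outage_counts)
    total = sum(outage_counts)
    cumulative = 0
    area = 0
    for c in outage_counts:
        cumulative += c      # outages seen so far down the ranking
        area += cumulative   # height of the curve at this position
    return area / (n * total)

# Reproduces the slide's example:
print(pseudo_auc([1, 0, 2, 0, 0]))  # 11/15 ≈ 0.73
print(pseudo_auc([2, 1, 0, 0, 0]))  # 14/15 ≈ 0.93
```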

MartiRank
- Boosting-like algorithm by [Long & Servedio, 2005]
- Greedy: maximizes pAUC at each round
- Adapted to ranking
- Weak learners are sorting rules
  - Each attribute is a sorting rule
  - Attributes are numerical only; if categorical, convert to an indicator vector of 0/1

MartiRank
- Round 1: the feeder list begins in random order; sort the whole list by the best variable
- Round 2: divide the list in two parts, splitting outages evenly; choose a separate best variable for each part and sort within it
- Round 3: divide the list in three parts, splitting outages evenly; again choose a separate best variable for each part and sort
- Continue for further rounds

MartiRank
- Advantages:
  - Fast, easy to implement
  - Interpretable
  - Only one tuning parameter: the number of rounds
- Disadvantages:
  - One tuning parameter: the number of rounds
  - It was set to 4 manually
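The rounds described above can be sketched roughly as follows. This is our loose reconstruction, not Con Edison's deployed code; the helper names (`martirank`, `pseudo_auc`) and the even-outage splitting heuristic are assumptions:

```python
import random

def pseudo_auc(counts):
    """Area under the pseudo-ROC for outage counts in ranked order."""
    total = sum(counts)
    if total == 0:
        return 1.0  # no outages in this block: any order is fine
    cum = area = 0
    for c in counts:
        cum += c
        area += cum
    return area / (len(counts) * total)

def martirank(feeders, attrs, outages, n_rounds=4):
    """feeders: list of ids; attrs: id -> {attribute: value};
    outages: id -> outage count. Returns feeders ranked best-first."""
    order = list(feeders)
    random.shuffle(order)          # round 1 starts from a random order
    names = list(attrs[order[0]].keys())
    for r in range(1, n_rounds + 1):
        # Split the current order into r contiguous blocks with
        # roughly equal outage totals.
        total = sum(outages[f] for f in order)
        blocks, block, seen, k = [], [], 0, 1
        for f in order:
            block.append(f)
            seen += outages[f]
            if k < r and seen >= k * total / r:
                blocks.append(block)
                block, k = [], k + 1
        blocks.append(block)
        # Within each block, greedily pick the attribute whose
        # descending sort maximizes the block's pAUC, then sort by it.
        order = []
        for b in blocks:
            best = max(names, key=lambda a: pseudo_auc(
                [outages[f] for f in sorted(b, key=lambda f: -attrs[f][a])]))
            order += sorted(b, key=lambda f: -attrs[f][best])
    return order
```

With a single round this reduces to sorting all feeders by the one attribute that maximizes the global pAUC.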

Using MartiRank for real-time ranking of feeders
- MartiRank is a batch algorithm, hence it must deal with the changing system by:
  - Continually generating new datasets with the latest data
    - Use data within a window; aggregate dynamic data within that period in various ways (quantiles, counts, sums, averages, etc.)
  - Re-training a new model and throwing out the old one
    - Seasonality effects are not taken into account
  - Using the newest model to generate the ranking
- Must implement training strategies: re-train daily, or weekly, or every 2 weeks, or monthly, …
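The windowed aggregation of dynamic data might look like the following sketch; the function name and the particular statistics are illustrative assumptions (the slides only name quantiles, counts, sums, and averages):

```python
import statistics

def aggregate_window(values):
    """Summarize one feeder's dynamic measurements (e.g. load readings)
    over a time window into a fixed set of features."""
    vals = sorted(values)
    q1, median, q3 = statistics.quantiles(vals, n=4)  # quartiles
    return {
        "count": len(vals),
        "sum": sum(vals),
        "mean": statistics.mean(vals),
        "q1": q1,
        "median": median,
        "q3": q3,
    }

features = aggregate_window([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
```

Each window of raw 5-minute load readings thus becomes one fixed-length feature vector per feeder, which a batch learner like MartiRank can consume.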

Performance Metric
- Normalized average rank of failed feeders
- Closely related to the (pseudo) area under the ROC curve when labels are 0/1: avgRank = pAUC + 1/#examples
- Essentially, the difference comes from moving from the 0-based pAUC to 1-based ranks

Performance Metric Example
[Figure: worked example with 8 feeders and 3 outages; outage counts down the ranking are (0, 1, 1, 0, 1, 0, 0, 0), giving pAUC = 17/24 ≈ 0.7.]

How to measure performance over time
- Every ~15 minutes, generate a new ranking based on the current model and the latest data
- Whenever there is a failure, look up its rank in the latest ranking before the failure
- After a whole day, compute the normalized average rank
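The daily evaluation loop above can be sketched as follows. The class and method names are ours, and we adopt the simple convention of 1-based ranks from the top, normalized by the number of feeders (so lower values mean failures were ranked nearer the top); the slides' exact rank convention may differ:

```python
class DailyRankTracker:
    """Tracks the latest ranking and the ranks at which failures occurred."""

    def __init__(self, n_feeders):
        self.n_feeders = n_feeders
        self.ranks = {}          # feeder id -> rank (1 = most susceptible)
        self.failure_ranks = []

    def update_ranking(self, ordered_feeders):
        """Called every ~15 minutes with feeders ordered best-first."""
        self.ranks = {f: i + 1 for i, f in enumerate(ordered_feeders)}

    def record_failure(self, feeder):
        """Look up the failed feeder's rank in the latest ranking."""
        self.failure_ranks.append(self.ranks[feeder])

    def daily_avg_rank(self):
        """Normalized average rank of the day's failures."""
        return sum(self.failure_ranks) / (len(self.failure_ranks) * self.n_feeders)
```

For example, with 4 feeders ranked [a, b, c, d] and failures on a and c, the daily metric is (1 + 3) / (2 · 4) = 0.5.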

MartiRank Comparison: training every 2 weeks

Using MartiRank for real-time ranking of feeders
- MartiRank seems to work well, but:
  - The user decides when to re-train
  - The user decides how much data to use for re-training
  - … and other things, like setting parameters, selecting algorithms, etc.
- We want to make the system 100% automatic!
- Idea: still use MartiRank, since it works well with this data, but keep and re-use all models

Overview of the Talk
- Introduction to the Electricity Distribution Network of New York City
- What are we doing and why?
- Early solution using MartiRank, a boosting-like algorithm for ranking
- Current solution using online learning
  - Overview of learning from expert advice and the Weighted Majority Algorithm
  - New challenges in our setting and our solution
  - Results
- Related projects

Learning from expert advice
- Consider each model as an expert
- Each expert has an associated weight (or score)
  - Reward/penalize experts for good/bad predictions
  - The weight is a measure of confidence in the expert's predictions
- Predict using a weighted average of the top-scoring experts

Learning from expert advice
- Advantages:
  - Fully automatic: no human intervention needed
  - Adaptive: changes in the system are learned as it runs
  - Can use many types of underlying learning algorithms
  - Good performance guarantees from learning theory: performance is never too far off from the best expert in hindsight
- Disadvantages:
  - Computational cost: need to track many models in parallel
  - Models are harder to interpret

Weighted Majority Algorithm [Littlestone & Warmuth '88]
- Introduced for binary classification
- Experts make predictions in [0,1] and obtain losses in [0,1]
- Pseudocode:
  - The learning rate β ∈ (0,1] is the main parameter
  - There are N experts, each with initial weight 1
  - For t = 1, 2, 3, …
    - Predict using the weighted average of each expert's prediction
    - Obtain the true label; each expert i incurs loss l_i
    - Update each expert's weight: w_{i,t+1} = w_{i,t} · β^{l_i}
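The update rule above can be written compactly for the binary-classification setting. This is a sketch: the function name is ours, and we use absolute error as the [0,1] loss:

```python
def weighted_majority(experts, stream, beta=0.5):
    """experts: list of functions x -> prediction in [0, 1].
    stream: iterable of (x, true_label) pairs with labels in [0, 1].
    beta: learning rate in (0, 1].
    Returns the algorithm's predictions and the final weights."""
    weights = [1.0] * len(experts)
    predictions = []
    for x, y in stream:
        guesses = [e(x) for e in experts]
        # Predict with the weighted average of the experts' predictions.
        predictions.append(
            sum(w * g for w, g in zip(weights, guesses)) / sum(weights))
        # Each expert incurs loss l_i in [0, 1]; scale its weight by beta**l_i.
        weights = [w * beta ** abs(g - y) for w, g in zip(weights, guesses)]
    return predictions, weights
```

An expert that is always right keeps its weight, while one that is always wrong decays by a factor of β per round, so the weighted average quickly concentrates on the good experts.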

In our case, we can't use WM directly:
- We use ranking as opposed to binary classification
- More importantly, we do not have a fixed set of experts

Dealing with ranking vs. binary classification
- Ranking loss is the normalized average rank of failures, as seen before, so the loss is in [0,1]
- To combine rankings, use a weighted average of the feeders' ranks
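Combining rankings by weighted-averaging the feeders' ranks could look like this sketch; the function name and the 1-based rank convention are our assumptions:

```python
from collections import defaultdict

def combine_rankings(rankings, weights):
    """rankings: list of feeder-id lists, each ordered best-first.
    weights: one non-negative weight per ranking (one per expert).
    Returns a single combined ranking, best-first."""
    total = sum(weights)
    avg_rank = defaultdict(float)
    for order, w in zip(rankings, weights):
        for position, feeder in enumerate(order):
            # Accumulate each expert's (1-based) rank, weighted by trust.
            avg_rank[feeder] += w * (position + 1) / total
    # Feeders with the smallest weighted average rank come first.
    return sorted(avg_rank, key=avg_rank.get)
```

For instance, with rankings [a, b] and [b, a] weighted 3 and 1, a's average rank is (3·1 + 1·2)/4 = 1.25 and b's is 1.75, so the combined ranking is [a, b].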

Dealing with a moving set of experts
- Introduce new parameters:
  - B: budget (max number of models), set to 100
  - p: new model's weight percentile, in [0, 100]
  - α: age penalty, in (0, 1]
- When training new models, add them to the set of models with the weight corresponding to the p-th percentile of the current weights
- If there are too many models (more than B), drop the models with a poor q-score, where q_i = w_i · α^{age_i}, i.e., α is the rate of exponential decay
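The budget and age-penalty bookkeeping can be sketched like this; the dictionary layout, the helper names, and the nearest-rank percentile rule are illustrative assumptions:

```python
def qscore(expert, alpha):
    """q_i = w_i * alpha**age_i: weight discounted by an exponential age penalty."""
    return expert["weight"] * alpha ** expert["age"]

def add_model(experts, model, p):
    """Insert a new model with weight at the p-th percentile of the
    current weights (nearest-rank; the exact interpolation is our choice)."""
    weights = sorted(e["weight"] for e in experts)
    if weights:
        idx = min(len(weights) - 1, int(p / 100 * len(weights)))
        new_weight = weights[idx]
    else:
        new_weight = 1.0  # first model: arbitrary initial weight
    experts.append({"model": model, "weight": new_weight, "age": 0})

def enforce_budget(experts, budget, alpha):
    """Keep only the `budget` models with the best q-scores."""
    experts.sort(key=lambda e: qscore(e, alpha), reverse=True)
    del experts[budget:]
```

Note that a heavy but old model can still be dropped: with α = 0.5, a model of weight 4 and age 3 has q = 0.5, below a fresh model of weight 1.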

Other parameters
- How often do we train and add new models?
  - Hand-tuned over the course of the summer: every 7 days
  - Seems to achieve a balance between generating new models to adapt to changing conditions and not overflowing the system
  - Alternatively, one could train when observed performance drops (not used yet)
- How much data do we use to train models?
  - Based on observed performance and early experiments:
    - 1 week's worth of data, and
    - 2 weeks' worth of data

Performance

Failures rank distribution

Daily average rank of failures

Other things that I have not talked about but took a significant amount of time
- Data
  - Data is spread over many repositories
    - Difficult to identify useful data
    - Difficult to arrange access to data
  - Volume of data: gigabytes accumulated on a daily basis
    - Required an optimized database layout and the addition of a preprocessing stage
  - Had to gain an understanding of data semantics
- Software engineering (this is a deployed application)

Current Status
- Summer 2006: the system has been debugged, fine-tuned, tested, and deployed
- Now fully operational
- Ready to be used next summer (in test mode)

After this summer, we're going to do systematic studies of:
- Parameter sensitivity
- Comparisons to other approaches

Related work-in-progress
- Online learning:
  - Fancier weight updates with better guaranteed performance in changing environments
  - Explore direct online ranking strategies (e.g., the ranking perceptron)
- Data-mining project:
  - Aims to exploit seasonality
  - Learn a mapping from environmental conditions to the characteristics of well-performing experts
    - When the same conditions arise in the future, increase the weights of experts that have those characteristics
  - Hope to learn it as the system runs, continually updating the mapping
- MartiRank:
  - In the presence of repeated/missing values, sorting is non-deterministic and the pAUC takes different values depending on the permutation of the data
  - Use statistics of the pAUC to improve the basic learning algorithm
    - Instead of taking the number of rounds as input, stop when the pAUC increase is not significant
    - Use better estimators of the pAUC that are not sensitive to permutations of the data

Other related projects within the collaboration with Con Edison
- Finer-grained component analysis
  - Ranking of transformers
  - Ranking of cable sections
  - Ranking of cable joints
- Merging of all systems into one
- Mixing ML and survival analysis

Acknowledgments
- Con Edison: Matthew Koenig, Mark Mastrocinque, William Fairechio, John A. Johnson, Serena Lee, Charles Lawson, Frank Doherty, Arthur Kressner, Matt Sniffen, Elie Chebli, George Murray, Bill McGarrigle, and the Van Nest team
- Columbia CCLS: Wei Chu, Martin Jansche, Ansaf Salleb, Albert Boulanger, David Waltz, Philip M. Long (now at Google), Roger Anderson
- Columbia Computer Science: Philip Gross, Rocco Servedio, Gail Kaiser, Samit Jain, John Ioannidis, Sergey Sigelman, Luis Alonso, Joey Fortuna, Chris Murphy
- Columbia Statistics: Samantha Cook

- Generation: a prime mover, typically the force of water, steam, or hot gasses on a turbine, spins an electromagnet, generating large amounts of electrical current at a generating station
- Transmission: the current is sent at very high voltage (hundreds of thousands of volts) from the generating station to substations closer to the customers
- Primary distribution: electricity is sent at mid-level voltage (tens of thousands of volts) from substations to local transformers, over cables called feeders, usually 10–20 km long and with a few tens of transformers per feeder. Feeders are composed of many feeder sections connected by joints and splices
- Secondary distribution: sends electricity at normal household voltages from local transformers to individual customers
- Many failures occur in the summer, when the load on the system increases during heat waves
- When an O/A occurs, the load that had been carried by the failed feeder must shift to adjacent feeders, further stressing them. This can lead to a failure cascade in a distribution network
- … with sufficient accuracy so that timely preventive maintenance can be taken on the right feeders at the right time
- (e.g., age and composition of each feeder section) and (e.g., electrical load data for a feeder and its transformers, accumulating at a rate of several hundred megabytes per day)
- Software engineering challenges to manage data

- How to deal with impending failures that are corrected on the basis of our ranking?
- Note the decay of performance
- Note that SVMs and linear regression do much worse
