CloudFlare Does Not Play Fair

Upload: thor

Post on 10-Apr-2016

DESCRIPTION

Guy complains about CloudFlare for no good reason then deletes reddit and blog post: https://pay.reddit.com/r/sysadmin/comments/3waut6/cloudflair_plays_unfair_story_of_outage_outrage/

TRANSCRIPT

Page 1: CloudFlare Does Not Play Fair

12/13/2015 CloudFlare does not play fair

https://webcache.googleusercontent.com/search?sourceid=chrome-psyapi2&ion=1&ie=UTF-8&q=cache%3Ahttp%3A%2F%2Frandom.io%2Fcloudflare-play… 1/27

This is Google's cache of http://blog.random.io/cloudflare-plays-unfair/. It is a snapshot of the page as it appeared on Dec 11, 2015 02:12:31 GMT. The current page could have changed in the meantime.

CloudFlare does not play fair

10 December 2015 on cloudflare, sla, madness, support, operator from hell, insane, cloudflare is not fair, marketing win, customer fail, unfair, cloudflair unfair

I've been using CloudFlare services for a few years personally, and about a year ago (at work) signed up for their $200/month Business plan, which comes with a bold 100% SLA. The next plan up (their Enterprise offering) comes with an outrageous 2500% SLA, which is casually explained as 5x customers, 5x traffic = 25 x 100% = 2500% SLA. Brilliant marketing, I've got to admit! Source: https://www.cloudflare.com/plans/

TL;DR

CloudFlare refuses to issue credits for an admitted 7 hour San Jose node/datacenter Outage to a paying Business-plan customer and plays dumb despite its self-imposed 100% SLA.

Communicating with support is infuriating, as they demand error logs, a RayID (which would be present on error pages if the origin is unavailable or another network error occurs), traceroutes, screenshots, and a trace from "domain.com/cdn-cgi/trace" from the user side.. They explain that it's all standard procedure, even though our site is built with AngularJS and a bunch of AJAX requests talking to an API. Not only would a user not see errors from failed AJAX requests, but you've got to be insane to suggest that I'd contact everyone in the SF/SJ area, ask if and when they visited our site and had an issue, and get a screenshot, error response and traceroute from them, which they most certainly saved expecting my phone call.. So that we can get an SLA credit.

...

Every attempt to explain that to CF Support was met with absolute refusal to hear what I'm saying, misquoting, misdirection and various "helpful" suggestions like "to capture CF-RAY headers in your Apache logs follow these instructions.." When I point out that CloudFlare itself would have the data and, come to think of it, should report a tally of errors served per domain, then proactively issue SLA credits without solicitation - I'm told that I can have access to raw logs with an Enterprise level subscription. And I need to provide the information they request so that they know what to look for, despite knowing exactly what we'd be looking for. And they don't keep logs past 48 hours, so it's too late anyway.

See, since CloudFlare serves error pages with diagnostic data in the event of network failures, assuming the error-serving infrastructure is not down itself, all relevant data could and should be captured either offline in log post-processing or asynchronously as errors are being served, per domain, later to be aggregated by customers as well. I am extremely surprised that it's not yet done. And if it is done, at this point I'm not at all surprised it's not presented and used properly.
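To make the idea of a per-domain error tally concrete, here is a minimal sketch in Python. The record layout (domain, data center code, HTTP status) and the sample values are assumptions made up for illustration; nothing here reflects how CloudFlare actually stores or processes its logs.

```python
from collections import Counter

# Hypothetical edge-log records: (domain, data center code, HTTP status).
# The field layout and values are illustrative assumptions only.
SAMPLE_RECORDS = [
    ("app.example.com", "SJC", 522),
    ("app.example.com", "SJC", 200),
    ("app.example.com", "LAX", 200),
    ("api.example.com", "SJC", 521),
    ("api.example.com", "SJC", 522),
]


def tally_errors(records):
    """Count 5xx responses per (domain, data center) pair."""
    errors = Counter()
    for domain, colo, status in records:
        if 500 <= status <= 599:
            errors[(domain, colo)] += 1
    return errors


def error_ratio(records, domain):
    """Fraction of a domain's requests that ended in a 5xx response."""
    total = sum(1 for d, _, _ in records if d == domain)
    failed = sum(1 for d, _, s in records if d == domain and 500 <= s <= 599)
    return failed / total if total else 0.0
```

A tally like this, kept per domain during an incident window, is the kind of number that could back an SLA claim without asking end users for screenshots.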

After countless iterations of "we need data", a number of "this is our final answer" and "no, we cannot escalate this any further", I get a "one-time exception.. $10 credit". Brilliant! Considering the $1600+ that we've paid so far over the course of the year, and that the user acquisition cost marketing pays is 2-10x higher, this should have been offered within the first few rounds. Obviously, $10 wouldn't make much difference in our budget. But it would make a tremendous difference in my opinion of a technically brilliant company whose services I've sold internally. How impressive it would be to get a "no bullshit" ((c) Gandi.net) response to an SLA credit request. Was it worth it for CloudFlare in Support Agent's time?

Actual SLA credit value

10 cents of prorated downtime is what I come up with, based on a $200/month bill and some very generous and crude assumptions. If 5% of traffic gets about 1% downtime per month (7 hours out of roughly 730), about 0.05% of all requests would fail. The 5% traffic assumption comes from counting US-only traffic, where West Coast visitors are responsible for 25% of all US traffic, and the San Jose / San Francisco area node would handle 25% of that (CF node locations map), if traffic were distributed evenly across the geos. That works out to about 6%, call it 5%.

Using CloudFlare's formula, I came up with a ridiculous service credit ratio of about 0.0005, which on a $200 bill is roughly $0.10: 0.0005 ≈ (420 minutes * 0.05) ÷ (43800 minutes in a month - 0 minutes of planned or force majeure downtime)

Service Credit = (Outage Period minutes * Affected Customer Ratio) ÷ Scheduled Availability minutes. Source: https://www.cloudflare.com/business-sla/

Scheduled Availability is the total number of minutes in the month minus any Customer Planned Downtime, and downtime caused by Force Majeure. Source: https://www.cloudflare.com/__mesa/
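The arithmetic above can be checked with a few lines of Python. This is only a sketch of the quoted formula with the post's own inputs (420 minutes, a 5% affected ratio, 43800 minutes in a month). Multiplying the resulting ratio by the $200 monthly fee is an assumption about how the credit is turned into dollars; the quoted SLA text does not spell that step out. The function names are made up for the example.

```python
MINUTES_PER_MONTH = 43_800  # figure used in the post: 365 * 24 * 60 / 12


def service_credit_ratio(outage_minutes, affected_ratio,
                         minutes_in_month=MINUTES_PER_MONTH,
                         excluded_minutes=0):
    """Quoted formula: (outage minutes * affected ratio) / scheduled availability."""
    # Scheduled availability = month minus planned and force majeure downtime.
    scheduled = minutes_in_month - excluded_minutes
    return (outage_minutes * affected_ratio) / scheduled


def credit_in_dollars(ratio, monthly_fee):
    """Assumption: the ratio is applied to the monthly fee to get a dollar credit."""
    return ratio * monthly_fee


ratio = service_credit_ratio(outage_minutes=420, affected_ratio=0.05)
credit = credit_in_dollars(ratio, monthly_fee=200)
print(f"ratio={ratio:.6f} credit=${credit:.2f}")  # ratio=0.000479 credit=$0.10
```

With these inputs the ratio comes out just under 0.0005, and the dollar figure rounds to ten cents.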

All aboard the coo-coo train! What use is a 100% SLA if, even during an admitted Outage, the customer needs to provide unreasonable proof? Might as well say you've got a 2500% SLA. Oh, wait! You do!

marketing #win customer support #fail

Entire conversation with CloudFlare

Anastas Dancha November 18, 2015 15:55

Hello,

Some of our customers on the west coast reported an issue accessing our application around 8am this morning. Later, we noticed a San Jose location outage reported on CloudFlareStatus.com. Should we be expecting a refund/credit applied to our account for this 7 hour outage?

Regards, Anastas Dancha

Lyn November 19, 2015 13:25

Hello, thank you for contacting CloudFlare.

Do you have any more details on how this reported issue would have affected the application? i.e. what was the application? The website?

Thanks for any details so we can check if the two were related and if any credits should be applied.

Investigating Network Related Issues in San Jose
Resolved - This incident has been resolved. Nov 18, 15:37 UTC
Monitoring - A fix has been implemented and we are monitoring the results. Nov 18, 15:20 UTC
Investigating - investigating issues in San Jose Nov 18, 08:12 UTC

Regards

Anastas Dancha November 19, 2015 17:36

A few users reported errors accessing the application, where an API call would not go through, resulting in a "blank" graph, which otherwise would display a portfolio comparison with a sponsored mutual fund / portfolio, driving the user to the fund page or our main product page, hopefully resulting in a new sign-up.

Jameson S. November 23, 2015 16:45

This outage would have been resulting in 52X errors, not blank pages. Do you have any ray IDs or logs that would associate this issue with your specific issue? I am not seeing anything correlating a relationship here.

Kind Regards, Jameson | TSE | CloudFlare

Anastas Dancha November 24, 2015 17:17

As per the outage's description, it would cause intermittent connectivity issues. We have an AngularJS App that makes AJAX calls to a remote API. The responses are not exposed to the user. Hence, no RayID would be visible to the end-user. When certain requests fail, our app is sometimes able to return a user friendly message, while other failed requests will produce a blank graph or a page with missing elements, as was the case with the customer reporting it. For every customer reporting the issue, there are a hundred who are not reporting it (not a fact, research required). Site issues affect potential clients and customers, business partners, future employees coming to check out the marketing/corporate site, etc.. Do I really need to make this case? CloudFlare admittedly had an Outage. Not just a five minute outage, but a 7 hour Outage.. Okay, things happen, nothing/no-one is perfect. CloudFlare is also advertising its commitment to a 100% SLA. Deal with it. Do the right thing..

Jameson S. November 27, 2015 21:07

We're happy to do the right thing, but we don't have any correlating ray IDs or evidence from your own application logs to indicate this was a related event. Happy to help, but need more data than speculation before we can be sure your errors were due to CloudFlare connectivity. Your application doesn't perform intelligent error logging?

Kind Regards, Jameson | TSE | CloudFlare

Anastas Dancha November 30, 2015 18:03

I'll see what I can find.. But again, it's a frontend application running in the user's browser. There is only so much you can do when browser-side AngularJS talks to an API and that API is unavailable due to a network issue: there is no backend event - I hope you can see how that might happen, despite "intelligent error logging". It's tricky to show unreceived traffic. Especially when a single poor impression means a person making a decision on whether to integrate with our product or not might say "Fuck it, their shit is not working". The attached Google Analytics screenshot showing a comparison of the Day of the Outage vs the same day last week is the best I can do. cl.png (60 KB)

Martijn Gonlag December 01, 2015 01:49

Hi Anastas,

None of this information helps us identify the root cause of the issue, and whether it was related to an outage on our services.

As we noted, we're happy to do the right thing, but without any information that indicates this was indeed related to our location we cannot offer any SLA credit. Please note that we have many different locations throughout the world, so only a portion of visitors that hit the SJC data center would have been affected due to any issues on our end, but from the sound of it, this was global.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Anastas Dancha December 02, 2015 17:37

You are asking me to produce something impossible. CloudFlare does not (as far as I know / please correct me if I'm wrong) provide a list of locations with IPs and a way to make a request to our CL enabled site via a specific CL location... then how am I supposed to monitor all the locations? I don't see how anyone of our size would have a system that polls the site from multiple geos, captures the entire response and makes results available. Basically, you are telling me that we need to pay for a watchmen service.. yet, who watches the watchmen. But hey, why not say that you've got a 2500% uptime guarantee.. oh, wait.. you actually do!

Martijn Gonlag Saturday at 22:20

Hi Anastas,

While you are technically correct that our IP addresses are not unique per location due to the fact that we use Anycast, we do offer other ways to track this. More specifically, you would want to add the CF-RAY header to your logging schema to track what locations requests are coming from:

Adding the CF-RAY header to your logs

Best Regards, Martijn Gonlag CloudFlare | Support Engineer
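As a rough illustration of what logging the CF-RAY header could buy an origin server, the sketch below pulls the data center code out of header values and counts requests per location. It assumes the header looks like a request ID followed by a hyphen and a short location code (for example "...-SJC"); the sample values and function names are invented for the example. Note that it only covers requests that actually reached the origin, which is exactly the limitation raised in the next message.

```python
from collections import Counter


def colo_from_cf_ray(cf_ray):
    """Return the location code from a CF-RAY value, or None if it has none.

    Assumes the format "<request id>-<location code>".
    """
    ray_id, sep, colo = cf_ray.strip().rpartition("-")
    if not sep or not ray_id or not colo:
        return None
    return colo.upper()


def requests_per_colo(cf_ray_values):
    """Count logged requests per location code, skipping unparsable values."""
    counts = Counter()
    for value in cf_ray_values:
        colo = colo_from_cf_ray(value)
        if colo is not None:
            counts[colo] += 1
    return counts


# Made-up header values as they might appear in an origin access log.
SAMPLE_HEADERS = ["230b030023ae2822-SJC", "230b030023ae2823-sjc",
                  "230b030023ae2824-LAX", "malformed"]
```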

Anastas Dancha Monday at 01:06

Great suggesting. However it would not be possible to capture the logs of requests that are not making to our servers that easily. Since those RayID errors would be returned to the users. I suppose, if CloudFlare self-report how many times error pages were served on our domain, we would now. Come to think of it, CF is the best position to answer the question of the about of customer impact of any outage with some per domain error stats. Assuming the error pages serving infrastructure itself if operational. I don't know why we didn't talk about you digging the proof of how insignificant of the impact this outage had on our traffic. The again, a single poor experience for the right person for the company of our size could be immeasurable.

-Anastas

Anastas Dancha Monday at 01:09

Pardon the typos. I can clarify if something is too ambiguous.

Martijn Gonlag Monday at 16:41

Hi Anastas,

We do not keep access or error logs for customers other than for those at the Enterprise level. While we can dig through our error logs, we will need to know what we're looking for to identify - for which you didn't provide any information.

If this is data that you absolutely need, you may want to consider upgrading, but afraid there is nothing else we can do for you at this time without actual information that indicates any issues.

Please note that we would display specific errors to your customers if this was caused by our end, and to this you specifically said they were seeing whitepages.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Anastas Dancha Tuesday at 10:20

The whole reason I'm mentioning the logs is because you seem to think that we ought to have logs for requests we didn't get. It is not "the data I absolutely need", it is what you unreasonably demand of us as proof of our sites being affected by the Outage. As for the data you'd be looking for in your logs, any 500 class errors for our main domain - *.finmason.com, obviously. Noted that CF would return errors, yet they won't necessarily be displayed to our users. And to that I've specifically said, we had user reports with "blank graphs" or "missing elements", not "whitepages" as you suggest - an enormously huge difference.

I understand you have many tickets and it might be hard to follow the conversation when you have to context switch as much as you do. However, you misleadingly quoted me as saying something I did not, making it look like it's done on purpose, which might very well not be intentional.

Please escalate this ticket to next-in-command. We're just spinning our wheels here and not going anywhere.

Main points:
1. CF admits to a 7 hour outage at one of the locations.
2. CF demands logs showing failed requests and/or RayIDs from error pages.
3. How are we expected to provide logs for requests that are never forwarded to our servers due to the Outage? It's a Catch-22 kind of problem.

Martijn Gonlag Tuesday at 19:43

Hi Anastas,

I'm sorry to hear you're unsatisfied with the handling of this ticket, but I am afraid we have already given our final answer, and we will not be able to provide any SLA credit as we cannot clearly show that this was solely caused by the outage in SJC. Please note that we have over 60 data centers world wide - unless your visitors were all in the San Jose area, they would hit multiple data centers closer to their own region. One single data center being offline would not cause your entire website to be unavailable, which is why we have asked - again and again - for you to provide more information or anything that can point us at this. If you're looking to keep logs on CloudFlare's end, this is something we offer as part of our Enterprise plan, which would give you full control over all traffic passing through our edge whether it be errors or access logs. But we do not have this available at lower tiers.

Best Regards,

Martijn Gonlag CloudFlare | Support Engineer

Martijn Gonlag Tuesday at 19:44

To note - per my previous reply - we also do not have the ability to go back in time to when the incident occurred, so I have no way to check the error logs for that zone. Clearly, this is something that should have been done, and I will ensure that moving forward our support team reviews error logs when these types of requests come through.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Anastas Dancha Yesterday at 17:42

Yes, our visitors are not all concentrated in the San Jose area. We are however targeting the US market and are, in fact, unable to offer our services to non-US users. The only reason I've even noticed the fact of the Outage is the customer complaint received by our CEO. We are a startup company with limited users, so it's important for us to deliver a great experience to every customer. Our CEO forwarded the complaint to me for investigation and, while looking into it, I've found out about the CF Outage.. I cannot provide the logs of failed requests, no one besides CF could. I've found logs of successful requests that the user (who complained) was generating.. See attached. Again, I request to be escalated to the manager. cf_sj_user.log (2 KB)

Martijn Gonlag Yesterday at 17:52

Anastas,

We can go back and forth, but we have already given you an answer, two of which have come from our senior support engineers. We will not be able to provide an SLA because we were unable to determine the cause of the issues with your customers. Please note that you yourself have stated that the error they were seeing was a white page -- this would not happen if CloudFlare had an outage, as you would instead be seeing a CloudFlare branded error (52X most likely). There is no other escalation path available in this case, and our decision is final.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Martijn Gonlag Yesterday at 18:01

Anastas, out of curiosity, what time zone are you in?

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Anastas Dancha Yesterday at 18:12

There is no need to guess what I've said. And there is no need to keep misquoting me.

Blank graph does not equal blank page. Missing elements also does not equal blank page. SLA guarantees mean jack when you refuse to be reasonable and demand proof that is virtually unobtainable, while yourself being in the position to retain, report and present any outage that would trigger SLA related credit proactively. Let me reiterate: CloudFlare should do it proactively and automatically, without customers having to be subjected to procedural abuse. Instead of having your paid customer of 1+ years on the highest non-enterprise plan have this infuriating discussion.

You've got a great marketing department. And generally, outrageous marketing claims tend to bite one's ass when engineering, business and/or support sides are unable or unwilling. If you insist on resolving this case as before, I need you to provide explicit and official documentation on submitting SLA claims that indicates precisely what proof must be submitted.

Anastas Dancha

Martijn Gonlag Yesterday at 18:30

Please note that we're not trying to work against you in any way, but normally we can only provide an SLA credit when there is clear evidence that an issue was caused by an outage on our network.

Please find all information related to the SLA that comes with your Business plan here: https://www.cloudflare.com/business-sla/ Specifically, this is the part that you'll want to review:

Service Credit Claims.

3.1 Company provides this SLA subject to the following terms.

3.2 In order to be eligible to submit a Claim with respect to any Incident, the Customer must first have notified Customer Support of the Incident, using the procedures set forth by Company, within five business days following the Incident.

3.3 To submit a Claim, Customer must contact Customer Support and provide notice of its intention to submit a Claim. Customer must provide to Customer Support all reasonable details regarding the Claim, including but not limited to, detailed descriptions of the Incident(s), the duration of the Incident, network traceroutes, the URL(s) affected and any attempts made by Customer to resolve the Incident.

3.4 In order for Company to consider a Claim, Customer must submit the Claim, including sufficient evidence to support the Claim, by the end of the billing month following the billing month in which the Incident which is the subject of the Claim occurs.

3.5 Company will use all information reasonably available to it to validate Claims and make a good faith judgment on whether the SLA and Service Levels apply to the Claim.

For future cases when you experience technical difficulties that you think are the fault of something on our network, it would be useful to include:

Output from having the customer visit finmason.com/cdn-cgi/trace
Traceroute from affected customer to finmason.com
Screenshots of any errors that were observed, and preferably a copy of the RayID associated with the error

That said, I have reviewed this ticket once more, and I have to admit that we could have handled it better from the start to better assess what occurred here. Therefore, I will make a one-time exception and grant an SLA credit for the full duration of the outage that will be used towards your next payment automatically.

The credit applied to your account totals $10 which is more than we typically grant per the calculation below.

Service Credit = (Outage Period minutes * Affected Customer Ratio) ÷ Scheduled Availability minutes

Please let me know if I can be of any further assistance on this matter at this time.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Anastas Dancha Yesterday at 19:17

I refuse your SLA credit, unless you admit that your request to provide logs for failed requests was unreasonable. Not unless you'll work to put a "proactive SLA credit" feature on the roadmap. Not unless you realise how absurd is the thought of providing the "traceroute from affected customer"; how completely insane is an expectation of "output from having customer visit finmason.com/cdn-cgi/trace" - when we are talking about web traffic from user browsers in the age of AngularJS and AJAX requests. It's completely bonkers!

You've got to realise it soon! Trust me, it's for the best.

I suppose someone must've thought that using "reasonable" in the SLA terms is a nice trick and allows for an endless tautology with Catch-22 style demands. No! You can't operate like that. "Reasonable" means as deemed by the majority of industry leaders. And I assure you, the majority would agree with me.

By the way, by my calculations, the Nov 18th San Jose Outage related SLA credit should be around $0.10 (you can read all about it in an upcoming article).

This ticket should've been resolved with support's first response, with that same $10 credit. And I would be talking to my colleagues and peers about a brilliant service that I've sold to our boss over a year ago. Not just technically brilliant, but a customer-centric "no bullshit" ((c) Gandi) kind of service. Consider the $1600+ paid by us and the lightness of our traffic.. $10 is 5 to 10 times less than your marketing spends on paid user acquisition.

Think about this, pass it to your manager, bring it to the meeting, use it in training material. The worst thing you can do is to close this ticket, check out and learn nothing from this.

Sincerely, Anastas Dancha

Martijn Gonlag Yesterday at 19:28

Anastas,

The information I requested is what we ask from all our customers, including Enterprise customers. If you're unwilling to provide this, and any other information, then you risk that we cannot properly assist and provide an SLA credit, per the agreement you signed when you became a customer.

Please note that I have already stated on multiple occasions that the ticket wasn't handled right, and for that I do apologize. I have also noted that we can no longer retrieve the logs due to the time that has passed since the incident, which prevented me from looking up this information when I took over the ticket.

While I can understand you feel that this should have been resolved on the first response, we do require information from you as a customer, or from your visitors. We don't do magic, and we cannot see a lot of the things you appear to be assuming we can. For example, we only keep error logs for a ~48 hour period. We do not keep any access logs at all, except for at the Enterprise plan. This makes it extremely hard for us to identify issues unless we know what we're looking for with your help.

I will now close this ticket as there is no longer any point to go back and forth. You have the credit you asked for - and I have increased the credit because I do agree with you that the ticket was not handled appropriately.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Anastas Dancha Yesterday at 19:43

Quote:

If you're unwilling to provide this, and any other information, then you risk that we cannot properly assist and provide an SLA credit, per the agreement you signed when you became a customer.

I'm not unwilling! It's impossible! For I might not even know who are our customers when they cannot load the page.. But it's pointless, you're right.

Sadly, you don't seem to get it. For the record, I refuse your $10 SLA credit with the same conditions as stated.

Feel free to close this ticket, I won't respond triggering it to reopen anymore.

Regards, Anastas D

Martijn Gonlag Yesterday at 22:17

Anastas,

Please understand that I am not disputing that the ticket was handled poorly in any way. I agree with you 100%, and I have left internal feedback to prevent this from happening in the future. This is also not how most of our customers experience support interactions, so I apologize that we left a bad taste, and can ensure you that we strive to provide quality support to all our customers.

However, hope you can appreciate that not all issues are as straight forward to troubleshoot, especially when dealing with network related problems. This means that sometimes we have to ask you, or your customers, for additional information so that we can better troubleshoot the issue. During incidents, this also allows us to determine whether the issue is related to the incident, or otherwise. Without this information we're guessing at best, because error logs alone are, in most cases, not enough to resolve issues with.

As an example, customers may be experiencing a network issue, but only on one single ISP. We would only be able to determine this with information from the customer, which would include a traceroute or MTR that may show us the faulty routes. In addition to that, the CDN trace shows us what specific location and metal that was hit, as well as the client IP in case we have to run any tests back to the client from our network.

This is part of our standard troubleshooting, so we appreciate if this type of information can be included in requests to ensure we can swiftly resolve issues. We ask this of all our customers, from Free, to Enterprise, and is not specific to just your case or us trying to dodge blame. If there is an issue with our servers, we're more than happy to acknowledge. But only if we are 100% certain it is.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer
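For readers unfamiliar with the "CDN trace" mentioned above: as far as I understand it, the /cdn-cgi/trace endpoint returns plain text with one key=value pair per line, including the location that served the request and the client IP. The sketch below parses such output in Python. The sample text is made up for illustration (the values are not real), and the exact set of keys the endpoint returns is an assumption.

```python
def parse_trace(text):
    """Parse key=value lines (as returned by /cdn-cgi/trace) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore blank or malformed lines
            fields[key.strip()] = value.strip()
    return fields


# Illustrative sample only; keys and values are assumptions, not captured output.
SAMPLE_TRACE = """\
h=example.com
ip=203.0.113.7
ts=1447834320.123
colo=SJC
"""

trace = parse_trace(SAMPLE_TRACE)
print(trace["colo"], trace["ip"])  # SJC 203.0.113.7
```

The catch raised throughout this ticket still applies: the output has to be collected in the affected visitor's browser at the time of the problem.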

Anastas Dancha Today at 14:16

> Please understand that I am not disputing that the ticket was handled poorly in any way. I agree with you 100%, and I have left internal feedback to prevent this from happening in the future.

Yes, it was handled poorly, by every team member and every time you've provided a "final answer" or suggested that I'm not providing data you oh so reasonably request because I'm unwilling. I can't tell which part of what I said are you agreeing with? Any chance it's the part where I ask you to consider how you have a specific 7 hour Outage and expect me to provide logs showing you traceroutes from our affected customers? It's not like I can call every person in San Francisco and ask them: "Any chance you've visited our site Nov. 18th between the hours of 8am and 3:30pm UTC? Can I have some screenshots, traceroutes and server responses for AJAX requests our AngularJS web-app made that you've most certainly saved expecting my call?".

> This is also not how most of our customers experience support interactions, so I apologize that we left a bad taste, and can ensure you that we strive to provide quality support to all our customers.

Our entire conversation paints a picture of institutional tactic of evasive maneuvering, where each support member participating in conversation followed, what appears to be, an approved procedure of misdirection to resolve these types of requests.

> However, hope you can appreciate that not all issues are as straight forward to troubleshoot, especially when dealing with network related problems. This means that sometimes we have to ask you, or your customers, for additional information so that we can better troubleshoot the issue. During incidents, this also allows us to determine whether the issue is related to the incident, or otherwise. Without this information we're guessing at best, because error logs alone are, in most cases, not enough to resolve issues with.

As someone who's been in the industry for some time now, I'm aware of the difficulties in troubleshooting peculiar network issues; where only by having access to detailed diagnostics output might help determine the root cause. Networks are hard. Combine that with configuration, software, and OS tweak - it's a cat farm. Not in this case though. Cause is not a mystery here. CloudFlare is in the best position to collect all necessary information during an outage. Keeping per-domain tally of served error pages, either derived from logs in post-processing or with asynchronous "bean-counter" - is the only way to determine per-customer per-domain impact of any outage.. and issue unsolicited SLA credits, proactively.

> As an example, customers may be experiencing a network issue, but only on one single ISP. We would only be able to determine this with information from the customer, which would include a traceroute or MTR that may show is the faulty routes. In addition to that, the CDN trace shows us what specific location and metal that was hit, as well as the client IP in case we have to run any tests back to the client from our network.

I appreciate how reasonable this (and other) sentence looks in a vacuum. And how it's a complete misdirection, for this example is not applicable here. We are not talking about some obscure network issue I might or might not have discovered. We are discussing a self-imposed 100% SLA obligation in the context of a specific and admitted 7 hour Outage that affected major metropolitan area; arguably, the most important geo for startups and high-tech. And unlike other outages, I see no mention of "traffic have been redirected via alternative location".

> This is part of our standard troubleshooting, so we appreciate if this type of information can be included in requests to ensure we can swiftly resolve issues. We ask this of all our customers, from Free, to Enterprise, and is not specific to just your case or us trying to dodge blame. If there is an issue with our servers, we're more than happy to acknowledge. But only if we are 100% certain it is.

Having a standard, doesn't make it right. And this bring me back to my point of "it cannot be allowed to remain a standard". When Amazon or Digital Ocean is having an outage with 1 of 100 machines affected, they don't require a proof showing how users hit that server and it resulted in an error. There are some differences between paying hourly/month for VPS versus CDN, granted. But in the context of an Outage, it's the same. Whether I had 1 user hit the site and get an error or 1k, or 1m. Downtime is downtime. We have a 100% certainty in the fact that there was no mention of root cause or any details explaining the nature of the outage, what services were affected. There is no way to sign up for alerts, as far as I know. Only limited functionality StatusPage.io that's dry on details. Did you know that StatusPage.io can be customized to allow users to subscribe to updates via email, SMS, Twitter, iCalendar (!), webhook? That functionality is disabled on CloudFlareStatus.com. I ponder the reasons.
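The per-domain tally of served error pages argued for in this reply could look roughly like the following. It is only a minimal sketch: the `EdgeRecord` fields, the sample records and `example.com` are hypothetical stand-ins, not CloudFlare's actual log schema or tooling.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class EdgeRecord(NamedTuple):
    """Hypothetical edge log record; not CloudFlare's real schema."""
    domain: str     # customer domain the request was for
    location: str   # edge location code, e.g. "SJC"
    status: int     # HTTP status returned to the visitor


def tally_edge_errors(records: Iterable[EdgeRecord]) -> Counter:
    """Count 5xx responses served per (domain, location) pair."""
    tally = Counter()
    for rec in records:
        if 500 <= rec.status <= 599:
            tally[(rec.domain, rec.location)] += 1
    return tally


# Invented sample data, for illustration only.
sample = [
    EdgeRecord("example.com", "SJC", 522),
    EdgeRecord("example.com", "SJC", 200),
    EdgeRecord("example.com", "SJC", 521),
    EdgeRecord("example.com", "BOS", 200),
]

errors = tally_edge_errors(sample)
for (domain, location), count in sorted(errors.items()):
    print(f"{domain} @ {location}: {count} error responses")
```

A counter like this only needs the status code, domain and location at the moment the error page is served, so in principle it would not depend on retaining full access logs.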

cloudflare_nov18_outage_sla_madness.md hosted with ❤ by GitHub

Anastas Dancha November 18, 2015 15:55

Hello,

Some of our customers on the west coast reported an issue accessing out application around 8am this morning, Later, we noticed San Jose location outage reported on CloudFlareStatus.com. Should we be expecting a refund/credit applied to our account for this 7 hour outage?

Regards, Anastas Dancha

Lyn November 19, 2015 13:25

Hello, thank you for contacting CloudFlare.

Do you have any more details on how this reported issue would have affected the application? ie what was the application? The website?

Thanks for any details so we can check if the two were related and if any credits should be applied.

Investigating Network Related Issues in San Jose
Resolved - This incident has been resolved. Nov 18, 15:37 UTC
Monitoring - A fix has been implemented and we are monitoring the results. Nov 18, 15:20 UTC
Investigating - investigating issues in San Jose Nov 18, 08:12 UTC

Regards

Anastas Dancha November 19, 2015 17:36

Few users reported errors accessing application, where an API call would no go though, resulting in "blank" graph. Which otherwise would display portfolio comparison with sponsored mutual fund / portfolio, driving user to fund page or our main product page, hopefully resulting in new sign-up.

Jameson S. November 23, 2015 16:45

This outage would have been resulting in 52X errors, not blank pages. Do you have any ray IDs or logs that would associate this issue with your specific issue? I am not seeing anything correlating a relationship here.

Kind Regards, Jameson | TSE | CloudFlare

Anastas Dancha November 24, 2015 17:17

As per outage's description, it would cause intermittent connectivity issues. We have an AngularJS App that makes AJAX calls to remote API. The responses are not exposed to the user. Hence, no RayID would be visible to the end-user. When certain requests fail, out app is sometimes able to return a user friendly message. While other failed requests will produce a blank graph or a page with missing elements, as was the case with the customer reporting it. For every customer reporting the issue, there are hundred who are not reporting it. (not a fact, research required). Site issues affect potential clients and customers, business partners, future employees coming to check out the marketing/corporate site, etc.. Do I really need to make this case? CloudFlare admittedly had an Outage. Not just a five minute outage, but a 7 hours Outage.. Okay, things happen, nothing/no-one is perfect. CloudFlare is also advertising it's commitments to 100% SLA. Deal with it. Do the right thing..

Jameson S. November 27, 2015 21:07

We're happy to do the right thing, but we don't have any correlating ray IDs or evidences from your own application logs to indicate this was a related event. Happy to help, but need more data than speculation before we can be sure your errors were due to CloudFlare connectivity. Your application doesn't perform intelligent error logging?

Kind Regards, Jameson | TSE | CloudFlare

Anastas Dancha November 30, 2015 18:03

I'll see what I can find.. But again, it's a frontend application running in user's browser. There is only so much you can do when browser-side AngularJS talks to an API and that API is unavailable due to network issue, there is no backend event - I hope you can see how that might happen, despite "intelligent error logging". It's tricky to show unreceived traffic. Especially when a single poor impression and person making a decision on whether to integrate with our product or not might say "Fuck it, their shit is not working". The attached Google Analytics screenshot showing comparison of the Day of the Outage vs same day last week is the best I can do.

cl.png (60 KB)

Martijn Gonlag December 01, 2015 01:49

Hi Anastas,

None of this information helps us identify the root cause of the issue, and whether it was related to an outage on our services.

As we noted, we're happy to do the right thing, but without any information that indicates this was indeed related to our location we cannot offer any SLA credit. Please note that we have many different locations throughout the world, so only a portion of visitors that hit the SJC data center would have been affected due to any issues on our end, but from the sound of it, this was global.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Anastas Dancha December 02, 2015 17:37

You are asking to produce something impossible. CloudFlare does not (as far as I know / please correct me if I'm wrong) provide list of locations with IPs an a way to make a request to our CL enabled site via specific CL location... then how am I suppose to monitor all the locations? I don't see how anyone of our size would have a system that polls the site from multiple geos, captures the entire response and makes results available. Basically, you are telling me that we need to pay for watchmen service.. yet, who watches the watchmen. But hey, why not say that you got 2500% uptime guarantee.. oh, wait.. you actually do!

Martijn Gonlag Saturday at 22:20

Hi Anastas,

While you are technically correct that our IP addresses are not unique per location due to the fact that we use Anycast, we do offer other ways to track this. More specifically, you would want to add the CF-RAY header to your logging schema to track what locations requests are coming from:

Adding the CF-RAY header to your logs

Best Regards, Martijn Gonlag CloudFlare | Support Engineer
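Once the CF-RAY header is in the origin logs, the location can be read off its value. Below is a minimal sketch, assuming the value has the form `<ray id>-<location code>`; the sample values are invented, and both function names are hypothetical helpers rather than anything CloudFlare provides.

```python
from collections import Counter
from typing import Iterable, Optional


def ray_location(cf_ray: str) -> Optional[str]:
    """Return the trailing location code of a CF-RAY value, or None if absent."""
    ray_id, sep, location = cf_ray.strip().rpartition("-")
    if not sep or not ray_id or not location:
        return None
    return location.upper()


def requests_per_location(cf_ray_values: Iterable[str]) -> Counter:
    """Count logged requests per edge location code."""
    counts = Counter()
    for value in cf_ray_values:
        counts[ray_location(value) or "unknown"] += 1
    return counts


# Invented CF-RAY values, for illustration only.
logged = ["2f1a3b4c5d6e7f80-SJC", "2f1a3b4c5d6e7f81-SJC", "2f1a3b4c5d6e7f82-BOS", "garbage"]
print(requests_per_location(logged))
```

Note that this only covers requests that actually reached the origin server, so it cannot count requests that failed at the edge during an outage.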

Anastas Dancha Monday at 01:06

Great suggesting. However it would not be possible to capture the logs of requests that are not making to our servers that easily. Since those RayID errors would be returned to the users. I suppose, if CloudFlare self-report how many times error pages were served on our domain, we would now. Come to think of it, CF is the best position to answer the question of the about of customer impact of any outage with some per domain error stats. Assuming the error pages serving infrastructure itself if operational. I don't know why we didn't talk about you digging the proof of how insignificant of the impact this outage had on our traffic. The again, a single poor experience for the right person for the company of our size could be immeasurable.

-Anastas

Anastas Dancha Monday at 01:09

Pardon the typos. I can clarify if something is too ambiguous.

Martijn Gonlag Monday at 16:41

Hi Anastas,

We do not keep access or error logs for customers other than for those at the Enterprise level. While we can dig through our error logs, we will need to know what we're looking for to identify - for which you didn't provide any information.

If this is data that you absolutely need, you may want to consider upgrading, but afraid there is nothing else we can do for you at this time without actual information that indicates any issues.

Please note that we would display specific errors to your customers if this was caused by our end, and to this you specifically said they were seeing whitepages.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Anastas Dancha Tuesday at 10:20

The whole reason I'm mentioning the logs, is because you seem to think that we ought to have logs for requests we didn't get. It is not "the data I absolutely need", it is what you unreasonably demand of us as proof of out sites being affected by the Outage. As for the data you'd be looking for in your logs, any 500 class errors for our main domain - *.finmason.com, obviously. Noted that CF would return errors, yet they won't necessarily be displayed to out users. And to that I've specifically said, we had user reports with "blank graphs" or "missing elements", not a "whitepages" as you suggest - an enormously huge difference.

I understand you have many tickets and it might be hard to follow the conversation when have to context switch as much as you do. However, you misleadingly quoted me as saying something I did not, making it look like it's done on purpose, which it might very well not be intentional.

Please escalate this ticket to next-in-command. We're just spinning our wheels here and not going anywhere.

Main points:

1. CF admits to 7 hour outage at one of the locations.
2. CF demands logs showing failed requests and/or RayIDs from error pages
3. How are we I expected to provide logs for requests that are never forwarded to our servers due to Outage? It's a Catch-22 kind of a problem.

Martijn Gonlag Tuesday at 19:43

Hi Anastas,

I'm sorry to hear you're unsatisfied with the handling of this ticket, but I am afraid we have already given our final answer, and we will not be able to provide any SLA credit as we cannot clearly show that this was solely caused by the outage in SJC. Please note that we have over 60 data centers world wide - unless your visitors were all in the San Jose area, they would hit multiple data centers closer to their own region. One single data center being offline would not cause your entire website to be unavailable, which is why we have asked -again and again- for you to provide more information or anything that can point us at this. If you're looking to keep logs on CloudFlare's end, this is something we offer as part of our Enterprise plan, which would give you full control over all traffic passing through our edge whether it be errors or access logs. But we do not have this available at lower tiers.

Best Regards,

Martijn Gonlag CloudFlare | Support Engineer

Martijn Gonlag Tuesday at 19:44

To note - per my previous reply - we also do not have the ability to go back in time to when the incident occurred, so I have no way to check the error logs for that zone. Clearly, this is something that should have been done, and I will ensure that moving forward our support team reviews error logs when these types of requests come through.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Anastas Dancha Yesterday at 17:42

Yes, out visitors are not all concentrated in San Jose area. We are however targeting US market and are, in fact, unable to offer our services to non-US users. The only reason I've even noticed the fact of the Outage, is due to the customer complain, received by our CEO. We are a startup company with limited users, so it's important for us to deliver great experience to every customer. Our CEO forwarded the complaint to me for investigation and while looking into it, I've found out about CF Outage.. I cannot provide the logs of failed requests, no one besides CF could. I've found logs of successful requests that user (who complained) was generating.. See attached. Again, I request to be escalated to the manager.

cf_sj_user.log (2 KB)

Martijn Gonlag Yesterday at 17:52

Anastas,

We can go back and forth, but we have already given you an answer, two of which have come from our senior support engineers. We will not be able to provide an SLA because we were unable to determine the cause of the issues with your customers. Please note that you yourself have stated that the error they were seeing was a white page -- this would not happen if CloudFlare had an outage, as you would instead be seeing a CloudFlare branded error (52X most likely). There is no other escalation path available in this case, and our decision is final.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Martijn Gonlag Yesterday at 18:01

Anastas, out of curiosity, what time zone are you in?

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Anastas Dancha Yesterday at 18:12

There is no need to guess what I've said. And there is no need to keep misquoting me.

Blank graph does not equal blank page. Missing elements also does not equal blank page. SLA guarantees mean jack when you refuse to be reasonable and demand proof that is virtually unobtainable, while yourself being in the position to retain, report and present any outage that would trigger SLA related credit proactively. Let me reiterate: CloudFlare should do it proactively and automatically, without customers having to be subjected to procedural abuse. Instead of having you paid customer of 1+ years on highest non-enterprise plan have this infuriating discussion.

You've got a great marketing department. And generally, outrageous marketing claims tend to bite one's ass, when engineering, business and/or support sides are unable or unwilling. If you insist on resolving this case as before, I need you to provide explicit and official documentation on submitting SLA claims, that indicate precisely what proof must be submitted.

Anastas Dancha

Martijn Gonlag Yesterday at 18:30

Please note that we're not trying to work against you in any way, but normally we can only provide an SLA credit when there is clear evidence that an issue was caused by an outage on our network.

Please find all information related to the SLA that comes with your Business plan here: https://www.cloudflare.com/business-sla/ Specifically, this is the part that you'll want to review:

Service Credit Claims.

3.1 Company provides this SLA subject to the following terms.

3.2 In order to be eligible to submit a Claim with respect to any Incident, the Customer must first have notified Customer Support of the Incident, using the procedures set forth by Company, within five business days following the Incident.

3.3 To submit a Claim, Customer must contact Customer Support and provide notice of its intention to submit a Claim. Customer must provide to Customer Support all reasonable details regarding the Claim, including but not limited to, detailed descriptions of the Incident(s), the duration of the Incident, network traceroutes, the URL(s) affected and any attempts made by Customer to resolve the Incident.

3.4 In order for Company to consider a Claim, Customer must submit the Claim, including sufficient evidence to support the Claim, by the end of the billing month following the billing month in which the Incident which is the subject of the Claim occurs.

3.5 Company will use all information reasonably available to it to validate Claims and make a good faith judgment on whether the SLA and Service Levels apply to the Claim.

For future cases when you experience technical difficulties that you think are the fault of something on our network, it would be useful to include:

- Output from having the customer visit finmason.com/cdn-cgi/trace
- Traceroute from affected customer to finmason.com
- Screenshots of any errors that were observed, and preferably a copy of the RayID associated with the error

That said, I have reviewed this ticket once more, and I have to admit that we could have handled it better from the start to better assess what occurred here. Therefore, I will make a one-time exception and grant an SLA credit for the full duration of the outage that will be used towards your next payment automatically.

The credit applied to your account totals $10 which is more than we typically grant per the calculation below.

Service Credit = (Outage Period minutes * Affected Customer Ratio) ÷ Scheduled Availability minutes

Please let me know if I can be of any further assistance on this matter at this time.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer
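The service credit formula quoted in this reply can be worked through with the timestamps from the status page (investigating at 08:12 UTC, resolved at 15:37 UTC on Nov 18). The sketch below rests on assumptions that are not in the ticket: that every minute of November counts as scheduled availability, that the resulting fraction is applied to the $200 monthly Business plan fee, and the affected-customer ratios themselves, which are purely illustrative.

```python
from datetime import datetime

MONTHLY_FEE_USD = 200.0  # Business plan price; applying the fraction to it is an assumption


def service_credit_fraction(outage_min: float, affected_ratio: float,
                            scheduled_min: float) -> float:
    """Service Credit = (outage minutes * affected customer ratio) / scheduled minutes."""
    return (outage_min * affected_ratio) / scheduled_min


# Timestamps taken from the status page updates quoted in the ticket (UTC).
started = datetime(2015, 11, 18, 8, 12)
resolved = datetime(2015, 11, 18, 15, 37)
outage = (resolved - started).total_seconds() / 60  # minutes of outage

# Assumption: all minutes of November 2015 are scheduled availability.
scheduled = 30 * 24 * 60

# Hypothetical affected-customer ratios; the real ratio is not given anywhere.
for ratio in (0.05, 0.5, 1.0):
    fraction = service_credit_fraction(outage, ratio, scheduled)
    print(f"ratio {ratio:.2f}: {fraction:.4%} of the month, "
          f"about ${fraction * MONTHLY_FEE_USD:.2f}")
```

Under these assumptions even a ratio of 1.0 yields a credit of only a couple of dollars, which is why the affected-customer ratio dominates the result.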

Anastas Dancha Yesterday at 19:17

I refuse your SLA credit, unless you admit that your request to provide logs for failed request was unreasonable. Not unless you'll work to put "proactive SLA credit" feature on the roadmap. Not unless you realise how absurd is the though of providing the "traceroute from affected customer"; how completely insane is an expectation of "output from having customer visit finmason.com/cdn-cgi/trace" - when we are talking about web traffic from user browsers in the age of AngularJS and AJAX requests. It's completely bonkers!

You got the realise it soon! Trust me, it's for the best.

I suppose, someone must've though that using "reasonable" in the SLA terms is a nice trick and allows for an endless tautology with Catch-22 style demands. No! You can't operate like that. "Reasonable" means as deemed by the majority of industry leaders. And I assure you, majority would agree with me.

By the way, by my calculations, Nov 18th San Jose Outage related SLA credit should be around $0.10 (you can read all about it in an upcoming article).

This ticket should've been resolved with support's first response, with that same $10 credit. And I would be talking to my colleagues and peers about brilliant service, that I've sold to our boss over a year ago. Not just technically brilliant, but customer-centric "no bullshit" ((c) Gandi) kind of service. Consider $1600+ paid by us and the lightness of our traffic.. $10 is 5 to 10 times less than your marketing spends on paid user acquisition.

Think about this, pass it to your manager, bring it to the meeting, use it in training material. Worst thing you can do is to close this ticket, check out and learn nothing from this.

Sincerely, Anastas Dancha

Martijn Gonlag Yesterday at 19:28

Anastas,

The information I requested is what we ask from all our customers, including Enterprise customers. If you're unwilling to provide this, and any other information, then you risk that we cannot properly assist and provide an SLA credit, per the agreement you signed when you became a customer.

Please note that I have already stated on multiple occasions that the ticket wasn't handled right, and for that I do apologize. I have also noted that we can no longer retrieve the logs due to the time that has passed since the incident, which prevented me from looking up this information when I took over the ticket.

While I can understand you feel that this should have been resolved on the first response, we do require information from you as a customer, or from your visitors. We don't do magic, and we cannot see a lot of the things you appear to be assuming we can. For example, we only keep error logs for a ~48 hour period. We do not keep any access logs at all, except for at the Enterprise plan. This makes it extremely hard for us to identify issues unless we know what we're looking for with your help.

I will now close this ticket as there is no longer any point to go back and forth. You have the credit you asked for - and I have increased the credit because I do agree with you that the ticket was not handled appropriately.

Best Regards,

Martijn Gonlag CloudFlare | Support Engineer

Anastas Dancha Yesterday at 19:43

Quote:

> If you're unwilling to provide this, and any other information, then you risk that we cannot properly assist and provide an SLA credit, per the agreement you signed when you became a customer.

I'm not unwilling! It's impossible! For I might not even know who are our customers when they cannot load the page.. But it's pointless, you're right.

Sadly, you don't seem to get it. For the record, I refuse your $10 SLA credit with same conditions as stated.

Feel free to close this ticket, I won't respond triggering it to reopen anymore.

Regards, Anastas D

Martijn Gonlag Yesterday at 22:17

Anastas,

Please understand that I am not disputing that the ticket was handled poorly in any way. I agree with you 100%, and I have left internal feedback to prevent this from happening in the future. This is also not how most of our customers experience support interactions, so I apologize that we left a bad taste, and can ensure you that we strive to provide quality support to all our customers.

However, hope you can appreciate that not all issues are as straight forward to troubleshoot, especially when dealing with network related problems. This means that sometimes we have to ask you, or your customers, for additional information so that we can better troubleshoot the issue. During incidents, this also allows us to determine whether the issue is related to the incident, or otherwise. Without this information we're guessing at best, because error logs alone are, in most cases, not enough to resolve issues with.

As an example, customers may be experiencing a network issue, but only on one single ISP. We would only be able to determine this with information from the customer, which would include a traceroute or MTR that may show is the faulty routes. In addition to that, the CDN trace shows us what specific location and metal that was hit, as well as the client IP in case we have to run any tests back to the client from our network.

This is part of our standard troubleshooting, so we appreciate if this type of information can be included in requests to ensure we can swiftly resolve issues. We ask this of all our customers, from Free, to Enterprise, and is not specific to just your case or us trying to dodge blame. If there is an issue with our servers, we're more than happy to acknowledge. But only if we are 100% certain it is.

Best Regards, Martijn Gonlag CloudFlare | Support Engineer

Anastas Dancha Today at 14:16

> Please understand that I am not disputing that the ticket was handled poorly in any way. I agree with you 100%, and I have left internal feedback to prevent this from happening in the future.

Yes, it was handled poorly, by every team member and every time you've provided a "final answer" or suggested that I'm not providing data you oh so reasonably request because I'm unwilling. I can't tell which part of what I said are you agreeing with? Any chance it's the part where I ask you to consider how you have a specific 7 hour Outage and expect me to provide logs showing you traceroutes from our affected customers? It's not like I can call every person in San Francisco and ask them: "Any chance you've visited our site Nov. 18th between the hours of 8am and 3:30pm UTC? Can I have some screenshots, traceroutes and server responses for AJAX requests our AngularJS web-app made that you've most certainly saved expecting my call?".

> This is also not how most of our customers experience support interactions, so I apologize that we left a bad taste, and can ensure you that we strive to provide quality support to all our customers.

Our entire conversation paints a picture of institutional tactic of evasive maneuvering, where each support member participating in conversation followed, what appears to be, an approved procedure of misdirection to resolve these types of requests.

> However, hope you can appreciate that not all issues are as straight forward to troubleshoot, especially when dealing with network related problems. This means that sometimes we have to ask you, or your customers, for additional information so that we can better troubleshoot the issue. During incidents, this also allows us to determine whether the issue is related to the incident, or otherwise. Without this information we're guessing at best, because error logs alone are, in most cases, not enough to resolve issues with.

As someone who's been in the industry for some time now, I'm aware of the difficulties in troubleshooting peculiar network issues; where only by having access to detailed diagnostics output might help determine the root cause. Networks are hard. Combine that with configuration, software, and OS tweak - it's a cat farm. Not in this case though. Cause is not a mystery here. CloudFlare is in the best position to collect all necessary information during an outage. Keeping per-domain tally of served error pages, either derived from logs in post-processing or with asynchronous "bean-counter" - is the only way to determine per-customer per-domain impact of any outage.. and issue unsolicited SLA credits, proactively.

> As an example, customers may be experiencing a network issue, but only on one single ISP. We would only be able to determine this with information from the customer, which would include a traceroute or MTR that may show is the faulty routes. In addition to that, the CDN trace shows us what specific location and metal that was hit, as well as the client IP in case we have to run any tests back to the client from our network.

I appreciate how reasonable this (and other) sentence looks in a vacuum. And how it's a complete misdirection, for this example is not applicable here. We are not talking about some obscure network issue I might or might not have discovered. We are discussing a self-imposed 100% SLA obligation in the context of a specific and admitted 7 hour Outage that affected major metropolitan area; arguably, the most important geo for startups and high-tech. And unlike other outages, I see no mention of "traffic have been redirected via alternative location".

> This is part of our standard troubleshooting, so we appreciate if this type of information can be included in requests to ensure we can swiftly resolve issues. We ask this of all our customers, from Free, to Enterprise, and is not specific to just your case or us trying to dodge blame. If there is an issue with our servers, we're more than happy to acknowledge. But only if we are 100% certain it is.

Having a standard, doesn't make it right. And this bring me back to my point of "it cannot be allowed to remain a standard". When Amazon or Digital Ocean is having an outage with 1 of 100 machines affected, they don't require a proof showing how users hit that server and it resulted in an error. There are some differences between paying hourly/month for VPS versus CDN, granted. But in the context of an Outage, it's the same. Whether I had 1 user hit the site and get an error or 1k, or 1m. Downtime is downtime. We have a 100% certainty in the fact that there was no mention of root cause or any details explaining the nature of the outage, what services were affected. There is no way to sign up for alerts, as far as I know. Only limited functionality StatusPage.io that's dry on details. Did you know that StatusPage.io can be customized to allow users to subscribe to updates via email, SMS, Twitter, iCalendar (!), webhook? That functionality is disabled on CloudFlareStatus.com. I ponder the reasons.

Anastas Dancha

Boston, MA http://random.io
