Outage / Image problems — 13-14 November 2012 [As it happened updates]

The dotMailer platform is up and functioning normally. This follows an earlier outage on Tuesday 13 Nov 2012 and subsequent related problems on Wednesday 14 Nov 2012.

A post-incident report is available.

We're currently experiencing some problems with the network. Among other services, dotMailer is currently not accessible.

Our team are currently looking into it and will post updates on Twitter using the account @dotMailer and the hashtag #dMmaint.

November 13, update 15:48
As you may have seen from Twitter, we have our London and Croydon teams working hard on getting the application back up and running. We estimate that the service will be back up and running within the hour.
Check back here or on Twitter for updates.
November 13, update 16:06
The API is back and working, the application is working; we're doing some final tests before we can give you the all clear.
November 13, update 16:53
Everything is back online, though sends are running slowly. We should be able to give you an 'all clear' very shortly.
November 13, update 17:03
Our team in the data centre have given us the all clear. You can now log in and get sending.
We'll be monitoring for the rest of the night; contact us on support@dotmailer.com if you have any problems.
November 14, update 9:58
As a result of yesterday's outage, we appear to have problems serving and uploading images. This is having a knock-on effect with other areas of the system, so we've paused sends whilst we investigate.
November 14, update 11:15
Sends are still paused while we work on restoring our image system. More details will be posted here as they become available. Our support team are on hand on support@dotmailer.com if you need help with any other parts of the system.
Images in dotSurvey / dotMailer Survey are unaffected.
November 14, update 11:35
Images uploaded before Tuesday (Nov 13) have now been restored to all sent campaigns; some recently uploaded images may still be unavailable.
Sends continue to be paused, but will be re-enabled as soon as possible.
November 14, update 12:20
The platform is now back up and running. You can now use dotMailer and we will continue to test and monitor.
All campaigns paused during the incident have now been, or are being, sent.
Any images uploaded during a small window around the incident will continue to be unavailable (images uploaded on or before Monday are unaffected); we will be working to restore these images where possible.
If you have uploaded images yesterday or today you can either wait for these to be restored, or reupload them. If you have uploaded images during this time, please be sure to test send your campaigns before sending.
November 14, update 14:25
A few users were having issues uploading images; this has now been resolved and all uploads should be functioning normally.
The issue was found to only affect images which had previously been uploaded into the system.
November 14, update 18:30
We are experiencing further outages; our team is working to resolve the issue. We will continue to update this article and Twitter.
November 14, update 20:45
We have confirmed that we have further network issues within our data centre. Our engineers are on site, but currently we are still unavailable. More as we have it. We're also updating on Twitter at https://twitter.com/dotmailer
November 14, update 22:38
Issues in our data centre are ongoing; we're still working towards getting everything up and running again as soon as we can.
November 15, update 01:18
The issues in our data centre have been resolved. Our team in the data centre have given us the all clear.
We'll be monitoring for the rest of the night; contact us on support@dotmailer.com if you have any problems.
November 15, update 07:55
No further issues have been identified during the night. We are continuing to monitor the system.
November 15, update 11:35
The system is now stable and the affected hardware has been sent for testing. See our post-incident report.

Comments

  • Avatar

    Hey Stoo, hope you're not too stressed out after today's outage. How is it possible for the platform to go down if it is served from three different locations?

  • Avatar

    Hi Don,

    Fortunately the stress is brought down by people bringing us sugary treats. We do still have a reliance on the London data centre for recovery without data loss, although we are working on improving our speed of incident recovery to a matter of milliseconds. (Ironically, I was working on this at lunchtime.)

  • Avatar

    I am assessing dotmailer currently, can you advise if this is a one-off problem?

  • Avatar

    Hi Sally,

    Yes, this has been a one-off incident. We also had a planned extended period of downtime on a Saturday in July this year, which was preceded by notifications through several channels. Other than that and today's incident, dotMailer has a very high uptime; I don't have figures to hand, but will try to provide them with the follow-up report later this week.

  • Avatar

    Hi Stoo,

    I'm not sure if something has changed, but I cannot upload images to Image Manager anymore after your outage. They are just uploading as blank. Existing images are displaying fine.

  • Avatar

    I too am having problems creating new campaigns; I'm getting an error message.

    Previews of previous campaigns do not display either.

  • Avatar

    Images are not displaying, current or previous. Can you advise whether you are still fixing the data servers, please? Perhaps your image server has been corrupted somehow.

  • Avatar

    Hi Oliver, Mark,

    Sorry about that; you're right, it is a result of yesterday's incident.

    I've the following from the team currently fixing it:

    "As a result of yesterday’s outage the system which hosts campaign images is currently offline and is being repaired.

    As a consequence we’ve temporarily delayed sends to prevent campaigns being sent with missing images. Unfortunately any campaigns previously sent will have broken images. However tracking data is being collected and links will still work."

    I'll keep you updated with progress and timescales when available.

    Stoo

    Posted from my phone due to train cancellations, please excuse any typos.

  • Avatar

    Hi Stoo,

    Thanks for the update. I have added a support ticket, but I'm sure you guys are working hard on the fix. I have instead trialled a campaign using the @media query, with our website hosting the images, and this seems OK (so far, in creation). Do you think this will be OK for sending, or with the delayed sends will it sit in a queue? Cheers

  • Avatar

    Hi Oliver,

    An ingenious fix there, that would definitely have worked earlier. At present, however, we've paused all sends to stop campaigns going out with broken images and we don't have an easy (to hand) way to work out which campaigns contain images from where. It does give me ideas for some future work though.

    You should be able to 'send' this campaign and it will sit in the queue provided other images don't prevent the campaign from saving.

    Thanks all for your patience while we fix this.

    Stoo

    Now back in the office 

  • Avatar

    Like Oliver, we're still experiencing the error messages when creating new campaigns as well, although it only seems to be affecting our users with IE8 (Firefox seems to be functioning normally).  Unfortunately 99% of our user base only have access to IE8!  We've put in support tickets to this effect so hopefully you can figure out if this is related to the other issues!

  • Avatar

    Any idea when the sends will be un-paused and the images will be working again?

  • Avatar

    @Neil — Sorry about that, thanks for submitting the support ticket and the browser specific notes; IE8 is the most common browser for using dotMailer so we'll definitely get on to fixing that.

    @Joshua — No timescales are available yet; but we'll keep you updated here and on Twitter as soon as we have something more definite.

  • Avatar

    Hey Stoo, whilst I appreciate it is a server downtime crash, it does highlight that clients would greatly benefit if communication could be pushed out by dotMailer via alternative means. Twitter is a great medium for getting your voice out, but if we don't know to go to Twitter to listen, how are we supposed to know?

    I don't know how many thousand customers you have, but having to go through the standard channels, or blaming our own IT infrastructure, to learn about these issues doesn't help.

    Even a generic BCC email to account owners would help telling us to keep up to speed with Twitter.

    I hope it's up and running soon & good luck with it

    Cahir

  • Avatar

    Hi, Any updates on when the paused campaigns will be sent?

    Thanks

    Nick

  • Avatar

    No time scale as to when the images will be back up and running? Not even an estimate? 

  • Avatar

    @Cahir I agree. I've been waiting to upload three new campaigns, which I started working on yesterday at 12.30pm. My work has been massively disrupted by these outages and I have had to phone repeatedly to find out any new information. I've heard nothing from my account manager (I rarely do) and have now resorted to searching the internet and Twitter for updates (which is wasting my time, when I am now so behind with my work).

    Come on Dotmailer - sort out your communication with your clients.

    Sarah 

  • Avatar

    Hi Stoo, as a white label it would be great just to get an email when things like this happen so that we can inform clients. I understand why you couldn't email yesterday, but this morning's incident could have been emailed to us. I would then have been able to inform my clients of the issues and answer incoming calls without having to put a call in to your support team.

    Thanks

    Holly

  • Avatar

    Hi all,

    We're hoping we'll be able to unpause campaign sends very shortly (in the next few minutes) — so any campaign sent (or new campaigns) should be back in action.

    No timescales on fixes to the images just yet (could be minutes or hours); but we'll let you know as soon as possible.

    Sarah/Cahir, thanks for the feedback on getting the message out there. We try to push people to Twitter and the support forum so people can get info as up to date as possible; but we'll take that on board when we do the post-incident review.

  • Avatar

    You can't push people to Twitter or a support forum when you haven't engaged them to that point.

    Even your livechat at the minute does not actually allow you to engage a person. It interactively asks you to fill in as much information as possible and then responds that the ticket has been sent.

    @Stoo. You have outbound email and my issue with dotMailer (especially as a Whitelabel) is that no communication got to me before I had to go find it or worse yet, have clients (indirect major sales clients who have an account on our system)  tell me that they are annoyed as our system keeps crashing.

    [Fail to plan, plan to fail]

    I'm sorry, but dotMailer has not created a solid enough crisis communication process for situations such as this, and it would be highly advisable that you act on it asap before you see customers rethinking their relationship.

  • Avatar

    Hi all,

    The platform is now back up and running.

    You can now use dotMailer and we will continue to test and monitor.

    All campaigns paused during the incident have now been, or are being, sent.

    Any images uploaded during a small window around the incident will continue to be unavailable (images uploaded on or before Monday are unaffected); we will be working to restore these images where possible.

    If you have uploaded images yesterday or today you can either wait for these to be restored, or reupload them.

    If you have uploaded images during this time, please be sure to test send your campaigns before sending.

  • Avatar

    Hi Stoo

    Thanks for the update - we tried an @media query with separately hosted imagery

    http://wantthelook-newsletter.com/C8N-11S9H-E324TRL5D0/cr.aspx?campaignkw=undefined

    So hopefully this will be ok.

    Thanks. 

  • Avatar

    Newly loaded images are still not showing despite you saying the issue is resolved!

  • Avatar

    Thank you Stoo

  • Avatar

    Are images working or not? In my templates, I have deleted what should have been images with a view to reloading them.

    Can you confirm if everything is working as it should or not?

  • Avatar

    Hi Phil,

    Sorry to hear that. I've been doing some tests with new images and all seems to be working for me; so I'm not sure what's happening there. Can I get you to drop our support team a line with some more details? (Though please be patient, there may be some backlog.)

  • Avatar

    Hi Sally,

    Yep images should all be working now. The only images that might be affected are those uploaded on Tuesday 13th, or uploaded this morning (on the 14th). All new images (and images uploaded Monday/before) should be fine. We'll be working to recover the affected images over the next few days.

    Phil's pointed out that his images aren't working, but I can't reproduce that here — so be sure to do a test send. If you spot anything not working, do let our support team know with specifics about the account, campaign and image that aren't working.

  • Avatar

    I'm assuming that maybe there's a caching issue on the servers. I can only read my images from yesterday if I save them in a different image format (PNG rather than JPG). Even if I recreate the image from scratch, if I save it as JPG it loads as blank.

  • Avatar

    Ah yes, I see. Sounds like it may only be affecting images with the same name as one of the affected images uploaded yesterday?

  • Avatar

    The image upload issues should now be resolved. More info shortly.

    Particular thanks to Phil — your notes helped us to debug it quickly.
