Legacy Sydney Datacenter Issues Update

- Tuesday, February 01, 2011

At the time of writing, we continue to experience issues on BC's legacy Sydney datacenter. Our systems engineers are working on the issue, and this post is an official update on the situation. Please note that all dates and times in this post are in Australian Eastern Daylight Time (AEDT, GMT+11).

To give you some background on these issues, we originally had 2 WatchGuard firewalls in place in our legacy (Sydney Primus) datacenter, one acting as the primary and the other as a backup. The primary firewall developed a hardware issue, causing last Friday's outage, and we failed over to the backup firewall.

Yesterday, we suffered another major outage from 11am to 5:30pm because the backup firewall was unable to handle the load. To rectify this and stabilize the situation, we have installed an additional firewall with a load balancer to distribute the load across 2 firewalls. We are also adding another network switch, which will take approximately 2 hours. We are working with the vendor to procure another primary firewall as soon as possible, giving us triple redundancy.

Other actions we are taking to improve stability in the Sydney Primus datacenter include:

  1. Rebooting the NAS server tonight (1AM Wed 2 February 2011) - this will result in 25 minutes of downtime during off-peak hours, but the reboot will free up system resources and improve the performance of that server.
  2. Throttling OpenSRS mail migration - given that we are experiencing load issues on our firewall, we have taken steps to throttle our OpenSRS migration from Sydney Primus. The legacy mail server is physically located in the same place, behind the same firewall, as the other servers. This has unfortunately extended our mail migration period by another 72 hours.

Systems engineers are monitoring the situation 24/7, and you can be assured they are doing everything possible to keep the system stable.

Plan for Migrating to Sydney Ultimo

Obviously, keeping the old DC stable isn't our final fix for these ongoing issues. Our medium-term goal is to migrate all sites from Sydney Primus to Sydney Ultimo as soon as possible, with the least amount of customer impact. I've just finished meeting with the Engineering and Systems teams, who have put together a technical plan, which I'm sharing publicly to keep you informed of the situation. Please be aware that the following is subject to change over the next 10 days.

  • What's Happening? We are migrating all sites and BC application infrastructure from Sydney Primus to Sydney Ultimo in one bulk move.
  • Target Date/Time: 7am Saturday 12 February 2011 (AEDT). This is 2 weekends from now.
  • How Long Will It Take? We will have a scheduled maintenance window of 5 hours, during which all sites hosted on Sydney Primus will be unavailable.
  • What are we doing? Simply put, we are going to replicate all databases between Sydney Primus and Sydney Ultimo. We will also set up a high-speed direct datalink between the 2 locations to ensure databases are kept in sync prior to the migration. At the scheduled time of the migration we will reconfigure DNS settings and make other related BC architectural changes to point to the new Ultimo datacenter. We will also need to restart all web servers.
  • Customer Impact - Worldwide: During the migration you will not be able to create new BC sites on any datacenter. You will not be able to access the Partner Portal during the maintenance window. No action is required from you.
  • Customer Impact - sites hosted on legacy Sydney DC with redelegated DNS: In addition to the above, all sites hosted on the legacy Sydney DC will be offline for the maintenance window of 5 hours. No front-end pages will be served and the Admin console will not be accessible. No action is required from you.
  • Customer Impact - sites hosted on legacy Sydney DC with externally hosted DNS: In addition to the 2 points above, you will be required to change your DNS settings with your DNS host (e.g. MelbourneIT, GoDaddy) to point to the IP address of the new datacenter. More details and instructions on this will follow in the near future.
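
For partners outside Australia, the target maintenance window above can be converted to a local time with a few lines of Python. This is only a rough sketch based on the target date and 5-hour window stated in this post, both of which are subject to change. The function name `maintenance_window` and the example zones (GMT and US Pacific Standard Time, assumed to be GMT-8) are illustrative choices, not part of the migration plan.

```python
from datetime import datetime, timedelta, timezone

# Offset stated in this post: AEDT is GMT+11.
AEDT = timezone(timedelta(hours=11), "AEDT")

# Target start and planned length as announced (subject to change).
WINDOW_START = datetime(2011, 2, 12, 7, 0, tzinfo=AEDT)
WINDOW_LENGTH = timedelta(hours=5)


def maintenance_window(offset_hours, label):
    """Return (start, end) of the window in a fixed-offset time zone.

    offset_hours is the zone's offset from GMT; label is a display name.
    Both are supplied by the caller and are assumptions of this sketch.
    """
    tz = timezone(timedelta(hours=offset_hours), label)
    start = WINDOW_START.astimezone(tz)
    return start, start + WINDOW_LENGTH


if __name__ == "__main__":
    # Example zones only; substitute your own GMT offset.
    for hours, label in [(11, "AEDT"), (0, "GMT"), (-8, "PST")]:
        start, end = maintenance_window(hours, label)
        print(f"{label}: {start:%a %d %b %Y %H:%M} to {end:%a %d %b %Y %H:%M}")
```

With the dates as currently planned, 7am Saturday AEDT corresponds to 20:00 GMT on Friday 11 February 2011.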

Over the coming days I will be posting regular communications about this datacenter migration, including detailed instructions if action is required from you or your customers, and more technical details about the plan. We've learnt some important lessons from the mail migration communication process. Thank you for the feedback you've provided.

Finally, I want to thank all our partners for sticking with us through these trying times. I read the forums and the comments on this blog, and I understand that many of you have built businesses on BC and that you're feeling pain. We know that this is disruptive to you, and we are throwing everything we can at the problem to fix it. I will be posting daily updates on the situation to the blog and will try to answer as many questions as possible via this channel.

Thanks for reading,
Eddy Chan
Business Catalyst Product Manager