Cristinel Anastasoaie

Update: January 2011 Stability and Email Issues

Earlier this year we experienced a series of incidents that caused significant problems for a large number of customers. We had issues with our hosted email services, followed by stability problems in our legacy Sydney data center.

In light of this, we are issuing a full month credit for all paid sites that were hosted on the legacy Sydney data center, and those that were using internally hosted email services.

To be more specific, the following sites will receive a full month credit:

  • All paid sites hosted on the AU data center that were upgraded on or before January 31st 2011
  • All paid sites, across all data centers, using the internal Business Catalyst hosted mail service that were upgraded on or before January 31st 2011

All customers that were affected by these incidents will receive an email announcing the full month of credit.

In addition, we have updated the Partner Portal and the site Admin Console to help partners identify sites that have received the credit, and to help customers see the period to which the credit applies. The site list in the Partner Portal now includes a new “Credit” column.

The Partner Portal > Clients > Site details user interface has also been updated to include a notice about the credit.

We’ve also updated the site Admin Console to highlight the period for which the credit has been applied.

To issue the full month of credit, we will skip a month of invoicing for websites with monthly billing, and simply postpone the invoice for one month for sites on annual billing. If your site is billed monthly, no invoice will be generated between 15 Apr and 15 May 2011. If your site is billed yearly, the next due invoice will be postponed by one month.

These updates will be rolled out in next week's release.

Since these incidents, we have resolved all email issues by moving to an externally hosted email provider, and we have migrated all sites from the legacy Sydney data center to a new location.

To prevent similar incidents in our two remaining legacy data centers, we have accelerated the migration schedule for both the legacy Europe (London) and North America (Ottawa) data centers, and we plan to complete both migrations in the first half of 2011.

Once again, we apologize for the inconvenience caused by these outages and thank you for your support. If you have any other questions, please submit a Support Request via your Partner Portal or reach us on Live Chat.

The Business Catalyst Team


New Schedule For Sydney Datacenter Migration

We are continuing to experience intermittent issues on Adobe Business Catalyst's legacy Sydney datacenter. All dates/times in this post are in Australian Eastern Daylight Saving Time (GMT+11). We have systems engineers working on the issue. I know this is the 3rd business day in a row, and patience is wearing thin for partners and site owners alike. Paul Gubbay, a VP of Engineering at Adobe, will be posting on the blog shortly to share some of his thoughts on this very serious situation.

  • Issue: Business Catalyst services hosted on the legacy Sydney Primus Datacenter are exhibiting slow response times. Webpages for these sites are being served slowly, and customers are reporting problems accessing the Admin UI or transferring data via FTP. This is the 3rd business day in a row this has occurred.
  • Time of Incident Start: 2 Feb 2011 11:18AM Australian Eastern Daylight Time
  • Time of Incident End: Ongoing - ETA is unknown
  • Technical Action: Yesterday we installed an extra switch and an extra firewall and moved the OpenSRS migration onto the secondary switch/firewall, but system engineers suspect that too much HTTP traffic is still passing through the primary firewall. I previously mentioned that we would put a load balancer in front of the 2 firewalls; in the end this was not required, because we connected the second firewall to the second switch. Our plan now is to move 2 of the 3 web servers onto the network behind the secondary firewall/switch combination to balance the load (resulting in 5-10 minutes of downtime).

New Schedule For Migration

Turning to the datacenter migration: given all the feedback we've received in the comments, we have now scheduled the migration for 1:00AM Sunday 13 February to minimize the impact on Australian businesses.

To give you some background, we originally chose 9am Saturday morning because we thought it would help partners and site owners with externally hosted DNS make the switch in sync with the migration; however, this is no longer required. Additionally, you have all made it clear that the impact on your customers' businesses would be unacceptable if we did the migration at the original time. With this in mind, the updated details are as follows:

  • What's Happening?: We are migrating all sites and BC application infrastructure from Sydney Primus to Sydney Ultimo in one bulk-move
  • Target Start Date/Time: 1:00AM Sunday 13 February 2011 (Australian EDT) | 6:00AM Saturday 12 February 2011 (US Pacific) | 2:00PM Saturday 12 February 2011 (London)
  • Target End Date/Time: 6:00AM Sunday 13 February 2011 (Australian EDT) | 11:00AM Saturday 12 February 2011 (US Pacific) | 7:00PM Saturday 12 February 2011 (London)
  • How Long Will It Take? We will have a scheduled maintenance window of 5 hours, during which all sites hosted on Sydney Primus will be unavailable. Partner Portal access and new site creation will be unavailable at this time as well.
  • What are we doing? Simply put, we are going to replicate all databases between Sydney Primus and Sydney Ultimo. We will also set up a high-speed direct datalink between the 2 locations to ensure the databases are kept in sync prior to the migration. At the scheduled time of the migration we will reconfigure DNS settings and make other related BC architectural changes to point to the new Ultimo Datacenter. We will also need to restart all web servers.
  • Customer Impact - Worldwide: During the migration you will not be able to create new BC sites on any datacenter. You will not be able to access the Partner Portal during the maintenance window. No action is required from you.
  • Customer Impact - sites hosted on legacy Sydney DC with redelegated DNS: In addition to the above, all sites hosted on legacy Sydney DC will be offline for the maintenance window of 5 hours. There will be no front-end pages being served or Admin console access. No action is required from you.
  • Customer Impact - sites hosted on legacy Sydney DC with externally hosted DNS: In addition to the 2 points above, you will be required to change your DNS settings with your DNS host (e.g. MelbourneIT or GoDaddy) to point to the IP address of the new datacenter after the migration has started.
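The local times listed above can be cross-checked with a short Python sketch. It assumes the fixed UTC offsets in force in February 2011 (Sydney on daylight saving at UTC+11, US Pacific on standard time at UTC-8, London on GMT at UTC+0) rather than looking them up in a time zone database, so it is only valid for these dates. `ZONES` and `in_all_zones` are illustrative names, not part of any BC tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC offsets valid for February 2011 only.
# Sydney was on daylight saving (AEDT, UTC+11); US Pacific (PST, UTC-8)
# and London (GMT, UTC+0) were on standard time.
ZONES = {
    "Australian EDT": timezone(timedelta(hours=11)),
    "US Pacific": timezone(timedelta(hours=-8)),
    "London": timezone.utc,
}


def in_all_zones(moment):
    """Return the same instant expressed in each zone listed above."""
    return {name: moment.astimezone(tz) for name, tz in ZONES.items()}


start = datetime(2011, 2, 13, 1, 0, tzinfo=ZONES["Australian EDT"])
end = start + timedelta(hours=5)  # the 5-hour maintenance window

for label, moment in (("Start", start), ("End", end)):
    for name, local in in_all_zones(moment).items():
        print(f"{label:5} {name:15} {local:%a %d %b %Y %H:%M}")
```

For dates outside this window, the standard-library `zoneinfo` module (with zone names such as `Australia/Sydney`) applies daylight-saving transitions automatically instead of relying on hard-coded offsets.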

Sites with Externally Hosted DNS - Action Required

There have been some questions about what happens to sites with externally hosted DNS. The Engineering team is looking into an improved solution: keeping a proxy server in the legacy Sydney Datacenter so that all requests arriving at the old legacy IP addresses are routed transparently to the new datacenter, and pages from the new DC are served back to customers through the same proxy. This is not a permanent solution, but it gives you a longer window in which to make your DNS changes and lessens the impact on your customers when you do make the change. We will likely keep the proxy server running for at least 30 days after the migration before we fully decommission the legacy datacenter.

For partners or site owners with externally hosted DNS, we advise lowering the TTL on your records to 1800 seconds (30 minutes) during the next week in preparation for the migration. That way, when you change the IP address following the migration, the new setting will take less time to propagate.
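The reason to lower the TTL in advance, rather than at cutover, is that a resolver which cached a record keeps it for the TTL it received at the time. A minimal Python sketch of that arithmetic, assuming resolvers honor the TTL exactly, that the record is changed right at the start of the migration, and using a hypothetical original TTL of 86400 seconds (24 hours); the function names are illustrative only.

```python
from datetime import datetime, timedelta


def stale_until(change_time: datetime, ttl_seconds: int) -> datetime:
    """Latest moment a resolver that cached the record just before
    change_time could still return the old value (assumes TTL is honored)."""
    return change_time + timedelta(seconds=ttl_seconds)


def lower_ttl_by(cutover: datetime, old_ttl_seconds: int) -> datetime:
    """Latest moment to publish the shorter TTL so that every cached copy
    still carrying the old, longer TTL has expired by the cutover."""
    return cutover - timedelta(seconds=old_ttl_seconds)


# Hypothetical values: migration start (local AEDT), original TTL of 24 hours.
cutover = datetime(2011, 2, 13, 1, 0)
OLD_TTL, NEW_TTL = 86_400, 1_800

print("Publish TTL 1800 no later than:", lower_ttl_by(cutover, OLD_TTL))
print("Old IP cached at most until (TTL 1800):", stale_until(cutover, NEW_TTL))
print("Old IP cached at most until (TTL 86400):", stale_until(cutover, OLD_TTL))
```

With a 24-hour TTL left in place, some visitors could keep resolving the old IP for up to a day after the change; publishing the 1800-second TTL at least one old-TTL period before the migration caps that at 30 minutes.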

Thanks for reading and check back in a bit for Paul's post.
Eddy Chan
Business Catalyst Product Manager

Legacy Sydney Datacenter Issues Update

At the time of writing, we continue to experience issues on BC's legacy Sydney datacenter. Systems engineers are working on the issue, and I am posting this official update on the situation. Please note that all dates/times in this post are in Australian Eastern Daylight Saving Time (GMT+11).

To give you some background on these issues, we originally had 2 Watchguard firewalls in place in our legacy (Sydney Primus) datacenter, one acting as the primary and the other as a backup. The primary firewall developed a hardware issue, causing last Friday's outage, and we failed over to the backup firewall.

Yesterday, we suffered another major outage from 11am to 5:30pm because the backup firewall was unable to handle the load. To rectify this, we have installed an additional firewall with a load balancer to distribute the load across 2 firewalls and help stabilize the situation. We are also adding another network switch, which will take approximately 2 hours. We are working with the vendor to procure another primary firewall as soon as possible, which will give us triple redundancy.

Other actions we are taking to improve stability in the Sydney Primus datacenter include:

  1. Rebooting the NAS server tonight (1AM Wed 2 February 2011) - this will result in 25 minutes of downtime during off-peak hours, however the reboot will free up system resources and improve performance of that server
  2. Throttling OpenSRS mail migration - because our firewall is under heavy load, we have throttled the OpenSRS mail migration from Sydney Primus; the legacy mail server sits in the same facility, behind the same firewall, as the other servers. Unfortunately, this has extended our mail migration period by another 72 hours.

System engineers are monitoring the situation 24/7, and you can be assured they are doing everything possible to keep the system stable.

Plan for Migrating to Sydney Ultimo

Obviously, keeping the old DC stable isn't our final fix for these ongoing issues. Our medium-term goal is to migrate all sites from Sydney Primus to Sydney Ultimo as soon as possible, with the least amount of customer impact. I've just finished meeting with the Engineering and Systems teams, who have put together a technical plan, which I'm sharing publicly to keep you informed. Please be aware that the following is subject to change over the next 10 days.

  • What's Happening?: We are migrating all sites and BC application infrastructure from Sydney Primus to Sydney Ultimo in one bulk-move
  • Target Date/Time: 7am Saturday 12 February 2011 (AEDT). This is 2 weekends from now.
  • How Long Will It Take? We will have a scheduled maintenance window of 5 hours, during which all sites hosted on Sydney Primus will be unavailable
  • What are we doing? Simply put, we are going to replicate all databases between Sydney Primus and Sydney Ultimo. We will also set up a high-speed direct datalink between the 2 locations to ensure the databases are kept in sync prior to the migration. At the scheduled time of the migration we will reconfigure DNS settings and make other related BC architectural changes to point to the new Ultimo Datacenter. We will also need to restart all web servers.
  • Customer Impact - Worldwide: During the migration you will not be able to create new BC sites on any datacenter. You will not be able to access the Partner Portal during the maintenance window. No action is required from you.
  • Customer Impact - sites hosted on legacy Sydney DC with redelegated DNS: In addition to the above, all sites hosted on legacy Sydney DC will be offline for the maintenance window of 5 hours. There will be no front-end pages being served or Admin console access. No action is required from you.
  • Customer Impact - sites hosted on legacy Sydney DC with externally hosted DNS: In addition to the 2 points above, you will be required to change your DNS settings with your DNS host (e.g. MelbourneIT or GoDaddy) to point to the IP address of the new datacenter. We will post more details and instructions on this shortly.

Over the coming days I will be posting regular communications around this datacenter migration, including detailed instructions if action is required from you or your customers, and more technical details around the plan as well. We've learnt some important lessons from the mail migration communication process, thank you for the feedback you've provided.

Finally, I want to thank all our partners for sticking with us through these trying times. I read the forums and the comments on this blog, and I understand that many of you have built businesses on BC and that you're feeling the pain. We know that this is disruptive to you, and we are throwing everything we can at the problem to fix it. I will be posting daily updates on the situation to the blog and will try to answer as many questions as possible via this channel.

Thanks for reading,
Eddy Chan
Business Catalyst Product Manager

Our engineers are currently confirming that we have stabilized an issue that caused a significant period of downtime for Adobe Business Catalyst in the legacy Sydney Primus Datacenter today. I'm posting to explain what happened and will keep you updated as more details come through:

  • Issue: Business Catalyst clients were unable to access FTP and Admin Consoles for all sites hosted on the legacy Sydney Primus Datacenter. Webpages for these sites were being served very slowly, with response times of up to 4,000 ms.
  • Time of Incident Start: 26 Jan 2011 1:48AM US Pacific Time
  • Time of Incident End: 26 Jan 2011 7:00PM US Pacific Time
  • Technical Action: The Systems/Operations team have verified that there was a failure in the primary firewall for the datacenter causing heavy packet loss. We have switched over to the backup firewall which has restored services.
  • Vendor Action: No vendor action required

Although the situation is currently stable, our engineers are continuing to monitor the datacenter closely. After the team performs an incident 'post-mortem', there will be a follow-up blog post describing the actions taken to improve reliability for the legacy Sydney DC.
