Please see http://mcssys.posterous.com for the most up-to-date information.
We’ve had a failure on the server that handles mailing lists and trouble tickets. We’re bringing RT back online via a backup machine ASAP, and are working on getting the mailing list data in sync so we can bring that back.
Mail is not being lost; it’s just queueing up.
Will post updates here as they happen.
System is up, mail is flowing again.
Just a reminder that most systems are down due to a planned power outage in building 240. Things will return to normal on Sunday.
On Saturday, October 30th, there will be a Zimbra outage in the morning while the service is upgraded to the newest version. This upgrade will fix a number of bugs and introduce a number of new features — we’re very excited about this upgrade.
Originally, this upgrade was slated to happen during the power outage work on that weekend. Even though the power outage has been pushed to December, we’re still moving forward with the upgrade.
Expect the service to go down in the morning and stay down until early evening. No mail will be lost, only delayed. We’ll post updates as we have them.
Mike Rios has been working tirelessly on this issue from the start, losing sleep and, I suspect, some wits (kidding). Thanks, Mike, for all your work!
He has just sent out a note on the current state of affairs vis-à-vis Zimbra, along with a link to a page that they will be using to post updates:
We are still experiencing an issue with our Zimbra mail system. At present,
Zimbra runs normally for a short while, then response time to the users will
begin to degrade after about 15-30 minutes until the system is virtually
unusable. Mail delivery is still happening; the front-end is the only thing
being impacted by this issue.
We have taken to restarting the module that is responsible for the user response
and mail interface every 15 minutes, at :00, :15, :30, and :45 on the clock.
The restart process takes about 90 seconds, during which time reading and
sending mail, along with the Zimbra web interface, will be affected. This
strategy has allowed us to “limp” along while we work with Zimbra in finding a
solution for the problem we have.
Zimbra engineers are working with us, examining our log files and going through
their code. There is at present no estimated time to repair for this issue.
Zimbra understands that this is a critical issue for us, and has a number of
people working this issue and keeping us informed of their progress. What
information we receive will be communicated on this list. In addition, we will
be keeping a wiki page up-to-date with information as we have it:
If there are any questions regarding this outage or any other issues related to
this outage, please don’t hesitate to direct them to this list or any of the
Argonne people involved.
Thank you for your support and patience!
As many of you know, the CIS Zimbra service has been “misbehaving” since roughly 10pm on Tuesday, October 12th.
The symptoms of this behavior: poor responsiveness leading to an unresponsive system – typically within 20-30 minutes of the last restart.
We have a temporary “work-around” in place until Zimbra has a fix for us – we restart the affected process once every 15 minutes. The restart takes around 90 seconds, during which time IMAP and web-based access are unavailable.
After doing our own investigation, we have been working around the clock with Zimbra support on this issue since around 2AM this morning. Zimbra has several developers working on this, and they are following several leads. It is at their highest level of priority, and we have stressed to them how important it is that we get this solved quickly.
As new information becomes available, we will pass it along. Please feel free to contact me directly to address any questions and concerns.
Thank you for your continued patience.
At the moment, we’re still waiting for a fix from Zimbra. We’re giving them until Sunday evening before we start trying more drastic measures. We felt this time frame was acceptable given that mail is generally working, except for a 90-second IMAP outage every 15 minutes on the quarter hour.
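To put rough numbers on that restart schedule, here is a minimal Python sketch. It is not how the restarts are actually run; it only models the schedule described above (a restart at :00, :15, :30, and :45 that takes about 90 seconds). The function names are made up for illustration, and the 90 seconds is treated as exact even though the real restart time varies.

```python
from datetime import datetime, timedelta

# Both values come from the announcement; the 90 seconds is approximate.
RESTART_INTERVAL_MIN = 15   # restarts at :00, :15, :30, :45
RESTART_DURATION_SEC = 90   # assumed fixed here for simplicity


def in_restart_window(now: datetime) -> bool:
    """Return True if `now` is within 90 seconds after a quarter hour."""
    seconds_into_hour = now.minute * 60 + now.second
    offset = seconds_into_hour % (RESTART_INTERVAL_MIN * 60)
    return offset < RESTART_DURATION_SEC


def next_restart(now: datetime) -> datetime:
    """Return the start of the next scheduled restart strictly after `now`."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    slots = now.minute // RESTART_INTERVAL_MIN + 1
    return top_of_hour + timedelta(minutes=slots * RESTART_INTERVAL_MIN)


def downtime_fraction() -> float:
    """Fraction of each 15-minute cycle spent restarting."""
    return RESTART_DURATION_SEC / (RESTART_INTERVAL_MIN * 60)


if __name__ == "__main__":
    now = datetime.now()
    print("In restart window right now:", in_restart_window(now))
    print("Next restart begins at:", next_restart(now).strftime("%H:%M"))
    print(f"Share of time unavailable: {downtime_fraction():.0%}")
```

Under these assumptions, 90 seconds out of every 900 works out to IMAP and web access being unavailable about 10% of the time. If your mail client errors out, waiting a couple of minutes past the quarter hour and retrying should be enough.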
The Zimbra mail and calendar service is having serious service issues. Work has been going on all night, and engineers at Zimbra are helping. At this point we have no ETA on when things will be stable. More details as they emerge.
CIS will be performing an update to the Zimbra service on Saturday, July 17, between 9AM and 5PM. During this window, you will not be able to receive new mail, send mail through Zimbra, or check for mail on the server. Calendars will also be unavailable during this window.
Any mail sent during this window will queue up and be delivered once the server is back online. We do not expect any loss of mail or bounced mail.
This upgrade is migrating the server from a 32-bit version to the 64-bit version, which will allow us to enhance performance. Also, the 32-bit versions will be end-of-life soon, and will no longer receive support.
In August, we are planning to upgrade the server from Zimbra 5.0.23 to Zimbra 6.x. We expect this to fix a number of bugs, so that’s something to look forward to.
Sorry for any inconvenience.
Here’s the talk I gave in early June on the state of systems. Alas, the video didn’t quite make it — audio was too quiet and it cut out before the end. But if you want to talk about anything you see in there, please come see me!
Let me tell a couple of true stories of how social networking can be used to cause you and your friends harm.
The first happened to a friend of mine. He’s sitting at his computer, browsing Facebook, and he gets a chat request from a friend. This friend claimed to be in London and to need money desperately, as he had been robbed. My friend was smart enough to recognize this might not be legit, and started asking questions. Of course, because this guy had access to all the data in Facebook, he could be fairly convincing in his answers (minus, of course, the lag time in looking up the information). As you may guess, my friend did not wire any money. Turns out this is a scam that’s getting more and more common.
The second story happened to an acquaintance of this same friend. However, in her case, it was her own account that was compromised: a Yahoo mail account, which was used to send mail to her friends asking for money. We don’t quite know how successful this one was. We do know the malicious user deleted all the e-mails from her account.
Neither of these incidents happened at Argonne, just so you know.
I tell you these stories to remind you that you need to be on your toes. In this day and age of social networking and information sharing, we’re putting a lot of information out there that can be used against us in many ways. I was startled when I visited pipl.com and searched for myself: all this information is out there, scraped off of webpages, social networking sites, Usenet… you name it. Someone armed with that information might be able to pull off a convincing job of pretending to be me. Convincing enough to scam someone else out of money or information they shouldn’t have.
So be careful what you put out there. Keep your passwords strong, lengthy, diverse, and private. Don’t reuse them.
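If you want a hand with the "strong, lengthy, diverse" part, here is one way to sketch it in Python using the standard library's `secrets` module, which is designed for security-sensitive randomness. The function name, the default length of 20, and the minimum of 12 are my own assumptions for illustration, not lab policy. A password manager will do the same job and remember the results for you.

```python
import secrets
import string

# Letters, digits, and punctuation give a diverse character set.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password drawn from ALPHABET.

    The 12-character floor is an assumption for this sketch,
    not an official requirement.
    """
    if length < 12:
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Generate a fresh password for each account; never reuse one.
    print(generate_password())
```

Run it once per account so that no two sites share a password.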
Here’s a couple of links that were passed on to me today from ANL’s Cyber Security Program Office. The first is available on-site only, and is written by Mike Skwarek, the Cyber Security Program Manager and Deputy CIO for the lab. I recommend reading them, as there’s good advice in there.