We’re moved!

stace | Uncategorized | Saturday, October 17th, 2009

Wow, what a day. Most of us got here over 15 hours ago, some of us are still here. I’m a little punch drunk, so I’ll give a quick summary and perhaps a better post mortem later in the week.

  • All equipment was moved without incident. Boyer-Rosene (the folks who moved all of us into our offices) were simply fantastic, very professional, and a joy to deal with!
  • The bulk of the day was spent putting the machines back together, plugging in all the whosits and whatsits and things that go *ping*
  • We had a bit of a network scare around 7, just when we thought we were back. Corby and Linda trudged over to 221 and got things working again after some fighting with some very old networking hardware.
  • We *believe* no mail was bounced, and things seem to be trickling through now and will continue to do so overnight
  • We also believe just about all critical infrastructure is back up. We’ll catch what we missed either tomorrow or Monday
  • We finally retired the following machines and services:
    • Our old Windows 2000 domain
    • Our old mail servers, some of which date back to the 90’s
    • Our old tape library, that’s been handling our tape backups since… well, since before I’ve been employed here.
  • Your “windows password” is no more. You now have a single MCS password, used for logging into everything we run except Zimbra. And, soon, Zimbra will also use that password

The Core really looks like a datacenter now. It’s filled with racks of machines, all chugging along. We have some work to do, still, and some cleanup in there (along with a hefty amount of cleanup in 221). Once it’s all done, we’ll have a little “open house” over lunch some day when you can stop by and we can show you all the cool things about the room and what’s running in there, plus what will be running in there in the years to come.

A big, heartfelt “thank you” to everyone who came in to help today, and throughout the weeks since we started this move: Corby Schmitz, Linda Winkler, Max Trefonides, Hunter Matthews, John Valdes, Jason Hedden, John Roberts, Rick Bradshaw, Ti Leggett, Ken Raffenetti, Dave Goodell, Darius Buntinas, Rob Latham, Jared Wilkenning, Narayan Desai, Pavan Balaji, Rinku Gupta, David Ressman. If I’ve forgotten someone, please let me know! This whole move was a huge thing, and praise is deserved.

I don’t want to minimize anyone’s effort in this, but I feel I must call out two people who have put in a really extraordinary effort in making this all happen. First is Hunter, who spent almost every day since we moved to 240 back over in the old machine room getting gear ready to move, all while learning about and rebuilding a new piece of the infrastructure that recently fell under his responsibility. I also want to call out Rick, who really took the lead on the huge organizational burden of this move and did a fantastic job, all while fighting the NFS server issues that plagued us (and coming up with a top-notch and zippy service improvement).

Lastly, some entertainment for you. I’ve posted a couple of videos to Facebook, but for those of you who aren’t on there, check out the following videos. This is what happens during a long day in the datacenter:

The Big Move

stace | Uncategorized | Friday, October 16th, 2009

It’s here! It’s underway! Things are moving!

Move II: October 16-18

  • Full Disclosure (already down)
  • Breadboard (already down)
  • kbT compute cluster (already down)
  • I2U2 resources (already down)
  • LCRC DDN storage system (already down)
  • MCS Core Computing infrastructure (going down at around 4-4:30 PM today)

First thing Saturday morning, the movers will move the remaining equipment into the Core, at which point we’ll start hooking things back up and have things operational as soon as possible. You should generally not expect the resource to be available until the Monday following the move (though we’ll obviously strive for getting things up as quickly as possible).

MCS Computing resources:

  • October 16
    • 4:00-4:30 PM: general core computing resources go down. At this point, all login nodes, compute machines, file servers, etc., will be offline as we pull cables and prepare the machines for the move.
    • Note: This will affect mail service. Reading and sending mail will be unaffected; receiving of new mail will be delayed, but we’re working to keep the downtime as short as possible.
  • October 17
    • Very early in the morning, movers start moving gear to the Core.
    • If all goes well with no snags, we should be operational before 5:00 PM; however, the outage window still runs until Monday in case things go other than smoothly.

There you have it! We’re shooting to keep downtimes to a minimum and make this weekend go as smoothly for you (and *us*) as possible. Sorry for any inconvenience this may cause you.

See you at the TCS reception at 3!

Mail Slowness

stace | Known Issues | Thursday, October 8th, 2009

An over-aggressive DOE scanner spam-bombed us this afternoon, bringing our mail service to a crawl as three mail servers tried to deal with over 40,000 e-mail messages sent within a matter of minutes.

We believe we got it all stopped, and all the backlog of mail should be delivered at this point.

Sorry for the inconvenience.

Pictures from the Datacenter move

stace | Announcements | Monday, October 5th, 2009

Pics taken by Max.

All in all, it took a little less than 11 hours. Everything went generally smoothly. It bodes very well for the move in two weeks!

First Datacenter Move

stace | Uncategorized | Saturday, October 3rd, 2009

The first phase of the move of equipment from our old datacenter (the BMR) to the new datacenter (the Core) is complete, from a physical perspective. All the racks made it over with no issues.

On Friday, our smallest cluster was up and running. Today, as the day wore on, we got more equipment up. As a sort of “dry run” for the big move in two weeks, this bodes very well. We’ll keep plugging away at getting the random pieces up and running, but there are no major stumbling blocks.

Hooray!

We’ll post a wrap-up on or before Monday.

History

stace | Uncategorized | Friday, October 2nd, 2009

Today, the first computer came up in our new datacenter, The Core. The cosmea cluster came up, largely without incident, and backed itself up to the tape library (that is, incidentally, still sitting in 221).

Tomorrow’s gonna be a fun day.

Datacenter Move, Episode I, The Phantom Menace

stace | Announcements | Monday, September 28th, 2009

Okay, folks, this is it. A little over two weeks ago, we moved a bunch of people into the spiffy new building. And this Friday, we begin the process of moving our computers over from the BMR to the Core (which is now what we’re calling the SSF).

Are we nervous?

Yes. Yes, we are. How could we not be? You know the old curse: “May you live in interesting times.” Well, this is a pretty interesting time.

But we’re also confident. All of our plans, our timelines, our probable outcomes are unraveling, intertwining, recombining, and falling generally into place. By Saturday evening, we should have a pretty solid idea of exactly how well we did on the whole ordeal, and how the second and larger part of the move will go.

As a reminder, the sysadmin of the particular system will send out a more detailed notice, but here’s the big picture of what’s moving:

  • Remaining TeraGrid infrastructure
  • NMPDR and IGSB infrastructure
  • The Cosmea compute cluster
  • LCRC PVFS fileservers

We’ll also be moving the bulk of our storage and parts. But that’s just our pain, not yours.

In each of the above cases, the resource in question will go down on the Friday before the move (at some point in the day, depending on the resource), and get prepped for move. First thing Saturday morning, the movers will move the equipment into the Core, at which point we’ll start hooking things back up and have things operational as soon as possible.

This is going to be an experience!

Upcoming moves and Updates

stace | Announcements, Known Issues | Monday, September 21st, 2009

Things have generally been “steady-state” with regard to the office moves, and we’re picking up the last pieces. Some notes:

  • The copier/printer on 4 (4171) is still awaiting its PostScript module. (I referenced the wrong printer in my last post.)
  • A color copier/printer will be installed at 2D11, but is not here yet.
  • Fax modules will get installed such that there’s at least one copier that can send faxes per floor.
  • The big color plotter is awaiting repair. Remember to check with the help desk if you want to print to it.

We’ve had one full week in the new building, and now our group’s attention must turn back to 221. In slightly less than two weeks, we begin the process of moving our computing infrastructure from the datacenters in 221 (known affectionately as the BMR and the LMR) into the ginormous datacenter here in 240 (known affectionately to me as The Computorium).

As such, a lot of our concentration and effort will be going towards getting things packed up or retired. Then, on Friday, October 2, the first move will happen. Two weeks later, the second and last move will happen. We’ve got our schedule of moves set up, so here’s what to expect in terms of moves and downtimes:

Move I: October 2-4

  • Remaining TeraGrid infrastructure
  • NMPDR and IGSB infrastructure
  • The Cosmea compute cluster
  • LCRC PVFS fileservers

Move II: October 16-18

  • Full Disclosure
  • Breadboard
  • kbT compute cluster
  • I2U2 resources
  • LCRC DDN storage system
  • MCS Core Computing infrastructure

In each of these cases, the resource in question will go down on the Friday before the move (at some point in the day, depending on the resource), and get prepped for move. First thing Saturday morning, the movers will move the equipment into the Computorium, at which point we’ll start hooking things back up and have things operational as soon as possible.

I’ll leave it to the administrators of the individual resources to prep you for what you can expect, but as indicated above, you should generally not expect the resource to be available until the Monday following the move (though we’ll obviously strive for getting things up as quickly as possible).

Of note is the second move, which contains the MCS core computing resources. Here’s a general schedule of what you can expect for that on the weekend of October 16-18.

  • October 16
    • 5:00 PM: general core computing resources go down. At this point, all login nodes, compute machines, file servers, etc., will be offline as we pull cables and prepare the machines for the move.
    • Note: This should not affect mail service. Reading and sending mail will definitely be unaffected, receiving of new mail may be delayed, but we’re working to avoid that.
  • October 17
    • Very early in the morning, movers start moving gear to the Computorium.
    • If all goes well with no snags, we should be operational before 5:00 PM, however the outage window is still until Monday in case things go other than smoothly.

There you have it! We’re shooting to keep downtimes to a minimum and make these weekends go as smoothly for you (and *us*) as possible. Sorry for any inconvenience this may cause you, and if you see a showstopper in one of the dates above, let us know ASAP.

I’m looking forward to seeing computers in that big ole room across the atrium!

Here’s to Fortune’s favor, may we find it!

Moving Madness

stace | Announcements, Known Issues | Wednesday, September 16th, 2009

Three full days in the new digs. We’re pretty much settled in, things are becoming normal, and we’re getting comfortable in our new surroundings. Thus far, there have been no zombie attacks from the rock gardens, so things are definitely starting out on a positive note.

Some notes:

  • We’ve got a new wiki page up for MCS and ALCF users here. That’s where you’ll find any known issues.
  • We’ve added a new blog category, Known Issues, where we’ll announce updates on known issues.

In the spirit of reporting known issues, all the printers except for pr-4165 should be operational. We’re waiting on a PostScript module for the 4165 copier in order to make it useful as a printer. Otherwise, you should be good to go with any of your printing or scanning needs. Fax should be generally available soon on each floor, and we’re going to have a desktop package soon available so you can fax directly from your Mac or Windows computer. More updates on that as they arrive.

The conference rooms have been updated in Zimbra to reflect accurate capacity counts, actual room number, building, and floor information. As soon as furniture arrives, we’ll be set!

I intend to post updates on the order of every other day for the next little while. Also, I’m hoping you’ll see entries from more than just me. As always, continue to report your IT issues to systems@mcs.anl.gov, and we’ll get to them.

Thanks!

Printing and Wireless

stace | Known Issues | Monday, September 14th, 2009

The wireless network in TCS is slightly different from the one in 221, and we have to open up some firewall holes to allow printing from guest wireless. This should be working soon. In the short term, use Auth wireless (which you really should use anyway, since it gets you behind the firewall), the VPN, or plug in.
