Author Topic: LuLa Server down for almost 24hrs!  (Read 5116 times)
Rob C
Sr. Member
****
Offline Offline

Posts: 12213


« Reply #20 on: December 03, 2010, 04:22:49 PM »
ReplyReply

It's sun spots.

My own website needs me to go to Weebly to access it in case I want to make alterations or check traffic: I couldn't get in.

Worse, with this one down, there will probably have been no traffic!

;-)   or, alternatively, ;-(

Nonetheless, thanks to you guys in Mission Control for getting us out of the warp.

Rob C
Logged

Mark D Segal
Contributor
Sr. Member
*
Offline Offline

Posts: 7044


WWW
« Reply #21 on: December 03, 2010, 04:25:45 PM »
ReplyReply

It's sun spots.


Aw shucks, ya mean there's a scientific explanation? That spoils all the fun.
Logged

Mark D Segal (formerly MarkDS)
Author: "Scanning Workflows with SilverFast 8....." http://www.luminous-landscape.com/reviews/film/scanning_workflows_with_silverfast_8.shtml
bobtowery
Full Member
***
Offline Offline

Posts: 219


WWW
« Reply #22 on: December 03, 2010, 05:25:26 PM »
ReplyReply

Well, there was this picture in one of the WikiLeaks documents.  But really, I think it is just a case of mistaken identity?

Logged

michael
Administrator
Sr. Member
*****
Offline Offline

Posts: 4915



« Reply #23 on: December 03, 2010, 06:03:48 PM »
ReplyReply

The fact that a lot of other sites had problems yesterday may indeed be related. Our server is maintained at a large hosting farm in Texas (The Planet). They host thousands of servers and sites, and the problems may have manifested across a lot of machines.

Vincent and Mark are both catching up on their sleep, so I won't have a full post mortem until later in the weekend. If I learn anything relevant I'll post it here.

Michael
Logged
Slobodan Blagojevic
Sr. Member
****
Online Online

Posts: 6182


When everybody thinks the same... nobody thinks.


WWW
« Reply #24 on: December 03, 2010, 06:26:10 PM »
ReplyReply

All seems pretty much back to normal now. If readers find any problems, please let us know.

My avatar is missing and it seems impossible to attach a new one. Also noticed some other members' avatars missing too.
Logged

Slobodan

Flickr
500px
Eric Myrvaagnes
Sr. Member
****
Offline Offline

Posts: 8206



WWW
« Reply #25 on: December 03, 2010, 08:20:37 PM »
ReplyReply

My avatar is missing and it seems impossible to attach a new one. Also noticed some other members' avatars missing too.
So it was all a plot by the folks at Wikileaks to steal LuLa avatars!
If yours is missing, watch for it to appear soon in the NY Times.
Logged

-Eric Myrvaagnes

http://myrvaagnes.com  Visit my website. New images each season.
Justan
Sr. Member
****
Offline Offline

Posts: 1881


WWW
« Reply #26 on: December 04, 2010, 09:19:52 AM »
ReplyReply

Time to start planning our new recovery strategy for the next time. With computers, there's always a next time.

Michael



If you were interested in installing fail-over capability, there are a number of ways to mirror SQL databases and any related software. The goal would be to have a 2nd site, managed by a different group. The online site would send regular or real-time updates to the 2nd site. In the event that the primary site goes offline, all that’s needed is to change your DNS records so that they point to the 2nd site, and you’re back up and running in a few minutes (provided the records have a short TTL).

It’s not the most trivial of tasks to set up, but it offers many advantages and isn't all that expensive to implement or maintain. Do a Google search on “how to mirror sql servers.”
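
As a rough illustration of the last step (deciding where DNS should point), here is a minimal shell sketch. The hostnames, the `site_is_up` check and the `choose_target` helper are all made up for the example, and the actual DNS change is left out because it depends entirely on your DNS provider.

```shell
#!/bin/sh
# Hypothetical hostnames; substitute the real primary and 2nd site.
PRIMARY="primary.example.com"
STANDBY="standby.example.com"

# site_is_up HOST: succeed if HOST answers an HTTP request within 10 seconds.
site_is_up() {
    curl --silent --fail --max-time 10 --output /dev/null "http://$1/"
}

# choose_target [CHECK]: print the host that DNS should point to.
# CHECK is the name of the health-check function (defaults to site_is_up).
choose_target() {
    check=${1:-site_is_up}
    if "$check" "$PRIMARY"; then
        echo "$PRIMARY"
    else
        echo "$STANDBY"
    fi
}
```

Run from a scheduler on a third machine, `choose_target` would print the standby host as soon as the primary stops answering; feeding that result into the DNS provider's update mechanism is the part you would have to fill in.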
Logged

Chris Sanderson
Administrator
Sr. Member
*****
Offline Offline

Posts: 1920



« Reply #27 on: December 04, 2010, 09:56:02 AM »
ReplyReply

Yes, this is already 'in the works' - but thanks for the suggestion!
Logged

Christopher Sanderson
The Luminous-Landscape
Mark Guertin
Administrator
Full Member
*****
Offline Offline

Posts: 233



« Reply #28 on: December 04, 2010, 03:02:45 PM »
ReplyReply

My avatar is missing and it seems impossible to attach a new one. Also noticed some other members' avatars missing too.

I'm not sure why some went missing but I will further investigate this.  They all appear to exist so it might be a permissions problem.  You should now be able to upload avatars again, there was a missing PHP module that is now installed.
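
For anyone curious what such a permissions check might look like, here is a small sketch. It assumes the avatars are plain files that the web server reads as a different user, so it simply lists files without the world-read bit; the function name and `AVATAR_DIR` are hypothetical, not the forum's real layout.

```shell
#!/bin/sh
# list_unreadable DIR: print files under DIR that lack the world-read bit,
# which a web server running as another user typically could not serve.
list_unreadable() {
    find "$1" -type f ! -perm -004
}

# AVATAR_DIR is a hypothetical path; point it at the real avatar directory.
# list_unreadable "$AVATAR_DIR"
```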
Logged
ErikKaffehr
Sr. Member
****
Offline Offline

Posts: 7888


WWW
« Reply #29 on: December 04, 2010, 03:38:48 PM »
ReplyReply

Congratulations on handling an unexpected problem in reasonable time!

Best regards
Erik

Ps. I have worked with a Mr. Merik Guertin of L3 Maps, no relative of yours?



I'm not sure why some went missing but I will further investigate this.  They all appear to exist so it might be a permissions problem.  You should now be able to upload avatars again, there was a missing PHP module that is now installed.
Logged

Mark Guertin
Administrator
Full Member
*****
Offline Offline

Posts: 233



« Reply #30 on: December 04, 2010, 03:41:35 PM »
ReplyReply

Congratulations on handling an unexpected problem in reasonable time!

Best regards
Erik

Ps. I have worked with a Mr. Merik Guertin of L3 Maps, no relative of yours?


Thanks Erik.  Nope, no relation.

Mark
Logged
K.C.
Sr. Member
****
Offline Offline

Posts: 662


« Reply #31 on: December 04, 2010, 05:15:30 PM »
ReplyReply

As an IT professional for 25+ years I'm trying to understand why a site, with the level of demand this one has, is being run on a single box and maintained by a couple of guys. No matter how competent you may be that's an old school approach.

If you're using your own server in a colo then you really need to be running RAID and the colo should have another box ready to hot swap to. With all due respect, 24 hrs down time and the need for a manual rebuild is pretty amateur with the options you have available to you.

At the very least write a script and ftp it off site several times a day.

#!/bin/sh
# Date stamp used in the archive name, e.g. 2010_12_04
DATE=$(date +%Y_%m_%d)

# Dump SQL data
/usr/bin/mysqldump -uUSER -pPASS --all-databases --opt -l --result-file=/backup/mysql/mysqldump.sql

# Compress sql dump
tar zcf /backup/$DATE.tar.gz /backup/mysql

# EMAIL TO MAILBOX (done before the upload, because -DD removes the local archive)
uuencode /backup/$DATE.tar.gz Some_Hosting_SQL_Dbases.$DATE.tar.gz | mail -s "Some Hosting SQL Database Backup" recipient@domain.com

# UPLOAD TO FTP (DD deletes on successful upload)
ncftpput -f ftplogin.cfg -DD /remote_path /backup/$DATE.tar.gz
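
If the job runs several times a day, each run needs its own file name, and it is worth checking the archive before shipping it off site. A minimal sketch of that idea follows; `make_archive` and the `backup_` name prefix are invented for the example, not part of any existing setup.

```shell
#!/bin/sh
# make_archive SRC_DIR OUT_DIR: tar and gzip SRC_DIR into OUT_DIR under a
# timestamped name, test the result, and print the archive path.
make_archive() {
    src=$1
    out=$2
    # Hours and minutes are included so several runs per day do not collide.
    stamp=$(date +%Y_%m_%d_%H%M)
    archive="$out/backup_$stamp.tar.gz"
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # gzip -t checks the compressed stream without extracting it.
    gzip -t "$archive" || return 1
    echo "$archive"
}
```

The printed path can then be handed to whatever does the transfer, so a truncated or corrupt archive never leaves the box.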


Logged
Mark Guertin
Administrator
Full Member
*****
Offline Offline

Posts: 233



« Reply #32 on: December 04, 2010, 06:34:36 PM »
ReplyReply

As an IT professional for 25+ years I'm trying to understand why a site, with the level of demand this one has, is being run on a single box and maintained by a couple of guys. No matter how competent you may be that's an old school approach.

If you're using your own server in a colo then you really need to be running RAID and the colo should have another box ready to hot swap to. With all due respect, 24 hrs down time and the need for a manual rebuild is pretty amateur with the options you have available to you.

<snip>



K.C.:

The length of downtime had nothing to do with us not having backups -- in fact we had backups right down to the last minute we were online.  It had everything to do with hardware failure and response times in first diagnosing and then rectifying the problem at the DC end of the equation, and I can assure you that we are taking this up with our provider.  Also I'm not really sure where you get the idea that we performed a manual rebuild of the server or data.  As stated we had full and complete backups of anything remotely considered essential right up to the last minute the old server was online to work from and we restored from these backups. 

A very small portion of the actual downtime was required for the actual data restore.  We also have offsite backups, but had we made the decision to take that route it's likely that it would have ultimately taken longer than it did to wait the (unacceptably long) time it took the DC to get its act together and get us back onto functional hardware.  RAID would not have helped us in this situation -- this was not a hard drive failure -- and in fact had we had RAID to deal with for the hardware changeover it likely would have slowed the process down yet again.  I'm also not really sure how you think that having more people maintaining the site and server would have sped up the process (at least on our end of things); you can't restore data if you have nothing to restore it to...

Lastly, I have to say that while emailing uuencoded data is an interesting backup approach, for a dataset the size we are talking about here it wouldn't even be remotely feasible.
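
To put a rough number on that: classic uuencode turns every 45 input bytes into a 62-byte line (a length character, 60 encoded characters and a newline), so the encoded body comes out about 38% larger than the input, before any mail size limits come into play. A small sketch of the arithmetic; the function name is only for illustration, and the short header and trailer lines are ignored.

```shell
#!/bin/sh
# uuencoded_body_size BYTES: estimate the size in bytes of the uuencoded
# body for an input of BYTES bytes (header and trailer lines not counted).
uuencoded_body_size() {
    bytes=$1
    full_lines=$((bytes / 45))   # complete 45-byte input chunks
    rest=$((bytes % 45))         # bytes left over for a final short line
    size=$((full_lines * 62))    # 1 length char + 60 data chars + newline
    if [ "$rest" -gt 0 ]; then
        # Short line: length char + newline + 4 chars per started 3-byte group.
        size=$((size + 2 + 4 * ((rest + 2) / 3)))
    fi
    echo "$size"
}
```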

Rest assured there are plans underway that will make sure this type of a failure won't require this kind of turnaround time again.

Mark
Logged
K.C.
Sr. Member
****
Offline Offline

Posts: 662


« Reply #33 on: December 04, 2010, 06:51:38 PM »
ReplyReply

Mark, you describe a much different picture than the thread led me to believe was the case.

Sounds like a familiar scenario. You don't realize the competency, or lack thereof, of the people you're relying on until the worst case happens. Time for a new host/colo.

Emailing gigs of data unsecured is common. You dump tables in random order. Nobody sniffing it can get enough info at once for it to be useful.

Logged
Mark Guertin
Administrator
Full Member
*****
Offline Offline

Posts: 233



« Reply #34 on: December 04, 2010, 07:56:26 PM »
ReplyReply

It's not the unsecured data part of the emailing that bothers me as much as the size of said emails ;)
Logged
Christoph C. Feldhaim
Sr. Member
****
Offline Offline

Posts: 2509


There is no rule! No - wait ...


« Reply #35 on: December 05, 2010, 06:00:21 AM »
ReplyReply

I'd put such a server in a virtualized movable environment, like VMware or Xen.
Just my 0.02
Logged

Justan
Sr. Member
****
Offline Offline

Posts: 1881


WWW
« Reply #36 on: December 05, 2010, 07:54:52 AM »
ReplyReply

my idea is: how to automatically and constantly backup files in a server with zero intervention (automatized) from a HD?

And I'd like an automatized backup that runs all that in case of crash. Is that possible? (I'm not tech as you can see)

Look into the osql utility or its newer incarnation, the sqlcmd utility.

There are some FTP programs that will do what you want and have their own scheduler, or you can use the command-line ftp client together with the system scheduler, at least on Windows boxes.
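
On a Unix-like server the same idea (a small script plus the system scheduler) could look like the sketch below. It assumes cron or a similar scheduler calls it at whatever interval you choose, covers only the local copy step rather than the FTP transfer, and uses a made-up function name.

```shell
#!/bin/sh
# backup_if_changed SRC DEST: copy SRC over DEST only when the contents differ.
# Prints "copied" or "unchanged" so the scheduler's log shows what happened.
backup_if_changed() {
    src=$1
    dest=$2
    if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
        echo "unchanged"
    else
        # -p keeps the original timestamps and permissions on the copy.
        cp -p "$src" "$dest" && echo "copied"
    fi
}
```

Because nothing is copied when the file has not changed, the job can run as often as you like without churning the backup disk.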
Logged

Justan
Sr. Member
****
Offline Offline

Posts: 1881


WWW
« Reply #37 on: December 05, 2010, 08:17:15 AM »
ReplyReply


The length of downtime had nothing to do with us not having backups -- in fact we had backups right down to the last minute we were online.  It had everything to do with hardware failure and response times in first diagnosing and then rectifying the problem at the DC end of the equation, and I can assure you that we are taking this up with our provider.  [snip]

...you can't restore data if you have nothing to restore it to...


Disaster recovery is a thorny topic and a troublesome thing to implement. Few will spend the time or $$ to implement a fail-over system due to cost and complexity. It takes this kind of problem to motivate and show the value of a fail-over solution.

This appears to be a classic case where it takes a series of failures to expose the shortcomings of the infrastructure (the data center). It sounds like the core issue is that the data center was not quick to identify or resolve their hardware problems ( Shocked ) and, from what you wrote, didn't have a ready solution ( Shocked Shocked ). Added to that, the site’s management did not plan for or expect the data center to let them down. ( Shocked )

The good news is that the backups worked ( HURRAY Grin Grin Grin) so little or nothing was lost but time, and gave the site’s management the opportunity to see where the recovery scheme could be improved.

Bravo on the diligence and getting the site up and running in short order!
Logged

Craig Arnold
Full Member
***
Offline Offline

Posts: 219


WWW
« Reply #38 on: December 05, 2010, 08:17:56 AM »
ReplyReply

I'd put such a server in a virtualized movable environment, like VMware or Xen.
Just my 0.02

Yup, downtime would likely have been zero (if the hardware was failing in a detectable way, just vMotion it automatically) or a few minutes at most if you needed to spin up a new instance.

Check out something like the Rackspace Cloud web hosting solutions (there are of course other providers too - but starting with Rackspace gives you an idea of what is possible). No single point of failure anywhere. Essentially infinitely scalable too.
Logged
