Author Topic: Planning a New PC Build - Advice Needed  (Read 16714 times)
Chris Pollock
Full Member

Posts: 213


« Reply #40 on: August 03, 2012, 06:21:18 AM »

Thank you, Chris. Those systems are out of my price range but they do look nice.
If you want ECC support, the only components in your build that you have to change are the motherboard, CPU, and memory. Everything else can stay the same.

The Supermicro motherboard appears to be roughly the same price as the ASUS P9X79 PRO.

The E5-1620 is the Xeon equivalent of the Core i7-3820 that you decided on. According to staticice.com it will cost you $315.95, or $26 more than the Core i7.

Crucial.com will sell you 16GB of DDR3 1600 ECC Registered memory (2 x 8GB) for $192, slightly less than what they charge for the non-ECC memory that you originally chose. The ECC memory is admittedly slightly higher latency, but the difference in system performance will be negligible. You could, however, get 16GB of non-ECC DDR3 1600 with the same latency as the ECC memory for $132, so we'll use that price for comparison.

A 32GB ECC system will therefore cost you an extra $26 for the CPU and an extra $120 for the memory, for a grand total of $146. That seems like a modest price to pay for a significant reduction in the risk of data corruption. (I've used US prices; you'll probably pay more if you live elsewhere. The price difference is higher here in Australia because the Supermicro board is hard to get and expensive.)

It's worth mentioning that a lot of AMD motherboards and most AMD processors support ECC. I chose a Phenom II for my last build (in early 2009) for this reason. Unfortunately AMD's performance now lags so far behind Intel's that they are no longer a serious contender.

There are those who say that memory errors are rare enough that they can be ignored, but that view may unfortunately be incorrect: http://www.zdnet.com/blog/storage/dram-error-rates-nightmare-on-dimm-street/638

One final disclaimer - I've never actually used a Supermicro board, so I can't vouch for them. They are designed for enterprise use, however, so hopefully they are reliable.
Sharon Van Lieu
Sr. Member

Posts: 376


Nantucket Landscape and Architectural Photographer


« Reply #41 on: August 03, 2012, 08:01:41 AM »

Thanks Chris. What is the Xeon equivalent to the i7 3930k? That is the processor I had previously decided on.

Sharon

Chris Pollock
Full Member

Posts: 213


« Reply #42 on: August 03, 2012, 08:45:46 AM »

Thanks Chris. What is the Xeon equivalent to the i7 3930k? That is the processor I had previously decided on.
Sorry, I was looking at your original system spec. Reading the thread more carefully I see that you did decide on the 3930K later on. The Xeon equivalent would be the E5-1650, which will set you back $615.95 (as opposed to $539.99 for the i7-3930K), according to staticice.com. I think the Xeon and i7 processors are pretty much identical, except that the i7 chips only support unbuffered non-ECC memory, so performance should be very similar.

I'd recommend the 1620 if you need to keep costs down. The 1650 gives you two extra cores (but a slightly lower clock speed when all cores are in use), which may not be worth nearly doubling the CPU cost. Having said that, I'd probably choose the 1650, because the resulting price difference for the system as a whole isn't that great, and software is gradually becoming better at using multiple cores. The rate of improvement in computer hardware also seems to have slowed in recent years, so I expect my next system to last a long time before the next upgrade.
John.Murray
Sr. Member

Posts: 893



« Reply #43 on: August 03, 2012, 11:07:51 AM »

Sharon, the other thing differentiating the Xeon is that it supports two QPI interfaces, which enables multi-CPU configurations and directly affects the number of PCIe lanes offered.  Is this important to you?

If you are running multiple drives on additional controllers, capture/rendering cards (Red Rocket), or other components found in a video production platform, it may well be.  ECC memory would be critical in that case, as rendering or other processing can run for hours if not longer.

« Last Edit: August 03, 2012, 11:21:38 AM by John.Murray »

Sharon Van Lieu
Sr. Member

Posts: 376


Nantucket Landscape and Architectural Photographer


« Reply #44 on: August 03, 2012, 12:06:59 PM »

John, I don't know enough about it to know if it is important to me or not. I am looking through the Adobe Premiere hardware forum and don't see a lot of information on the configuration that Chris mentioned. Maybe I'm not searching correctly. I'm going to check out the Puget Systems site you mentioned to compare what they offer to AVA Direct.  I only talked to AVA Direct about an X79 board. I'll ask them about their workstations.

Thanks,

Sharon


John.Murray
Sr. Member

Posts: 893



« Reply #45 on: August 03, 2012, 12:22:03 PM »

If you are contemplating a single CPU machine, I would tend to recommend staying on the Core i7 platform. 

The expansion capabilities offered by the Xeon, along with the additional cost, make sense for dual-CPU rigs, but there are limiting factors here as well (Amdahl's law / too many CPU cores...).  The PBM5 site has an excellent discussion of this.

If your future business puts you in the position of suddenly needing a dual-CPU, Jeff Schewe-style monster rig, well, that's a problem you'd love to have, right?

Sharon Van Lieu
Sr. Member

Posts: 376


Nantucket Landscape and Architectural Photographer


« Reply #46 on: August 03, 2012, 12:33:39 PM »

That's true, John. I wouldn't mind that problem. I looked at Puget Systems but they are more expensive than AVA Direct, so I'm sticking with AVA. I checked AVA's prices on the ECC boards but they are out of my budget. $2500 is my absolute limit for this computer, with plans to add more hard drives later.  I will go with my plan for the i7 board. I do appreciate the advice from everyone.

Sharon

Chris Pollock
Full Member

Posts: 213


« Reply #47 on: August 03, 2012, 05:58:34 PM »

If you are contemplating a single CPU machine, I would tend to recommend staying on the Core i7 platform. 

The expansion capabilities offered by the Xeon, along with the additional cost, make sense for dual-CPU rigs, but there are limiting factors here as well (Amdahl's law / too many CPU cores...).  The PBM5 site has an excellent discussion of this.
Do you think it's worth the increased risk of data corruption or system crashes to save a few hundred dollars? The risk is hard to quantify, but certainly not zero. Mission-critical systems use ECC for a reason.

For a gaming or web browsing machine I agree that ECC is overkill, but for a system that will be used commercially to handle large quantities of data it seems like a prudent precaution.
Sharon Van Lieu
Sr. Member

Posts: 376


Nantucket Landscape and Architectural Photographer


« Reply #48 on: August 03, 2012, 08:37:19 PM »

I guess I don't understand what data I could lose that couldn't be replaced.  I back up from my cards to three different drives. My post processing is fairly simple and usually can be easily reproduced. I save often when working.

A few hundred more just isn't possible for me. I do appreciate your input.

Sharon

Chris Pollock
Full Member

Posts: 213


« Reply #49 on: August 03, 2012, 09:59:20 PM »

I guess I don't understand what data I could lose that couldn't be replaced.  I back up from my cards to three different drives. My post processing is fairly simple and usually can be easily reproduced. I save often when working.
The problem with memory errors is that you don't know when they will happen, or what they will do. An error could corrupt a photo as you're copying it from your card to the PC. It could corrupt something as you're doing a backup, leading to a corrupt file in your backup, while your original copy is still OK. It could corrupt part of your filesystem, and the bad files might make it onto your backup devices before you notice. It could cause a system crash in the middle of something important. In some cases you'll be able to recover your data from a backup copy, but in other cases you may not.

I honestly don't know how common these sorts of errors are. I have noticed a handful of files on my systems go bad over the years, although in all cases I still had a good copy. I don't know if this was due to memory errors or something else (I was guilty of using cheap non-ECC memory and less-than-top-quality power supplies on my older systems), but it shows that data integrity isn't something that you can take for granted.

Anyway, if you can't afford ECC, make a lot of backup copies and hope for the best. Perhaps I am exaggerating the risk.
« Last Edit: August 03, 2012, 10:16:12 PM by Chris Pollock »
lfeagan
Full Member

Posts: 208



« Reply #50 on: August 03, 2012, 10:31:17 PM »

Here is a wonderful paper on DRAM memory errors. www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
In case you decide to scan it for the charts, CE = correctable error, UE = uncorrectable error

All of the machines in the study have ECC. If they didn't, all of the correctable errors would instead be uncorrectable.

Some snippets from the summary and discussion section
Quote
Conclusion 1: We found the incidence of memory errors and the range of error rates across different DIMMs to be much higher than previously reported.

About a third of machines and over 8% of DIMMs in our fleet saw at least one correctable error per year. Our per-DIMM rates of correctable errors translate to an average of 25,000–75,000 FIT (failures in time per billion hours of operation) per Mbit and a median FIT range of 778–25,000 per Mbit (median for DIMMs with errors), while previous studies report 200-5,000 FIT per Mbit. ...

The conclusion we draw is that error correcting codes are crucial for reducing the large number of memory errors to a manageable number of uncorrectable errors. In fact, we found that platforms with more powerful error codes (chipkill versus SECDED) were able to reduce uncorrectable error rates by a factor of 4–10 over the less powerful codes. ...

BTW, their test fleet was Google's servers, so they have a nice large sample. "This paper provides the first large-scale study of DRAM memory errors in the field. It is based on data collected from Google's server fleet over a period of more than two years making up many millions of DIMM days"

Quote
Conclusion 3: The incidence of CEs increases with age, while the incidence of UEs decreases with age (due to replacements).

Given that DRAM DIMMs are devices without any mechanical components, unlike for example hard drives, we see a surprisingly strong and early effect of age on error rates. For all DIMM types we studied, aging in the form of increased CE rates sets in after only 10–18 months in the field.

Quote
Conclusion 5: Within the range of temperatures our production systems experience in the field, temperature has a surprisingly low effect on memory errors.

Temperature is well known to increase error rates. In fact, artificially increasing the temperature is a commonly used tool for accelerating error rates in lab studies. Interestingly, we find that differences in temperature in the range they arise naturally in our fleet's operation (a difference of around 20C between the 1st and 9th temperature decile) seem to have a marginal impact on the incidence of memory errors, when controlling for other factors, such as utilization.

Quote
Conclusion 7: Error rates are unlikely to be dominated by soft errors.
We observe that CE rates are highly correlated with system utilization, even when isolating utilization effects from the effects of temperature.

BTW, a soft error is a randomly corrupted bit, such as one caused by cosmic radiation, while a hard error represents a physical defect in the device, datapath, etc. Previously, people thought that soft errors were the major source of errors; it turns out this just isn't so. Because of that assumption, they incorrectly concluded that error rates were much lower than they actually are.

Even after saying all that, I still don't feel that memory errors are something you need to worry about for your needs. Tons of us run around with notebooks, phones, tablets, and other devices that have no ECC and stay on for days, weeks, and months without being restarted. I wouldn't get bent out of shape over this issue.
« Last Edit: August 03, 2012, 10:33:01 PM by lfeagan »

Lance

Nikon: D700, D800E, PC-E 24mm f/3.5D ED, PC-E 45mm f/2.8D ED, PC-E 85mm f/2.8D, 50mm f/1.4G, 14-24 f/2.8G ED, 24-70 f/2.8G ED, 70-200 f/2.8G ED VR II, 400mm f/2.8G ED VR
Fuji: X-Pro 1, 14mm f/2.8, 18mm f/2.0, 35mm f/1.4
Rhossydd
Sr. Member

Posts: 1888


« Reply #51 on: August 04, 2012, 02:41:31 AM »

Perhaps I am exaggerating the risk.
I think so. The safeguards of ECC memory etc. would be worth considering if you're dealing with really important data like bank transactions or medical records, but for just image editing? Having to go back to an archived copy of a photo isn't exactly life and death, is it?
Chris Pollock
Full Member

Posts: 213


« Reply #52 on: August 04, 2012, 04:51:36 AM »

I think so. The safeguards of ECC memory etc. would be worth considering if you're dealing with really important data like bank transactions or medical records, but for just image editing? Having to go back to an archived copy of a photo isn't exactly life and death, is it?
What do you do if the corruption happens just before you archive, and your archived copy is also corrupt?

Sure, you can probably reduce the risk of data loss to negligible levels with a cunning backup plan (copying the data directly from your cards to multiple external drives, etc.), but why put up with the extra stress to save such a modest amount of money? The sort of workstation we're talking about will probably last for 4 or 5 years before the next upgrade, so the extra cost of ECC is something like $40 or $50 a year. People are willing to spend a lot more than that for tiny performance improvements, so I don't understand the reluctance to pay a modest premium for reliability.
Sharon Van Lieu
Sr. Member

Posts: 376


Nantucket Landscape and Architectural Photographer


« Reply #53 on: August 04, 2012, 09:23:33 PM »

I don't understand the reluctance to pay a modest premium for reliability.

Hi Chris, I do understand your point but I am at the top of my budget now. I priced an ECC machine and I can't swing it this time around. Maybe next time.

Sharon

Rhossydd
Sr. Member

Posts: 1888


« Reply #54 on: August 06, 2012, 03:13:09 AM »

why put up with the extra stress to save such a modest amount of money?
I'm not stressed at all by not using ECC. The probability of having problems because I'm not using ECC is so vanishingly small that it's not worth any concern.

There are far more productive ways to spend your money.
alain
Sr. Member

Posts: 274


« Reply #55 on: August 06, 2012, 05:29:21 AM »

Here is a wonderful paper on DRAM memory errors. www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
In case you decide to scan it for the charts, CE = correctable error, UE = uncorrectable error

All of the machines in the study have ECC. If they didn't, all of the correctable errors would instead be uncorrectable.
...

I have read the paper, and it actually contains rather good news, assuming "consumer" hardware has the same error rate as the server hardware.

The tested servers would get about one correctable error every 3 years, but I suppose they are running 24/7.  If a PC is shut down when not in use, this would probably decrease to once every ten years - let's say one error (or less) in its lifetime.  I'm sure that there are more important things to do: regular and multiple backups (on other physical media and in other locations), a UPS, and good testing of the hardware before and during use.

I do recommend testing the RAM with a tool like memtest (Windows) and checking the backups with a checksum (I use hashdeep for that).
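
In case it helps anyone, here is a minimal sketch of that kind of backup check, written in Python (the script name and folder paths are just placeholders): it writes a manifest of SHA-256 hashes for a folder of images, and can later re-verify any copy of that folder against the manifest. hashdeep does the same job with more features; this only shows how little is involved.

Code:
import hashlib
import os
import sys

def sha256_of(path, chunk_size=1 << 20):
    # Hash the file in chunks so large raw files don't have to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(root, manifest_path):
    # Record one "hash  relative/path" line for every file under root.
    with open(manifest_path, "w") as out:
        for dirpath, _, filenames in os.walk(root):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                out.write(sha256_of(full) + "  " + rel + "\n")

def verify_manifest(root, manifest_path):
    # Re-hash every file listed in the manifest; report anything missing or changed.
    ok = True
    with open(manifest_path) as f:
        for line in f:
            expected, rel = line.rstrip("\n").split("  ", 1)
            full = os.path.join(root, rel)
            if not os.path.exists(full):
                print("MISSING:  " + rel)
                ok = False
            elif sha256_of(full) != expected:
                print("MISMATCH: " + rel)
                ok = False
    return ok

if __name__ == "__main__":
    # Usage: python checkfiles.py write <folder> <manifest>
    #        python checkfiles.py verify <folder> <manifest>
    mode, folder, manifest = sys.argv[1:4]
    if mode == "write":
        write_manifest(folder, manifest)
    else:
        print("OK" if verify_manifest(folder, manifest) else "PROBLEMS FOUND")

Keep the manifest file outside the folder being hashed, run "write" once after importing from the cards, and run "verify" against each backup drive now and then.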

lfeagan
Full Member

Posts: 208



« Reply #56 on: August 06, 2012, 10:11:51 AM »

I have read the paper, and it actually contains rather good news, assuming "consumer" hardware has the same error rate as the server hardware.

The tested servers would get about one correctable error every 3 years, but I suppose they are running 24/7.  If a PC is shut down when not in use, this would probably decrease to once every ten years - let's say one error (or less) in its lifetime.  I'm sure that there are more important things to do: regular and multiple backups (on other physical media and in other locations), a UPS, and good testing of the hardware before and during use.

Thanks for pointing that out, Alain. If memory errors were extremely prevalent, either all hardware would have ECC and/or most software (perhaps all) would include checksums on every file, verified regularly.

On that note, it would be interesting if Adobe introduced a feature into Lightroom X (some future version) with a preference where image checksums would be maintained in the database (or elsewhere) for all of the images, and LR would verify the checksum regularly (perhaps the first time an image was accessed during a single run of the application). Computing a checksum really isn't that expensive, and it would give a bit more peace of mind. And like I said, it would be a configurable preference, so users who felt comfortable leaving it off, or who wanted a bit more performance, could do so. The level of effort for implementing such a feature would be rather low, likely less than one person-month.
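
To show how small that feature really is, here is a rough sketch in Python. It is purely hypothetical - the table layout, class, and function names are mine and have nothing to do with how Lightroom actually stores its catalog - but it captures the idea: record a checksum at import time and re-verify it the first time the image is touched in a session.

Code:
import hashlib
import sqlite3

def file_checksum(path):
    # SHA-256 of the image file as it sits on disk.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

class Catalog(object):
    # Toy stand-in for a catalog database that tracks one checksum per image.

    def __init__(self, db_path):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS images (path TEXT PRIMARY KEY, checksum TEXT)")
        self.verified_this_session = set()

    def import_image(self, path):
        # Record the checksum once, at import time, while the file is known good.
        self.db.execute("INSERT OR REPLACE INTO images VALUES (?, ?)",
                        (path, file_checksum(path)))
        self.db.commit()

    def open_image(self, path):
        # Verify lazily: only the first time the image is accessed this session.
        if path not in self.verified_this_session:
            row = self.db.execute(
                "SELECT checksum FROM images WHERE path = ?", (path,)).fetchone()
            if row and row[0] != file_checksum(path):
                raise IOError("Checksum mismatch, file may be corrupt: " + path)
            self.verified_this_session.add(path)
        with open(path, "rb") as f:
            return f.read()

Verifying only on first access keeps the overhead to one extra read per image per session, which is roughly the behaviour the preference I described would toggle.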

Lance

Nikon: D700, D800E, PC-E 24mm f/3.5D ED, PC-E 45mm f/2.8D ED, PC-E 85mm f/2.8D, 50mm f/1.4G, 14-24 f/2.8G ED, 24-70 f/2.8G ED, 70-200 f/2.8G ED VR II, 400mm f/2.8G ED VR
Fuji: X-Pro 1, 14mm f/2.8, 18mm f/2.0, 35mm f/1.4
John.Murray
Sr. Member

Posts: 893



« Reply #57 on: August 06, 2012, 11:39:44 AM »

Lance:  Thanks for reminding us of that article; I referenced it as well in a discussion a couple of years ago - still good reading...

One area the article does not address is the actual source of the memory faults: are they arising from actual failures of the modules themselves, the onboard memory controllers, or a signal path issue?  The reason I bring this up was a bad experience with a dual-socket Supermicro server board that had the option of using non-ECC memory - we populated it with 64GB and ended up with a basically unusable machine.  SM sent a replacement that experienced the same issue.  Although we gave up on it at the time, I did go back later and repopulate it with ECC memory... no difference!  What ended up "fixing" the problem was pulling one of the CPUs - clearly a major design problem with the board itself.

I agree ECC memory is critical in applications where long-term stability is required: a memory failure during a video render spanning many hours could result in a missed deadline, and the same goes for servers hosting large databases, directories, or email stores.

In the case of Intel, they have two design paths:
 - the Z chipsets, which rely on external memory controllers.
 - the X or Xeon C chipsets, which utilize the on-die memory controller.

Honestly, building a machine based on a Z chipset with anything over 16GB of RAM makes me nervous.  I'm much more comfortable using non-ECC in an X79...

The checksum idea in LR is interesting, but possibly redundant?  All modern filesystems checksum:

http://en.wikipedia.org/wiki/Journaling_file_system

lfeagan
Full Member

Posts: 208



« Reply #58 on: August 06, 2012, 12:26:57 PM »

The checksum idea in LR is interesting, but possibly redundant?  All modern filesystems checksum:

Hi John,

The checksums in a journaling filesystem are for the journal, not for all the files on disk.

Also, not all journaled filesystems do journal checksumming. While ext4 does journal checksumming, ext3 does not.

http://en.wikipedia.org/wiki/Ext3#No_checksumming_in_journal
Quote
ext3 does not do checksumming when writing to the journal. If barrier=1 is not enabled as a mount option (in /etc/fstab), and if the hardware is doing out-of-order write caching, one runs the risk of severe filesystem corruption during a crash.

Consider the following scenario: If hard disk writes are done out-of-order (due to modern hard disks caching writes in order to amortize write speeds), it is likely that one will write a commit block of a transaction before the other relevant blocks are written. If a power failure or unrecoverable crash should occur before the other blocks get written, the system will have to be rebooted. Upon reboot, the file system will replay the log as normal, and replay the "winners" (transactions with a commit block, including the invalid transaction above, which happened to be tagged with a valid commit block). The unfinished disk write above will thus proceed, but using corrupt journal data. The file system will thus mistakenly overwrite normal data with corrupt data while replaying the journal. There is a test program available to trigger the problematic behavior. If checksums had been used, where the blocks of the "fake winner" transaction were tagged with a mutual checksum, the file system could have known better and not replayed the corrupt data onto the disk. Journal checksumming has been added to ext4.

Assuming this Wikipedia chart comparing file systems is correct, there are very few filesystems that even support ECC/checksumming (and it is usually disabled in the ones that do). It looks like the candidates are ext4, GPFS, ZFS, and Btrfs. Definitely not NTFS or HFS/HFS+.

Lance

Nikon: D700, D800E, PC-E 24mm f/3.5D ED, PC-E 45mm f/2.8D ED, PC-E 85mm f/2.8D, 50mm f/1.4G, 14-24 f/2.8G ED, 24-70 f/2.8G ED, 70-200 f/2.8G ED VR II, 400mm f/2.8G ED VR
Fuji: X-Pro 1, 14mm f/2.8, 18mm f/2.0, 35mm f/1.4
John.Murray
Sr. Member

Posts: 893



« Reply #59 on: August 06, 2012, 12:55:04 PM »

Lance:  That's correct, but the purpose of the journal is to roll back any failed writes to the filesystem.  A memory error and the resultant system crash would result in a corrupted file with or without data ECC.  The journal protects us against that by rolling back the failed write...
