Ted Ts'o: EXT4 Within Striking Distance Of XFS


  • #41
    Originally posted by smitty3268 View Post
    In their case, they have multiple copies of the data scattered around the world, so whenever one copy gets corrupted they just take it offline and serve the data from somewhere else until it gets replicated back again.
    It seems you don't really understand what I am talking about.

    How can Google notice if there is corruption in a file? Many storage solutions (filesystems, hw RAID, etc.) cannot detect all corruptions, especially not Silent Corruption.



    • #42
      Originally posted by kebabbert View Post
      You don't understand what Data Integrity is.

      It is not about a disk crashing or something similar. It is about retrieving the same data you put on the disk. Imagine you put this data on disk: "1234567890", but a corruption occurred, so you got back "2234567890". And the hardware does not even notice the data got corrupted. This is called Silent Corruption, and it occurs all the time.
      What stupidity and nonsense! You clearly prove that you don't understand what BER, silent corruption, CRC, convolutional codes, and information theory are!

      Please don't speak of things you don't know or understand.
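Kebabbert's "1234567890" example above is essentially what end-to-end checksumming guards against. A minimal sketch of the idea in Python (an illustration only, not any particular filesystem's implementation):

```python
import zlib

written = b"1234567890"
checksum = zlib.crc32(written)      # checksum stored at write time

read_back = b"2234567890"           # first digit silently flipped in flight
if zlib.crc32(read_back) != checksum:
    print("corruption detected")    # a checksumming stack returns an I/O error here
```

Without the stored checksum, nothing in this path notices the flip, which is exactly what makes the corruption "silent".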



      • #43
        Originally posted by Jimbo View Post
        What stupidity and nonsense! You clearly prove that you don't understand what BER, silent corruption, CRC, convolutional codes, and information theory are!

        Please don't speak of things you don't know or understand.
        Then, explain it to us, please.



        • #44
          Originally posted by kebabbert View Post
          When have I lost an argument? Can you please link to a post that shows I loose an argument or when I redefine it?
          Sure, how about this one from just a few posts up?

          You are describing a safe solution.

          We are talking about unsafe solutions, where corrupted data is allowed.
          Skipping over the rest of the stuff where you accuse me of lying, because frankly I'm not even interested in going over this again...

          I doubt you know people at NYSE, and from your earlier well-known track record I suspect you made this one up as well.
          No, I don't; I read an article about it in the Wall Street Journal, and have heard it reported by other sources as well. If you're claiming some super-secret inside knowledge, then OK. But everything I've heard publicly reported was that they were very happy.



          • #45
            @KDesk

            You don't get that type of error! If a small corruption occurs, say 1 bit, a convolutional code can repair it and you get your original data; if a big error occurs, the CRC detects it and you get a read error. If you put 400 in your Excel sheet, you don't read back 800 when there is a read error; you get the error instead.
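The detect-versus-correct distinction above can be shown with a toy Hamming(7,4) code, the textbook single-error-correcting code: one flipped bit is located by the syndrome and repaired, whereas a bare CRC could only report the damage. This is a hedged classroom sketch, not how any real controller implements its ECC:

```python
def hamming_encode(d):                      # d: 4 data bits
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]   # codeword positions 1..7

def hamming_correct(c):                     # c: 7 received bits
    # The syndrome is the 1-based position of a single flipped bit (0 = clean).
    s = (c[0]^c[2]^c[4]^c[6]) | (c[1]^c[2]^c[5]^c[6]) << 1 | (c[3]^c[4]^c[5]^c[6]) << 2
    if s:
        c[s - 1] ^= 1                       # repair the flipped bit in place
    return [c[2], c[4], c[5], c[6]]         # recovered data bits

code = hamming_encode([1, 0, 1, 1])
code[4] ^= 1                                # flip one bit in transit
print(hamming_correct(code))                # prints [1, 0, 1, 1]
```

With two or more flipped bits this code miscorrects, which is why real storage stacks layer stronger ECC and CRCs; the point here is only the detect/correct distinction.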

            What is CERN talking about?

            The errors CERN reported are a special corner case of RAID 5 arrays: the firmware (not the filesystem!) of the RAID controller introduces the error due to a malfunction, writing data to the wrong places. That is how you get this type of corruption. It is a corner case, on RAID 5 arrays, and only a few controllers are affected. ZFS can work around this rare error; other filesystems can't.

            On RAID controllers compliant with the T10 Data Integrity Field standard, this bug doesn't exist; you can set up your RAID 5 safely using ext4.
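The T10 Data Integrity Field idea can be sketched as follows. The `protect`/`verify` helpers and the dict layout are hypothetical stand-ins for the real 8-byte protection tuple, but they show how a guard tag (a CRC of the data) plus a reference tag (the intended LBA) catches exactly the misdirected-write firmware bug described above:

```python
import zlib

def protect(data: bytes, lba: int) -> dict:
    # Attach a guard tag (data CRC) and a reference tag (intended LBA).
    return {"data": data, "guard": zlib.crc32(data), "ref": lba}

def verify(block: dict, lba: int) -> bytes:
    if block["ref"] != lba:
        raise IOError(f"misdirected write: block tagged for LBA {block['ref']}")
    if zlib.crc32(block["data"]) != block["guard"]:
        raise IOError("guard tag mismatch: data corrupted")
    return block["data"]

disk = {}
disk[7] = protect(b"account balance: 400", lba=7)

# Buggy firmware writes LBA 7's block to LBA 9 instead:
disk[9] = disk.pop(7)
try:
    verify(disk[9], lba=9)
except IOError as e:
    print(e)   # prints: misdirected write: block tagged for LBA 7
```

The reference tag is what turns "data quietly in the wrong place" into a hard read error instead of silently wrong data.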



            • #46
              Originally posted by kebabbert View Post
              It seems you don't really understand what I am talking about.

              How can Google notice if there is corruption in a file? Many storage solutions (filesystems, hw RAID, etc.) cannot detect all corruptions, especially not Silent Corruption.
              Nope, once again you missed the part where I said "THEY DON'T CARE" if they don't know, because they're willing to trade a rare error that will probably never be spotted for speed and money. Lots of their search indexes are constantly being updated anyway, making any error short-lived. And it wouldn't surprise me if they kept CRC codes or something to detect errors on long-term files like YouTube videos, where they could always spot an issue by comparing the copies they hold across different servers for any difference.

              How many times are you going to make me repeat myself?
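What "comparing the copies across different servers" could look like in practice, as a hedged sketch with made-up hostnames and payloads: hash every replica and treat the minority digest as the corrupted copy.

```python
import hashlib
from collections import Counter

replicas = {
    "server-us": b"cat video frame data",
    "server-eu": b"cat video frame data",
    "server-ap": b"cat video frame dbta",   # one silently corrupted copy
}
digests = {host: hashlib.sha256(blob).hexdigest() for host, blob in replicas.items()}
majority, _ = Counter(digests.values()).most_common(1)[0]
bad = [host for host, d in digests.items() if d != majority]
print(bad)   # prints ['server-ap']
```

The bad replica would then be taken out of rotation and re-replicated from a good copy, as smitty3268 describes earlier in the thread.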



              • #47
                Originally posted by kebabbert View Post
                It seems you don't really understand what I am talking about.

                How can Google notice if there is corruption in a file? Many storage solutions (filesystems, hw RAID, etc.) cannot detect all corruptions, especially not Silent Corruption.
                We were saying that Google doesn't need to notice, because their data (such as multimedia) is not that sensitive to bit flips. As you said, an error in metadata will make it detectable, but then they will just use another copy. The data needs to be distributed and replicated in the first place to balance the load and achieve low latency in different geographical locations.

                Originally posted by kebabbert View Post
                When have I lost an argument? Can you please link to a post that shows I loose an argument or when I redefine it?
                Loose is the opposite of tighten. You meant to say lose.
                You lost because you've been shown a viable use for a system that trades data integrity for performance, something you claimed was unimaginable.

                Originally posted by kebabbert View Post
                In fact, I suspect you have lied in other posts as well. For instance, you claimed that New York Stock Exchange are very happy now:
                [embedded link to a post in the Phoronix *BSD forum]
                I doubt you know people at NYSE, and from your earlier well-known track record I suspect you made this one up as well. Because I work in finance, and I have heard the opposite. As has frantaylor, who explains that NYSE is very, very cautious about its Linux switch:
                [embedded link to a post in the Phoronix Solaris forum]
                http://lwn.net/Articles/411022/:
                "In summary, NASDAQ OMX seems to be happy with its use of Linux. They also seem to like to go with current software - the exchange is currently rolling out 2.6.35.3 kernels. "Emerging APIs" are helping operations like NASDAQ OMX realize real-world performance gains in areas that matter. Linux, Bob says, is one of the few systems that are willing to introduce new APIs just for performance reasons. That is an interesting point of view to contrast with Linus Torvalds's often-stated claim that nobody uses Linux-specific APIs; it seems that there are users, they just tend to be relatively well hidden. "
                frantaylor's comment (from September 2009) reads as follows:
                Originally posted by frantaylor View Post
                The IT people at the NYSE lose their entire yearly bonus if their uptime drops to less than 99.99%. They use Linux, but it took a 15 year migration project to get off of HPUX. Even so they use way too much hardware and alarms go off if any machines have a load average of more than 0.1. They do not believe in putting any kind of a load on their machines, they are afraid of performance slowdowns. They know full well that Linux does not behave well under load.
                I don't know what he has to support that statement, but it seems that NASDAQ OMX has chosen Linux specifically for the performance benefits. Go figure. Besides, none of this has anything to do with filesystems (the article is mostly about networking performance).



                • #48
                  Originally posted by Jimbo View Post
                  @KDesk

                  You don't get that type of error! If a small corruption occurs, say 1 bit, a convolutional code can repair it and you get your original data; if a big error occurs, the CRC detects it and you get a read error. If you put 400 in your Excel sheet, you don't read back 800 when there is a read error; you get the error instead.

                  What is CERN talking about?

                  The errors CERN reported are a special corner case of RAID 5 arrays: the firmware (not the filesystem!) of the RAID controller introduces the error due to a malfunction, writing data to the wrong places. That is how you get this type of corruption. It is a corner case, on RAID 5 arrays, and only a few controllers are affected. ZFS can work around this rare error; other filesystems can't.

                  On RAID controllers compliant with the T10 Data Integrity Field standard, this bug doesn't exist; you can set up your RAID 5 safely using ext4.
                  Still, this indicates there are situations where data can be silently corrupted before it lands on the actual disk media. Besides faulty firmware, it can be faulty hardware (registers, [ds]ram), the butterfly effect (like in http://xkcd.com/378/, but on the wires), etc...
                  It also indicates that it's much less likely than I thought.



                  • #49
                    Originally posted by misiu_mp View Post
                    Still, this indicates there are situations where data can be silently corrupted before it lands on the actual disk media. Besides faulty firmware, it can be faulty hardware (registers, [ds]ram), the butterfly effect (like in http://xkcd.com/378/, but on the wires), etc...
                    It also indicates that it's much less likely than I thought.
                    If data gets corrupted in your RAM... a CRC check signals it.

                    Before ZFS arrived, banks were not changing the money in accounts because of silent corruption...



                    • #50
                      Originally posted by Jimbo View Post
                      If data gets corrupted in your RAM... a CRC check signals it.
                      Say I write data to the FS, but it is not yet flushed to disk. The RAM gets corrupted, and when the fsync comes, something other than my data flows out of RAM and into the storage controller.
                      Which CRC check is going to signal that?

                      There are more places where bits are stored and flow through on their way from RAM to the disk platters. And there are plenty of high-energy particles in the air trying to change them.
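One application-level answer to "which CRC check is going to signal that?" is: none of the built-in ones, which is why paranoid applications add their own write-then-verify pass. A minimal sketch; note that in practice the read-back may be served from the page cache rather than the platter, so real implementations use O_DIRECT or a later scrub:

```python
import os
import tempfile
import zlib

# Keep a checksum of what we *meant* to write, fsync, then read back and
# compare. This catches corruption anywhere between the application buffer
# and the medium, at the cost of an extra read.
payload = b"important ledger entry"
expected = zlib.crc32(payload)

fd, path = tempfile.mkstemp()
try:
    os.write(fd, payload)
    os.fsync(fd)                      # push the data through the page cache
    os.lseek(fd, 0, os.SEEK_SET)
    if zlib.crc32(os.read(fd, len(payload))) != expected:
        raise IOError("silent corruption between RAM and disk")
    print("write verified")           # reached only if the read-back matches
finally:
    os.close(fd)
    os.remove(path)
```

This is roughly the shape of the end-to-end verification that checksumming filesystems do for you on every read, rather than only at write time.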

                      Originally posted by Jimbo View Post
                      Before ZFS arrived, banks were not changing the money in accounts because of silent corruption...
                      I don't understand. Changing money into what?

