This arrived by email today. Some of you may find it interesting, maybe even useful.
FYI. Thought you might be interested.
Sent: Tuesday, August 03, 2010 5:20 PM
Subject: please share with any C++ coders in your group
I attach a short note on the bad effects of an innocuous-looking way of initializing C++ variables which has cost our collaboration considerable pain, after lying silently in wait for us for many years. You really should listen to the textbook advice on how to initialize variables. The pain is greater, the larger the code base you are dealing with.
A dangerous C++ coding practice to avoid:
float x;
// anything at all not involving x
x = 0.;
Instead, always initialize at creation:
float x = 0.;
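In larger code the same rule usually means declaring each variable at the point where its first value is known, and giving class members default initializers. A minimal sketch of that habit follows; the names `sum_momenta` and `TrackSummary` are hypothetical illustrations, not taken from the D0 code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the original note): sums a list of values.
// The accumulator gets its value on the same line that declares it.
float sum_momenta(const std::vector<float>& momenta) {
    float total = 0.f;  // initialized at creation, never left indeterminate
    for (float p : momenta) {
        total += p;
    }
    return total;
}

// Hypothetical aggregate: default member initializers give every field
// a defined value even when the object is default-constructed.
struct TrackSummary {
    float pt = 0.f;
    float eta = 0.f;
    int n_hits = 0;
};
```

With this layout there is no window between declaration and first assignment in which the variable holds leftover memory contents.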
This problem was tracked down by my D0 colleagues, and summarized by Michael Diesburg of Fermilab. Our
reconstruction jobs run from a self‐contained tarball which carries the job’s entire environment (i.e., typical of the grid,
“an endless source of amusing and novel failure modes”). MC reconstruction jobs had run successfully on a particular
farm before. We then tried to reconstruct data on the same farm, but 60% of the jobs failed with floating point
exceptions on the first event. We could not reproduce this behavior elsewhere, even on another Scientific Linux
Fermilab 5 farm. In fact, when we submitted a reconstruction job repeatedly on a single file, it failed 60% of the time
but ran 40% of the time: it wasn’t the data that was bad.
Cause: It usually takes two screw‐ups to really confuse a physicist: your troubles “combine and mix”.
1) The Linux kernel had been upgraded to 2.6.12, which includes ASLR (Address Space Layout Randomization), a
security enhancement to thwart hackers using unprotected array overflow vulnerabilities. Your code starts at a
different location in memory each time you run. Thus the offset between your web browser and (say) the
authentication code in the operating system is different each time.
2) In the first float initialization construction above, the compiler has the right to do a register pre‐fetch of x; this is
not a compiler bug (verified with members of the C++ standards committee). If the fetch of an uninitialized variable
finds an illegal floating point number (NaN), a floating point exception will be thrown.
Obviously, these two factors conspire to give stochastic, but disastrous, behavior.
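To see why leftover memory can look like an illegal number, it helps to recall that some 32-bit patterns simply decode to NaN. The sketch below assumes `float` is IEEE 754 binary32 (checked with a `static_assert`); the helper names `float_from_bits` and `bits_are_nan` are made up for illustration. It never reads an uninitialized variable, it only decodes bit patterns chosen by hand.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 &&
                  sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes IEEE 754 binary32 float");

// Reinterpret 32 raw bits as a float without undefined behaviour.
float float_from_bits(std::uint32_t bits) {
    float value = 0.f;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// True when the raw bits encode a NaN: exponent all ones, mantissa nonzero.
bool bits_are_nan(std::uint32_t bits) {
    const std::uint32_t exponent = bits & 0x7F800000u;
    const std::uint32_t mantissa = bits & 0x007FFFFFu;
    return exponent == 0x7F800000u && mantissa != 0u;
}
```

For example, `0x7FA00000` is a NaN pattern, while `0x3F800000` decodes to 1.0 and `0x7F800000` to infinity. Whatever happens to sit in the memory behind an uninitialized `float` may or may not fall in the NaN range, and with randomized addresses that can change from run to run.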
Possible fixes:
1) Turn off floating point exceptions: not practical for many physics codes. Do you always check before dividing by Cos?
2) Change floating point units: an alternative exists which doesn’t trip on register loads of NaN, but it also has precision issues. Why trade one insidious problem for another?
3) Turn off ASLR: a band-aid, insecure, and it only reduces the probability rather than fixing the problem. Site security will likely force you to abandon this eventually.
   Think: have you ever had a job that only ran on the second or third try?
   Insanity is doing the same thing but expecting different results, unless you are dealing with computers.
4) Find instances and fix your code: painful, particularly in large codes, but the only real cure. There may be compiler flags that help; or grep your heart out. You won’t introduce any new bugs while patching, will you?
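As one way to "grep your heart out", here is a rough sketch of a scanner that flags lines which look like a `float` or `double` declaration with no initializer. It is a heuristic only: the function name `find_uninitialized_decls` and the regular expression are assumptions made for this example, and it will miss declarations split across lines, other types, and anything hidden behind typedefs or macros.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based line numbers of lines that look like
// "float x;" or "double a, b;" (a declaration with no initializer).
std::vector<std::size_t> find_uninitialized_decls(const std::string& source) {
    static const std::regex decl(
        R"(^\s*(?:float|double)\s+[A-Za-z_]\w*(?:\s*,\s*[A-Za-z_]\w*)*\s*;)");
    std::vector<std::size_t> hits;
    std::istringstream in(source);
    std::string line;
    std::size_t line_no = 0;
    while (std::getline(in, line)) {
        ++line_no;
        if (std::regex_search(line, decl)) {
            hits.push_back(line_no);  // candidate for manual review
        }
    }
    return hits;
}
```

Each hit still needs a human look: the scanner only narrows down where to read, it does not prove that a variable is used before it is set.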