Sat, 20 Dec 2003
Allocating rounding errors fairly
Last Monday, I visited a client in California. Their programmer showed me some code he'd been working on and mentioned that he was having some difficulty with rounding errors accumulating.
I shared a technique with him I've used to solve the same problem.
Having way too much time in airports and on planes to think about it, I realized that the method I shared, while quite good for most cases, fails to allocate rounding errors fairly in extreme cases. I spent many hours since trying to find an optimal solution.
Consider the problem of distributing money, evenly, across three accounts. If we only have one cent to distribute, which account should get it?
Using the naive rounding method, none of the accounts will get anything and we will be left with one cent undistributed.
Using the technique I shared with Russ, we would consider the first account and attempt to distribute 1/3 cents. That rounds to zero. We would then restate the problem as distributing our single cent across the remaining two accounts. So, we would attempt to distribute 1/2 cent to the second account. That rounds to one cent. Finally, having nothing left to distribute, we assign zero cents to the third account.
This works well for any number of accounts ensuring that the maximum rounding error applied to any given account is less than one cent.
But what happens if we distribute one cent to the same three accounts 10,000 times? Ideally, two accounts should end up with $33.33 and one should end up with $33.34.
Using naive rounding, all three accounts would still have zero balances. Using the second method described, the first and third accounts would have zero balances, and the second account would have a balance of $100.00, since it would always get the benefit of the rounding errors.
I spent a lot of time working on a fair method for distributing the rounding errors. I'm certain I spent my time simply reinventing the wheel, but it was a very interesting problem to solve.
Consider the same problem: distributing $0.01 across 3 accounts.
The first account should get 1/3 cents. Since we can't divide a cent, we'll give the first account a 1 in 3 chance at rounding up. 2 out of 3 times, we'll move on to the second account and give it a 1 out of 2 chance at rounding up. 1 out of 2 times we'll still have our cent to distribute and we'll move to the third account, which will always get the cent if we reach it, because it is the only remaining account.
If you work out the math, you'll see that each account has exactly a 1 in three chance at getting the cent.
Surprisingly (at least it surprised me), this works out with any amount to be allocated, and with any weighting.
For instance, assume we have 3 accounts with weights 1, 2, and 1 respectively. If we want to allocate 1 cent to these accounts, we would attempt to allocate ¼cents to the first account, giving it a 1 in 4 chance to round up. 3 out of 4 times, we'll move to the second account and give it a 2 out of 3 chance (2 is it's weight, and we consider only the remaining 2 accounts for a total weight of 3) at rounding up.
To allocate a larger number, we simply allocate the the integer portion after multiplying the amount to allocate by the account's weight and dividing by the total weight. The odds of rounding up are given by the fractional remainder. We reduce the total amount to be allocated by the amount allocated to each account and the total weight by the account's weight at each round and move on.
The following perl subroutine demonstrates the algorithm:
# usage: allocate(amount, weight_1, weight_2, ..., weight_n)
# returns an array of n allocated amounts
sub allocate {
my $value = shift;
my $basis = 0;
$basis += $_ for @_;
map {
my $allocation = $value * $_ / $basis;
my $allocated = int($allocation);
my $remainder = $allocation - $allocated;
++$allocated if rand() < $remainder;
$basis -= $_;
$value -= $allocated;
$allocated;
} @_;
}
[/programming] [link]
Fri, 19 Dec 2003
Broken firewalls refuse ECN enabled connections
I worked through an interesting problem last night at the office.
One of our users was unable to reach the
State of California's website. We recently
installed a Squid proxy and he assumed the
proxy server was blocking his request. He got a connection refused
error reported back from Squid.
I tried to reach the site without using the Squid proxy and could do so from the firewall but not from the machine hosting the Squid proxy. Packet sniffing with tcpdump revealed the connection attempt was, indeed, reaching www.ca.gov and was being refused by it. That was confusing, because the firewall provides NAT and the target system should not have been able to distinguish any difference between a request coming directly from the firewall box and one coming from the box hosting the Squid proxy.
Comparing the captured packet headers between the successful and unsuccessful attempts I did discover a difference. The firewall box was sending a packet with the Syn flag set (as expected) which simply appears in tcpdump as 'S' in the flags field. The failed attempt from the Squid proxy box had flags 'SWE'.
A Google search turned up a useful hit:
Are you running kernel 2.4.x? If so, _and_ you have TCP ECN enabled, that's the problem. How to check? # sysctl net.ipv4.tcp_ecn 1 means on. How to fix? Short term: # sysctl –w net.ipv4.tcp_ecn=0
That was indeed the problem. The Linux kernel configuration help file includes the following note:
Note that, on the Internet, there are many broken firewalls which refuse connections from ECN-enabled machines, and it may be a while before these firewalls are fixed. Until then, to access a site behind such a firewall (some of which are major sites, at the time of this writing) you will have to disable this option, either by saying N now or by using the sysctl.
Given the odd symptoms, I was very pleased to have been able to find and fix the problem in just a few minutes. Thanks goes to Nathan E Norman whose post on the debian-user mailing list provided the solution and to Google fantastic search engine for making it easy to find.
I'm a Debian user myself. For the time being, I've added the following line to the appropriate interface's section in /etc/networking/interfaces:
up /sbin/sysctl –w net.ipv4.tcp_ecn=0 || true
That seems to do the trick.
Mon, 08 Dec 2003
Dad’s notes
Mom has been boxing and sending Dad's books to me—hundreds of them. I've been working on a database to store information about Dad's books including a scanned image of each book cover, with a web interface. I'm using MySQL for the back-end database and Mason running under mod_perl for the web front end. More on the application development later.
Yesterday, I pulled one of Dad's books out a box at random. Tucked inside were several notes, including the following:
I find these notes of Dad's fascinating. I'm not sure what the formula at the top is all about (if you know, please e-mail me). The screwdriver diagram is particularly interesting. Is it an original idea—an invention on paper? Or did Dad draw it to document something he saw or remembered? I'll never know.
Dad would have considered this sketch nothing more than a doodle—something to pass the time. He certainly wouldn't have considered it presentable. I would give my eye-teeth to draw so well.
Wed, 03 Dec 2003
My first Open Source contribution
This morning, Jonathan Lundquist and I found and fixed a bug in the Squid Web Proxy Cache. I'm not sure about Jonathan, but it was my first, direct contribution to Open Source and a rewarding experience.
It took my knowledge of the Linux development environment, tools, and protocol for creating and submitting a patch; and it took Jonathan's excellent analysis and debugging skills to quickly locate the bug and suggest a patch. Individually, it may have taken either of us 2 days to find and fix the problem in totally foreign code. But together, we found it, fixed it, made a report including a patch in under 2 hours.
I think it is rare to find a working relationship that multiplies productivity that way. Several years ago, Jonathan and I wrote the first version of a Windows based application in a single 30+ hour session, passing the keyboard back and forth. That product has gone through many revisions with thousands of man hours invested, but the core of the application still consists of the code we wrote in those 30-some-odd hours.
Sadly, after 12 years working closely together, that working relationship will end, soon. Jonathan has accepted a position as CFO of for one of our clients. Good luck, Jonathan. I'll miss you most when debugging code on my own!
[/programming] [link]
About this weblog
This site is the personal weblog of Marc Mims. You can contact Marc
by sending e-mail to:
[email protected].
Marc writes here about cycling, programming, Linux, and other items of personal interest.
This site is syndicated with RSS.
Archives
Credits
CSS stolen from Tom Coates who didn't even complain.