Tue, 30 Jun 2009
RFC: Math::Round::Fair
I’ve developed a small module, Math::Round::Fair, which currently resides on github. If the name is suitable and nothing like it already exists on CPAN, I’ll upload it after some time for comment.
I blogged about this a few years ago, but until I needed the algorithm again, recently, I didn’t attempt to create a CPAN module for it.
From the POD:
This module provides a single, exportable function, “round_fair”, which allocates an integer value, fairly distributing rounding errors.
Consider the problem of distributing one indivisible item, for example a penny, across three evenly weighted accounts, A, B, and C.
Using a naive approach, none of the accounts will receive an allocation since the allocated portion to each is 1/3 and 1/3 rounds to zero. We are left with 1 unallocated item.
Another approach is to adjust the basis at each step. We start with 1 item to allocate to 3 accounts. 1/3 rounds to 0, so account A receives no allocation, and we drop it from consideration. Now, we have 2 accounts and one item to allocate. 1/2 rounds to 1, so we allocate 1 item to the account B. Account C gets no allocation since there is nothing left to allocate.
But what happens if we allocate one item to the same three accounts 10,000 times? Ideally, two accounts should end up with 3,333 items and one should end up with 3,334 items.
Using the naive approach, all three accounts receive no allocation since at each round the allocation is 1/3 which rounds to zero. Using the second method, account A and account C will received no allocation, and account B will receive a total allocation of 10,000 items. Account B always receives the benefit of the rounding errors.
“round_fair” uses an algorithm with randomness to ensure a fair distribution of rounding errors. In our example problem, we start with 1 item to allocate. We calculate account A’s share, 1/3. Since it is less than one item, we give it a 1/3 chance of rounding up (and, therefore, a 2/3 chance of rounding down). It wins the allocation 1/3 of the time. 2/3 of the time we continue to B. We calculate B’s allocation as 1/2 (since there are only 2 accounts remaining and one item to allocate). B rounds up 1/2 of 2/3 (or 1/3) of the time and down 1/2 of 2/3 (or 1/3) of the time. If neither A nor B rounds up (which occurs 2/3 * 1/2, or 1/3 of the time), C’s allocation is calculated as 1/1 since we have one item to allocate and only one account to allocate it to. So, 1/3 of the time C receives the benefit of the rounding error. We never end up with any unallocated items.
This algorithm works for any number of weighted allocations.
The code is small enough to include here as well:
sub round_fair {
my $value = shift;
croak "Value to be allocated must be an integer" unless int($value) == $value;
my $basis = 0;
for my $w ( @_ ) {
croak "Weights must be > 0" unless $w > 0;
$basis += $w;
}
return ($value) if @_ == 1;
map {
my $allocation = $value * $_ / $basis;
my $allocated = int $allocation;
my $remainder = $allocation - $allocated;
++$allocated if rand() < $remainder;
$basis -= $_;
$value -= $allocated;
$allocated
} @_;
}
Send comments to Marc Mims or post them on github.
Transparent backwards compatibility using Moose
The current Net::Twitter is a complete rewrite of a prior version using Moose. One of my design goals was transparent backwards compatibility so that existing code based on Net::Twitter would continue to run, unchanged (provided the additional required modules were installed).
By using Moose roles I was able to factor out optional features and legacy behavior into separate classes. The core functionality became Net::Twitter::Core. Net::Twitter itself became an object factory applying appropriate roles to create a concrete classes including the appropriate mix of optional features then instances of those classes.
The Legacy role provides attributes and functionality that was not
carried forward into Net::Twitter::Core from Net::Twitter version 2.12. It
modifies the new core functionality with Moose method modifiers (in
particular, the around
modifier). It also applies the set of other,
optional roles required to make Net::Twitter backwards compatible.
The interesting and challenging work was getting the Net::Twitter factory
class correct. Initially, I used MooseX::Traits in Net::Twitter::Core
which provides the role application feature I needed with its
new_with_traits
method. But simply having Net::Twitter->new
call
Net::Twitter::Core->new_with_traits
wasn’t sufficient for Net::Twitter
derived classes. I ended up stealing a bit of the code and the concepts from
MooseX::Traits to get the behavior I needed.
Net::Twitter’s only public method is new
. It can be called with a traits
argument, or with a legacy
argument which is just a shortcut to save a bit
of typing. Traits and roles are the same thing. I use the terms
interchangeably here.
# The following are all equivalent, creating objects
# backwards compatible with Net::Twitter 2.12:
my $nt = Net::Twitter->new;
my $nt = Net::Twitter->new(legacy => 1);
my $nt = Net::Twitter->new(traits => ['Legacy']);
# New can also create non-legacy variations
my $nt = Net::Twitter->new(traits => ['API::REST', 'OAuth']);
Here is the complete new
method:
sub new {
my $class = shift;
croak '"new" is not an instance method' if ref $class;
my %args = @_ == 1 && ref $_[0] eq 'HASH' ? %{$_[0]} : @_;
my $traits = delete $args{traits};
if ( defined (my $legacy = delete $args{legacy}) ) {
croak "Options 'legacy' and 'traits' are mutually exclusive. Use only one."
if $traits;
$traits = [ $legacy ? 'Legacy' : 'API::REST' ];
}
$traits ||= [ qw/Legacy/ ];
$traits = [ $class->$resolve_traits(@$traits) ];
my $superclasses = [ 'Net::Twitter::Core' ];
my $meta = $create_anon_class->($superclasses, $traits, 1);
# create a Net::Twitter::Core object with roles applied
my $new = $meta->name->new(%args);
# rebless it to include a subclass, if necessary
if ( $class ne __PACKAGE__ ) {
unshift @$superclasses, $class;
my $final_meta = $create_anon_class->($superclasses, $traits, 0);
bless $new, $final_meta->name;
}
return $new;
}
The new
method expects either a single HASH reference or a list of key/value
pairs. It checks for and dereferences a HASH ref argument. Then it
normalizes the optional legacy
argument to the appropriate traits. If no
traits
or legacy
argument is provided, it uses the default Legacy
trait.
The first interesting bit of code is the call to $create_anon_class
, which
may be called twice.
The first call to $create_anon_class
creates a meta-class from
Net::Twitter::Core
with the appropriate roles applied and assigns it to the
variable $meta
. Anonymous classes aren’t /really/ anonymous; they have
auto-generated names. So, after creating the meta class, we create an
instance of the class it represents and assign the instance to the variable
$new
.
If Net::Twitter->new
was called directly, we’re done. We can simply return
the new instance. However, if new
was called from some Net::Twitter derived
class we need to create another anonymous class with the derived class
prepended to the list of superclasses. It gets assigned to the variable
$final_meta
.
We couldn’t create the first meta class with the derived class specified in
superclasses
, because calling new
on the class it represents is likely to
be infinitely recursive!
Consider this Net::Twitter derived class:
package My::Net::Twitter::DerivedClass;
use base 'Net::Twitter';
sub some_new_method { ... }
Net::Twitter’s new
method will be called with the $class
argument
My::Net::Twitter::DerivedClass
. Calling new
on the resulting anonymous
class, with My::Net::Twitter::DerivedClass
in superclasses
will result in
an infinite recursion since Net::Twitter’s new
will be called, again.
So, after creating an instance of the anonymous class without
My::Net::Twitter::DerivedClass
in superclasses
, we create another
anonymous class and assign it to the variable $final_meta
. Then we rebless
our existing instance into the new anonymous class and return it. Reblessed
into the new class, it can find and use the derived class methods.
$create_anon_class
has some interesting features.
my $create_anon_class = sub {
my ($superclasses, $traits, $immutable) = @_;
my $meta;
$meta = Net::Twitter::Core->meta->create_anon_class(
superclasses => $superclasses,
roles => $traits,
methods => { meta => sub { $meta }, isa => $isa },
cache => 1,
);
$meta->make_immutable(inline_constructor => $immutable);
return $meta;
};
With the cache
argument set to 1
, create_anon_class
caches the classes
it creates and returns an existing instance of the mata-class when the same
arguments are passed again. So, although create_anon_class
may be called an
arbitrary number of times in the life of an application, a new, anonymous
class is only created when different roles or superclasses are specified.
A final point of interest is the method isa
added to to the anonymous
classes.
my $isa = sub {
my $self = shift;
my $isa = shift;
return $isa eq __PACKAGE__ || $self->SUPER::isa($isa)
};
Net::Twitter is an object factory. It does not create instances of itself.
It creates instances of anonymous classes based on Net::Twitter::Core with
roles applied. Classes derived from Net::Twitter may be surprised (and die!)
if ->isa('Net::Twitter')
fails on their instances. So, objects created by
Net::Twitter claim to be Net::Twitter instances when queried with isa
.
Net::Twitter contains a bit more code than I’ve shown here. The additional
code, stolen from MooseX::Traits, simply expands role names, passed in the
traits
argument into the Net::Twitter::Role namespace.
This may not be the best possible implementation, but it seems to work well
for all the conditions I have encountered tested. Your feed back is welcome.
You can send me email or find me online. I’m semifor
on IRC and can usually
be found in the #moose and #net-twitter channels on irc.perl.org. I’m also
semifor
on Twitter.
Fri, 12 Jun 2009
Net::Twitter Roadmap
Recently, I became the maintainer of Net::Twitter.
Net::Twitter 3.01 is a complete rewrite using Moose. It uses Moose Roles heavily to provide optional features, API support, and a choice of error handling strategies. There is also a legacy role, Net::Twitter::Role::Legacy, to provide transparent backwards compatibility with version 2.12. The legacy role is applied by default, so aside from new dependencies, a CPAN upgrade should be completely transparent to existing users.
Version 3.01 comes with some new features:
And 3.02 will include an enhancement to the OAuth support for desktop applications. (Twitter made a recent change for OAuth desktop apps requiring the use of a PIN# to obtain access tokens.) A developer release was uploaded to CPAN today including that change. If no bugs are reported, 3.02 will roll out shortly.
A new feature under consideration is inflation of Twitter API responses to
Perl objects, rather than simple HASH refs. So, $status->{user}->{screen_name}
becomes $status->user->screen_name
. And status->created_at
may return a
DateTime object. Some feedback from the Net::Twitter users on this would be
appreciated.
My primary goal for the rewrite was ease of maintenance so that I could respond quickly to the ever changing Twitter API specification. I accomplished much of that goal with Net::Twitter::API. It extends Moose with some sugar so that Twitter API methods are described with a declarative syntax.
The API methods themselves, declared in Net::Twitter::Role::API::REST and the other API::* modules, are dynamically generated. POD for the API methods is also dynamically generated by introspecting the Moose meta data. Even some of the tests are dynamically generated in the same way.
So, when Twitter adds, removes, or changes API methods, there should be little work to do as a maintainer to get a new release prepared, tested, and distributed.
There are many Net::Twitter 2.12 users who will be dismayed by the Moose requirement. Not every one can install Moose and its dependencies, even if they want to. Shared hosting systems, for instance, may not allow installation of arbitrary modules. There are also Net::Twitter users who have created plugins and distribute the module with their applications. They want a module with limited dependencies.
Net::Twitter::Lite was created to serve those users. It does not have all the features and flexibility Net::Twitter does, but it is lean and has no more requirements than Net::Twitter version 2.12. In addition, its API methods and documentation are generated from the Net::Twitter 3.x source. So, an API change can be quickly and reliably propagated to Net::Twitter::Lite.
For those who need Net::Twitter::Lite, the move from Net::Twitter 2.12 is fairly painless, and instructions are included with the module.
Net::Twitter::Lite will get optional OAuth support, soon.
Transparent backwards compatibility is the default in Net::Twitter 3.01. That may change in a future version, but only after a reasonable deprecation plan is communicated to users.
Many thanks to Chris Thompson who authored the original version of Net::Twitter and maintained it through version 2.12.
Tue, 09 Jun 2009
Fighting spam with spam
I've always found it mildly irritating when I get a response to an email message that says:
I'm protecting myself from spam. Please click the link below to complete the verification process. You have to do this only once.
When I receive one of these messages in response to email I did NOT send, it's not just irritating, it's infuriating.
It is a common technique of spammers to not only send spam to their harvested
addresses, but also to use them in forged From
headers. With this type of
spam verification, you get to be a victim twice.
Spam filters have gotten quite good. I rarely received spam in my inbox. Any spam filtering technique that generates unwanted mail itself should be shunned. They are just adding to the problem.
The message I received came from Spam Arrest. It not only came with the verification, but an advertisement for the service. Isn't that the very definition of spam?
I've added an email filter. All mail from spamarrest.com is automatically reported as spam. If you happen to use Spam Arrest—sorry—your mail is undeliverable here.
Sat, 06 Jun 2009
Testing with LWP::UserAgent
LWP::UserAgent has a feature that makes testing applications that use it very easy.
I wrote and maintain the Net::Twitter and Net::Twitter::Lite distributions.1 I needed a way to test API calls without actually hitting the Twitter API servers.
The job required of the Net::Twitter modules is to turn simple Perl method calls into HTTP requests and turn the HTTP responses back into useful Perl data. It is, therefore sufficient to inspect an HTTP::Request object at the point it would normally be transmitted to Twitter without actually sending it. And, to return a suitable HTTP::Response object to test the applications behavior.
It turns out, LWP::UserAgent has a callback mechanism that is perfect for this
task. The add_handler
method takes a phase name, a code reference, and
optionally a matchspec to create callbacks at any of several phases. The
request_send
phase occurs at exactly the right phase for testing: when the
HTTP::Request instance is fully configured and ready to send on the wire. If
the callback returns an HTTP::Response object, no network call is made. The
HTTP::Response provided by the callback is returned to the caller.
Here's an example test using this technique:
1 use Test::More tests => 2;
2 use Net::Twitter::Lite;
3
4 my $nt = Net::Twitter::Lite->new;
5
6 my $request;
7 my %args;
8 my $response = HTTP::Response->new(200, 'OK');
9 $response->content('{"test":"success"}');
10
11 $nt->{ua}->add_handler(request_send => sub {
12 $request = shift;
13
14 $response->request($request);
15 %args = $request->uri->query_form;
16
17 return $response;
18 });
19
20 # additional args in a HASH ref
21 my $search_term = "intelligent life";
22 my $r = $nt->search($search_term, { page => 2 });
23 is $args{q}, $search_term, "q as positional arg";
24 is $args{page}, 2, "page parameter set";
In lines 6-7 I declared some lexical variables that will be available in the callback using a closure.
Line 11 sets up the callback on Net::Twitter::Lite's LWP::UserAgent instance.
The callback receives three parameters: the HTTP::Request instance, the
LWP::UserAgent instance, and a reference to the callback handler itself. Only
the HTTP::Request instance is of interest here. It is assigned to $request
on line 12.
An HTTP::Response includes a reference to its initiating HTTP::Request. Line 14 takes care of that.
Since this particular test is dealing with query parameters, they are extracted from the request's URI on line 15 and stored in the HASH declared on line 7.
On line 17, the HTTP::Response is returned to the caller. This prevents LWP::UserAgent from actually making a network call.
Line 22 makes a Net::Twitter::Lite call that should result in an HTTP GET with
url http://search.twitter.com/search.json?page=2&q=intelligent+life
.
Net::Twitter::Lite should inflate the HTTP::Response contents into an
appropriate Perl representation of the JSON return.
With the request and response available, a variety of tests can be run to unsure Net::Twitter::Lite is behaving as expected.
You may find this technique useful for your own tests.
[1]Net::Twitter versions 2.12 and earlier were written and maintained by Chris Thompson. Version 3 is a complete rewrite using Moose. Net::Twitter::Lite was created for those who cannot or prefer not to install Moose and its dependencies.
About this weblog
This site is the personal weblog of Marc Mims. You can contact Marc
by sending e-mail to:
[email protected].
Marc writes here about cycling, programming, Linux, and other items of personal interest.
This site is syndicated with RSS.
Credits
CSS stolen from Tom Coates who didn't even complain.