Tue, 30 Jun 2009

RFC: Math::Round::Fair

I’ve developed a small module, Math::Round::Fair, which currently resides on github. If the name is suitable and nothing like it already exists on CPAN, I’ll upload it after some time for comment.

I blogged about this a few years ago, but until I needed the algorithm again, recently, I didn’t attempt to create a CPAN module for it.

From the POD:

This module provides a single, exportable function, “round_fair”, which allocates an integer value, fairly distributing rounding errors.

Consider the problem of distributing one indivisible item, for example a penny, across three evenly weighted accounts, A, B, and C.

Using a naive approach, none of the accounts will receive an allocation since the allocated portion to each is 1/3 and 1/3 rounds to zero. We are left with 1 unallocated item.

Another approach is to adjust the basis at each step. We start with 1 item to allocate to 3 accounts. 1/3 rounds to 0, so account A receives no allocation, and we drop it from consideration. Now, we have 2 accounts and one item to allocate. 1/2 rounds to 1, so we allocate 1 item to the account B. Account C gets no allocation since there is nothing left to allocate.

But what happens if we allocate one item to the same three accounts 10,000 times? Ideally, two accounts should end up with 3,333 items and one should end up with 3,334 items.

Using the naive approach, all three accounts receive no allocation since at each round the allocation is 1/3 which rounds to zero. Using the second method, account A and account C will received no allocation, and account B will receive a total allocation of 10,000 items. Account B always receives the benefit of the rounding errors.

“round_fair” uses an algorithm with randomness to ensure a fair distribution of rounding errors. In our example problem, we start with 1 item to allocate. We calculate account A’s share, 1/3. Since it is less than one item, we give it a 1/3 chance of rounding up (and, therefore, a 2/3 chance of rounding down). It wins the allocation 1/3 of the time. 2/3 of the time we continue to B. We calculate B’s allocation as 1/2 (since there are only 2 accounts remaining and one item to allocate). B rounds up 1/2 of 2/3 (or 1/3) of the time and down 1/2 of 2/3 (or 1/3) of the time. If neither A nor B rounds up (which occurs 2/3 * 1/2, or 1/3 of the time), C’s allocation is calculated as 1/1 since we have one item to allocate and only one account to allocate it to. So, 1/3 of the time C receives the benefit of the rounding error. We never end up with any unallocated items.

This algorithm works for any number of weighted allocations.

The code is small enough to include here as well:

sub round_fair {
    my $value = shift;

    croak "Value to be allocated must be an integer" unless int($value) == $value;

    my $basis = 0;
    for my $w ( @_ ) {
        croak "Weights must be > 0" unless $w > 0;
        $basis += $w;
    }

    return ($value) if @_ == 1;

    map {
        my $allocation = $value * $_ / $basis;
        my $allocated  = int $allocation;
        my $remainder  = $allocation - $allocated;
        ++$allocated if rand() < $remainder;
        $basis -= $_;
        $value -= $allocated;
        $allocated
    } @_;
}

Send comments to Marc Mims or post them on github.

[/perl] [link]

Transparent backwards compatibility using Moose

The current Net::Twitter is a complete rewrite of a prior version using Moose. One of my design goals was transparent backwards compatibility so that existing code based on Net::Twitter would continue to run, unchanged (provided the additional required modules were installed).

By using Moose roles I was able to factor out optional features and legacy behavior into separate classes. The core functionality became Net::Twitter::Core. Net::Twitter itself became an object factory applying appropriate roles to create a concrete classes including the appropriate mix of optional features then instances of those classes.

The Legacy role provides attributes and functionality that was not carried forward into Net::Twitter::Core from Net::Twitter version 2.12. It modifies the new core functionality with Moose method modifiers (in particular, the around modifier). It also applies the set of other, optional roles required to make Net::Twitter backwards compatible.

The interesting and challenging work was getting the Net::Twitter factory class correct. Initially, I used MooseX::Traits in Net::Twitter::Core which provides the role application feature I needed with its new_with_traits method. But simply having Net::Twitter->new call Net::Twitter::Core->new_with_traits wasn’t sufficient for Net::Twitter derived classes. I ended up stealing a bit of the code and the concepts from MooseX::Traits to get the behavior I needed.

Net::Twitter’s only public method is new. It can be called with a traits argument, or with a legacy argument which is just a shortcut to save a bit of typing. Traits and roles are the same thing. I use the terms interchangeably here.

# The following are all equivalent, creating objects
# backwards compatible with Net::Twitter 2.12:
my $nt = Net::Twitter->new;
my $nt = Net::Twitter->new(legacy => 1);
my $nt = Net::Twitter->new(traits => ['Legacy']);

# New can also create non-legacy variations
my $nt = Net::Twitter->new(traits => ['API::REST', 'OAuth']);

Here is the complete new method:

sub new {
    my $class = shift;

    croak '"new" is not an instance method' if ref $class;

    my %args = @_ == 1 && ref $_[0] eq 'HASH' ? %{$_[0]} : @_;

    my $traits = delete $args{traits};

    if ( defined (my $legacy = delete $args{legacy}) ) {
        croak "Options 'legacy' and 'traits' are mutually exclusive. Use only one."
            if $traits;

        $traits = [ $legacy ? 'Legacy' : 'API::REST' ];
    }

    $traits ||= [ qw/Legacy/ ];
    $traits   = [ $class->$resolve_traits(@$traits) ];

    my $superclasses = [ 'Net::Twitter::Core' ];
    my $meta = $create_anon_class->($superclasses, $traits, 1);

    # create a Net::Twitter::Core object with roles applied
    my $new = $meta->name->new(%args);

    # rebless it to include a subclass, if necessary
    if ( $class ne __PACKAGE__ ) {
        unshift @$superclasses, $class;
        my $final_meta = $create_anon_class->($superclasses, $traits, 0);
        bless $new, $final_meta->name;
    }

    return $new;
}

The new method expects either a single HASH reference or a list of key/value pairs. It checks for and dereferences a HASH ref argument. Then it normalizes the optional legacy argument to the appropriate traits. If no traits or legacy argument is provided, it uses the default Legacy trait.

The first interesting bit of code is the call to $create_anon_class, which may be called twice.

The first call to $create_anon_class creates a meta-class from Net::Twitter::Core with the appropriate roles applied and assigns it to the variable $meta. Anonymous classes aren’t /really/ anonymous; they have auto-generated names. So, after creating the meta class, we create an instance of the class it represents and assign the instance to the variable $new.

If Net::Twitter->new was called directly, we’re done. We can simply return the new instance. However, if new was called from some Net::Twitter derived class we need to create another anonymous class with the derived class prepended to the list of superclasses. It gets assigned to the variable $final_meta.

We couldn’t create the first meta class with the derived class specified in superclasses, because calling new on the class it represents is likely to be infinitely recursive!

Consider this Net::Twitter derived class:

package My::Net::Twitter::DerivedClass;
use base 'Net::Twitter';

sub some_new_method { ... }

Net::Twitter’s new method will be called with the $class argument My::Net::Twitter::DerivedClass. Calling new on the resulting anonymous class, with My::Net::Twitter::DerivedClass in superclasses will result in an infinite recursion since Net::Twitter’s new will be called, again.

So, after creating an instance of the anonymous class without My::Net::Twitter::DerivedClass in superclasses, we create another anonymous class and assign it to the variable $final_meta. Then we rebless our existing instance into the new anonymous class and return it. Reblessed into the new class, it can find and use the derived class methods.

$create_anon_class has some interesting features.

my $create_anon_class = sub {
    my ($superclasses, $traits, $immutable) = @_;

    my $meta;
    $meta = Net::Twitter::Core->meta->create_anon_class(
        superclasses => $superclasses,
        roles        => $traits,
        methods      => { meta => sub { $meta }, isa => $isa },
        cache        => 1,
    );
    $meta->make_immutable(inline_constructor => $immutable);

    return $meta;
};

With the cache argument set to 1, create_anon_class caches the classes it creates and returns an existing instance of the mata-class when the same arguments are passed again. So, although create_anon_class may be called an arbitrary number of times in the life of an application, a new, anonymous class is only created when different roles or superclasses are specified.

A final point of interest is the method isa added to to the anonymous classes.

my $isa = sub {
    my $self = shift;
    my $isa  = shift;

    return $isa eq __PACKAGE__ || $self->SUPER::isa($isa)
};

Net::Twitter is an object factory. It does not create instances of itself. It creates instances of anonymous classes based on Net::Twitter::Core with roles applied. Classes derived from Net::Twitter may be surprised (and die!) if ->isa('Net::Twitter') fails on their instances. So, objects created by Net::Twitter claim to be Net::Twitter instances when queried with isa.

Net::Twitter contains a bit more code than I’ve shown here. The additional code, stolen from MooseX::Traits, simply expands role names, passed in the traits argument into the Net::Twitter::Role namespace.

This may not be the best possible implementation, but it seems to work well for all the conditions I have encountered tested. Your feed back is welcome. You can send me email or find me online. I’m semifor on IRC and can usually be found in the #moose and #net-twitter channels on irc.perl.org. I’m also semifor on Twitter.

[/perl] [link]

Fri, 12 Jun 2009

Net::Twitter Roadmap

Recently, I became the maintainer of Net::Twitter.

Net::Twitter 3.01 is a complete rewrite using Moose. It uses Moose Roles heavily to provide optional features, API support, and a choice of error handling strategies. There is also a legacy role, Net::Twitter::Role::Legacy, to provide transparent backwards compatibility with version 2.12. The legacy role is applied by default, so aside from new dependencies, a CPAN upgrade should be completely transparent to existing users.

Version 3.01 comes with some new features:

  • full coverage of the Twitter REST API, including the new saved_searches methods
  • OAuth support
  • optional, exception based error handling

And 3.02 will include an enhancement to the OAuth support for desktop applications. (Twitter made a recent change for OAuth desktop apps requiring the use of a PIN# to obtain access tokens.) A developer release was uploaded to CPAN today including that change. If no bugs are reported, 3.02 will roll out shortly.

A new feature under consideration is inflation of Twitter API responses to Perl objects, rather than simple HASH refs. So, $status->{user}->{screen_name} becomes $status->user->screen_name . And status->created_at may return a DateTime object. Some feedback from the Net::Twitter users on this would be appreciated.

My primary goal for the rewrite was ease of maintenance so that I could respond quickly to the ever changing Twitter API specification. I accomplished much of that goal with Net::Twitter::API. It extends Moose with some sugar so that Twitter API methods are described with a declarative syntax.

The API methods themselves, declared in Net::Twitter::Role::API::REST and the other API::* modules, are dynamically generated. POD for the API methods is also dynamically generated by introspecting the Moose meta data. Even some of the tests are dynamically generated in the same way.

So, when Twitter adds, removes, or changes API methods, there should be little work to do as a maintainer to get a new release prepared, tested, and distributed.

There are many Net::Twitter 2.12 users who will be dismayed by the Moose requirement. Not every one can install Moose and its dependencies, even if they want to. Shared hosting systems, for instance, may not allow installation of arbitrary modules. There are also Net::Twitter users who have created plugins and distribute the module with their applications. They want a module with limited dependencies.

Net::Twitter::Lite was created to serve those users. It does not have all the features and flexibility Net::Twitter does, but it is lean and has no more requirements than Net::Twitter version 2.12. In addition, its API methods and documentation are generated from the Net::Twitter 3.x source. So, an API change can be quickly and reliably propagated to Net::Twitter::Lite.

For those who need Net::Twitter::Lite, the move from Net::Twitter 2.12 is fairly painless, and instructions are included with the module.

Net::Twitter::Lite will get optional OAuth support, soon.

Transparent backwards compatibility is the default in Net::Twitter 3.01. That may change in a future version, but only after a reasonable deprecation plan is communicated to users.

Many thanks to Chris Thompson who authored the original version of Net::Twitter and maintained it through version 2.12.

[/perl] [link]

Tue, 09 Jun 2009

Fighting spam with spam

I've always found it mildly irritating when I get a response to an email message that says:

I'm protecting myself from spam. Please click the link below to complete the verification process. You have to do this only once.

When I receive one of these messages in response to email I did NOT send, it's not just irritating, it's infuriating.

It is a common technique of spammers to not only send spam to their harvested addresses, but also to use them in forged From headers. With this type of spam verification, you get to be a victim twice.

Spam filters have gotten quite good. I rarely received spam in my inbox. Any spam filtering technique that generates unwanted mail itself should be shunned. They are just adding to the problem.

The message I received came from Spam Arrest. It not only came with the verification, but an advertisement for the service. Isn't that the very definition of spam?

I've added an email filter. All mail from spamarrest.com is automatically reported as spam. If you happen to use Spam Arrest—sorry—your mail is undeliverable here.

[/internet] [link]

Sat, 06 Jun 2009

Testing with LWP::UserAgent

LWP::UserAgent has a feature that makes testing applications that use it very easy.

I wrote and maintain the Net::Twitter and Net::Twitter::Lite distributions.1 I needed a way to test API calls without actually hitting the Twitter API servers.

The job required of the Net::Twitter modules is to turn simple Perl method calls into HTTP requests and turn the HTTP responses back into useful Perl data. It is, therefore sufficient to inspect an HTTP::Request object at the point it would normally be transmitted to Twitter without actually sending it. And, to return a suitable HTTP::Response object to test the applications behavior.

It turns out, LWP::UserAgent has a callback mechanism that is perfect for this task. The add_handler method takes a phase name, a code reference, and optionally a matchspec to create callbacks at any of several phases. The request_send phase occurs at exactly the right phase for testing: when the HTTP::Request instance is fully configured and ready to send on the wire. If the callback returns an HTTP::Response object, no network call is made. The HTTP::Response provided by the callback is returned to the caller.

Here's an example test using this technique:


    1	use Test::More tests => 2;
    2	use Net::Twitter::Lite;
    3	
    4	my $nt = Net::Twitter::Lite->new;
    5	
    6	my $request;
    7	my %args;
    8	my $response = HTTP::Response->new(200, 'OK');
    9	$response->content('{"test":"success"}');
   10	
   11	$nt->{ua}->add_handler(request_send => sub {
   12	    $request = shift;
   13	
   14	    $response->request($request);
   15	    %args = $request->uri->query_form;
   16	
   17	    return $response;
   18	});
   19	
   20	# additional args in a HASH ref
   21	my $search_term = "intelligent life";
   22	my $r = $nt->search($search_term, { page => 2 });
   23	is $args{q},    $search_term, "q as positional arg";
   24	is $args{page}, 2,            "page parameter set";

In lines 6-7 I declared some lexical variables that will be available in the callback using a closure.

Line 11 sets up the callback on Net::Twitter::Lite's LWP::UserAgent instance.

The callback receives three parameters: the HTTP::Request instance, the LWP::UserAgent instance, and a reference to the callback handler itself. Only the HTTP::Request instance is of interest here. It is assigned to $request on line 12.

An HTTP::Response includes a reference to its initiating HTTP::Request. Line 14 takes care of that.

Since this particular test is dealing with query parameters, they are extracted from the request's URI on line 15 and stored in the HASH declared on line 7.

On line 17, the HTTP::Response is returned to the caller. This prevents LWP::UserAgent from actually making a network call.

Line 22 makes a Net::Twitter::Lite call that should result in an HTTP GET with url http://search.twitter.com/search.json?page=2&q=intelligent+life. Net::Twitter::Lite should inflate the HTTP::Response contents into an appropriate Perl representation of the JSON return.

With the request and response available, a variety of tests can be run to unsure Net::Twitter::Lite is behaving as expected.

You may find this technique useful for your own tests.

[1]Net::Twitter versions 2.12 and earlier were written and maintained by Chris Thompson. Version 3 is a complete rewrite using Moose. Net::Twitter::Lite was created for those who cannot or prefer not to install Moose and its dependencies.

[/perl] [link]

About this weblog

This site is the personal weblog of Marc Mims. You can contact Marc by sending e-mail to:
[email protected].

Marc writes here about cycling, programming, Linux, and other items of personal interest.

This site is syndicated with RSS.

Archives

Credits

CSS stolen from Tom Coates who didn't even complain.