...making Linux just a little more fun!


The Answer Gang

By Jim Dennis, Jason Creighton, Chris G, Karl-Heinz, and... (meet the Gang) ... the Editors of Linux Gazette... and You!



(?) Python vs. Perl

From Benjamin A. Okopnik

Answered By: Mike Orr, Jim Dennis, Rick Moen, Jimmy O'Regan

Have I mentioned, yet, that I DON'T KNOW PYTHON? Just in case I haven't, I DON'T KNOW PYTHON. Since I DON'T KNOW PYTHON, my tweaking is guided by something other than knowledge; it can be approximately imagined as a cross between voodoo, poking a long stick down a deep hole which has ominous sounds issuing from it, and using a large axe to flail wildly in random directions. The only definite part of the process is that it takes a long time.

(!) [Sluggo] You mean you don't find it immediately more readable than Perl? Nice English words instead of obscure symbols, less packing of several operations in one statement?

(?) [snort]

files = ["lg/" + x for x in files if x]

This is supposed to somehow be less packed than a Perl statement?

(!) [JimD] No it's not. However, it's no more inscrutable than a Perl statement either. Personally I unroll such "list comprehensions" when I know that non-Pythonistas are likely to read it:
    results = list() # same as results = []
    for x in files:
        if x:
            results.append("lg/" + x)
    files = results
But the list comprehension syntax is still much cleaner than the horrid:
    files = map(lambda x: "lg/" + x, filter(None, files))
which is the only one-liner I can think of to implement this prior to their introduction.
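For readers on a current Python, the three spellings discussed here can be compared side by side. This is only a minimal sketch, assuming Python 3, where map() and filter() return iterators and so need a list() wrapper; the function names and the sample data are made up for illustration.

```python
def prefix_loop(files):
    """Unrolled form: build a new list, skipping empty entries."""
    results = []
    for x in files:
        if x:
            results.append("lg/" + x)
    return results


def prefix_comprehension(files):
    """List comprehension form, as in the statement under discussion."""
    return ["lg/" + x for x in files if x]


def prefix_map_filter(files):
    """map/filter form; filter(None, ...) drops false values."""
    return list(map(lambda x: "lg/" + x, filter(None, files)))


sample = ["index.html", "", "issue118/tag.html"]  # hypothetical file names
print(prefix_loop(sample))
print(prefix_comprehension(sample))
print(prefix_map_filter(sample))
```

All three should print the same list; which one reads best is exactly the matter of taste being argued in this thread.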
The closest I can come up with in Perl at the moment is:
    @files = map { "lg/" . $_  if $_}  grep(/./,@files);
... which took me an unfortunately long time (much to Heather's displeasure as I'm supposed to be taking her to a social event right now). I don't use perl often enough; so, the silly @this and $that get rusty and I didn't remember Perl grep().
Is there a good Perl to Python "Rosetta stone" out there that would have helped me find Perl grep() vs. Python filter() correspondence?

(?) Yep - Tom Christiansen's "Python to Perl Conversions" list, which I ran across quite a while ago. Unfortunately, the original link at mail.python.org seems to be dead - but Google still has it cached.

https://tinyurl.com/abrt5

(!) [Rick] For what it's worth, I've saved Tom Christiansen's comparison chart for posterity:
"Python to Perl Conversions" on https://linuxmafia.com/kb/Devtools
(!) [Sluggo] At least it doesn't have a regular expression embedded in it, or worse, a grep statement used to calculate something rather than to extract the matching lines.

(?) So... "if x" is somehow better than "if /x/"? The distinction eludes me. As to "grep", I think you mean "map" - which I understand Python also supports.

(!) [Sluggo] No, I mean 'grep'. Jim posted a perfect example.

(?) Erm... your judgement of perfection falls a bit short of, well, perfection.  :) In the example that Jim gave, that 'grep' is completely superfluous. However, if you wanted to use it for something like this, you could certainly twist it to the purpose:

@files = grep s{.+}{lg/$&}, @files;

However, just because you can doesn't mean that you should. From "perldoc -f grep":

...............

Note that $_ is an alias to the list value, so it can be used to modify the elements of the LIST. While this is useful and supported, it can cause bizarre results if the elements of LIST are not variables. Similarly, grep returns aliases into the original list, much as a for loop's index variable aliases the list elements. That is, modifying an element of a list returned by grep (for example, in a "foreach", "map" or another "grep") actually modifies the element in the original list. This is usually something to be avoided when writing clear code.

...............

Using 'map' for its intended purpose seems to me to be much clearer, as well as avoiding that variable repetition:

map $_="lg/$_" if /./, @files;
(!) [JimD] Not to defend my example but I'll point out that a) it worked for my test case (the side effect was irrelevant here but I can see where it would BITE someone somewhere else!) and b) it looked reasonable to the one perl hacker I happened to show it to over a beer later that afternoon (at the social gathering which constrained my time on this little matter before).

(?) Oh, I wasn't saying that it wouldn't work - just that holding it up as an example of perfection and orthodox usage wasn't appropriate. It would work just fine, though.

(!) [JimD] To me it looks unnatural to use an assignment as the function in the map; but had I known it was legal then I might have tried this.

(?) That's one of those things that makes 'map' so powerful and so complex: it can be used to modify the array being processed, or it can be used to return the result - which means that you can use it "on the fly" to process your data so that it can be seamlessly plugged into the rest of the procedure.

# Voting: multiple offices, multiple candidates

use CGI qw(:standard);   # supplies header, start_html, table, Tr, td, etc.

my %offices = (
                Janitor =>      [ "Arnold", "Archie", "Anna", ],
                Peon    =>      [ "Barney", "Betty", "Basil", ],
                Manager =>      [ "Carl", "Cyril", "Cindy",   ],
                CEO     =>      [ "Dave", "Dana", "Darlene",  ]
);

print header, start_html, start_form,
    map table(
        Tr(
            td( "Office: $_" ),
            td( radio_group( -name => $_, -values => $offices{ $_ } ) )
        )
    ), sort keys %offices;

print br, submit( "Done" ), end_form;

Since 'map' and 'grep' both are implicit loops that allow access to the underlying variable, it's easy to get in trouble when using them... but that's always the case with power tools: if they can cut through an inch of steel, they can certainly take off a finger. Safety guards (a.k.a. "documentation") are there for a reason. :)

(!) [JimD] Oddly enough there was a case today where I had to think about Python's syntax for a minute before I could do what I wanted.
I was reading the new "Best Practices in Perl" O'Reilly book and came across the "Schwartzian Transform" example. This is known as the "DSU pattern" to Pythonistas (decorate, sort, undecorate).

(?) Nicely descriptive, that. You might also want to take a look at the GRT (Guttman-Rosler Transform), which is related but different (a lot faster, for one):

#!/usr/bin/perl -w

my @words = <>;	# Slurp the file contents

my @sorted = map { substr($_, 4) }
             sort
             map { pack( "LA*", tr/eE/eE/, $_ ) } @words;

print "@sorted";

"A Fresh Look at Efficient Perl Sorting", Guttman and Rosler
https://www.sysarch.com/perl/sort_paper.html
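The idea behind the snippet above (glue a fixed-width sort key onto the front of each string, sort the plain strings with the default comparison, then chop the key off again) can be sketched in Python too. This is a loose analogue rather than a port: it assumes a zero-padded decimal prefix in place of Perl's pack(), and the names count_e and packed_sort, plus the default width of 4, are made up for the example.

```python
def count_e(word):
    """Count upper- and lower-case 'e' characters, like tr/eE/eE/."""
    return word.count("e") + word.count("E")


def packed_sort(words, width=4):
    """Sort words by their 'e' count, then alphabetically.

    Each word gets a fixed-width, zero-padded count glued on the front,
    so an ordinary string sort orders by count first. Assumes every
    count fits in `width` digits.
    """
    packed = [str(count_e(w)).zfill(width) + w for w in words]
    packed.sort()                       # plain string comparison, no key function
    return [p[width:] for p in packed]  # strip the prefix again


words = ["tree", "bee", "sky", "Eerie", "elephant"]
print(packed_sort(words))
```

The fixed width is what makes this work: without the padding, a count of 10 would sort before a count of 2.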

(!) [JimD] Their putative example was one of returning a list sorted by SHA-1 hashes of the element values.
I decided to write two versions of the example in Python: one that modifies the list in place and the other which returns a copy.
Returning the copy is easy and comes naturally:
from sha import sha
def sha_sorted(seq):
    seq = [ (sha(x).hexdigest(), x) for x in seq ]
    seq.sort()
    seq = [ y for (x,y) in seq ]
    return seq
(Actually I could sort by decimal or hex representations of the hash)
As you can see the first line decorates the list (by creating a list of tuples such that each tuple contains the hash, then the original item). The call to this list's .sort() method is all we need to do the right thing; since the Python .sort() methods are automagically multi-key when dealing with sequence objects like lists and tuples. Then we "undecorate."
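The sha module used above was later removed from the standard library in favour of hashlib, so here is a sketch of the same decorate-sort-undecorate steps for Python 3. It assumes the items are text strings that can be encoded as UTF-8 before hashing; the second function, sha_sorted_key, is an addition for comparison and relies on the key= argument to sorted(), which does the decorating behind the scenes.

```python
import hashlib


def sha1_hex(text):
    """Hex SHA-1 digest of a text string (encoded as UTF-8 first)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def sha_sorted(seq):
    """Return a copy of seq sorted by SHA-1 digest, spelled out as DSU."""
    decorated = [(sha1_hex(x), x) for x in seq]        # decorate
    decorated.sort()                                   # sort on (digest, item)
    return [item for (_digest, item) in decorated]     # undecorate


def sha_sorted_key(seq):
    """Same ordering, letting sorted() handle the decoration via key=."""
    return sorted(seq, key=sha1_hex)


words = "this is a test".split()
print(sha_sorted(words))
print(sha_sorted_key(words) == sha_sorted(words))
```

Neither function touches the list it was given; both hand back a new one.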
That's easy. It's a little trickier to make it modify the object which was passed to it. In Python the seq = [ ... ] is binding a newly created list to the name seq. The object just happens to have been generated from the old object which was, co-incidentally, bound to that name prior to the new binding. We were passed an object reference, so how do we modify the object of that reference rather than creating a new object reference?
Here's the answer:
def sort_by_sha(seq):
    seq[0:] = [ (sha(x).hexdigest(), x) for x in seq ]
    seq.sort()
    seq[0:] = [ y for (x,y) in seq ]
    return None
... we assign a list of results to a slice of the list to which the name referred. It just so happens that we're replacing a slice that includes all of the items; and it doesn't matter if the new slice has more or fewer elements than the original list.
Of course that only works with mutable objects (like lists) so we can't write a function that modifies a normal Python string (which is immutable). We can write a function that returns a modified copy of a string, but we can't modify the string as a side effect of calling a function. In general we have to work to make any function modify something as a side-effect. (Of course our methods can modify their objects all they like ... but immutable members would simply be rebound to new objects with the desired transformations).
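The difference between rebinding a name and assigning to a slice is easy to see with a pair of toy functions. This is just a sketch: rebind_upper and mutate_upper are hypothetical names, and upper-casing stands in for whatever transformation is wanted.

```python
def rebind_upper(seq):
    """Binds the local name to a new list; the caller's list is untouched."""
    seq = [x.upper() for x in seq]
    return None


def mutate_upper(seq):
    """Assigns to a whole-list slice, so the caller's list object changes."""
    seq[:] = [x.upper() for x in seq]
    return None


words = "this is a test".split()
alias = words              # second name for the same list object

rebind_upper(words)
print(words)               # still lower case

mutate_upper(words)
print(words)               # now upper case
print(alias is words)      # same object as before, modified in place
```

Passing a string instead of a list to mutate_upper() raises a TypeError, since strings do not support item or slice assignment.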
In general the Python community would use the first form, so the intended effect (changing "seq") was visible from the calling code:
    seq = sha_sorted(seq)
rather than having this side effect:
    seq = 'this is a test'.split()
    sort_by_sha(seq)
    # ...
(!) [JimD] Overall I just prefer to have less punctuation and more text; easier for me to read and type.
(!) [Sluggo] This also demonstrates another peeve: the magic variable $_.

(?) [shrug] Peeve for you, immensely valuable for huge numbers of programmers worldwide.

(!) [Sluggo] 'map' can hardly be counted against Perl since Python also has it.

(?) Yes, but Perl had it long before Python even existed.

(!) [Sluggo] List interpolation like the above was a pretty unusual step for Python, but it turned out to be immensely popular.

(?) Yeah, anything you steal from Perl is likely to be. :)

(!) [Sluggo] 2.4 added generator interpolation (trade the [] for () and you get an iterator, which uses less memory if you only need the sequence once). We almost got dict interpolation until it was pointed out that you can make dicts like this:
    pairs = [(key1, value1), (key2, value2)]
    d = dict(pair for pair in pairs)
I just wish they'd borrowed Perl's ternary operator (?:). That was shot down coz there were some six different syntaxes with equal support proceeding.
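A short sketch of the constructs mentioned here, with made-up sample data. The last line uses the dict-building syntax that later Python releases did eventually add; everything before it works as described in this thread.

```python
pairs = [("eenie", 1), ("meenie", 10), ("mynie", 1), ("moe", 1)]

# Square brackets build the whole list in memory...
squares_list = [n * n for n in range(5)]

# ...while parentheses give a one-shot iterator that yields items on demand.
squares_gen = (n * n for n in range(5))
total = sum(squares_gen)          # consumes the generator
leftover = list(squares_gen)      # nothing left the second time around

# dict() accepts any iterable of (key, value) pairs.
d = dict(pair for pair in pairs)

# Dict-building syntax from later Python versions.
d2 = {key: value for key, value in pairs}

print(squares_list, total, leftover)
print(d == d2)
```

The empty leftover list is the trade-off for the memory savings: a generator can be walked only once.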

(?) [nod] I'm surprised that Guido didn't repeat Larry's method - i.e., just put his foot down and say "it shall be so." The tendency for people to wank endlessly with trivia is well known in those circles, and even an arbitrary decision at those times is better than none.

(!) [Sluggo] Actually, he did. Guido is not in favor of a ternary operator; he prefers an if-stanza. But it's been requested so persistently over the years -- moreso than any other construct -- so he wrote up a proposal and gave the community one chance to agree on a syntax. https://python.org/peps/pep-0308.html

...............

Following the discussion, a vote was held. While there was an overall interest in having some form of if-then-else expressions, no one format was able to draw majority support. Accordingly, the PEP was rejected due to the lack of an overwhelming majority for change. Also, a Python design principle has been to prefer the status quo whenever there are doubts about which path to take.

...............

You can already do:
result = condition and true_value or false_value

(?) Sure - you should be able to do that in pretty much anything that supports logical operators.

# Perl
print 3 > 2 && "true" || "false"

# Bash
[ 3 -gt 2 ] && echo "true" || echo "false"

etc.

(!) [Sluggo] but it produces the wrong result if true_value is empty (meaning zero, empty string, list/dict without elements, None, False). So you have to make sure true_value can never be empty. The ternary operator would eliminate the need for this paranoia.
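The pitfall is easy to reproduce. The sketch below wraps the and/or idiom in a function (pick_and_or, a name invented here) and compares it with the conditional expression that Python releases after this thread (2.5 onward) provide.

```python
def pick_and_or(condition, true_value, false_value):
    """The and/or idiom: breaks when true_value is itself false."""
    return condition and true_value or false_value


def pick_conditional(condition, true_value, false_value):
    """Conditional expression: only the condition is tested for truth."""
    return true_value if condition else false_value


print(repr(pick_and_or(True, "yes", "no")))       # fine
print(repr(pick_and_or(True, "", "no")))          # falls through to "no"
print(repr(pick_conditional(True, "", "no")))     # keeps the empty string
```

The second call is the one that needs the paranoia: the condition is true, yet the empty string loses to the fallback.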

(?) [Nod]

(!) [Jimmy] So... on the topic of 'map', how would I use a regex with the map in this:
sub am_asz
{
    my $verb = shift;

    my @end = qw (am asz a amy acie aj$a);
    return qw (dam dasz da damy dacie dadz$a) if ($verb eq "da$c");
    return qw (mam masz ma mamy macie maj$a) if ($verb eq "mie$c");
    return "error" if (substr ($verb, -2) ne "a$c");
    return map {substr ($verb, 0, -2).$_} @end;
};

(?)


>    return qw (dam dasz da damy dacie dadz$a) if ($verb eq "da$c");
>                                          ^^

That's not going to work - "qw" disables variable interpretation.

(!) [Jimmy] Ah. Well, to answer your later question, those are the extra non-ascii characters in Polish. I just changed them here because I didn't want to mess around with character sets in my mailer, but in the actual version I use the real characters.
(I've chosen to return an array of possibilities in the event that the caller hasn't given enough information. I'm trying to find a way to sort by likelihood, but that may be a pipe dream).

(?) What, weighted sorting? Pshaw. Pshaw, I say (perhaps because nobody alive today pays even the slightest attention to a silly word like "pshaw". I suspect he was bshaw's younger brother, the one that was always picked last for the soccer team and such... But I Digress.)

#!/usr/bin/perl -w
# Created by Ben Okopnik on Thu Aug 18 01:11:46 EDT 2005

# I actually had to go look this up in a *book*. Jimmy, I'll never
# forgive you. :)
#
# Modified from algorithm shown in "Mastering Algorithms with Perl"
# (Jon Orwant, Jarkko Hietaniemi, and John Macdonald)

sub weighted {
    my( $dist, $key_order, $total_weight ) = @_;
    my $running_weight;
    $key_order = [ sort { $dist->{$a} <=> $dist->{$b} } keys %$dist ]
        unless $key_order;
    unless ( $total_weight ) {
        for (@$key_order) { $total_weight += $dist->{$_} }
    }
    # Get a random value
    my $rand = rand( $total_weight );
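    # Use it to determine a key. Walking @$key_order (instead of 'each')
    # honors the supplied order and leaves no half-spent hash iterator
    # behind when we return early.
    for my $key ( @$key_order ) {
        return $key if ($running_weight += $dist->{$key}) >= $rand;
    }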
}

%choices = ( eenie => 1, meenie => 10, mynie => 1, moe => 1 );

print weighted( \%choices ), "\n" for 1 .. 25;
(!) [Jimmy] ...but if you had to look it up, maybe I was underestimating the 'how' part :)

(?) Heh. That's more of an indicator of what I usually do or don't do with Perl; I'd seen this thing before, and had actually reconstructed ~80% of it from my memory of the underlying algorithm - but since the other 20% wasn't instantly forthcoming, I said "HUMPH!" very loudly and went to my bookshelf.  :) "MAwP" is a book of Extreme Coolness, anyway - it was a pleasure to revisit.

(?) Note the flexibility of the above: the key order and the total weight can be supplied in the invocation - or they'll be computed using the standard assumptions.

Loading the values for each verb from a *DBM hash or a MySQL DB is left to the individual student. :)
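For comparison, the same running-total approach might look like this in Python. It is a sketch, not a translation of the book's code: the rand_value parameter is an addition so the function can be exercised deterministically, and the final return is a guard that assumes a non-empty dict.

```python
import random


def weighted(dist, key_order=None, total_weight=None, rand_value=None):
    """Pick a key from dist with probability proportional to its weight."""
    if key_order is None:
        key_order = sorted(dist, key=dist.get)     # lightest keys first
    if total_weight is None:
        total_weight = sum(dist[key] for key in key_order)
    if rand_value is None:
        rand_value = random.uniform(0, total_weight)

    running_weight = 0
    for key in key_order:
        running_weight += dist[key]
        if running_weight >= rand_value:
            return key
    return key_order[-1]   # guard against floating-point rounding


choices = {"eenie": 1, "meenie": 10, "mynie": 1, "moe": 1}

for _ in range(5):
    print(weighted(choices))
```

On Python 3.6 and later, random.choices(list(choices), weights=list(choices.values())) does the same job with the standard library alone.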

(!) [Jimmy] My son found your choice of choices highly amusing :)

(?) Oh, if I'd only known! I was going to make it a bunch of Smarties colors along with the freq of occurrence, but didn't have a bag available on the boat. :)

(?) I'm not really clear on what you're asking here, Jimmy - what's "$c"? What is "$a"? What are you trying to achieve by using a regex?

(!) [Jimmy] I was wondering if there was a way of doing something like this:
if ($verb =~ /(.*)ac$/)
{
    return map {$1.$_} @end;
};
where the suffix stripping was done in the map statement, like I've done with substr.

(?) Seems OK to me:

ben@Fenrir:~$ perl -wlne'sub a { map $1.$_, qw/x y z/ if /(.*)ac$/ }; print "\t$_" for a'
maniac
        manix
        maniy
        maniz
sumac
        sumx
        sumy
        sumz
tarmac
        tarmx
        tarmy
        tarmz
lacy
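A rough Python counterpart of that one-liner, using the re module: capture everything before a trailing "ac" and glue each ending onto it. The function name with_endings is made up, and x, y, z are the same stand-in endings as in the Perl test.

```python
import re


def with_endings(word, endings=("x", "y", "z")):
    """Replace a trailing 'ac' with each ending; empty list if no match."""
    match = re.match(r"(.*)ac$", word)
    if match is None:
        return []
    stem = match.group(1)
    return [stem + ending for ending in endings]


for word in ["maniac", "sumac", "tarmac", "lacy"]:
    print(word)
    for form in with_endings(word):
        print("\t" + form)
```

As in the Perl run, "lacy" produces no forms because the pattern is anchored to the end of the word.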

(!) [Jimmy] Yep, that's what I was thinking about. It was a forest and trees problem.

(?) Part of what I'm seeing, though, would be better done with a hash:

(!) [Jimmy] I'm using a hash in the (as yet incomplete) public function to check for the really odd exceptions.

(?)

sub am_asz {
    my $verb = shift;
    my %list = ( da  => [ ("dam", "dasz", "da", "damy", "dacie", "dadz$a") ],
                 mie => [ ("mam", "masz", "ma", "mamy", "macie", "maj$a")  ],
             # Add whatever other correspondences you want
    );

    # I have no idea what these actually are, so I just made shit up.
    my ($root, $base, $suffix) = $verb =~ /^((..).*)(..)$/;
    # Return the irregular forms only when the base is actually in the table
    return @{ $list{ $base } } if $list{ $base };
    # @end is the list of endings from Jimmy's version above
    $suffix eq "a$c" ? map "$root$_", @end : "error";
}

I can only hope that I've hit somewhere near the target by throwing a dart while blindfolded...  :) Note that your algorithm presumes a word with a minimum of four characters, so my code reflects that restriction as well.
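The table-of-exceptions idea can be sketched in Python as a dict lookup followed by the regular rule. Several things here are assumptions: "$a" and "$c" are taken to stand for the Polish letters a-ogonek and c-acute (written as escapes to sidestep character-set trouble), the lookup is keyed on the whole infinitive rather than a two-letter base, and the sample verb in the usage lines is chosen only for illustration.

```python
import re

A_OGONEK = "\u0105"   # assumed meaning of "$a" in the thread
C_ACUTE = "\u0107"    # assumed meaning of "$c" in the thread

ENDINGS = ["am", "asz", "a", "amy", "acie", "aj" + A_OGONEK]

# Irregular verbs, keyed on the full infinitive.
IRREGULAR = {
    "da" + C_ACUTE: ["dam", "dasz", "da", "damy", "dacie", "dadz" + A_OGONEK],
    "mie" + C_ACUTE: ["mam", "masz", "ma", "mamy", "macie", "maj" + A_OGONEK],
}


def am_asz(verb):
    """Return the six present-tense forms, or ["error"] if the rule fails."""
    if verb in IRREGULAR:
        return list(IRREGULAR[verb])          # copy, so callers can't edit the table
    match = re.match(r"(.*)a" + C_ACUTE + r"$", verb)
    if match is None:
        return ["error"]
    return [match.group(1) + ending for ending in ENDINGS]


print(am_asz("czyta" + C_ACUTE))
print(am_asz("da" + C_ACUTE))
```

Because the regular rule uses a regex with an open-ended stem rather than fixed character counts, it puts no minimum length on the verb.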

(!) [Jimmy] It does?
/me checks
Nope. Works with a couple of random, made-up three letter verbs.

This page edited and maintained by the Editors of Linux Gazette
HTML script maintained by Heather Stern of Starshine Technical Services, https://www.starshine.org/


Each TAG thread Copyright © its authors, 2005

Published in issue 118 of Linux Gazette September 2005
