

...making Linux just a little more fun!
Jimmy O'Regan [joregan at gmail.com]
I have a couple of scripts that almost work, and I was wondering if
anyone (Ben?) could tell me why...
First, I want to convert a list of tags in the IPA PAN's corpus format (subst:pl:dat:f) to Apertium's tag format (n.f.pl.dat). I have this:
#!/usr/bin/perl
use warnings;
use strict;
# tags to replace
my %terms = qw(n nt pri p1 sec p2 ter p3 subst n);
while (<>)
{
my @in = split/:/;
my @out = map { ($terms{$_} ne "") ? $terms{$_} : $_ } @in;
if ($#out > 3) {
my $type = $out[3];
$out[3] = $out[2];
$out[2] = $out[1];
$out[1] = $type;
}
print join '.', @out;
}
That's broken, because it only works for tag sets which have more than
4 entries, but changing the if to "($#out >= 3)" gives me this:
".sg.nomxxs.m3" from "xxs:sg:nom:m3". I also get a lot of warnings:
Use of uninitialized value in string ne at foo.pl line 11, <> line 1085.
Use of uninitialized value in string ne at foo.pl line 11, <> line 1086.
Next, I have a list of names extracted from a Polish morphology dictionary[1] that I'm trying to convert to a list of word stems and endings. I have this, which works (aside from a couple of errors):
#!/usr/bin/perl
use warnings;
use strict;
use String::Diff qw/diff_fully/;
use Data::Dumper;
#test();
while(<>)
{
s/,\W+$//;
my $endings = $_;
my @a = split/, /;
my $stem = find_stem(@a);
$endings =~ s/$stem//g;
print $stem;
if ($endings =~ /?/) {print ":n.f:";}
elsif ($endings =~ /owie/) {print ":n.m1:";}
else {print ":n.??:";}
print $endings . "\n";
}
sub test()
{
my $test = "Adam, Adama, Adaemie, Adamowi, Adamem, Adamach, Adamami, Adamom";
my @t = split/, /, $test;
print find_stem(@t);
print "\n";
}
sub find_stem()
{
my @in = @_;
my ($r, $l, $cur, $last);
my $i=0;
while ($i<($#in))
{
($r, $l) = diff_fully($in[$i], $in[$i+1]);
$cur = $r->[0]->[1];
$last = $cur if (!$last);
if ($cur ne $last) {
($r, $l) = diff_fully($last, $cur);
$last = $r->[0]->[1];
}
$i++;
}
return $last;
}
but if I change the end of the while() to this:
else {print ":n.??:";}
my @ends = split/, /, $endings;
sort(@ends);
$endings = join(',', @ends);
print $endings . "\n";
to sort the endings, it... doesn't. What am I missing?
[1] "Słownik alternatywny", under the GPL: http://www.kurnik.pl/slownik/odmiany/
Ben Okopnik [ben at linuxgazette.net]
On Tue, Oct 09, 2007 at 07:20:41PM +0100, Jimmy O'Regan wrote:
> I have a couple of scripts that almost work, and I was wondering if
> anyone (Ben?) could tell me why...
I'll give it a shot. The only problem is, your script is doing more than what you describe here - so I'm going to have to guess about a few things. Worse yet, since your code isn't doing what it's supposed to do, I'm guessing based on wrong data. But hey, for a friend...
> First, I want to convert a list of tags in the IPA PAN's corpus format
> (subst:pl:dat:f) to Apertium's tag format (n.f.pl.dat). I have this:
Something like this, maybe? Again, I'm just guessing.
perl _F: -wlne'shift @F; print "n.@F[2,0,1]"' input_file
> ``
> #!/usr/bin/perl
>
> use warnings;
> use strict;
>
> # tags to replace
> my %terms = qw(n nt pri p1 sec p2 ter p3 subst n);
Just a personal reaction here - BLECH. I hate having to count terms to figure out what's a key and what's a value.
my %terms = ( n => 'nt', pri => 'p1', sec => 'p2', ter => 'p3', subst => 'n' );
> while (<>)
> {
chomp; # If you don't handle that "\n", you'll be sorry...
> my @in = split/:/;
> my @out = map { ($terms{$_} ne "") ? $terms{$_} : $_ } @in;
What happens when $terms{$_} is undefined? Bad news, that's what. I suspect that this is where your errors are coming from - perhaps with help from the absence of that 'chomp'.
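To make the point about undefined lookups concrete, here is a minimal sketch, assuming the same %terms table as above. The helper name translate_tags is made up for this example and is not part of the original script. Comparing a missing hash entry with 'ne' is what produces the "Use of uninitialized value in string ne" warning; testing with 'exists' sidesteps it (on Perl 5.10 and later, '$terms{$_} // $_' would be another option).

```perl
#!/usr/bin/perl
use warnings;
use strict;

my %terms = ( n => 'nt', pri => 'p1', sec => 'p2', ter => 'p3', subst => 'n' );

# translate_tags is a hypothetical helper, not part of the original script.
sub translate_tags {
    my ($line) = @_;
    chomp $line;    # drop the trailing newline before splitting
    # exists() tests for the key, so no undefined value is ever compared
    return map { exists $terms{$_} ? $terms{$_} : $_ } split /:/, $line;
}

print join('.', translate_tags("subst:pl:dat:f\n")), "\n";    # prints n.pl.dat.f
```

This only does the lookup; the fields still come out in their original order.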
> if ($#out > 3) {
> my $type = $out[3];
> $out[3] = $out[2];
> $out[2] = $out[1];
> $out[1] = $type;
What happens to your $out[0]? Is it just supposed to be ignored? In any case, you could just use a list slice instead of all the manual swapping:
@out[1 .. 3] = @out[3, 1, 2];
However, I strongly suspect that the 'map' statement is the source of your problems.
> }
> print join '.', @out;
> }
> ''
>
> That's broken, because it only works for tag sets which have more than
> 4 entries, but changing the if to "($#out >= 3)" gives me this:
> ".sg.nomxxs.m3" from "xxs:sg:nom:m3". I also get a lot of warnings:
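The slice assignment can be tried in isolation. This is only a sketch: the helper name reorder is invented for the illustration, and the '@out >= 4' guard is an assumption about the intent (it is the same test as '$#out >= 3', since $#out is the last index rather than the element count).

```perl
#!/usr/bin/perl
use warnings;
use strict;

# reorder is a hypothetical helper name used only for this illustration.
sub reorder {
    my @out = @_;
    # The right-hand slice is built in full before anything is assigned,
    # so no temporary variable is needed. Index 0 is left where it is.
    @out[1 .. 3] = @out[3, 1, 2] if @out >= 4;
    return @out;
}

print join('.', reorder(qw(n pl dat f))), "\n";    # prints n.f.pl.dat
```

If the newline is still attached to the last field, that field moves to position 1 and the line comes out as "xxs.m3", a line break, then ".sg.nom" with no newline at the end, which is consistent with the ".sg.nomxxs.m3" output reported above.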
The best thing you could do to help me help you is by providing a bunch of example inputs and expected outputs. It sounds like it should be trivially simple to mung this stuff; this is the kind of thing that Perl is really good at.
> Next, I have a list of names extracted from a Polish morphology
> dictionary[1] that I'm trying to convert to a list of word stems and
> endings. I have this, which works (aside from a couple of errors):
>
> ``
> #!/usr/bin/perl
>
> use warnings;
> use strict;
>
> use String::Diff qw/diff_fully/;
> use Data::Dumper;
>
> #test();
> while(<>)
> {
> s/,\W+$//;
> my $endings = $_;
> my @a = split/, /;
> my $stem = find_stem(@a);
> $endings =~ s/$stem//g;
> print $stem;
> if ($endings =~ /??/) {print ":n.f:";}
> elsif ($endings =~ /owie/) {print ":n.m1:";}
> else {print ":n.??:";}
> print $endings . "\n";
> }
>
> sub test()
> {
> my $test = "Adam, Adama, Adaemie, Adamowi, Adamem, Adamach, Adamami, Adamom";
> my @t = split/, /, $test;
> print find_stem(@t);
> print "\n";
> }
>
> sub find_stem()
> {
> my @in = @_;
> my ($r, $l, $cur, $last);
> my $i=0;
>
> while ($i<($#in))
> {
> ($r, $l) = diff_fully($in[$i], $in[$i+1]);
>
> $cur = $r->[0]->[1];
> $last = $cur if (!$last);
> if ($cur ne $last) {
> ($r, $l) = diff_fully($last, $cur);
> $last = $r->[0]->[1];
> }
> $i++;
> }
> return $last;
> }
> ''
>
> but if I change the end of the while() to this:
>
> ``
> else {print ":n.??:";}
> my @ends = split/, /, $endings;
> sort(@ends);
> $endings = join(',', @ends);
> print $endings . "\n";
> ''
>
> to sort the endings, it... doesn't. What am I missing?
Like most Perl functions, "sort" doesn't modify the specified variable - it just returns a modified value. Your 'use warnings;' line should definitely have generated a warning about that.
ben@Tyr:/tmp$ perl -wle'@a = qw/3 2 1/; sort @a; print "@a"'
Useless use of sort in void context at -e line 1.
3 2 1
You can try this instead:
else {
print ":n.??:";
}
# If you're only going to use a variable once, don't bother.
print join(',', sort split/, /, $endings), "\n";
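If the sorted list is needed more than once, the other option is to assign the result of sort back to the array. A minimal sketch follows; the helper name sorted_endings and the sample endings are invented for the example.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# sorted_endings is a hypothetical helper, named only for this example.
sub sorted_endings {
    my ($endings) = @_;
    my @ends = split /, /, $endings;
    @ends = sort @ends;    # keep the list that sort returns
    return join(',', @ends);
}

print sorted_endings("owi, a, em, ie"), "\n";    # prints a,em,ie,owi
```

The default sort compares strings, which is what is wanted for word endings.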
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Tue, Oct 09, 2007 at 04:55:55PM -0400, Benjamin Okopnik wrote:
> Something like this, maybe? Again, I'm just guessing.
>
> ``
> perl _F: -wlne'shift @F; print "n.@F[2,0,1]"' input_file
> ''
Arrgh. Must be the welding fumes getting to me. That should be
perl -F: -walne'shift @F; print "n.@F[2,0,1]"' input_file
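For anyone not used to the switches, this is roughly what they do to each input line, written out longhand: -n loops over the input, -l strips the line ending and adds one on print, and -a with -F: splits each line on ':' into @F. The function name convert_line is made up for the sketch, and setting $" to '.' is an addition of this sketch so that the interpolated slice is joined with dots; without it the fields are joined with spaces.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# convert_line is a hypothetical name; it mimics the one-liner for a single line.
sub convert_line {
    my ($line) = @_;
    chomp $line;                 # -l strips the line ending on input
    my @F = split /:/, $line;    # -a with -F: autosplits on ':' into @F
    shift @F;                    # drop the leading part-of-speech tag
    local $" = '.';              # assumption of this sketch: join interpolated lists with dots
    return "n.@F[2,0,1]";
}

print convert_line("subst:pl:dat:f\n"), "\n";    # prints n.f.pl.dat
```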
Jimmy O'Regan [joregan at gmail.com]
On 09/10/2007, Ben Okopnik <ben at linuxgazette.net> wrote:
> On Tue, Oct 09, 2007 at 07:20:41PM +0100, Jimmy O'Regan wrote:
> > I have a couple of scripts that almost work, and I was wondering if
> > anyone (Ben?) could tell me why...
>
> I'll give it a shot. The only problem is, your script is doing more than
> what you describe here - so I'm going to have to guess about a few
> things. Worse yet, since your code isn't doing what it's supposed to do,
> I'm guessing based on wrong data. But hey, for a friend...
>
Oh, OK. Input:
subst:pl:acc:f
subst:pl:acc:m1
subst:pl:acc:m2
subst:pl:acc:m3
subst:pl:acc:n
subst:pl:dat:f
subst:pl:dat:m1
subst:pl:dat:m2
adj:sg:nom:n:comp
adj:sg:nom:n:pos
adj:sg:nom:n:sup
adj:sg:voc:f:pos
adj:sg:voc:m1:comp
expected output:
n.f.pl.acc
n.m1.pl.acc
n.m2.pl.acc
n.m3.pl.acc
n.nt.pl.acc
n.f.pl.dat
n.m1.pl.dat
n.m2.pl.dat
adj.nt.sg.nom.comp
adj.nt.sg.nom.pos
adj.nt.sg.nom.sup
adj.f.sg.voc.pos
adj.m1.sg.voc.comp
> > First, I want to convert a list of tags in the IPA PAN's corpus format
> > (subst:pl:dat:f) to Apertium's tag format (n.f.pl.dat). I have this:
>
> Something like this, maybe? Again, I'm just guessing.
>
> ``
> perl _F: -wlne'shift @F; print "n.@F[2,0,1]"' input_file
> ''
>
Kind of. Everything except for 1..3 passes through, except maybe with a change from that hash.
> > ``
> > #!/usr/bin/perl
> >
> > use warnings;
> > use strict;
> >
> > # tags to replace
> > my %terms = qw(n nt pri p1 sec p2 ter p3 subst n);
>
> Just a personal reaction here - BLECH. I hate having to count
> terms to figure out what's a key and what's a value.
>
Oh... yeah. I can see a few more terms that'll have to be swapped, and your way is definitely less confusing.
> ``
> my %terms = ( n => 'nt',
> pri => 'p1',
> sec => 'p2',
> ter => 'p3',
> subst => 'n'
> );
> ''
>
> > while (<>)
> > {
>
> chomp; # If you don't handle that "\n", you'll be sorry...
>
I had one in there at one stage; I don't remember why I took it out.
> > my @in = split/:/;
> > my @out = map { ($terms{$_} ne "") ? $terms{$_} : $_ } @in;
>
> What happens when $terms{$_} is undefined? Bad news, that's what. I
> suspect that this is where your errors are coming from - perhaps with
> help from the absence of that 'chomp'.
>
> > if ($#out > 3) {
> > my $type = $out[3];
> > $out[3] = $out[2];
> > $out[2] = $out[1];
> > $out[1] = $type;
>
> What happens to your $out[0]? Is it just supposed to be ignored?
> In any case, you could just use a list slice instead of all the manual
> swapping:
>
It's ignored because it's in the right place.
> ``
> @out[1 .. 3] = @out[3, 1, 2];
> ''
>
> However, I strongly suspect that the 'map' statement is the source of
> your problems.
>
Yeah... using that was kind of wishful thinking, because I don't really get it. (Yet!)
> > }
> > print join '.', @out;
> > }
> > ''
> >
> > That's broken, because it only works for tag sets which have more than
> > 4 entries, but changing the if to "($#out >= 3)" gives me this:
> > ".sg.nomxxs.m3" from "xxs:sg:nom:m3". I also get a lot of warnings:
>
> The best thing you could do to help me help you is by providing a bunch
> of example inputs and expected outputs. It sounds like it should be
> trivially simple to mung this stuff; this is the kind of thing that Perl
> is really good at.
>
> > ``
> > else {print ":n.??:";}
> > my @ends = split/, /, $endings;
> > sort(@ends);
> > $endings = join(',', @ends);
> > print $endings . "\n";
> > ''
> >
> > to sort the endings, it... doesn't. What am I missing?
>
> Like most Perl functions, "sort" doesn't modify the specified variable -
> it just returns a modified value. Your 'use warnings;' line should
> definitely have generated a warning about that.
>
Ah. So it does.
> ``
> ben@Tyr:/tmp$ perl -wle'@a = qw/3 2 1/; sort @a; print "@a"'
> Useless use of sort in void context at -e line 1.
> 3 2 1
> ''
>
> You can try this instead:
>
> ``
> else {
> print ":n.??:";
> }
>
> # If you're only going to use a variable once, don't bother.
> print join(',', sort split/, /, $endings), "\n";
> ''
Oh, cool. Thanks.
Ben Okopnik [ben at linuxgazette.net]
On Wed, Oct 10, 2007 at 12:16:10AM +0100, Jimmy O'Regan wrote:
> On 09/10/2007, Ben Okopnik <ben at linuxgazette.net> wrote:
> > On Tue, Oct 09, 2007 at 07:20:41PM +0100, Jimmy O'Regan wrote:
> > > I have a couple of scripts that almost work, and I was wondering if
> > > anyone (Ben?) could tell me why...
> >
> > I'll give it a shot. The only problem is, your script is doing more than
> > what you describe here - so I'm going to have to guess about a few
> > things. Worse yet, since your code isn't doing what it's supposed to do,
> > I'm guessing based on wrong data. But hey, for a friend...
> >
>
> Oh, OK. Input:
>
> ``
> subst:pl:acc:f
> subst:pl:acc:m1
> subst:pl:acc:m2
> subst:pl:acc:m3
> subst:pl:acc:n
> subst:pl:dat:f
> subst:pl:dat:m1
> subst:pl:dat:m2
> adj:sg:nom:n:comp
> adj:sg:nom:n:pos
> adj:sg:nom:n:sup
> adj:sg:voc:f:pos
> adj:sg:voc:m1:comp
> ''
>
> expected output:
>
> ``
> n.f.pl.acc
> n.m1.pl.acc
> n.m2.pl.acc
> n.m3.pl.acc
> n.nt.pl.acc
> n.f.pl.dat
> n.m1.pl.dat
> n.m2.pl.dat
> adj.nt.sg.nom.comp
> adj.nt.sg.nom.pos
> adj.nt.sg.nom.sup
> adj.f.sg.voc.pos
> adj.m1.sg.voc.comp
> ''
Ah. OK, got it.
#!/usr/bin/perl -w
# Created by Ben Okopnik on Wed Oct 10 09:21:10 EDT 2007

%repl = ( subst => 'n', n => 'nt', );

while (<>){
    next unless /:/;
    chomp;

    my @in = split /:/;

    if (@in > 4){
        @in = @in[0, 3, 1, 2, 4];
        $in[1] =~ s/(.*)/$repl{$1}||$1/e;
    }
    else {
        @in = @in[0, 3, 1, 2];
        $in[0] =~ s/(.*)/$repl{$1}||$1/e;
        $in[1] =~ s/(.*)/$repl{$1}||$1/e;
    }
    print join(".", @in), "\n";
}
It's a little clunky, but... I've got a lot on my mind this morning.
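For comparison, here is a table-driven sketch of the same conversion that handles the four-field and five-field cases in one path. It is not the script above, just one possible variant. The helper name convert_tag is invented, the %terms table is the one from the original question, and it assumes that no field other than the ones in the sample data collides with a key in the table.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my %terms = ( n => 'nt', pri => 'p1', sec => 'p2', ter => 'p3', subst => 'n' );

# convert_tag is a hypothetical helper written for this sketch.
sub convert_tag {
    my ($tag) = @_;
    chomp $tag;
    # Look each field up once; fields without an entry pass through unchanged.
    my @f = map { exists $terms{$_} ? $terms{$_} : $_ } split /:/, $tag;
    # With four or more fields, move the gender (index 3) to index 1.
    @f[1 .. 3] = @f[3, 1, 2] if @f >= 4;
    return join '.', @f;
}

# prints n.nt.pl.acc and adj.m1.sg.voc.comp, one per line
print convert_tag($_), "\n" for qw(subst:pl:acc:n adj:sg:voc:m1:comp);
```

Because every field goes through the table exactly once, 'subst' becomes 'n' without then being turned into 'nt'.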
> > However, I strongly suspect that the 'map' statement is the source of
> > your problems.
>
> Yeah... using that was kind of wishful thinking, because I don't
> really get it. (Yet!)
[grin] You're not the only one. 'map' can get a little complex - especially since, based on its semantics, it can modify a list "in place" - or return a modified list, leaving the original alone.
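A small sketch of the two behaviours, with made-up function names (copy_only and in_place): inside the block, $_ is an alias for each element of the input list, so anything that changes $_ changes the original array, while a block that only computes a value leaves the input alone.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Both function names are hypothetical, chosen for this illustration.

# The block only computes a value, so @tags is left untouched.
sub copy_only {
    my @tags = @_;
    my @new = map { $_ eq 'subst' ? 'n' : $_ } @tags;
    return (\@tags, \@new);
}

# s/// acts on $_, which aliases the element, so @tags changes as well.
sub in_place {
    my @tags = @_;
    my @new = map { s/^subst\z/n/; $_ } @tags;
    return (\@tags, \@new);
}

my ($orig, $new) = copy_only(qw(subst pl));
print "copy:     @$orig -> @$new\n";    # prints copy:     subst pl -> n pl
($orig, $new) = in_place(qw(subst pl));
print "in place: @$orig -> @$new\n";    # prints in place: n pl -> n pl
```

When the in-place change is not intended, copying $_ into a lexical inside the block before modifying it avoids the surprise.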