
[Chicken-users] Chicken vs Perl


From: Sascha Ziemann
Subject: [Chicken-users] Chicken vs Perl
Date: Tue, 20 Sep 2011 14:11:41 +0200

I tried using Chicken for a job I would normally use Perl for, to find
out whether Chicken might be a useful alternative.

The job is: go through a web site mirror and report a unique list of
all domains from all hrefs.

This is my Perl version:

#! /usr/bin/perl

use warnings;
use strict;
use File::Find;

my $dir = $ARGV[0] || '.';
my @files;
my %urls;

find ({wanted => sub { push @files, $_ if -f $_; },
       no_chdir => 1}, $dir);

foreach my $file (@files) {
    open (HTML, $file) || die "Cannot open file '$file': $!";
    while (<HTML>) {
        while (/href="(http:\/\/[^"\/?]+)(["\/?].*)/i) {
            $urls{lc $1} = 1;
            $_ = $2; } }
    close (HTML); }

foreach my $url (sort keys %urls) {
    print $url, "\n"; }
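
To make the inner loop concrete, here is a minimal sketch of what it does
to a single line. The helper name extract_domains and the sample line are
made up for illustration and are not part of the script above; the regular
expression is the same one.

```perl
#! /usr/bin/perl

use warnings;
use strict;

# Hypothetical helper (not in the original script): runs the same
# match-and-continue loop on one line and returns the unique,
# lower-cased matches in sorted order.
sub extract_domains {
    my ($line) = @_;
    my %seen;
    # Match the first href, record it, then keep searching in the
    # remainder of the line captured by the second group.
    while ($line =~ /href="(http:\/\/[^"\/?]+)(["\/?].*)/i) {
        $seen{lc $1} = 1;
        $line = $2;
    }
    my @domains = sort keys %seen;
    return @domains;
}

# Made-up sample line containing two links.
my $sample = '<a href="http://Example.com/a">x</a> <a HREF="http://foo.org?q=1">y</a>';
print "$_\n" foreach extract_domains($sample);
```

For the sample line this should print http://example.com and
http://foo.org, one per line. Note that the second group captures the
whole rest of the line on every match, so each iteration rescans a
shorter copy of the line.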

On my test tree the Perl version takes about two seconds:

real    0m1.810s
user    0m1.664s
sys     0m0.140s

And this is my Chicken version:

#! /usr/local/bin/csi -s

(require-extension posix regex srfi-69)

(define dir (let ((args (command-line-arguments)))
              (if (pair? args)
                  (car args)
                  ".")))
(define files (find-files dir regular-file?))
(define urls (make-hash-table))
(define href (regexp "href=\"(http://[^\"/?]+)([\"/?].*)" #t))

(for-each
 (lambda (filename)
   (with-input-from-file filename
     (lambda ()
       (let next-line ((line (read-line)))
         (if (not (eof-object? line))
             (let next-href ((found (string-search href line)))
               (if found
                   (begin
                     (hash-table-set! urls (string-downcase (cadr found)) #t)
                     (next-href (string-search href (caddr found)))))
               (next-line (read-line))))))))
     files)

(for-each
 (lambda (arg)
   (printf "~a\n" arg))
 (sort (hash-table-keys urls) string<?))

And now hold on tight! It takes more than one minute for the same data:

real    1m16.540s
user    1m14.849s
sys     0m0.664s

And compiling it gives almost no performance boost:

real    0m1.810s
user    0m1.664s
sys     0m0.140s

So the questions are:

- What is wrong with the Chicken code?
- How can I profile the code?
- Why is there no difference between csi and csc?
