chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-users] question on C types


From: Joerg F. Wittenberger
Subject: Re: [Chicken-users] question on C types
Date: 24 Nov 2003 19:39:02 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

> > somehow I can't understand the fine manual.  I want to code some parts
> > in C and can't interface.
> 
> (subtle point about the manual taken ;-)

Maybe it was just a subtle point about silly me.  ;-)

> (in this case a pointer to C_words holding fixnums). Your example will
> break, though, since you can not convert the numbers into a format
> Scheme recognizes.

Hm, why not?  I actually need an opaque array, which is going to be
extensively used (I've seen 45min till I ^C'ed it) from C until there
is _one_ integer result.  Hence I don't want to bitshift a tag to each
int and remove it in such a long loop.

> Another approach (if you want to have a number vector) is to use

Speed is the primary issue.

> SRFI-4 vectors and pass them as of the proper type:
> 
> (declare (uses srfi-4)) ; or (require 'srfi-r), if your registry is set up.
> 
> (let ([result (make-s32vector size)])
>   ((foreign-lambda* void ([s32vector line] [int size])
>     "int *p = line; int i; for(i = 0; i < size; ++i) p[ i ] = i;")
> result size)
>   result)

This (and the chicken code) looks as if that's what I need, thanks.

At first the result appears to work, but as I said, it's a long
running computation.  I need to make sure it doesn't block threading.

Looks as if I need some help, so here are the missing details:

The "Levenshtein"-Egg of Lars Rustemeier is obviously derived from
Lorenzo Seidenari's C-Implementation (though it fails to mention).  I
don't have a reference, but I remember that Lorenzo called that code
"not for production" and that's for a reason.  No, not for one, for
several:

a) The memory consumtion is O(n*m), while at most O(min(n,m)) is
actually required. (with n,m the length of the input strings)
See http://www.merriampark.com/ldc.htm look for

  d=malloc((sizeof(int))*(m+1)*(n+1));

b) The continues like only proof of concept code is allowed to: it
does *not* check "d" the result of the allocation!  Memory corruption
ahead.  Especally in presence of the excessive allocation.  Don't use
that code for real, please!

c) For long strings with a small Levenshtein distance (the most useful
case), much can be safed, by first skiping common pre- and suffixes.

d) It blocks the chicken scheduler.

I hope a fixed most of the bugs.  But it still blocks the scheduler
for me, though I don't know why.  Additionally the code below provides
a short circuited (levenshtein< s t limit), which doesn't compute the
whole distance, in case you just want to check whether it exceeds a
certain limit.

best regards

/Jörg

-- 
The worst of harm may often result from the best of intentions.

;;** Levenshtein

;; (C) 2003 Jörg F. Wittenberger

;; You may use this code under either the GPL or these conditions:

;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions
;; are met:

;; Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.

;; Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the
;; distribution.

;; Neither the name of the author nor the names of its contributors
;; may be used to endorse or promote products derived from this
;; software without specific prior written permission.

;; See http://www.merriampark.com/ld.htm

(define-macro (add1 n)`(+ 1 ,n))

(define (lev-init mx)
  (##sys#check-exact mx 'levenshtein-matrix-allocation)
  (cond-expand
   [unsafe]
   [else (when (fx< mx 0)
               (##sys#error 'levenshtein-matrix-allocation
                            "size is negative" mx))])
  (let ((result (make-u32vector (add1 mx))))
    ((foreign-lambda*
      void
      ((u32vector line) (integer m))
      "int *p=line; int i; for(i=0; i<=m; ++i) p[i]=i;")
     result mx)
    result))

(define (lev-dist line m) (u32vector-ref line m))

(define lev-step!
  (foreign-lambda*
   integer
   ((u32vector matrix) (integer m) (integer i)
    (pointer a) (pointer b) (integer o))
   #<<EOF
#define min(a, b) (((a) < (b)) ? (a) : (b))
  int *d_i = (int*) matrix;
  unsigned char *s = (unsigned char *)(a) + o;
  unsigned char *t = (unsigned char *)(b) + o;
  int distance=d_i[0], j, left, cost;

  d_i[0]=i;
  for(j=1; j<=m; ++j) {               /* row loop */
    left = d_i[j];
    /* Step 5 */
    cost = s[i-1]==t[j-1] ? 0 : 1;
    /* Step 6 */
    d_i[j] =  min(min(d_i[j-1]+1, left+1), distance+cost);
    distance = left;
  }
  return(distance);
EOF
))

(define (lev-0 a b)
  (do ((m (sub1 (string-length a)) (sub1 m))
       (n (sub1 (string-length b)) (sub1 n)))
      ((or (fx< m 0) (fx< n 0) (not (eqv? (string-ref a m) (string-ref b n))))
       (do ((m (add1 m) (sub1 m))
            (n (add1 n) (sub1 n))
            (i 0 (add1 i)))
           ((or (fx< m 1) (fx< n 1)
                (not (eqv? (string-ref a i) (string-ref b i))))
            (values i m n))))))

(define (levenshtein-distance s t)
  (receive
   (off sl tl) (lev-0 s t)
    (cond
     ((eqv? sl 0) tl)
     ((eqv? tl 0) sl)
     (else
      (if (< tl sl)
          (lev-exec (lev-init sl) s t off sl tl)
          (lev-exec (lev-init tl) t s off tl sl))))))

(define (lev-exec matrix s t o m n)
  (do ((i 1 (add1 i)))
      ((> i n) (lev-dist matrix m))
    (lev-step! matrix m i s t o)))

(define (lev-exec< matrix s t o m n limit)
  (let loop ((i 1) (distance 0))
    (cond
     ((>= distance limit) #f)
     ((> i n) (>= (lev-dist matrix m) limit))
     (else (loop (add1 i) (lev-step! matrix m i s t o))))))

(define (levenshtein< s t limit)
  (receive
   (off sl tl) (lev-0 s t)
    (cond
     ((eqv? sl 0) (< tl limit))
     ((eqv? tl 0) (< sl limit))
     (else
      (if (< tl sl)
          (lev-exec< (lev-init sl) s t off sl tl limit)
          (lev-exec< (lev-init tl) t s off tl sl limit))))))





reply via email to

[Prev in Thread] Current Thread [Next in Thread]