bug-cvs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

patch to speed up slow rcs patch processing.


From: Michael Smith
Subject: patch to speed up slow rcs patch processing.
Date: Thu, 06 Oct 2005 12:58:58 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031014 Thunderbird/0.3


Attached is the diff for the file src/rcs.c in the version 1.12.12 of cvs I downloaded from the cvs site. This single file is sufficient to fix the problem I detected.

This patch addresses a problem we were having with a cvs file that was 300K lines long with a patch diff with 200K entries. The update past the head (i.e. version 1.2 where 1.5 is head) entry took upwards of 2-4 minutes per file because the complexity of the patch code was O(m*n) where n=# of lines and m=# of updates. The patch provided here changes this to O(n) complexity (which completes the same operation in 4-6 seconds).

The changes I made from the pre-existing code were to:
1. reverse the order of the fragments to a in-order instead of a reverse order queue 2. perform a single pass to copy the non-deleted and new elements into a new linevector structure
3. copy the result back to the original line structure.

Please consider adding this patch to the release as it can make a huge difference in functionality for projects tracking files which can have this occur (this file was XML with new data values).

I ran the unit tests with this patch and the changes appear to pass all tests. I did receive some sort of IO error during the sanity.sh -p tests, but it appear to be innocuous.

I have not added any tests for this patch because there are no functional input/output relationship changes from this modification.

As per the HACKING document, I am including the consent to use this patch:

"I grant permission to distribute this patch under the terms of the GNU Public License"

Please let me know if you have any questions about how/why something was done. I would also be curious if/when you might be adding the patch -- although I realize that this isn't generally your policy.

--

Michael J. Smith
Software Engineer
SAIC, IDE site
Work: (321) 235-7613
mailto: msmith@ideorlando.org


7035d7034
<     struct deltafrag *dftail;
7038,7039d7036
<       int numlines, lastmodline, offset;
<     struct linevector *orig_lines = lines;
7042d7038
<     numlines = lines->nlines; // start with init # of lines
7052,7060c7048,7049
<     df->next = NULL;
<     if (dfhead == NULL) {
<             dfhead = df;
<     }
<     else {
<         dftail->next = df;
<     }
<     dftail = df;
< 
---
>       df->next = dfhead;
>       dfhead = df;
7101d7089
<         numlines+=df->nlines;
7111d7098
<         numlines-=df->nlines;
7115,7128d7101
<       // new temp data structure to hold new org before
<       // copy back into original structure.
<     lines = xmalloc (sizeof (struct linevector));
<     lines->nlines = lines->lines_alloced = numlines;
<     lines->vector = xnrealloc(NULL,
<                  numlines, sizeof(*lines->vector));
< 
< 
<     // we changed the list order to first to last -- so the
<     // list never gets larger than the size numlines
<     lastmodline = 0; 
<     offset = 0; // offset created when adding/removing lines
<                 // between new and original structure
< 
7132,7142c7105
<         unsigned int ln;
< 
<         // here we need to get to the line where the next insert will
<         // begin which is <df->pos> we will fill up to df->pos with
<         // original items
<         unsigned int deltaend = df->pos-offset;
<         
<         for(;lastmodline<deltaend;lastmodline++) {
<             // we need to copy from the orig structure into new one
<             lines->vector[lastmodline] = 
orig_lines->vector[lastmodline+offset];
<         }
---
>       unsigned int ln;
7151,7200c7114,7116
<                     {
< 
<     const char *textend, *p;
<     const char *nextline_text;
<     struct line *q;
<     int nextline_newline;
<     size_t nextline_len;
<     int online;
< 
<     textend = df->new_lines + df->len;
<     nextline_newline = 0;
<     nextline_text = df->new_lines;
<     online = 0; // which line we are currently adding
<     for (p = df->new_lines; p < textend; ++p) {
<       if (*p == '\n')
<       {
<           nextline_newline = 1;
<           if (p + 1 == textend)
<                   /* If there are no characters beyond the last newline, we
<                       don't consider it another line.  */
<                   break;
< 
<           nextline_len = p - nextline_text;
<           q = xmalloc (sizeof (struct line) + nextline_len);
<           q->vers = addvers;
<           q->text = (char *)q + sizeof (struct line);
<           q->len = nextline_len;
<           q->has_newline = nextline_newline;
<           q->refcount = 1;
<           memcpy (q->text, nextline_text, nextline_len);
<             lines->vector[lastmodline++] = q;
<             offset--;
<     
<           nextline_text = (char *)p + 1;
<           nextline_newline = 0;
<       }
<     }
<     nextline_len = p - nextline_text;
<     q = xmalloc (sizeof (struct line) + nextline_len);
<     q->vers = addvers;
<     q->text = (char *)q + sizeof (struct line);
<     q->len = nextline_len;
<     q->has_newline = nextline_newline;
<     q->refcount = 1;
<     memcpy (q->text, nextline_text, nextline_len);
<     lines->vector[lastmodline++] = q;
<     offset--; // for each line we add the offset between the #'s increases
< 
<                     }
< 
---
>               if (! linevector_add (lines, df->new_lines, df->len, addvers,
>                                     df->pos))
>                   err = 1;
7203,7205c7119,7120
<         offset+=df->nlines; // we are removing this many lines from the source
<               if (df->pos > orig_lines->nlines
<                   || df->pos + df->nlines > orig_lines->nlines)
---
>               if (df->pos > lines->nlines
>                   || df->pos + df->nlines > lines->nlines)
7209c7124,7125
<                       orig_lines->vector[ln]->vers = delvers;
---
>                       lines->vector[ln]->vers = delvers;
>               linevector_delete (lines, df->pos, df->nlines);
7218,7239d7133
<       // add the rest of the remaining lines to the data vector
<     for(;lastmodline<numlines;lastmodline++) {
<             // we need to copy from the orig structure into new one
<         lines->vector[lastmodline] = orig_lines->vector[lastmodline+offset];
<     }
< 
<     // we didn't make the lines structure fully -- so we can't
<     // use the linevector_copy without issues -- so we do it inline
<       // make the original vector larger if necessary
<     if (numlines > orig_lines->nlines) {
<         orig_lines->vector = xnrealloc (orig_lines->vector,
<                               numlines,
<                               sizeof (*orig_lines->vector));
<               orig_lines->lines_alloced = numlines;
<     }
<     memcpy (orig_lines->vector, lines->vector,
<               xtimes (lines->nlines, sizeof (*orig_lines->vector)));
<     orig_lines->nlines = lines->nlines;
< 
<     free(lines->vector);
<     free(lines); // we don't need it any longer
< 

reply via email to

[Prev in Thread] Current Thread [Next in Thread]