[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: base64.c vs. newlines
From: |
Simon Josefsson |
Subject: |
Re: base64.c vs. newlines |
Date: |
Thu, 17 Jan 2008 15:18:52 +0100 |
User-agent: |
Gnus/5.110007 (No Gnus v0.7) Emacs/22.1 (gnu/linux) |
Jim Meyering <address@hidden> writes:
> Hi Simon,
>
> coreutils' `base64 --decode's inability to decode its own encoded
> output (without the sledgehammer of --ignore-garbage) finally got
> to me :-)
Hi Jim! Finally, reviewing this got to me. :-)
> So I bit the bullet and changed gnulib's base64.c and coreutils'
> src/base64.c so they decode embedded newlines unconditionally.
While that behaviour is likely better for src/base64.c (i.e., the
command line interface), it is not a good thing for lib/base64.c, where
the changes is done now. I would prefer that this behaviour was
optional in the library part. Either a flags parameter, or a new API?
> Note that it doesn't yet do anything about the examples
> in comments at the top of lib/base64.c -- IMHO, examples
> belong in test cases that can be checked automatically, so I'd
> like to remove those. Ok with you?
Do you refer to the following snippet? This was intended as
documentation for using the API, and I think it is useful to have it
somewhere. I don't regard it as an example. The error handling
interaction is a bit tricky to get right.
* Be careful with error checking. Here is how you would typically
* use these functions:
*
* bool ok = base64_decode_alloc (in, inlen, &out, &outlen);
* if (!ok)
* FAIL: input was not valid base64
* if (out == NULL)
* FAIL: memory allocation error
* OK: data in OUT/OUTLEN
*
* size_t outlen = base64_encode_alloc (in, inlen, &out);
* if (out == NULL && outlen == 0 && inlen != 0)
* FAIL: input too long
* if (out == NULL)
* FAIL: memory allocation error
* OK: data in OUT/OUTLEN.
> Note also that I haven't yet updated tests/test-base64.c.
> However, I have added quite a few to coreutils' tests/misc/base64.
I don't think merging all of these back is important. But I've just
noticed that there is no modules/base64-tests in gnulib, so the self
test that we have is never invoked. I'm fixing that separately.
> I've paid careful attention to the efficiency of the resulting code.
> The new decoder is just as fast as the original, at 170MB/s when
> the input contains no newlines. However, when the input contains
> newlines, the new "base64 --decode" is more than 5 times faster than
> the old "base64 --decode --ignore-garbage" (163.6MB/s vs. 29.0MB/s).
Neat! The --ignore-garbage was quite inefficient, so I'm not surprised.
> This is mainly to see if you're ok with the spirit of the change.
> I can imagine you may prefer not to change the existing base64_decode
> interfaces, and instead use a new pair of functions. Let me know,
> and I'll finish the job. E.g., I haven't updated the base64 documentation
> in coreutils yet, either.
Indeed, I think being able to base64 decode with just one API call is
important. So I'd prefer not to change the API. How about this
approach:
-extern void base64_decode_ctx_init (struct base64_decode_context *ctx);
-extern bool base64_decode (struct base64_decode_context *ctx,
- const char *restrict in, size_t inlen,
+extern bool base64_decode (const char *restrict in, size_t inlen,
char *restrict out, size_t *outlen);
-extern bool base64_decode_alloc (struct base64_decode_context *ctx,
- const char *in, size_t inlen,
+extern bool base64_decode_alloc (const char *in, size_t inlen,
char **out, size_t *outlen);
+extern void base64_decode_ctx_init (struct base64_decode_context *ctx);
+extern bool base64_decode_ctx_work (struct base64_decode_context *ctx,
+ const char *restrict in, size_t inlen,
+ char *restrict out, size_t *outlen);
+extern bool base64_decode_ctx_alloc (struct base64_decode_context *ctx,
+ const char *in, size_t inlen,
+ char **out, size_t *outlen);
+
As for going forward, perhaps we could commit the coreutils base64.{c,h}
into gnulib, and then discuss particular patches that achieve:
1) make sure the old non-context API continue to work. It would be
implemented by calling the context-init and then the context-work
functions.
2) make the old API not remove \n's, and,
3) add a new API that silently removes \n's. It can either take a
'int flags' variable which can hold a BASE64_REMOVE_LF flag, or 0,
or that the new function always remove \n's. If we chose the flags
approach, the old API could be implemented by calling this new API
with a flag of 0.
What do you think?
/Simon
- Re: base64.c vs. newlines,
Simon Josefsson <=