[nmh-workers] Downloading googlegroup messages
From: Bakul Shah
Subject: [nmh-workers] Downloading googlegroup messages
Date: Tue, 07 May 2019 15:14:30 -0700
I used https://github.com/icy/google-group-crawler to download
messages from a group I am perusing. Each message is put in a
separate file, so it is easy to just link/copy them to numbered
files. I discovered that plain text messages work fine but MIME
messages don't.
For instance:
$ cd $_GROUP
$ find mbox -type f|head|cat -n|awk '{print "ln ",$2,$1;}'|sh
$ mhlist 3
 msg part  type/subtype              size description
   3       multipart/alternative     7888
$ show 3
mhshow: bogus multipart content in message 3
...
I finally tracked it down to this line:
uip/mhparse.c:1191: if (strcmp (bufp + 2, m->mp_start))
m->mp_start is "0000000000008a6f8e0585b620ff--\n"
while bufp+2 is "0000000000008a6f8e0585b620ff--\r\n"
So the test fails. Manually removing \r fixed this.
This seems to be a bug. The boundary text as per the spec
doesn't include CRLF or LF or CR. What is interesting is that
the message header containing the boundary text also ends with
\r\n so nmh stripped that and then tacked on \n!
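
To make the mismatch concrete, here is a minimal sketch of a comparison that
ignores the line ending on both sides, so a CRLF-terminated boundary line
still matches an LF-terminated expected string. The helper names
len_without_eol and boundary_line_matches are hypothetical and are not part
of nmh; this is an illustration under those assumptions, not a tested patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of s, not counting a trailing "\n", "\r\n", or "\r". */
static size_t
len_without_eol (const char *s)
{
    size_t n = strlen (s);

    if (n > 0 && s[n - 1] == '\n')
        n--;
    if (n > 0 && s[n - 1] == '\r')
        n--;
    return n;
}

/*
 * Hypothetical helper (not an nmh function): returns 1 if the two
 * lines are identical once their line endings are ignored, else 0.
 */
int
boundary_line_matches (const char *line, const char *expected)
{
    size_t line_len, expected_len;

    assert (line != NULL && expected != NULL);

    line_len = len_without_eol (line);
    expected_len = len_without_eol (expected);

    return line_len == expected_len
        && memcmp (line, expected, line_len) == 0;
}
```

With the two strings shown above, strcmp reports a difference because of the
extra \r, while boundary_line_matches returns 1. A test along the lines of
!boundary_line_matches (bufp + 2, m->mp_start) would therefore accept both
line-ending styles, assuming nothing else in the parser depends on the exact
bytes.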