bug#17613: 24.3; html2text can't handle weird formatting

From: Mark A. Hershberger
Subject: bug#17613: 24.3; html2text can't handle weird formatting
Date: Tue, 27 May 2014 19:14:04 -0400

I got an email today whose source includes:

                                <a class=3D"mcnButton " title=3D"Profile L=
ink" href=3D"http://secret................................................=
...................................................." target=3D"_blank" st=
yle=3D"font-weight: bold;letter-spacing: normal;line-height: 100%;text-ali=
gn: center;text-decoration: none;color: #FFFFFF;word-wrap: break-word;-ms-=
text-size-adjust: 100%;-webkit-text-size-adjust: 100%;">Create Your Profil=

The line breaks are exactly as they appear in the email's source.
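
The excerpt is quoted-printable encoded: =3D stands for "=", and a
trailing "=" marks a soft line break.  To read the markup without the
encoding, something like the sketch below should work.  It assumes the
qp library that ships with Emacs; `my-sample' is an illustrative name,
and its text is only the start of the excerpt, with the leading
indentation dropped.

```elisp
(require 'qp)

;; `my-sample' is a hypothetical name; the text is the start of the
;; excerpt above, including its soft line break.
(defvar my-sample
  "<a class=3D\"mcnButton \" title=3D\"Profile L=\nink\"")

;; Decoding should turn =3D back into "=" and join the wrapped line.
(quoted-printable-decode-string my-sample)
```

Note the space before the closing quote in the class attribute; it is
part of the original markup, not an effect of the encoding.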

I am using mu4e to parse and display the email.  In
html2text-get-attr, execution stops at the following code:

       ;; size=3
       ((string-match "[^ ]=[^ ]" this)
        (let ((attr  (nth 0 (split-string this "=")))
              (value (substring prev (1+ (string-match "=" this)))))

with the message:

    Args out of range: "\"", 6, 1

describe-variable for the variable `this' says:

    this's value is "title=\"Profile"

The HTML quoted above is the only place in the email where
'title="Profile' occurs.
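
One reading that is consistent with the error message, offered as a
sketch rather than a diagnosis: if the attribute text is split on
whitespace before this code runs (an assumption; the excerpt above does
not show that step), the space inside class="mcnButton " would leave a
lone '"' as its own token, immediately before 'title="Profile'.  The
quoted code takes the index from `this' but the substring from `prev',
which would then be that one-character string.  `my-html2text-repro' is
an illustrative name and not part of html2text.el.

```elisp
;; Hypothetical reproduction; `my-html2text-repro' is an illustrative
;; name and not part of html2text.el.
(defun my-html2text-repro ()
  "Return tokens and substring results for the decoded attributes."
  (let* ((tokens (split-string
                  "class=\"mcnButton \" title=\"Profile Link\""))
         (prev (nth 1 tokens))                 ; the stray "\""
         (this (nth 2 tokens))                 ; "title=\"Profile"
         (start (1+ (string-match "=" this)))) ; index just past "="
    (list prev this start
          ;; Taking the value from `this' stays in range.
          (substring this start)
          ;; The same index applied to `prev' signals an error,
          ;; which is caught here so the result can be inspected.
          (condition-case err
              (substring prev start)
            (args-out-of-range (car err))))))

(my-html2text-repro)
```

If the assumption holds, the third element is 6 and the last element is
the symbol args-out-of-range, matching the string and index reported in
the error.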

In GNU Emacs 24.3.1 (x86_64-pc-linux-gnu, GTK+ Version 3.12.1)
 of 2014-05-05 on trouble, modified by Debian
Windowing system distributor `The X.Org Foundation', version 11.0.11501000
System Description:     Debian GNU/Linux testing (jessie)

Configured using:
 `configure '--build' 'x86_64-linux-gnu' '--build' 'x86_64-linux-gnu'
 '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib'
 '--localstatedir=/var/lib' '--infodir=/usr/share/info'
 '--mandir=/usr/share/man' '--with-pop=yes'
 '--with-crt-dir=/usr/lib/x86_64-linux-gnu' '--with-x=yes'
 '--with-x-toolkit=gtk3' '--with-toolkit-scroll-bars'
 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fstack-protector
 --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Wall'

Important settings:
  value of $LC_COLLATE: en_US.UTF-8
  value of $LC_CTYPE: en_US.UTF-8
  value of $LC_MESSAGES: en_US.UTF-8
  value of $LANG: en_US.utf8
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Recent messages:
[mu4e] Indexing completed; processed 176499, updated 1, cleaned-up 0
[mu4e] Found 1 matching message
Args out of range: "\"", 6, 1
Type "q" to delete help window.
Making completion list...

