bug-gnu-emacs

bug#46342: 28.0.50; socks-send-command munges IP address bytes to UTF-8


From: J.P.
Subject: bug#46342: 28.0.50; socks-send-command munges IP address bytes to UTF-8
Date: Thu, 11 Feb 2021 06:58:00 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> writes:

> Then I don't understand why we need to worry about encoding.  IP
> addresses are pure ASCII strings, so they need no encoding whatsoever.

Clearly, I'm failing you here. Between my dearth of communication
skills and lack of Emacs know-how, I've obviously managed to mislead you
into thinking that SOCKS IP address fields ought to contain ASCII text
characters such as the following:

  1  9  2  .  1  6  8  .  1  .  1
  31 39 32 2e 31 36 38 2e 31 2e 31

However, this is not the case [a]. In version 4, all addresses are
four-byte sequences, one byte for each component of an IPv4 address, and
the ordering is left-to-right. For example:

  192 168 1  1
  c0  a8  01 01
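To make the contrast concrete, here is a rough Python sketch (not socks.el code; the two helper names are made up for illustration) that produces both byte sequences above from the same dotted-quad string:

```python
import socket


def ipv4_text_bytes(addr: str) -> bytes:
    """Dotted-quad as ASCII text: what a SOCKS 4 address field must NOT carry."""
    return addr.encode("ascii")


def ipv4_wire_bytes(addr: str) -> bytes:
    """Four raw octets, leftmost component first, as SOCKS expects."""
    # inet_pton with AF_INET is strict: it rejects short forms like "127.1".
    return socket.inet_pton(socket.AF_INET, addr)


print(ipv4_text_bytes("192.168.1.1").hex(" "))  # 31 39 32 2e 31 36 38 2e 31 2e 31
print(ipv4_wire_bytes("192.168.1.1").hex(" "))  # c0 a8 01 01
```

The text form is eleven bytes long; the wire form is always exactly four.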

In version 5, the one covered by RFC 1928, this is extended to include
16-byte IPv6 addresses as well as ASCII domain names. The three forms are
mutually exclusive but occupy the same field in a union of sorts.
The first byte of that field, the "ATYP" flag, denotes which of the
three to expect, and it appears as the "atype" argument to
socks-send-command.
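A minimal Python sketch of that version 5 "union", assuming only the ATYP values and layouts given in RFC 1928, section 5 (IPv4 = 0x01, four octets; DOMAINNAME = 0x03, a length octet followed by the name; IPv6 = 0x04, sixteen octets). The function name is hypothetical and this is not how socks.el structures it:

```python
import socket

# ATYP values from RFC 1928, section 5.
ATYP_IPV4 = 0x01
ATYP_DOMAINNAME = 0x03
ATYP_IPV6 = 0x04


def encode_socks5_address(atype: int, address: str) -> bytes:
    """Return the ATYP octet followed by the address in its wire form."""
    if atype == ATYP_IPV4:
        body = socket.inet_pton(socket.AF_INET, address)   # 4 raw octets
    elif atype == ATYP_IPV6:
        body = socket.inet_pton(socket.AF_INET6, address)  # 16 raw octets
    elif atype == ATYP_DOMAINNAME:
        name = address.encode("ascii")                     # text, not octets
        if not 0 < len(name) <= 255:
            raise ValueError("domain name must be 1..255 octets")
        body = bytes([len(name)]) + name                   # length-prefixed
    else:
        raise ValueError(f"unknown ATYP {atype:#04x}")
    return bytes([atype]) + body


print(encode_socks5_address(ATYP_IPV4, "93.184.216.34").hex(" "))  # 01 5d b8 d8 22
```

Only the DOMAINNAME branch ever puts text on the wire; both address branches emit raw octets.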

> I guess I will have to ask you to back up and describe what problems
> you saw with the original code, and show me the details of the strings
> involved in that.

The Elisp manual distinguishes between multibyte and unibyte "sources"
of strings [1]. For these (SOCKS 4) IP address strings, the function
socks--open-network-stream is that source (it creates them). When such
a string includes characters with code points between 128 and 255 (the
Latin-1/ISO-8859-1 range), each of those characters is sent as two
UTF-8-encoded bytes, which the SOCKS service rejects as a protocol
violation.
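The arithmetic behind that claim can be checked with a short Python sketch (an analogy only: Python strings stand in for Emacs multibyte strings, and the helper name is made up). Every code point below 128 is one byte in both encodings, while every code point from 128 to 255 is one byte in Latin-1 but two in UTF-8:

```python
from typing import Tuple


def encoded_widths(cp: int) -> Tuple[int, int]:
    """Return (Latin-1 length, UTF-8 length) in bytes for a code point 0..255."""
    ch = chr(cp)
    return len(ch.encode("latin-1")), len(ch.encode("utf-8"))


for cp in (34, 93, 184, 216):
    print(cp, encoded_widths(cp))
# 34 (1, 1)
# 93 (1, 1)
# 184 (1, 2)
# 216 (1, 2)
```

So an IPv4 address with any component of 128 or more grows on the wire, and the SOCKS server sees the wrong number of address bytes.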

Specifically, when a user passes "example.com" to the entry-point
function socks-open-network-stream, its internal helper
socks--open-network-stream resolves the host name into an IP in list
form and then converts this to a string by calling

  (apply #'format "%c%c%c%c" '(93 184 216 34))

This produces a four-character multibyte string, one character per octet:

  "]¸Ø\""

However, when socks-send-command passes this string to
process-send-string, and the process's coding system is (binary . binary),
the string's underlying six-byte (UTF-8) representation is emitted verbatim:

  "]\302\270\303\230\""
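The same munging can be reproduced outside Emacs. This Python sketch is only an analogue: joining chr() values stands in for the format call, and encoding to UTF-8 stands in for sending the multibyte string's internal bytes unchanged. The octal_escape helper is made up to print bytes the way Emacs does (minus the surrounding quotes and the escaped inner quote):

```python
octets = [93, 184, 216, 34]          # example.com's resolved IPv4 address
s = "".join(chr(o) for o in octets)  # analogue of (apply #'format "%c%c%c%c" ...)

intended = s.encode("latin-1")       # one byte per character: what SOCKS wants
munged = s.encode("utf-8")           # two bytes for each of 184 and 216


def octal_escape(b: bytes) -> str:
    """Render non-printable and non-ASCII bytes as Emacs-style octal escapes."""
    return "".join(chr(x) if 32 <= x < 127 else f"\\{x:03o}" for x in b)


print(len(intended), len(munged))  # 4 6
print(octal_escape(munged))        # ]\302\270\303\230"
```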

My initial idea was to use the function unibyte-string to ensure every
character fits in 8 bits before transmission. Whatever the approach,
some combination of validation and conversion before sending may be
worthwhile, since it will only run once per connection.
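A rough Python sketch of that validate-then-convert step, under the assumption that every character in the field is meant to be a single octet (to_unibyte is a hypothetical name, loosely mirroring the role unibyte-string would play on the Emacs side):

```python
def to_unibyte(s: str) -> bytes:
    """Map each character to exactly one octet, or fail before anything is sent."""
    bad = [c for c in s if ord(c) > 0xFF]
    if bad:
        raise ValueError(f"not representable as single octets: {bad!r}")
    # Latin-1 maps code points 0..255 one-to-one onto byte values 0..255.
    return s.encode("latin-1")
```

Failing loudly here seems preferable to letting a malformed request reach the proxy, and the check is cheap since it runs once per connection.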

Sorry for the extended play-by-play. I certainly hope none of it came
off as insulting or pedantic. I'm quite certain your grasp of such
concepts long ago outpaced any understanding I could ever hope to
attain.

J.P.


[a] My versions of tor and ssh definitely honor requests like

  curl --proxy socks5h://localhost:1080 http://93.184.216.34

passing the IP address as a domain name. Although this defies RFC 1928,
which specifies FQDNs only [2], I'm getting the sense that influential
projects treat the RFC more as a living standard. (Note: in its unit
tests, tor only includes this form for its extension commands [3].)

[1] (elisp) Non-ASCII in Strings, second paragraph
[2] https://tools.ietf.org/html/rfc1928#section-5
[3] https://gitweb.torproject.org/tor.git/tree/src/test/test_socks.c#n335