bug-sed

bug#34133: Huge memory usage and output size when using "H" and "G"


From: Hongxu Chen
Subject: bug#34133: Huge memory usage and output size when using "H" and "G"
Date: Sun, 20 Jan 2019 10:23:16 +0800

Hi Assaf,

    Thanks for the explanation.

    We think the way sed works may be open to attacks. If a user downloads
some sed script and runs it *without root privileges*, the host machine may
quickly run out of memory; in my case, the machine actually hung and I had to
restart it. The problem may be more severe when the machine hosts other
services, or itself provides a sed-based service such as text processing
(perhaps rare), even inside a sandbox. The issue may also be triggered
unintentionally, causing surprise and trouble.

> Any program that keeps the input in memory is vulnerable to unbounded
> input size.

    I don't think the input size is big here, and it can be made even smaller
as long as more "G;H"s are appended to the script. Maybe sed could flush
something to avoid the memory usage?

Best Regards,
Hongxu
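On the question of avoiding the memory usage: flushing output would not shrink the hold space, which the script itself keeps growing. One general mitigation, not proposed in this thread, is to run untrusted sed scripts under a resource limit. A minimal sketch, assuming a shell whose `ulimit` builtin supports `-v` (bash and dash on Linux do); `run_sed_limited` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: run sed with a cap on virtual memory (in KiB).
# When the cap is reached, allocations fail and sed exits with an error
# instead of driving the host into swap.
run_sed_limited() {
  limit_kib=$1
  shift
  # The subshell keeps the limit from affecting the calling shell.
  ( ulimit -v "$limit_kib" && exec sed "$@" )
}

# Usage: 40 short lines would need about 2^41 lines of hold space,
# so this is expected to fail quickly under a 256 MiB cap.
if seq 1 40 | run_sed_limited 262144 'G;H' > /dev/null; then
  echo "unexpected: sed finished"
else
  echo "sed stopped by the memory limit (exit status $?)"
fi
```

A cgroup or container memory limit would serve the same purpose at a coarser level; `ulimit` is used here only because it is the smallest self-contained example.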


On Sun, Jan 20, 2019 at 5:27 AM Assaf Gordon <address@hidden> wrote:

> tags 34133 notabug
> close 34133
> stop
>
> Hello,
>
> On 2019-01-19 2:53 a.m., Hongxu Chen wrote:
> >      We found an issue relevant to the use of "H" and "G" for appending
> > hold space and pattern space.
>
> It is an "issue" in the sense that your example does consume large
> amounts of memory, but it is not a bug - this is how sed works.
>
> >      The input file is attached; it has 30 lines of 80 columns, each
> > filled with 'a'. My machine has 64G of memory and the same amount of swap.
> >
> >        # these two may eat up the memory
> >      sed 's/a/d/; G; H;' input
> >      sed '/b/d; G; H;' input
>
>
> Let's simplify:
> The "s/a/d/" does not change anything related to memory
> (it changes a single letter "a" to "d" in the input), so I'll omit it.
>
> The '/b/d' command is a no-op, because your input does not contain
> the letter "b".
>
> We're left with:
>     sed 'G;H'
> The length of each line also doesn't matter, so I'll use shorter lines.
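The simplification can be checked directly: all three scripts should print the same number of lines. A minimal sketch, assuming an input shaped like the one described in the report (the attachment itself is not reproduced here, so `make_input` is a hypothetical stand-in), truncated to 10 lines to keep the output small:

```shell
#!/bin/sh
# Hypothetical stand-in for the attached file: N lines of 80 'a' characters.
make_input() {
  line=$(printf '%80s' '' | tr ' ' 'a')
  i=0
  while [ "$i" -lt "$1" ]; do
    printf '%s\n' "$line"
    i=$((i + 1))
  done
}

# Count the output lines of sed script $2 run over the first $1 input lines.
count_out() {
  make_input "$1" | sed "$2" | wc -l | tr -d ' '
}

for script in 's/a/d/; G; H' '/b/d; G; H' 'G;H'; do
  printf '%-16s %s lines\n' "$script" "$(count_out 10 "$script")"
done
```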
>
> Now observe the following:
>
> $ printf "%s\n" 0 | sed 'G;H' | wc -l
> 2
> $ printf "%s\n" 0 1 | sed 'G;H' | wc -l
> 6
> $ printf "%s\n" 0 1 2 | sed 'G;H' | wc -l
> 14
> $ printf "%s\n" 0 1 2 3 | sed 'G;H' | wc -l
> 30
> $ printf "%s\n" 0 1 2 3 4 | sed 'G;H' | wc -l
> 62
> $ printf "%s\n" 0 1 2 3 4 5 | sed 'G;H' | wc -l
> 126
> $ printf "%s\n" 0 1 2 3 4 5 6 | sed 'G;H' | wc -l
> 254
> $ printf "%s\n" 0 1 2 3 4 5 6 7 | sed 'G;H' | wc -l
> 510
> $ printf "%s\n" 0 1 2 3 4 5 6 7 8 | sed 'G;H' | wc -l
> 1022
> $ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 | sed 'G;H' | wc -l
> 2046
> $ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 | sed 'G;H' | wc -l
> 4094
> $ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 11 | sed 'G;H' | wc -l
> 8190
> $ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 11 12 | sed 'G;H' | wc -l
> 16382
>
> Notice the trend?
> The number of lines (and by proxy: size of buffer and memory usage)
> is exponential.
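The table can be regenerated with a loop and compared against the closed form 2^(n+1) - 2 for n input lines, which fits every row. A minimal sketch; `gh_lines` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: number of lines sed 'G;H' prints for $1 short input lines.
gh_lines() {
  i=0
  while [ "$i" -lt "$1" ]; do
    echo "$i"
    i=$((i + 1))
  done | sed 'G;H' | wc -l | tr -d ' '
}

n=1
while [ "$n" -le 13 ]; do
  # Closed form that matches the table: 2^(n+1) - 2.
  expected=$(( (1 << (n + 1)) - 2 ))
  printf '%2d input lines: %5d output lines (2^(n+1)-2 = %d)\n' \
    "$n" "$(gh_lines "$n")" "$expected"
  n=$((n + 1))
done
```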
>
> With 20 lines, you'll need on the order of 2^20 (about 1M) lines in
> memory (plus the size of each line, pointer overhead, etc.). Still doable.
>
> With 30 lines, you'll need on the order of 2^30 (about 1G) lines.
> If each of your lines is 80 characters, you'll need 80GB (before
> counting the overhead of pointers).
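The growth can also be derived exactly; this is a sketch consistent with the table above rather than something stated in the thread. Let $h_k$ be the number of lines in the hold space after $k$ input lines, $p_k$ the number of lines printed for input line $k$, and $c_k$ the number of copies of input lines held in the hold space. Treating the initially empty hold space as one empty line ($h_0 = 1$, $c_0 = 0$), `G` gives $p_k = 1 + h_{k-1}$ and `H` gives $h_k = h_{k-1} + p_k$:

```latex
\begin{aligned}
h_k &= 2h_{k-1} + 1 \;\Rightarrow\; h_k = 2^{k+1} - 1,
  \qquad p_k = 1 + h_{k-1} = 2^{k},\\
\sum_{k=1}^{n} p_k &= \sum_{k=1}^{n} 2^{k} = 2^{n+1} - 2
  \quad\text{(lines printed for $n$ input lines)},\\
c_k &= c_{k-1} + (1 + c_{k-1}) = 2c_{k-1} + 1 \;\Rightarrow\; c_n = 2^{n} - 1.
\end{aligned}
```

For the reported input ($n = 30$ lines of 80 characters plus a newline), the copied input lines alone occupy about $(2^{30} - 1) \times 81 \approx 8.7 \times 10^{10}$ bytes of hold space, in line with the 80GB estimate.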
>
>
> >       # this is fine
> >      sed '/a/d; G; H;' input
>
> This is "fine" because the "/a/d" command deletes all lines of your
> input, hence nothing is stored in the pattern/hold buffers.
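Counting output lines shows the difference. A minimal sketch, again using a hypothetical stand-in for the attachment (`gen_input`, lines of 80 'a' characters):

```shell
#!/bin/sh
# Hypothetical stand-in for the attachment: $1 lines of 80 'a' characters.
gen_input() {
  line=$(printf '%80s' '' | tr ' ' 'a')
  i=0
  while [ "$i" -lt "$1" ]; do
    printf '%s\n' "$line"
    i=$((i + 1))
  done
}

# '/a/d' deletes every line and starts the next cycle, so G and H never run
# and nothing accumulates: the output is empty even for all 30 lines.
gen_input 30 | sed '/a/d; G; H' | wc -l

# '/b/d' matches nothing, so the buffers double per line (5 lines only).
gen_input 5 | sed '/b/d; G; H' | wc -l
```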
>
> >      I learned from http://www.grymoire.com/Unix/Sed.html that 'G' appends
> > hold space to pattern space, and 'H' does the inverse.
> >      In the first two examples, the hold space is appended to the pattern
> > space, and the pattern space is then appended to the hold space once more.
> > With each additional input line, both buffers double in size; if the input
> > file is long enough, sed may eventually eat up all the memory and fill the
> > output.
>
> Yes, that's how it works.
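A two-line trace makes the doubling visible. `G` appends a newline plus the hold space to the pattern space, and `H` appends a newline plus the pattern space to the hold space. The `${x;p;}` at the end of the second command (GNU sed syntax; other seds may differ) swaps the hold space in at the last line so it can be printed:

```shell
#!/bin/sh
# Pattern space printed for each input line:
#   line "x": G -> "x" + "\n" + ""         (2 lines: x, empty)
#   line "y": G -> "y" + "\n" + "\nx\n"    (4 lines: y, empty, x, empty)
printf '%s\n' x y | sed 'G;H'

echo '---'

# Hold space after the last line: "\nx\n" + "\n" + "y\n\nx\n" (7 lines).
printf '%s\n' x y | sed -n 'G;H;${x;p;}'
```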
>
> >      We think this is vulnerable since it may eat up the memory in a few
> > seconds.
>
> Any program that keeps the input in memory is vulnerable
> to unbounded input size. That is not a bug.
>
> As such, I'm closing this as "not a bug", but discussion can continue
> by replying to this thread.
>
> regards,
>   - assaf
>
>

