Discussion:
[PATCH] ash: reset tokpushback before prompting while parsing heredoc
Christoph Schulz
2018-11-11 12:59:58 UTC
Permalink
The parser reads from an already freed memory location, thereby causing
unpredictable results, in the following situation:

- ENABLE_ASH_EXPAND_PRMT is enabled
- heredoc is being parsed
- command substitution is used within heredoc

Examples where this bug crops up are (PS2 is set to "> "):

$ cat <<EOF
`echo abc`
EOF
-sh: O: not found

$ cat <<EOF
$(echo abc)
EOF
-sh: ���i�: not found

The presumable reason is that setprompt_if() causes a nested expansion when
ENABLE_ASH_EXPAND_PRMT is enabled, therefore leaving "wordtext" in an unusable
state. However, when parseheredoc() is called, "tokpushback" is non-zero, which
causes the next call to xxreadtoken() to return TWORD, causing the caller to
use the invalid "wordtoken" instead of reading the next valid token.

The call chain is:

list()
-> peektoken() [sets tokpushback to 1]
-> parseheredoc()
-> setprompt_if()
-> pushstackmark()
-> expandstr()
-> readtoken1()
[sets lasttoken to TWORD, wordtoken points to expanded prompt]
-> popstackmark() [invalidates wordtoken, leaves lasttoken as is]
-> readtoken1()
-> ...parsebackq
-> list()
-> andor()
-> pipeline()
-> readtoken()
-> xxreadtoken()
[tokpushback non-zero, reuse lasttoken and wordtext]

Note that in almost all other contexts, each call to setprompt_if() is preceded
by setting "tokpushback" to zero. One exception is "oldstyle" backquote parsing
in readtoken1(), but there "tokpushback" is reset afterwards. The other
exception is nlprompt(), but this function is only used within readtoken1()
(but in contexts where no nested calls to xxreadtoken() occur) and xxreadtoken()
(where "tokpushback" is guaranteed to be zero).

Signed-off-by: Christoph Schulz <***@kristov.de>
---
shell/ash.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/shell/ash.c b/shell/ash.c
index 051cc671f..18848c7db 100644
--- a/shell/ash.c
+++ b/shell/ash.c
@@ -12913,6 +12913,7 @@ parseheredoc(void)
heredoclist = NULL;

while (here) {
+ tokpushback = 0;
setprompt_if(needprompt, 2);
readtoken1(pgetc(), here->here->type == NHERE ? SQSYNTAX : DQSYNTAX,
here->eofmark, here->striptabs);
--
2.18.1
Alexander Dahl
2018-11-13 08:38:23 UTC
Permalink
Hei hei,

I just tested the bugfix on top of current master (1_29_0-154-g375fc78d5), the
output is like this:

% ./busybox ash
/mnt/data_2/adahl/src/busybox $ cat <<EOF
$(echo abc)
EOF
abc
/mnt/data_2/adahl/src/busybox $ cat <<EOF
`echo abc`
EOF
abc
/mnt/data_2/adahl/src/busybox $ %

Looks right to me, so you may add my

Tested-by: Alexander Dahl <***@lespocky.de>

Greets
Alex
The parser reads from an already freed memory location, thereby causing
- ENABLE_ASH_EXPAND_PRMT is enabled
- heredoc is being parsed
- command substitution is used within heredoc
$ cat <<EOF
`echo abc`
EOF
-sh: O: not found
$ cat <<EOF
$(echo abc)
EOF
-sh: ���i�: not found
The presumable reason is that setprompt_if() causes a nested expansion when
ENABLE_ASH_EXPAND_PRMT is enabled, therefore leaving "wordtext" in an
unusable state. However, when parseheredoc() is called, "tokpushback" is
non-zero, which causes the next call to xxreadtoken() to return TWORD,
causing the caller to use the invalid "wordtoken" instead of reading the
next valid token.
list()
-> peektoken() [sets tokpushback to 1]
-> parseheredoc()
-> setprompt_if()
-> pushstackmark()
-> expandstr()
-> readtoken1()
[sets lasttoken to TWORD, wordtoken points to expanded prompt]
-> popstackmark() [invalidates wordtoken, leaves lasttoken as is]
-> readtoken1()
-> ...parsebackq
-> list()
-> andor()
-> pipeline()
-> readtoken()
-> xxreadtoken()
[tokpushback non-zero, reuse lasttoken and wordtext]
Note that in almost all other contexts, each call to setprompt_if() is
preceded by setting "tokpushback" to zero. One exception is "oldstyle"
backquote parsing in readtoken1(), but there "tokpushback" is reset
afterwards. The other exception is nlprompt(), but this function is only
used within readtoken1() (but in contexts where no nested calls to
xxreadtoken() occur) and xxreadtoken() (where "tokpushback" is guaranteed
to be zero).
---
shell/ash.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/shell/ash.c b/shell/ash.c
index 051cc671f..18848c7db 100644
--- a/shell/ash.c
+++ b/shell/ash.c
@@ -12913,6 +12913,7 @@ parseheredoc(void)
heredoclist = NULL;
while (here) {
+ tokpushback = 0;
setprompt_if(needprompt, 2);
readtoken1(pgetc(), here->here->type == NHERE ? SQSYNTAX : DQSYNTAX,
here->eofmark, here->striptabs);
Denys Vlasenko
2018-11-20 16:47:23 UTC
Permalink
Applied, thanks! (dash is also affected, mind pinging them?)
Post by Christoph Schulz
The parser reads from an already freed memory location, thereby causing
- ENABLE_ASH_EXPAND_PRMT is enabled
- heredoc is being parsed
- command substitution is used within heredoc
$ cat <<EOF
`echo abc`
EOF
-sh: O: not found
$ cat <<EOF
$(echo abc)
EOF
-sh: ���i�: not found
The presumable reason is that setprompt_if() causes a nested expansion when
ENABLE_ASH_EXPAND_PRMT is enabled, therefore leaving "wordtext" in an unusable
state. However, when parseheredoc() is called, "tokpushback" is non-zero, which
causes the next call to xxreadtoken() to return TWORD, causing the caller to
use the invalid "wordtoken" instead of reading the next valid token.
list()
-> peektoken() [sets tokpushback to 1]
-> parseheredoc()
-> setprompt_if()
-> pushstackmark()
-> expandstr()
-> readtoken1()
[sets lasttoken to TWORD, wordtoken points to expanded prompt]
-> popstackmark() [invalidates wordtoken, leaves lasttoken as is]
-> readtoken1()
-> ...parsebackq
-> list()
-> andor()
-> pipeline()
-> readtoken()
-> xxreadtoken()
[tokpushback non-zero, reuse lasttoken and wordtext]
Note that in almost all other contexts, each call to setprompt_if() is preceded
by setting "tokpushback" to zero. One exception is "oldstyle" backquote parsing
in readtoken1(), but there "tokpushback" is reset afterwards. The other
exception is nlprompt(), but this function is only used within readtoken1()
(but in contexts where no nested calls to xxreadtoken() occur) and xxreadtoken()
(where "tokpushback" is guaranteed to be zero).
---
shell/ash.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/shell/ash.c b/shell/ash.c
index 051cc671f..18848c7db 100644
--- a/shell/ash.c
+++ b/shell/ash.c
@@ -12913,6 +12913,7 @@ parseheredoc(void)
heredoclist = NULL;
while (here) {
+ tokpushback = 0;
setprompt_if(needprompt, 2);
readtoken1(pgetc(), here->here->type == NHERE ? SQSYNTAX : DQSYNTAX,
here->eofmark, here->striptabs);
--
2.18.1
_______________________________________________
busybox mailing list
http://lists.busybox.net/mailman/listinfo/busybox
Christoph Schulz
2018-11-20 18:21:42 UTC
Permalink
Hello!
Post by Denys Vlasenko
Applied, thanks! (dash is also affected, mind pinging them?)
I could not reproduce the problem with dash, neither with git dash nor
with older versions (0.5.8, 0.5.9.1, 0.5.10.2, to name a few).
Something must be handled differently. But I haven't compared busybox
ash and dash sources thoroughly either.


Regards,

Christoph
Ron Yorston
2018-11-20 21:56:13 UTC
Permalink
Post by Christoph Schulz
I could not reproduce the problem with dash, neither with git dash nor
with older versions (0.5.8, 0.5.9.1, 0.5.10.2, to name a few).
Something must be handled differently. But I haven't compared busybox
ash and dash sources thoroughly either.
The same issue is present in dash and has been discussed in a mailing
list thread starting here:

https://www.spinics.net/lists/dash/msg01652.html

It isn't always reproducible due to system/compiler differences.

Herbert Xu has recently posted a different patch:

https://www.spinics.net/lists/dash/msg01727.html

Sorry, I should have picked this up sooner, especially as I was involved
in that thread. Lack of concentration.

Ron

Loading...