Discussion:
Bug in wc.
Rob Landley
2010-03-08 00:48:50 UTC
The busybox "wc" command doesn't work to build mips in 2.6.33. Kernel commit
1b93b3c3e94be2605 added multiple compression types to the kernel image, and
that means that my build is now dying at the end with:

LD vmlinuz
mips64-ld: invalid hex number `0x'

And the actual command line mips64-ld is getting called with is:

mips64-ld "-m" "elf64btsmip" "-m" "elf64btsmip" "-Ttext" "0x" "-T"
"arch/mips/boot/compressed/ld.script" "arch/mips/boot/compressed/head.o"
"arch/mips/boot/compressed/decompress.o" "arch/mips/boot/compressed/dbg.o"
"arch/mips/boot/compressed/piggy.o" "-o" "vmlinuz"

The -Ttext option in ld is generated in arch/mips/boot/compressed/Makefile
which does:

LDFLAGS_vmlinuz := $(LDFLAGS) -Ttext $(VMLINUZ_LOAD_ADDRESS) -T

And VMLINUZ_LOAD_ADDRESS is calculated earlier in that file as:

VMLINUZ_LOAD_ADDRESS := 0x$(shell [ -n "$(VMLINUX_SIZE)" -a -n "$(LOW32)" ] &&
printf "$(HIGH32)%08x" $$(($(VMLINUX_SIZE) + 0x$(LOW32))))

And VMLINUX_SIZE is:

VMLINUX_SIZE := $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null | \
cut -d' ' -f1)

VMLINUX_SIZE is blank when using busybox tools.
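
To see how a blank VMLINUX_SIZE turns into the bare "0x" that the linker rejects, here is a minimal shell sketch of the same guard-then-printf logic. The function name load_address and the HIGH32/LOW32 values are made up for illustration; only the size is taken from this report.

```shell
#!/bin/sh
# load_address SIZE HIGH32 LOW32: mimic the Makefile's shell snippet.
# The literal "0x" is always emitted; the digits only appear when both
# SIZE and LOW32 are non-empty. HIGH32/LOW32 below are placeholders.
load_address() {
    printf '0x'
    [ -n "$1" ] && [ -n "$3" ] && printf "$2%08x" $(( $1 + 0x$3 ))
    echo
}

ok=$(load_address 3335777 ffffffff 80100000)
bad=$(load_address '' ffffffff 80100000)

echo "with a size:    $ok"
echo "without a size: $bad"
```

With an empty size the guard fails, nothing follows the prefix, and the result is exactly the "0x" seen on the mips64-ld command line.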

The underlying behavioral wonkiness in busybox "wc" is:

$ busybox wc -c vmlinux
  3335777 vmlinux
$ wc -c vmlinux
3335777 vmlinux

Note that we have leading whitespace, the gnu version doesn't. This leading
whitespace is confusing the kernel build, because the cut -d' ' then triggers
on our leading whitespace and produces an empty string, which propagates
through the rest of the build to confuse the linker with a start address of
"0x".
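
The failure can be reproduced without a kernel tree. A minimal sketch, with the padded line hard-coded to stand in for busybox wc output (the sample strings and variable names are illustrative, not taken from the kernel Makefile):

```shell
#!/bin/sh
# Simulated "wc -c" output lines; the two leading blanks in the first one
# are an assumption standing in for what busybox prints.
padded='  3335777 vmlinux'
plain='3335777 vmlinux'

# cut treats every single space as a field separator, so a leading space
# makes field 1 the empty string.
size_padded=$(printf '%s\n' "$padded" | cut -d' ' -f1)
size_plain=$(printf '%s\n' "$plain" | cut -d' ' -f1)

echo "padded: 0x$size_padded"
echo "plain:  0x$size_plain"
```

The padded case yields an empty field 1, which is what leaves the build with nothing after "0x".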

Why do we have unnecessary leading whitespace? What happened to small and
simple and doing no more than absolutely necessary?

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Denys Vlasenko
2010-03-08 03:07:00 UTC
Post by Rob Landley
VMLINUX_SIZE := $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null | \
cut -d' ' -f1)
VMLINUX_SIZE is blank when using busybox tools.
$ busybox wc -c vmlinux
  3335777 vmlinux
$ wc -c vmlinux
3335777 vmlinux
Note that we have leading whitespace, the gnu version doesn't. This leading
whitespace is confusing the kernel build, because the cut -d' ' then triggers
on our leading whitespace and produces an empty string, which propagates
through the rest of the build to confuse the linker with a start address of
"0x".
Why do we have unnecessary leading whitespace? What happened to small and
simple and doing no more than absolutely necessary?
Good question, I'm redirecting it to author of busybox-1.2.1 (or earlier)
since 1.2.1 displays the same behavior. ;)
--
vda
Rob Landley
2010-03-08 18:54:05 UTC
Post by Denys Vlasenko
Post by Rob Landley
VMLINUX_SIZE := $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null | \
cut -d' ' -f1)
VMLINUX_SIZE is blank when using busybox tools.
$ busybox wc -c vmlinux
  3335777 vmlinux
$ wc -c vmlinux
3335777 vmlinux
Note that we have leading whitespace, the gnu version doesn't. This
leading whitespace is confusing the kernel build, because the cut -d' '
then triggers on our leading whitespace and produces an empty string,
which propagates through the rest of the build to confuse the linker with
a start address of "0x".
Why do we have unnecessary leading whitespace? What happened to small and
simple and doing no more than absolutely necessary?
Good question, I'm redirecting it to author of busybox-1.2.1 (or earlier)
since 1.2.1 displays the same behavior. ;)
Actually, I don't remember ever touching wc. (I'm going to have to fight with
git, aren't I? I really hate git's UI...)

It looks like wc was completely rewritten to make it much more complicated in
cad5364599e back in 2003, and it was essentially untouched (if you don't count
removing trailing whitespace and tweaking the GPL boilerplate) until you added
a special case in 2006:

commit 3ed001ff2631ad6911096148f47a2719a5b6d4f4
Author: Denis Vlasenko <vda.linux at googlemail.com>
Date: Fri Sep 29 23:41:04 2006 +0000

wc: reduce source cruft, make it so that "wc -c" (one option, no filenames
will not print leading blanks.

Which would have addressed this problem (and prevented the mips 2.6.33 kernel
build from breaking) if it wasn't a special case. This cleanup seems to have
added complexity rather than removing it.

But really, I don't care so much why it's doing what it's doing now as how to
fix it. It looks like what the other wc is doing is holding all output to the
end and calculating the longest string, and prepending spaces for that. The
pathological case is (in the current busybox source):

$ wc -c INSTALL README AUTHORS
 5833 INSTALL
 8768 README
 5171 AUTHORS
19772 total

Meaning it has to know all lines before it outputs any, which is really not
busybox's style.

And which really isn't all that _interesting_, to be honest. I'd be pretty
happy if we never prepended the space and tried to line up the columns at all.
The longest number will _never_ have prepended space, so anything that tries
to parse this multi-column output must deal with the no leading whitespace,
and must therefore treat leading whitespace as _optional_ rather than
required.

Where we're getting bit is programs depending on the longest number not having
any prepended whitespace. Meaning when there's one number output, it's the
longest number, therefore there should never be prepended whitespace to it.
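
A parser that treats leading whitespace as optional can be sketched in a few lines of shell. This is only an illustration: the helper name first_field is hypothetical, and the sample lines are padded by hand.

```shell
#!/bin/sh
# first_field: print the first whitespace-separated word of each input line.
# awk's default field splitting skips leading blanks and collapses runs of
# blanks, so padded and unpadded lines are handled the same way.
first_field() {
    awk '{ print $1 }'
}

a=$(printf '%s\n' '  5833 INSTALL' | first_field)
b=$(printf '%s\n' '19772 total' | first_field)

echo "$a $b"
```

Both the padded and the unpadded line give back just the count, which is the behavior a consumer of multi-column wc output has to implement anyway.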

In general it seems to me that the busybox approach to this whole issue would
be either:

A) skip the optional behavior to save the space and complexity.
B) Make the column alignment code a config option and shared with ls -l instead
of hand rolled.

I lean towards A myself...

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Denys Vlasenko
2010-03-08 21:09:43 UTC
Post by Denys Vlasenko
Post by Rob Landley
VMLINUX_SIZE := $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null | \
cut -d' ' -f1)
VMLINUX_SIZE is blank when using busybox tools.
$ busybox wc -c vmlinux
  3335777 vmlinux
$ wc -c vmlinux
3335777 vmlinux
Note that we have leading whitespace, the gnu version doesn't. This
leading whitespace is confusing the kernel build, because the cut -d' '
then triggers on our leading whitespace and produces an empty string,
which propagates through the rest of the build to confuse the linker with
a start address of "0x".
Why do we have unnecessary leading whitespace? What happened to small and
simple and doing no more than absolutely necessary?
Good question, I'm redirecting it to author of busybox-1.2.1 (or earlier)
since 1.2.1 displays the same behavior. ;)
Actually, I don't remember ever touching wc. (I'm going to have to fight with
git, aren't I? I really hate git's UI...)
It looks like wc was completely rewritten to make it much more complicated in
cad5364599e back in 2003, and it was essentially untouched (if you don't count
removing trailing whitespace and tweaking the GPL boilerplate) until you added
commit 3ed001ff2631ad6911096148f47a2719a5b6d4f4
Author: Denis Vlasenko <vda.linux at googlemail.com>
Date: Fri Sep 29 23:41:04 2006 +0000
wc: reduce source cruft, make it so that "wc -c" (one option, no filenames
will not print leading blanks.
Which would have addressed this problem (and prevented the mips 2.6.33 kernel
build from breaking) if it wasn't a special case. This cleanup seems to have
added complexity rather than removing it.
But really, I don't care so much why it's doing what it's doing now as how to
fix it.
This is the fix I'm going to apply in a few minutes:

--- busybox.4/coreutils/wc.c 2010-01-12 22:15:16.000000000 +0100
+++ busybox.5/coreutils/wc.c 2010-03-08 21:59:01.837629390 +0100
@@ -88,6 +88,8 @@ int wc_main(int argc UNUSED_PARAM, char
 	if (!argv[0]) {
 		*--argv = (char *) bb_msg_standard_input;
 		fname_fmt = "\n";
+	}
+	if (!argv[1]) {
 		if (!((print_type-1) & print_type)) /* exactly one option? */
 			start_fmt = "%"COUNT_FMT;
 	}
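
The "exactly one option?" test in the patch is the usual single-bit trick: (x-1) & x is zero when x has at most one bit set. A small shell sketch of the same arithmetic follows; the function name single_bit and the mask values are invented for illustration and are not the flag values wc.c uses.

```shell
#!/bin/sh
# single_bit MASK: succeed when MASK has at most one bit set.
# Same arithmetic as !((print_type-1) & print_type) in the patch; note that
# it is also true for 0. The mask values below are made up.
single_bit() {
    [ $(( ($1 - 1) & $1 )) -eq 0 ]
}

for mask in 1 2 4 3 6 7; do
    if single_bit "$mask"; then
        echo "$mask: one option"
    else
        echo "$mask: several options"
    fi
done
```

Only the single-bit masks take the unpadded branch; any combination of two or more flags keeps the padded format.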
--
vda
Rob Landley
2010-03-09 15:45:30 UTC
Post by Denys Vlasenko
--- busybox.4/coreutils/wc.c 2010-01-12 22:15:16.000000000 +0100
+++ busybox.5/coreutils/wc.c 2010-03-08 21:59:01.837629390 +0100
@@ -88,6 +88,8 @@ int wc_main(int argc UNUSED_PARAM, char
if (!argv[0]) {
*--argv = (char *) bb_msg_standard_input;
fname_fmt = "\n";
+ }
+ if (!argv[1]) {
if (!((print_type-1) & print_type)) /* exactly one option? */
start_fmt = "%"COUNT_FMT;
}
Confirmed that it works. When's the next bugfix release due?

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Denys Vlasenko
2010-03-12 22:10:54 UTC
Post by Rob Landley
Post by Denys Vlasenko
--- busybox.4/coreutils/wc.c 2010-01-12 22:15:16.000000000 +0100
+++ busybox.5/coreutils/wc.c 2010-03-08 21:59:01.837629390 +0100
@@ -88,6 +88,8 @@ int wc_main(int argc UNUSED_PARAM, char
if (!argv[0]) {
*--argv = (char *) bb_msg_standard_input;
fname_fmt = "\n";
+ }
+ if (!argv[1]) {
if (!((print_type-1) & print_type)) /* exactly one option? */
start_fmt = "%"COUNT_FMT;
}
Confirmed that it works. When's the next bugfix release due?
Perhaps this weekend.
--
vda
Denys Vlasenko
2010-03-29 04:13:22 UTC
Post by Rob Landley
Post by Denys Vlasenko
--- busybox.4/coreutils/wc.c 2010-01-12 22:15:16.000000000 +0100
+++ busybox.5/coreutils/wc.c 2010-03-08 21:59:01.837629390 +0100
@@ -88,6 +88,8 @@ int wc_main(int argc UNUSED_PARAM, char
if (!argv[0]) {
*--argv = (char *) bb_msg_standard_input;
fname_fmt = "\n";
+ }
+ if (!argv[1]) {
if (!((print_type-1) & print_type)) /* exactly one option? */
start_fmt = "%"COUNT_FMT;
}
Confirmed that it works. When's the next bugfix release due?
1.16.1 has been released today.
--
vda
Rob Landley
2010-03-30 00:09:59 UTC
Post by Denys Vlasenko
Post by Rob Landley
Post by Denys Vlasenko
--- busybox.4/coreutils/wc.c 2010-01-12 22:15:16.000000000 +0100
+++ busybox.5/coreutils/wc.c 2010-03-08 21:59:01.837629390 +0100
@@ -88,6 +88,8 @@ int wc_main(int argc UNUSED_PARAM, char
if (!argv[0]) {
*--argv = (char *) bb_msg_standard_input;
fname_fmt = "\n";
+ }
+ if (!argv[1]) {
if (!((print_type-1) & print_type)) /* exactly one option? */
start_fmt = "%"COUNT_FMT;
}
Confirmed that it works. When's the next bugfix release due?
1.16.1 has been released today.
I rebuilt the set of statically linked more or less defconfig busybox binaries
for various targets and uploaded them to Morris, see
http://busybox.net/downloads/binaries/1.16.1

By the way, if you need statically linked strace for the same set of targets
(sometimes useful, it's come up here on this list a couple times), you can find
that (and dropbear) at http://impactlinux.com/fwl/downloads/binaries

(Or at least you should once it finishes uploading. I just cut a new release
of my own project... :)

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Denys Vlasenko
2010-04-02 23:39:36 UTC
Post by Rob Landley
Post by Denys Vlasenko
1.16.1 has been released today.
I rebuilt the set of statically linked more or less defconfig busybox binaries
for various targets and uploaded them to Morris, see
http://busybox.net/downloads/binaries/1.16.1
By the way, if you need statically linked strace for the same set of targets
(sometimes useful, it's come up here on this list a couple times), you can find
that (and dropbear) at http://impactlinux.com/fwl/downloads/binaries
Hi Rob,

Wonderful job! This is so much further along than my crude
cross-compiler. You have fifteen architectures covered.
I had only two.


I downloaded cross-compiler-i686 and cross-compiler-x86_64
and I can build static executables using either
after I made symlinks
/usr/x86_64-unknown-linux -> /whereever/I/untarred/cross-compiler-x86_64/x86_64-unknown-linux

I have a few questions.

When I run "strace -oLOG -f x86_64-gcc --static t.c"
I see that it still tries to use your configured target path,
/home/landley/temp/firmware/build/cross-compiler-x86_64.
Can this be prevented? (I can send you LOG if you need it).

And second, dynamic linking ("x86_64-gcc t.c") also works,
but of course resulting binary needs some files to be in /lib,
in simplest case /lib/ld-uClibc.so.0 and /lib/libc.so.0:

# readelf -aW a.out
...
Program Headers:
  Type      Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR      0x000040 0x0000000000400040 0x0000000000400040 0x000150 0x000150 R E 0x8
  INTERP    0x000190 0x0000000000400190 0x0000000000400190 0x000014 0x000014 R   0x1
      [Requesting program interpreter: /lib/ld-uClibc.so.0]
  LOAD      0x000000 0x0000000000400000 0x0000000000400000 0x000439 0x000439 R E 0x100000
  LOAD      0x00043c 0x000000000050043c 0x000000000050043c 0x000198 0x00019c RW  0x100000
  DYNAMIC   0x000468 0x0000000000500468 0x0000000000500468 0x000130 0x000130 RW  0x8
  GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x8

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .hash .dynsym .dynstr .rela.plt .init .plt .text .fini .rodata
   03     .eh_frame .ctors .dtors .jcr .dynamic .got.plt .data .bss
   04     .dynamic
   05

Dynamic section at offset 0x468 contains 14 entries:
  Tag                Type     Name/Value
  0x0000000000000001 (NEEDED) Shared library: [libc.so.0]
...

I can copy or symlink them to ones in cross-compiler-x86_64/lib
and it will start working.

But I can't make it for more than one
cross-compiling toolchain at once, right?
I can use either cross-compiler-i686 or cross-compiler-x86_64,
but not both at once. But that would be useful.
For example, in order to run randomconfig tests for both 32
and 64 bits in parallel overnight.

I know that various distros use different names, like /lib and /lib64,
to make it possible. How do they do it?

And do you think it might make sense for you
to use /lib-$CROSS instead of /lib for every (cross-)compiler,
making it possible to run many dynamically linked programs
against different sub-arches on the same machine?

This will be an overkill for the case when one runs just a plain
one-subarch, but it will still work for that case too, right?
--
vda
Rob Landley
2010-04-04 09:51:22 UTC
Post by Denys Vlasenko
Post by Rob Landley
Post by Denys Vlasenko
1.16.1 has been released today.
I rebuilt the set of statically linked more or less defconfig busybox
binaries for various targets and uploaded them to Morris, see
http://busybox.net/downloads/binaries/1.16.1
By the way, if you need statically linked strace for the same set of
targets (sometimes useful, it's come up here on this list a couple
times), you can find that (and dropbear) at
http://impactlinux.com/fwl/downloads/binaries
Hi Rob,
Wonderful job! This is so much further along than my crude
cross-compiler. You have fifteen architectures covered.
I had only two.
I downloaded cross-compiler-i686 and cross-compiler-x86_64
and I can build static executables using either
after I made symlinks
/usr/x86_64-unknown-linux ->
/whereever/I/untarred/cross-compiler-x86_64/x86_64-unknown-linux
I have a few questions.
When I run "strace -oLOG -f x86_64-gcc --static t.c"
I see that it still tries to use your configured target path,
/home/landley/temp/firmware/build/cross-compiler-x86_64.
Can this be prevented? (I can send you LOG if you need it).
Lemme guess, this is when calling out to the linker? (I can deal with
everything else via ccwrap, but this one I'd have to patch out.)

Back in 2006, I sat down and tried to clean out the horrible path logic from
GCC. Just remove everything it shouldn't be doing, and add one clean set of
"this lives here, that lives there" parsing.

When my patch passed 10,000 lines of stuff it was removing, with no end in
sight, I gave up and went with a wrapper.

http://landley.net/notes-2006.html#15-11-2006

The gcc path logic is _evil_. They assemble a big long array of every
possible place the files they're looking for _might_ be, taken from stuff the
./configure infrastructure writes into the header files, and from querying
environment variables at runtime, and from stuff written in their own scripting
language, and from paths derived from where it finds other things at runtime,
and even a few paths literally hardwired into the C code. They do this for
system includes, compiler includes, system libraries, compiler libraries, and
executables they shell out for.

The worst part is that they never remove anything from this list, just add
more to it. (Which is the last thing you want to do when cross compiling, the
big failure for cross compiling is finding the _wrong_ stuff by accident.) When
the stuff they've been doing proves intractably horrible, they just add another
layer on top, inserting more random locations at the beginning of these
arrays, and then fall back to check all the historical craziness.

Their current fresh coat of paint over dry rot is called "sysroot", and just
like the previous _five_ reboot attempts I found in the code they're _sure_
this will fix everything. You know how it works? The code still contains
hardwired absolute paths into the user's home directory it was built in, but
then they string match for them and do a search and replace at runtime.

The approach I took was to write a wrapper (ccwrap.c, based on the old uClibc
wrapper from back before buildroot) which parses the command line and rewrites
it starting with --nostdinc --nostdlib. That way, gcc can hallucinate any
strange paths it likes, it's just explicitly told NEVER TO USE THEM, and
instead only check the locations that ccwrap feeds it on the command line.

Alas, what this _doesn't_ fix is the exec shellouts to find the assembler and
linker and strip and such. (All the file from binutils.) You'd think it would
check the $PATH, and eventually it does fall back to checking the $PATH. (And
yes, ccwrap adjusts the $PATH so what it's looking for is right at the start.)
But first it checks a few random hardwired locations determined at compile
time, and since I started using ccwrap I stopped patching gcc's path logic,
which is what I'd have to do to make it not do that.
Post by Denys Vlasenko
And second, dynamic linking ("x86_64-gcc t.c") also works,
but of course resulting binary needs some files to be in /lib,
You can feed the -L option to qemu to override that, which sometimes works.
("Set the ELF interpreter prefix.") Depends on qemu version and what exactly
you try to do. (Last time I checked it, it worked for the program you ran but
not new programs that program spawned.)

You can also compile things with a different hardwired dynamic linker path,
using the somewhat esoteric command line option:

-Wl,--dynamic-linker,/path/to/new/ld-uClibc.so

But the dynamic linker is always at a hardwired absolute path, that's just the
way it works. It's like #!/bin/bash needing the /bin/ on the front. (Yeah,
in that case you can use "env", but you still need an absolute path to env.)

Laptop battery dying, answer the rest in a bit...

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Rob Landley
2010-04-04 12:53:45 UTC
Post by Denys Vlasenko
[Requesting program interpreter: /lib/ld-uClibc.so.0]
You'll notice that's a hardwired absolute path. If you check all the other
binaries on your system (including the ones your host came with), you'll
notice they have hardwired absolute paths for this too.
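
Checking this across many binaries is easy to script. The sketch below pulls the interpreter path out of readelf-style text; the helper name interp_of is hypothetical, and the sample line is the one from the readelf output quoted above rather than live output.

```shell
#!/bin/sh
# interp_of: read "readelf -l" style text on stdin and print the requested
# program interpreter path, if the text contains one.
interp_of() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)].*/\1/p'
}

# Sample line copied from the quoted readelf output; on a real system one
# would pipe "readelf -lW some-binary" into interp_of instead.
sample='      [Requesting program interpreter: /lib/ld-uClibc.so.0]'
interp=$(printf '%s\n' "$sample" | interp_of)
echo "$interp"

# An absolute path starts with "/".
case $interp in
    /*) echo "absolute" ;;
    *)  echo "not absolute" ;;
esac
```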
Post by Denys Vlasenko
I can copy or symlink them to ones in cross-compiler-x86_64/lib
and it will start working.
Unfortunately that's just the way the dynamic linker works.
Post by Denys Vlasenko
But I can't make it for more than one
cross-compiling toolchain at once, right?
Each dynamic binary needs an absolute path to its dynamic linker. The kernel
loads this directly, so it doesn't have a search path, in much the same way
/sbin/hotplug hasn't got a search path when the kernel launches that. Such a
search path would be putting policy into the kernel.

There's an online book on linking that covers this:

http://www.iecc.com/linker/linker10.html
Post by Denys Vlasenko
I can use either cross-compiler-i686 or cross-compiler-x86_64,
but not both at once. But that would be useful.
The wrapper I use is actually run-time configurable with environment variables.

When compiling stuff:

export CCWRAP_DYNAMIC_LINKER=/lib-uClibc-x86_64/ld-uClibc.so.0

Then copy all the appropriate shared libraries to that directory (or whatever
name you prefer to use). The uClibc dynamic linker will look in the directory
the shared linker is installed in as one of its default locations, see line
286 of uClibc/ldso/ldso/dl-elf.c:

	/* Look for libraries wherever the shared library loader
	 * was installed */
	_dl_if_debug_dprint("\tsearching ldso dir='%s'\n", _dl_ldsopath);
	if ((tpnt1 = search_for_named_library(libname, secure, _dl_ldsopath,
					rpnt)) != NULL)
	{
		return tpnt1;
	}

The downside is of course that it'll fall back to looking in /lib if it
doesn't find the library it's looking for. The bane of cross compiling is
falling back to default locations at which the host headers and libraries
live. Making it _NOT_ do that is 95% of the game of whack-a-mole you wind up
playing trying to make this crap work.

But in the case of busybox, that shouldn't be too big an issue. You don't
have the "my cross compiler didn't have zlib installed so it found the host
library" issue because we don't use random external dependencies. You can't
leak random external dependencies if you don't _use_ them.

If you did want to hard-wire in both of these changes, you could change the
default path to the dynamic linker in ccwrap.c on line 197:

http://impactlinux.com/hg/hgwebdir.cgi/firmware/file/f3b242456ff7/sources/toys/ccwrap.c

And then you could fix the dynamic linker library search path fallback problem
by rebuilding ld-uClibc.so.0 with a different UCLIBC_RUNTIME_PREFIX, although
if you're going to delve into the horror that is uClibc's path logic, read
this first:

http://ibot.rikers.org/%23uclibc/20081210.html.gz

And then probably give up and just hardwire what you want into dl-elf.c line
299 or so, because it's going to add a hardwired "usr/lib" after the path you
give it, whether you want it to or not. (I believe I convinced bernhard to
stop doing this in current -git, but I haven't gotten around to testing the
new release candidate yet.)
Post by Denys Vlasenko
For example, in order to run randomconfig tests for both 32
and 64 bits in parallel overnight.
Build statically and it'll work fine? That's the easy way...

The thing is, I'm not treating x86 or x86-64 specially. I'm treating 'em the
exact same way I treat mips and arm and such. Those won't run on your host,
you need to use the emulator to run them. I go ahead and use the emulator to
test i486 and such too, because the fact that i486 runs on the host doesn't
mean it'll run on a real 486, and yes some low-power embedded chips emulate a
486 but not the pentium instructions:

http://impactlinux.com/hg/hgwebdir.cgi/firmware/rev/1004

So I generally use a system image, or a chroot with application emulation and
dynamic linking, or I build statically and use application emulation.

A prominent design goal of these toolchains is to get all the architectures to
behave as similarly as possible. Having them use the same dynamic linker name
is part of that. When I build an x86-64 image it's fully 64-bit, with no 32
bit support. (Same as mips64, or the upcoming ppc64 I'm poking at.)

That's also why they don't multilib: you build with this toolchain, it should
produce the right output by default. If you want a different type of output,
use a different toolchain. (If I could get one toolchain to support all
targets, I'd build one and use wrapper scripts to feed in target flags. But
gcc wasn't designed with that in mind. You'll find "gcc wasn't designed with
that in mind" crops up a LOT when you start playing with it, it's their
unofficial motto, I think...)

It's not actually that hard to support "32 bit on 64 bit" sort of things. The
dynamic linker and default library search path are the main things. But I'm
trying to keep down the complexity and having each toolchain and each system
image support exactly one target is a big part of that. Not having two
contexts that can get confused with each other, thus no cross compiling
issues.
Post by Denys Vlasenko
I know that various distros use different names, like /lib and /lib64,
to make it possible. How do they do it?
They hardwire a different path to the dynamic linker into each executable the
toolchain creates (which in this instance is controlled by ccwrap, see above).
And then they teach that dynamic linker to look for libraries in /lib64 by
default, instead of in /lib. (And then to make themselves feel better they
move the 32 bit libraries to lib32 and symlink /lib to /lib32, even though
nothing anywhere ever uses lib32 directly as a path. Presumably it helps them
sleep at night to pretend they haven't special-cased 64 bit out the wazoo to
work around legacy 32 bit binaries. "See, we abused 32 bit in the same way,
we just made sure it didn't matter in the slightest by symlinking all the
paths that are actually _used_ by the legacy binaries we're supporting to
point to the place we moved it." Makes me want to pat 'em on the head and go
"there there, lay on this couch, tell me about your mother"...)
Post by Denys Vlasenko
And do you think it might make sense for you
to use /lib-$CROSS instead of /lib for every (cross-)compiler,
making it possible to run many dynamically linked programs
against different sub-arches on the same machine?
I could, sure. But you'd still need to use the emulator to run 'em, at which
point running 'em in a chroot or via a system image makes about as much sense.
Post by Denys Vlasenko
This will be an overkill for the case when one runs just a plain
one-subarch, but it will still work for that case too, right?
It would work, yes.

Let's talk over the design issues at CELF next week. If you're serious about
this use case I can put a config option into my build to automate it for you,
but I'd like to demonstrate scriptable system images to you first. I think
they're a better way to do this sort of thing.

System images are nicely self contained, and don't require root access to run.
Adding stuff on the host is not self-contained, requires root access, tends to
bit rot, bypasses your distro's normal package tracking mechanisms (and even
if it didn't, packages are never tagged as "needed for this project" so
reproducing the setup on another machine is a pain). And application emulation
is inherently more brittle than system emulation anyway so you'll spend lots
of your time finding bugs in the _emulator_, not in busybox. (Less so now than
2 years ago, but still. In system emulation it generally either works
completely or not at all, no strange buggy halfway working states.)

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Denys Vlasenko
2010-04-04 14:08:15 UTC
Post by Rob Landley
Post by Denys Vlasenko
[Requesting program interpreter: /lib/ld-uClibc.so.0]
You'll notice that's a hardwired absolute path. If you check all the other
binaries on your system (including the ones your host came with), you'll
notice they have hardwired absolute paths for this too.
I know. But different binaries can have different program interpreters,
nobody says it must be in /lib/. Check any distro which has
dual 32/64-bit x86 packages.

For example, on my laptop, I have "mostly 64-bit" Fedora.
"Native" 64-bit binaries use /lib64/ld-linux-x86-64.so.2 interpreter.
32-bit ones use /lib/ld-linux.so.2 one.
And it's not limited to "can install 32-bit packages",
I can compile new 32-executables too: gcc -m32 t.c,
and they load successfully.

Even though it looks like Fedora hacked in this support
instead of doing it in a generic manner (suffix "64" looks arbitrary,
it's not a proper (sub)arch suffix or something like that),
but it shows that it's possible to make it happen.

With your toolchains, this is not possible as of now,
because they all use /lib without any (sub)arch suffix.
Post by Rob Landley
Post by Denys Vlasenko
And do you think it might make sense for you
to use /lib-$CROSS instead of /lib for every (cross-)compiler,
making it possible to run many dynamically linked programs
against different sub-arches on the same machine?
I could, sure. But you'd still need to use the emulator to run 'em, at which
point running 'em in a chroot or via a system image makes about as much sense.
Post by Denys Vlasenko
This will be an overkill for the case when one runs just a plain
one-subarch, but it will still work for that case too, right?
It would work, yes.
Let's talk over the design issues at CELF next week. If you're serious about
this use case I can put a config option into my build to automate it for you,
but I'd like to demonstrate scriptable system images to you first. I think
they're a better way to do this sort of thing.
Ok, let's do it.
--
vda
Michael Abbott
2010-03-08 16:19:51 UTC
Post by Rob Landley
[...]
$ busybox wc -c vmlinux
  3335777 vmlinux
$ wc -c vmlinux
3335777 vmlinux
Note that we have leading whitespace, the gnu version doesn't. [...]
Why do we have unnecessary leading whitespace? What happened to small
and simple and doing no more than absolutely necessary?
Presumably the code contains a format string of the form "% 9d"
written like that so the output looks pretty on multiple files.

In fact, that's got to be the case. Look at the frankly horrible thing
that gnu wc does on multiple files:

$ wc -c wc.c yes.c
4974 wc.c
 823 yes.c
5797 total
$

There must be code in gnu wc to remember the maximum count length and
format accordingly, so what busybox does is indeed simpler ... but not
quite clever enough, it would seem.

I'm sure the 9-space formatting would suffice for multiple outputs, but
clearly for a single file it has to be avoided. Afraid I'm too witless
today to work up a patch, but it should be easy enough.
walter harms
2010-03-08 17:10:02 UTC
Post by Michael Abbott
Post by Rob Landley
[...]
$ busybox wc -c vmlinux
  3335777 vmlinux
$ wc -c vmlinux
3335777 vmlinux
Note that we have leading whitespace, the gnu version doesn't. [...]
Why do we have unnecessary leading whitespace? What happened to small
and simple and doing no more than absolutely necessary?
Presumably the code contains a format string of the form "% 9d"
written like that so the output looks pretty on multiple files.
In fact, that's got to be the case. Look at the frankly horrible thing
$ wc -c wc.c yes.c
4974 wc.c
 823 yes.c
5797 total
$
There must be code in gnu wc to remember the maximum count length and
format accordingly, so what busybox does is indeed simpler ... but not
quite clever enough it would seem
I'm sure the 9-space formatting would suffice for multiple outputs, but
clearly for a single file it has to be avoided. Afraid I'm too witless
today to work up a patch, but it should be easy enough.
Perhaps it would be easier to add a 'tr -s " "' to the Linux kernel build script?
That would leave only one space to worry about, no matter how many there were
originally.
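
As a rough sketch of what that would look like in practice (the sample line is hand-padded and the variable names are illustrative): after squeezing, one leading space still remains, so the count sits in field 2 of the cut output, not field 1.

```shell
#!/bin/sh
# Assumed sample input: a count padded with leading blanks.
line='   3335777 vmlinux'

# tr -s squeezes each run of spaces down to one, so exactly one leading
# space remains and the count lands in field 2 rather than field 1.
squeezed=$(printf '%s\n' "$line" | tr -s ' ')
f1=$(printf '%s\n' "$squeezed" | cut -d' ' -f1)
f2=$(printf '%s\n' "$squeezed" | cut -d' ' -f2)

echo "f1='$f1' f2='$f2'"
```

Note the catch: with unpadded output field 2 would be the file name, so the field number depends on whether the padding is there at all.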

re,
wh
Bernhard Reutner-Fischer
2010-03-08 18:03:03 UTC
Post by Rob Landley
The busybox "wc" command doesn't work to build mips in 2.6.33. Kernel commit
VMLINUX_SIZE := $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null | \
cut -d' ' -f1)
cool stuff. I guess
VMLINUX_SIZE := $(firstword $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null))
or 'stat -c %s' would have been too simple? Perhaps you can suggest this
to the kernel folks.
Post by Rob Landley
VMLINUX_SIZE is blank when using busybox tools.
$ busybox wc -c vmlinux
3335777 vmlinux
$ wc -c vmlinux
3335777 vmlinux
And yes, that should be fixed too, let's just do away with the space
offsets altogether (but that _will_ break folks who | cut -c10- wc of
course).
Mike Frysinger
2010-03-08 18:14:22 UTC
Post by Bernhard Reutner-Fischer
Post by Rob Landley
The busybox "wc" command doesn't work to build mips in 2.6.33. Kernel commit
VMLINUX_SIZE := $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null | \
cut -d' ' -f1)
cool stuff. I guess
VMLINUX_SIZE := $(firstword $(shell wc -c $(objtree)/$(KBUILD_IMAGE)
Post by Rob Landley
2>/dev/null)) or 'stat -c %s' would have been too simple? Perhaps you can
suggest this to the kernel folks.
`stat` is not in POSIX, so this would be an annoying regression

sending the output through `echo` would also normalize the whitespace
-mike
Bernhard Reutner-Fischer
2010-03-08 18:38:08 UTC
Permalink
Post by Mike Frysinger
`stat` is not in POSIX, so this would be an annoying regression
indeed
Post by Mike Frysinger
sending the output through `echo` would also normalize the whitespace
That's make land, so I'd spare spawning one command (cut in this case, or
echo as in sh) and normalize it with something built into make, i.e.
$(firstword )
walter harms
2010-03-08 18:49:31 UTC
Permalink
Post by Mike Frysinger
Post by Bernhard Reutner-Fischer
Post by Rob Landley
The busybox "wc" command doesn't work to build mips in 2.6.33. Kernel commit
VMLINUX_SIZE := $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null | \
cut -d' ' -f1)
cool stuff. I guess
VMLINUX_SIZE := $(firstword $(shell wc -c $(objtree)/$(KBUILD_IMAGE)
Post by Rob Landley
2>/dev/null)) or 'stat -c %s' would have been too simple? Perhaps you can
suggest this to the kernel folks.
`stat` is not in POSIX, so this would be an annoying regression
sending the output through `echo` would also normalize the whitespace
-mike
if stat does not work ls will do: 'ls -1s'

re,
wh
Rob Landley
2010-03-09 02:56:03 UTC
Permalink
Post by Bernhard Reutner-Fischer
Post by Rob Landley
The busybox "wc" command doesn't work to build mips in 2.6.33. Kernel commit
VMLINUX_SIZE := $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null | \
cut -d' ' -f1)
cool stuff. I guess
VMLINUX_SIZE := $(firstword $(shell wc -c $(objtree)/$(KBUILD_IMAGE)
Post by Rob Landley
2>/dev/null)) or 'stat -c %s' would have been too simple? Perhaps you can
suggest this to the kernel folks.
You could suggest it to the kernel folks just as easily. I have my plate full
with suggestions that way, and Peter Anvin has already (repeatedly) accused my
build environment of being a one-person experiment of no interest to the rest
of the world.
Post by Bernhard Reutner-Fischer
Post by Rob Landley
VMLINUX_SIZE is blank when using busybox tools.
$ busybox wc -c vmlinux
3335777 vmlinux
$ wc -c vmlinux
3335777 vmlinux
And yes, that should be fixed too; let's just do away with the space
offsets altogether (but that _will_ break folks who pipe wc through
cut -c10-, of course).
http://www.opengroup.org/onlinepubs/9699919799/utilities/wc.html

By default, the standard output shall contain an entry for each input file of
the form:

"%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>

I.E. no leading space, only one space between each thingy, and if that's what
we do we can hide behind SUSv4. :)

I've already written a toybox version, since it was easier for me to write
that from scratch than try to wrestle with the busybox one. The new toybox
command is 98 lines long (1737 bytes of source) and the existing busybox one
is 206 lines (4974 bytes of source) in current git.

Plus implementing a new toybox command involves adding one file to one
directory, and it gets automatically picked up and everything else dynamically
generated from that by the build.

Here is the new toybox file in its entirety:

/* vi: set sw=4 ts=4:
*
* wc.c - word count
*
* Copyright 2010 Rob Landley <rob at landley.net>
*
* See http://www.opengroup.org/onlinepubs/9699919799/utilities/wc.html

USE_WC(NEWTOY(wc, "mLcwl", TOYFLAG_USR|TOYFLAG_BIN))

config WC
bool "wc"
default y
help
Count words, lines, and/or bytes in files or stdin.

usage: wc [-clw] [file...]

-l Line count
-w Word count
-c Byte count
-L Longest line
*/

#include "toys.h"

DEFINE_GLOBALS(
    long count[4];
    long lines;
)

#define TT this.wc

static void print_results(long *count, char *name)
{
    int i, space = 0;

    for (i=0; i<4; i++) {
        if (toys.optflags & (1<<i)) {
            if (space++) xputc(' ');
            printf("%ld", count[i]);
        }
        TT.count[i] += count[i];
    }

    if (strcmp("-", name)) printf(" %s", name);
    xputc('\n');
}

static void do_wc(int fd, char *name)
{
    long count[5]; // lwcL plus a count for current L
    int i, len, space=1;

    bzero(count, 5*sizeof(long));

    for (;;) {
        len = read(fd, toybuf, sizeof(toybuf));
        if (len<0) {
            perror_msg("%s",name);
            toys.exitval = EXIT_FAILURE;
        }
        if (len<1) break;

        // Loop through the data
        for (i=0; i<len; i++) {

            // increment c always
            count[2]++;

            // increment w if this is a space but the previous one wasn't.
            if (isspace(toybuf[i])) {
                if (!space) count[1]++;
                space = 1;
            } else space=0;

            if (toybuf[i] == '\n') {
                // Handle l
                (*count)++;
                // Handle L
                if (count[4]>count[3]) count[3]=count[4];
                count[4]=0;
            } else count[4]++;
        }
    }

    // Print out the results

    print_results(count, name);
    TT.lines++;
}

void wc_main(void)
{
    if (!(toys.optflags&15)) toys.optflags = 7;
    loopfiles(toys.optargs, do_wc);
    if (TT.lines>1) print_results(TT.count, "total");
}

To add an applet to busybox, you need to add the actual .c file, and modify
applets.h, and usage.h, and modify the appropriate Config.in, and modify the
appropriate Kbuild file, and while we're at it why not touch
docs/busybox_footer.pod.

I am somewhat curious why my wc says the toybox binary is 2823 words long, the
gnu wc says it's 2761 words long, and the busybox one says it's 2755 words
long. Then again the spec doesn't say _how_ you indicate "word", so... (I'm
just using isspace() followed by !isspace(), seemed fairly straightforward...)
The -c and -l fields are consistent, though. (Still debugging -L in mine,
though.)
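
For what it's worth, here is a small C sketch of two ways to turn "isspace() followed by !isspace()" into a count. This is an illustration only and not a diagnosis of the numbers above: both function names are made up, and the first one simply mirrors the loop in the wc.c posted earlier.

```c
#include <ctype.h>
#include <stddef.h>

/* Count a word each time whitespace follows a non-whitespace byte,
 * mirroring the loop in the wc.c above. A final word that is not
 * followed by whitespace is not counted. */
static long words_on_trailing_space(const char *buf, size_t len)
{
    long words = 0;
    int space = 1;
    size_t i;

    for (i = 0; i < len; i++) {
        if (isspace((unsigned char)buf[i])) {
            if (!space) words++;
            space = 1;
        } else space = 0;
    }
    return words;
}

/* Count a word each time a non-whitespace byte follows whitespace (or
 * starts the buffer), so an unterminated final word is counted too. */
static long words_on_leading_nonspace(const char *buf, size_t len)
{
    long words = 0;
    int space = 1;
    size_t i;

    for (i = 0; i < len; i++) {
        if (isspace((unsigned char)buf[i])) space = 1;
        else {
            if (space) words++;
            space = 0;
        }
    }
    return words;
}
```

The two agree on input that ends in whitespace and differ by one when the data ends in the middle of a word. What isspace() accepts also depends on the locale, which is another place implementations can disagree on binary data.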

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Rob Landley
2010-03-10 02:43:52 UTC
Permalink
Post by Rob Landley
I've already written a toybox version, since it was easier for me to write
that from scratch than try to wrestle with the busybox one. The new toybox
command is 98 lines long (1737 bytes of source) and the existing busybox
one is 206 lines (4974 bytes of source) in current git.
By the way, I'm not sure how much of the "easier" in writing a new wc for
toybox was me, and how much was the toybox infrastructure. I'm in the
weird position of not really wanting to continue toybox as a separate project
that's vastly out-resourced by the busybox development community, but also
finding working on busybox incredibly clumsy and tedious compared to working on
toybox.

What I'd really like to do is port the toybox infrastructure over to busybox,
if you guys are interested. I'll describe the process of creating the new wc
command to give you guys a feel for it. (A previous attempt of mine to
document all this is at http://landley.net/code/toybox/code.html by the way.)

Each toybox command is a single C file. Adding a new command to toybox
involves adding a new file to the toys directory. That's it. I don't touch
any makefiles or headers or anything, the rest is entirely generated by the
build script, which scans the toys/*.c files and constructs the other files at
build time. The generic infrastructure has no specific knowledge of the actual
commands.

To start a new command, I cd into the "toys" subdirectory of my toybox source
code and "cp hello.c wc.c". The "hello" command is an example which has all
the basic plumbing a command needs (actually way more than a simple hello
world needs) so it can act as a convenient skeleton for new commands. Note
that I call them "commands" rather than "applets" because this isn't java.
It's a command line, not an applet line.

The toybox hello.c looks like:

/* vi: set sw=4 ts=4:
*
* hello.c - A hello world program.
*
* Copyright 2006 Rob Landley <rob at landley.net>
*
* Not in SUSv4.
* http://www.opengroup.org/onlinepubs/9699919799/utilities/

USE_HELLO(NEWTOY(hello, "e@d*c#b:a", TOYFLAG_USR|TOYFLAG_BIN))

config HELLO
bool "hello"
default n
help
A hello world program. You don't need this.

Mostly used as an example/skeleton file for adding new commands,
occasionally nice to test kernel booting via "init=/bin/hello".
*/

#include "toys.h"

// Hello doesn't use these, they're here for example/skeleton purposes.

DEFINE_GLOBALS(
    char *b_string;
    long c_number;
    struct arg_list *d_list;
    long e_count;

    int more_globals;
)

#define TT this.hello

void hello_main(void)
{
    printf("Hello world\n");
}

But most of that's example boilerplate for skeleton purposes. All it _really_
needs is:

/* hello.c - A hello world program.

USE_HELLO(NEWTOY(hello, NULL, TOYFLAG_USR|TOYFLAG_BIN))

config HELLO
bool "hello"
default n
help
A hello world program. You don't need this.
*/

#include "toys.h"

void hello_main(void)
{
    printf("Hello world\n");
}

Each toybox command starts with a specially formatted comment that contains
the command line options, usage info, and kconfig blob for menuconfig. The
command's help text (spit out by the "help" command, as well as by the command
itself if run with unintelligible options) is also extracted from the kconfig
help text, so I don't have to describe the same thing twice.

The first few comment lines (the ones starting with an asterisk) are normal
comment lines that don't get parsed by anything. The convention is to put a
description, copyright notice, and link to the relevant standard (if any)
there, but it's really just a comment.

The USE_XXX(NEWTOY(XXX)) line defines the command name, command line options,
and install location of each command. At compile time a sed invocation
collects this line from every toys/*.c file into "generated/newtoys.h",
which is then #included to set up the command array toy_exec() searches (see
main.c at the top level).

The USE_XXX() macro chops its contents out if the relevant config option isn't
enabled (just like I added to busybox back in 2006). There's a SKIP_XXX() too
but it's not used much. So this line is always copied into
generated/newtoys.h, but only _used_ if the relevant config entry is enabled.
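
A rough C sketch of that pattern, for anyone who hasn't seen it. The names here (CFG_HELLO, CFG_WC, NEWCMD, struct cmd_entry, cmd_table) are invented for the example and are not the real toybox or busybox definitions. In a real tree the switches would come from .config.

```c
#include <stddef.h>

/* Hypothetical config switches standing in for values from .config. */
#define CFG_HELLO 1
#define CFG_WC    0

/* USE_XXX() passes its contents through when the option is enabled
 * and swallows them when it is not. */
#if CFG_HELLO
#define USE_HELLO(...) __VA_ARGS__
#else
#define USE_HELLO(...)
#endif

#if CFG_WC
#define USE_WC(...) __VA_ARGS__
#else
#define USE_WC(...)
#endif

struct cmd_entry {
    const char *name;
    const char *options;
};

/* Hypothetical table-entry macro: command name and option string. */
#define NEWCMD(name, opts) { #name, opts },

/* Every command's line is always present in the source, but only the
 * enabled ones survive preprocessing. */
static const struct cmd_entry cmd_table[] = {
    USE_HELLO(NEWCMD(hello, NULL))
    USE_WC(NEWCMD(wc, "mLcwl"))
    { NULL, NULL }  /* terminator */
};

/* Number of commands that made it into the table. */
static size_t cmd_count(void)
{
    size_t n = 0;

    while (cmd_table[n].name) n++;
    return n;
}
```

With the switches as written, only "hello" lands in the table and cmd_count() is 1. Flipping CFG_WC to 1 adds the wc entry without touching the table itself.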

The NEWTOY() macro takes three arguments: command name, option string, and
install location. If you'd like one command to have multiple names there's
also an OLDTOY() macro, which takes four arguments: the new name, the original
name, command options the new name understands (which can differ from the other
name, but they are washed through the same main() function), and install
location.

The install location is used if you give the "toybox" multiplexer any option
beginning with a dash. Currently for defconfig, it outputs:

./toybox -?
bin/basename usr/bin/bzcat bin/cat usr/bin/catv usr/sbin/chroot
usr/sbin/chvt bin/cksum usr/bin/count bin/cp usr/sbin/df bin/dirname
bin/dmesg bin/echo bin/false bin/help usr/bin/mdev bin/mkfifo sbin/mkswap
bin/nc bin/netcat usr/bin/nice sbin/oneit usr/bin/patch bin/pwd bin/rmdir
usr/bin/seq usr/bin/setsid bin/sh usr/bin/sha1sum bin/sleep usr/bin/sort
bin/sync bin/tee bin/touch bin/toysh bin/true bin/tty bin/uname
usr/sbin/useradd usr/bin/wget usr/bin/which usr/bin/yes

A trivial script can go through that output and install the appropriate
symlinks to the "toybox" binary, something like:

for i in $(./toybox -); do ln -s /bin/toybox $i; done

You can run ./toybox without any arguments to get the list of commands without
the paths prepended, to install all the links in the same directory. (Yes you
can do "toybox cat filename" too, none of the command names start with a dash.)

That leaves the middle argument to NEWTOY(), which is the command line option
string. This is the biggest difference between toybox and busybox, the option
parsing logic is completely different, and so automated you can largely ignore
it. However, I'm going to explain it here in more detail than you probably
really need to know. :)

I wrote my own option parser (lib/args.c, which does _not_ call getopt() so
was net smaller than busybox's last I checked). It's automatically called
before the command's main() function is ever run, using the option string
supplied by NEWTOY() to parse the command line options and fill out global
variables with the appropriate values. You can disable this automatic option
parsing (and call it manually if you like) by passing NULL in as the option
string in NEWTOY(), which is also how you specify you take no arguments so
that the option parsing can get compiled out if nobody's using it. See main.c
for details.

The command_main() functions return void and take no arguments, instead you
use global variables. The main one is the global "toys", which looks like
this:

extern struct toy_context {
    struct toy_list *which; // Which entry in toy_list is this one?
    int exitval;            // Value error_exit feeds to exit()
    char **argv;            // Original command line arguments
    unsigned optflags;      // Command line option flags from get_optflags()
    char **optargs;         // Arguments left over from get_optflags()
    int optc;               // Count of optargs
    int exithelp;           // Should error_exit print a usage message first?
    int old_umask;          // Old umask preserved by TOYFLAG_UMASK
} toys;

toys.optflags is filled out by the option parsing logic with the command line
flags seen this run. exitval defaults to 0 but can be changed by other stuff
(such as any of the functions that exit with an error, or by setting it
manually before returning from main()). optargs[] contains the arguments left
over after option parsing. (So for "ls -l file1 file2 file3", optargs[0] would
be file1, optargs[2] would be file3, and optargs[3] would be NULL.) optc is the
equivalent of argc for optargs. argv[] is the unprocessed argument list, kept
around since we can't free it anyway and there's a couple times you might want
to know. (Such as if you passed NULL as the option string to NEWTOY().)

The other interesting global is "this", which is a union of structures
containing all your global variables for each command. That's set up by
the DEFINE_GLOBALS() macro a bit further down in the file, which lists the
global variables for this file. The contents become a structure in a union of
all such structures for each command, which can be accessed as
"this.commandname" (in this case, "this.wc"). The #define TT this.wc is a
shortcut so we can say TT.lines instead of this.wc.lines if we have any
globals. (I should make the #define TT automatic as part of DEFINE_GLOBALS()
or something, but haven't figured out how yet. Alas, you can't have a macro
resolve to a preprocessor directive.)
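
Stripped of the macros, the idea looks roughly like the C sketch below. This is hand-written for illustration, not what DEFINE_GLOBALS() actually expands to: the struct and union names and bump_lines() are invented, and the members are borrowed from the wc and hello examples.

```c
/* Hypothetical per-command global blocks, written out by hand. */
struct wc_globals {
    long count[4];
    long lines;
};

struct hello_globals {
    char *b_string;
    long c_number;
};

/* Only one command runs per process, so every command's block can
 * share the same storage. Static storage starts out zeroed. */
static union all_globals {
    struct wc_globals wc;
    struct hello_globals hello;
} this;

/* Per-file shortcut, as in the wc.c above. */
#define TT this.wc

/* Example use: TT.lines is just this.wc.lines. */
static long bump_lines(void)
{
    return ++TT.lines;
}
```

The union is only as big as the largest command's block, which is the point of doing it this way.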

If there are no global variables used by this command, you can omit the
DEFINE_GLOBALS() block entirely. But if the command line parsing saves
results to any variables, you need to list them at the start of the
DEFINE_GLOBALS() block:

1) In order (from right to left).
2) All of them are long/pointer size. (4 bytes on 32-bit, 8 on 64-bit.)

The options are numbered from right to left because that way anybody familiar
with binary can work out the flag values in their head: "The option string has
abcdefg, command line is -adg, that's 1001001... that's 64+8+1". Whereas if
you number them the other way, you have to reverse them in your head to work
out the values. (This means you add new options to the beginning of the
string to avoid renumbering the others.)

So if I had an option string "ab:d#" the options are numbered d=1, b=2, a=3
(ignore the non-letters for that), giving flag values 1, 2, and 4, and the
associated globals block could look like:

DEFINE_GLOBALS(
    long value_for_d;
    char *value_for_b;

    int any_other_globals;
)

The appended : means "takes a string argument" (just like in getopt), the
appended # means "takes a number argument". Said arguments are saved into the
global block, right to left becoming top to bottom.
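
The right-to-left numbering can be sketched in a few lines of C. This is not the code in lib/args.c: flag_for() is a hypothetical helper, and treating every non-letter as a modifier that takes no bit is an assumption based on the "ignore the non-letters" rule above.

```c
#include <ctype.h>
#include <string.h>

/* Return the flag bit for option letter `c` in `optstr`, numbering the
 * letters from the right (the rightmost letter is bit 0). Non-letters
 * such as ':' or '#' are treated as modifiers and do not take a bit.
 * Returns 0 if the letter is not in the string. */
static unsigned flag_for(const char *optstr, char c)
{
    unsigned bit = 1;
    size_t i = strlen(optstr);

    while (i--) {
        char o = optstr[i];

        if (!isalpha((unsigned char)o)) continue;
        if (o == c) return bit;
        bit <<= 1;
    }
    return 0;
}
```

For "abcdefg", OR-ing the bits for a, d and g gives 64+8+1 = 73. For "ab:d#" the bits are d=1, b=2, a=4. For wc's "mLcwl", l, w, c and L come out as 1, 2, 4 and 8, which lines up with the (1<<i) test in print_results().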

By convention, I put a space between globals filled out by the option parsing
logic and globals that are just globals used by the code. Note that all of
the globals are initialized to zero to start with, and then the option parsing
logic can set the first few to other values, but any that aren't initialized by
the option parsing logic (including ones that _could_ but that option wasn't
used this time) are still reliably zeroed.

That pretty much gets us through all the boilerplate, and in fact is probably
way more info than you'd really need to know to implement the wc command.

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Rob Landley
2010-03-14 18:22:52 UTC
Permalink
So, following up on the resounding response to my previous post (from the
crickets), I'd like to gauge the interest in me trying to move busybox towards
the toybox infrastructure. Obviously this would have to be broken up and done
in stages, my question is "do you guys agree it's worth doing"?

I'll detail the actual steps in doing it next message.

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Denys Vlasenko
2010-03-15 00:05:48 UTC
Permalink
Post by Rob Landley
So, following up on the resounding response to my previous post (from the
crickets),
My response would be "generally I like it, but am too lazy
to work on it at the moment"...
Post by Rob Landley
I'd like to gauge the interest in me trying to move busybox towards
the toybox infrastructure. Obviously this would have to be broken up and done
in stages, my question is "do you guys agree it's worth doing"?
I am not sure I want the part which autoscans for *.c files
(I frequently have extra *.c files lying around).

The part which allows help text to be stored in *.c file
would be very useful.
--
vda
Rob Landley
2010-03-15 01:36:59 UTC
Permalink
Post by Denys Vlasenko
Post by Rob Landley
So, following up on the resounding response to my previous post (from the
crickets),
My response would be "generally I like it, but am too lazy
to work on it at the moment"...
I was expecting to have to do the heavy lifting myself. :)
Post by Denys Vlasenko
Post by Rob Landley
I'd like to gauge the interest in me trying to move busybox towards
the toybox infrastructure. Obviously this would have to be broken up and
done in stages, my question is "do you guys agree it's worth doing"?
I am not sure I want the part which autoscans for *.c files
(I frequently have extra *.c files lying around).
The scan looks for the specially formatted headers, and if it doesn't find them
it ignores the file. Also, if you're working on a command that doesn't build
at the moment, you can "mv thingy.c thingy.new" and then it'll be ignored
until you put it back.

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Rob Landley
2010-03-15 01:44:59 UTC
Permalink
Post by Rob Landley
I'll detail the actual steps in doing it next message.
The central idea is having all the code for each command in a single file, with
the other files generated from that file.

According to http://busybox.net/FAQ.html#adding the current things that are
scattered around the tree instead of auto-generated from a central location
are:

Configuration entries (Config.in)
Makefile entries (Kbuild)
The lookup table (include/applets.h)
Help text (include/usage.h)

In addition toybox handles command line options and per-command globals
differently than busybox does. You shouldn't have to manually call the
getopt(), or #define FLAG_x or #define thingy G.thingy, or have an INIT_G(). It
should all happen behind the scenes for you.

The first step would be creating a new makefile snippet (possibly an included
makefile) that handles the "generated" directory. All of the files generated
from data in *.c files should live in a single place that you can clean with
"rm -rf generated". (It's nice to avoid mixing volatile and repository data
where possible.)

Moving the Config.in entries is easy and could even be done incrementally, just
create a new generated/Config.in file and have converted apps add their blocks
to it. The directory layout could be used to create menus. Then during the
transition period, just include the new Config.in from the old one.

The easiest way to do this is to convert one subdirectory at a time. That way
you don't have two menus for the same subdirectory, and don't have to worry
about renaming config symbols to avoid conflicts from two menus covering the
same symbol (and thus making people change their .configs to add the new
symbol; yes kbuild should automatically enable the menu guard when a symbol in
the menu is set, but it doesn't yet).

Makefile entries can be converted similarly, include generated/Makefile from
another makefile. Those are even easier to do because there aren't menu symbol
conflicts.

The lookup table is an issue: that has to be converted all at once due to the
alphabetization.

None of those should have any size impact, it's all just refactoring.

Converting the globals probably comes next, and may actually be a size win.
(We'd have to measure. It would certainly result in _cleaner_ code, with the
"#define thingy G.thingy" stacks and the INIT_G() blocks going away.)

From this point, we hit places where toybox itself is unfinished.

Converting the help is its own post, and involves rephrasing the help text and
moving it into the Config.in entries. The toybox help parser isn't quite
finished, it's supposed to assemble sub-options' "usage:" lines into a single
coherent usage: line but doesn't yet. (Kinda hard to do in bash, and I don't
want to introduce python or something as a build dependency. I should
probably make a C program for it.) Also, it doesn't have configurable different
levels of help text. And toybox handles help text very differently than
busybox does; there's a "help" command a bit like bash help, but it's not
shell-specific. I've pondered aliasing man to it and having fallback behavior
to look in the filesystem, but haven't gone there yet.

We'd have to work out what we want to do for help text, and presumably hold off
doing it until the low hanging fruit was out of the way first.

Converting the option parsing logic is the biggest win in terms of code
cleanup, but it's also the biggest design change, and still has some room for
improvement.

This is tied into the lookup table generation, and probably involves swapping
the old getopt32() for the new one I wrote that doesn't depend on the libc
getopt() at all.

The main missing bit in toybox option parsing is that toybox is not currently
autogenerating the "#define FLAG_x (1<<0)" macros. There should be a big
#include file that has #ifdef blocks for each applet which define the flags for
you, generated by parsing the option string. (So if they move in the option
string, you don't have to change the C code.) Unfortunately, I haven't worked
out the details of implementing that yet...
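
One possible shape for such a generator, sketched in C since a build-time helper would probably end up in C anyway. Everything here is an assumption: gen_flag_defines() is a made-up name, it writes into a caller-supplied buffer rather than a header file, and it uses the same "letters only, numbered from the right" rule as the option parser.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Append one "#define FLAG_<letter> (1<<n)" line to `out` for each
 * option letter in `optstr`, numbering from the rightmost letter.
 * Stops early if `out` runs out of room. Returns the number of
 * complete lines written. */
static int gen_flag_defines(const char *optstr, char *out, size_t outsize)
{
    size_t i = strlen(optstr), used = 0;
    int bit = 0;

    if (outsize) out[0] = '\0';
    while (i--) {
        int n;

        /* Modifier characters such as ':' and '#' take no bit. */
        if (!isalpha((unsigned char)optstr[i])) continue;
        n = snprintf(out + used, outsize - used,
                     "#define FLAG_%c (1<<%d)\n", optstr[i], bit);
        if (n < 0 || (size_t)n >= outsize - used) break;
        used += (size_t)n;
        bit++;
    }
    return bit;
}
```

Run against wc's "mLcwl" this produces FLAG_l through FLAG_m as bits 0 through 4. Wrapping each command's block in the right #ifdef is the part this sketch leaves out.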

The globals handling is _almost_ right. I need to make the #define TT toybox
currently has go away, probably some kind of:

#define THIS wc
#include "busybox.h"

And use that #define internally to do behind the scenes magic. (This might
also be able to make the autogenerated FLAG stuff work. I need to study the
c99 preprocessor spec to see what I've got to work with, but that can come
later...)

Anyway, at least some of it can be done in stages.

Oh, and along the way I need to do some serious #ifdefectomy on this code...

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Loïc Grenié
2010-03-15 09:16:32 UTC
Permalink
Post by Rob Landley
Post by Rob Landley
I'll detail the actual steps in doing it next message.
The central idea is having all the code for each command in a single file, with
the other files generated from that file.
[snip]
Post by Rob Landley
In addition toybox handles command line options and per-command globals
differently than busybox does. You shouldn't have to manually call the
getopt(), or #define FLAG_x or #define thingy G.thingy, or have an INIT_G(). It
should all happen behind the scenes for you.
While I understand the benefits of the proposed modifications, and taking
into account that I'm just Random J. User (I'm in no way important to the
project), I just wanted to say that I hate "behind the scenes" work. I usually
find it fairly difficult to understand when I have to modify an
existing program.
I hope you can make the "behind the scenes" work as explicit as possible.

Thanks,

Loïc
Ralf Friedl
2010-03-15 09:43:56 UTC
Permalink
Post by Rob Landley
The globals handling is _almost_ right. I need to make the #define TT toybox
currently has go away, probably some kind of:
#define THIS wc
#include "busybox.h"
And use that #define internally to do behind the scenes magic. (This might
also be able to make the autogenerated FLAG stuff work. I need to study the
c99 preprocessor spec to see what I've got to work with, but that can come
later...)
Instead of this, it is better to do:
#include "generated/wc.h"
Then the generated wc.h can contain whatever is necessary for flags,
globals, and so on, and include busybox.h for the common stuff.
If you place everything in one big include file, with preprocessor
conditions on the definition of THIS, you will trigger a recompile of
all files whenever you change a single file.

Ralf
Rob Landley
2010-03-15 22:02:26 UTC
Permalink
Post by Ralf Friedl
Post by Rob Landley
The globals handling is _almost_ right. I need to make the #define TT
toybox currently has go away, probably some kind of:
#define THIS wc
#include "busybox.h"
And use that #define internally to do behind the scenes magic. (This
might also be able to make the autogenerated FLAG stuff work. I need to
study the c99 preprocessor spec to see what I've got to work with, but
that can come later...)
#include "generated/wc.h"
Yeah, I thought about that, but that's an awful lot of clutter and it's also
harder to automate concisely at the makefile level. (Generating 8 gazillion
small files means the dependencies get brittle, and when you try to look at
them you have a dozen windows/tabs open to follow a single thread. I prefer a
few big ones.)
Post by Ralf Friedl
Then the generated wc.h can contain whatever is necessary for flags,
globals, and so on, and include busybox.h for the common stuff.
Globals have to be collected into a single table to make the union out of
them, and I already got that part working fine.

Flags are fairly easy to do with a FLAG(o) syntax, but I prefer a FLAG_o
syntax. I'm playing with enums to see if I can get it without generating any
actual code.

The point is, this is like step 6 in the conversion, and several of the early
ones can be done orthogonally, so I'd worry about it later.
Post by Ralf Friedl
If you place everything in one big include file, with preprocessor
conditions on the definition of THIS, you will trigger a recompile of
all files whenever you change a single file.
That's a whole separate rant.

Only if your dependencies are on the intermediate generated files instead of on
the source they're generated from. So you can work around that if you care
to.

And in some cases, you _have_ to do that, such as if the size of the largest
global you make the union out of changes, so that's the _right_ thing to do.
And then there's ccache, which depends on the contents of files not the
timestamps to determine what needs rebuilding. And of course if you're using
build at once mode (which is how you get the smallest binaries) then you never
do anything except a build all (although during development you get to switch
all apps but the one you're developing off at the config level. In fact, does
anybody except applet developers ever do anything _except_ build all from a
clean start?)

I'm aware of that objection, and would happily argue about its relevance at
great length, but not right now.
Post by Ralf Friedl
Ralf
Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
David N. Lombard
2010-03-17 15:16:34 UTC
Permalink
Post by Ralf Friedl
If you place everything in one big include file, with preprocessor
conditions on the definition of THIS, you will trigger a recompile of
all files whenever you change a single file.
... In fact, does
anybody except applet developers ever do anything _except_ build all from a
clean start?)
That's quite true. Unless I'm specifically chasing something in busybox, it's
always (re)built completely. It ensures a linear, well-defined path, instead
of some random walk.
--
David N. Lombard, Intel, Irvine, CA
I do not speak for Intel Corporation; all comments are strictly my own.
Rob Landley
2010-03-18 17:12:41 UTC
Permalink
Post by David N. Lombard
Post by Ralf Friedl
If you place everything in one big include file, with preprocessor
conditions on the definition of THIS, you will trigger a recompile of
all files whenever you change a single file.
... In fact, does
anybody except applet developers ever do anything _except_ build all from
a clean start?)
That's quite true. Unless I'm specifically chasing something in busybox,
it's always (re)built completely. It ensures a linear, well-defined path,
instead of some random walk.
For toybox, my Makefile's non-phony targets boil down to:

all: toybox

toybox toybox_unstripped: .config *.[ch] lib/*.[ch] toys/*.[ch] scripts/*.sh
	scripts/make.sh

And then in make.sh I've got the stuff to regenerate the generated/* stuff
(which you'll notice the above dependencies do _not_ look at), and then the
build itself is (with some environment variables expanded):

# Figure out which toys/*.c files are enabled in .config

TOYFILES=$(cat .config | sed -nr \
-e 's/^CONFIG_(.*)=y/\1/;t skip;b;:skip;s/_.*//;p' | \
sort -u | tr A-Z a-z | grep -v '^toybox$' | sed 's@\(.*\)@toys/\1.c@' )

# Compile toybox

$CC $CFLAGS -Wall -Wundef -Wno-char-subscripts -funsigned-char -I . \
-o toybox_unstripped -Os -ffunction-sections -fdata-sections \
-Wl,--gc-sections main.c lib/*.c $TOYFILES \
-Wl,--as-needed,-lutil,--no-as-needed || exit 1
$STRIP toybox_unstripped -o toybox || exit 1

I wasn't trying to push that part into busybox, but essentially I do a "make
all" whenever anything changes, and let the compiler discard unneeded code.
My build logic is fairly simple as a result, and doesn't actually take that
long. (When you "make all" as often as I do, you find ways to keep the compile
time down. :) Also, if I'm testing just one app I can .config everything else
off, but mostly I don't bother...

The main downside is that doesn't take advantage of SMP, but that's really a
compiler issue. (You can't have the compiler doing gc-sections _and_ take
proper advantage of SMP until your compiler becomes multi-threaded. I
blathered about this topic for way too long at an OLF bof a few years back,
http://free-electrons.com/pub/video/2008/ols/ols2008-rob-landley-linux-compiler.ogg )

As I said, it's still pretty fast anyway. Admittedly toybox only has about 40
commands right now and busybox has around 7 times that many. But toybox takes
just under 9 seconds to do a defconfig build on my laptop. With a cold cache,
current busybox takes about that long to figure out it has nothing to do when
you type "make" in a directory that's already built everything. Complicating
things does not streamline them.

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Harald Becker
2010-03-09 00:55:11 UTC
Permalink
Hallo Rob!
Post by Rob Landley
Why do we have unnecessary leading whitespace? What happened to small and
simple and doing no more than absolutely necessary?
As far as I remember the original (K&R) behavior of wc was always to
produce leading whitespace (fixed format output). Only the newer
versions of gnu wc stripped off this leading whitespace. That led to
several shell script failures that had to be fixed over the last few years.

Harald
Denys Vlasenko
2010-03-09 01:56:07 UTC
Permalink
Post by Harald Becker
Hallo Rob!
Post by Rob Landley
Why do we have unnecessary leading whitespace? What happened to small and
simple and doing no more than absolutely necessary?
As far as I remember the original (K&R) behavior of wc was always to
produce leading whitespace (fixed format output). Only the newer
versions of gnu wc stripped off this leading whitespace. That led to
several shell script failures that had to be fixed over the last few years.
... and now we have script failures because _new_ scripts expect _new_
output format >>:( "Progress" sometimes looks like pointless churn.
--
vda
Mike Frysinger
2010-03-09 02:50:25 UTC
Post by Denys Vlasenko
Post by Harald Becker
Hallo Rob!
Post by Rob Landley
Why do we have unnecessary leading whitespace? What happened to small
and simple and doing no more than absolutely necessary?
As far as I remember the original (K&R) behavior of wc was always to
produce leading whitespace (fixed format output). Only the newer
versions of gnu wc stripped off this leading whitespace. That led to
several shell script failures that had to be fixed over the last few years.
... and now we have script failures because _new_ scripts expect _new_
output format >>:( "Progress" sometimes looks like pointless churn.
it depends on the options i think. normal `wc` still outputs leading spaces,
but `wc -c` never does. coreutils-5.94 and coreutils-8.4 behave the same ...
-mike
Denys Vlasenko
2010-03-09 03:19:38 UTC
Post by Mike Frysinger
Post by Denys Vlasenko
Post by Harald Becker
Hallo Rob!
Post by Rob Landley
Why do we have unnecessary leading whitespace? What happened to small
and simple and doing no more than absolutely necessary?
As far as I remember the original (K&R) behavior of wc was always to
produce leading whitespace (fixed format output). Only the newer
versions of gnu wc stripped off this leading whitespace. That led to
several shell script failures that had to be fixed over the last few years.
... and now we have script failures because _new_ scripts expect _new_
output format >>:( "Progress" sometimes looks like pointless churn.
it depends on the options i think. normal `wc` still outputs leading spaces,
but `wc -c` never does. coreutils-5.94 and coreutils-8.4 behave the same ...
-mike
I distinctly remember old times when even 'wc -c <file' was spewing out
leading spaces. Gosh... I am old enough now to talk about "old times" :)
--
vda
Mike Frysinger
2010-03-09 06:01:03 UTC
Post by Denys Vlasenko
Post by Mike Frysinger
Post by Denys Vlasenko
Post by Harald Becker
Hallo Rob!
Post by Rob Landley
Why do we have unnecessary leading whitespace? What happened to
small and simple and doing no more than absolutely necessary?
As far as I remember the original (K&R) behavior of wc was always to
produce leading whitespace (fixed format output). Only the newer
versions of gnu wc stripped off this leading whitespace. That led to
several shell script failures that had to be fixed over the last few years.
... and now we have script failures because _new_ scripts expect _new_
output format >>:( "Progress" sometimes looks like pointless churn.
it depends on the options i think. normal `wc` still outputs leading
spaces, but `wc -c` never does. coreutils-5.94 and coreutils-8.4 behave
the same ...
I distinctly remember old times when even 'wc -c <file' was spewing out
leading spaces. Gosh... I am old enough now to talk about "old times" :)
sorry, didn't mean to imply "never does" as "never has". 5.94 is the latest
version i had sitting around already compiled ... i certainly believe you when
you say older versions had leading whitespace. i recall `wc -l <file>`
changing behavior at some point to not include leading whitespace if there's
only one file, as i often had to script around it in previous versions.
-mike
David N. Lombard
2010-03-09 14:26:15 UTC
Post by Denys Vlasenko
Post by Harald Becker
Hallo Rob!
Post by Rob Landley
Why do we have unnecessary leading whitespace? What happened to small and
simple and doing no more than absolutely necessary?
As far as I remember the original (K&R) behavior of wc was always to
produce leading whitespace (fixed format output). Only the newer
versions of gnu wc stripped off this leading whitespace. That led to
several shell script failures that had to be fixed over the last few years.
... and now we have script failures because _new_ scripts expect _new_
output format >>:( "Progress" sometimes looks like pointless churn.
True enough. But, there were two failures here:
1) Extraneous whitespace from wc, that *everybody* had to script around.
2) Failure to script defensively, so you didn't care how much whitespace.
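
To make point 2 concrete, here is a minimal sketch of a size lookup that does not care how much whitespace wc emits. The function names file_size and file_size_awk are made up for illustration and are not what the kernel Makefile uses; the sketch only assumes a POSIX shell, wc and awk.

```shell
#!/bin/sh
# Hypothetical helpers (not from the kernel Makefile): print a file's size
# in bytes with no surrounding whitespace, whichever wc is installed.

file_size() {
    # Reading from stdin means wc prints no file name, and the arithmetic
    # expansion discards any padding around the number.
    echo "$(( $(wc -c < "$1") ))"
}

file_size_awk() {
    # awk splits on runs of blanks, so leading padding is ignored.
    # cut -d' ' -f1 would instead return an empty first field when the
    # line starts with a space.
    wc -c "$1" | awk '{ print $1 }'
}
```

Either form could stand in for the "wc -c ... | cut -d' ' -f1" pipeline, since neither depends on the count starting in column one.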
--
David N. Lombard, Intel, Irvine, CA
I do not speak for Intel Corporation; all comments are strictly my own.
Rob Landley
2010-03-09 15:12:01 UTC
Post by Denys Vlasenko
Post by Harald Becker
Hallo Rob!
Post by Rob Landley
Why do we have unnecessary leading whitespace? What happened to small
and simple and doing no more than absolutely necessary?
As far as I remember the original (K&R) behavior of wc was always to
produce leading whitespace (fixed format output). Only the newer
versions of gnu wc stripped off this leading whitespace. That led to
several shell script failures that had to be fixed over the last few years.
... and now we have script failures because _new_ scripts expect _new_
output format >>:( "Progress" sometimes looks like pointless churn.
Endless lateral churn presented as progress is why I've more or less given up
on Linux on the Desktop. (As repeatedly ranted about in my blog.)

But at least for BusyBox, we can point at a standard and beat the hell out of
it with a regression test suite. And in this case, the standard (SUSv4) says:
STDOUT
By default, the standard output shall contain an entry for each input
file of the form:
"%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>
If the -m option is specified, the number of characters shall replace
the <bytes> field in this format.
If any options are specified and the -l option is not specified, the
number of <newline> characters shall not be written.
If any options are specified and the -w option is not specified, the
number of words shall not be written.
If any options are specified and neither -c nor -m is specified, the
number of bytes or characters shall not be written.
If no input file operands are specified, no name shall be written and no
<blank> characters preceding the pathname shall be written.
If more than one input file operand is specified, an additional line
shall be written, of the same format as the other lines, except that the
word total (in the POSIX locale) shall be written instead of a pathname and
the total of each column shall be written as appropriate. Such an
additional line, if any, is written at the end of the output.
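
As a rough illustration of what that format implies for "wc -c file": with only -c given, the newline and word fields drop out, leaving "%d %s\n", so the line starts with the count and has no padding. The sketch below reproduces that one case; posix_wc_c is a hypothetical name, and this is not the busybox implementation.

```shell
#!/bin/sh
# Hypothetical helper: emit the single-count form of the POSIX wc format,
# "%d %s\n", for the byte count of one file.
posix_wc_c() {
    # Arithmetic expansion strips whatever padding the local wc emits
    # before the number reaches printf.
    printf '%d %s\n' "$(( $(wc -c < "$1") ))" "$1"
}
```

Output in this shape is what the kernel's cut -d' ' -f1 expects: the first space-delimited field is the number itself.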
Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
Hines, Johnicholas
2010-03-15 12:37:38 UTC
Hi.

I would like to be able to write custom busybox applets/commands more easily, and I see this toybox-ish refactoring as making that easier.
Converting the help is its own post, and involved rephrasing the help text and moving it into the Config.in entries. The toybox help parser isn't quite finished, it's supposed to assemble sub-options' "usage:" lines into a single coherent usage: line but doesn't yet. (Kinda hard to do in bash, and I don't want to introduce python or something as a build dependency. I should probably make a C program for it.)
I can try to write this C program, if you specify (maybe just with examples) what it ought to do.

Johnicholas