Discussion:
What on earth happened to platform.h?
(too old to reply)
Rob Landley
2010-10-24 00:27:18 UTC
Permalink
I need the ability to umount everything under a given directory (so I can do
an rm -rf even after doing a lot of bind mounting and such in a chroot), and
the obvious thing to do is make "umount -a" use its first argument (if any) as
the prefix to umount everything under. Since I wrote the busybox umount -a
implementation as of a few years ago, it didn't seem hard to do.

Except there's all sorts of... STUFF in there. For example, this is in
umount.c. It does not belong in umount.c

#if defined(__dietlibc__)
/* 16.12.2006, Sampo Kellomaki (sampo at iki.fi)
* dietlibc-0.30 does not have implementation of getmntent_r() */
static struct mntent *getmntent_r(FILE* stream, struct mntent* result,
char* buffer UNUSED_PARAM, int bufsize UNUSED_PARAM)
{
struct mntent* ment = getmntent(stream);
return memcpy(result, ment, sizeof(*ment));
}
#endif

Last I checked dietlibc was too broken for words, and the _correct_ response
to the complaint would have been "sounds like dietlibc should add that
function then", but in any case that should be an inline living in platform.h.
Dietlibc isn't umount.c's problem.

There's an also #ifndef PATH_MAX which clearly belongs in platform.h, before I
even _get_ to the main function. (Which is two lines of heavily annotated
crap to start the function, but that's a separate rant.)

So I go look at platform.h, which has this:

#if defined(__GNU_LIBRARY__) && __GNU_LIBRARY__ < 5 && \
!defined(__dietlibc__) && \
!defined(_NEWLIB_VERSION) && \
!(defined __digital__ && defined __unix__)
# error "Sorry, this libc version is not supported :("
#endif

What the...? That's not even a #warning, that's an #error. You added added
extra complexity to REFUSE to work in an unknown environment, when specific
apps might otherwise work. What's the POINT of programming ot things like
SUSv4 if you don't really mean it?

Good grief, you're not even checking for uClibc but just assuming that it will
sufficiently pretend to be glibc. You're not only perpetuating other people's
mistakes, you're trying to top them: uClibc must pretend to be glibc because
of all the packages out there that specifically CHECK for glibc, which is
because THOSE PACKAGES ARE BROKEN. You'd think you'd know better, that "all
the world's glibc" is wrong, so instead of trying to be environment _agnostic_
you play whack-a-mole with the historical record. Because nobody's ever going
to come up with anything new without notifying you first.

By the way, if you're going to have the insane complextiy of "smallint"
(apparently busybox is no longer about being simple), then where's your
developer guide documenting it? "grep -r smallint docs" returns nothing, and
the word does not appear in the FAQ either. New developers have to just hack
on the code long enough to learn this by osmosis?

Is it some sort of project senility? The older a project gets, the more
layers of arterial plaque it accumulates until finally it's ossifies into an
unmodifiable statue? (xmms anyone?)

I spent my time on this project _removing_ as much code as I added. I rewrote
existing apps multiple times because I could make them _simpler_. Is that
still even a project goal? Is all this overgrowth an oversight, or do you
guys honestly not _care_ about this stuff anymore?

When did simplicity stop being a goal?

Rob
--
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem
Forever, and as welcome as New Coke.
Denys Vlasenko
2010-10-24 01:20:01 UTC
Permalink
Post by Rob Landley
I need the ability to umount everything under a given directory (so I can do
an rm -rf even after doing a lot of bind mounting and such in a chroot), and
the obvious thing to do is make "umount -a" use its first argument (if any) as
the prefix to umount everything under. Since I wrote the busybox umount -a
implementation as of a few years ago, it didn't seem hard to do.
Except there's all sorts of... STUFF in there. For example, this is in
umount.c. It does not belong in umount.c
#if defined(__dietlibc__)
/* 16.12.2006, Sampo Kellomaki (sampo at iki.fi)
* dietlibc-0.30 does not have implementation of getmntent_r() */
static struct mntent *getmntent_r(FILE* stream, struct mntent* result,
char* buffer UNUSED_PARAM, int bufsize UNUSED_PARAM)
{
struct mntent* ment = getmntent(stream);
return memcpy(result, ment, sizeof(*ment));
}
#endif
Last I checked dietlibc was too broken for words, and the _correct_ response
to the complaint would have been "sounds like dietlibc should add that
function then", but in any case that should be an inline living in platform.h.
Dietlibc isn't umount.c's problem.
It isn't in platform.h because it needs #include <mntent.h>,
and I do not want to do that in platform.h - 99% of platform.h
users do not need that stuff.

Maybe we need to create bb_mount.h?
Post by Rob Landley
There's an also #ifndef PATH_MAX which clearly belongs in platform.h, before I
even _get_ to the main function. (Which is two lines of heavily annotated
crap to start the function, but that's a separate rant.)
Hmm, you spotted a buglet. umount *abuses* PATH_MAX for a mostly
unrelated purpose: a buffer size for getmntent_r.
I'd simply use a "big enough" 4k buffer, and drop PATH_MAX from umount.c
(Did it in git just now)
Post by Rob Landley
#if defined(__GNU_LIBRARY__) && __GNU_LIBRARY__ < 5 && \
!defined(__dietlibc__) && \
!defined(_NEWLIB_VERSION) && \
!(defined __digital__ && defined __unix__)
# error "Sorry, this libc version is not supported :("
#endif
What the...? That's not even a #warning, that's an #error. You added added
extra complexity to REFUSE to work in an unknown environment, when specific
apps might otherwise work. What's the POINT of programming ot things like
SUSv4 if you don't really mean it?
Good grief, you're not even checking for uClibc but just assuming that it will
sufficiently pretend to be glibc. You're not only perpetuating other people's
mistakes, you're trying to top them: uClibc must pretend to be glibc because
of all the packages out there that specifically CHECK for glibc, which is
because THOSE PACKAGES ARE BROKEN. You'd think you'd know better, that "all
the world's glibc" is wrong, so instead of trying to be environment _agnostic_
you play whack-a-mole with the historical record. Because nobody's ever going
to come up with anything new without notifying you first.
Fair enough. I removed that block.
Post by Rob Landley
By the way, if you're going to have the insane complextiy of "smallint"
(apparently busybox is no longer about being simple), then where's your
developer guide documenting it? "grep -r smallint docs" returns nothing, and
the word does not appear in the FAQ either. New developers have to just hack
on the code long enough to learn this by osmosis?
Added docs/smallint.txt
Post by Rob Landley
Is it some sort of project senility? The older a project gets, the more
layers of arterial plaque it accumulates until finally it's ossifies into an
unmodifiable statue? (xmms anyone?)
existing apps multiple times because I could make them _simpler_. Is that
still even a project goal?
Yes (as long as it works compatibly with standard tools).
Post by Rob Landley
Is all this overgrowth an oversight, or do you
guys honestly not _care_ about this stuff anymore?
I wanted to post a very similar rant about uclibc. But did not. Ranting usually
doesn't help. Doing something about things you want to see improved is the way
forward.

You are good at simplifying stuff by rewriting stuff. Thinking positively
about it, bbox is a huge field of work where you can take a small, isolated
chunk of code and rewrite it. Most other projects do not have such structure,
they are more monolithic.

*And* you have write access to git. You do not need to wait for asshole
maintainer to wake up and reply to your patches.

IOW: bbox is a nearly ideal project for you to participate, do
what you like to do, and enjoy the process.

So why are you so grumpy? :D
Post by Rob Landley
When did simplicity stop being a goal?
It did not. For example, sha1/256/512 and md5 hashes were recently simplified.
--
vda
Rob Landley
2010-10-24 09:02:52 UTC
Permalink
Post by Denys Vlasenko
I wanted to post a very similar rant about uclibc. But did not. Ranting
usually doesn't help. Doing something about things you want to see improved
is the way forward.
For uClibc, I tended to mail cakes to people. It seems to work, for some
reason.

Alas at this point, it's hard to consider the uClibc project anything other
than a zombie. NTPL for uClibc has been in development for _five_years_ and
still hasn't shipped. It took the original NPTL developers what, six months
to get a working implementation in glibc? And they had to do the kernel-side
infrastructure while they were at it. The uClibc web page hasn't been updated
in over 6 months...

I know, I should download the git snapshots and regression test and go engage
with that project more... It's on the todo list.
Post by Denys Vlasenko
You are good at simplifying stuff by rewriting stuff. Thinking positively
about it, bbox is a huge field of work where you can take a small, isolated
chunk of code and rewrite it. Most other projects do not have such
structure, they are more monolithic.
*And* you have write access to git. You do not need to wait for asshole
maintainer to wake up and reply to your patches.
I'm not blaming you for this. You're doing a better job of keeping up with
the patch submission flood than I ever did...
Post by Denys Vlasenko
IOW: bbox is a nearly ideal project for you to participate, do
what you like to do, and enjoy the process.
So why are you so grumpy? :D
Eh, I'm usually grumpy. I yell at my own code all the time. (That's why I
keep rewriting it so much.) In an average open source development project, I
throw away ten times as much code as I wind up with. (Day job code I'm paid
to write, I tend to keep my first or second draft because they just usually
want it to work once, by deadline. Open source code, I keep going until I've
got something I can be proud of.)

In this case, I'm grumpy because last time I got seriously involved in busybox
it sort of ate my life, and I've got other projects (like aboriginal linux)
that I'm trying to get to a good stopping point before getting _too_ involved
with other stuff. I still haven't gotten the kernel perl removal patches
pushed upstream. The kconfig stuff is finally getting something _like_
miniconfig, but not actually miniconfig. And SOMEBODY needs to turn glue qemu's
TCG to the old tinycc front end and make a real cross-platform qcc compiler
capable of building the Linux kernel. Plus the "hello world" kernel project,
and about 30 other things...

Once upon a time I spent months each rewriting individual busybox apps, over
and over, until I was pleased with the results. And umount was one of them,
so it seemed safe to go in and make a specific change to busybox umount, to add
a small and simple feature that should have been maybe a three line patch.

Instead I got sidetracked into wondering why dietlibc #ifdefs and hurd patches
were splattered all over the code since the last time I looked at it. "git
show fbedacfc8caa1" is a huge mess replacing realpath(), which is a
posix.1-2001 function. We weren't even using the NULL extension we were doing
our own malloc, and it got replaced with:

+ char *resolved_path = xmalloc_realpath(*argv);
+ if (resolved_path != NULL) {
puts(resolved_path);
+ free(resolved_path);


So we have an xmalloc function and we're checking the null return. Huh?
Either it's badly named, or the caller is wasting bytes.

- safe_strncpy(path, m->dir, PATH_MAX);
+ path = xstrdup(m->dir);

What was wrong with xstrdup() here? And what's "safe" about randomly
truncating a string (which isn't necessary on Linux), this arbitrarily breaks
umount on really long paths when it might otherwise work fine. If the syscall
has a limit, it's the syscall's job to truncate it. (Or error out, which
isn't going to be worse than truncating the path _before_ calling the syscall,
which can't possibly work.)

- realpath(zapit, path);
- for (m = mtl; m; m = m->next)
- if (!strcmp(path, m->dir) || !strcmp(path, m-
Post by Denys Vlasenko
device))
- break;
+ path = xmalloc_realpath(zapit);
+ if (path) {
+ for (m = mtl; m; m = m->next)
+ if (strcmp(path, m->dir) == 0 ||
strcmp(path, m->device) == 0)
+ break;
+ }

Posix 2008 says we can rely on realpath() doing a malloc with NULL now, so
wrapping it seems like something we'd want to _undo_ now. As for the rest of
it, the previous code was better. The new code is pointlessly verbose to
perform the same function. And I have no idea why "path" was renamed "buf" in
lots of places, since "path" describes what the variable holds and "buf" might
as well be "variable" for all it tells us about the function.

This sort of thing makes me grumpy, because when I see "hurd support" as the
topic of a patch I go "is this really a worthwile goal?", and then I see that
the patch makes the code significantly uglier, which is a cost. And a lot of
the patch has nothing to do with hurd but is about adding G.varname prefixes to
things and other unrelated churn, so I have to read it all to figure out which
parts are good and which parts aren't. And each individual quibble seems kind
of petty by itself, but they add up, all together they add up to the code
getting significantly uglier in pursuit of a goal I honestly don't see any
point in.

The hurd is slightly _less_ interesting than Plan 9, haiku-os, netbsd, the
post-illumos Solaris still maintained by Oracle, QNX Neutrino (which is open
source these days), and so on. The only reason Hurd still exists at all is
that some people worship Richard Stallman to the exclusion of all reason, and
insist that his 20-year old fossilized ego trip will somehow do _something_
any day now. (Yeah, and maybe so will ReactOS, but who _cares_?)

This doesn't mean a small and simple patch to tweak something that was
bothering the hurd isn't worth doing, but pouring buckets of ugly over the
codebase in the name of the hurd does not strike me as worth it.

Which means if I go in and start fiddling with umount, the first thing I'm
likely do is break Hurd support. I haven't got a regression test environment,
and I actively fail to care about that target, so I pretty much expect it. A
significant part of my frustration is not knowing whether or not I'm _allowed_
to clean out that crap to get us back to simple, or whether maintaining hurd
support (which I can't test) is important.

I don't understand what the requirements are for this code. Which means my
approach would have to be "make a small localized change and don't change
anything else because it's brittle black magic"... which takes all the fun out
of a project like busybox.

(And you're right, uClibc needs a serious cleanup too. Earlier this week I
took a close look at bits/mman.h across multiple architectures. Between
0.9.30 and 0.9.30, PROT_READ and PROT_EXEC swapped values in "common", and
MAP_FIXED changed its value. MAP_FIXED is still inconsistent accross
architectures, and I need to figure out if that's intentional or not... I just
haven't had time to do that so far this week.)
Post by Denys Vlasenko
Post by Rob Landley
When did simplicity stop being a goal?
It did not. For example, sha1/256/512 and md5 hashes were recently simplified.
/me looks:

Um...

#if 0
remaining = 128 - bufpos;

/* Hash whole blocks */
while (len >= remaining) {
memcpy(ctx->wbuffer + bufpos, buffer, remaining);
buffer = (const char *)buffer + remaining;
len -= remaining;
remaining = 128;
bufpos = 0;
sha512_process_block128(ctx);
}

/* Save last, partial blosk */
memcpy(ctx->wbuffer + bufpos, buffer, len);
#else
yet more code...

Uh-huh.

Attached is the sha1sum.c I wrote for toybox a few years back. (Never did
quite get around to cleaning it up, extending it to support the various other
shaXXXsum sizes, or implementing md5sum. My todo list runneth over.) But
what I was going for was _simple_, and I confirmed it produced the right output
on all the data I threw at it. It's 185 lines including the actual applet
implementation and help text and everything. The busybox one is 900 lines for
the engine.

Busybox has three or four different implementations of each piece of
infrastructure in the sha1sum/md5sum stuff, depending on the "size vs speed"
knob. I focused on doing it once in a way that was easiest to read and to
understand.

I.E. my implementation was simple first, worrying about small and fast second.
The code you were referring to (current as of October 19th, git says) has a
size vs speed config entry leading to a heavily redundant implementation, at
the expense of simplicity, with rather a lot of what it's doing hidden in
various macros.

This honestly seems to me like a question of differing design goals.

Rob
--
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem
Forever, and as welcome as New Coke.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sha1sum.c
Type: text/x-csrc
Size: 4478 bytes
Desc: not available
URL: <http://lists.busybox.net/pipermail/busybox/attachments/20101024/06ea8485/attachment.c>
Denys Vlasenko
2010-10-24 11:47:32 UTC
Permalink
Post by Rob Landley
Post by Denys Vlasenko
I wanted to post a very similar rant about uclibc. But did not. Ranting
usually doesn't help. Doing something about things you want to see improved
is the way forward.
For uClibc, I tended to mail cakes to people. It seems to work, for some
reason.
Alas at this point, it's hard to consider the uClibc project anything other
than a zombie. NTPL for uClibc has been in development for _five_years_ and
still hasn't shipped. It took the original NPTL developers what, six months
to get a working implementation in glibc? And they had to do the kernel-side
infrastructure while they were at it. The uClibc web page hasn't been updated
in over 6 months...
I know, I should download the git snapshots and regression test and go engage
with that project more... It's on the todo list.
So let's do it.

My problem with working on NTPL is that I dislike everything thread-related.
The only piece of software of the 1024 CPU machine which _has to be_ threaded
is the kernel, everything else is easier to parallelize on a task basis:
don't waste time degeloping, say, 1024-CPU parallelized gzip - instead,
run thousands of gzip copies!

It will be somewhat hard for me to work on NTPL in uclibc...
Post by Rob Landley
Post by Denys Vlasenko
So why are you so grumpy? :D
Eh, I'm usually grumpy. I yell at my own code all the time. (That's why I
keep rewriting it so much.)
Well, this reduces the number of people who want to cooperate with you.
People are illogical. They better take criticism if it is stuff sugar-coated.
Post by Rob Landley
Once upon a time I spent months each rewriting individual busybox apps, over
and over, until I was pleased with the results. And umount was one of them,
so it seemed safe to go in and make a specific change to busybox umount, to add
a small and simple feature that should have been maybe a three line patch.
Instead I got sidetracked into wondering why dietlibc #ifdefs and hurd patches
were splattered all over the code since the last time I looked at it. "git
show fbedacfc8caa1" is a huge mess replacing realpath(), which is a
posix.1-2001 function.
realpath to a char buf[PATH_MAX] has a problem of requiring 4k large buffer.
In almost all cases, xmalloc_realpath() will be MUCH smaller. I liked
this patch mostly for this reason.

Maybe we need to have allocating version of getmntent too.
Post by Rob Landley
We weren't even using the NULL extension we were doing
+ char *resolved_path = xmalloc_realpath(*argv);
+ if (resolved_path != NULL) {
puts(resolved_path);
+ free(resolved_path);
So we have an xmalloc function and we're checking the null return. Huh?
Either it's badly named, or the caller is wasting bytes.
NULL here happens olny when realpath fails to "realpathize" its argument.
Say, xmalloc_realpath("/it/does/not/exist") is NULL.
Post by Rob Landley
- safe_strncpy(path, m->dir, PATH_MAX);
+ path = xstrdup(m->dir);
What was wrong with xstrdup() here?
The change replaces safe_strncpy with xstrdup, not the other way around.
Post by Rob Landley
And what's "safe" about randomly
truncating a string (which isn't necessary on Linux), this arbitrarily breaks
umount on really long paths when it might otherwise work fine.
"safe" in safe_strncpy means "it will always be NUL-terminated" (unlike strncpy).
Post by Rob Landley
- realpath(zapit, path);
- for (m = mtl; m; m = m->next)
- if (!strcmp(path, m->dir) || !strcmp(path, m-
Post by Denys Vlasenko
device))
- break;
+ path = xmalloc_realpath(zapit);
+ if (path) {
+ for (m = mtl; m; m = m->next)
+ if (strcmp(path, m->dir) == 0 ||
strcmp(path, m->device) == 0)
+ break;
+ }
Posix 2008 says we can rely on realpath() doing a malloc with NULL now, so
wrapping it seems like something we'd want to _undo_ now. As for the rest of
it, the previous code was better. The new code is pointlessly verbose to
perform the same function.
Readability. !strcmp() reads as "not strcmp" - ?
strcmp() != 0 reads as "not equal" - much closer to what it actually do.

I prefer more "spelled out" code on the source level exactly beause
it's easier to read. Your code is usually more densely written.
This is not a significant difference.
Post by Rob Landley
Which means if I go in and start fiddling with umount, the first thing I'm
likely do is break Hurd support. I haven't got a regression test environment,
and I actively fail to care about that target, so I pretty much expect it.
Don't worry about it. It's up to hurd people (if any) to fix up.
Post by Rob Landley
A
significant part of my frustration is not knowing whether or not I'm _allowed_
to clean out that crap to get us back to simple, or whether maintaining hurd
support (which I can't test) is important.
I don't understand what the requirements are for this code. Which means my
approach would have to be "make a small localized change and don't change
anything else because it's brittle black magic"... which takes all the fun out
of a project like busybox.
If you want to change something but not sure whether it is ok,
post the patch(es) to the ml.
Post by Rob Landley
Post by Denys Vlasenko
Post by Rob Landley
When did simplicity stop being a goal?
It did not. For example, sha1/256/512 and md5 hashes were recently simplified.
Um...
#if 0
remaining = 128 - bufpos;
/* Hash whole blocks */
while (len >= remaining) {
memcpy(ctx->wbuffer + bufpos, buffer, remaining);
buffer = (const char *)buffer + remaining;
len -= remaining;
remaining = 128;
bufpos = 0;
sha512_process_block128(ctx);
}
/* Save last, partial blosk */
memcpy(ctx->wbuffer + bufpos, buffer, len);
#else
yet more code...
Uh-huh.
Attached is the sha1sum.c I wrote for toybox a few years back. (Never did
quite get around to cleaning it up, extending it to support the various other
shaXXXsum sizes, or implementing md5sum. My todo list runneth over.) But
what I was going for was _simple_, and I confirmed it produced the right output
on all the data I threw at it. It's 185 lines including the actual applet
implementation and help text and everything. The busybox one is 900 lines for
the engine.
I saw it.
Wondered "how the hell Rod managed to squeeze sha1 into that few lines" :D
Post by Rob Landley
Busybox has three or four different implementations of each piece of
infrastructure in the sha1sum/md5sum stuff, depending on the "size vs speed"
knob. I focused on doing it once in a way that was easiest to read and to
understand.
I.E. my implementation was simple first, worrying about small and fast second.
The code you were referring to (current as of October 19th, git says) has a
size vs speed config entry leading to a heavily redundant implementation, at
the expense of simplicity, with rather a lot of what it's doing hidden in
various macros.
This "size vs speed" thing im md5 is not mine. busybox-1.00 has it too.
As you can see, sha256/512 which I added don't have them.
--
vda
Rob Landley
2010-10-26 17:36:09 UTC
Permalink
Post by Denys Vlasenko
Post by Rob Landley
Post by Denys Vlasenko
I wanted to post a very similar rant about uclibc. But did not. Ranting
usually doesn't help. Doing something about things you want to see
improved is the way forward.
For uClibc, I tended to mail cakes to people. It seems to work, for some
reason.
Alas at this point, it's hard to consider the uClibc project anything
other than a zombie. NTPL for uClibc has been in development for
_five_years_ and still hasn't shipped. It took the original NPTL
developers what, six months to get a working implementation in glibc?
And they had to do the kernel-side infrastructure while they were at it.
The uClibc web page hasn't been updated in over 6 months...
I know, I should download the git snapshots and regression test and go
engage with that project more... It's on the todo list.
So let's do it.
It's one of the things I mean to do with Aboriginal Linux. I've got the
ability to run exactly the same setup on a dozen different architectures, and
to rebuild with arbitrary package versions. And it all runs under qemu so you
don't need special hardware to test the result.

I need to document it a bit better some people other than _me_ can do this.
(I have enough trouble keeping up with new kernel versions.)
Post by Denys Vlasenko
My problem with working on NTPL is that I dislike everything
thread-related.
I went from the C64 to DOS to desqview to OS/2 to Java to Linux. You can't go
through OS/2 without getting a pretty thourough grounding in threads, it was
sort of their thing. (Java was just a refresher course.)

That said, I spent almost ten years programming for Linux before bothering to
actually look up the pthreads API. You usually don't _need_ threads on Linux.

And _that_ said, I've pondered doing threaded bunzip2 and bzip2
implementations, because the separate 900k blocks make the sucker almost
trivially parallelizeable, and SMP is available even on ARM these days.

Attached is my threaded hello world program, which exercises pretty much all
of the thread infrastructure. It contains all you'd need to know to add
thread support to pretty much any program that could benefit from it. I use it
to smoke test the threading infrastructure: if this builds/works then the
threading is at least sane.
Post by Denys Vlasenko
The only piece of software of the 1024 CPU machine which
_has to be_ threaded is the kernel, everything else is easier to
parallelize on a task basis: don't waste time degeloping, say, 1024-CPU
parallelized gzip - instead, run thousands of gzip copies!
Which then wind up talking to each other through pipes or fifos or sockets, and
you have a scalability bottleneck in a select statement sending the data back.
Less so now with the new pipe infrastructure, but still, you have to copy data
between process contexts because fiddling with page tables for non-persistent
shared memory is _more_ expensive than copying, and it's a cache flush either
way...

Threads are a tool, just like object orientation. There are times when it's
appropriate and very helpful, and times when shoehorning it onto a problem
makes things worse. I haven't wanted to introduce it to busybox for the same
reason I haven't wanted to introduce an ncurses dependency.
Post by Denys Vlasenko
It will be somewhat hard for me to work on NTPL in uclibc...
I'm not too worried about it, I just haven't gotten around to it.

The LLVM/Clang guys just built a working Linux kernel:

http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-October/011711.html

To quote Scotty from Star Trek 2, "It's bypassed like a christmas tree", but
it did actually work for one architecture. It's not yet to the point where
finding more bugs is useful if you can't fix them yourself (they've got plenty,
thanks), but in another release or two it'll be worth trying that out on a
bunch of architectures, possibly even nightly regression testing.

Meaning that's yet _another_ todo item added to my list. :)
Post by Denys Vlasenko
Post by Rob Landley
Post by Denys Vlasenko
So why are you so grumpy? :D
Eh, I'm usually grumpy. I yell at my own code all the time. (That's why
I keep rewriting it so much.)
Well, this reduces the number of people who want to cooperate with you.
I'm aware of this. Didn't say it was a virtue.
Post by Denys Vlasenko
People are illogical. They better take criticism if it is stuff sugar-coated.
Post by Rob Landley
Once upon a time I spent months each rewriting individual busybox apps,
over and over, until I was pleased with the results. And umount was one
of them, so it seemed safe to go in and make a specific change to busybox
umount, to add a small and simple feature that should have been maybe a
three line patch.
Instead I got sidetracked into wondering why dietlibc #ifdefs and hurd
patches were splattered all over the code since the last time I looked at
it. "git show fbedacfc8caa1" is a huge mess replacing realpath(), which
is a posix.1-2001 function.
realpath to a char buf[PATH_MAX] has a problem of requiring 4k large
buffer. In almost all cases, xmalloc_realpath() will be MUCH smaller. I
liked this patch mostly for this reason.
Behind the scenes, the uClibc iplementation of realpath, with NULL as the
second argument, mallocs a 4k buffer. (Because they're going for small code
over small data set, and assuming that the realpath() return value is
generally temporary.) So you're mallocing a 4k buffer, allocating a second
buffer, copying the data, and freeing the first buffer (probably fragmenting the
heap).

If you wind up fighting the C library that much, I'd get a better C library.
It's the kind of added complexity for a micro-optimization that really doesn't
appeal to me.
Post by Denys Vlasenko
Maybe we need to have allocating version of getmntent too.
More likely we need an mmap() based version, but mostly I trust the C library
to do its' thing.

Linux won't boot in less than 4 megs of ram. It _will_ boot from less than 4
megs of persitent storage. And the smaller amount of code is easier to audit,
easier to maintain, easier to learn/understand, is more likely to fit in L1 and
L2 caches...

*shrug* Different goals.
Post by Denys Vlasenko
Post by Rob Landley
We weren't even using the NULL extension we were doing
+ char *resolved_path = xmalloc_realpath(*argv);
+ if (resolved_path != NULL) {
puts(resolved_path);
+ free(resolved_path);
So we have an xmalloc function and we're checking the null return. Huh?
Either it's badly named, or the caller is wasting bytes.
NULL here happens olny when realpath fails to "realpathize" its argument.
Say, xmalloc_realpath("/it/does/not/exist") is NULL.
So it's an xfunction() which can return failure. Personally I found that non-
obvious.
Post by Denys Vlasenko
Post by Rob Landley
- safe_strncpy(path, m->dir, PATH_MAX);
+ path = xstrdup(m->dir);
What was wrong with xstrdup() here?
The change replaces safe_strncpy with xstrdup, not the other way around.
*shrug* Ok.
Post by Denys Vlasenko
Post by Rob Landley
And what's "safe" about randomly
truncating a string (which isn't necessary on Linux), this arbitrarily
breaks umount on really long paths when it might otherwise work fine.
"safe" in safe_strncpy means "it will always be NUL-terminated" (unlike strncpy).
There used to be a strlcpy() that would always null terminate. I forget where
that was from. (Ok, I wrote my own under DOS in 1991, but I later found out
that other C libraries already included it. Ulrich Drepper thought it was a
horrible idea, but then he thinks static linking is a horrible idea...)
Post by Denys Vlasenko
Post by Rob Landley
- realpath(zapit, path);
- for (m = mtl; m; m = m->next)
- if (!strcmp(path, m->dir) ||
!strcmp(path, m-
Post by Denys Vlasenko
device))
- break;
+ path = xmalloc_realpath(zapit);
+ if (path) {
+ for (m = mtl; m; m = m->next)
+ if (strcmp(path, m->dir) == 0 ||
strcmp(path, m->device) == 0)
+ break;
+ }
Posix 2008 says we can rely on realpath() doing a malloc with NULL now,
so wrapping it seems like something we'd want to _undo_ now. As for the
rest of it, the previous code was better. The new code is pointlessly
verbose to perform the same function.
Readability. !strcmp() reads as "not strcmp" - ?
In the shell, 0 is success and nonzero is failure. In C, 0 is failure and
nonzero is success. In strcmp() there's greater than, less than, and zero.
Anybody who can't keep all these straight is going to have to be very good at
writing test suites.
Post by Denys Vlasenko
strcmp() != 0 reads as "not equal" - much closer to what it actually do.
The linux kernel style guys actually edit out those kind of useless
appendanges as part of their style checking during code review. So you're
adding stuff to make the code look less like the kernel does.

Do you similar bloat if (x) to say "if (x != FALSE)"? I see that kind of
thing in people who are new to C, but not much in people who've been doing it
for a while.
Post by Denys Vlasenko
I prefer more "spelled out" code on the source level exactly beause
it's easier to read. Your code is usually more densely written.
This is not a significant difference.
*shrug* Your call.

There are some studies out there that say there's an optimal module size,
above which defect density increases because you can't keep all the code in
your head at once, and below which defect density increases because the code
is chopped up into such small pieces that you can't see the forest for the
trees and it spends all its time interfacing with itself:

http://catb.org/esr/writings/taoup/html/ch04s01.html

One of the interesting things about those studies is actually that defect
density is not strongly influenced by language. (Variation in individual
programmers is larger than variation across languages.) So once you're
reasonably proficient at both 1000 lines of python and 1000 lines of C are
likely to have about the same number of bugs to work out, but 1000 lines of
python _does_ way more.

What I took away from that is "making the code bigger makes it buggier,
because people can't keep as much of it in their heads". Which is part of the
reason I personally consider making the code "terse but clear" to be a good
call. Even comments need to pull their own weight, there are times when
deleting unnecessary comments and just letting you see all the CODE at once
makes things easier to understand, and I say that as a compulsive commenter.
:)
Post by Denys Vlasenko
Post by Rob Landley
Which means if I go in and start fiddling with umount, the first thing
I'm likely do is break Hurd support. I haven't got a regression test
environment, and I actively fail to care about that target, so I pretty
much expect it.
Don't worry about it. It's up to hurd people (if any) to fix up.
Ok.

I vaguely wonder if there should be a place we list busybox features that
other immplementations don't have. For example, busybox mount never needs to
say "-o loop", because you can trivially autodetect when you're being asked to
mount a file on a directory. (And directory on directory or file on file are
bind mounts, although I don't remember if I made it autodetect that. --move
still needs to be explicitly specified, and overrides the default "directory on
directory" behavior.)

Adding the ability to umount everything under a given directory is another
extension.
Post by Denys Vlasenko
Post by Rob Landley
A
significant part of my frustration is not knowing whether or not I'm
_allowed_ to clean out that crap to get us back to simple, or whether
maintaining hurd support (which I can't test) is important.
I don't understand what the requirements are for this code. Which means
my approach would have to be "make a small localized change and don't
change anything else because it's brittle black magic"... which takes all
the fun out of a project like busybox.
If you want to change something but not sure whether it is ok,
post the patch(es) to the ml.
Gotta write 'em first, which is where the question arises.
Post by Denys Vlasenko
Post by Rob Landley
Attached is the sha1sum.c I wrote for toybox a few years back. (Never
did quite get around to cleaning it up, extending it to support the
various other shaXXXsum sizes, or implementing md5sum. My todo list
runneth over.) But what I was going for was _simple_, and I confirmed it
produced the right output on all the data I threw at it. It's 185 lines
including the actual applet implementation and help text and everything.
The busybox one is 900 lines for the engine.
I saw it.
Wondered "how the hell Rod managed to squeeze sha1 into that few lines" :D
Read the darn algorithm, understood what it was doing (if not why) long enough
to implement it.

*shrug* The usual.

The problem is this kind of thing can't be done in 15 minute increments (or at
least I can't do it), it needs nice solid multi-hour blocks to sit down and
think a big thing through, where I start out well rested. And spare ones of
those are hard to come by these days, they tend to get used for other
things...
Post by Denys Vlasenko
Post by Rob Landley
Busybox has three or four different implementations of each piece of
infrastructure in the sha1sum/md5sum stuff, depending on the "size vs
speed" knob. I focused on doing it once in a way that was easiest to
read and to understand.
I.E. my implementation was simple first, worrying about small and fast
second. The code you were referring to (current as of October 19th, git
says) has a size vs speed config entry leading to a heavily redundant
implementation, at the expense of simplicity, with rather a lot of what
it's doing hidden in various macros.
This "size vs speed" thing im md5 is not mine. busybox-1.00 has it too.
As you can see, sha256/512 which I added don't have them.
I need to sit down and read through busybox's implementation, figure out what
bits of it are md5 and what are shaXXX and what benefit it gets from sharing...

Won't be today, though. I'm backporting "#pragma visibility" (introduced in
gcc 4.0) to gcc 3.4.6 in hopes that's all uClibc++ needs to build there, then
dusting off and finishing up the strace port.

Tonight I might have the energy to finish up the lfs-bootstrap stuff so
aboriginal can natively build the Linux From Scratch 6.7 packags as part of
its nightly automation. Last night I meant to, but wound up just watching
"Meerkat Manor" instead. (Imagine a live action lolcats soap opera, narrated
by Samwise Gamgee.)

Rob
--
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem
Forever, and as welcome as New Coke.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: thread-hello2.c
Type: text/x-csrc
Size: 2406 bytes
Desc: not available
URL: <http://lists.busybox.net/pipermail/busybox/attachments/20101026/9936b3a5/attachment.c>
Denys Vlasenko
2010-10-26 21:26:39 UTC
Permalink
Post by Rob Landley
Post by Denys Vlasenko
The only piece of software of the 1024 CPU machine which
_has to be_ threaded is the kernel, everything else is easier to
parallelize on a task basis: don't waste time degeloping, say, 1024-CPU
parallelized gzip - instead, run thousands of gzip copies!
Which then wind up talking to each other through pipes or fifos or sockets, and
you have a scalability bottleneck in a select statement sending the data back.
Less so now with the new pipe infrastructure, but still, you have to copy data
between process contexts because fiddling with page tables for non-persistent
shared memory is _more_ expensive than copying, and it's a cache flush either
way...
No, not _those_ tasks. Bigger tasks. When you run gzip, you run it as a part
of a larger "task to accomplish something". E.g. maybe you are processing Mars
Reconnaissance Orbiter photos, and archiving step in pipeline compresses them.

Then, do not bother creating insanely parallelized gzip (and insanely
parallelized image analysis software, and insanely parallelized database...);
instead, process in parallel *insane number of photos* using run of the mill,
simple, *single-threaded* tools.

This may even be faster, since you do not need to bother with locks,
cacheline bouncing and such.

But more importantly, you have so much less complexity and
fewer bugs to fix!

(If you do not have insanely many photos to process, but just a few,
in most cases today's CPUs are fast enough to not optimize for this case.)
Post by Rob Landley
Threads are a tool, just like object orientation. There are times when it's
appropriate and very helpful, and times when shoehorning it onto a problem
makes things worse.
Right. Use them only when you must.
Post by Rob Landley
Post by Denys Vlasenko
Readability. !strcmp() reads as "not strcmp" - ?
In the shell, 0 is success and nonzero is failure. In C, 0 is failure and
nonzero is success. In strcmp() there's greater than, less than, and zero.
Anybody who can't keep all these straight is going to have to be very good at
writing test suites.
Post by Denys Vlasenko
strcmp() != 0 reads as "not equal" - much closer to what it actually do.
The linux kernel style guys actually edit out those kind of useless
appendanges as part of their style checking during code review. So you're
adding stuff to make the code look less like the kernel does.
Do you similar bloat if (x) to say "if (x != FALSE)"? I see that kind of
thing in people who are new to C, but not much in people who've been doing it
for a while.
I think that even though we do understand what "if (strcmp(...))" mean,
it doesn't follow that this is the best style. With a style which
encourages use of == 0 or != 0 with strcmp, I started making less
bugs with inverted logic in string compares...

if ([!]x) is definitely ok for bool and pointers. For ints,
it is sometimes good to write if (x != 0), especially if the code
aroud that place is complex-ish and it's hard to figure out
the type of x. Not a hard rule.
Post by Rob Landley
There are some studies out there that say there's an optimal module size,
above which defect density increases because you can't keep all the code in
your head at once,
Of cource, no one can keep infinite amount of structure in the head.

Readability simply moves that limit a bit farther, by making every
individual sub-fragment of code less taxing on the brain.
Post by Rob Landley
Even comments need to pull their own weight, there are times when
deleting unnecessary comments
Yes. Good comments are an art (and quite distinct from "verbose comments").
Post by Rob Landley
I vaguely wonder if there should be a place we list busybox features that
other immplementations don't have. For example, busybox mount never needs to
say "-o loop", because you can trivially autodetect when you're being asked to
mount a file on a directory. (And directory on directory or file on file are
bind mounts, although I don't remember if I made it autodetect that.
IIRC no, "file on file" bind mounts aren't working.
Post by Rob Landley
Post by Denys Vlasenko
Wondered "how the hell Rod managed to squeeze sha1 into that few lines" :D
Read the darn algorithm, understood what it was doing (if not why) long enough
to implement it.
Done, shrank, sped up & pushed to bbox. Faster than old bbox one on 32 bits,
sadly, slower on 64. With better gcc, 32 bit code can get a bit faster still.
(Stupid robot did not free a register parameter to use it inside the loop.)
Thanks!
Post by Rob Landley
Post by Denys Vlasenko
Post by Rob Landley
Busybox has three or four different implementations of each piece of
infrastructure in the sha1sum/md5sum stuff, depending on the "size vs
speed" knob. I focused on doing it once in a way that was easiest to
read and to understand.
I.E. my implementation was simple first, worrying about small and fast
second. The code you were referring to (current as of October 19th, git
says) has a size vs speed config entry leading to a heavily redundant
implementation, at the expense of simplicity, with rather a lot of what
it's doing hidden in various macros.
This "size vs speed" thing im md5 is not mine. busybox-1.00 has it too.
As you can see, sha256/512 which I added don't have them.
I need to sit down and read through busybox's implementation, figure out what
bits of it are md5 and what are shaXXX and what benefit it gets from sharing...
Everything except sha512 uses common 64-byte stuffing machine and finalization
code (that' what I referred to when I said that it was recently improved).
sha512 uses widened copies of the same algorithms.
--
vda
Rob Landley
2010-10-27 03:14:08 UTC
Permalink
Post by Denys Vlasenko
Post by Rob Landley
Post by Denys Vlasenko
The only piece of software of the 1024 CPU machine which
_has to be_ threaded is the kernel, everything else is easier to
parallelize on a task basis: don't waste time degeloping, say, 1024-CPU
parallelized gzip - instead, run thousands of gzip copies!
Which then wind up talking to each other through pipes or fifos or
sockets, and you have a scalability bottleneck in a select statement
sending the data back. Less so now with the new pipe infrastructure, but
still, you have to copy data between process contexts because fiddling
with page tables for non-persistent shared memory is _more_ expensive
than copying, and it's a cache flush either way...
No, not _those_ tasks. Bigger tasks. When you run gzip, you run it as a
part of a larger "task to accomplish something". E.g. maybe you are
processing Mars Reconnaissance Orbiter photos, and archiving step in
pipeline compresses them.
You mean the way the aboriginal build scripts often take a FORK=1 environment
variable that will, for example, make ./download.sh download and extract each
tarball in parallel?

There are a bunch of ways to achieve parallelism. Threading is one of them.
Threading doesn't work on beowulf clusters. But some programs really don't
break down well to beowulf clusters...
Post by Denys Vlasenko
Then, do not bother creating insanely parallelized gzip (and insanely
parallelized image analysis software, and insanely parallelized
database...); instead, process in parallel *insane number of photos* using
run of the mill, simple, *single-threaded* tools.
If you have a task that breaks down that way, sure.
Post by Denys Vlasenko
This may even be faster, since you do not need to bother with locks,
cacheline bouncing and such.
If you have a task that breaks down that way. Not all tasks do.
Post by Denys Vlasenko
But more importantly, you have so much less complexity and
fewer bugs to fix!
If you get to pick which problems you want to solve, you can avoid the messy
ones.
Post by Denys Vlasenko
(If you do not have insanely many photos to process, but just a few,
in most cases today's CPUs are fast enough to not optimize for this case.)
MPEG video compresison in realtime. Each frame is a delta from the previous
frame, _after_ any changes to the data due to "lossy" compression (which can't
be allowed to accumulate or the image quality goes into the toilet extremely
rapidly and you have to re-keyframe multiple times per second.)

A lot of signal processing issues are like that. You can chop the signal at
each keyframe and distribute across a cluster that way, but if your keyframes
are every 2 seconds then you guarantee that much latency, which sucks for
videoconferencing and other types of live broadcasts.

That's one program domain out of hundreds for which "redefine the problem into
something I feel like solving" turns out to be hard.
Post by Denys Vlasenko
Post by Rob Landley
Threads are a tool, just like object orientation. There are times when
it's appropriate and very helpful, and times when shoehorning it onto a
problem makes things worse.
Right. Use them only when you must.
No, use them when they're appropriate.

There's never a time when you _must_ use object orientation, and yet the linux
kernel's VFS layer is object oriented, with each individual filesystem being a
subclass of the VFS. (They did it in C rather than in C++ but the principle
is there and they happily admit it if you ask 'em.)

Same with threads, lots of times you can do a horrible non-threaded
implementation or a less horrible threaded implementation. I've seen cases
where a non-threaded implementation of something not only scaled less well
then the threaded one, but was _more_ brittle, hard to follow, and prone to
deadlocks.
Post by Denys Vlasenko
Post by Rob Landley
Post by Denys Vlasenko
Readability. !strcmp() reads as "not strcmp" - ?
In the shell, 0 is success and nonzero is failure. In C, 0 is failure
and nonzero is success. In strcmp() there's greater than, less than, and
zero. Anybody who can't keep all these straight is going to have to be
very good at writing test suites.
Post by Denys Vlasenko
strcmp() != 0 reads as "not equal" - much closer to what it actually do.
The linux kernel style guys actually edit out those kind of useless
appendanges as part of their style checking during code review. So
you're adding stuff to make the code look less like the kernel does.
Do you similar bloat if (x) to say "if (x != FALSE)"? I see that kind of
thing in people who are new to C, but not much in people who've been
doing it for a while.
I think that even though we do understand what "if (strcmp(...))" mean,
it doesn't follow that this is the best style. With a style which
encourages use of == 0 or != 0 with strcmp, I started making less
bugs with inverted logic in string compares...
*shrug* It's your baby now. I'm the other way, myself.
Post by Denys Vlasenko
if ([!]x) is definitely ok for bool and pointers. For ints,
it is sometimes good to write if (x != 0), especially if the code
aroud that place is complex-ish and it's hard to figure out
the type of x. Not a hard rule.
*blink* *blink*

You see a significant difference between scalar and pointer types?

It's X bytes of information living on either the stack or the heap, which is a
question of which base register it's offset against. If it's a struct, a
second offset gets applied. If it's an array, an offset is calculated and
applied...

An undimensioned array of pointers to pointers to a function pointer is
basically a const long, except in how it's used. But you could take the long
and typecast it.

A pointer is a scalar type. You can do math on the suckers (admittedly in
base sizeof(*)).

char *fred="123";

printf("%s", 2+fred);

Prints 3. Nothing special about "fred+2" or "&fred[2]"...
Post by Denys Vlasenko
Post by Rob Landley
There are some studies out there that say there's an optimal module size,
above which defect density increases because you can't keep all the code
in your head at once,
Of cource, no one can keep infinite amount of structure in the head.
Readability simply moves that limit a bit farther, by making every
individual sub-fragment of code less taxing on the brain.
It's only taxing on the brain if you don't do a lot of C coding. If you do,
it's more taxing to see the more verbose (and less common) ways of phrasing
the exact same thing. Worse, you start glossing over the verbosity and miss
when they change things in the useless bits (sometimes via typo).

Personally I find:

if (strcmp(walrus) == 0);
{
thingy()
}

Harder to spot than:

if (!strcmp(walrus));
thingy();

I've seldom found adding extra characters helps me parse code. But then I
didn't even put lot of spaces in each line until Erik complained. (And that
wasn't because I thought it was better, that was just for consistency.)
Post by Denys Vlasenko
Post by Rob Landley
Even comments need to pull their own weight, there are times when
deleting unnecessary comments
Yes. Good comments are an art (and quite distinct from "verbose comments").
Post by Rob Landley
I vaguely wonder if there should be a place we list busybox features that
other immplementations don't have. For example, busybox mount never
needs to say "-o loop", because you can trivially autodetect when you're
being asked to mount a file on a directory. (And directory on directory
or file on file are bind mounts, although I don't remember if I made it
autodetect that.
IIRC no, "file on file" bind mounts aren't working.
I've used them in aboriginal linux, which is using busybox 1.17.2 defconfig, so
I know if you specify --bind it works. Sounds like I never got around to
autodetecting the "source and target are both files" or "source and target are
both directories" cases. Fairly traightforward to do. Probably I should add
a config entry for all three loopback/bind autodetection cases. That acts as a
bit of documentation that the feature exists, too...

And while we're at it, I should add the magic probes for "tar xvf
thing.tar.gz" and such, since there are easily identifiable headers for gz,
bz2, and uncompressed file formats all in the first couple dozen bytes. (The
lack of any lzma identifying magic is lzma's problem, but we can always fall
back to attempting that one for unknown types...)

But again, these days my open source work gets crammed into the snippets of
time when I'm too tired and unfocused to do much else, which isn't exactly
ideal. :P

Rob
--
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem
Forever, and as welcome as New Coke.
Gilles Espinasse
2010-10-27 06:25:34 UTC
Permalink
----- Original Message -----
From: "Rob Landley" <rob at landley.net>
To: "Denys Vlasenko" <vda.linux at googlemail.com>
Cc: <busybox at busybox.net>
Sent: Wednesday, October 27, 2010 5:14 AM
Subject: Re: What on earth happened to platform.h?
Post by Rob Landley
And while we're at it, I should add the magic probes for "tar xvf
thing.tar.gz" and such, since there are easily identifiable headers for gz,
bz2, and uncompressed file formats all in the first couple dozen bytes.
(The
Post by Rob Landley
lack of any lzma identifying magic is lzma's problem, but we can always fall
back to attempting that one for unknown types...)
xz-5.0.0 has been released a few days ago, and have it's own magic so lzma
without magic is really the past. Gnu tar know those magic
{ ct_tar },
{ ct_none, },
{ ct_compress, 2, "\037\235", COMPRESS_PROGRAM, "-Z" },
{ ct_gzip, 2, "\037\213", GZIP_PROGRAM, "-z" },
{ ct_bzip2, 3, "BZh", BZIP2_PROGRAM, "-j" },
{ ct_lzip, 4, "LZIP", LZIP_PROGRAM, "--lzip" },
{ ct_lzma, 6, "\xFFLZMA", LZMA_PROGRAM, "--lzma" },
{ ct_lzop, 4, "\211LZO", LZOP_PROGRAM, "--lzop" },
{ ct_xz, 6, "\xFD" "7zXZ", XZ_PROGRAM, "-J" },

Gilles
Denys Vlasenko
2010-10-28 04:21:28 UTC
Permalink
Post by Gilles Espinasse
----- Original Message -----
From: "Rob Landley" <rob at landley.net>
To: "Denys Vlasenko" <vda.linux at googlemail.com>
Cc: <busybox at busybox.net>
Sent: Wednesday, October 27, 2010 5:14 AM
Subject: Re: What on earth happened to platform.h?
Post by Rob Landley
And while we're at it, I should add the magic probes for "tar xvf
thing.tar.gz" and such, since there are easily identifiable headers for
gz,
Post by Rob Landley
bz2, and uncompressed file formats all in the first couple dozen bytes.
(The
Post by Rob Landley
lack of any lzma identifying magic is lzma's problem, but we can always
fall
Post by Rob Landley
back to attempting that one for unknown types...)
We do it already for COMPRESS, GZIP, BZIP2 and XZ by looking at magic,
and also detect .lzma by file extension.

We do it not only in tar, but in man, modprobe, loadfont etc...

Actually, tar's way of doing it is a bit rusty,
we can do it better now. get_header_tar_gz() and friends
can be eliminated...
--
vda
Rob Landley
2010-10-30 10:02:51 UTC
Permalink
Post by Denys Vlasenko
We do it already for COMPRESS, GZIP, BZIP2 and XZ by looking at magic,
and also detect .lzma by file extension.
We do it not only in tar, but in man, modprobe, loadfont etc...
Actually, tar's way of doing it is a bit rusty,
we can do it better now. get_header_tar_gz() and friends
can be eliminated...
Cool, I am behind the times and can eliminate a todo list item.

One down, zeveral zillion to go...

Rob
--
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem
Forever, and as welcome as New Coke.
Denys Vlasenko
2010-10-27 23:36:16 UTC
Permalink
Post by Rob Landley
Post by Denys Vlasenko
Post by Rob Landley
Post by Denys Vlasenko
The only piece of software of the 1024 CPU machine which
_has to be_ threaded is the kernel, everything else is easier to
parallelize on a task basis: don't waste time degeloping, say, 1024-CPU
parallelized gzip - instead, run thousands of gzip copies!
Which then wind up talking to each other through pipes or fifos or
sockets, and you have a scalability bottleneck in a select statement
sending the data back. Less so now with the new pipe infrastructure, but
still, you have to copy data between process contexts because fiddling
with page tables for non-persistent shared memory is _more_ expensive
than copying, and it's a cache flush either way...
No, not _those_ tasks. Bigger tasks. When you run gzip, you run it as a
part of a larger "task to accomplish something". E.g. maybe you are
processing Mars Reconnaissance Orbiter photos, and archiving step in
pipeline compresses them.
You mean the way the aboriginal build scripts often take a FORK=1 environment
variable that will, for example, make ./download.sh download and extract each
tarball in parallel?
There are a bunch of ways to achieve parallelism. Threading is one of them.
Threading doesn't work on beowulf clusters. But some programs really don't
break down well to beowulf clusters...
Post by Denys Vlasenko
Then, do not bother creating insanely parallelized gzip (and insanely
parallelized image analysis software, and insanely parallelized
database...); instead, process in parallel *insane number of photos* using
run of the mill, simple, *single-threaded* tools.
If you have a task that breaks down that way, sure.
Google did not have a choice. They *had to* scale everything they do,
scale into tens of thousands of subtasks, and they could not affort
waiting thirty years while gzip and gazillion other tools got
parallelized to death. They succeeded, using the above approach.
Post by Rob Landley
Post by Denys Vlasenko
This may even be faster, since you do not need to bother with locks,
cacheline bouncing and such.
If you have a task that breaks down that way. Not all tasks do.
Post by Denys Vlasenko
But more importantly, you have so much less complexity and
fewer bugs to fix!
If you get to pick which problems you want to solve, you can avoid the messy
ones.
Post by Denys Vlasenko
(If you do not have insanely many photos to process, but just a few,
in most cases today's CPUs are fast enough to not optimize for this case.)
MPEG video compresison in realtime. Each frame is a delta from the previous
frame, _after_ any changes to the data due to "lossy" compression (which can't
be allowed to accumulate or the image quality goes into the toilet extremely
rapidly and you have to re-keyframe multiple times per second.)
A lot of signal processing issues are like that. You can chop the signal at
each keyframe and distribute across a cluster that way, but if your keyframes
are every 2 seconds then you guarantee that much latency, which sucks for
videoconferencing and other types of live broadcasts.
That's one program domain out of hundreds for which "redefine the problem into
something I feel like solving" turns out to be hard.
Sure, there _are_ programs which must be threaded. Linux kernel
is another example.
Post by Rob Landley
Post by Denys Vlasenko
if ([!]x) is definitely ok for bool and pointers. For ints,
it is sometimes good to write if (x != 0), especially if the code
aroud that place is complex-ish and it's hard to figure out
the type of x. Not a hard rule.
*blink* *blink*
My failure in English. I meant "It's not a hard rule, though", as in
"not a must". But I suspect it come through as "easy to follow rule".
Post by Rob Landley
You see a significant difference between scalar and pointer types?
This is not the point. The point is:

...
...
...
const re_dfa_t *const dfa = mctx->dfa;
reg_errcode_t err;
int match = 0;
int match_last = -1;
int cur_str_idx = re_string_cur_idx (&mctx->input);
re_dfastate_t *cur_state;
int at_init_state = p_match_first != NULL;
int next_start_idx = cur_str_idx;

err = REG_NOERROR;
cur_state = acquire_init_state_context (&err, mctx, cur_str_idx);
/* An initial state must not be NULL (invalid). */
if (BE (cur_state == NULL, 0))
{
assert (err == REG_ESPACE);
return -2;
}

if (mctx->state_log != NULL)
{
mctx->state_log[cur_str_idx] = cur_state;

/* Check OP_OPEN_SUBEXP in the initial state in case that we use them
later. E.g. Processing back references. */
if (BE (dfa->nbackref, 0))
{
at_init_state = 0;
err = check_subexp_matching_top (mctx, &cur_state->nodes, 0);
if (BE (err != REG_NOERROR, 0))
return err;

if (cur_state->has_backref)
{
err = transit_state_bkref (mctx, &cur_state->nodes);
if (BE (err != REG_NOERROR, 0))
return err;
}
}
}

/* If the RE accepts NULL string. */
if (BE (cur_state->halt, 0))
{
if (!cur_state->has_constraint
|| check_halt_state_context (mctx, cur_state, cur_str_idx))
{
if (!fl_longest_match)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
return cur_str_idx;
else
{
match_last = cur_str_idx;
match = 1;
}
}
}

Rob, can you figure out whether fl_longest_match is a number? or bool?
or maybe a string (that is, char*) in the underlined if() statement?

The problem is that in code like this (and this is not the worst example -
it's only what I found with quick grep in uclibc) you need to grep
for the declaration, it's not easily available. A hint on the type
may be useful.
Post by Rob Landley
A pointer is a scalar type. You can do math on the suckers (admittedly in
base sizeof(*)).
char *fred="123";
printf("%s", 2+fred);
Prints 3. Nothing special about "fred+2" or "&fred[2]"...
You think 2+"abc" is a weird expression? How about this? -

#include <stdio.h>
int main() { puts(&3["qwerty"]); return 0; }

Hehe ]:)
Post by Rob Landley
if (strcmp(walrus) == 0);
{
thingy()
}
if (!strcmp(walrus));
thingy();
Me too, but because of {}s, not because of == 0.
Post by Rob Landley
I've seldom found adding extra characters helps me parse code. But then I
didn't even put lot of spaces in each line until Erik complained. (And that
wasn't because I thought it was better, that was just for consistency.)
Iwasusingjammedtogether for(i=0;i<nregs;++i) style for quite some time.
But I eventually changed my mind.
--
vda
Rob Landley
2010-10-28 03:55:44 UTC
Permalink
Post by Denys Vlasenko
Post by Rob Landley
Post by Denys Vlasenko
Then, do not bother creating insanely parallelized gzip (and insanely
parallelized image analysis software, and insanely parallelized
database...); instead, process in parallel *insane number of photos*
using run of the mill, simple, *single-threaded* tools.
If you have a task that breaks down that way, sure.
Google did not have a choice. They *had to* scale everything they do,
scale into tens of thousands of subtasks, and they could not affort
waiting thirty years while gzip and gazillion other tools got
parallelized to death. They succeeded, using the above approach.
Yes, and most web transactions are essentially read-only batch processing, and
thus extremely amenable to parallelize via cluster, as a category. (I had a
class on how to schedule that kind of stuff at Rutgers in 1993, research on
that class of problems dates back to the 50's.)
Post by Denys Vlasenko
Post by Rob Landley
A lot of signal processing issues are like that. You can chop the signal
at each keyframe and distribute across a cluster that way, but if your
keyframes are every 2 seconds then you guarantee that much latency, which
sucks for videoconferencing and other types of live broadcasts.
That's one program domain out of hundreds for which "redefine the problem
into something I feel like solving" turns out to be hard.
Sure, there _are_ programs which must be threaded. Linux kernel
is another example.
There's nothing that _must_ be threaded. There are things that benefit from
threading.

Heck, you can set up shared memory via mmap() of a file (then delete the file to
tell the OS to stop updating the on-disk copy), implement mutex and event
semaphores via signal files or pipes (the O_CREAT|O_EXCL trick is mutex in a
nutshell, and filesystem timeout give you timeout behavior, we're using this in
the passwd command I believe. And the dentry cache means it never _has_ to
touch disk), and do pretty much everything threading can without relying on
pthread primitives.

It's just not necessarily an improvement. :)
Post by Denys Vlasenko
Post by Rob Landley
Post by Denys Vlasenko
if ([!]x) is definitely ok for bool and pointers. For ints,
it is sometimes good to write if (x != 0), especially if the code
aroud that place is complex-ish and it's hard to figure out
the type of x. Not a hard rule.
*blink* *blink*
My failure in English. I meant "It's not a hard rule, though", as in
"not a must". But I suspect it come through as "easy to follow rule".
Post by Rob Landley
You see a significant difference between scalar and pointer types?
...
...
...
const re_dfa_t *const dfa = mctx->dfa;
reg_errcode_t err;
int match = 0;
int match_last = -1;
int cur_str_idx = re_string_cur_idx (&mctx->input);
re_dfastate_t *cur_state;
int at_init_state = p_match_first != NULL;
int next_start_idx = cur_str_idx;
err = REG_NOERROR;
cur_state = acquire_init_state_context (&err, mctx, cur_str_idx);
/* An initial state must not be NULL (invalid). */
if (BE (cur_state == NULL, 0))
{
assert (err == REG_ESPACE);
return -2;
}
if (mctx->state_log != NULL)
{
mctx->state_log[cur_str_idx] = cur_state;
/* Check OP_OPEN_SUBEXP in the initial state in case that we use them
later. E.g. Processing back references. */
if (BE (dfa->nbackref, 0))
{
at_init_state = 0;
err = check_subexp_matching_top (mctx, &cur_state->nodes, 0);
if (BE (err != REG_NOERROR, 0))
return err;
if (cur_state->has_backref)
{
err = transit_state_bkref (mctx, &cur_state->nodes);
if (BE (err != REG_NOERROR, 0))
return err;
}
}
}
/* If the RE accepts NULL string. */
if (BE (cur_state->halt, 0))
{
if (!cur_state->has_constraint
|| check_halt_state_context (mctx, cur_state, cur_str_idx))
{
if (!fl_longest_match)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
return cur_str_idx;
else
{
match_last = cur_str_idx;
match = 1;
}
}
}
Rob, can you figure out whether fl_longest_match is a number? or bool?
or maybe a string (that is, char*) in the underlined if() statement?
A) I do python development, which means I've learned when I don't have to
_care_. :)

B) I stopped reading at about REG_NOERROR, with the mental note "there is no
way turning a !x into an X == 0 will make any significant difference to the
readability of this code.

C) ==0 doesn't tell you if it's a char, short, int, long, signed, unsigned,
and doesn't rule _out_ it being a pointer.
Post by Denys Vlasenko
The problem is that in code like this (and this is not the worst example -
it's only what I found with quick grep in uclibc) you need to grep
for the declaration, it's not easily available. A hint on the type
may be useful.
x==0 isn't any more of a hint on the type than !x is. Both imply "probably
not floating point, and not a struct". That's about it.

Code that is unclear can be fixed or commented. This trick doesn't strike me
as either.
Post by Denys Vlasenko
Post by Rob Landley
A pointer is a scalar type. You can do math on the suckers (admittedly
in base sizeof(*)).
char *fred="123";
printf("%s", 2+fred);
Prints 3. Nothing special about "fred+2" or "&fred[2]"...
You think 2+"abc" is a weird expression?
Not really, no.
Post by Denys Vlasenko
How about this? -
#include <stdio.h>
int main() { puts(&3["qwerty"]); return 0; }
Hehe ]:)
I'e read the first several years of obfuscated C code contest entries, thanks.

That's actually how I first got involved with Tinycc:

http://bellard.org/otcc/

The contest had a size limit of 2048 bytes for entries, and Fabrice Bellard
submitted a compiler for a subset of C, capable of rebuilding itself from
source code.

He won "best abuse of the rules", and then untangled his entry and expanded it
into a full c99 compiler.

Its main advantage was speed: it was so fast he added a "-run" mode that let
you set the executable bit on a C file, start it with "#!/usr/bin/tcc -run"
(plus -lpthreads or whatever else you needed), and use C as a scripting
language. Yes, with X11 and everything, if you liked.

The main disadvantage was that it produced x86 machine code directly so was
hard to retarget for different processors (or even x86-64), and while
refactoring it to have a front end and back end he could decouple from each
other (and thus have multiple code generator back ends for various
sarchitectures), he started to wonder about what kind of _front_ ends he could
swap out with. Specifically, he wanted something that would parse x86 machine
code a page at a time, translate it to native code on the fly, and run x86
binaries (specifically wine) on non-x86 platforms.

The new project this led to was called "qemu", and it sucked away so much of
his time tinycc stalled. It's been revived by windows guys, but all they do
is windows development and they don't really care about linux or non-x86
targets.
Post by Denys Vlasenko
Post by Rob Landley
if (strcmp(walrus) == 0);
{
thingy()
}
if (!strcmp(walrus));
thingy();
Me too, but because of {}s, not because of == 0.
I.E. the effect of this "clarification" on code readability is negligible at
best.
Post by Denys Vlasenko
Post by Rob Landley
I've seldom found adding extra characters helps me parse code. But then
I didn't even put lot of spaces in each line until Erik complained. (And
that wasn't because I thought it was better, that was just for
consistency.)
Iwasusingjammedtogether for(i=0;i<nregs;++i) style for quite some time.
But I eventually changed my mind.
I find that the easiest code to read is _what_you_are_used_to_. (That's the
dominant factor by an order of magnitude, and the reason coding style
documents for projects exist.)

The next easiest to read is "the most code fits on the screen", I.E. doesn't
wrap off the right edge or require you to scroll up and down to see functional
units.

Adding whitespace can help delineate functional units, both size of
indendation (two space, 4 space, 8 space) to make block stand out, and blank
lines so you can see start-of-next-thought ("ok, we stopped doing the previous
thing and started doing something else"), which lets you easily pick out
starting points of stuff when hunting around in the code. It's essentially
paragraph breaks so you can find your place in the wall of text. Curly brackets
on their own line do just as well but there aren't enough of those kind of
breaks for complicated linear code.

I prefer "!x" to "x==0" because it takes up less horizontal space (less eye
movement, less likelihood of wrapping off the edge of the screen), because the
! always has to go at the beginning and you can have x == 0 and 0 == x as
synonyms, because "x=0" means something different but looks the same and is
easy to miss (hence some gurus recommend "4 == x" programming style with the
constant on the left so you CAN'T typo up an assignment, but algebra leads
math people to phrase it the other way, so you see both in the wild), because
"x == 0" and "x == NULL" and "x == EXIT_SUCCESS" and "x == STDIN_FILENO" and
so on mean exactly the same thing but look gratuitously different...

But mostly, I'll admit, it's because it's what I'm used to. :)

Rob
--
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem
Forever, and as welcome as New Coke.
Denys Vlasenko
2010-10-28 05:04:16 UTC
Permalink
Post by Rob Landley
Post by Denys Vlasenko
Post by Rob Landley
You see a significant difference between scalar and pointer types?
...
...
...
const re_dfa_t *const dfa = mctx->dfa;
reg_errcode_t err;
int match = 0;
int match_last = -1;
int cur_str_idx = re_string_cur_idx (&mctx->input);
re_dfastate_t *cur_state;
int at_init_state = p_match_first != NULL;
int next_start_idx = cur_str_idx;
err = REG_NOERROR;
cur_state = acquire_init_state_context (&err, mctx, cur_str_idx);
/* An initial state must not be NULL (invalid). */
if (BE (cur_state == NULL, 0))
{
assert (err == REG_ESPACE);
return -2;
}
if (mctx->state_log != NULL)
{
mctx->state_log[cur_str_idx] = cur_state;
/* Check OP_OPEN_SUBEXP in the initial state in case that we use them
later. E.g. Processing back references. */
if (BE (dfa->nbackref, 0))
{
at_init_state = 0;
err = check_subexp_matching_top (mctx, &cur_state->nodes, 0);
if (BE (err != REG_NOERROR, 0))
return err;
if (cur_state->has_backref)
{
err = transit_state_bkref (mctx, &cur_state->nodes);
if (BE (err != REG_NOERROR, 0))
return err;
}
}
}
/* If the RE accepts NULL string. */
if (BE (cur_state->halt, 0))
{
if (!cur_state->has_constraint
|| check_halt_state_context (mctx, cur_state, cur_str_idx))
{
if (!fl_longest_match)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
return cur_str_idx;
else
{
match_last = cur_str_idx;
match = 1;
}
}
}
Rob, can you figure out whether fl_longest_match is a number? or bool?
or maybe a string (that is, char*) in the underlined if() statement?
A) I do python development, which means I've learned when I don't have to
_care_. :)
B) I stopped reading at about REG_NOERROR, with the mental note "there is no
way turning a !x into an X == 0 will make any significant difference to the
readability of this code.
C) ==0 doesn't tell you if it's a char, short, int, long, signed, unsigned,
and doesn't rule _out_ it being a pointer.
Strictly speaking, yes, it doesn't. If the code is written by
people enchanted with the strange logic of C/C++ that
"0 constant can be a null pointer of any type", then yes,
x == 0 can actully mean "x is a null pointer".

Luckily, I know (as opposed to "I think") that this is stupid,
because Linus also says so.

Back to my interpretation of x == 0 versus x == NULL versus !x.

Assuming I read above mess for a first time, hunting down a bug,
and don't understand the logic yet,
when I see "!fl_longest_match" I think "it's likely a pointer
or a bool, and judging by its name it's a bool",
whereas when I see "fl_longest_match != 0" I think "looks like it's
and integer - perhaps a length of the longest match seen so far".

This helps me to understand the logic faster
than if I'd need to go and find its definition.
The width and signedness of integer is less important
for figuring out the logic that the fact that it is an *integer*.

The snippet above has an example where it _has_ such
more readable comparison: "if (mctx->state_log != NULL)" -
immediately tells you that state_log is a pointer.
(By name alone it could be a bool too)
However, here it isn't useful, since the very next line
uses mctx->state_log[x] and thus reveals "pointer-ness" anyway.
Post by Rob Landley
http://bellard.org/otcc/
The contest had a size limit of 2048 bytes for entries, and Fabrice Bellard
submitted a compiler for a subset of C, capable of rebuilding itself from
source code.
He won "best abuse of the rules", and then untangled his entry and expanded it
into a full c99 compiler.
Its main advantage was speed: it was so fast he added a "-run" mode that let
you set the executable bit on a C file, start it with "#!/usr/bin/tcc -run"
(plus -lpthreads or whatever else you needed), and use C as a scripting
language. Yes, with X11 and everything, if you liked.
The main disadvantage was that it produced x86 machine code directly so was
hard to retarget for different processors (or even x86-64), and while
refactoring it to have a front end and back end he could decouple from each
other (and thus have multiple code generator back ends for various
sarchitectures), he started to wonder about what kind of _front_ ends he could
swap out with. Specifically, he wanted something that would parse x86 machine
code a page at a time, translate it to native code on the fly, and run x86
binaries (specifically wine) on non-x86 platforms.
The new project this led to was called "qemu", and it sucked away so much of
his time tinycc stalled. It's been revived by windows guys, but all they do
is windows development and they don't really care about linux or non-x86
targets.
Pity, eh? Where is the cloning machine when you need it...
Post by Rob Landley
I prefer "!x" to "x==0" because it takes up less horizontal space (less eye
movement, less likelihood of wrapping off the edge of the screen), because the
! always has to go at the beginning and you can have x == 0 and 0 == x as
synonyms, because "x=0" means something different but looks the same and is
easy to miss (hence some gurus recommend "4 == x" programming style with the
constant on the left so you CAN'T typo up an assignment, but algebra leads
math people to phrase it the other way, so you see both in the wild), because
"x == 0" and "x == NULL" and "x == EXIT_SUCCESS" and "x == STDIN_FILENO" and
so on mean exactly the same thing but look gratuitously different...
I see your point.
--
vda
Denys Vlasenko
2010-10-24 13:19:58 UTC
Permalink
Post by Rob Landley
Attached is the sha1sum.c I wrote for toybox a few years back. (Never did
quite get around to cleaning it up, extending it to support the various other
shaXXXsum sizes, or implementing md5sum. My todo list runneth over.) But
what I was going for was _simple_, and I confirmed it produced the right output
on all the data I threw at it. It's 185 lines including the actual applet
implementation and help text and everything. The busybox one is 900 lines for
the engine.
Patch: -19 bytes on x86, sums 300 mbytes in 4 seconds instead of 5.4:
--
vda

--- sha1sum.c.orig
+++ sha1sum.c
@@ -58,34 +58,33 @@
{
int i, j, k, count;
uint32_t *block = this->buffer.i;
- uint32_t *rot[5], *temp;
+ uint32_t rot[5], temp;

// Copy context->state[] to working vars
for (i=0; i<5; i++) {
- this->oldstate[i] = this->state[i];
- rot[i] = this->state + i;
+ rot[i] = this->state[i];
}
// 4 rounds of 20 operations each.
for (i=count=0; i<4; i++) {
for (j=0; j<20; j++) {
uint32_t work;

- work = *rot[2] ^ *rot[3];
- if (!i) work = (work & *rot[1]) ^ *rot[3];
+ work = rot[2] ^ rot[3];
+ if (!i) work = (work & rot[1]) ^ rot[3];
else {
if (i==2)
- work = ((*rot[1]|*rot[2])&*rot[3])|(*rot[1]&*rot[2]);
- else work ^= *rot[1];
+ work = ((rot[1]|rot[2])&rot[3])|(rot[1]&rot[2]);
+ else work ^= rot[1];
}
if (!i && j<16) work += blk0(count);
else work += blk(count);
- *rot[4] += work + rol(*rot[0],5) + rconsts[i];
- *rot[1] = rol(*rot[1],30);
+ rot[4] += work + rol(rot[0],5) + rconsts[i];
+ rot[1] = rol(rot[1],30);

// Rotate by one for next time.
temp = rot[4];
for (k=4; k; k--) rot[k] = rot[k-1];
- *rot = temp;
+ rot[0] = temp;
count++;
}
}
Rob Landley
2010-10-26 17:47:01 UTC
Permalink
Post by Rob Landley
Attached is the sha1sum.c I wrote for toybox a few years back. (Never
did quite get around to cleaning it up, extending it to support the
various other shaXXXsum sizes, or implementing md5sum. My todo list
runneth over.) But what I was going for was _simple_, and I confirmed it
produced the right output on all the data I threw at it. It's 185 lines
including the actual applet implementation and help text and everything.
The busybox one is 900 lines for the engine.
I'm reminded of the time Manuel Nova immediately sped my bunzip2
impelementation up by something like 20%. :)

I've noticed a theme, that any time I get what I consider a nice simple
implementation of something posted, other people immediately improve on it in
size and speed terms.

Rob
--
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem
Forever, and as welcome as New Coke.
Continue reading on narkive:
Loading...