This is my Title

20100515

Fun with gst-plugins

To anyone trying to figure out how to play .m4a files with gstreamer, try installing gst-plugins-faad. At least, 50 plugins later, I think that was the one that finally fixed it...

20100506

Since we have entered the "Propaganda Age," there seems to be an increasing distrust of published statements and experts in general, and statistics specifically. I think that this is deplorable, so I would like to go on record objecting to this trend. It is true that there are many dishonest ways to gather information and misleading ways to present it; however, that is only half the story. For every deceptive presentation of data, there is an honest interpretation that can refute it once all the facts are known. That holds if you accept the existence of objective reality and the consistency of rational thought, anyway (so my arguments won't reach the true laissez-faire existentialists, but what can?). If you accept objective reality, there is one set of data, and if you have enough knowledge you can interpret it, or at least be swayed by someone who presents a superior interpretation. So what you should do is verify statistics, not simply doubt all of them. Check the sources, find credible interpretations, and make yourself smarter if need be (it can't hurt).

Because in all honesty, as much as the producers of propaganda like to slant statistics, they would love it if everyone stopped believing in them altogether. Once you give up your right to rational thought and argument, all you have left are your innate emotional prejudices and opinions, which are way easier to manipulate (research autobiographical advertising and cognitive dissonance for starters).

So remember to Think!

20100505

Adventures in buffer underflow...

I wanted to play around with encrypting my swap partition recently, and was impressed by how easy it was in Gentoo.

I added 2 lines to /etc/conf.d/dmcrypt:
swap=crypt-swap
source='/dev/sda2'

and changed my fstab entry from /dev/sda2 to /dev/mapper/crypt-swap, and I was good to go. The Gentoo dmcrypt init scripts automatically generate a random key and format the partition during boot.

Then the fun started... I just wanted to make sure it was working, but the only way to do that is to use up enough memory that my system starts swapping. Not such an easy task when you have 8 GB of RAM. After a few failed attempts (even Firefox doesn't use that much memory), I figured that decompressed image files use lots of memory, and decided to open an entire pictures folder with GIMP. A few hundred windows later, I managed to crash my window server and still not use enough memory. Restart X. Now I was annoyed and determined. I'm a programmer; I figure I know how to use up memory. So I wrote a Perl script that looks like this:

#!/usr/bin/perl
@temp = <STDIN>;

Let's just load entire files into RAM. Seems simple enough. But I needed some big files. The biggest thing I could find on hand was a 2 GB AVI video. So I cat that into my script, and it doesn't use up memory fast enough. So I decided to run 3 of them simultaneously. That's gotta chew up some serious RAM, right? Well, in your memory display, you've got all these fields, like total, used, free, cached, and then you've got this little neglected field called "buffers." I've never really paid attention to it before, but apparently, when you run out of it, you have a kernel panic. (At least I think you do. As much flak as I give Windows for BSODing, at least they reliably display an error message when they crash. If you kernel panic in X, everything just stops...) The last thing I saw before the screen stopped updating was top displaying my buffer memory as 128k.

So here's what I think happened in retrospect. Perl is a text processing language. So when I read all of STDIN into an array, it's reading things one line at a time. Makes sense, right? But when I decided to send an AVI file down the pipe, I'm going to assume that AVI files don't have line-ends very often. So somewhere in my I/O chain, something was buffering gigantic amounts of data looking for a line-end. Either that, or most of the file got memory mapped (multiple times?) and it uses buffer space. I don't really know what layer actually uses that memory, but apparently it's important.
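If the line-at-a-time theory is right, one way around it would be to read the input in fixed-size blocks, so nothing ever has to hunt for a line-end. Here is a minimal C++ sketch of that idea. The name slurpBlocks and the block size are made up for illustration, and this is not what the Perl script did; it only shows the alternative under that assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole stream into memory in fixed-size
// blocks and keeps every block, so memory use grows with the input size
// no matter how rarely (or never) a newline shows up in the data.
std::vector<std::string> slurpBlocks(std::istream& in, std::size_t blockSize) {
    assert(blockSize > 0);  // a zero-sized read would never make progress
    std::vector<std::string> blocks;
    std::string buffer(blockSize, '\0');
    // read() fails on the final short block, so also check gcount()
    // to keep whatever partial data was delivered.
    while (in.read(&buffer[0], static_cast<std::streamsize>(blockSize)) ||
           in.gcount() > 0) {
        blocks.emplace_back(buffer.data(),
                            static_cast<std::size_t>(in.gcount()));
    }
    return blocks;
}
```

Calling slurpBlocks(std::cin, 1 << 20) would hold the whole input in 1 MiB pieces. Binary data is fine, since nothing in it is treated as a delimiter.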

So, I think the crypto stuff worked fine, but I couldn't verify it, so I just gave up. If someone actually has access to my swap partition I'm pretty much pwned anyway. It's not worth the instability.

So the moral of the story? Don't do stupid things on a massive scale.

Unix-isms

I recently read "The UNIX-HATERS Handbook" (which I highly recommend), and learned about a bunch of the deficiencies of Unix. To be fair, I figured I'd check out the other side, so I found a copy of the "Unix (R) System V Release 4.2 Documentation" in my local library. The following is taken specifically from the "Programming with Unix System Calls" volume.

A few gems:

The difference between /sbin and /usr/sbin

"The /sbin directory contains executables used in the booting process and in manual recovery from a system failure."
"/usr/sbin: This directory contains executables used for system administration."
Reading between the lines here, I think /sbin used to be mounted on a separate partition, for when your file system or disk failed. There's actually a lot of weird directory structure Unix-isms that hark back to either the lack of stable file systems (journaling was a revolution in disk stability), or the fact that Unix used to handle running out of disk space particularly poorly (and still does as far as I can tell). But it's still better than what it does when it runs out of memory. One of the great testaments to the infrastructure that is Unix is that killing arbitrary processes was the most stable option they had when running out of memory. But I digress...

/var is the directory for "files and directories that vary from machine to machine."

I used to think Unix was a fairly secure system until I read the UNIX-HATERS Handbook, (granted, the adoption of ACLs and the invention of SELinux have improved things slightly). I guess back in the day (you know, when Unix was being invented, back in the '70s and '80s) the big security paradigm was "Trusted Computing." The idea being, you only gave privilege to a set of "trusted software" which was thoroughly audited. This sounds great, assuming you can actually perform these kinds of audits reliably. And it certainly can't hurt to try to enumerate what you trust and to what extent. But, even assuming a lack of malice in your trusted set, one bug in something privileged is enough to do you in. Especially in Unix, where there is "one root to rule them all," and anything installed with setuid root has the potential to open a root shell and do whatever it wants. So it is telling that the entire Appendix devoted to security is called "Guidelines for Writing Trusted Software," which I can summarize as "Don't give things root access." A neat trick from the UNIX-HATERS Handbook is to mount a floppy with a setuid binary...

These guys seemed to worship I/O Streams. Granted, they were new, and kinda neat back then, but come on. Once you move out of the realm of tape-drives and modems, things get kinda hairy even for things like terminals and, heaven forbid, files. I quote from the section on file access, "A file is an ordered set of bytes of data on a I/O-device. The size of the file on input is determined by an end-of-file condition dependent on device-specific characteristics. The size of a regular-file is determined by the position and number of bytes written on it, no predetermination of the size of a file is necessary or possible." Okay, I'll grant that it's nice not to have to worry about file sizes, but making it impossible? Some input streams have finite length, and sometimes you'd like output streams to have a finite length. Being proud of the fact that you support neither seems mildly crazy. They're also quite proud of the fact that in Unix, files are "nothing more than a stream of bytes" (paraphrased, I couldn't find the exact quote), and that Unix imposes no record structures. (Old school operating systems used to make you read or write to your file in X-byte chunks (records) only, because they were stored on disk this way.) Again, it's great that they have the capability for arbitrary file structures, but (and correct me if I'm ignorant), I thought that record-structures were imposed as an optimization, for when you were doing record-oriented I/O. Otherwise, the OS can split and fragment your file wherever it feels like, instead of at a record boundary. But maybe advances in file systems and disk-drives have rendered this point moot. I'm too ignorant to say.

Those were the highlights. It's amazing to think how far computers have come in 30 years.

20100426

Amarok wins again...

So apparently, all you have to do to horribly break Amarok is embed one of these little guys in a file name: "�". If you can't see it, don't worry, it's some weird Unicode character, bytes 0xC2 0x81 in UTF-8 (which decode to U+0081, a C1 control character). Granted, it's a weird character to have in a filename, but hey, that's what robustness is all about. So, what does Amarok do when it encounters such a character?

Does it:
(a) Fail gracefully
(b) Fail silently and continue as if nothing happened
(c) Corrupt your database
(d) Wedge your CPU
(e) Spam you with error messages, and then hang
(f) All of the above (except a)

If you answered (f), you're right!!
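For the curious, here is roughly how the two bytes from that filename (0xC2 0x81) turn into a code point. This is a minimal C++ sketch of plain two-byte UTF-8 decoding. The function names are made up for illustration, and I have no idea whether Amarok or Qt does anything resembling this internally.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decodes one two-byte UTF-8 sequence.
// Returns the code point, or -1 if the bytes are not a well-formed
// two-byte sequence (wrong bit patterns or an overlong encoding).
std::int32_t decodeUtf8TwoByte(unsigned char lead, unsigned char cont) {
    // The lead byte must look like 110xxxxx, the continuation like 10xxxxxx.
    if ((lead & 0xE0) != 0xC0 || (cont & 0xC0) != 0x80) {
        return -1;
    }
    std::int32_t codePoint = ((lead & 0x1F) << 6) | (cont & 0x3F);
    assert(codePoint <= 0x7FF);  // 11 payload bits is the two-byte maximum
    // Values below 0x80 fit in one byte, so a two-byte form is overlong.
    if (codePoint < 0x80) {
        return -1;
    }
    return codePoint;
}

// U+0080 through U+009F are the C1 control characters, which are
// legal in a filename but not something a display can render.
bool isC1Control(std::int32_t codePoint) {
    return codePoint >= 0x80 && codePoint <= 0x9F;
}
```

With these, decodeUtf8TwoByte(0xC2, 0x81) yields 0x81, and isC1Control(0x81) is true, which is why the character shows up as a placeholder glyph (if at all).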

So once again, I have managed to reset all of my song ratings and preference data. "Again?" I hear you cry. Well, when I upgraded from Amarok 1.4 to the "New and Improved*" Amarok 2.0 (*The developers admit that while the UI was new and shiny, several major features that were present in 1.4 were not yet written in Amarok 2.0. They weren't really ready for a release), it decided to store all of its configuration data in a new location, ~/.kde4 instead of ~/.kde3. (Note: This may be Gentoo's fault.) So I lost my metatag database. "But," I hear you cry again, "there was an import tool!" Ah yes, but unfortunately, it was hidden deep inside the configuration menu, so I didn't find it until the week after I deleted my old database (after leaving it around for months).

Now, I use Amarok because, quite honestly, I couldn't find a better music player for Linux (and I looked really hard). The feature that finally sold me on it was the ability to filter my playlist based on search terms. At least, that used to work... Somewhere in the process of upgrading from 2.0 to 2.3.0 it stopped working. Maybe it's time for me to look around again.... I would've kept my old version, but Gentoo made it increasingly difficult to retain kde3, and I finally had to give it up.

Let me see if I can summarize some of its wonderful features for you, and compare how it's improved. (Granted, I'm probably being harsher than usual, seeing as I'm in a bad mood, because it just trashed my database!)

Features:

Playlist Filtering!

Amarok 1.4: It worked!
Amarok 2.0: It worked! (although a search is accompanied by no less than a minute of the program stalling and using 100% of a CPU....I think it's sorting things?)
Amarok 2.3: It still stalls, but the playlist no longer filters. (at least on my Gentoo setup)

Song Queueing: Queue songs at the beginning of your random play list!

Amarok 1.4: It worked!
Amarok 2.x: If you're listening to a song, and you queue some more, it goes back and repeats whatever you were listening to before you queued anything... (I am assured that this is fixed in the development sources, but 2 Gentoo releases later I have yet to see any results. Maybe Gentoo is behind...)

Queue Management!

Amarok 1.4: You had a nice window that displayed your queue. My favorite menu option was "Toggle Queued Status." If you selected things where some were already in your queue, and some weren't, you could "Toggle Queued Status" on them. No "Enqueue All," no "Dequeue All," just "Toggle Queued Status" ...
Amarok 2.x: The window has not yet been added.

Fast Startup!

Amarok 1.4: About a minute, CPU bound, on my 3.0 GHz Phenom II
Amarok 2.0: Still a minute!
Amarok 2.3: Only a few seconds! (I have to admit there was a definite improvement.) It will still hang on major operations though (like adding or removing single tracks from the playlist... I think it's sorting things?)
This isn't really a fair comparison. Apparently you're not supposed to use "large playlists" because they have "problems." I like to load my entire collection into my playlist so I can filter it, and pick random songs. They have this nifty little "dynamic playlist" that will randomly pick songs for you, but then there's no way to search your collection. I'm told it's very fast for small playlists...

XML Playlists!

Apparently, someone decided it'd be cool to dump your playlist information in XML format. Which sounds great, except it stores your current playlist this way too. Which sounds great, except it's loading and storing this from disk every time it modifies something. Which sounds great, until you realize how big XML is. And it saves backup copies. So every time I modify my playlist, it dumps a 5MB text file to the disk. I think marshalling and unmarshalling this might be the source of my 30-60 second pause when I add a track to my playlist. Or maybe it's sorting things.... (It's hard to tell. Updating a single track's metadata gets the same pause...) Then it rotates the 10 backup copies of the playlist in case I ever wanna "undo." I kid you not, though Amarok 2.x seems to have removed the backup copies... or maybe it just moved them somewhere weird. I had to special case this in my backup script, because amarok 1.4 was generating over 200 MB of updated files every time I ran rsync.

Magnatune Integration!

I'm gonna be honest, I have no idea what this is, (some kinda online store?), but they pay people to make it work. So I'm told it works very well.

Collection Management!

Amarok 1.4: They had some support for this in a side bar.
Amarok 2.x: Yeah, not so much.

Delete files from the disk!

I actually use this all the time. I downloaded most of ocremix at one point, and have been going through and deleting stuff as it comes up and sucks.
Amarok 1.4: It worked!
Amarok 2.0: It worked!
Amarok 2.3: You can no longer access this from your playlist. You have to go find the track in their excuse for a Content Manager called "Media Sources", sorted by Genre.

Weighted, Randomized Playlists!

Amarok 1.4: It worked! And was easy to find.
Amarok 2.0: They decided that having a menu option was too confusing. Instead, it's hidden under a button at the bottom of the screen that took me 2 weeks to find (and I'm not the only one). I've just been trained to ignore their random buttons.

Customized Sorting Ability!

Amarok 1.4: Their sort operation was stable (it preserves the existing order of items that compare equal), so you can fake this yourself by clicking on the columns you want in the reverse order. Again, I got pauses of a minute to do any of this.
Amarok 2.x: They added this, but didn't make it savable (unlike everything else in the UI). So it's an improvement, but if I switch to sort by ratings for a minute, I have to rebuild my 8-layer sorting preference after. Still pauses for about a minute.
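The click-the-columns-in-reverse trick works with any stable sort, not just Amarok's. Here is a minimal C++ sketch using std::stable_sort. The Track struct and its fields are made up for illustration and have nothing to do with Amarok's real data model.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical track record, just enough fields to sort on.
struct Track {
    std::string artist;
    std::string album;
    int trackNo;
};

// Sorts by artist, then album, then track number by applying one
// stable sort per key, least significant key first. Each later pass
// keeps the order the earlier passes set up among equal elements.
void sortByArtistAlbumTrack(std::vector<Track>& tracks) {
    std::stable_sort(tracks.begin(), tracks.end(),
                     [](const Track& a, const Track& b) { return a.trackNo < b.trackNo; });
    std::stable_sort(tracks.begin(), tracks.end(),
                     [](const Track& a, const Track& b) { return a.album < b.album; });
    std::stable_sort(tracks.begin(), tracks.end(),
                     [](const Track& a, const Track& b) { return a.artist < b.artist; });
    // Sanity check: the result matches a single multi-key comparison.
    assert(std::is_sorted(tracks.begin(), tracks.end(),
                          [](const Track& a, const Track& b) {
                              return std::tie(a.artist, a.album, a.trackNo) <
                                     std::tie(b.artist, b.album, b.trackNo);
                          }));
}
```

The single comparator in the final check is the saner way to do it in code. The three-pass version is the equivalent of clicking "track", then "album", then "artist" in a UI that only sorts one column at a time.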

Cover support!

Amarok 1.4: If you had a cover available, it would display it. There was also the ability to import covers from Amazon and pick the right one.
Amarok 2.x: If you don't have a cover, it displays this ugly blank CD icon. You can import covers from Amazon, but only for everything at once. And you don't get to pick.

On Screen Display!

Amarok 1.4: Cool blue, rounded box. I think it had transparency support (but maybe I'm dreaming of imaginary better days).
Amarok 2.x: Ugly Gray Rectangle (TM). No apparent transparency support. You have the ability to "Use Custom Colors!" (i.e., text only)

Customizable Layout!

Amarok 1.4: The layout was pretty static.
Amarok 2.x: Everything moves. It took me several hours over 3 weeks to figure out how to duplicate the setup they had in Amarok 1.4. It's still not perfect. Several nested layers deep you can control font sizes on the display... by guessing the field width? Random songs have an indented album name, no idea why. (Like, you see "Final Fantasy 8 OST" going down the column, and then suddenly it's all " Final Fantasy 8 OST"). Also, I somehow managed to turn "Open Script Console at Startup" on (no apparent menu option), and couldn't turn it off for months. (Apparently it saves the "you had it open" but not the "you closed it.") I finally nuked my configuration directory and it went away. Back to customizing the UI.

Automatically Find Content on your Disk!

Amarok 1.4: They had two buttons for "Update Collection" and "Rescan Collection". I still have no idea what the difference is.
Amarok 2.x: Now they just have "Update." However, there is no way to remove anything from your collection. So I had to move some of my songs outside my music folder, so that they would never play again (like my "Learning French" CDs).

Welcome to the Music Player of the Future! I could keep going, but my rage is fading. I really can't find a better player for Linux though... so I guess I gotta cut them some slack. This is free software after all, and most of them are volunteers (give or take the Magnatune guys).

In the true spirit of Open Source, I actually tried to get in and fix some of this myself. Repeatedly (after getting particularly annoyed about something or another). But there was so much weird Qt interaction that I couldn't figure anything out. I spent several hours trying to figure out how (or if) the weighted random tracks feature worked, only to conclude that if the button did anything at all, it was doing the randomization in Qt itself (or some arcane callback hidden somewhere).

Oh well, back to re-rating everything by hand....

20100415

man gcc

To whom it may concern:

gcc -O3 does NOT enable -funroll-loops, and has not done so since at least version 2.95.3 (released March 16, 2001). I don't know if it used to back in the olden days (like last millennium), but it doesn't now.

gcc 2.95.3:
-O3
Optimize yet more. `-O3' turns on all optimizations specified by `-O2' and also turns on the `inline-functions' option.

gcc 3.0.4
-O3
Optimize yet more. `-O3' turns on all optimizations specified by `-O2' and also turns on the `-finline-functions' and `-frename-registers' options.

gcc 3.1.1
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions and -frename-registers options.

gcc 3.2.3
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions and -frename-registers options.

gcc 3.3.6
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions and -frename-registers options.

gcc 3.4.6
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -fweb, -frename-registers and -funswitch-loops options.

gcc 4.0.4
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops and -fgcse-after-reload options.

gcc 4.1.2
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops and -fgcse-after-reload options.

gcc 4.4.3
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload and -ftree-vectorize options.

gcc 4.5.0
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload and -ftree-vectorize options.

This has been a public service announcement regarding gcc.
Thank you for your time. You may return to your regularly scheduled blogging.

20100401

More optimizations...

As half a followup to my last post, I just tried to recompile all of the C/C++ code I've written on my system (and a few small projects I didn't) with -O3 -ftree-vectorizer-verbose=2 to see what it could actually vectorize.

The results were rather dismal. It optimized one loop that looked like this:

for(int x=0;x<mySize;x++) {
    t[x] = myItems[x];
}

And a bunch that were constant initialization like this:

for(int x=0;x<mySize;x++) {
    t[x] = 0;
}

There were also a couple more in my BitArray that had errors, but looked like I might be able to rewrite them to be vectorizable. Granted, you spend a lot of time in loops, so a few optimizations go a long way, but overall it looks like it has a hard time working with most code.
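To make the contrast concrete, here is a minimal C++ sketch of the two shapes of loop involved. The first has the same form as the copy loop above. The second carries a dependency from one iteration to the next, which is the usual reason a loop resists vectorization. The function names are made up, and whether a particular gcc version vectorizes either one is an assumption you would have to check against the vectorizer report.

```cpp
#include <cassert>
#include <cstddef>

// Independent iterations: element i of dst depends only on element i
// of src. This has the same shape as the copy loop reported above.
void copyItems(const int* src, int* dst, std::size_t n) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Loop-carried dependency: iteration i reads the value that iteration
// i - 1 just wrote, so the iterations cannot simply run side by side.
void prefixSum(int* data, std::size_t n) {
    assert(n == 0 || data != nullptr);
    for (std::size_t i = 1; i < n; ++i) {
        data[i] += data[i - 1];
    }
}
```

Compiling a file like this with -O3 -ftree-vectorizer-verbose=2 (the flags used above) should show which loops the compiler accepted and why it rejected the rest.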