20101205

Things I've Learned from Anime

I recently watched some particularly bad children's anime on Hulu, which has inspired this list.

  1. It's important to keep your promises... even if it kills you.
  2. If you love enough, you can gain magical powers.
  3. People who are trying to kill you might become your friends if you fight them enough.
  4. Violence is a great way to show affection.
  5. Attacking your teachers is a great way to pass tests.
  6. If you're not willing to die with your friends, you don't truly love them.
  7. Repeatedly blowing up your school has no meaningful consequences.
  8. Animals are named after their favorite word.
  9. Saying random syllables after every sentence just means you have an accent.
  10. The harder you are beaten, the more likely you are to power up.
  11. Americans.... speak... really... slowly....and.... clearly.
  12. Girls can pull anything out of their purse.
  13. Girls can pull hammers out of thin air.
  14. Everyone has a tragic backstory.
  15. Flashbacks occur when you most expect them... even if you've seen it before.
  16. Fighting someone is a great way to understand them... especially if they die.
  17. Every party needs a Chinese girl.
  18. The coolest weapon is a sword.  Other weapons pretend to be swords to look cooler.
  19. The worse a girl cooks, the more often she shares.
  20. If you are different, children will bully you.
  21. Children do best when their parents are never seen.
  22. Grocery store lotteries are the great enabler.
  23. Angry women can do anything.
  24. High schools are run by student councils.
  25. Songs with more English make less sense.
  26. Tantrums are a great way to get what you want, but only if you're cute.
  27. The standard unit of skill is 100 years.  (You're 100 years too early to...)
  28. Promises made to 6 year old girls are legally binding.
  29. There's always another conspiracy.
  30. Small children always run away after you give them something.
  31. Defeat builds character, especially if you win the rematch later. 
  32. A fall from any height is safe, as long as someone catches you at the bottom.
  33. Wings can make anything fly (no matter what they're made of).
I'll post more if inspiration strikes again.

Fonts

I normally run Firefox without allowing pages to set their own fonts, so I didn't realize until recently that the template I picked for my blog has ugly fonts.  I apologize (though my readership consists mostly of me, so I guess no one was seriously harmed).  At any rate, I now have mostly no fonts, so if it looks bad, change your browser's default fonts.

20101125

Politics and Budgets

(I had a lot of time to think this week while sitting in airports, hence the posting blitz.)

While watching the recent economic turmoil, such as the collapse (so to speak) of Greece and the pending collapse of the state of California, I've come to the conclusion that Democracy is failing.  Now, I'm not much of a political science guy, so I really don't have a better system on the whole.  The really nice thing about Democracies is that they tend to be stable governments (in the sense that they resist violent revolution), since malcontents and prospective revolutionaries find it easier to work within the system than to overthrow it.  But they seem to be failing the "rational self-interest" test for short-sightedness across the board (i.e., "Will I later regret the costs of getting what I want now?").

There is a trade-off here between short-sightedness and the length of time you are willing to tolerate bad leaders.  Giving elected leaders longer terms will force them to plan farther ahead, since they will suffer more consequences when they are held responsible for the results.  So, to some extent, short-sightedness is the price you pay for avoiding being unhappily misgoverned by a leader you cannot depose for a while.  But I think there are more basic problems with the control systems that might be fixable without such a trade-off.  (Yeah, I'm a computer engineer, so ideal governments are just fancy control systems with lots of inputs.)

The most basic problem I see is the lamentable lack of an educated and self-disciplined populace.  This seems to be indirectly asserted by the prevalence of political advertising.  Everyone who puts up a political sign with "Yes on Blah" or "Vote for Me" seems to be implicitly admitting that they expect a large enough portion of the voting population to be swayed merely by repeated exposure to their signs that it is worth the expense and effort of placing them.  This, to me, is the antithesis of a well-educated and self-disciplined populace.  But short of disenfranchising such folk (which severely reduces the stability benefits mentioned above), or pumping lots of money into education (which has been shown to increase both education and discipline, but takes a good 20-40 years to kick in), I have no real solution, so I will limit my whining.

The solvable part of what I think is behind the economic troubles is that legislatures experience what Frederick Brooks calls the "committee effect."  When you get a group of people together with varying interests, there is a strong incentive to pass anything you do not strongly object to.  To use a game-theory explanation: if each committee member has an item he or she personally wants passed, and fighting against another's item will cause them to retaliate by trying to block yours, it is in your best interest to make friends and support everyone's agenda, so that they will support yours.  In engineering, this results in "design by committee," where the feature list is so bloated as to be unrealizable.  The problem here is that while each individual item may be doable, there is an inherent trade-off in constrained resources.  However, the game-theory dynamics of each individual actor in the standard committee promote neither the conservation of those resources nor any consideration of the trade-offs.  The result is that while each actor succeeds at the committee game by getting their feature on the list, they ultimately lose the real-world game when the project fails (or random features get cut, if you have a particularly aggressive engineering department).  For legislative committees, the items are often government programs, and the constrained resource is budget dollars for their implementation.  When everyone's programs pass, the government overspends its revenue, and eventually the bottom falls out and someone loses the reality game (usually the voters).

The underlying economic problem seems to be requiring a balanced budget (or at least requiring the budget to fit some fixed figure, determined by the financial policy gurus who may calculate the optimal level of overspending).  There are a few ways to do this, and I think the best solution is a hybrid approach that uses all of them.  First, you could require the legislature to balance the budget.  This effectively moves the resource-constraint problem inside the committee.  Unfortunately, it also destabilizes the equilibrium of the game theory, which quickly results in deadlocks.  The California legislature's inability to pass a budget every year is a prime example of this.  So while I ultimately think that the solution is to move the resource constraints inside the control system, you have to be slightly more clever.

Another way to get around this is to add a second, post-committee control system that intelligently handles the over-spending problem.  If you can excuse the Computer Science reference, this would be a second pass on the output.  This is akin to the engineering department that intelligently drops what they consider to be the less important features in order to make the project fit into the schedule and budget.  You've now separated the decision making from the people who should rightly be making the decisions, and lost some subtleties about the true priorities and rankings in the process, but at least you have the problem solved somewhat intelligently by an interested party.  The improvement here is that otherwise the real world could impose constraints randomly, as whatever features fail to be finished on time are cut, or the project fails and everyone loses.  In the legislature, the place to vest this power seems to be with the governor.  Any budget which passes the legislature that over-spends the expected revenue will be trimmed down to size by the governor however he sees fit.

While I think this is an improvement, I see it as falling into one of two possible traps.  If the legislature is consistently unable to pass a controlled budget, then you have just given the governor basically arbitrary control over the budget and program funding (up to the amount the legislature goes over).  To me this seems like a lot of power to stick in one place, even if it's only there as a check when the legislature has "failed".  Alternatively, the legislature could realize that it's in their rational self-interest to balance the budget themselves, and the dynamics reduce to the previous case where the legislature deadlocks.  So this system seems to oscillate between misrepresenting the true priorities (by concentrating the power in one man), or deadlocking.  But at least now we're only deadlocking sometimes.

The third approach is to change the game, so that the priorities of the actors promote balancing the budget more than (or at least appropriately with) their own personal (or constituent, if you are less cynical) agendas.  This is kind of tricky, especially given the term-length/responsibility trade-off I've mentioned above.  But I think that one way to improve this is to give voters an incentive to care about the state budget, by raising taxes across the board to fund any budget overages.  Especially if this is done on a yearly basis, so when the taxes shoot up, or a new unpopular program gets started, it's fairly easy for an individual voter (or their favorite political media) to figure out why, and calculate their expected return for canceling the program.  Once this is well-understood enough that political candidates can use a rival's spending preferences against him, it should help coalesce the legislative agendas with the true cost to society, while avoiding the long-term overspending trap.  It might even produce better-educated, engaged, and disciplined voters, but that's just a pipe dream of mine...  I don't see a particular downside to this (besides the difficulties of getting it implemented), but it's unclear to me how much of an effect this would have by itself on the voters.  If nothing else, it would reduce the debt-incurring costs of overspending, even if the government itself is not made more efficient with respect to society's true wishes.

But I think that you can combine the strengths of all of these into a hybrid system.  First, require tax increases to cover any spending deficits, putting an overall constraint on the system.  Then, allow the governor the power to cut funding at will from any program if an over-spending budget passes the legislature.  To avoid the power issues, give the legislature the power to override the "budget veto" with a super-majority.  Together, this would establish a hard cap on the over-spending from the tax increases (with whatever accompanying changes in voter priorities that come out of it as a secondary benefit), allow intelligent decisions to be made about where to cut over-committed resources during legislative failure, move the constraining problem inside the committee deliberations (with the "out" of passing an over-spending budget to avoid deadlocks), and not concentrate too much power in the governor by allowing a legislative override.

I realize the current economic situation is much more complicated than government overspending, but this seems to be a recurring theme in modern democracies.  But I'm approaching this from a control systems angle with a little bit of game theory thrown in for the politics.  Anyone have a better idea?

20101124

Fun with Fedora

I don't want to be the guy who rants all the time, because life's too short to spend it complaining.  But I had such a negative experience that I'm going to share it anyway.  Thumper's Mother can scold me later...

I had to test some things for work on the shiny new Fedora 14 release, and decided that it was the most brain-dead and hard-to-set-up Linux distribution I've ever worked with.  (After some reflection, the only other serious contender was Topologilinux, whose built-in upgrade system used to leave the installation unbootable.  But that was back in 2002, and since they're essentially some fancy installation scripts around Slackware, I'll cut them some slack (no pun intended), since I don't expect them to maintain the entire Slackware repository.)  All I wanted to do was get VMware tools installed properly, and to do that I needed to install gcc and some kernel headers, which sounded simple enough.  I'm willing to overlook (what I perceive as) the grave crimes of any distro that doesn't ship with gcc and the kernel headers included, because I realize that not everyone uses systems the same way I do.

Now, granted, I've never worked with Fedora or RedHat before, and Gentoo's package management system has pretty much spoiled me for life, so I was prepared for a bit of a learning curve to get everything set up properly.  The first task was networking.  I'm behind the firewall-proxy setup at work, so I didn't expect networking to work out of the box.  I'm even willing to overlook the fact that there is no happy graphical utility to set a system-wide proxy configuration.  Nobody else seems to do this either, despite the fact that it would just have to dump two environment variables to a config file.  But, on a modern Linux distribution, and especially on standard hardware environments like a VMware virtual machine, I expect my network devices to come up under DHCP without any user intervention.  So I wasted several minutes trying to play with proxy settings before I realized that I had a more fundamental problem: eth0 does not start by default.
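For the record, the fix boiled down to something like the following (these are the stock Fedora paths as I remember them, and the proxy host is just a placeholder; your setup may differ):

```shell
# What finally got the network going (sketch, not a script to paste blindly):
#
#   ifup eth0                 # bring the interface up now (as root)
#
# and, so it survives a reboot, in /etc/sysconfig/network-scripts/ifcfg-eth0:
#
#   ONBOOT=yes
#
# plus the proxy, exported for every shell that needs it
# (proxy.example.com:3128 is a placeholder for your real proxy):
#
#   export http_proxy=http://proxy.example.com:3128
#   export https_proxy=$http_proxy
```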

The next thing I expect from a software system is reasonably descriptive error messages.  If there's a network problem, I expect to see something to the effect of "Unable to connect to fedora.com," or even "Unable to download package database" or even "Plug in your network cable you moron."  But the message yum gave me was something like "Error: Cannot retrieve repository metadata (repomd.xml)," which just doesn't help me at all.  Now, maybe they tried, since the word "retrieve" is in there, which sounds vaguely networky, but it's not terribly descriptive.  It sounds like the kind of generic error message you get when the programmers were either too lazy to implement proper error handling, or the errors are obscure enough that the programmers could not have reasonably predicted them.  So I went digging around with yum, trying to rebuild package databases, and running obscure maintenance-looking commands, trying to figure out what could possibly be weirdly screwed up on my stock install.  After fixing my proxy settings (aka, copying the environment variables so many places, yum has to get it right), I was confronted with a different error message.  Now yum was listing websites and telling me it was failing to connect with them.  Somehow, after giving yum access to the Internet, it was finally giving me an error message that looked definitively network related... go figure.  Clearly something else was wrong.  I managed to convince myself that the sites in question really did exist (at this point I was praying they weren't dumb enough to ship with dead mirrors), and that I really could connect to them through the proxy and the firewall at work.

After longer than I'd like to admit, and double and triple checking the network connectivity and proxy settings, I gave up and started digging around on Google.  I finally found some other poor saps looking at the same error message who had managed to find a fix.  Apparently, and for reasons I cannot adequately explain, yum is unable to access HTTPS sites out of the box.  This is, as you might imagine, a severe limitation when all of the default mirrors include https:// in the URL.  So, I went and hand-edited my repo list, and converted all of the https:// to http://, and prayed that nobody was running an https-only mirror.  That finally worked for me, and I was able to start the onerous process of matching up real-world package names like "gcc" and "kernel headers" to the cryptic, numbered formulas that seem to be the best non-Gentoo package managers can offer.  I'm really hoping that this was an issue with the Squid proxy at my work, and that the Fedora folks didn't ship a release that was unable to validate SSL certificates.  For that matter, I'm not completely sure what added security running over HTTPS really gives you.  I'm hoping they checksum the binaries, so the only advantage I see to HTTPS is to make it so that people upstream from me can't tell exactly which packages I'm downloading...
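The hand-edit reduces to one sed pass over the repo files.  Here's a sketch that works on a scratch copy, so it runs without root (on the real system the files live in /etc/yum.repos.d, and the mirror URL below is a made-up stand-in):

```shell
# Build a scratch .repo file to stand in for /etc/yum.repos.d/*.repo.
repodir=$(mktemp -d)
cat > "$repodir/fedora.repo" <<'EOF'
[fedora]
name=Fedora
baseurl=https://mirrors.example.com/fedora/releases/14/
EOF

# The actual fix: one substitution over every repo file.
# (GNU sed's -i edits in place; add a suffix like -i.bak to keep backups.)
sed -i 's|https://|http://|g' "$repodir"/*.repo

grep baseurl "$repodir/fedora.repo"   # now reads http://mirrors.example.com/...
```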

Meanwhile, I had another problem: yum was deadlocking itself.  Apparently, yum will happily spin for over 10 minutes, trying to acquire some internal yummy lock.  Furthermore, Unix file-locking being what it is, killing the competing processes doesn't release the lock.  So, I had to go Google around for the lock files yum uses, so I could kill all the yums on my system, free the locks, and try again.  (I had by this point accumulated quite a number of hung yums.)  Somehow, this was unsuccessful.  So I logged out, and restarted my X session.  No luck.  I rebooted.  No luck (after reworking the solutions for eth0, and the proxy above).  I finally figured out that whatever system-tray icon their default session launches to inform me of all the wonderful updates I'm missing was also running yum, and hanging in some new and interesting way.  I never solved that one.  But if I killed it, and all my hung yums, and cleaned all the lock files, finally I could install packages.  However, mistyping a shell command would cause my shell to hang indefinitely... I can only assume it was asking yum what I meant to type, so it could offer to install it for me, but that yum was still horribly broken in some unusual way.
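The cleanup ritual amounted to: kill every hung yum, then remove the stale lock file by hand.  A sketch of the stale-lock check, using a scratch lock file so it can run unprivileged (on the real system the file I found was yum's pid file under /var/run, but I'm going from memory here):

```shell
# Pretend a dead yum left its pid file behind.
lock=/tmp/yum.pid.demo
echo 999999 > "$lock"           # a pid that (almost certainly) isn't running

# If no process with that pid is alive, the lock is stale: remove it.
# (On the real box this came after pkill-ing the hung yum processes.)
if [ ! -d "/proc/$(cat "$lock")" ]; then
    rm -f "$lock"
fi
```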

This is the Linux distribution that gets all the money?  I fear for our novices...

20101003

Fun fact of the day: Ping

Pressing "Ctrl+\" while running ping will display the current statistics without quitting.
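The mechanism, as far as I can tell, is that Ctrl+\ makes the terminal send SIGQUIT, and Linux ping installs a handler that prints its running statistics instead of dying.  You can pull the same trick in your own shell scripts with trap; a toy version:

```shell
# Report progress on SIGQUIT without stopping, the way ping does.
trap 'echo "progress so far: $n"' QUIT

n=0
while [ "$n" -lt 50000 ]; do
    n=$((n+1))
    if [ "$n" -eq 25000 ]; then
        kill -QUIT $$      # simulate pressing Ctrl+\ halfway through
    fi
done
echo "done: $n"
```

Run it, and it prints "progress so far: 25000" mid-loop and then finishes normally.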

You may now return to your regularly scheduled programming...

20101002

The Design of Design

I recently finished reading The Design of Design: Essays from a Computer Scientist by Frederick P. Brooks (better known for his classic text The Mythical Man-Month: Essays on Software Engineering).  I would highly recommend it for any computer science students, and even any other technical discipline that deals with design work.  While most of what he says about design isn't anything particularly new or sensational, he manages to cover the breadth of design wisdom and put it all into a coherent framework, with lots of examples and stories to back it up.  It's also a fairly quick read.  My only complaint is that about half the book is an account of several iterations of him designing his family beach house.  That was boring enough that I skimmed or skipped most of it.  He used it mainly to illustrate some of his design principles in a way that was accessible to a general audience, and I'm sure it would be more useful to people who are hearing some of his design philosophy for the first time.  More useful was his analysis of some of the early IBM computer architectures that he worked on, and some discussion of the constraints and trade-offs involved.

Another chunk of the book was devoted to discussing the design process as it interacts with an institutional framework such as a business.  Having recently started a full-time job as a computer programmer, I found this to be the most interesting part of the book.  He balanced theory with examples and lots of practical advice, which is fairly rare among most of the people I've seen try and cover the topic.

Of uids and shell scripts ...

So, I've got this extensive collection of shell scripts that I've written sitting in ~/bin, that I use all the time.  In fact, I like them so much that I wanted to be able to call them while I was root.  But that's a huge security hole, because it means that anyone who has become me and can edit my scripts now has a pretty good chance of getting me to run them as root.  Granted, if someone actually owns my user account, I already have some serious problems, because in all reality I care about my data much more than I care about "root access" by itself.  But, seeing as I wrote all of these scripts myself, there's a reasonable chance that there's some egregious bug in them that I don't want to run as root anyway.

I used to just put my user scripts dir in my path as root, figuring that I'm the only one who's ever me, and I usually have physical security of my box.  But as I started copying my scripts to more and more computers (including my linode, which I don't  have physical control of), this started making me nervous.   So eventually I solved this problem by making two copies of my scripts, one to sit in /root and one to sit in ~/bin.  This worked great for security, but now I had to maintain and sync two copies of this stuff, and bug fixes weren't propagating quickly.  But I didn't have a better way to do things.


I was talking to one of my friends yesterday, who told me that he uses "su -c" for that problem.  That lets you run a command as another user, so you can drop root privileges.  If the syntax were a little different, this would be almost exactly what I wanted, but unfortunately I play enough games with arguments that I wanted to be able to pass ./script 1 2 3 'a b c' correctly, and I couldn't figure out how to do that on a shell level.  I also don't want to worry about sanitizing arguments with spaces or escape sequences or anything.
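To make the problem concrete: su -c hands its command to the target user's shell as one flat string, exactly like sh -c, so argument boundaries with spaces don't survive naive interpolation.  A demo, with sh -c standing in for su -c so it runs without root:

```shell
# The arguments I want to forward, including one with embedded spaces.
set -- 1 2 3 'a b c'

# What a naive wrapper does: flatten the arguments into one string and let
# the inner shell re-parse it.  'a b c' gets split back into three words.
sh -c "printf '<%s>\n' $*"      # six arguments come out

# What I actually wanted: boundaries preserved.
printf '<%s>\n' "$@"            # four arguments come out
```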

So I finally convinced myself that the right way to do this was in C (after a failed attempt to find perl commands that would let me do this).  It actually turned out to be way easier than I expected.  Essentially I can just call setuid, pull off the command line arguments, and pass them to exec, and I have a program which runs a command as the specified user.  I ended up pulling some library routines out of the "su" utility to do some environment sanitization, but other than that the code is a few system calls and some error checking.  If I ever clean it up, and make sure I've preserved all the copyright notices, I'll stick it up here.

But here's the gist of it (error checking removed):
Called as "suwrap user command args"

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pwd.h>
#include <sys/types.h>

/* argv[0] = suwrap, argv[1] = user, argv[2..] = command and its args */
#define COMMAND_INDEX 2

int main(int argc, char **argv)
{
    int x;
    char **args;
    struct passwd *p;
    int argsize;

    /* look up the target user */
    p = getpwnam(argv[1]);

    /* minimal environment sanitization (the real version borrows
       addenv() from su; setenv is the portable equivalent) */
    setenv("HOME", p->pw_dir, 1);
    setenv("USER", p->pw_name, 1);
    setenv("LOGNAME", p->pw_name, 1);

    /* drop privileges to the target user */
    setuid(p->pw_uid);

    /* copy the command and its arguments into a NULL-terminated vector */
    argsize = argc - COMMAND_INDEX + 1;
    args = malloc(argsize * sizeof(char *));
    for (x = COMMAND_INDEX; x < argc; x++) {
        args[x - COMMAND_INDEX] = argv[x];
    }
    args[argsize - 1] = NULL;

    /* exec inherits the environment we just sanitized */
    execvp(argv[COMMAND_INDEX], args);

    /* No one will ever see this... (exec only returns on failure) */
    printf("\nDone!\n");
    return 0;
}


I don't know if I should feel proud or disturbed that I've progressed from writing shell scripts to C code, but I definitely feel that I have crossed a Unix threshold here...

20100929

My poor neglected Blog...

Ok, the Amazon context ads for my blog are showing "Eric Clapton" and books on massage.  Clearly I need to add some more geeky posts.  I'll see what I can do.  (Of course posting this message will only convince the AI that it was right...)

20100623

Book Review: Priests and Programmers

I recently finished reading Priests and Programmers by J. Stephen Lansing, a book on irrigation practices and native religion in Bali.  To be honest, I found this book in the $1 used book bin at a textbook store outside UC Irvine, and I thought they were talking about computer programmers.  So I bought it expecting to find some Orwellian tale of a totalitarian government using religion and technology to blind its populace.  Turns out they mean "programmers" as in "people who make up schedules," in this case, the priests setting the irrigation schedules.  Once I started reading it though, it was so interesting I kept going.

Apparently, the island of Bali (located in Indonesia) has an amazing irrigation system that has to be carefully maintained in order for anyone to grow any rice.  During the rainy season, there's water enough for everybody, but during the dry season, the water usage has to be regulated so that all of the farmers get enough water for their rice.  This gets slightly tricky since rice needs lots of water at the beginning of its growth cycle, and very little at the end, so if you stagger the planting of the various farmers in the region, you can minimize the water shortages and maximize the rice.  The real trouble comes from the rice pests, which hop from field to field as long as somebody nearby is growing rice.  So to control the pest population you want to coordinate nearby fields so that they all lie fallow at the same time and the bugs and things die.  Striking the correct balance between synchronizing and staggering the fields is a complicated process that has to be tuned to the current conditions like rainfall and local weather.

In Bali the farmers attach mystical significance to the river and irrigation waters as being the property of the various gods and goddesses that inhabit the island.  As such, all of the water usage is coordinated by priests that inhabit temples built at the source of each canal, river, and lake.  These are called water temples, and serve to coordinate the water usage and planting cycles of all of the farms that belong to a particular watershed.  Lansing also argues that the religious rituals of the Bali people that involve the mingling of holy water from the various springs and lakes attached to the temple, symbolize the reliance of all these separate farmers and peoples on each other, and create a sense of group consciousness of the responsibility and social implications associated with water use.

Besides offering detailed accounts of the relevant religious ceremonies and an analysis of a computer model demonstrating the impact of water temple coordination with pest cycles, Lansing also tells the history of Bali and its water temples.  The Dutch arrived and eventually conquered the island in the 1850s or so, and tried to manage the local populace and the rice production.  However, they failed to notice the role of the water temples in coordinating the farming activities, because they tended to discount anything related to the local religions.  So the system kept functioning undisturbed until the "Green Revolution" hit Bali in the 1970s and 1980s.  Bali was now being run by Western-educated Indonesians, who didn't understand the roles of the water temples, and started disrupting the traditional farming practices by introducing new high-yield crops and farming technologies.  This quickly resulted in disaster as pest populations swarmed out of control and devastated rice yields.  Farmers who resisted the transformation were written off as religious conservatives who were fighting change.  (Granted, it's not clear to me that the farmers or the priests completely understood their role in regulating the ecology of the rice fields, so I don't think you can completely write off the Western influence as being unduly dismissive of the traditional practices.)  At the time of the writing, the government of Bali was starting to study and understand the role the water temples play in managing rice production, and beginning to allow them greater control in regulating it.

All in all it was a good read.  There was a little bit of social-science style "Marxian Analysis" at the beginning and the end, but the rest of it was just anthropological study, history, and ecology.  I think it's definitely worth the few hours it takes to read it.  (It's not quite 200 pages altogether).

20100622

The future of micro-pricing schemes...

I just read The Man Who Could Unsnarl Manhattan Traffic on Wired, about one man's attempt to model New York City's traffic system and find ways to improve it.  He's released a giant spreadsheet outlining his findings, along with a plan to improve the efficiency by introducing a toll system to charge drivers entering high congestion areas.  The article itself was a fun read, but what really got my attention was the stuff at the end on micro-pricing schemes.  The idea is, you can put GPS technology in everyone's cars, and then charge people on a per-street basis for where and how much they drive or park.  From a theoretical modeling stand-point, this level of control always looks amazingly efficient, because you can optimize prices to more accurately reflect the cost of maintaining that stretch of road, and the cost to society of the delays you cause on other drivers during congestion.  You get to decide, at 4pm rush hour, whether or not it's worth $5 for you to drive home the quick way, and in return, society has a quick way, for the people who are willing to pay $5.

Another industry that could benefit from a serious micro-payment system is online newspapers.  Newspaper "pay-walls" repeatedly fail, and the newspapers blame the public's unwillingness to pay for anything on the Internet because they feel entitled to it.  That's probably at least part of the problem, but I don't think that's the whole story.  I pay for things on the Internet all the time, but I'm not willing to shell out $50, or even $20/year, for a subscription to some random newspaper.  I get most of my news from links on aggregator sites, things like Google News, Slashdot, and the few dozen blogs and RSS feeds I subscribe to on Google Reader.  Now partly that's increased competition driving the prices down, but mostly I just don't read enough articles from a particular source.  Now, over the course of a year, I probably read a few hundred New York Times articles, and they're usually very good.  I could even assign a value to the amount they enrich my life, and would easily be willing to pay $10-$50 (depending on my mood) for the privilege of doing so.  But each individual article contributes relatively little to the total, perhaps 10 cents for a good article.  So when facing a pay-wall to read any particular article, there's no way I'm going to shell out $10 in order to read it.  While it might rationally be in my best interest, I don't have enough information on-hand to feel justified making a commitment to read $10 worth of articles over the next year.  This changes when you're going to a particular site frequently enough that you know you'll get your money's worth.  I actually bought a one-year online subscription to Scientific American for $50, because I'm signed up for their RSS feed, and I read at least 20-30 articles a week off of their website, so I felt fairly confident that I could read $50 worth (in value to my life) of articles in a year.

As a small aside, I ended up canceling my subscription, because it didn't end up doing me any good.  What I ended up buying was a PDF version of their magazine (complete with the usual ads) every month.  I don't sit down and spend a few hours reading a magazine, I spend 10-30 minutes in a stretch reading a smattering of individual articles off of my RSS feeds.  The subscription wasn't at all integrated with their website, and when I would click an article off of my feed and hit a pay-wall, it would redirect me to a page letting me know how I could buy the PDF and read the article.  But I already had the PDF; it was just too much work for me to go dig it up, and figure out what page the article was on, and dodge the advertisements in order to read it.  So I would move on with my life to the next thing on my feed.  I ended up not doing anything with the PDFs, and still reading the same articles I was reading for free off of their website.  One day I finally found an article that looked cool enough for me to track down in the PDF, only to find that I had moved or deleted it (having realized I never used it), and thought I could just download it again.  But, alas, your download link is only valid for a week or two, so they wanted me to pay them another $15.  That was the last straw for me.  I canceled my subscription and they were nice enough to refund me $30 of it.  Kudos for that at least.


Personally, I don't think the business end of the publishing companies and news agencies has caught up with the change in people's reading habits due to technology.  They're still thinking in yearly subscriptions, when people are now thinking in individual articles.  The conundrum they always face with these pay-walls is that to make people realize how much value they're getting from The New York Times, you need to let them see enough content for free that they feel justified in committing to a subscription, but not so much that they can just get it all for free.  What they really need is a micro-payment system that works at the article level.  If a pay-wall could simply take 5 cents out of my Paypal account (so that it's site-independent), then buying the article becomes a legitimate decision that I feel qualified to make on the spot.  They tried to do this with online advertising, where the advertisers essentially pay my 5 cents for me, but between the bottom falling out of the online ad markets, and the increased competition from every paper in the world (as opposed to the smaller local papers you can actually buy physically at the store), they still seem to be failing.  I think moving to an article-based model would actually help them, but it requires a good micro-payment system to be in place.  Of course, it also requires the news agencies to step up their game and compete with each other on the quality of their articles.  When you can buy just the headline articles, they don't get to bundle all their filler with them.  I mean, honestly, have you ever read the "Cooking Tips" articles in the back, and if so, would you pay for them?  So while the news agencies might make less money selling you "Pet Diet Tips", the actual efficiency of the system increases, because they stop writing articles nobody cares about.


But practically (in my opinion), the problem with all of these micro-pricing schemes is that people have a hard time managing that many micro-decisions.  First, they have a hard time making that many decisions optimally.  It's one thing to decide on a given day whether you want to pay $5 to take the toll-road to work and save an hour of commuting time.  It's another thing entirely to make block-by-block decisions while driving, about whether it's cheaper to turn left now or go an extra block and then turn left, and to do so at every turn.  It's just too many decisions for a person to be bothered making.  Granted, the granularity of the pricing scheme can be adjusted, but the coarser you make the controls, the less efficiency you gain compared to the theoretical ultra-fluid model.  Even if the number of decisions is reduced to a manageable level, the costs of each option have to be large enough that people can realistically assess them.  For instance, if taking a particular one-block detour from my normal route will save me 5 cents but cost me a minute of time, it's a toss-up whether the 5 cents is worth the mental strain of the detour plus the extra minute.  If I were particularly motivated, and I traveled the route every day with predictable prices, I might justify a permanent detour in the name of saving myself $10/year at the cost of an hour of my time.  But I'd have to really care.  And if you push the granularity a step further, so the trade-off is between five tenths of a cent and a few seconds of my time, I'm really unable to make the call, and for stakes that small, I don't even care.  Again, on a larger scale, making a consistent decision for every trip I take over a year might add up to something meaningful, but it's really hard to tell that from the pile of micro-decisions.
And that's before adjusting these fluidly to account for the fact that my priorities may be drastically different when I'm late for an important meeting than when I'm on my way home from the park.  At some point, the fine-grained control that these super-efficiency gains require breaks down when it demands that people make these sorts of decisions continually, and rationally.

But even setting aside the times people make irrational decisions, either because they can't tell or because they don't care, let's suppose they can decide rationally often enough to produce efficiency gains.  The next problem with a practical implementation is the perennial customer-service favorite: the Bill.  The cellphone companies are getting a taste of this with the consumer backlash against what is essentially a flat-rate micro-payment system for your voice and data traffic.  Discrete, simple choices lend themselves to customer satisfaction.  Assuming I get what I think I paid for, when I spend $50 on a chair, I did so because I knew the chair would be worth more to me than $50 and whatever else I could exchange it for.  At the end of the day, I'm happy with my chair, or at the very least, I feel responsible for having made the decision.  I'm not surprised when the bill comes at the end of the month, because I remember making the call.  If I eat out at a fast food restaurant a few times a month, I may not remember the exact number of visits, or how much each trip cost, when I get my credit card bill.  But I at least feel confident that each time I went there, I decided it was worth $5 at that moment to get my taco.  But if I visit 2,000 websites on my cellphone and get a bill at the end of the month for $30, I may not feel so confident that I visited each of those websites with a full understanding of the consequences.  Partly this is a lack of transparency.  If I could see the cost of a page before I went there, or at least track my progress (debt?) over time, I could get enough feedback to self-regulate my usage.  That would help keep my overall use and overall cost in the same ballpark, but I'd still have a hard time looking at my bill at the end of the month and deciding whether the 2,000 micro-decisions I made were each justified.
Now take out the flat rate, as in the case of the road networks or newspaper articles.  Imagine that your data-usage fees varied by time of day, day of the week, and which websites you visit.  Instead of the one-shot decision of 3 cents per Google search, now you're faced with scheduling issues, where waiting an hour for your search might only cost you a penny, or you can do it in 15 minutes for 2 cents.  As the complexity of these decisions moves beyond people's ability and motivation to make them, your satisfaction and confidence in your bill decreases.  This is why the cellphone companies use block subscriptions to sell their service.  People buy more than enough minutes or bandwidth to cover their needs, and can then make the single decision that $30/month is worth their overall usage.  (It doesn't help that the cellphone companies are predatory, and quite happy to charge you obscene rates for going over your plan.)  But the efficiency of these networks suffers as a result.  With a subscription plan, everyone calls during peak hours, because they have no incentive not to, and the services get overloaded.  Things like free "night and weekend minutes" are a halfway measure to recover efficiency (and a great marketing ploy) by offering a monetary reward (free minutes, and therefore a reduced bill) to people who move nonessential calls to times when the network is idle.  The trade-off, then, is between the unpopularity of a complex pricing scheme and the efficiency gains that are lost or only partially achieved without one.  And that assumes there really were efficiency gains, on average, from an unpopular pricing scheme.  Otherwise, if people end up not caring, or acting irrationally because the decisions are beyond their abilities, you simply end up with capricious and opaque pricing schemes on top of the normal inefficient usage patterns.  Now the micro-management system is useless and unpopular.

That's why I think micro-payment systems imposed on people are doomed to fail.  But they have a shot if people can express their preferences, and let a computer make the hundreds and thousands of micro-decisions for them while protecting their interests.  The most talked-about example of this is probably the so-called "Smart Grid".  Power companies are faced with incredibly volatile and complex markets.  Every time you flip on an appliance, a huge amount of power suddenly starts flowing into it, creating what's essentially an electrical shock wave that travels all the way back to the power plant.  If they don't compensate for all these jolts to the grid, the system collapses, and everybody loses power and/or fries their toasters (both literally and figuratively).  A number of years ago, someone documented that relatively large spikes would occur during television commercial breaks, as everyone left their televisions to go turn something on or off.  (I can't find a source for this right now, but if I do I'll put it in.  I'm fairly confident in the assertion, however.)  The power companies also have to match total power output with the amount people are using, to avoid either wasting power or causing blackouts and brownouts.  So, as demand spikes during the day, they turn on increasingly expensive generators to make sure it all works.  But if everyone washed their clothes at 3 AM, when all the lights were off and the businesses were closed, it would be much cheaper for the power companies, because the total load on the grid would be greatly reduced.  But currently there's no way to communicate such a complex set of power pricing to consumers, and even if there were, there's no way to measure usage against it so that the bills could be computed.  And after you solve those, you still have my earlier points on the failure of micro-pricing schemes.
What the designers of smart grids hope to do is put electronics into every appliance and home on the grid, to report and coordinate power usage and pricing.  The dream is that your computer or heater could enter "low power mode" during peak usage times, or short bursts of strain on the grid, and your dishwasher and laundry machine could wait to turn on until rates were cheaper.  The power company would then return some of the savings to you in the form of cheaper power at non-peak times.  Assuming you could get all the electronics in place, I think there's hope for regulating continuous things like air conditioners.  Almost everyone I know already regulates their thermostat and air conditioner usage to save money on power.  If they could either get a read-out of the current, actual costs, or set preferences on their computer, they could only do it more efficiently.  Imagine if you could tell your air conditioner not only to turn off when you're not home, but to try to keep your monthly bill under $20, perhaps while keeping the house under 85 degrees at all costs.  At that point it's just a question of user interface, to specify what you're willing to spend and tolerate.  I have less hope for sporadic things like dishwashers and laundry machines.  I guess I wouldn't mind my dishes being done by morning, but if I have to go switch my clothes into the dryer, I want to know beforehand when it'll be done.  But at any rate, there's hope.  I would do my laundry at 3 AM if you paid me enough.
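A preference like that is really just a tiny decision rule. Here's a minimal sketch of the idea in Python; the price feed, dollar figures, and thresholds are all hypothetical, and a real smart-grid controller would be far more involved:

```python
def should_run_ac(temp_f, price_per_kwh, spent_this_month, budget=20.00):
    """Decide whether to run the air conditioner right now.

    Hard comfort limit: always cool above 85 degrees, whatever it costs.
    Otherwise, only run when we're under budget and power is cheap.
    All numbers here are illustrative, not real utility rates.
    """
    if temp_f > 85:
        return True               # comfort at all costs
    if spent_this_month >= budget:
        return False              # budget exhausted; coast until next month
    return price_per_kwh < 0.10   # only cool on cheap (off-peak) power

assert should_run_ac(90, 0.50, 25.00)      # hot: run regardless of cost
assert not should_run_ac(80, 0.05, 25.00)  # over budget, not hot: wait
assert should_run_ac(80, 0.05, 5.00)       # cheap power, under budget: run
```

The point is that once the preferences are written down, the thousands of per-minute pricing decisions never reach the human at all.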

The salvation for micro-priced road usage is the advent of trip-planning GPS navigators.  Whether you're planning your trip on Google Maps, which advises you that you can save time and money by waiting an hour to leave, or informing your portable GPS device that you value your time at $15/hour, the computer can process all of the possible route information and optimize it for you.  Saving five tenths of a cent for a 3-second delay may not matter much to you at any given moment, but saving $20 for an hour of driving time over the course of a year might be worth it.  At any rate, now you have the capability to make that call.  Run one of these in conjunction with the ability to look up your current account status, and you've got a shot at a workable system.  Particularly if, before each trip, you can see the options and prices to choose between, or if your bill at the end of the month shows the pricing and timing differences had you selected one of the "economy" or "speed" modes.  Granted, this all really only matters in big cities and other high-congestion areas, but those are also probably the only areas where you would want to implement some kind of variable toll or taxing system.
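The trade-off the navigator would compute is simple arithmetic: price each route as its toll plus its travel time valued at your stated rate, then take the minimum. A minimal sketch, where the route data and the $15/hour figure are just illustrative:

```python
def best_route(routes, value_of_time_per_hour=15.0):
    """Pick the route minimizing toll + (travel time priced at your rate).

    `routes` is a list of (name, toll_dollars, minutes) tuples -- a
    hypothetical stand-in for whatever the GPS actually knows about.
    """
    def total_cost(route):
        name, toll, minutes = route
        return toll + (minutes / 60.0) * value_of_time_per_hour
    return min(routes, key=total_cost)

routes = [
    ("toll road", 5.00, 30),        # $5 toll + 0.5 h * $15/h = $12.50
    ("surface streets", 0.00, 60),  # $0 toll + 1.0 h * $15/h = $15.00
]
assert best_route(routes)[0] == "toll road"
# Someone who values their time at $8/hour makes the opposite call:
assert best_route(routes, value_of_time_per_hour=8.0)[0] == "surface streets"
```

The individual half-cent decisions are too small for a person to care about, but the aggregate optimization is exactly what a computer is good at.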

Newspapers are harder.  I'm not sure what per-article production costs a newspaper has, so it's hard for me to say whether they could charge enough to compensate for the loss of subscriptions with their mandated filler purchases, or how much they actually make off the in-page advertising.  I honestly think a micro-payment system through Paypal could work if the prices were small enough.  But sometimes it's hard to tell whether an article is worth $0.15 before you read it, and you still have the problem of getting your bill at the end of the month and not quite being able to account for where it all came from.  Computer technology is less able to help us here, because even with access to the full text, computers aren't yet good enough to tell us how good an article is (on average), and predict whether we'd like it.  I think there is hope down the road if AI technology continues to improve, however.  I don't think it would take much to integrate a payment system into Google Reader.  Individual articles could have some kind of "paid content" icon, with a listed price next to it.  The AI could process what kinds of articles I read, and let me rate them and set the price levels I'd be willing to pay, to train it.  All the "Web 2.0" sites are doing this already so they can data-mine people's preferences for advertising.  Just take some of that technology and pump it into predicting the price levels I'm willing to pay.  Then let me set maximum price thresholds or monthly budgets, view my progress against them, and let the AI automatically filter content that is either outside my price range or costs more than my expected interest in it.  Perhaps have a "premium" category of things I'd like to see the headlines for, but be warned about the price before I click them.  Add a button to let me flag "this article was stupid/over-priced" and you've got a shot at a workable system.
Cable companies and newspaper companies have always gone with subscription services to "networks" of content (be it channel line-ups or a series of newspapers), but I really think that with the Internet proliferating the number of content/article providers, there are too many with too little control to make subscribing to a giant "news and content network" plausible.  The only way I think it could work is with a micro-payment model, perhaps coupled with an AI to help keep my bill within reasonable bounds.  The big problem is that it's so hard to quantify the trade-offs involved.  In my car, I can choose to spend the extra hour on the road or pay $5, and time and money are both quantities that aggregate across multiple trips.  The amount of "life-enrichment" I get from reading an article is hard to compare to another article's, let alone calculating the amount of money I saved by missing out on a few dozen "life-enrichment points".  And if you read a lot of articles, and the prices involved are small enough, I'm not sure a human could keep up with the necessary decisions.  So put me down for a "maybe" on the future of online news articles with micro-payments.

Finally, cellphones are a lost cause.  Choosing to delay a call from "prime-time" daylight to "free night and weekend minutes" is about as complicated a choice as I think people are prepared to make.  For web surfing it's even worse, because there's often no way to tell how much data a given visit to a site, or a surfing session, will require.  The best I can offer here is lots of tracking and transparency (which so far the cellphone companies have resisted providing, whether out of avarice or laziness).  I think there's some hope for AI programs that track your usage patterns and offer suggestions, but I have a hard time seeing price incentives large enough for people to care.  A trip to Youtube might pop up a message saying something to the effect of "Normally you download 5 MB on a visit to Youtube, which will cost you $2 right now.  If you wait an hour, it would only cost you $1."  Perhaps more useful would be "You usually talk to your sister for 3 hours, but if you call her after 6pm, it would be free."  But I suspect that most phone web-usage is too time-sensitive to delay, and truly variable rates would be too chaotic to make meaningful predictions about.  The best I can offer, if the networks get really desperate, is a message when you start your browser saying "Current rates are 0.10 cents/KB", so that maybe someone doing casual browsing would decide it's not worth it right now.  But I'm not sure that's really much of an improvement over the flat rates they offer now.

So uh, I'm not sure if I had a point in all this, but the next time you see someone talking about micro-pricing schemes, be skeptical, and look for the computers!

20100607

Another victory over Amarok!

Playlist filtering works!!!!


I was bemoaning my inability to search, when I noticed that editing tags on the previously unhideable tracks made them start working normally.  But only some tags.  I can only assume that, for some reason, they were never entered in the Amarok database?  I dunno.  It was a mix of FLAC and MP3 files, possibly things I've never edited by hand since the reincarnation of my Amarok database.  But at any rate, giving them all a star rating makes them work now.

Now maybe I can finally finish tagging this mess...

20100527

Amarok is Usable Again!

Okay, I feel kinda bad after my last Amarok rant. I really do like the thing on a good day, it's just got some weird quirks.

I managed to fix some of my problems today, and they weren't all Amarok's fault.

The intermittent crashing I finally tracked down to a gstreamer bug. I managed to patch it myself, before realizing that it was already fixed in git. At any rate, not Amarok's fault, and not a problem any more.

My playlist filtering has actually been working, I just have this block of 100+ tracks that seem to be immune. I'm going to assume that it can't index them for some reason, and displays them by default. Weird default behavior maybe, but whatever. Removing them from my play list solves the problem, so I'm happy.

I used to have issues with other applications stealing my sound and cutting off Amarok, or vice versa, yay linux sound! This seems to be a xine-lib problem, and switching everything to gstreamer solves it. Ironically, switching to gstreamer forced me to upgrade all my gstreamer plugins to get .m4a files working which exposed the aforementioned crashing bug. At any rate, Amarok's off the hook on this one.

And for the record, the new versions have dramatically improved performance when updating meta-data, though I'll still get the 1-minute cpu-bound hangs every once in a while for some operations. To be honest this really puzzles me, since while I was looking at backtraces from the crashing problem, it's running 28 separate threads at any given time, most of which are running mysql code. I guess they're not getting a whole lot of parallelism out of them for whatever weird cases I trigger.

But yeah, Amarok's back in my good graces for the moment.

20100526

New Laptop and Linux

I recently bought myself a new Toshiba laptop (model T135D-S1325, for anyone who's interested).  I'm always impressed when laptop hardware actually works in Linux, and since I couldn't find much on Google, I thought I'd share the highlights.

Whatever Windows 7/Toshiba partitioning setup they put on here caused ntfs-resize to break Windows when I tried it.  Not that I really mind, but I tried (and failed) to back up the existing setup in case I ever needed it.

The screen brightness function keys are done in hardware, so they work under Linux.  This is really nice, since on my last Toshiba laptop I had to boot into Windows to change the brightness.

However, most of the other function keys don't work.  No real surprise there, but there doesn't seem to be any separate hardware wifi switch either, which is mildly annoying.

I haven't tried the webcam, and maybe never will. I don't trust them...
[Edit: I have since gotten the webcam working. It was also fairly painless.]

My X server came up on HAL with hardly any configuration on my part, but the synaptics touchpad was a little bit of a pain.  I'm not sure if I'm doing something wrong, but I can't seem to initialize the thing so that tapping the pad sends a mouse click.  I can get it to work manually by running synclient TapButton1=1, though, so I just set that to run as part of my XFCE startup and I'm off and running.

There also doesn't seem to be a way in the BIOS to require a password to boot off of a flashdrive. I set an "administrator password" (to change BIOS settings) only, because otherwise the "user password" needs to be entered every boot to access the hard disk, so it's possible that would do it for me. At any rate, after digging around on the internet, it sounds like there's a master Toshiba password that'll clear it all out anyway, so I suspect I am vulnerable to flashdrive booting no matter what. Such is life.

The only real challenge was the wireless card, which took me several days to finally nail.  The wireless card does not currently have a driver in the mainline kernel.  It shows up like this in lspci:

09:00.0 Network controller: Realtek Semiconductor Co., Ltd. Device 8172 (rev 10)

After splicing together several howto guides, and the terribly written readme in the driver itself, I finally managed to get it working. Getting wpa_supplicant working with it was even harder. Apparently over several versions of the chipset/driver they switched the wpa interface from ipw2200 to wext. The readme tries to explain as much with a step-by-step guide that's on par with the IRS tax code in clarity. Here's what finally worked for me:

Get the driver from the Realtek Drivers page; the one you want is "RTL8192SE" (despite the fact that it doesn't match the device ID in lspci...).  Extract it somewhere and run make install.  Every time I update my kernel the make install fails, and I have to manually create the /lib/modules/2.6.32-gentoo-r7/kernel/drivers/net/wireless directory before make install works.  I also had to manually copy the contents of firmware/RTL8192SE into /lib/firmware/RTL8192SE.  You can do it all yourself without make install, but then you have to copy the module into /lib/modules and run depmod yourself.  At this point, it kind of worked, but I would get weird error messages.  Apparently the driver needs a bunch of library routines from the kernel's wifi code, specifically the 802.11 TKIP routine and one of the MAC algorithms.  I just went through and turned on as much as I could, and it eventually worked.

Use the following settings for wpa_supplicant: (This highly depends on specific chipset and revisions!)

wpa_supplicant_args="-c /etc/wpa_supplicant/wpa_supplicant.conf -D wext -i wlan0"

And that's it. Other than the wifi card, the whole setup was much less painful than my last laptop. Linux has come a long way...

20100515

Fun with gst-plugins

To anyone trying to figure out how to play .m4a files with gstreamer, try installing gst-plugins-faad.  At least, 50 plugins later, I think that was the one that finally fixed it...

20100509

findDuplicates.pl

I have a big collection of wallpapers that my siblings each have a copy of, and we all add new ones into the mix.  The problem is, sometimes they move things around into their own folders, which makes it really hard to sync.

So I was looking around on the Internet and found hardlinkpy.  It finds duplicate files on your file system and lets you hardlink them together.  While nifty, it wasn't quite what I wanted to do, but it inspired me.  It looked easier to write my own than to try and modify their huge Python script (I don't know much Python), so the end result is findDuplicates.  It goes through a set of directories, compares files of identical size to see if they are the same, and then either lists or deletes the duplicates for you.
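For anyone curious, the core idea — group files by size so only same-sized files are ever compared, then confirm with a content check — can be sketched in a few lines of Python. This is a hypothetical re-creation of the approach, not the actual findDuplicates.pl script:

```python
import hashlib
import os
import tempfile
from collections import defaultdict

def find_duplicates(paths):
    """Group candidate files by size, then confirm duplicates by content hash."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size: can't possibly be a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            with open(path, "rb") as f:
                by_hash[hashlib.sha256(f.read()).hexdigest()].append(path)
        duplicates.extend(g for g in by_hash.values() if len(g) > 1)
    return duplicates

# Tiny demonstration with three throwaway files:
workdir = tempfile.mkdtemp()
a, b, c = (os.path.join(workdir, n) for n in ("a.jpg", "b.jpg", "c.jpg"))
for path, content in [(a, b"same"), (b, b"same"), (c, b"diff")]:
    with open(path, "wb") as f:
        f.write(content)

groups = find_duplicates([a, b, c])
assert groups == [[a, b]]  # a and b are byte-identical; c survives
```

The size pre-grouping is the important trick: most files have a unique size, so almost nothing ever needs to be read and hashed.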

Now I can copy an entire folder system, remove the duplicates from mine, and then merge the two without worrying about duplicates.  (I was going to write a merge mode, but it proved easier to handle by hand.)

For anyone who's got a similar need, enjoy!

20100506

Statistics

Since we have entered the "Propaganda Age," there seems to be an increasing distrust of published statements and experts in general, and statistics specifically.  I think this is deplorable, so I would like to go on record objecting to the trend.  It is true that there are many dishonest ways to gather information and misleading ways to present it; however, that is only half the story.  For every deceptive presentation of data, there is an honest interpretation that can refute it once all the facts are known.  Assuming you accept the existence of objective reality and the consistency of rational thought, anyway (so my arguments won't reach the true laissez-faire existentialists, but what can?).  If you accept objective reality, there is one set of data, and with enough knowledge you can interpret it, or at least be swayed by someone who presents a superior interpretation.  So what you should do is verify statistics, not simply doubt all of them.  Check the sources, find credible interpretations, and make yourself smarter if need be (it can't hurt).

Because in all honesty, as much as the producers of propaganda like to slant statistics, they would love it if everyone stopped believing in them altogether.  Once you give up your right to rational thought and argument, all you have left are your innate emotional prejudices and opinions, which are way easier to manipulate (research autobiographical advertising and cognitive dissonance for starters).

So remember to Think!

20100505

Adventures in buffer underflow...

I wanted to play around with encrypting my swap partition recently, and was impressed by how easy it was in Gentoo.

I added two lines to /etc/conf.d/dmcrypt:
swap=crypt-swap
source='/dev/sda2'

then changed my fstab entry from /dev/sda2 to /dev/mapper/crypt-swap, and I was good to go.  The Gentoo dmcrypt init scripts automatically generate a random key and format the partition during boot.

Then the fun started... I just wanted to make sure it was working, but the only way to do that is to use up enough memory that my system starts swapping.  Not such an easy task when you have 8 GB of RAM.  After a few failed attempts (even Firefox doesn't use that much memory), I figured that decompressed image files use lots of memory, and decided to open an entire pictures folder with GIMP.  A few hundred windows later, I had managed to crash my window server and still not use enough memory.  Restart X.  Now I was annoyed and determined.  I'm a programmer; I figure I know how to use up memory.  So I wrote a perl script that looks like this:

#!/usr/bin/perl
@temp = <STDIN>;

Let's just load entire files into RAM.  Seems simple enough.  But I needed some big files.  The biggest thing I could find on hand was a 2 GB avi video.  So I cat that into my script, and it doesn't use up memory fast enough.  So I decided to run 3 of them simultaneously.  That's gotta chew up some serious RAM, right?  Well, in your memory display you've got all these fields, like total, used, free, cached, and then you've got this little neglected field called "buffers."  I'd never really paid attention to it before, but apparently, when you run out of it, you get a kernel panic.  (At least I think you do.  As much flak as I give Windows for BSODing, at least it reliably displays an error message when it crashes.  If you kernel panic in X, everything just stops...)  The last thing I saw before the screen stopped updating was top displaying my buffer memory as 128k.

So here's what I think happened, in retrospect.  Perl is a text-processing language, so when I read all of STDIN into an array, it reads things one line at a time.  Makes sense, right?  But when I sent an avi file down the pipe, well, I'm going to assume that avi files don't have line-ends very often.  So somewhere in my I/O chain, something was buffering gigantic amounts of data looking for a line-end.  Either that, or most of the file got memory mapped (multiple times?) and that uses buffer space.  I don't really know which layer actually uses that memory, but apparently it's important.
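That theory is easy to demonstrate: line-oriented reading of data with no line-ends produces one enormous "line", so everything gets slurped into memory at once. A Python illustration of the same behavior as `@temp = <STDIN>`:

```python
import io

# Simulate a binary "file" with no newlines, like an .avi stream.
binary_blob = io.BytesIO(b"\x00\x01\x02" * 1000)  # 3000 bytes, zero newlines

# Reading "by lines" still has to scan for a newline that never comes,
# so the entire stream lands in memory as a single giant line.
lines = binary_blob.readlines()
assert len(lines) == 1
assert len(lines[0]) == 3000
```

Scale that single-giant-line effect up to a 2 GB video (times three), and the memory numbers from the crash start to make sense.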

So, I think the crypto stuff worked fine, but I couldn't verify it, so I just gave up.  If someone actually has access to my swap partition I'm pretty much pwned anyway. It's not worth the instability.

So the moral of the story?  Don't do stupid things on a massive scale.

Unix-isms

I recently read "The UNIX-HATERS Handbook" (which I highly recommend), and learned about a bunch of the deficiencies of Unix.  To be fair, I figured I'd check out the other side, so I found a copy of the "Unix (R) System V Release 4.2 Documentation" in my local library.  The following is taken specifically from the "Programming with Unix System Calls" volume.


A few gems:
  • The difference between /sbin and /usr/sbin
    • "The /sbin directory contains executables used in the booting process and in manual recovery from a system failure."
    • "/usr/sbin: This directory contains executables used for system administration."
    • Reading between the lines here, I think /sbin used to be mounted on a separate partition, for when your file system or disk failed.  There's actually a lot of weird directory-structure Unix-isms that hark back to either the lack of stable file systems (journaling was a revolution in disk stability), or the fact that Unix used to handle running out of disk space particularly poorly (and still does, as far as I can tell).  But it's still better than what it does when it runs out of memory.  One of the great testaments to the infrastructure that is Unix is that killing arbitrary processes was the most stable option they had when running out of memory.  But I digress...
  • /var is the directory for "files and directories that vary from machine to machine."
  • I used to think Unix was a fairly secure system until I read the UNIX-HATERS Handbook (granted, the adoption of ACLs and the invention of SELinux have improved things slightly).  I guess back in the day (you know, when Unix was being invented, back in the '70s and '80s) the big security paradigm was "Trusted Computing."  The idea being, you only gave privilege to a set of "trusted software" which was thoroughly audited.  This sounds great, assuming you can actually perform these kinds of audits reliably.  And it certainly can't hurt to try to enumerate what you trust and to what extent.  But even assuming a lack of malice in your trusted set, one bug in something privileged is enough to do you in.  Especially in Unix, where there is "one root to rule them all," and anything installed setuid root has the potential to open a root shell and do whatever it wants.  So it is telling that the entire appendix devoted to security is called "Guidelines for Writing Trusted Software," which I can summarize as "Don't give things root access."  A neat trick from the UNIX-HATERS Handbook is to mount a floppy with a setuid binary on it...
  • These guys seemed to worship I/O streams.  Granted, they were new and kinda neat back then, but come on.  Once you move out of the realm of tape drives and modems, things get kinda hairy, even for things like terminals and, heaven forbid, files.  I quote from the section on file access: "A file is an ordered set of bytes of data on a I/O-device.  The size of the file on input is determined by an end-of-file condition dependent on device-specific characteristics.  The size of a regular-file is determined by the position and number of bytes written on it, no predetermination of the size of a file is necessary or possible."  Okay, I'll grant that it's nice not to have to worry about file sizes, but making it impossible?  Some input streams have finite length, and sometimes you'd like output streams to have a finite length.  Being proud of the fact that you support neither seems mildly crazy.  They're also quite proud that in Unix, files are "nothing more than a stream of bytes" (paraphrased, I couldn't find the exact quote), and that Unix imposes no record structures.  (Old-school operating systems used to make you read or write your file in X-byte chunks (records) only, because they were stored on disk that way.)  Again, it's great that they allow arbitrary file structures, but (and correct me if I'm ignorant) I thought record structures were imposed as an optimization for when you were doing record-oriented I/O.  Otherwise, the OS can split and fragment your file wherever it feels like, instead of at a record boundary.  But maybe advances in file systems and disk drives have rendered this point moot.  I'm too ignorant to say.

          Those were the highlights.  It's amazing to think how far computers have come in 30 years.

          20100426

          Amarok wins again...

          So apparently, all you have to do to horribly break Amarok is embed one of these little guys in a filename: "�". If you can't see it, don't worry, it's some weird Unicode character, the two-byte UTF-8 sequence 0xC2 0x81 (which decodes to U+0081), better described here. Granted, it's a weird character to have in a filename, but hey, that's what robustness is all about. So, what does Amarok do when it encounters such a character?

          Does it:
          (a) Fail gracefully
          (b) Fail silently and continue as if nothing happened
          (c) Corrupt your database
          (d) Wedge your CPU
          (e) Spam you with error messages, and then hang
          (f) All of the above (except a)

          If you answered (f), you're right!!

          So once again, I have managed to reset all of my song ratings and preference data.  "Again?"  I hear you cry.  Well, when I upgraded from Amarok 1.4 to the "New and Improved*" Amarok 2.0 (*The developers admit that while the UI was new and shiny, several major features present in 1.4 were not yet written in Amarok 2.0.  It wasn't really ready for a release), it decided to store all of its configuration data in a new location, ~/.kde4, instead of ~/.kde3.  (Note: This may be Gentoo's fault.)  So I lost my metatag database.  "But," I hear you cry again, "there was an import tool!"  Ah yes, but unfortunately, it was hidden deep inside the configuration menu, so I didn't find it until the week after I deleted my old database (after leaving it around for months).

          Now, I use Amarok because, quite honestly, I couldn't find a better music player for Linux (and I looked really hard).  The feature that finally sold me on it was the ability to filter my playlist based on search terms.  At least, that used to work...  Somewhere in the process of upgrading from 2.0 to 2.3.0 it stopped working.  Maybe it's time for me to look around again...  I would've kept my old version, but Gentoo made it increasingly difficult to retain KDE 3, and I finally had to give it up.

          Let me see if I can summarize some of its wonderful features for you, and compare how it's improved.  (Granted, I'm probably being harsher than usual, seeing as I'm in a bad mood, because it just trashed my database!)

          Features:
          • Playlist Filtering!
            • Amarok 1.4: It worked!
            • Amarok 2.0: It worked! (although a search is accompanied by no less than a minute of the program stalling and using 100% of a CPU... I think it's sorting things?)
            • Amarok 2.3: It still stalls, but the playlist no longer filters.  (at least on my Gentoo setup)
          • Song Queueing:  Queue songs at the beginning of your random play list!
            • Amarok 1.4: It worked!
            • Amarok 2.x: If you're listening to a song, and you queue some more, it goes back and repeats whatever you were listening to before you queued anything... (I am assured that this is fixed in the development sources, but two Gentoo releases later I have yet to see any results.  Maybe Gentoo is behind...)
          • Queue Management!
            • Amarok 1.4: You had a nice window that displayed your queue.  My favorite menu option was "Toggle Queued Status."  If you selected things where some were already in your queue, and some weren't, you could "Toggle Queued Status" on them.  No "Enqueue All" no "Dequeue All," just "Toggle Queued Status" ...
            • Amarok 2.x: The window has not yet been added.
          • Fast Startup!
            • Amarok 1.4: About a minute, CPU bound, on my 3.0 GHz Phenom II
            • Amarok 2.0: Still a minute!
            • Amarok 2.3: Only a few seconds!  (I have to admit there was a definite improvement.)  It will still hang on major operations though (like, adding or removing single tracks from the play list... I think it's sorting things?)
            • This isn't really a fair comparison, apparently you're not supposed to use "large playlists" because they have "problems."  I like to load my entire collection into my playlist so I can filter it, and pick random songs.  They have this nifty little "dynamic playlist" that will randomly pick songs for you, but then there's no way to search your collection.  I'm told it's very fast for small playlists...
          • XML Playlists!
            • Apparently, someone decided it'd be cool to dump your playlist information in XML format.  Which sounds great, except it stores your current playlist this way too.  Which sounds great, except it's loading and storing this from disk every time it modifies something.  Which sounds great, until you realize how big XML is.  And it saves backup copies.  So every time I modify my playlist, it dumps a 5MB text file to the disk.  I think marshalling and unmarshalling this might be the source of my 30-60 second pause when I add a track to my playlist.  Or maybe it's sorting things... (It's hard to tell.  Updating a single track's metadata gets the same pause.)  Then it rotates the 10 backup copies of the playlist in case I ever wanna "undo."  I kid you not.  Amarok 2.x seems to have removed the backup copies, though... or maybe it just moved them somewhere weird.  I had to special-case this in my backup script, because Amarok 1.4 was generating over 200 MB of updated files every time I ran rsync.
          • Magnatune Integration!
            • I'm gonna be honest, I have no idea what this is (some kinda online store?), but they pay people to make it work.  So I'm told it works very well.
          • Collection Management!
            • Amarok 1.4: They had some support for this in a side bar.
            • Amarok 2.x: Yeah, not so much.
          • Delete files from the disk!
            • I actually use this all the time.  I downloaded most of ocremix at one point, and have been going through and deleting stuff as it comes up and sucks.
            • Amarok 1.4: It worked!
            • Amarok 2.0: It worked!
            • Amarok 2.3: You can no longer access this from your playlist.  You have to go find the track in their excuse for a Content Manager called "Media Sources", sorted by Genre.
          • Weighted, Randomized Playlists!
            • Amarok 1.4: It worked!  And was easy to find.
            • Amarok 2.0: They decided that having a menu option was too confusing.  Instead, it's hidden under a button at the bottom of the screen that took me 2 weeks to find (and I'm not the only one).  I've just been trained to ignore their random buttons.
          • Customized Sorting Ability!
            • Amarok 1.4:  Their sort operation was stable (preserves existing order), so you can fake this yourself by clicking on the columns you want in the reverse order.  Again, I got pauses of a minute to do any of this.
            • Amarok 2.x: They added this, but didn't make it savable (unlike everything else in the UI).  So it's an improvement, but if I switch to sort by ratings for a minute, I have to rebuild my 8-layer sorting preference after.  Still pauses for about a minute.
          • Cover support!
            • Amarok 1.4: If you had a cover available, it would display it.  There was also the ability to import covers from Amazon and pick the right one.
            • Amarok 2.x: If you don't have a cover, it displays this ugly blank CD icon.  You can import covers from Amazon, but only for everything at once.  And you don't get to pick.
          • On Screen Display!
            • Amarok 1.4: Cool blue, rounded box.  I think it had transparency support (but maybe I'm dreaming of imaginary better days).
            • Amarok 2.x: Ugly Gray Rectangle (TM).  No apparent transparency support.  You have the ability to "Use Custom Colors!" (i.e., text only)
          • Customizable Layout!
            • Amarok 1.4: The layout was pretty static.
            • Amarok 2.x: Everything moves.  It took me several hours over 3 weeks to figure out how to duplicate the setup they had in Amarok 1.4.  It's still not perfect.  Several nested layers deep you can control font sizes on the display... by guessing the field width?  Random songs have an indented album name, no idea why.  (Like, you see "Final Fantasy 8 OST" going down the column, and then suddenly it's all "    Final Fantasy 8 OST").  Also, I somehow managed to turn "Open Script Console at Startup" on (no apparent menu option), and couldn't turn it off for months.  (Apparently it saves the "you had it open" but not the "you closed it.")  I finally nuked my configuration directory and it went away.  Back to customizing the UI.
          • Automatically Find Content on your Disk!
            • Amarok 1.4:  They had two buttons for "Update Collection" and "Rescan Collection". I still have no idea what the difference is.
            • Amarok 2.x: Now they just have "Update."  However, there is no way to remove anything from your collection.  So I had to move some of my songs outside my music folder, so that they would never play again (like my "Learning French" CDs).

          Welcome to the Music Player of the Future!   I could keep going, but my rage is fading.  I really can't find a better player for Linux though... so I guess I gotta cut them some slack.  This is free software after all, most of them are volunteers (give or take the magnatune guys).

          In the true spirit of Open Source, I actually tried to get in and fix some of this myself.  Repeatedly (after getting particularly annoyed about something or other).  But there was so much weird Qt interaction that I couldn't figure anything out.  I spent several hours trying to figure out how (or if) the weighted random tracks feature worked, only to conclude that if the button did anything at all, it was doing the randomization in Qt itself (or some arcane callback hidden somewhere).

          Oh well, back to re-rating everything by hand....

          20100415

          man gcc

          To whom it may concern:

          gcc -O3 does NOT enable -funroll-loops, and has not done so since at least version 2.95.3 (released March 16, 2001). I don't know if it used to back in the olden days (like last millennium), but it doesn't now.

          gcc 2.95.3:
          -O3
          Optimize yet more. `-O3' turns on all optimizations specified by `-O2' and also turns on the `inline-functions' option.

          gcc 3.0.4
          -O3
          Optimize yet more. `-O3' turns on all optimizations specified by `-O2' and also turns on the `-finline-functions' and `-frename-registers' options.

          gcc 3.1.1
          -O3
          Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions and -frename-registers options.

          gcc 3.2.3
          -O3
          Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions and -frename-registers options.

          gcc 3.3.6
          -O3
          Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions and -frename-registers options.

          gcc 3.4.6
          -O3
          Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -fweb, -frename-registers and -funswitch-loops options.

          gcc 4.0.4
          -O3
          Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops and -fgcse-after-reload options.

          gcc 4.1.2
          -O3
          Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops and -fgcse-after-reload options.

          gcc 4.4.3
          -O3
          Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload and -ftree-vectorize options.

          gcc 4.5.0
          -O3
          Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload and -ftree-vectorize options.

          This has been a public service announcement regarding gcc.
          Thank you for your time. You may return to your regularly scheduled blogging.

          20100401

          More optimizations...

          As half a followup to my last post, I just tried to recompile all of the C/C++ code I've written on my system (and a few small projects I didn't) with -O3 -ftree-vectorizer-verbose=2 to see what it could actually vectorize.

          The results were rather dismal. It optimized one loop that looked like this:

          for(int x=0;x<mySize;x++) {
              t[x] = myItems[x];
          }

          And a bunch that were constant initialization like this:

          for(int x=0;x<mySize;x++) {
              t[x] = 0;
          }

          There were also a couple more in my BitArray that had errors, but looked like I might be able to rewrite them to be vectorizable. Granted, you spend a lot of time in loops, so a few optimizations go a long way, but overall it looks like it has a hard time working with most code.

          20100331

          Optimizations and CFLAGS in an ideal world...

          I was messing around with my make.conf today, and decided to play with my CFLAGS again. Now, I realize that realistically, there are some seriously diminishing returns for turning on additional compiler flags on a system-wide basis. Even with a single package, if you don't know what you're doing you can make things worse, and if you do know what you're doing, you might just be able to spend a few hours to squeeze out that extra 1% reduction in execution time. Playing with your whole system is likely to make some things marginally better and others marginally worse. And, when all is said and done, you're probably never going to notice anyway, because Amarok still takes several minutes to filter your playlist. (Amarok and I have a love-hate relationship, but that's another rant.)

          But, (like many Gentoo users) I twiddle with them anyway. Partly because it's fun learning about weird compiler optimizations, but truth be told, I'm also lured by the imaginary speed gains, and the vanity of turning on weird things and knowing what they do.

          So, I was reading the Linux Magazine benchmarks comparing different Gentoo optimization levels, and there's basically no significant difference between -O2 and -O3, with a couple of weird exceptions.

          One of the UT2004 demos has a 20 FPS gain for -O2 over -O3, but none of the rest of them do. So that's a weird fluke.

          The Dbench filesystem tests show a huge difference between optimization levels, with -O3 losing quite dramatically, and the difference diminishing as the number of clients goes up. This one's kinda concerning; it's definitely not worth setting -O3 if it's going to halve my disk access speed. But (perhaps in my ignorance), I really have no idea how changing your optimization levels would affect disk I/O. That should all be handled in the kernel, which is optimized separately. So I'm gonna assume that's some kinda weird behavior in the test harness, because I dunno anything about Dbench.

          The image processing tests show a predictable bonus for -O3, since that's the kind of CPU bound task that would benefit from vectorization and inlining.

          And last but not least, is the GtkPerf results for "AddText," where -O3 takes half as long as -O2. In my ignorance, I would guess that the gain is from inlining functions, since from what little gtk programming I've done it seems like there's tons of small functions you call for everything. But I really have no idea. All the AddText test does is keep adding short text fragments into a scroll box, so maybe there's a copy loop that gets optimized somewhere, I dunno. But, I know how to run GtkPerf, so I decided to check this out.

          Turns out, because of Gentoo Bug #133469, gtk filters your cflags, and doesn't even distinguish between -O3 and -O2. Since the bug was dealt with in 2006 (and they're still filtering today), and the benchmark was done sometime in 2009, I'm going to assume that they were using the standard Portage tree and were running the same gtk library in both tests. So any speed gains they saw must've come from some underlying system library they were calling into. I'd put my money on some string copy function which runs about twice as fast under -O3.

          Now, the bug report was describing some pretty catastrophic failures (and I'm sure there were), but after hacking the ebuild to use -O3, and playing with my system, I haven't seen anything crash yet. So why is it still filtered? Well, since Gentoo actually lets users set their own CFLAGS, a bunch of crazy idiots like me decide to go through and turn on whatever they can turn on for kicks, speed and glory, and then complain and file bugs when stuff breaks. They honestly don't want to deal with that, and I can't blame them. So their general policy (as far as I understand it) is to ignore any issues where people have something other than "-march=native -O2" set. And I can't fault them there, especially when most of them are volunteers, and there are lots of idiots who want to muck with CFLAGS that break things. And granted, you can't test every gcc release to see if it stops some random, unpredictable crash. But I've seen things like this come up often enough that I wanna put in my two cents.

          First off, a lot of these issues come up because there's really no way to set package-specific CFLAGS. The people who are serious about optimizing want to set different weird flags for each program, or at the very least set all their weird crazy flags and then filter them for packages they know break. (Like this guy...) Yes, I know a couple people have hacked up their own techniques to do this, but they're not integrated into Portage very well, and they're all shunned by the developers. Personally, I don't think this problem is going away. There are always going to be people (crazy or guru) who want to set custom flags for custom packages, to fix known bugs if nothing else. I was once unable to compile a working gmp with my arch setting because of some weird bug, and at that point all you can do is manually adjust the flags yourself to emerge gmp, and then put them back afterwards. Granted that works, but it kinda sucks. And the more packages you want to do that with, the more it sucks. I think they should just stick a package.cflags into Portage, the same way package.use works now, and let people trash their systems if they insist. That also has the advantage of letting them put all the cflag filtering that they do into one file, the way they manage package.mask.
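          For what it's worth, here's roughly what I imagine a package.cflags would look like, modeled on how package.use works (the file, syntax, and entries are all hypothetical; Portage has no such file today):

```
# /etc/portage/package.cflags  (hypothetical)
# package atom          flags to use for just that package
dev-libs/gmp            -O2                  # work around my arch bug
x11-libs/gtk+           -O2                  # known CFLAGS breakage, bug #133469
media-sound/amarok      -O3 -funroll-loops   # for kicks, speed and glory
```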

          But what's a sensible cflags policy? Well, let me describe the imaginary, ideal world first. Ideally, there are really only three kinds of optimizations (for whatever you're optimizing for: speed, size, etc.): optimizations that always work, optimizations that sometimes work, and optimizations that make assumptions in order to work.

          Things that always work are reasonable for any user to turn on, making their own cost-benefit determination of compile time, running time, and memory. Setting gcc -O2 turns on more or less the things that always work, and anything that breaks with these settings is a bug.

          Optimizations that sometimes work include things like loop unrolling, and are reasonable for someone to turn on if they know what they're doing. These might actually hurt your performance, but they shouldn't actually break anything, or it's a bug. Enabling gcc -O3 turns on some things in this category, like inline-functions. In my opinion, users shouldn't be protected from their own ignorance if they're slowing their system down by setting things beyond their competence level. But if these actually break things, you've probably hit a compiler bug, and to me it seems reasonable to filter them out with a warning message.

          Finally, optimizations that work with assumptions are the ones you shouldn't set without knowing what you're doing, and should (probably) never be set system-wide. These are things like disabling exception handling in C++, or enabling fast-math. It might help under certain circumstances, but it will break if you don't satisfy the assumptions (like if you're using exceptions). Anyone using these system-wide is probably stupid, or at the very least they're trying to do something weird on their own.

          So ideally, you should filter out everything known to break because of a compiler bug, temporarily, until the compiler can be fixed, and then allow them again. But, we don't live in an ideal world. Compiler bugs are hard to find, hard to prove, and hard to fix. And the crazier the optimization, the harder it is to verify.

          So what do you do in the real world?

          First, if Portage had a package.cflags, anyone who wants to turn on weird optimizations globally would have their own place to fix things, without filing a bunch of bugs demanding ebuild filtering for everyone. And you'd have a place to store package-specific performance information. If somebody on their own wants to publish a set of "known fastest CFLAGS for this arch" by package, they can.

          Second, I think that a policy of "if you're not using -O2, we don't care" is a bit much. I think it's fair to designate -O2 as stable, and -O3 as unstable, but if there are known issues with -O3 where things actually break, it seems like you should go ahead and filter the flags. So in other words, you don't verify that anything works at -O3, but if it's clear something doesn't, then go ahead and make it work for everyone else. It's not that hard, and things really shouldn't be breaking at -O3 (on a theoretical level anyway). But any weird flags making assumptions about math or runtime or whatever, it's fair to ignore, because if you were guru enough to have a reason to need that flag on, you can filter it out yourself in a package.cflags, and if you weren't that guru, you should probably turn it off. And I'm only talking genuine, verifiable crashes or errors at -O3. Slowing your system down is your problem, and if -O3 is designated "unstable," I think it's fair to let other people find and verify the problems as they come up.

          Of course, I say -O3 "shouldn't" break things, but I really have no idea how risky it is, or how often bugs come up. I have random parts of my system compiled at -O3 and haven't noticed any problems, but I'm just one guy with my weird setup. If Gentoo decided to start ignoring -O3 crashes because there were too many for them to handle otherwise, I can understand. But if not, it seems like they're worth noting and fixing in the ebuilds when they come up. For the record, I think Gentoo does an excellent job on the whole, and it's really probably not worth a developer's time to chase down weird CFLAG bugs people report when they could be fixing other things. I also totally agreed with the rationale behind filtering gtk+, though if the problems are gone, it might be time to remove the filtering (but I can't tell that from my setup).


          So when all is said and done, I went and ran gtkperf with just my gtk+ optimized differently. The final results?

          No optimizations: Total time: 76.99
          -O2: Total time: 74.25
          -O3: Total time: 74.02

          There's really no difference.

          20100212

          My XFCE Setup

          I use XFCE for my desktop environment in Linux, and I think it strikes a nice balance between minimalism and functionality.  You can add lots of crazy panels and menus if you want, but you don't have to.  Personally, I find that the crazier, more "user friendly" setups of KDE or Gnome provide way more interface than I need, and are just annoying.  I can launch anything I want by typing it into the terminal, so I don't need to navigate pages and pages of GUI to find something when I already know what it's called.

          My setup has a floating taskbar which gives me quick access to the stuff I want to use quickly (like Firefox, Thunderbird, gedit...), but I can still pull up a menu to dig around if I need to find a particularly weird program (which is rare, I pretty much do everything in a terminal).

          The coolest thing about my setup (at least I think), is that I've got a huge list of wallpapers loaded up, that change on a cron job every half hour. Here's the relevant shell code:

          Dump the current wallpaper image:
          xprop -root | grep 'XFDESKTOP_IMAGE_FILE_0(STRING)' | cut -f 2 -d\"

          Reload a new image off the list:
          xfdesktop -reload
          (Note that to do this in a cron job, you may need to set the DISPLAY and XAUTHORITY right.)

          I've got another set of scripts that removes the current wallpaper from the list (just using text manipulations on the file), so I can remove the ones I don't like. I'm thinking about auto-generating the lists using everything in my "wallpapers" folder, but so far it's been easier to use find and then merge them by hand.
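          Putting the pieces together, the cron side of my setup looks roughly like this (the script path, history file, and :0 display are assumptions about my particular machine):

```shell
#!/bin/sh
# rotate-wallpaper.sh -- called from cron every half hour (hypothetical path).
# cron has no X session of its own, so point it at the running one first.
export DISPLAY="${DISPLAY:-:0}"
export XAUTHORITY="${XAUTHORITY:-$HOME/.Xauthority}"

# Log which wallpaper was up (handy for the "delete this one" script),
# then tell xfdesktop to advance to the next image on its list.
xprop -root 2>/dev/null | grep 'XFDESKTOP_IMAGE_FILE_0(STRING)' | cut -f 2 -d\" \
    >> "$HOME/.wallpaper-history"
xfdesktop -reload 2>/dev/null || echo "xfdesktop not reachable (no X session?)"

# matching crontab entry:
#   0,30 * * * *  "$HOME"/bin/rotate-wallpaper.sh
```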

          So that was the easy part. The hard part was getting xfdesktop to do something intelligent with its "auto" setting for stretching/tiling them. In XFCE's defense, I wanted some weird rules for my distorted aspect-ratio LCD screen. The xfdesktop defaults are to zoom anything that's not tile-able, which makes more sense if the aspect ratios match standard wallpaper sizes. The only really wonky thing they do is zoom in on the upper square of particularly bad aspect-ratio images, but I suppose normal people don't have manga pages and raw Hubble footage in their rotation.

          So to set that up I had to hack XFCE (see my previous post for getting this to work nice with Gentoo). I'm happy to report that it wasn't too hard. The xfdesktop code I looked through was pretty easy to read.

          Basically I made it tile slightly more things, and then not distort things too much. (The specific numbers were chosen by looking at how the standard wallpaper resolutions looked on my monitor.) Here's my final patch:
          diff --git a/src/xfce-backdrop.c b/src/xfce-backdrop.c
          index 486a796..a028f3f 100644
          --- a/src/xfce-backdrop.c
          +++ b/src/xfce-backdrop.c
          @@ -805,17 +805,34 @@ xfce_backdrop_get_pixbuf(XfceBackdrop *backdrop)
               }
               
               if(backdrop->priv->image_style == XFCE_BACKDROP_IMAGE_AUTO) {
          -        if(ih <= h / 2 && iw <= w / 2)
          +        xscale = (gdouble)w / iw;
          +     yscale = (gdouble)h / ih;
          +     gdouble sDiff = xscale - yscale;
          +     if(sDiff < 0) {
          +      sDiff = -sDiff;
          +     }
          +     
          +     int xTile = (ih <= h/2);
          +     int yTile = (iw <= w/2);
          +        
          +        if(xTile && yTile) {
                       istyle = XFCE_BACKDROP_IMAGE_TILED;
          -        else
          -            istyle = XFCE_BACKDROP_IMAGE_ZOOMED;
          -    } else
          +        } else if ((xTile || yTile) && sDiff < 0.36) {
          +         istyle = XFCE_BACKDROP_IMAGE_TILED;
          +        } else if ( sDiff < 0.30) {
          +            istyle = XFCE_BACKDROP_IMAGE_STRETCHED;
          +        } else {
          +         istyle = XFCE_BACKDROP_IMAGE_SCALED;
          +        }
          +    } else {
                   istyle = backdrop->priv->image_style;
          +    }
               
               /* if the image is the same as the screen size, there's no reason to do
                * any scaling at all */
          -    if(w == iw && h == ih)
          +    if(w == iw && h == ih) {
                   istyle = XFCE_BACKDROP_IMAGE_CENTERED;
          +    }
               
               /* if we don't need to do any scaling, don't do any interpolation.  this
                * fixes a problem where hyper/bilinear filtering causes blurriness in
          

          Now all I have to do is go through my wallpapers and figure out which ones I really want in my rotation...