Information overload

In the last couple of months, I've added at least a hundred feeds to my aggregator. At the same time, I've suffered from information overload in my e-mail for years, to the point where I now mostly just scan the subjects in my main mailbox, and direct anybody I actually want to be sure of answering quickly to a variety of other addresses depending on context.

The problem is I want information. I'm an information junkie.

The problem is that most of what I receive is noise. 99.9% of the e-mail I receive (AFTER having removed the spam) is unimportant, and at least 95% of it is uninteresting.

The issue is bad filtering.

"Good filtering" is still a big problem, and one that cannot be solved with Bayesian filtering alone.

Why?

One of the key problems I'm facing in managing my information flow is that my connections to people are often transient, and increasingly so - I might exchange 2-3 messages based on answers on a blog, and it might take a year before I talk to that same person again unless we're generally interested in the same things. The incoming message probably won't be junked by my spam filter, but that's not the problem - the problem is determining whether or not a message is more important than another.

Another issue is classification: A strict hierarchy works for a lot of tasks - for instance, at work I use a deeply nested, very static folder hierarchy for managing my projects. However, at home that doesn't work, as my interests change rapidly with regard to more peripheral areas. I do, for instance, find OpenGL programming fascinating, but it's not something that's important enough for me to want to spend time on regularly, so there will be occasional bursts. At the same time, some messages about OpenGL will always be interesting because the content has other important factors (e.g. some new development, or someone has made a great new rewrite of "Elite", one of my all time favourite games).

The problem is that many of these interests are also transient enough that I can't be expected to train a Bayesian filter to recognise them on word frequencies alone - I need more context.

I want a neural network and a set of agents to monitor my general activity and build a profile of me.

I want a system that will sort my mail not just based on my mail, but based on the fact that I wrote an entry about a similar topic on my blog a week ago, or that one of my (unpublished, unmentioned) programs contains comments that seem similar, or that I've written a (private) diary entry about it, or that the person writing it happens to be the operator of a website I spend a lot of time on, or the owner of a blog I read regularly.
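
To make that a bit more concrete, here is a minimal sketch of the scoring idea. It is not the neural network and agents described above, just a plain bag-of-words comparison standing in for them. It assumes a "profile" made up of things I've written recently (blog entries, diary entries, code comments) and a set of senders I already have some connection to. All the names here (Document, score_message, half_life_days, sender_bonus) are made up for illustration, and the weights are arbitrary.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    """Something from my own activity: a blog entry, diary entry, code comment."""
    text: str
    age_days: float


def tokens(text):
    """Lowercased word counts; a crude stand-in for real feature extraction."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bags of words (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_message(body, sender, profile, known_senders,
                  half_life_days=30.0, sender_bonus=0.5):
    """Best recency-weighted similarity to my profile, plus a bonus for
    senders I already have a connection to (hypothetical weighting)."""
    msg = tokens(body)
    topical = max(
        (cosine(msg, tokens(d.text)) * 0.5 ** (d.age_days / half_life_days)
         for d in profile),
        default=0.0,
    )
    return topical + (sender_bonus if sender in known_senders else 0.0)


profile = [
    Document("Notes on OpenGL shaders and a rewrite of Elite", age_days=7),
    Document("Diary: project planning at work", age_days=90),
]
known = {"blogger@example.org"}

a = score_message("New OpenGL rewrite of Elite released",
                  "stranger@example.com", profile, known)
b = score_message("Quarterly sales newsletter",
                  "stranger@example.com", profile, known)
print(a > b)
```

The recency decay is there because my interests come in bursts: something I wrote about a week ago should count for more than something from three months back. A real system would need far more context than word overlap, which is exactly the point of the complaint.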

I want the same for my RSS aggregation - it needs to be powered by an engine that knows me.

Yes, I'm difficult, and I realise I'll probably have to settle for less for a long time.

Information overload 2005-04-21