My server was having a constant income traffic of 1.7mb/s for a service that download RSS from the internet and process them. Basically it need to return the last updates of multiple RSS feeds. It’s a very basic pooling system, but it was downloading too much data for just 15.000 active users. The growth wasn’t looking very feasible..
I was using the ROME java library to parse the XML. So far so good, the problem was that it downloads the whole feed and process it all. With my application scope I don’t need to download the whole RSS, just the new entries that i didn’t downloaded yet.
The solution was to use a custom SAX RSS parser, looping through the “” tags and identifying “”. In this way i can parse item per item, and identify if the current item is not updated, so I can abort the http connection and stop the download of the feed. I wish that ROME had an option to do that, like “stop processing when ‘publishedDate’ minor than..”.
The impact on bandwidth usage and processing time was impressive:
If someone is interested I can post and explain the java class. It’s compatible with com.sun.syndication.feed.synd and uses the SyndEntry and SyndFeed interfaces.