Rafael Sanches

October 18, 2009

RSS parsing optimization for bandwidth and processing time with SAX and httpclient – polling scripts

Filed under: android, maintainability, performance, programming. Posted by mufumbo @ 3:55 pm

My server had constant incoming traffic of 1.7mb/s for a service that downloads RSS feeds from the net and processes them in order to return the latest updates from multiple feeds. It’s a very basic polling system, but it was downloading too much data for just 5000 active users. Growth at that rate didn’t look feasible.

I was using the ROME Java library to parse the XML. So far so good; the problem was that it downloads the whole feed and processes all of it. In my application I don’t need to download the whole RSS feed, just the new entries that I haven’t downloaded yet.

The solution was to use a custom RSS parser, looping through the “<item>” tags and identifying each “<pubDate>”. This way I can parse item by item and detect when the current item is not newer than what I already have, so I can abort the HTTP connection and stop downloading the rest of the feed. I wish ROME had an option to do that, like “stop processing when date earlier than..”.
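The idea can be sketched with the SAX parser that ships with the JDK. This is only a minimal illustration, not the class I actually use: the names EarlyStopRssParser, StopParsing and parseNewTitles are made up for the example, it assumes the standard RSS 2.0 element names item, title and pubDate, and it assumes the feed lists the newest items first. The handler throws a marker exception as soon as it reads a publication date that is not newer than the last one already seen, which ends the parse before the rest of the stream is read.

```java
import java.io.InputStream;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EarlyStopRssParser {

    /** Hypothetical marker exception used only to abort the parse early. */
    static class StopParsing extends SAXException {
        StopParsing() {
            super("reached an item that was already seen");
        }
    }

    /**
     * Returns the titles of items published after lastSeen.
     * Assumes the newest items come first in the feed.
     */
    public static List<String> parseNewTitles(InputStream in, Instant lastSeen) throws Exception {
        List<String> titles = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inItem;
            private String title;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // start collecting text for this element
                if ("item".equals(qName)) {
                    inItem = true;
                    title = null;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (!inItem) {
                    return; // ignore channel-level title, pubDate, etc.
                }
                if ("title".equals(qName)) {
                    title = text.toString().trim();
                } else if ("pubDate".equals(qName)) {
                    // RSS 2.0 dates follow RFC 822/1123, e.g. "Sun, 18 Oct 2009 15:55:00 GMT"
                    Instant published = ZonedDateTime
                            .parse(text.toString().trim(), DateTimeFormatter.RFC_1123_DATE_TIME)
                            .toInstant();
                    if (!published.isAfter(lastSeen)) {
                        throw new StopParsing(); // old item: nothing newer can follow
                    }
                } else if ("item".equals(qName)) {
                    titles.add(title);
                    inItem = false;
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (StopParsing expected) {
            // Early exit: the caller can now close the stream or abort the HTTP request.
        }
        return titles;
    }
}
```

In real use the InputStream would be the body of the HTTP response, and the caller would abort the request once parseNewTitles returns, so the remaining bytes of the feed are never downloaded. How much is saved depends on how far the HTTP client has already buffered ahead.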

The impact on bandwidth usage and processing time was impressive.

If anyone is interested I can post and explain the Java class. It’s compatible with com.sun.syndication.feed.synd and uses the SyndEntry and SyndFeed interfaces.

May 14, 2008

database replication tools

Filed under: performance, programming. Posted by mufumbo @ 9:32 pm

Today I was searching for information on replication architectures and found a very interesting presentation: Portable Scale-Out Benchmarks for MySQL, which refers to GORDA – “Open Replication of Databases“. The following tools are the result of my search on that topic:

ESCADA is an open-source implementation of the GORDA replication server interface. It provides a full range of database replication options across multiple database management systems, in a single inter-operable and evolutive package. Target application scenarios include:

  • Asynchronous master-slave replication, the no-frills industry standard approach.
  • Consistent multi-master/update everywhere replication for scalable and high performance shared-nothing clusters.
  • Zero data-loss inter-cluster replication over WAN for mission critical applications and disaster recovery.

SEQUOIA is a database cluster middleware that allows any Java application to transparently access a cluster of databases through JDBC. You do not have to modify client applications, application servers or database server software. You just have to ensure that all database accesses are performed through JDBC.

SEQUOIA makes it possible to achieve scalability, high availability and failover for database tiers. It instantiates the concept of Redundant Array of Inexpensive Databases (RAIDb). The database is distributed and replicated among several nodes, and SEQUOIA load-balances the queries between these nodes. The server can be accessed from a generic JDBC driver, used by the clients. The client drivers forward the SQL requests to the SEQUOIA controller, which balances them on a cluster of replicated databases (reads are load balanced and writes are broadcast).
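To make the “reads are load balanced, writes are broadcast” idea concrete, here is a toy sketch. It is not SEQUOIA code and does not use its API: ToyReplicaController is a made-up class, in-memory maps stand in for the database nodes, and round-robin is just one possible balancing policy chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy controller: every write goes to all replicas, reads rotate across them. */
public class ToyReplicaController {
    // Each map is a stand-in for one fully replicated database node.
    private final List<Map<String, String>> replicas = new ArrayList<>();
    private final int[] readsServed;
    private int next = 0;

    public ToyReplicaController(int replicaCount) {
        if (replicaCount < 1) {
            throw new IllegalArgumentException("need at least one replica");
        }
        for (int i = 0; i < replicaCount; i++) {
            replicas.add(new HashMap<>());
        }
        readsServed = new int[replicaCount];
    }

    /** Broadcast: apply the write to every replica so they all stay identical. */
    public synchronized void write(String key, String value) {
        for (Map<String, String> replica : replicas) {
            replica.put(key, value);
        }
    }

    /** Load balance: serve each read from the next replica in round-robin order. */
    public synchronized String read(String key) {
        int chosen = next;
        next = (next + 1) % replicas.size();
        readsServed[chosen]++;
        return replicas.get(chosen).get(key);
    }

    /** Number of reads this replica has answered so far. */
    public synchronized int readsServedBy(int replica) {
        return readsServed[replica];
    }
}
```

With three replicas, one write followed by six reads returns the same value every time while each replica answers two of the reads. A real controller also has to deal with failed nodes, transactions and write ordering, which this sketch ignores.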

Slony-I is a “master to multiple slaves” replication system supporting cascading (e.g. a node can feed another node, which feeds another node…) and failover.

The big picture for the development of Slony-I is that it is a master-slave replication system that includes all features and capabilities needed to replicate large databases to a reasonably limited number of slave systems.
