Rafael Sanches

October 18, 2009

RSS parsing optimization for bandwidth and processing time with SAX and httpclient – polling scripts

Filed under: android, maintainability, performance, programming - mufumbo @ 3:55 pm

My server had constant incoming traffic of 1.7mb/s for a service that downloads RSS feeds from the net and processes them in order to return the latest updates from multiple feeds. It’s a very basic polling system, but it was downloading too much data for just 5000 active users. Growth didn’t look feasible.

I was using the ROME Java library to parse the XML. So far so good; the problem was that it downloads the whole feed and processes all of it. Within the scope of my application I don’t need to download the whole RSS feed, just the new entries that I haven’t downloaded yet.

The solution was to use a custom RSS parser that loops through the “<item>” tags and reads each “<pubDate>”. This way I can parse item by item, detect when the current item is one I have already seen, and then abort the HTTP connection to stop downloading the rest of the feed. I wish ROME had an option to do that, like “stop processing when the date is earlier than...”.
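Here is a minimal sketch of the idea, written in Python rather than the Java class mentioned in this post. It assumes an RSS 2.0 feed whose items are ordered newest first and whose pubDate values carry a timezone. The names StopParsing, NewItemsHandler and read_new_items are made up for the illustration.

```python
import xml.sax
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


class StopParsing(Exception):
    """Raised to abandon the feed once an already-seen item is reached."""


class NewItemsHandler(xml.sax.ContentHandler):
    def __init__(self, last_seen):
        super().__init__()
        self.last_seen = last_seen  # timezone-aware datetime of the last known entry
        self.new_items = []
        self._item = None           # dict for the <item> being read, if any
        self._field = None          # name of the child element being read
        self._text = []

    def startElement(self, name, attrs):
        if name == "item":
            self._item = {}
        elif self._item is not None:
            self._field = name
            self._text = []

    def characters(self, content):
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "item":
            self.new_items.append(self._item)
            self._item = None
        elif self._item is not None and name == self._field:
            value = "".join(self._text).strip()
            self._item[name] = value
            self._field = None
            # Assumption: items are newest first, so the first old item ends the scan.
            if name == "pubDate" and parsedate_to_datetime(value) <= self.last_seen:
                raise StopParsing


def read_new_items(chunks, last_seen):
    """Feed byte chunks to the parser and stop pulling them once an old item shows up."""
    handler = NewItemsHandler(last_seen)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    chunks_read = 0
    try:
        for chunk in chunks:
            chunks_read += 1
            parser.feed(chunk)
        parser.close()
    except StopParsing:
        pass  # the caller can now close the connection
    return handler.new_items, chunks_read
```

In a real poller the chunks would come from the HTTP response body, read a few kilobytes at a time. Closing the response once the loop stops is what saves the bandwidth.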

The impact on bandwidth usage and processing time was impressive.

If someone is interested I can post and explain the Java class. It’s compatible with com.sun.syndication.feed.synd and uses the SyndEntry and SyndFeed interfaces.

August 5, 2009

reading java-style properties file in PHP

Filed under: caveats, php, programming - mufumbo @ 4:46 am

It’s very strange that PHP only supports “parse_ini_string” as a configuration function. I don’t like it at all! It has problems handling quotes and new lines, among other caveats.

The only benefit of parse_ini_string over a Java properties file is that it can handle “arrays”, but I don’t think that’s much of a benefit anyway. I wanted to use properties files in PHP for translations, and since I only found buggy implementations on the net, I had to build my own:

      function parse_properties($txtProperties) {
		$result = array();
		// explode() replaces split(), which was removed in PHP 7.
		$lines = explode("\n", str_replace("\r\n", "\n", $txtProperties));
		$key = "";
		$value = "";
		$isWaitingOtherLine = false;
		foreach ($lines as $line) {
			// Skip blank lines and comments, unless we are inside a continued value.
			if (!$isWaitingOtherLine && (trim($line) === "" || strpos(ltrim($line), "#") === 0))
				continue;

			if (!$isWaitingOtherLine) {
				$pos = strpos($line, '=');
				if ($pos === false)
					continue; // not a key=value line
				$key = trim(substr($line, 0, $pos));
				$value = substr($line, $pos + 1);
			}
			else {
				$value .= $line;
			}

			/* A trailing '\' means the value continues on the next line */
			if (substr($value, -1) === "\\") {
				$value = substr($value, 0, -1)."\n";
				$isWaitingOtherLine = true;
			}
			else {
				$isWaitingOtherLine = false;
			}

			$result[$key] = $value;
		}

		return $result;
	}

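This function can be used to create a PHP properties class. It is meant to behave like Java properties files, so it should handle " quotes and a trailing \ that continues a value on the next line. One difference: it puts a newline where the lines are joined, whereas java.util.Properties joins them without one.

Let me know if it has bugs :)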

August 22, 2008

How to correctly make Javascript onclick links and keep in mind SEO and compatibility

Filed under: javascript, seo - mufumbo @ 8:44 am

Today I faced a problem common among web developers: how to correctly make JavaScript onclick links while keeping SEO and compatibility in mind.

My solution was very simple. First of all, never use:

<a href="javascript:void(0)" onclick="some_function('someparameter')">Text</a>

This is the worst way of making JavaScript links. As discussed in this post, it triggers a known issue in IE, which is why using the javascript: pseudo-protocol as the value of HREF attributes is actively discouraged.

Suppose that you have an ajax box with tabs, where the content of each tab can be refreshed with ajax. What would be the best way to assign the onclick events that refresh the content?

What I did was:

  • When the page is loaded, the first tab is rendered on the server side, without ajax. This way search engines can crawl the content.
  • The link on each tab points to the current page with a parameter that changes the pre-selected tab on the server side when the page is loaded. Ex: ?tabselection=onlineusers loads the page with the onlineusers tab pre-selected, and ?tabselection=lastusers loads it with lastusers pre-selected.
  • The link on each tab has an onclick event associated with a function, and the handler returns false.

In code:
<a href="?tabselection=onlineusers" onClick="changeTab('onlineusers');return false;">Online Users</a>
<a href="?tabselection=lastusers" onClick="changeTab('lastusers');return false;">Last registered users</a>
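The server-side half of this, choosing which tab to pre-render from the link’s query string, could look roughly like the sketch below. It is written in Python purely for illustration, since the post does not say what the server runs. The function selected_tab and the ALLOWED_TABS whitelist are assumptions; the parameter and tab names are the ones used in the links above.

```python
from urllib.parse import parse_qs

# Tab names taken from the links above; anything else falls back to the default.
ALLOWED_TABS = ("onlineusers", "lastusers")


def selected_tab(query_string, default=ALLOWED_TABS[0]):
    """Pick the tab to pre-render from a query string like 'tabselection=lastusers'."""
    values = parse_qs(query_string).get("tabselection", [])
    tab = values[0] if values else default
    return tab if tab in ALLOWED_TABS else default
```

Checking the value against a whitelist keeps a hand-edited URL from selecting a tab that does not exist.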

What are the advantages of this solution?

  • Users with JavaScript disabled still get a fully functional website.
  • Search engines and mobile phones can navigate without problems.
  • It is compatible with all browsers.

For those who already have a website running, the simplest way to get compatibility is to point the HREF to a page like “/noJavaScriptEnabled.html” and return false in the onclick event. This seems better than using “javascript:void(0)” or pointing to the anchor “#”.

May 14, 2008

database replication tools

Filed under: performance, programming - mufumbo @ 9:32 pm

Today I was searching for information about replication architectures and found a very interesting presentation, Portable Scale-Out Benchmarks for MySQL, which refers to GORDA, “Open Replication of Databases”. The following tools are the result of my search on that topic:

ESCADA is an open-source implementation of the GORDA replication server interface. It provides a full range of database replication options across multiple database management systems, in a single inter-operable and evolutive package. Target application scenarios include:

  • Asynchronous master-slave replication, the no-frills industry standard approach.
  • Consistent multi-master/update everywhere replication for scalable and high performance shared-nothing clusters.
  • Zero data-loss inter-cluster replication over WAN for mission critical applications and disaster recovery.

SEQUOIA is a database cluster middleware that allows any Java application to transparently access a cluster of databases through JDBC. You do not have to modify client applications, application servers or database server software. You just have to ensure that all database accesses are performed through JDBC.

SEQUOIA makes it possible to achieve scalability, high availability and failover for database tiers. It instantiates the concept of Redundant Array of Inexpensive Databases (RAIDb). The database is distributed and replicated among several nodes, and SEQUOIA load-balances the queries between these nodes. The server can be accessed from a generic JDBC driver, used by the clients. The client drivers forward the SQL requests to the SEQUOIA controller, which balances them on a cluster of replicated databases (reads are load-balanced and writes are broadcast).

Slony-I is a “master to multiple slaves” replication system supporting cascading (e.g. a node can feed another node which feeds another node…) and failover.

The big picture for the development of Slony-I is that it is a master-slave replication system that includes all features and capabilities needed to replicate large databases to a reasonably limited number of slave systems.

May 10, 2008

simple script to merge commits from a bugzilla id

Filed under: maintainability, programming - mufumbo @ 9:15 pm

Today I wrote my first Perl script!

For me it is very painful when the time comes to merge into another branch all the commits that I have made in the “trunk”. I searched a little and did not find anything that could magically solve all my problems. I know that it’s better to create a separate branch when there are lots of commits, but there are cases where a super-simple piece of functionality explodes into a big ball of mud.

In practice, the script merges all the commits for a Bugzilla id into another branch. If someone knows a standard way to do this, please tell me!

The script takes three inputs:

  1. The starting revision ID to filter the search.
  2. The SVN address of the source.
  3. The search string to filter the results. Here you put your bugzilla bug id.

How the script is used and what it runs:

  1. Go to the directory of the destination branch.
  2. Launch the script, where "1: " is the search string carrying the Bugzilla bug id:
     svn_search_merge.pl 0 https://svn.example.com/main/trunk/ "1: "
  3. The script first gets the log of all commits from revision 0 to HEAD:
     svn log -r 0:HEAD https://svn.example.com/main/trunk/
  4. Then, for every log entry that contains the string "1: ", it executes:
     svn merge -r (ACTUAL_REVISION-1):ACTUAL_REVISION https://svn.example.com/main/trunk/
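The log-filtering and merge steps can be sketched as follows, in Python for illustration only (the actual script is the Perl one in this post). The sketch assumes the usual svn log layout, where entries are separated by lines of 72 dashes and each entry starts with a header like "r3001 | author | date | 1 line". The name merge_commands is made up. It only builds the svn merge argument lists and does not run anything.

```python
import re

# Assumed 'svn log' layout: entries separated by a line of 72 dashes,
# each starting with a header such as "r3001 | alice | <date> | 1 line".
SEPARATOR = re.compile(r"^-{72}$", re.MULTILINE)
HEADER = re.compile(r"^r(\d+) \|", re.MULTILINE)


def merge_commands(log_text, search, url):
    """Return one 'svn merge' argument list per log entry containing `search`."""
    commands = []
    for entry in SEPARATOR.split(log_text):
        header = HEADER.search(entry)
        if header is None or search not in entry:
            continue
        rev = int(header.group(1))
        # Merge exactly the change introduced by this revision.
        commands.append(["svn", "merge", "-r", f"{rev - 1}:{rev}", url])
    return commands
```

Each returned list could be passed to subprocess.run from inside the destination working copy. Taking the revision from the header with a regular expression works for any number of digits.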

Source code of the script:

#!/usr/bin/perl

# Simple script to merge commits from a source branch to the current destination directory.
# http://mufumbo.wordpress.com/2008/05/10/simple-script-to-merge-commits-from-a-bugzilla-id/
#
# Example:
# $ cd my-branch-destination/
# $ svn_search_merge.pl 3000 https://svn.example.com/main/trunk/ "bug 673"
# Where 3000 is the starting revision and "bug 673" is the string to match in the comments.
#
use strict;
use warnings;

my $prev_revision = shift;
my $svnHost = shift;
my $searchStr = shift;

die "Usage: $0 START_REVISION SVN_URL SEARCH_STRING\n"
	unless defined $prev_revision && defined $svnHost && defined $searchStr;
die "Starting revision must be a number.\n" unless $prev_revision =~ /^\d+$/;

print "Starting Revision: $prev_revision\n";
print "SVN addr: $svnHost\n";
print "Search pattern: $searchStr\n";

# Fetch the log of every commit from the starting revision up to HEAD.
my $buffer = `svn log -r $prev_revision:HEAD "$svnHost"`;
die "svn log failed.\n" if $? != 0;
my $shouldContinue = "y";
# Log entries are separated by lines of 72 dashes.
LOGS: foreach my $changelog_entry (split(/^-{72}$/m, $buffer)) {
	# Note: the search string is used as a regular expression.
	if($changelog_entry =~ m/($searchStr)/) {
		# The entry header looks like "r1234 | author | date | 1 line".
		# A regex handles any number of digits, unlike a fixed-width substr.
		my ($revisionId) = $changelog_entry =~ m/^r(\d+) \|/m;
		next LOGS unless defined $revisionId;

		print "\n--------------------------------------------------";
		print $changelog_entry;

		if ($shouldContinue ne 'a') {
			PROMPT: while(1) {
				print "\nShould continue with merge of revision '$revisionId'? (y=yes, a=always, s=skip, e=exit): ";
				$shouldContinue = <STDIN>;
				die("No more input.\n") unless defined $shouldContinue;
				chomp($shouldContinue);

				last PROMPT if $shouldContinue eq 'y';
				last PROMPT if $shouldContinue eq 'a';
				next LOGS if $shouldContinue eq 's';
				die("User requested to stop.\n") if $shouldContinue eq 'e';
			}
		}
		else {
			print "\nAuto merging '$revisionId'\n";
		}

		# Merge only the change introduced by this revision.
		my $pRevisionId = $revisionId-1;
		my $mergeBuffer = `svn merge -r $pRevisionId:$revisionId "$svnHost"`;
		print $mergeBuffer;
	}
}
