
Cleaning Flickr Tags

Sunday, November 2nd, 2008

As part of an application I’m developing, I needed to store tags from multiple sources, and I chose Flickr’s approach of storing both a raw and a clean version of each tag. That meant figuring out how Flickr converts a raw tag into a clean one. An article by Terrell Russell helped a lot, but it missed a few details (and I needed the code in Java).

Russell’s original regular expression did not strip commas. I also found that Flickr substitutes certain special characters instead of removing them (I expect to find more of these as I keep comparing Flickr tags).
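Since the list of substituted characters is likely to grow, one option is to keep it in a lookup table rather than chaining `replace` calls. This is a minimal sketch, assuming Java 9+ (for `Map.of`) and an already lowercased tag. The class name `TagSubstitutions` and the method `substitute` are hypothetical, and the two entries simply mirror the `ß` and `ς` substitutions in `cleanRawTag`. They are not a complete list of what Flickr does.

```java
import java.util.Map;

public class TagSubstitutions {
    // Hypothetical lookup table of characters that are replaced rather than stripped.
    // Entries: German sharp s -> 's', Greek final sigma -> ordinary sigma.
    // New substitutions can be added here as they are discovered.
    private static final Map<Character, Character> SUBSTITUTIONS = Map.of(
            '\u00DF', 's',
            '\u03C2', '\u03C3');

    /** Replaces every known character in an already lowercased tag. */
    public static String substitute(String tag) {
        StringBuilder sb = new StringBuilder(tag.length());
        for (int i = 0; i < tag.length(); i++) {
            char c = tag.charAt(i);
            // Characters without an entry are copied through unchanged.
            sb.append(SUBSTITUTIONS.getOrDefault(c, c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("stra\u00DFe"));          // sharp s becomes 's'
        System.out.println(substitute("\u03C3\u03B1\u03C2"));   // final sigma becomes sigma
    }
}
```

Keeping the table separate from the regular expression means the "strip" rules and the "substitute" rules can be compared against Flickr independently.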

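public static String cleanRawTag(String raw, boolean isMachineTag)
{
    if (isMachineTag)
    {
        // Machine tags keep their "namespace:predicate=" prefix (lowercased);
        // only the value after '=' is cleaned. For example:
        // raw   = geo:lat=13.751193
        // clean = geo:lat=13751193
        int equals = raw.indexOf('=');
        return raw.substring(0, equals + 1).toLowerCase()
                + cleanRawTag(raw.substring(equals + 1), false);
    }
    else
    {
        // Strip whitespace and punctuation. The hyphen, square brackets and
        // backslash are escaped so the character class matches them literally
        // (an unescaped ",-_" would be a range that also covers digits and A-Z).
        String clean = raw.replaceAll("[\\s\"!@#$%^&*():,\\-_+='/.;`<>\\[\\]?\\\\]", "").toLowerCase();
        // Some characters are substituted rather than removed.
        return clean.replace('ß', 's').replace('ς', 'σ');
    }
}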
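To check the escaped expression, here is a minimal self-contained sketch that wraps the same logic in a hypothetical `FlickrTagCleaner` class, compiles the pattern once with `java.util.regex.Pattern` instead of on every `replaceAll` call, and prints a few sample conversions. The sample outputs are what this code produces; they have not been checked against Flickr itself.

```java
import java.util.regex.Pattern;

public class FlickrTagCleaner {
    // Whitespace plus the punctuation from the original expression.
    // '-', '[', ']' and the backslash are escaped so they match literally.
    private static final Pattern STRIP =
            Pattern.compile("[\\s\"!@#$%^&*():,\\-_+='/.;`<>\\[\\]?\\\\]");

    public static String cleanRawTag(String raw, boolean isMachineTag) {
        if (isMachineTag) {
            // Keep "namespace:predicate=" (lowercased) and clean only the value.
            int equals = raw.indexOf('=');
            return raw.substring(0, equals + 1).toLowerCase()
                    + cleanRawTag(raw.substring(equals + 1), false);
        }
        String clean = STRIP.matcher(raw).replaceAll("").toLowerCase();
        // Sharp s -> 's', Greek final sigma -> ordinary sigma.
        return clean.replace('\u00DF', 's').replace('\u03C2', '\u03C3');
    }

    public static void main(String[] args) {
        System.out.println(cleanRawTag("New York", false));          // newyork
        System.out.println(cleanRawTag("Route 66", false));          // route66
        System.out.println(cleanRawTag("rock,n-roll", false));       // rocknroll
        System.out.println(cleanRawTag("geo:lat=13.751193", true));  // geo:lat=13751193
    }
}
```

The `Route 66` sample doubles as a regression check: if the hyphen loses its escape, `,-_` becomes a character range from comma to underscore, and the digits (along with every uppercase letter, since stripping happens before lowercasing) disappear from the tag.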