VisualizeUs official blog: Bookmark pictures that inspire you.

Fighting against duplicated images

behind the magic — Victor on June 27, 2008 at 10:33 pm

With this post I am starting a new category, “Behind the magic”, where I will share some details about the VisualizeUs internals.

One of the first things I noticed when I released VisualizeUs (almost 9 months ago now… whoa!) was how often the same image got posted more than once. One person finds a fancy image on a site and posts it to her account; then someone else finds the same image on another site and posts it too. Both files are the same image, but they have different addresses, so there’s no easy way to identify them as the same image. That happens a lot when people post from sites like ffffound, flickr, and so on. In fact, it is one of the things I hate most when browsing ffffound as a spectator.


Here’s one clear example: just check out the number of reference URLs. Without a system to control duplicates, that would mean the same image repeated six times.


So… how to deal with it? The approach of a lot of sites with this problem is… non-existent :D No, really, it’s a hard battle to fight, and probably one you will never win (unless you have a lot of money like the Digg guys and can borrow some fancy image recognition technology to deal with it :P). So the most usual approach is “why bother”, which, I should say, now seems pretty logical to me.

I still don’t know why I started dealing with this duplicate issue, but to me it was clear that displaying repeated results wasn’t very good for the spectator (silly me!). So I started to mark duplicated pictures based simply on my own visual memory. As you can guess, that was a tough, time-wasting, painful task, and not a very productive one either (although it did exercise my visual memory like some sort of Brain Training game!).

After some deep research on different approaches to make it painless and more efficient, I finally came up with an algorithm based on color analysis of the images. It’s not a silver bullet, but it does the job pretty well. Basically, it extracts a 3×3 color matrix from each image and, given a threshold, compares it with the matrices of the rest of the images. Unfortunately, it wasn’t that easy, and I had to tweak things a lot to make it useful. Things like borders, different crops, embedded text, different file quality and so on do not help. Of course, it’s not a fully automated process and it requires human intervention, but it helps a lot in finding possible duplicate candidates.
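To make the idea concrete, here is a minimal Python sketch of a 3×3 color matrix comparison. It is not the actual VisualizeUs code, which the post does not show. Everything in it is an assumption made for illustration: images are plain nested lists of (r, g, b) pixels at least 3 pixels wide and tall, each cell of the matrix is the average color of that region, the distance is the mean absolute channel difference, and the names `color_grid`, `grid_distance` and `duplicate_candidates`, as well as the default threshold of 10, are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of (r, g, b) pixels; assumed at least 3x3


def color_grid(image: Image, cells: int = 3) -> List[Pixel]:
    """Reduce an image to cells x cells average colors (row by row)."""
    height, width = len(image), len(image[0])
    grid = []
    for gy in range(cells):
        y0, y1 = gy * height // cells, (gy + 1) * height // cells
        for gx in range(cells):
            x0, x1 = gx * width // cells, (gx + 1) * width // cells
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Integer average of each channel over the block.
            grid.append(tuple(sum(p[c] for p in block) // len(block)
                              for c in range(3)))
    return grid


def grid_distance(a: List[Pixel], b: List[Pixel]) -> float:
    """Mean absolute per-channel difference between two grids (0 to 255)."""
    total = sum(abs(ca - cb) for pa, pb in zip(a, b) for ca, cb in zip(pa, pb))
    return total / (len(a) * 3)


def duplicate_candidates(images: Dict[str, Image],
                         threshold: float = 10.0) -> List[Tuple[str, str]]:
    """Return pairs of image names whose grids differ by at most threshold."""
    grids = {name: color_grid(img) for name, img in images.items()}
    names = sorted(grids)
    return [(m, n)
            for i, m in enumerate(names)
            for n in names[i + 1:]
            if grid_distance(grids[m], grids[n]) <= threshold]
```

The pairs returned are only candidates for a human to review, in line with the manual step described above. A sketch like this does nothing special about borders, crops or embedded text, and comparing every pair gets slow as the collection grows.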

With all that said, don’t expect to never find a duplicated image on the site. I wish, but unfortunately that’s next to impossible. Still, I hope that minimizing the number of duplicates will improve the VisualizeUs experience for everyone when browsing and watching tons of pictures. To give you an idea, about 70K images have been posted so far; about 50K of them are “unique”, while 20K were found to be duplicates or were posted via the VisualizeUs “i like it” link.

Playing with web typography

improvements — Victor on June 16, 2008 at 9:26 pm

Lately I’ve been playing with big typography on the web. The truth is I’ve never felt comfortable enough with it. You know, too restricted, too many issues between browsers, and so on. But recently I stumbled on the great series of web typography examples (I, II, III) from the I Love Typography blog, which was really inspiring. Kudos to John for his great blog!

And so I finally tackled a task that had been pending for a while: redesigning the welcome page for non-registered users, making it much clearer and giving more info about the site.

Here is how it looks now:

One of the few things a domain registrar company should do

issues — Victor on June 4, 2008 at 1:08 pm

What should that be? Want to bet? Of course, if the company makes its living from registering domains, it should do that well. That’s obvious. But… what if the company does not send any mail notification to its customers about expiring domains? What may happen then? I’ll tell you what: a lot of “happy” customers will happen. Just as happy as I am right now, because that’s exactly what happened to me with the vi.sualize.us domain.

Maybe you didn’t notice because of DNS caching (hopefully a lot of you didn’t notice at all), but for some period of time (I estimate around one or two hours) the domain redirected to an ugly parking page. That also applied to VisualizeUs.com, because it redirects to vi.sualize.us. Basically, VisualizeUs was screwed just because a mail wasn’t sent as it was supposed to be.

In general, that’s a very ugly feeling. The only thing the company had to do was warn me: “hey, your domain is going to expire in a month”. That’s all. But they didn’t, so cheers to them. And now I’ve lost my trust in them, because to me they don’t seem to care about their users, and that, in my humble opinion, should be a company’s cornerstone. The funny thing is that their marketing mails arrive regularly (sigh).

So… sorry for this.

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. | theme based on barecity by Shahee Ilyas