16th Nov, 2007

How to steal images off IMDb

Some time ago I wrote a small movie database web application that uses page scraping to download movie data from IMDb, so that I can categorize the movies and more easily find a good movie I have not yet watched. If you are interested in how I did it, I wrote a post about it here. This is what the application looks like:

[Screenshot of the application (imdbimage.jpg)]

What I did not explain in that post is how I managed to display IMDb's thumbnails on my site and get around their hotlinking protection. If you reference an image directly on your public website, e.g. <img src="http://ia.imdb.com/media/imdb/01/I/40/67/11/10m.jpg">, it will not work: when the image is requested, IMDb tries to write a cookie, and if your page is not on their domain this fails (I put it down to cross-site scripting protection), so the image shows as broken.

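Bypassing this protection was easy as pie. I created a new .aspx page that makes an HTTP request for the image, gets the HTTP response stream and loads it as an Image using the Image.FromStream method. Then I simply wrote the image bytes to the page's OutputStream and set the ContentType to "image/jpeg". Because the browser now requests the image from my own server, IMDb only ever sees the request made by my server, not one coming from a page on another domain.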

In Page_Load I had the following code, which calls into the business layer to get the image bytes:

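// Page_Load: fetch the image through the server and send it to the browser as a JPEG
byte[] bytes = HttpHelper.GetImdbImage(url);
Response.ContentType = "image/jpeg";
Response.OutputStream.Write(bytes, 0, bytes.Length);

// HttpHelper (business layer); requires using System.Net;
public static byte[] GetImdbImage(string url)
{
    HttpWebRequest webRequest =
        (HttpWebRequest)WebRequest.Create(url);
    // Dispose the response, its stream and the decoded image once the bytes are extracted
    using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
    using (System.IO.Stream responseStream = webResponse.GetResponseStream())
    using (System.Drawing.Image image = System.Drawing.Image.FromStream(responseStream))
    {
        return getImageBytes(image);
    }
}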
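GetImdbImage decodes the response with Image.FromStream and then turns the Image back into bytes through getImageBytes, which costs CPU and can cost a little JPEG quality. If all you need is to pass the image through unchanged, a minimal sketch like the following, assuming the same HttpWebRequest approach, simply copies the response bytes. ImageProxy, DownloadImageBytes and CopyToBytes are hypothetical names, not part of the original code.

```csharp
using System.IO;
using System.Net;

// Hypothetical helper class; not part of the original application.
public static class ImageProxy
{
    // Hypothetical alternative to GetImdbImage: returns the bytes exactly as the
    // server sent them, without decoding and re-encoding through System.Drawing.
    public static byte[] DownloadImageBytes(string url)
    {
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
        using (Stream responseStream = webResponse.GetResponseStream())
        {
            return CopyToBytes(responseStream);
        }
    }

    // Reads a (possibly non-seekable) stream to the end into a byte array,
    // using a fixed-size chunk so it also works on older .NET versions.
    public static byte[] CopyToBytes(Stream input)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = input.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, read);
            }
            return buffer.ToArray();
        }
    }
}
```

The page would then write the returned bytes to Response.OutputStream exactly as before. Since nothing is re-encoded, the bytes are whatever format IMDb served, so hard-coding "image/jpeg" is only correct as long as the source images really are JPEGs.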


Responses

I’ve been wondering about this for a while now :D
Anyway, you forgot the getImageBytes function:

// Re-encodes the image as JPEG into an in-memory buffer; requires using System.IO;
private static byte[] getImageBytes(System.Drawing.Image image)
{
    using (MemoryStream ms = new MemoryStream())
    {
        image.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
        return ms.ToArray();
    }
}

Otherwise, it works! Thanks a bunch!

How do you obtain the URL for the movie image you want? I didn’t see it readily available in the IMDb interface.

Thanks

Hello Carlo,
You really did a great job. I was trying to achieve the same thing (capturing images from IMDb), but in vain.

I am developing a phpBB3 mod that shows current box office listings. I used an RSS feed from Yahoo, and when you click on a title it generates a link to the IMDb movie page.

Now I want to include thumbnails for the titles. I managed to get the image link, but when I try to show it on the page it doesn't work, unless I open the link directly.

I work in PHP and would like the same code as yours, but in PHP (I’m a dummy at ASP).

Here’s the link to that mod:
http://forum.numediastudio.com/boxoffice.php

I have now managed to show it using PHP:
<?php
// Note: this fetches any URL passed in the request; in production, check
// that it really points at an IMDb image before fetching it
$ch = curl_init($_REQUEST['imdb_image_link']);
if (!$ch) {
    die("Cannot allocate a new PHP-CURL handle");
}

// Return the transfer as a string instead of printing it, and treat it as binary
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);

// Grab the jpg and save the contents in the $data variable
$data = curl_exec($ch);

// close the connection
curl_close($ch);

// curl_exec returns false if the download failed
if ($data === false) {
    die("Could not fetch the image");
}

// Set the header to type image/jpeg, since that's what we're displaying
header("Content-type: image/jpeg");

// Output the image
print($data);

Hi Abdessamad,

How did you manage to get the RSS feed and display it in your forum? I’m trying to create a page in phpBB3 that shows feeds from a number of blogs.

Thanks

I found a technique over here that works
