Some time ago I wrote a small movie database web application that uses page scraping to download movie data from IMDb, so that I can categorize the movies and more easily find a good movie that I have not yet watched. If you are interested in how I did it, I wrote a post about it here. This is what the application looks like.

What I did not explain in the above post is how I managed to show IMDb's thumbnails on my site and bypass their cross-site scripting protection. If you reference an image on your public website as follows, e.g. <img src="http://ia.imdb.com/media/imdb/01/I/40/67/11/10m.jpg">, it will not work: when you request an image, IMDb tries to write a cookie, and if your page is not on their domain this fails due to cross-site scripting protection and the image shows as broken.
Bypassing this protection was easy as pie. I created a new aspx page that makes an HTTP request for the image, gets the HTTP response stream and loads it as an Image using the Image.FromStream method. Then I simply wrote the image bytes to the OutputStream of the page's response and set the ContentType to “image/jpeg”.
In the page load I had this code, which called into the business layer to get the image bytes.
byte[] bytes = HttpHelper.GetImdbImage(url);
// Tell the browser this is a JPEG, then send the image bytes.
Response.ContentType = "image/jpeg";
Response.OutputStream.Write(bytes, 0, bytes.Length);
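// Business-layer helper: downloads the image and returns its bytes.
public static byte[] GetImdbImage(string url)
{
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
    // Dispose the response, its stream and the image once the bytes have been read.
    using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
    using (System.IO.Stream responseStream = webResponse.GetResponseStream())
    using (System.Drawing.Image image = System.Drawing.Image.FromStream(responseStream))
    {
        return getImageBytes(image);
    }
}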
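
The getImageBytes helper called above is not listed in this post. As a rough sketch only, it could look something like the following, assuming all it does is re-encode the image as a JPEG into a memory buffer and return the buffer's contents. The class name ImageBytesSketch is made up for illustration, and the real helper may well differ.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

// Hypothetical stand-in for the helper used by GetImdbImage (assumption, not the original code).
public static class ImageBytesSketch
{
    public static byte[] getImageBytes(Image image)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            // Encode as JPEG so the bytes match the "image/jpeg" content type set by the page.
            image.Save(buffer, ImageFormat.Jpeg);
            return buffer.ToArray();
        }
    }
}
```

Saving as JPEG here is a choice made to match the content type the page sends; if the thumbnails came in another format, the ImageFormat and the ContentType would need to change together.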

