Code This

std::cout <<me.ramble() <<std::endl;

Picasa Photo Scraper Using GData


I took a trip to Japan and China about a year and a half ago.  I normally don’t take pictures (I don’t even own a digital camera!), but I have a friend who took over 1000.  Of course, once we got back, I wanted them.  The problem was he had uploaded them to Picasa and then gotten rid of them.  So I was stuck trying to get over 1000 pictures from Picasa, which I obviously didn’t want to do by hand.  I wasn’t familiar with the software, and I assumed (without ever bothering to verify) that you could not just download all the pictures in an album in one shot.  A friend had told me about the Google Data APIs (GData), which he was using to display photos from Picasa on another website.  This sounded like exactly what I was looking for.  You can actually use GData to interface with all of the different data hosting services Google provides; it’s very handy.  The homepage for the project has the relevant downloads, as well as some documentation.  You can use GData from a variety of languages and platforms, including .NET.  C# just happens to be my “quick ‘n dirty” language of choice, so I was all set.

At first it was difficult finding a good quick start guide, or something to get me up and running.  Modeling the URLs correctly to get the data you want isn’t exactly intuitive.  This situation may have improved since I wrote this little application, which was over a year ago.  I’m going to go through the code I wrote to scrape the pictures out of Picasa.  I will assume you have knowledge of C#.  This app was intended to be written quickly for a specific purpose. All data is hard coded. My knowledge of GData is limited to what is contained in this app, since I haven’t had a need for it since.  In order to get started, you need to download and install GData from the project page mentioned above.

Here is the code:

using System;
using System.Collections.Generic;
using System.Text;
using System.Net;
using System.IO;
using Google.GData.Client;
using Google.GData.Photos;

namespace PicasaPhotoFetcher
{
   class Program
   {
      // Modify the following data according to your needs.
      // This array contains the folder names that will be created for each album.
      private static readonly string[] Albums = {
         "Your album name here"
      };
      // This array contains the URLs for the data feed for each album.
      private static readonly string[] Urls = {
         "http://picasaweb.google.com/data/feed/api/user/[USER]/album/[ALBUM]?kind=photo"
      };
      // This is the absolute path to the folder where the albums will be stored.
      private static readonly string DownloadPath = "C:\\My Picasa Photos\\";

      // You shouldn't need to modify beyond this point.
      static void Main(string[] args)
      {
         PicasaService photoService = new PicasaService("Picasa");

         for (int i = 0; i < Albums.Length; i++) {
            FeedQuery photosQuery = new FeedQuery(Urls[i]);
            PicasaFeed albumFeed = photoService.Query(photosQuery) as PicasaFeed;
            DownloadAllPhotos(Albums[i], albumFeed.Entries);
            Console.WriteLine();
         }
      }

      static void DownloadAllPhotos(string albumName, AtomEntryCollection photoList)
      {
         DirectoryInfo dirInfo = Directory.CreateDirectory(DownloadPath + albumName);

         int photoNum = 1;
         foreach (AtomEntry photo in photoList) {
            Console.SetCursorPosition(0, Console.CursorTop);
            Console.Write("Fetching image {0} of {1}...", photoNum, photoList.Count);

            HttpWebRequest photoRequest = (HttpWebRequest)WebRequest.Create(
               photo.Content.AbsoluteUri + "?imgmax=800");

            // Dispose the response and both streams so that every image file is
            // flushed and closed before the next download starts.
            using (WebResponse photoResponse = photoRequest.GetResponse())
            using (Stream input = photoResponse.GetResponseStream())
            using (FileStream imgOut = File.Create(Path.Combine(dirInfo.FullName,
               "image" + photoNum++ + ".jpg")))
            {
               // Copy the image in 1024-byte chunks until the stream is exhausted.
               byte[] buffer = new byte[1024];
               int bytesRead;
               while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0) {
                  imgOut.Write(buffer, 0, bytesRead);
               }
            }
         }
      }
   }
}

 
In order to build this code, you will have to add references to the “Google Data API Core Library” and the “Google Data API Picasa Library” to your project.

As you can see, the class starts out with 3 data members that contain all of the configuration data required by the application.  You can add as many albums as you want to the string arrays.  Each index in the first array corresponds to the same index in the second array, so the two arrays must have the same number of elements.  The first array contains the names of the folders that will be created on disk, one per album.  The second array contains the actual URL sent to Google to get each album's contents.  There are a few things to note here.  Obviously, the [USER] and [ALBUM] markers should be replaced with your username and album.  Also, if the album is private but has an auth key, you can add the key to the URL as a parameter:

&authkey=[KEY]

Where [KEY] is your auth key.  I believe you can also pass your username and password in the URL, but I am not sure of the parameter names for that offhand.
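If you would rather not edit the feed URLs by hand, the pattern above can be filled in with a small helper.  This is only a sketch: BuildAlbumFeedUrl and the PicasaFeedUrls class are names I made up for illustration and are not part of GData.  It assumes the URL layout shown in the Urls array and only adds the authkey parameter when a key is supplied.

```csharp
using System;

static class PicasaFeedUrls
{
   // Hypothetical helper (not a GData API): builds a feed URL that follows the
   // same pattern as the hard-coded entries in the Urls array.
   public static string BuildAlbumFeedUrl(string user, string album, string authKey)
   {
      string url = "http://picasaweb.google.com/data/feed/api/user/" +
         Uri.EscapeDataString(user) + "/album/" + Uri.EscapeDataString(album) +
         "?kind=photo";

      // Only private albums that were shared with an auth key need this parameter.
      if (!string.IsNullOrEmpty(authKey)) {
         url += "&authkey=" + Uri.EscapeDataString(authKey);
      }
      return url;
   }
}
```

The result can be dropped straight into the Urls array or passed to the FeedQuery constructor; pass null as the last argument for public albums.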

There is nothing too fancy going on in DownloadAllPhotos.  We are essentially reading the data of each image in 1024-byte chunks and dumping those chunks to the image file.  One additional thing to note is that we are appending the parameter:

?imgmax=800

to the photo URL.  This limits the maximum dimension of the image to 800 pixels.  The aspect ratio is maintained when the image is scaled.  If you do not want to scale your images automatically, then you can simply remove this piece of the code.
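If you want the scaling to be switchable rather than hard-coded, something along these lines should work.  Again, this is a sketch under assumptions: WithMaxDimension and the PhotoUrls class are hypothetical names, 800 is the only value this post has actually used, and I have not checked which other sizes the service accepts.

```csharp
using System;

static class PhotoUrls
{
   // Hypothetical helper (not a GData API): appends imgmax to a photo URL,
   // or returns the URL untouched when no maximum dimension is requested.
   public static string WithMaxDimension(string photoUrl, int? maxDimension)
   {
      if (!maxDimension.HasValue) {
         return photoUrl;
      }

      // Use '?' for the first query parameter and '&' for any later one.
      string separator = photoUrl.Contains("?") ? "&" : "?";
      return photoUrl + separator + "imgmax=" + maxDimension.Value;
   }
}
```

In DownloadAllPhotos you would then pass PhotoUrls.WithMaxDimension(photo.Content.AbsoluteUri, 800) to WebRequest.Create, or pass null to request the image without the imgmax parameter.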

My colleague Johnathan has written a post discussing how to retrieve Picasa content with PHP and has also written a WordPress plugin for this purpose. Check them out.


Written by Kris Wong

November 18, 2008 at 10:20 am

2 Responses


  1. Hi,

    There is a fairly extensive Developer’s Guide that details how to do authentication as well:

    http://code.google.com/apis/picasaweb/developers_guide_dotnet.html

    I ended up writing my own WordPress plugin last year:

    http://code.google.com/apis/picasaweb/developers_guide_dotnet.html

    Cheers,
    -Jeff

    Jeff Fisher

    November 18, 2008 at 1:17 pm

  2. Bah, I fail at clipboard:

    http://code.google.com/p/goldengate/

    Jeff Fisher

    November 18, 2008 at 1:17 pm

