
Why the Combination of FetchAll and RedirectURL in Google Apps Script is Bad

I ran into a problem with Google Apps Script (GAS below), so I am leaving this as a note.

What I tried to do

I was trying to collect the links written in tweets with a specific hashtag, and to filter those links by a specific domain. I planned to build this as a RESTful API in GAS, since GAS makes that easy.

What I tried

Every link in a tweet is a shortened URL, so I needed to request each shortened URL and read the URL it redirects to. GAS provides a request method called fetch (UrlFetchApp.fetch). If you set its followRedirects option to false and read the Location response header, you get the redirect destination.

https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app#advanced-parameters
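
As a rough sketch of this idea for a single URL, the function below follows Location headers one hop at a time. fetchHeaders is a hypothetical callback (not part of GAS) standing in for something like UrlFetchApp.fetch(url, { followRedirects: false, muteHttpExceptions: true }).getAllHeaders(), and maxHops is an assumed safety limit against redirect loops. Header names are compared case-insensitively because I am not sure the key is always spelled "Location". Relative Location values are not handled here.

```typescript
// Header map as returned by the hypothetical fetchHeaders callback.
type HeaderMap = { [name: string]: unknown };

// Look up a header without relying on its capitalization.
function getHeader(headers: HeaderMap, name: string): string | undefined {
  const wanted = name.toLowerCase();
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === wanted) {
      const value = headers[key];
      // Multi-valued headers may arrive as arrays; this sketch ignores them.
      return typeof value === "string" ? value : undefined;
    }
  }
  return undefined;
}

// Follow redirects manually and return the last URL reached.
function resolveRedirect(
  url: string,
  fetchHeaders: (url: string) => HeaderMap,
  maxHops: number = 5 // assumed limit, not a GAS setting
): string {
  let current = url;
  for (let hop = 0; hop < maxHops; hop++) {
    const location = getHeader(fetchHeaders(current), "Location");
    if (!location) {
      break; // no further redirect
    }
    current = location;
  }
  return current;
}
```

Passing the fetcher in as a callback keeps the hop-following logic separate from UrlFetchApp, so it can be tried out with a fake header table first.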

Also, if you call fetch once per URL, the requests are processed in series, which is very slow. fetchAll sends multiple requests at the same time, so they are processed in parallel and performance is much better. In other words, I planned to solve the problem with the following code.

FetchAll and RedirectURL
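// Collects every Location header reached from the given shortened URLs.
// URLFetchRequest and HTTPResponse are the types from @types/google-apps-script.
function collectLocations(initialUrls: Array<string>): Array<string> {
  // e.g. initialUrls = ["https://t.co/XXXX", "https://t.co/YYYY"]
  let urlList: Array<string> = initialUrls;
  const locationList: Array<string> = [];
  while (true) {
    // Build one request per URL, with automatic redirects disabled.
    const requestList: Array<URLFetchRequest> = urlList.map((url: string) => {
      return {
        url: url,
        method: "get",
        followRedirects: false,
        muteHttpExceptions: true,
      };
    });
    // Send all requests of this round at once.
    const responseList: Array<HTTPResponse> = UrlFetchApp.fetchAll(requestList);
    urlList = [];
    responseList.forEach((response: HTTPResponse) => {
      const allHeaders: any = response.getAllHeaders();
      const location: string = allHeaders["Location"];
      // A Location header means there is another hop to follow in the next round.
      if (location) {
        locationList.push(location);
        urlList.push(location);
      }
    });
    // Stop once no response redirects any further.
    if (urlList.length === 0) {
      break;
    }
  }
  return locationList;
}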

Addendum (2020-02-28)

https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets

The Twitter API response contains a urls field. I found no explanation of it, but it appears to hold information about the links posted in the tweet (the shortened URL and the original URL).

"urls": [
  {
    "url": "https://t.co/Rbc9TF2s5X",
    "expanded_url": "https://twitter.com/i/web/status/1125490788736032770",
    "display_url": "twitter.com/i/web/status/1…",
    "indices": [
      117,
      140
    ]
  }
]
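
If expanded_url really is the original URL, the domain filtering I wanted could be done without following any redirects. The sketch below assumes each entry has the shape shown above. UrlEntity, hostOf and filterByDomain are names I made up for illustration, and the host is extracted with a simple regular expression that does not cover every URL form (for example, URLs containing user info).

```typescript
// Assumed shape of one entry in the "urls" array shown above.
interface UrlEntity {
  url: string;          // shortened t.co URL
  expanded_url: string; // original URL
  display_url: string;
  indices: number[];
}

// Extract the lower-cased host name from an http(s) URL, or null if none is found.
function hostOf(url: string): string | null {
  const match = /^https?:\/\/([^\/?#:]+)/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

// Keep the expanded URLs whose host is `domain` itself or one of its subdomains.
function filterByDomain(entities: UrlEntity[], domain: string): string[] {
  const wanted = domain.toLowerCase();
  const suffix = "." + wanted;
  return entities
    .map((entity) => entity.expanded_url)
    .filter((expanded) => {
      const host = hostOf(expanded);
      if (host === null) {
        return false;
      }
      return host === wanted || host.slice(-suffix.length) === suffix;
    });
}
```

Matching on the full host or a dot-prefixed suffix avoids accepting look-alike hosts that merely end with the same letters.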

The problem

With this method, you have to follow each Location yourself, one hop at a time. The processing cost is therefore higher than letting GAS follow redirects automatically (followRedirects: true). Well, I'll turn a blind eye to that.

Next.

https://www.monotalk.xyz/blog/google-app-script-%E3%81%AE-urlfetchapp-%E3%81%AE-%E4%BE%8B%E5%A4%96%E3%83%8F%E3%83%B3%E3%83%89%E3%83%AA%E3%83%B3%E3%82%B0%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6/

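fetch and fetchAll can still throw an exception even when muteHttpExceptions: true is set. That option only suppresses exceptions for responses whose status code indicates failure; errors that occur before any response arrives are still thrown. As a result, if you pass, say, 1000 URLs to fetchAll and it throws, you cannot tell which requests succeeded, which failed, and which were never executed.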

FetchAll and RedirectURL (Error)

I think this could be solved if something like Promise.allSettled were available, but at the moment Promises cannot be used.

The solutions I can think of are:

  • Use fetch instead of fetchAll
  • Split the fetchAll requests into several chunks instead of sending them all at once
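
The two workarounds above can be combined: send the URLs in chunks, and when a chunk throws, retry its URLs one at a time to find out which one failed. This is only a sketch under assumptions. fetchBatch is a hypothetical callback that in GAS might wrap UrlFetchApp.fetchAll (called with a single URL, it plays the role of fetch), and the default chunk size of 10 is an arbitrary choice. Note that the retry repeats requests that may already have succeeded inside the failed chunk.

```typescript
// Outcome for one URL: either a value of type T or an error message.
type Outcome<T> =
  | { url: string; ok: true; value: T }
  | { url: string; ok: false; error: string };

// Split a list into chunks of at most `size` items (at least 1 per chunk).
function chunk<T>(items: T[], size: number): T[][] {
  const step = Math.max(1, size);
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += step) {
    chunks.push(items.slice(i, i + step));
  }
  return chunks;
}

// Fetch in chunks; if a chunk throws, retry its URLs one by one so that
// a single bad URL does not hide the results of the others.
function fetchInChunks<T>(
  urls: string[],
  fetchBatch: (urls: string[]) => T[],
  chunkSize: number = 10
): Outcome<T>[] {
  const outcomes: Outcome<T>[] = [];
  for (const group of chunk(urls, chunkSize)) {
    try {
      const values = fetchBatch(group);
      group.forEach((url, i) => {
        outcomes.push({ url, ok: true, value: values[i] });
      });
    } catch (batchError) {
      // The whole chunk failed: fall back to one request per URL.
      for (const url of group) {
        try {
          outcomes.push({ url, ok: true, value: fetchBatch([url])[0] });
        } catch (singleError) {
          outcomes.push({ url, ok: false, error: String(singleError) });
        }
      }
    }
  }
  return outcomes;
}
```

Smaller chunks limit how much work is repeated after a failure, at the cost of less parallelism per call.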

In conclusion

In the end, what I tried to do this time does not play to the strengths of GAS. The advantage of GAS is that it integrates easily with G Suite.

However, this time I just wanted to create a small crawler. Of course, I think it can be created with GAS, but there are some things you have to compromise on.

If you can't compromise on that, you need to consider other means.

Lessons learned

  • Superficial
    • Do not use fetchAll to retrieve redirect destination URLs
  • Fundamental
    • Choose the tool suitable for the purpose

By the way, I'm thinking of rewriting this tool in Go, where parallel processing is simple to write.
