Why the Combination of FetchAll and RedirectURL in Google Apps Script is Bad
I ran into a problem with Google Apps Script (hereinafter GAS), so I am leaving a note about it.
What I wanted to do
I wanted to collect the links written in tweets with a specific hashtag, and to filter those links by a specific domain. I planned to build this as a RESTful API in GAS, since GAS makes that easy.
What I tried
All links written on Twitter are shortened URLs. Therefore, I needed to access each shortened URL and get the URL it redirects to.
In GAS, there is a request method called fetch. By setting the followRedirects option of fetch to false and reading Location from the response headers, you can get the redirect destination.
https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app#advanced-parameters
Also, if you call fetch once per URL, the requests are processed in series, which is very slow. fetchAll sends multiple requests at the same time, so they are processed in parallel and performance is much better. In other words, I was thinking of solving it with the following code.
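// URLFetchRequest and HTTPResponse are the types from GoogleAppsScript.URL_Fetch
// (@types/google-apps-script). The function name is added here so that the
// snippet is complete; it is not from the original code.
function collectLocations(): Array<string> {
  let urlList: Array<string> = ["https://t.co/XXXX", "https://t.co/YYYY"];
  const locationList: Array<string> = [];
  // Each pass resolves one redirect hop for every URL still being followed.
  while (true) {
    const requestList: Array<URLFetchRequest> = urlList.map((url: string) => {
      return {
        url: url,
        method: "get",
        followRedirects: false,
        muteHttpExceptions: true,
      };
    });
    const responseList: Array<HTTPResponse> = UrlFetchApp.fetchAll(requestList);
    urlList = [];
    responseList.forEach((response: HTTPResponse) => {
      const allHeaders: any = response.getAllHeaders();
      const location: string = allHeaders["Location"];
      // A Location header means there is another hop to follow.
      if (location) {
        locationList.push(location);
        urlList.push(location);
      }
    });
    // Stop once no response redirects any further.
    if (urlList.length === 0) {
      break;
    }
  }
  // Note: this holds every Location seen, including intermediate hops.
  return locationList;
}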
Addendum (2020-02-28)
https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets
There is a urls field in the Twitter API response. It was not explained there, but it seems to contain information about the links posted in the Tweet (the shortened URL and the original URL), so reading expanded_url may make it unnecessary to resolve the redirects at all.
"urls": [
  {
    "url": "https://t.co/Rbc9TF2s5X",
    "expanded_url": "https://twitter.com/i/web/status/1125490788736032770",
    "display_url": "twitter.com/i/web/status/1…",
    "indices": [
      117,
      140
    ]
  }
]
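If the urls field is reliable, the domain filtering I originally wanted could be done without any redirect handling. The following is only a rough sketch under that assumption: the UrlEntity interface covers just the two fields shown above, and the names hostOf and filterExpandedUrls are made up for illustration. The host is extracted with a simple regular expression rather than a URL parser, so unusual URLs (for example ones with user:password@ in them) are not handled.

```typescript
// One entry of the "urls" array (only the fields this sketch uses).
interface UrlEntity {
  url: string; // shortened t.co URL
  expanded_url: string; // original URL
}

// Hypothetical helper: returns the lower-cased host of an http(s) URL,
// or null when the string does not look like one.
function hostOf(url: string): string | null {
  const match = /^https?:\/\/([^\/?#:]+)/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: keeps the expanded URLs whose host is `domain`
// itself or one of its subdomains.
function filterExpandedUrls(entities: UrlEntity[], domain: string): string[] {
  const suffix = "." + domain.toLowerCase();
  return entities
    .map((entity) => entity.expanded_url)
    .filter((expanded) => {
      const host = hostOf(expanded);
      if (host === null) {
        return false; // not an absolute http(s) URL
      }
      return host === domain.toLowerCase() || host.slice(-suffix.length) === suffix;
    });
}
```

Calling filterExpandedUrls(urls, "twitter.com") on the urls array above would keep the expanded_url of that entry, because its host is twitter.com.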
The problem
With this method, you have to follow each Location yourself, one round of requests per redirect hop.
Therefore, the processing cost is higher than following redirects automatically (followRedirects: true). Well, I'll turn a blind eye to that.
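Next, the bigger problem.
fetch and fetchAll can still throw an exception even if muteHttpExceptions: true is set. As far as I can tell, that option only suppresses exceptions for responses whose status code indicates failure; when no response comes back at all (for example, a DNS error or a timeout), the call still throws.
When fetchAll throws, the whole call fails. So if, for example, you fetchAll 1000 URLs, you won't know which ones succeeded, which ones failed, and which ones were never executed.
I think this could be solved if Promise.allSettled were available, but at the time of writing, Promises cannot be used.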
The solutions I can think of are:
- Use fetch instead of fetchAll
- Split the requests into several chunks and call fetchAll once per chunk, instead of sending them all at once
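The two ideas in the list above can be combined: call fetchAll chunk by chunk, and only when a chunk throws, retry its URLs one at a time with fetch to find out which ones fail. The following is a minimal sketch of that idea, not something I have run in production. The names Settled, chunk, and fetchInChunks are made up for illustration, and the two fetch functions are passed in as parameters (standing in for UrlFetchApp.fetchAll and UrlFetchApp.fetch) so that the logic does not depend on GAS itself.

```typescript
// Outcome of one URL, loosely modelled on what Promise.allSettled reports.
type Settled<R> =
  | { status: "fulfilled"; url: string; value: R }
  | { status: "rejected"; url: string; reason: unknown };

// Splits `items` into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// `fetchBatch` and `fetchOne` are assumptions: in GAS they would wrap
// UrlFetchApp.fetchAll and UrlFetchApp.fetch respectively.
function fetchInChunks<R>(
  urls: string[],
  chunkSize: number,
  fetchBatch: (urls: string[]) => R[],
  fetchOne: (url: string) => R
): Settled<R>[] {
  const results: Settled<R>[] = [];
  for (const group of chunk(urls, chunkSize)) {
    try {
      // Fast path: the whole chunk is requested in parallel.
      const responses = fetchBatch(group);
      group.forEach((url, i) => {
        results.push({ status: "fulfilled", url, value: responses[i] });
      });
    } catch (batchError) {
      // The batch threw as a whole, so retry its URLs one by one
      // to learn which of them actually fail.
      for (const url of group) {
        try {
          results.push({ status: "fulfilled", url, value: fetchOne(url) });
        } catch (reason) {
          results.push({ status: "rejected", url, reason });
        }
      }
    }
  }
  return results;
}
```

In GAS, fetchBatch could be (group) => UrlFetchApp.fetchAll(group.map((url) => ({ url, followRedirects: false, muteHttpExceptions: true }))) and fetchOne could be (url) => UrlFetchApp.fetch(url, { followRedirects: false, muteHttpExceptions: true }). The trade-off is that a failed chunk is requested twice, so a smaller chunk size limits the wasted requests at the cost of less parallelism.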
In conclusion
In the end, what I tried to do this time doesn't play to the strengths of GAS, does it? The advantage of GAS is that it integrates easily with G Suite.
However, this time I just wanted to create a small crawler. Of course, I think it can be created with GAS, but there are some things you have to compromise on.
If you can't accept those compromises, you need to consider other means.
Lessons learned
- Superficial
  - Do not retrieve the redirect destination URL when using fetchAll
- Fundamental
  - Choose the tool suitable for the purpose
By the way, I'm thinking of rewriting this tool in Go, which makes parallel processing simple to write.