This week, I want to share a mini-project I did this past summer with Google Trends, an awesome tool that lets anyone collect data from the Google search engine.
Data Collection
Google Trends is a tool that shows search patterns for specific terms across time and space. In its basic use, we enter a search query and get back its level of popularity over different time frames. It is also possible to break down the results by geographic location.
The motivation for this project was to test the gtrendsR package, which makes it easier to load query results into R and run various tests. I wanted to do some analysis on the COVID-19 global pandemic. Therefore, I focused my search query on a specific time frame: the primary months of lockdown in the US, March to July 2020.
The main search involved two terms. The first is one whose usefulness I personally discovered during the lockdown months: curbside. It is the option offered by various stores to shop online for groceries (or other products) and have a staff member bring the order to your car without direct contact. In order to make comparisons, I searched for a second term and settled on backyard. My logic was that with many people 'stuck' at home, the backyard had become a central hub for spending time.
To recap my search query – I compared the popularity of the terms curbside and backyard across the continental US between March and July 2020.
Let’s talk some findings. For starters, I plot a general comparison between both terms. The figure below shows the relative interest in each term across the entire time frame. Interest is measured in what gtrendsR calls hits: not a raw count of searches, but the Google Trends relative index, scaled from 0 to 100 against the peak in the query. Beginning with curbside, the search pattern seems to track the timing of the lockdown: it spiked in late March as most states instituted some form of lockdown, and dropped in late April to early May as most states eased the restrictions. With people limited to shopping online, it seems that many were looking for options to make this essential task more efficient and safe (i.e., curbside pickup).
The results for backyard are not as clear. Searches for this term trend upward over time. The increase may reflect the lockdowns across the US, yet it is probably also a function of changes in weather. The warmer it gets outside, the more people search for ways to use their private and safe spaces and spend more time in the backyard.
To further clarify these findings, I ran another analysis. This time, I restricted the sample to searches that originated in either Texas (my current residence) or Ohio (another large, highly populated state, but with different weather conditions). Starting with the Texas data, search for curbside tracks nicely with the US data (previous analysis). The peak between mid-March and mid-April aligns almost perfectly with the institution of lockdown in Texas and the easing of restrictions. The lack of change in searches for backyard can be attributed to the relative stability of the weather.
The Ohio plot strengthens the propositions for both terms. Search for curbside again fits nicely with the overall US data. Search for backyard better reflects the changes in weather: as spring begins in late March and temperatures gradually rise, searches for backyard increased until the summer months.
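To make the hits scale concrete, here is a minimal sketch of the kind of rescaling behind it: every value is expressed relative to the highest point across all terms in the query, which is set to 100. This is a simplified model rather than Google's actual pipeline (sampling and other details are left out), and the function name and the input numbers are made up purely for illustration.

```python
def scale_to_hits(interest_by_term):
    """Rescale raw interest values so the peak across ALL terms equals 100.

    interest_by_term maps each search term to a list of raw interest values,
    one per time period. The values are hypothetical; real raw counts are
    not published by Google Trends.
    """
    peak = max(value for series in interest_by_term.values() for value in series)
    if peak == 0:
        # No interest anywhere: every point stays at zero.
        return {term: [0] * len(series) for term, series in interest_by_term.items()}
    return {
        term: [round(100 * value / peak) for value in series]
        for term, series in interest_by_term.items()
    }


# Placeholder numbers, NOT real Google Trends data.
example_interest = {"curbside": [2, 8, 4], "backyard": [2, 4, 6]}
example_hits = scale_to_hits(example_interest)
print(example_hits)
```

Because both terms share one scale, a value of 50 for one term means half the interest of the overall peak, whichever term that peak belongs to.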
Thus far, I showed how the search data is distributed over time. Now, I want to switch to spatial visualizations. Let’s stay with the main terms (curbside and backyard) and keep the same time frame (March to July 2020). However, this time, I map the data across the entire United States. The figure below displays which of the two terms was more popular in each state. Consistent with the plots above, both Texas and Ohio were dominated by the curbside term (along with Michigan, Minnesota, Pennsylvania, and Maine). Without additional information, it is hard to draw conclusions from this breakdown, yet weather is a plausible reason in the case of some of the northern states.
Finally, I wanted to expand the test and evaluate other terms. I looked for terms with more 'universal' search patterns across the entire country that still fit within the COVID “era”. Thus, I ran a query for the term Mask between March and July of 2020.
I map the results using a choropleth map. This type of visualization shades each region along a color gradient according to its value, so that higher values stand out. In the map below, states colored in red (or close to it) had higher search interest in the term Mask. It seems that in this case, the Pacific coast and Mountain region states lead the way, along with Pennsylvania and Massachusetts. I add a cities layer to the map by coding the location data for several of the cities that had high search values for the term Mask (city-level values are included in the Google Trends search results).
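The logic behind a "which term wins in each state" map can be sketched in a few lines. This is an illustration of the idea, not the code used for the figure: the function name, the state labels, and the values are all hypothetical placeholders.

```python
def dominant_term(hits_by_state):
    """For each state, return the term with the highest hits value.

    hits_by_state maps a state name to a {term: hits} dictionary.
    States where the top value is shared are labelled "tie".
    """
    winners = {}
    for state, term_hits in hits_by_state.items():
        top = max(term_hits.values())
        leaders = [term for term, hits in term_hits.items() if hits == top]
        winners[state] = leaders[0] if len(leaders) == 1 else "tie"
    return winners


# Placeholder labels and values, NOT real Google Trends results.
placeholder_hits = {
    "State A": {"curbside": 60, "backyard": 40},
    "State B": {"curbside": 30, "backyard": 70},
    "State C": {"curbside": 50, "backyard": 50},
}
winners = dominant_term(placeholder_hits)
print(winners)
```

Each state then gets one of two fill colors on the map according to its winning term.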
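For readers curious how a choropleth turns a number into a shade, here is a small sketch of linear color interpolation. The two endpoint colors (a pale yellow and a dark red) and the function name are assumptions chosen for illustration; they are not the palette used in the map above, and plotting libraries normally handle this step for you.

```python
def value_to_hex(value, lo, hi, start=(255, 255, 204), end=(189, 0, 38)):
    """Map a value in [lo, hi] to a hex color between two RGB endpoints.

    Values outside the range are clamped, so anything above `hi` gets the
    darkest shade and anything below `lo` gets the lightest.
    """
    if hi == lo:
        position = 0.0  # degenerate range: use the lightest shade
    else:
        position = (value - lo) / (hi - lo)
    position = min(1.0, max(0.0, position))
    rgb = tuple(round(s + position * (e - s)) for s, e in zip(start, end))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hits run from 0 to 100, so those are natural bounds for the gradient.
print(value_to_hex(0, 0, 100))
print(value_to_hex(100, 0, 100))
```

A state with a hits value near 100 ends up close to the red end of the gradient, while a state near 0 stays pale.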
An interesting thought occurred to me while writing this post, and it may be part of the explanation for the Mask map: is there a relationship between partisanship and searching for the term Mask? Check the final 2020 elections map. There seems to be a noticeable similarity between several of the 'blue' states (especially on the West Coast and in the Mountain region) and the states in my map where search for Mask spiked during the lockdown months. I would not claim that the two are correlated without testing it, but it is an interesting possibility to contemplate with more data.
To recap this project, Google Trends is a useful tool for collecting relevant data across time and space. It provides results that offer many visualization options and can be analyzed for certain trends in society. For analysis, I highly recommend using the gtrendsR package, which makes working with this type of data efficient and easy to visualize.
If you are interested in all my analyses and code (including additional city-level analysis), check my GitHub page.
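If someone did want to test that hunch with more data, one simple starting point would be a Pearson correlation between each state's hits value for Mask and a 0/1 indicator of how the state voted. The sketch below only shows the calculation. The function name is my own, the inputs in the example are toy numbers rather than real results, and a correlation across states would not by itself say anything about cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / (spread_x * spread_y)


# Toy inputs to show the call; replace with per-state hits and a 0/1 indicator.
toy_hits = [10, 20, 30, 40]
toy_indicator = [0, 0, 1, 1]
print(pearson(toy_hits, toy_indicator))
```

With a binary indicator, this is the point-biserial correlation, which is the same formula under a different name.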