Python Google Trends API


EDIT: FYI, I have retired the repo at Bitbucket. No need to panic though: dreyco676 has a version for Python 3, and it was working well as of 21/10/2014. Please see https://github.com/dreyco676/pytrends.

Hi once again. This is a slight detour, but I thought I should share it. I found the original code online and, with a few tweaks, managed to get it to do what I wanted, so here it is for anyone interested in this sort of thing. In a world where large datasets are becoming ever more available to the average Joe, Google is doing its bit by letting you see how often search terms have been searched for over a given time period, essentially allowing one to look back at what people have been thinking about. You can try this for yourself at the http://www.google.co.uk/trends homepage. As an example, let’s say I was curious to see how often people search for the term “fruit”; I would get the following display.

[Image: Google Trends chart for the search term “fruit”]

(ARGH, since this blog is free I cannot embed the link!!) There are plenty of ways one can then analyse this kind of data: as a sentiment indicator for things such as the stock market, to predict movie hits from search results, to see when most babies are born, or to study other seasonal traffic patterns, just to give some examples.

Anyway, the code is a simple Python script and follows a simple example, so check it out. It does require you to log in to your Google account so that it can cache the cookies, so if you don’t have one you can always get each “.csv” file you need directly from the Google Trends website linked above. EDIT 20/01/2014: You will not be able to log in if you have Google’s additional security running. This is because you get redirected to a session that waits for a password sent to your mobile; the script will never know what that password is and is not written to accept it as input. Don’t turn your security off to use this script, that’s just stupid. Instead just try/make a different Gmail account, sorted!!

# Download the git repo.
git clone https://github.com/dreyco676/pytrends.git
cd pytrends
./example.py

EDIT: As mentioned above, please use the pytrends version on GitHub. I will not be supporting mine. There is an example called example.py; run this and you should download a search for “pizza”!

This simple example will go and grab the trend data for each term in the Python list and store each in a .csv file. You can also check out the repository directly at https://bitbucket.org/mattreid9956/google-trend-api. The formatting is such that you are returned the end-of-week date for the whole week and the trend value over that period; this is supposed to make life easier should one run an analysis later. One could easily modify this script to get the desired formatting. Note that the period over which you search will change the granularity of the time window. For instance, searches over 3 months will return daily results, whereas searches over a year will return the accumulated results for each week. This is a little annoying and I don’t see why Google won’t allow daily results by default, maybe time to ask them! Watch this space…
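The end-of-week formatting described above could be sketched roughly as follows. This is a minimal illustration, not the code from download.py: it assumes a weekly row looks like `2012-11-18 - 2012-11-24,19` (a date range, a comma, then the value), and the name `parse_weekly_row` and its `use_week_end` option are hypothetical.

```python
from datetime import datetime


def parse_weekly_row(row, use_week_end=True):
    """Turn 'START - END,VALUE' into a (date, int) pair.

    Assumption: dates are ISO formatted (YYYY-MM-DD) and the two dates
    are separated by a hyphen or an en dash with spaces around it.
    """
    # Split off the value at the last comma.
    period, value = row.rsplit(",", 1)
    # Blog software often turns the separator into an en dash; normalise it.
    period = period.replace("\u2013", "-")
    start_text, end_text = period.split(" - ")
    chosen = end_text if use_week_end else start_text
    return datetime.strptime(chosen.strip(), "%Y-%m-%d").date(), int(value)
```

Passing `use_week_end=False` keeps the leading date instead, which may be more convenient if later analysis expects week-start dates.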

32 comments on “Python Google Trends API”

  1. vincent says:

    Hello, I am new to Python. I used this code but ran into a problem.

    File "download.py", line 97, in <module>
    if getGoogleTrendData( search_queries = list_of_queries, date="all", geo="US", scale="1" ) :
    File "download.py", line 85, in getGoogleTrendData
    getGTData(search_query = search_term, date = date, geo = geo, scale = scale )
    File "download.py", line 26, in getGTData
    connector = pyGTrends( google_username, google_password )
    File "/home/tdog/vincent/google-trend-api/pyGTrends.py", line 42, in __init__
    self._connect()
    File "/home/tdog/vincent/google-trend-api/pyGTrends.py", line 59, in _connect
    raise Exception("Cannot parse GALX out of login page")

    Can you help me fix these problems? Thanks very much.

    • mattreid9956 says:

      Hi Vincent,

      OK, I can reproduce the exact same issue at my end! The problem is that Google changed the formatting of the HTML string, so that the “type” field is now specified before the “name” field in the string the script searches for. Anyway, I have amended this and submitted it to the git repository, so please update again now. Have a good new year and let me know of any problems.

  2. Franzi says:

    Hi, I’m an absolute newbie on python. Nevertheless I used the code because I need a lot of Google-Trends-data and I got the following output:

    Google username: Username123456
    Google password:
    ERROR:root:Could not find requested section………..] 0/32
    Traceback (most recent call last):
    File "download.py", line 114, in <module>
    if getGoogleTrendData( search_queries = list_of_queries, date="all", geo="US", scale="1" ) :
    File "download.py", line 99, in getGoogleTrendData
    getGTData(search_query = search_term, date = date, geo = geo, scale = scale )
    File "download.py", line 53, in getGTData
    data = connector.csv( section='Main' ).split('\n')
    File "/Users/xxx/pyGTrends.py", line 128, in csv
    raise Exception("Could not find requested section")
    Exception: Could not find requested section

    So, what’s the problem? I don’t get it, because I’m really, really new in this… Could you help?

    Thanks!

    (By the way: I saved the files (pyGTrends.py and download.py), typed “python download.py” in my terminal on Mac (like it always works, for example, with this “Hello World” thing) and then entered my Google username and password. Is there anything I should have done that I didn’t do?)

    • mattreid9956 says:

      Hey Franzi,

      I will look into this tomorrow; right now I have no laptop nearby. I am very confused by the error you report, and it immediately suggests I have introduced an error on my part, so don’t worry about being a Python newbie. I think no one ever stops learning! Cheers, I’ll write back soon.

      • mattreid9956 says:

        Hi Franzi,

        I have a quick question: do you have all of the security running on your Google account, as in the authorisation steps for devices etc.? I realise now that I had turned mine on too somewhat recently, and I believe this could be the issue, since you cannot log in from the script. I tried running the script with a friend’s login details and it all worked perfectly, as it should. I would try this yourself, either with another account (it’s pretty simple to make a separate account just for this very thing) or by asking a friend. As far as I can tell it is working OK and has been downloading for me when using a different account. I will make a note of this on the blog though!

        Cheers, let me know how it goes

        Matt

    • mattreid9956 says:

      Hi Franzi, so I have taken a look and the people at Google have changed the format again. I should become employed full time to keep this script running, this is the second time in a few weeks :)! Anyway, I will have to take a look at the weekend as I have a fair bit to do right now, but rest assured you didn’t do anything wrong and in fact it should be as simple as typing “python download.py”! As far as I can tell it grabs some HTML and not the actual output, so I will need to investigate this further.

      Sorry, and I will post when it’s fixed. If anyone else has any ideas, here is the place to post 🙂

  3. Conroy says:

    Before I go working on this, has it been fixed? If so, is the repository link the same?

    • mattreid9956 says:

      Hi Conroy,

      Please go ahead, the link above works fine for me. The issue was that if you have Google authorisation activated on your account, every time you log in from a new device it will text your phone with a unique password to verify that device. Very clever, and I would advise you to turn this on in future. However, it makes using this script impossible, as the login bounces back requesting authorisation, which the script is not set up to deal with. You must use an account without this additional security activated. I have a dummy account that I use for this alone.

      Let me know of any problems and good luck
      Matt

  4. JPG says:

    Hey, thanks for putting this together.
    I managed to run download.py at the first try; however, it crashes after a few .csv files are downloaded. The error message is the same as the one described by Franzi on Jan 18.
    It behaves a bit weirdly, because sometimes it can download 3 files, other times just one, etc., but the error is always the same. If you have a clue about what is happening I could help with the fix/coding, please let me know.

    • JPG says:

      Maybe it is an uncaught exception, but what is puzzling me is that it fails for different search terms, not the same one every time.

      JP

    • JPG says:

      I added a try/except block around the getGTData call within getGoogleTrendData, and now the script runs without crashing. It failed intermittently on 16 of the 32 files, which suggests Google is not blocking the script. I’m thinking of something like network instability or a fast disconnection/connection cycle during the download of the data. This is indeed getting fun!

      • mattreid9956 says:

        Hey JPG. Hmmmm… how strange indeed. So would you say it fails on a completely random basis in your experience? Firstly, you did the right thing by catching exceptions. I wrote download.py as a quick wrapper around the main script pyGTrends.py, which does all the data collection, and I certainly didn’t build in any error protection to cope with your problem. I wonder what happens if you extend the sleep(2) within getGoogleTrendData to something larger? That may indicate connection/disconnect issues; try setting it to 10. Then I would perhaps check online the entries that you’re trying to download. In fact, if you send me the list of query terms you’re attempting to download (and assuming you have not made any drastic changes to the script) I will see if I get the same issue.

  5. JPG says:

    I’ve dug a bit deeper on this. The problem arises from the csv method of the class, well not really. Looking at the raw data I see that sometimes google does not send any data but an error message saying “quota limit reached, try later” (I’m translating from Spanish). When this is the case, the method csv fails to find the “Main” section for obvious reasons.
    I will try to find a fix for this, any advice is always welcome.

    JP

    • mattreid9956 says:

      Argh, I see. So did increasing the sleep time help? Can you now actually download the files that you want?
      Clearly I can add an error catch in the csv part to avoid it crashing, and maybe automatically increase the sleep time if this occurs even once, to avoid it happening the next time. This did once happen to me a while ago, but it seems odd that it doesn’t happen all the time; for instance, I can download 30 files no problem today with just a 1 sec sleep time. Please, if you come up with any better ideas, let me know. I will try to take a look over the weekend.
      Matt
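The idea of catching the error and lengthening the sleep after each failure could be sketched roughly as below. This is only an illustration: `fetch_with_backoff` and its parameters are hypothetical names, `fetch` stands in for whatever function performs one download, and nothing here has been checked against Google’s quota behaviour.

```python
import time


def fetch_with_backoff(fetch, term, first_delay=2.0, max_attempts=4, sleep=time.sleep):
    """Call fetch(term), waiting longer after each failure.

    Assumption: a failed download raises an exception, as the
    "Could not find requested section" error in this thread does.
    """
    delay = first_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(term)
        except Exception:
            # Give up and re-raise once the attempts are used up.
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= 2  # double the wait before the next try
```

The `sleep` argument is injectable only so the behaviour can be tried out without actually waiting; in normal use it would be left at its default.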

      • JPG says:

        Matt, I varied the sleep time up to 30 sec and the error persists, but always at random. I am using download.py almost as it is, with the search terms you hard-coded.
        I’ve read that this may be related to cookies, but it’s still pretty weird given the random nature of the error.
        When I run the script I’m always able to collect 40-50% of the 32 .csv files I’m supposed to get.
        Catching the exception will not solve the problem in this particular case.
        As soon as I have news I will post it.

        JP

  6. JPG says:

    Hi,

    I have modified download.py to log in just once and then look up all the different search terms. This wiped out all the problems, well, most of them.
    Now I’m dealing with Google answering in Spanish, meaning that I’m not able to find the section “Main”; this should be fairly easy to fix.
    It seems everything works OK now.

    JP
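JP’s modified script is not shown in this thread, so the following is only a sketch of the general shape of “log in once, then fetch every term”. The names `download_all`, `connect` and `fetch_csv` are hypothetical; `connect` stands in for the login step and `fetch_csv` for a single download using the shared connection.

```python
def download_all(connect, terms, fetch_csv):
    """Log in once and reuse the same connection for every search term.

    Returns (results, failures): results maps term -> downloaded data,
    failures maps term -> error message for the terms that raised.
    """
    connector = connect()  # single login, reused for every term below
    results, failures = {}, {}
    for term in terms:
        try:
            results[term] = fetch_csv(connector, term)
        except Exception as exc:
            # Record the failure and carry on with the remaining terms.
            failures[term] = str(exc)
    return results, failures
```

Collecting failures separately, rather than stopping at the first one, makes it easy to retry only the terms that did not download.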

    • mattreid9956 says:

      Hi JP,

      Good news! So what do you want to do about the fixes? I am happy to add you to the git repo (I’ll send the invite to this email) and you can commit the changes or you can post it here. I leave it up to you.
      Matt

  7. JPG says:

    I will commit the changes as soon as I clean the script.

  8. Andrew Kogler says:

    Hey Matt,

    Thanks a bunch for this library! It’s been a bunch of help. However, I was hoping to gather some data with increased granularity and was hoping to reduce the time window. I looked through the source code for a little while and became a little stumped as to how to go about making that change. Any pointers or tips in regards to this?

    Andrew Kogler

    • mattreid9956 says:

      Hi Andrew. Unfortunately they only allow you to download daily data, the highest granularity, for a period of 90 days, or at least that’s what it used to be. By default anything over this is returned in the weekly time series format. I have read about a few ways people think they can combine these 90-day results by re-averaging:

      http://erikjohansson.blogspot.co.uk/2013/04/how-to-get-daily-google-trends-data-for.html?m=1

      But since we don’t know Google’s exact method of weighting and normalisation, I don’t know whether I believe this, and I have yet to prove it to myself rigorously. Sorry I can’t be of more help; I found this a little disappointing myself. Maybe Google sells the information on and that is why?…

      • Andrew Kogler says:

        I understand. Can’t complain as I didn’t even generate this code myself. Now, how do I need to pass the date parameters to the function? I was playing around with that and was very confused as to how you were generating/parsing them.

      • Erik says:

        Hi Mattreid, thanks for linking to my blog. A couple of comments from my own experience.

        I’ve tried two methods for viewing the data together: either converting the trends data to %-change and adding the time series together, or starting from the weekly data and adding the daily data in the gaps between the weeks. The results will be markedly different depending on which method you use.

        To give you an idea of the difference, I’ve created an example (http://erikjohansson.blogspot.fi/2014/11/two-methods-for-combinging-dailt-google.html). There you can also find the R code for combining the 90-day series into longer ones.

        Hope that’s of help!
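One way to read the %-change approach is to rescale each 90-day window so that it continues from where the previous one ended. The sketch below is an illustration of that idea only: it is not Erik’s R code, it assumes consecutive windows share exactly one overlapping day, and since Google’s normalisation is unknown the result should be treated with caution. The name `chain_windows` is hypothetical.

```python
def chain_windows(windows):
    """Join daily windows into one series by matching them at the overlap.

    windows: list of lists of (day, value) pairs, where each window after
    the first starts on the last day of the previous window.
    """
    chained = list(windows[0])
    for window in windows[1:]:
        last_day, last_value = chained[-1]
        first_day, first_value = window[0]
        if first_day != last_day:
            raise ValueError("each window must start on the previous window's last day")
        if first_value == 0:
            raise ValueError("cannot rescale from a zero value at the overlap")
        # Scaling by this factor preserves every day-to-day percentage
        # change inside the window while lining it up with the series so far.
        factor = last_value / first_value
        chained.extend((day, value * factor) for day, value in window[1:])
    return chained
```

Because each window is scaled relative to the previous one, any noise at an overlap day propagates into everything after it, which may be one reason the two methods give markedly different results.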

  9. mattreid9956 says:

    The download.py script really is only intended as an example to extract the weekly time series data, hence the formatting. For instance, say one line in the download is

    ‘2012-11-18 – 2012-11-24,19’

    The formatting picks out the leading or trailing date; in my example it would pick “2012-11-24,19” as the date and search value. You can modify this as you like. For different dates you should be able to replace the date="all" string with, say,

    date="2004" ====> this would return the whole of 2004 as a weekly time series (this is the earliest Google Trends will let you search)
    or
    date="2004-1" ====> this would return January 2004 as a daily time series.

    I am not 100% sure how to search a range say 2004-01 -> 2004-03 and my downloads just exceeded my daily limit so can’t check. Either way, you can download each month you want one by one (this also saves you having to make sure you only have 90 days between 3 consecutive months.). I hope that helps.
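For downloading month by month as suggested above, the date strings can be generated rather than typed out. This is a minimal sketch, assuming the unpadded “2004-1” form is what the script expects; the function name `month_queries` and the `pad` option (for the “2004-01” form) are hypothetical and not part of download.py.

```python
def month_queries(start_year, start_month, end_year, end_month, pad=False):
    """List one date string per month, from the start month to the end month inclusive."""
    queries = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        # pad=True gives "2004-01"; the default gives "2004-1".
        queries.append(f"{year}-{month:02d}" if pad else f"{year}-{month}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return queries
```

Each string in the returned list could then be passed as the date argument for one download.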

    • Andrew Kogler says:

      Hey Matt,

      Sorry to bother again. Modified script to run a call for “2011-01” as the date, to generate daily data with my own queries but I can’t get around the following error:

      ERROR:root:Could not find requested section………..] 0/4
      Traceback (most recent call last):
      File "download.py", line 115, in <module>
      if getGoogleTrendData( search_queries = list_of_queries, date="2011-01", geo="US", scale="1" ) :
      File "download.py", line 99, in getGoogleTrendData
      getGTData(search_query = search_term, date = date, geo = geo, scale = scale )
      File "download.py", line 53, in getGTData
      data = connector.csv( section='Main' ).split('\n')
      File "/home/andrew/Dropbox/DataMiningProject/google-trend-api/pyGTrends.py", line 128, in csv
      raise Exception("Could not find requested section")
      Exception: Could not find requested section

      Played around with the code for hours now and still quite confused as to how to fix this. Do you happen to have any tips on how this could be remedied.

      Thanks,
      Andrew Kogler

  10. Sergey Saydometov says:

    Thank you for posting the code, Matt. I’m trying to download the report using some parameters but it doesn’t seem to work. For example, when I set geo="US", it downloads the data for the US, but when I try to narrow the geographical area to a state, like geo="US-TX", it simply downloads the default, which is the worldwide data.

    Would you have any advice on how I can narrow down the geographical area?

    I appreciate your help.
    Sergey

    • mattreid9956 says:

      Hi there. Off the top of my head I am not sure; it depends on the HTML header path. Can you manually go to the Google Trends website and download a csv file with the information you want? If so, I will look into this further, as the data should be available using the script as long as the HTML header path is correct; if not, I don’t think it is possible. Let me know what you find, all the best.

  11. KS says:

    Hi, I’m relatively new to python, but I’m again getting the error that Franzi and JPG were mentioning earlier. Like JPG, the first time I ran download.py, I was able to download a few .csv files before it crashed. Now, however, it crashes with the error message “Could not find requested section” before downloading any files, every time. I’m not sure what changed, but I’m running the file exactly as downloaded. Any advice?

    Thanks!

  12. K_Dilkington says:

    Hi, I wanted to know if you’ve had a chance to implement multiple search terms (at one time), for comparisons. I’m trying to modify the download.py script to add that, with little luck. Also, is there a way to alter what category you’re searching for, or is it just “all categories” by default?

  13. csinkpen says:

    Hi, I’m wondering if you’ve had a chance to work on adding multiple search terms for the same query (for comparisons). I’m trying to modify the download.py script with little luck. Also, is there any way to alter the category you’re searching for (for example, People & Society, instead of “all”). Thanks for any help you can provide.

  14. […] file “download.py”, line 97, getgoogletrenddata( search_queries = list_of_queries, date. Click here Financial python | studies finance python, Studies finance python ( dk) questions gamma . delta […]

  15. victor says:

    Hi there, I have a problem with keywords. If I add 2 keywords following the README, then after downloading the csv file I see both keywords in 1 column. What is happening?
