Searching for Data
==================

Aside from connecting to the filter stream, all other ways of collecting Twitter data are different searches of the full archive.

Specifying the Query
--------------------

Search queries are specified in an :ref:`event query file ` according to Twitter's search operator `syntax rules `_. See an example of a search query file `here `_.

Using the Full Archive Search
-----------------------------

Once the event query file is ready, the command for searching for tweets is

.. code-block:: bash

    python -m twitter.search event_name

The search can be cancelled at any time with :code:`CTRL+C`.

Counting
--------

Twitter's API allows you to request the total number of tweets that will be returned by a given query without actually retrieving the tweets themselves. This is helpful for estimating the size of a query and staying under the API's monthly quota. It is also convenient if only time series data is needed and not the full tweet data. To get the counts, simply use the :code:`get_counts` flag:

.. code-block:: bash

    python -m twitter.search event_name --get_counts

This accesses Twitter's `count endpoint `_ and returns time series count data in JSON files in the :code:`output` directory. If you have more than one query, then there will be one file per query, numbered in the same order that they appear in the input query file.

You can set the :code:`granularity` of the time series data to be :code:`minute`, :code:`hour`, or :code:`day`:

.. code-block:: bash

    python -m twitter.search event_name --get_counts -granularity day

Search Parameters
-----------------

The search has several optional parameters. These are specified as flags on the standard search command, for example:

.. code-block:: bash

    python -m twitter.search event_name --get_convos --backfill -n_days_back 30

See the documentation on :ref:`updates and backfills `, and :ref:`conversation, quote, and timeline searches ` for additional example usage.

+------------------------------------+--------------------------------------------------------------------------------+
| Parameter                          | Description                                                                    |
+====================================+================================================================================+
| config_f                           | The configuration file to use if not using the default                         |
+------------------------------------+--------------------------------------------------------------------------------+
| max_results_per_page               | The maximum number of tweets to return per page of search. Defaults to the     |
|                                    | maximum of 500. The minimum is 10                                              |
+------------------------------------+--------------------------------------------------------------------------------+
| get_counts                         | Whether to count the number of tweets that will be returned by the queries,    |
|                                    | in place of actually searching for the tweets                                  |
+------------------------------------+--------------------------------------------------------------------------------+
| granularity                        | If counting, the granularity of the time series data, either "minute",         |
|                                    | "hour", or "day". Defaults to "hour"                                           |
+------------------------------------+--------------------------------------------------------------------------------+
| get_convos                         | Whether to get the conversations for the event using the conversation IDs      |
|                                    | from a search/stream, or for a list of input conversation IDs. If True,        |
|                                    | :code:`start_time` defaults to :code:`first_time` and :code:`end_time`         |
|                                    | defaults to :code:`last_time`                                                  |
+------------------------------------+--------------------------------------------------------------------------------+
| get_quotes                         | Whether to get tweets that quote those from a search. Note, this is not        |
|                                    | needed for tweets that were collected from a stream. If True,                  |
|                                    | :code:`start_time` defaults to :code:`first_time` and :code:`end_time`         |
|                                    | defaults to :code:`last_time`                                                  |
+------------------------------------+--------------------------------------------------------------------------------+
| get_quotes_of_quotes               | If a quote search has been run before, whether to get quote tweets of those    |
|                                    | quote tweets. This command can be run repeatedly, but becomes increasingly     |
|                                    | less efficient because *all* quote tweets, not just those from the prior       |
|                                    | run of :code:`get_quotes_of_quotes`, will be searched                          |
+------------------------------------+--------------------------------------------------------------------------------+
| get_timelines                      | Whether to get user timelines for a search/stream, or for a list of input      |
|                                    | user handles or IDs. If True, :code:`start_time` defaults to                   |
|                                    | :code:`first_time` with :code:`n_days_back=14`, and :code:`end_time`           |
|                                    | defaults to :code:`last_time`. Note, :code:`n_days_back` will still be         |
|                                    | overridden to 14 if :code:`start_time` is not manually set as a parameter      |
+------------------------------------+--------------------------------------------------------------------------------+
| full_timelines                     | Whether to retrieve the full timelines of users. Defaults to False             |
+------------------------------------+--------------------------------------------------------------------------------+
| user_ids_f                         | Filename of a newline-delimited text file of user IDs or handles for           |
|                                    | collecting user timelines                                                      |
+------------------------------------+--------------------------------------------------------------------------------+
| convo_ids_f                        | Filename of a newline-delimited text file of conversation IDs for collecting   |
|                                    | reply threads                                                                  |
+------------------------------------+--------------------------------------------------------------------------------+
| update                             | Whether to update the dataset with tweets that have occurred since the last    |
|                                    | tweet time in the search/stream. If True, :code:`start_time` defaults to       |
|                                    | :code:`last_time` and :code:`end_time` defaults to :code:`now`. If updating    |
|                                    | conversations or timelines, the :code:`start_time` is set dynamically based    |
|                                    | on the latest tweet from each conversation or user. The parameters             |
|                                    | :code:`end_time` and :code:`n_days_after` can be used together to specify      |
|                                    | an end time other than the present day. Cannot be done at the same time as     |
|                                    | a backfill, and quote searches cannot be updated                               |
+------------------------------------+--------------------------------------------------------------------------------+
| backfill                           | Whether to update the dataset with tweets that occurred before the earliest    |
|                                    | tweet time in the search/stream. If True, :code:`start_time` defaults to       |
|                                    | the beginning of the day (UTC) of the first tweet available for the event,     |
|                                    | and :code:`end_time` defaults to :code:`first_time`. If backfilling            |
|                                    | conversations or timelines, the :code:`end_time` is set dynamically based      |
|                                    | on the latest tweet from each conversation or user. The parameters             |
|                                    | :code:`start_time` and :code:`n_days_back` can be used together to specify     |
|                                    | a start time other than the first day of the event. Cannot be done at the      |
|                                    | same time as an update, and quote searches cannot be backfilled                |
+------------------------------------+--------------------------------------------------------------------------------+
| start_time                         | The start time of the search. Overrides any start time set in the event        |
|                                    | query file. Use :code:`first_time` to use the earliest tweet time recorded     |
|                                    | for an event, and :code:`last_time` to use the latest                          |
+------------------------------------+--------------------------------------------------------------------------------+
| end_time                           | The end time of the search. Overrides any end time set in the event query      |
|                                    | file. Use :code:`last_time` to use the latest tweet time recorded for an       |
|                                    | event, and :code:`first_time` to use the earliest. Use :code:`now` to use      |
|                                    | the current date                                                               |
+------------------------------------+--------------------------------------------------------------------------------+
| n_days_back                        | How many days back to start the search relative to :code:`start_time`.         |
|                                    | Note, :code:`start_time` must be passed manually as a parameter, not set       |
|                                    | in the event query file. Defaults to 0                                         |
+------------------------------------+--------------------------------------------------------------------------------+
| n_days_after                       | How many days after to end the search relative to :code:`end_time`. Note,      |
|                                    | :code:`end_time` must be passed manually as a parameter, not set in the        |
|                                    | event query file. Defaults to 0                                                |
+------------------------------------+--------------------------------------------------------------------------------+
| append / overwrite                 | Whether to append JSON tweets to an existing file for the event. By default,   |
|                                    | tweets are appended                                                            |
+------------------------------------+--------------------------------------------------------------------------------+
| write_count_files / no_count_files | Whether to write JSON time series count data when counting. Defaults to        |
|                                    | True if running a standard counting search. Defaults to False if running a     |
|                                    | timeline, conversation, or quote search (because they will produce many        |
|                                    | count files, one per entity being searched)                                    |
+------------------------------------+--------------------------------------------------------------------------------+
| verbose / quiet                    | Whether to print information/updates to the console while running the          |
|                                    | search. By default, information is printed                                     |
+------------------------------------+--------------------------------------------------------------------------------+
| update_interval                    | How often to print updates of the number of tweets collected, in minutes       |
+------------------------------------+--------------------------------------------------------------------------------+
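
The way :code:`n_days_back` and :code:`n_days_after` shift the search window, and the way a backfill start snaps to the beginning of a UTC day, can be sketched in a few lines of Python. This is only an illustration of the descriptions in the table, not the package's own code: the helper names :code:`resolve_window` and :code:`start_of_utc_day` and the example dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def resolve_window(start_time, end_time, n_days_back=0, n_days_after=0):
    """Apply the day offsets to a (start, end) pair.

    Hypothetical helper that mirrors the table's descriptions; the real
    search module may resolve its window differently.
    """
    start = start_time - timedelta(days=n_days_back)
    end = end_time + timedelta(days=n_days_after)
    if start >= end:
        raise ValueError("the window must start before it ends")
    return start, end


def start_of_utc_day(moment):
    """Truncate a timezone-aware datetime to 00:00 UTC of the same day."""
    moment = moment.astimezone(timezone.utc)
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


# Made-up stand-ins for the first and last tweet times recorded for an event.
first_time = datetime(2022, 3, 10, 15, 30, tzinfo=timezone.utc)
last_time = datetime(2022, 3, 20, 8, 0, tzinfo=timezone.utc)

# Timeline-style window: start 14 days before the first recorded tweet.
window = resolve_window(first_time, last_time, n_days_back=14)

# Backfill-style default start: the beginning of the first tweet's UTC day.
backfill_start = start_of_utc_day(first_time)

print(window[0].isoformat(), window[1].isoformat())
print(backfill_start.isoformat())
```

Working through the window by hand like this before launching a large search is a cheap way to check that the requested range is the intended one.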