Searching for Data

Aside from connecting to the filter stream, every way of collecting Twitter data is a search of the full archive.

Specifying the Query

Search queries are specified in an event query file according to Twitter’s search operator syntax rules.

See an example of a search query file here.
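As a rough illustration of the operator syntax, the sketch below assembles a query string in Python. The helper `build_query` is hypothetical and not part of focalevents, and the 1024-character limit is an assumption that depends on your API access level. The operators used (`OR`, quoted phrases, `lang:`, `-is:retweet`) come from Twitter's search syntax. The sketch does not show the layout of the event query file itself.

```python
# Hypothetical helper, not part of focalevents: it only assembles a query string.
MAX_QUERY_LENGTH = 1024  # assumed limit; check the limit for your API access level


def build_query(terms, lang=None, exclude_retweets=False):
    """Join terms with OR and append optional operators."""
    # Quote multi-word terms so they are matched as exact phrases
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    query = "(" + " OR ".join(quoted) + ")"
    if lang:
        query += f" lang:{lang}"
    if exclude_retweets:
        query += " -is:retweet"
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(
            f"Query is {len(query)} characters; limit is {MAX_QUERY_LENGTH}"
        )
    return query


query = build_query(
    ["#earthquake", "aftershock warning"], lang="en", exclude_retweets=True
)
print(query)
```

The resulting string can then be placed in the event query file in whatever form that file expects.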

Using the Full Archive Search

Once the event query file is ready, search for tweets with:

python -m twitter.search event_name

The search can be cancelled at any time with CTRL+C.

Counting

Twitter’s API allows you to request the total number of tweets that will be returned by a given query without actually retrieving the tweets themselves. This is helpful for estimating the size of a query and staying under the API’s monthly quota. It is also convenient if only time series data is needed and not the full tweet data.

To get the counts, pass the `--get_counts` flag:

python -m twitter.search event_name --get_counts

This accesses Twitter’s count endpoint and returns time series count data in JSON files in the output directory. If you have more than one query, then there will be one file per query, numbered in the same order that they appear in the input query file. You can set the granularity of the time series data to be minute, hour, or day:

python -m twitter.search event_name --get_counts -granularity day
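The counts can be turned into a quick size estimate before running the full search. The sketch below is an assumption-laden example: it assumes each time bucket carries `start`, `end`, and `tweet_count` fields (as in Twitter's count responses), and the actual layout of the JSON files written by the tool may differ. The function `summarize_counts` and the quota of 10,000,000 are hypothetical placeholders.

```python
import json
import math

# Sample data shaped like a count response; the real output files may differ.
sample = json.loads("""
{"data": [
  {"start": "2021-01-01T00:00:00.000Z", "end": "2021-01-02T00:00:00.000Z", "tweet_count": 1200},
  {"start": "2021-01-02T00:00:00.000Z", "end": "2021-01-03T00:00:00.000Z", "tweet_count": 800}
]}
""")


def summarize_counts(buckets, monthly_quota, max_results_per_page=500):
    """Total the buckets and estimate pages needed and quota used."""
    total = sum(bucket["tweet_count"] for bucket in buckets)
    return {
        "total_tweets": total,
        # Each page of search results holds at most max_results_per_page tweets
        "pages_needed": math.ceil(total / max_results_per_page),
        "quota_fraction": total / monthly_quota,
    }


# monthly_quota is a placeholder; use the quota for your own API access
summary = summarize_counts(sample["data"], monthly_quota=10_000_000)
print(summary)
```

If the estimated fraction of the quota is too high, the query can be narrowed before any tweets are retrieved.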

Search Parameters

The search has several optional parameters. These are specified as flags on the standard search command, for example:

python -m twitter.search event_name --get_convos --backfill -n_days_back 30

See the documentation on updates and backfills, and conversation, quote, and timeline searches for additional example usage.
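When the same search is run for several events, it can help to assemble the command in a script. The sketch below builds the argument list for the example command above. `build_search_command` is a hypothetical helper, and the dash convention (two dashes for on/off flags, one dash for parameters that take a value) is inferred only from the examples on this page.

```python
import sys


def build_search_command(event_name, flags=(), options=None):
    """Return the argv list for a search; parameter names are copied from this page."""
    cmd = [sys.executable, "-m", "twitter.search", event_name]
    # On/off flags are written with two dashes in the examples, e.g. --get_convos
    cmd += [f"--{flag}" for flag in flags]
    # Valued parameters are written with one dash, e.g. -n_days_back 30
    for name, value in (options or {}).items():
        cmd += [f"-{name}", str(value)]
    return cmd


cmd = build_search_command(
    "event_name",
    flags=["get_convos", "backfill"],
    options={"n_days_back": 30},
)
# Print everything after the interpreter path
print(" ".join(cmd[1:]))
# To launch it: subprocess.run(cmd, check=True)  (requires focalevents to be set up)
```

Passing the command as a list avoids shell quoting problems when event names contain unusual characters.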

Parameter	Description
config_f	The configuration file to use if not using the default
max_results_per_page	The maximum number of tweets to return per page of search results. Defaults to 500, the maximum allowed; the minimum is 10
get_counts	Whether to count the number of tweets that will be returned by the queries, in place of actually searching for the tweets
granularity	If counting, the granularity of the time series data, either “minute”, “hour”, or “day”. Defaults to “hour”
get_convos	Whether to get the conversations for the event using the conversation IDs from a search/stream, or for a list of input conversation IDs. If True, `start_time` defaults to `first_time` and `end_time` defaults to `last_time`
get_quotes	Whether to get tweets that quote those from a search. Note, this is not needed for tweets that were collected from a stream. If True, `start_time` defaults to `first_time` and `end_time` defaults to `last_time`
get_quotes_of_quotes	If a quote search has been run before, whether to get quote tweets of those quote tweets. This command can be run repeatedly, but becomes less efficient each time because all quote tweets, not just those from the prior run of `get_quotes_of_quotes`, will be searched
get_timelines	Whether to get user timelines for a search/stream, or for a list of input user handles or IDs. If True, `start_time` defaults to `first_time` with `n_days_back=14`, and `end_time` defaults to `last_time`. Note, `n_days_back` will still be overridden to 14 if `start_time` is not manually set as a parameter
full_timelines	Whether to retrieve the full timelines of users. Defaults to False
user_ids_f	Filename of a newline delimited text file of user IDs or handles for collecting user timelines
convo_ids_f	Filename of a newline delimited text file of conversation IDs for collecting reply threads
update	Whether to update the dataset with tweets posted since the last tweet time in the search/stream. If True, `start_time` defaults to `last_time` and `end_time` defaults to `now`. If updating conversations or timelines, the `start_time` is set dynamically based on the latest tweet from each conversation or user. The parameters `end_time` and `n_days_after` can be used together to specify an end time other than the present. Cannot be done at the same time as a backfill, and quote searches cannot be updated
backfill	Whether to update the dataset with tweets that occurred before the earliest tweet time in the search/stream. If True, `start_time` defaults to the beginning of the day (UTC) of the first tweet available for the event, and `end_time` defaults to `first_time`. If backfilling conversations or timelines, the `end_time` is set dynamically based on the earliest tweet from each conversation or user. The parameters `start_time` and `n_days_back` can be used together to specify a start time other than the first day of the event. Cannot be done at the same time as an update, and quote searches cannot be backfilled
start_time	The start time of the search. Overrides any start time set in the event query file. Use `first_time` to use the earliest tweet time recorded for an event, and `last_time` to use the latest
end_time	The end time of the search. Overrides any end time set in the event query file. Use `last_time` to use the latest tweet time recorded for an event, and `first_time` to use the earliest. Use `now` to use the current date
n_days_back	How many days back to start the search relative to `start_time`. Note, it has to be `start_time` passed manually as a parameter, not in the event query file. Defaults to 0
n_days_after	How many days after to end the search relative to `end_time`. Note, it has to be `end_time` passed manually as a parameter, not in the event query file. Defaults to 0
append / overwrite	Whether to append JSON tweets to an existing file for the event or overwrite it. By default, tweets are appended
write_count_files / no_count_files	Whether to write JSON time series count data when counting. Defaults to True if running a standard counting search. Defaults to False if running a timeline, conversation, or quote search (because they will produce many count files, one per entity being searched)
verbose / quiet	Whether to print information/updates to the console while running the search. By default, information is printed
update_interval	How often to print updates of the number of tweets collected, in minutes
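The interaction between `start_time`, `end_time`, `n_days_back`, and `n_days_after` described in the table can be sketched as simple date arithmetic. This is an illustration of the described semantics only, not the tool's own code: `resolve_window` is a hypothetical function, and the `first_time` and `last_time` values are made up.

```python
from datetime import datetime, timedelta, timezone


def resolve_window(start_time, end_time, n_days_back=0, n_days_after=0):
    """Shift the start earlier and the end later by whole days."""
    start = start_time - timedelta(days=n_days_back)
    end = end_time + timedelta(days=n_days_after)
    if start >= end:
        raise ValueError("start must be before end")
    return start, end


# Made-up earliest and latest tweet times recorded for an event
first_time = datetime(2021, 3, 10, 15, 30, tzinfo=timezone.utc)
last_time = datetime(2021, 3, 20, 8, 0, tzinfo=timezone.utc)

# Start the search 30 days before the earliest recorded tweet
start, end = resolve_window(first_time, last_time, n_days_back=30)
print(start.isoformat(), end.isoformat())
```

Both offsets default to 0, matching the defaults listed for `n_days_back` and `n_days_after`, so the window is unchanged unless an offset is given.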
