Data Across Queries

The focalevents tools are designed with multiple related queries in mind, whether they come from a stream, a search, or an auxiliary search for conversations, quotes, or timelines. Five boolean fields in the tweets table identify the query source of any given tweet:

  • from_stream

  • from_search

  • from_quote_search

  • from_convo_search

  • from_timeline_search

All tweets that were returned by a particular query will be marked as True in the corresponding from_* field. Multiple columns can be True if a tweet was returned by more than one type of query. This allows you to distinguish the query source of different tweets, while still organizing them together through their event names.
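As a minimal sketch of how these flags can be read together, the example below builds a stand-in tweets table in an in-memory SQLite database and lists the query sources of each tweet in an event. Only the five from_* field names come from the description above; the id and event column names, the sample rows, and the sources_by_tweet helper are assumptions made for illustration, and the real table may use a different database and schema.

```python
import sqlite3

# The five from_* names come from the text above. The "id" and "event"
# columns are hypothetical names used only for this sketch.
SOURCE_COLUMNS = [
    "from_stream",
    "from_search",
    "from_quote_search",
    "from_convo_search",
    "from_timeline_search",
]

conn = sqlite3.connect(":memory:")
columns_sql = ", ".join(f"{col} BOOLEAN NOT NULL DEFAULT 0" for col in SOURCE_COLUMNS)
conn.execute(f"CREATE TABLE tweets (id TEXT PRIMARY KEY, event TEXT, {columns_sql})")

# Made-up rows: tweet 2 was returned by both the stream and a search.
conn.executemany(
    "INSERT INTO tweets (id, event, from_stream, from_search, from_quote_search) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("1", "demo_event", 1, 0, 0),
        ("2", "demo_event", 1, 1, 0),
        ("3", "demo_event", 0, 0, 1),
    ],
)


def sources_by_tweet(conn, event):
    """Map each tweet ID in an event to the from_* columns that are True."""
    cursor = conn.execute(
        f"SELECT id, {', '.join(SOURCE_COLUMNS)} FROM tweets WHERE event = ? ORDER BY id",
        (event,),
    )
    return {
        row[0]: [col for col, flag in zip(SOURCE_COLUMNS, row[1:]) if flag]
        for row in cursor
    }


sources = sources_by_tweet(conn, "demo_event")
print(sources)
```

Filtering on the event name keeps tweets from every query type together, while the from_* flags still show which queries returned each one.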

Referenced Tweets

The tweets table distinguishes between tweets that are returned directly in response to a query from the API and referenced tweets that are returned only because another tweet retweeted, quoted, or replied to them. Five additional boolean fields, one for each of the fields above, indicate whether a tweet was returned directly by that query rather than only as a referenced tweet:

  • directly_from_stream

  • directly_from_search

  • directly_from_quote_search

  • directly_from_convo_search

  • directly_from_timeline_search

All tweets that were returned by a particular query, referenced or not, will be marked as True in the from_* field. Any tweet that was returned directly by a query (i.e., it is not just a referenced tweet) will also be marked as True in the matching directly_from_* field. Tweets that are only referenced can then be identified by looking for rows where from_* AND NOT directly_from_* for the same query type.
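The from_* AND NOT directly_from_* condition can be sketched as a query. The example uses an in-memory SQLite table with only the search flags; the id column, the sample rows, and the referenced_only_ids helper are hypothetical and stand in for the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tweets (
        id TEXT PRIMARY KEY,               -- hypothetical column name
        from_search BOOLEAN NOT NULL,
        directly_from_search BOOLEAN NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        ("10", 1, 1),  # returned directly by the search
        ("11", 1, 0),  # returned only because tweet 10 referenced it
        ("12", 0, 0),  # not returned by the search at all
    ],
)

# Query types named in the text; checked before building the SQL string.
QUERY_TYPES = {"stream", "search", "quote_search", "convo_search", "timeline_search"}


def referenced_only_ids(conn, query_type):
    """Return IDs of tweets a query returned only as referenced tweets."""
    if query_type not in QUERY_TYPES:
        raise ValueError(f"unknown query type: {query_type}")
    sql = (
        f"SELECT id FROM tweets "
        f"WHERE from_{query_type} AND NOT directly_from_{query_type} "
        f"ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql)]


referenced_only = referenced_only_ids(conn, "search")
print(referenced_only)
```

The column name is interpolated into the SQL only after it has been checked against a fixed set of query types, since column names cannot be passed as bound parameters.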

Quote Tweet Matching in Streams and Searches

The filter stream returns tweets that match a given rule, as well as quote tweets whose quoted tweet matches the rule. This means that if the stream did not itself return a quoted tweet (e.g. because the stream started after the quoted tweet was posted), then the quoted tweet will be marked as False in the directly_from_stream field, even though it may be the tweet with the keyword match. For this reason, it is recommended to backfill the stream tweets with a search query after the stream is done, so that quoted tweets that match the rule will be marked as True in the directly_from_search field. All directly relevant tweets can then be identified by looking for those that are directly_from_stream OR directly_from_search.
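The effect of the backfill can be sketched with a small simulation, under the assumption that "directly relevant" means returned directly by either the stream or the backfill search. The in-memory SQLite table, the id column, and the sample rows are made up for illustration; the UPDATE statement only imitates what a backfill search would record and is not the tool's own code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tweets (
        id TEXT PRIMARY KEY,                          -- hypothetical column name
        from_stream BOOLEAN NOT NULL DEFAULT 0,
        directly_from_stream BOOLEAN NOT NULL DEFAULT 0,
        from_search BOOLEAN NOT NULL DEFAULT 0,
        directly_from_search BOOLEAN NOT NULL DEFAULT 0
    )
""")

# State after the stream: tweet 21 quotes tweet 20, which was posted before
# the stream started, so tweet 20 is only a referenced tweet. Tweet 22 is a
# replied-to tweet that does not match the rule.
conn.executemany(
    "INSERT INTO tweets (id, from_stream, directly_from_stream) VALUES (?, ?, ?)",
    [("20", 1, 0), ("21", 1, 1), ("22", 1, 0)],
)

DIRECT_SQL = (
    "SELECT id FROM tweets "
    "WHERE directly_from_stream OR directly_from_search ORDER BY id"
)

direct_before = [row[0] for row in conn.execute(DIRECT_SQL)]

# Imitate the backfill: the search returns tweets 20 and 21 directly.
conn.executemany(
    "UPDATE tweets SET from_search = 1, directly_from_search = 1 WHERE id = ?",
    [("20",), ("21",)],
)

direct_after = [row[0] for row in conn.execute(DIRECT_SQL)]
print(direct_before, direct_after)
```

Before the backfill only the quote tweet counts as a direct match; afterwards the quoted tweet is picked up as well, while the tweet that never matched the rule stays excluded.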