Twitter Scraping, NLP, Automation

To manage a few Twitter/X accounts I run, I built several automations that streamline the work. These automations include –

  • Scraping source websites to get content using Selenium
  • Using lightweight natural language processing to turn that content into posts
  • Using Pandas dataframes to analyze and manipulate data
  • Automated scheduling directly on Twitter

I built these automations because the Twitter API free tier does not provide this functionality.

LIGHTING ACCOUNT

Example of automated post scheduling on Twitter

One of my accounts posts the meaning of the city’s building lighting colors every evening. To get some of this data, I scrape a website using Selenium. The script first gathers all the dates and lighting meanings, then uses lightweight natural language processing to turn that data into posts. This processing includes –

  • Finding the dates and turning them into Python datetime objects
  • Rephrasing and changing the tense of lighting meanings to a standard post format
  • Finding color words in the meanings to add specific color swatch emojis
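
The date and color steps above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the emoji mapping, the function names, the "Tonight's lights" post format, and the assumption that dates arrive as text like "March 17" are all hypothetical, and the tense-rephrasing step is left out.

```python
import re
from datetime import datetime

# Hypothetical mapping of color words to swatch emojis (an assumption).
COLOR_EMOJIS = {
    "red": "🟥",
    "orange": "🟧",
    "yellow": "🟨",
    "green": "🟩",
    "blue": "🟦",
    "purple": "🟪",
    "white": "⬜",
}


def parse_date(date_text, year):
    """Turn text such as 'March 17' into a datetime for the given year."""
    return datetime.strptime(f"{date_text} {year}", "%B %d %Y")


def color_swatches(meaning):
    """Return one swatch per known color word, in order of first appearance."""
    words = re.findall(r"[a-z]+", meaning.lower())
    found = []
    for word in words:
        if word in COLOR_EMOJIS and word not in found:
            found.append(word)
    return "".join(COLOR_EMOJIS[word] for word in found)


def build_post(date_text, meaning, year):
    """Combine a scraped date and meaning into (post_date, post_text)."""
    post_date = parse_date(date_text, year)
    swatches = color_swatches(meaning)
    text = f"Tonight's lights: {meaning}"
    if swatches:
        text = f"{text} {swatches}"
    return post_date, text
```

Matching on whole words keeps a word like "colored" from triggering the red swatch.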

After creating the posts above, the script sets the post time to the time of sunset on the specific date using the Suntime library. It then reads in the Excel file of all previously created posts as a dataframe and tries to add these new posts.

For each candidate post dated in the future, the script checks whether that post is already in the Excel dataframe by comparing the post’s time against the times already there. Because sunset calculations don’t always match exactly, a post counts as a match if an existing post falls within an hour before or after it. If no post matches, the script adds the new post to the dataframe, then exports the dataframe back to Excel.
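
Here is a rough sketch of that one-hour buffer check. To keep it free of dependencies it uses plain lists of (time, text) tuples in place of the Pandas dataframe and Excel file, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# One hour on either side of an existing post counts as a match.
BUFFER = timedelta(hours=1)


def is_duplicate(candidate_time, existing_times, buffer=BUFFER):
    """True if any existing post time is within the buffer of the candidate."""
    return any(abs(candidate_time - t) <= buffer for t in existing_times)


def merge_new_posts(existing, candidates, now):
    """Add future candidates that do not match an existing post.

    existing and candidates are lists of (post_time, text) tuples.
    """
    merged = list(existing)
    for post_time, text in candidates:
        if post_time <= now:
            continue  # only future posts are considered
        existing_times = [t for t, _ in merged]
        if not is_duplicate(post_time, existing_times):
            merged.append((post_time, text))
    return sorted(merged, key=lambda post: post[0])
```

Comparing with a tolerance rather than exact equality means a sunset time that shifts by a minute or two between runs does not create a second post for the same evening.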

Using Excel rather than a database was a design choice made for ease of use. Posts need to be changed manually for various reasons after the initial script runs, and that is much easier in the Excel GUI than querying and updating a small database.

A second script manages scheduling the posts from the Excel file. It first logs in to Twitter and scrapes all the scheduled posts for the account. It then reads the posts from the Excel file and schedules on Twitter any future post that has not yet been scheduled.
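
The selection step in that script might look something like the sketch below. It covers only the comparison logic, not the Selenium login or scraping. It assumes posts are matched by scheduled time rounded down to the minute, which is my simplification here, and the function names are hypothetical.

```python
from datetime import datetime


def to_minute(ts):
    """Drop seconds and microseconds so times compare at minute granularity."""
    return ts.replace(second=0, microsecond=0)


def posts_to_schedule(excel_posts, scheduled_times, now):
    """Return future posts from the Excel file that are not yet scheduled.

    excel_posts is a list of (post_time, text) tuples; scheduled_times is
    a list of datetimes scraped from the account's scheduled posts.
    """
    already = {to_minute(t) for t in scheduled_times}
    pending = [
        (post_time, text)
        for post_time, text in excel_posts
        if post_time > now and to_minute(post_time) not in already
    ]
    return sorted(pending, key=lambda post: post[0])
```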

Another script manages followers for the account. It scrapes the account’s followers and identifies opportunities to either connect with related accounts or unfollow inactive ones. It uses a similar approach to search Twitter and related accounts for accounts that are valuable to connect with.

Languages

Python, HTML, CSS

Technology

Selenium, Pandas, NumPy, Suntime