An Introduction To Python & Machine Learning For Technical SEO

Python is used to power platforms, analyze data, and run machine learning models. Here’s how to get started with Python for technical SEO.

Since I first discussed how Python is being used in the SEO space two years ago, it has grown in popularity, and many people have begun to use and see the benefits of using it in their day-to-day roles.

It’s really exciting to see so many SEOs share their experiences, the cool scripts they’ve written, and how it’s affected their jobs.

It wouldn’t be right for me to publish this without mentioning Hamlet Batista’s influence on me and so many others. He got a kick out of seeing people learn and use Python.

I’m sure he’d be overjoyed to see so many people sharing their Python journeys and all of the amazing scripts they’ve created.

What Exactly Is Python?

Python, in a nutshell, is an interpreted, object-oriented programming language that you can also use interactively, running code one line at a time.

Python is popular because of the productivity it offers, its simple, easy-to-learn syntax, its readability, and its support for a wide variety of modules and libraries.

As evidence of this, some of the world’s largest organizations use Python to power their platforms, perform data analysis, and run machine learning models.

Companies such as Google, YouTube, Netflix, NASA, Spotify, and IBM have publicly cited Python as an important part of their growth because of its simplicity, speed, and scalability.

Python was used to write Google’s first web crawler, and it is still one of the company’s official server-side languages.

How to Execute Python

Python scripts can be run in a variety of ways, depending on what works best for you.

Many systems come with Python pre-installed, most likely Python 3, but you can check which version you have by typing python --version (or python3 --version) in your terminal.

If you have Python 2, you can upgrade by downloading Python 3 from the Python website. Python 2 officially reached end of life in January 2020, and there are syntax differences between the two versions, so it’s best to use Python 3.
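If you’re not sure which version a script will actually run under, you can also check from inside Python. Below is a minimal sketch that uses the standard library’s `sys` module. The helper name `is_python3` is only for illustration. The code deliberately avoids Python 3-only syntax, such as type hints, so it can still run and print its warning under Python 2:

```python
import sys


def is_python3():
    """Return True when the running interpreter is Python 3 or newer."""
    # version_info[0] is the major version number (2 or 3)
    return sys.version_info[0] >= 3


# Show the full version string of the interpreter running this script
print(sys.version)

if not is_python3():
    print("Python 2 detected: install Python 3 from python.org before continuing.")
```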

Python can be run from a terminal or command line, or from a desktop IDE (integrated development environment) such as PyCharm or VS Code. You can also use browser-based notebook environments such as:

  • Google Colab
  • Jupyter Notebooks

These make it easier for beginners to learn and test code elements line by line, as well as share and collaborate with your team.

How to Study Python

Python can be learned using a variety of online tools, and the best method for you depends on your learning style. For example, if you are a visual learner who enjoys following along with video coding, freeCodeCamp is an excellent place to begin.

If you prefer a more project-based learning style, Codecademy and Sololearn are great places to start. These websites also allow you to keep track of your progress and start a project portfolio.

Some sites, such as CodeCombat and CheckiO, gamify the learning journey. They’re a fun way to build a habit of coding every day.

If you prefer to code in real time alongside an instructor and identify as a woman or non-binary person, you can also sign up for a free 8-week course with Code First Girls (disclaimer: I work for Code First Girls).

Once you’ve mastered the fundamentals of Python, the best thing you can do is start working on projects, either your own or by expanding on one of the many scripts shared by the Python community.

These projects don’t have to be SEO-related, but working on practical examples from your own day-to-day work can help you apply what you learn.

If you’re interested in the data analysis side of Python, you should check out and use the free datasets on Kaggle.

Python Libraries

Python’s main strength is in its libraries, which enable a variety of additional functions such as:

  • Data extraction.
  • Analysis and preparation.
  • Scientific computing.
  • Natural language processing.
  • Machine learning.

Some useful libraries for data analysis and automation tasks in SEO include:

  • pandas: A data manipulation and analysis library.
  • NumPy: A library for scientific computing.
  • SciPy: A library for scientific and technical computing.
  • scikit-learn: A machine learning library for data mining and analysis.
  • spaCy: An outstanding natural language processing library.
  • Requests: A library for making HTTP requests.
  • Beautiful Soup: A tool for extracting data from HTML and XML files.
  • Matplotlib: A library for creating data visualizations.

Why is Python so popular among SEOs?

While understanding the languages that power the websites we work on (such as HTML, CSS, and JavaScript) is important, Python provides many automation opportunities for low-level tasks that would normally take us several hours to complete.

Python empowers SEO professionals in a variety of ways, not only by allowing us to automate repetitive tasks but also by allowing us to extract and analyze large data sets.

Because the amount of data marketers work with keeps increasing, being able to analyze it efficiently will let them solve many complex problems in less time.

This saves us time and allows us to be more efficient in performing other important SEO tasks. Python’s popularity among SEO professionals has grown as a result of these factors.

Understanding data better will not only help us do our jobs better, but it will also allow us to make data-driven decisions.

These decisions will allow us to provide concrete insights to our clients and stakeholders while also increasing our confidence in the recommendations we implement.


The Advantages of Python Automation

While Python can’t replicate strategies that depend on human judgment and emotion, it can automate a wide range of time-consuming tasks.

The list of tasks you can automate with Python is constantly growing, but it currently includes:

  • Detecting user intent.
  • Mapping URLs ahead of a migration.
  • Analyzing internal links.
  • Conducting keyword research.
  • Optimizing images.
  • Scraping websites.
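As a taste of the last item, here is a minimal scraping sketch that pulls the title, meta description, and link URLs from a page. It uses only the standard library (`urllib.request` and `html.parser`), so it runs without extra installs. Requests and Beautiful Soup, listed above, can do the same job in fewer lines. The `SEOParser` class and the `parse_html` and `scrape` functions are illustrative names, and the User-Agent string is a placeholder:

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class SEOParser(HTMLParser):
    """Collect the <title>, meta description, and link hrefs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content")
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is accumulated
        if self._in_title:
            self.title += data


def parse_html(html):
    """Parse an HTML string and return the extracted SEO elements as a dict."""
    parser = SEOParser()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "meta_description": parser.meta_description,
        "links": parser.links,
    }


def scrape(url):
    """Fetch a URL (a live HTTP request) and extract its SEO elements."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (SEO audit script)"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return parse_html(response.read().decode(charset, errors="replace"))
```

To use it, call something like `scrape("https://example.com/")` and loop over a list of URLs. Keeping the parsing in `parse_html` separate from the network call makes it easy to test on saved HTML.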

How to Integrate Python Into Your SEO Workflow

The best way to incorporate Python into your workflow is to consider what can be automated, particularly tedious and time-consuming tasks.

Alternatively, consider how you can deal with and draw conclusions from the data you have at your disposal more efficiently.

A good place to start is to experiment with data from your website that you already have access to, such as from a site crawl or your analytics tool.
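For example, if your crawler can export a CSV, a few lines of pandas can summarize it. This sketch uses a small inline sample with hypothetical column names (`Address`, `Status Code`). With a real crawl, pass your export file to `pd.read_csv()` and rename the columns to match your tool:

```python
import io

import pandas as pd

# A stand-in for a crawl export; with a real crawl, read your exported CSV instead
CRAWL_CSV = """Address,Status Code
https://example.com/,200
https://example.com/blog/post-1,200
https://example.com/old-page,404
https://example.com/shop/item,301
"""


def summarize_crawl(crawl):
    """Return the number of URLs per status code, plus the URLs that aren't 200 OK."""
    status_counts = crawl["Status Code"].value_counts().sort_index()
    problems = crawl.loc[crawl["Status Code"] != 200, ["Address", "Status Code"]]
    return status_counts, problems


crawl = pd.read_csv(io.StringIO(CRAWL_CSV))
status_counts, problems = summarize_crawl(crawl)
print(status_counts)
print(problems)
```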

When learning, don’t be afraid to take inspiration from other people’s scripts, experiment, and even break something, as this is often the best way to learn.

Finding the root cause of a problem and figuring out how to fix it is a big part of what we do as SEOs, and it’s the same when learning and using Python.

There are also numerous articles from other SEOs that provide practical examples of how they use Python for SEO-related tasks. To learn more about some of these, I recommend visiting SEO Pythonistas.

Examples of Python Use

Are you ready to dive into Python?

Here are a few scripts that I’ve found useful for a variety of tasks, along with a brief description of how each one works and the problems they solve.

Redirect Relevancy

The first practical application is a redirect relevancy script, which checks whether the redirect mapping implemented for a migration is correct.

This entails crawling your site before and after the migration and segmenting the URLs into categories based on their URL structure.

Then, using some of Python’s built-in comparison operators, you can determine whether the folder and depth of each page have remained the same or changed as a result of the migration.

The script will compare each of your URLs before and after migration to see if they are the same, and the results will be output to a new table that will state True if they are the same, or False if they have changed.
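Below is a minimal sketch of that comparison in pandas. It assumes you’ve already paired each old URL with the URL it now redirects to, for example by joining your pre- and post-migration crawls. The helper names and sample URLs are hypothetical. Here, a URL’s "category" is its first folder, and its depth is the number of path segments:

```python
from urllib.parse import urlparse

import pandas as pd


def path_segments(url):
    """Return the non-empty path segments of a URL."""
    return [s for s in urlparse(url).path.split("/") if s]


def first_folder(url):
    """Return the first folder of a URL (its top-level category), or '' for the homepage."""
    segments = path_segments(url)
    return segments[0] if segments else ""


def depth(url):
    """Return how many folders deep a URL sits."""
    return len(path_segments(url))


def compare_redirects(mapping):
    """Add True/False columns showing whether folder and depth survived the migration."""
    result = mapping.copy()
    old_folder = result["old_url"].map(first_folder)
    result["category"] = old_folder
    result["folder_match"] = old_folder == result["new_url"].map(first_folder)
    result["depth_match"] = result["old_url"].map(depth) == result["new_url"].map(depth)
    return result


mapping = pd.DataFrame({
    "old_url": ["https://example.com/blog/seo-tips", "https://example.com/shop/red-shoes",
                "https://example.com/shop/blue-shoes"],
    "new_url": ["https://example.com/blog/seo-tips/", "https://example.com/shop/shoes/red",
                "https://example.com/sale/blue-shoes"],
})
print(compare_redirects(mapping))
```

Rows with a False in either column are the redirects to review first.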

You can also use the Python library Pandas to create a pivot table that displays a count of how many URLs match and how many have changed for each category.
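Below is a self-contained pivot table sketch. It starts from a hypothetical results table with one row per URL, a `category` column, and a `match` column holding the True/False result of the comparison described above:

```python
import pandas as pd

# Hypothetical output of the redirect comparison: one row per URL
results = pd.DataFrame({
    "url": ["/blog/a", "/blog/b", "/shop/c", "/shop/d", "/shop/e"],
    "category": ["blog", "blog", "shop", "shop", "shop"],
    "match": [True, True, True, False, False],
})

# Label the True/False results so the pivot table columns are readable
results["status"] = results["match"].map({True: "match", False: "changed"})

# Count URLs per category, split into "match" and "changed" columns
summary = pd.pivot_table(
    results,
    index="category",
    columns="status",
    values="url",
    aggfunc="count",
    fill_value=0,
)
print(summary)
```

Filtering with `summary[summary["changed"] > 0]` then lists only the categories that need investigating.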

This will allow you to investigate any non-matching categories or URLs and review the redirect rules that have been set up.

Analysis of Internal Links

Another useful script uses crawl data to perform internal link analysis.

This will enable you to identify the sections of your site with the most internal links, as well as opportunities to improve internal linking for various sections.

This time, segmentation will be used to determine the different categories of the URLs, and pivot tables will be used to export a count of the number of internal links to each category on the site.
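Below is a minimal sketch of that analysis. It assumes a hypothetical inlinks export with one row per internal link and `Source` and `Destination` columns. The category is taken from the first folder of the destination URL, and "home" is used for the root:

```python
from urllib.parse import urlparse

import pandas as pd


def category(url):
    """Use the first folder in the URL path as the page category ('home' for the root)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else "home"


def internal_links_by_category(links):
    """Count internal links pointing at each category, most-linked first."""
    links = links.assign(category=links["Destination"].map(category))
    counts = pd.pivot_table(links, index="category", values="Source", aggfunc="count")
    return counts.rename(columns={"Source": "internal_links"}).sort_values(
        "internal_links", ascending=False
    )


links = pd.DataFrame({
    "Source": ["https://example.com/", "https://example.com/", "https://example.com/blog/a",
               "https://example.com/blog/b", "https://example.com/shop/c"],
    "Destination": ["https://example.com/blog/a", "https://example.com/shop/c",
                    "https://example.com/blog/b", "https://example.com/blog/a",
                    "https://example.com/"],
})
print(internal_links_by_category(links))
```

Categories near the bottom of the output are candidates for more internal links.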

Pythia Image Captioning

This is the first script that introduced me to the language and sparked my interest in learning it.

This script generates a caption for an image URL using Pythia, a modular deep learning framework developed by Facebook.

This caption can then be used for images that currently lack alt tags, which are important for image accessibility and search.
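To find which images need a caption in the first place, you can scan a page’s HTML for `<img>` elements with a missing or blank `alt` attribute. The sketch below uses only the standard library. It is a helper step around the captioning script, not part of Pythia itself, and the class and function names are illustrative:

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> whose alt attribute is missing or blank."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        alt = attrs.get("alt")
        # A bare `alt` attribute parses as None, so it is treated as missing too
        if alt is None or not alt.strip():
            self.missing_alt.append(attrs.get("src"))


def images_missing_alt(html):
    """Return the src URLs of images on the page that lack alt text."""
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing_alt


print(images_missing_alt('<img src="a.jpg" alt="Red shoes"><img src="b.jpg">'))
```

An intentionally empty `alt=""` is valid for purely decorative images. Review the list rather than captioning every image on it.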

The script is built on a bottom-up and top-down attention mechanism, which computes results by focusing attention on different elements within an image.

For each word generated, attention weights are assigned to regions of the image, highlighting the region that receives the most attention.

This script is simple to use because it can be run directly from Google Colab and does not require any direct coding.

Once you’ve saved a copy of the necessary code to your personal Google Colab drive, you can run all cells, which will perform each step for you.

This will download the data sources required to run the process as well as automatically complete all of the steps that would normally have to be completed manually.

All libraries, for example, will be installed, classes will be created, and functions will be assigned.

This will create a field for you to enter your image URL and a button to caption the image.

After that, a caption will be provided for each image, which can be used directly as an alt tag or to inspire the creation of one.

This script is demonstrated in action in Hamlet’s comprehensive guide to generating text from images with Python.


PageSpeed Insights API

Python is also great for working with APIs, such as Google’s PageSpeed Insights API. This lets you measure key performance metrics at scale instead of testing each URL individually.

Using a CSV file containing all of the URLs to test, you can run each through the API and generate a response object containing all of the metrics for each URL.

The specific metrics, such as LCP, CLS, and FID (which has since been replaced by INP as a Core Web Vital), can then be extracted and displayed in a table for each URL.
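Below is a minimal sketch of that loop using only the standard library. It assumes a CSV with a `url` column and a PageSpeed Insights API key. It reads lab values from the `lighthouseResult.audits` section of the v5 response, using the `largest-contentful-paint`, `cumulative-layout-shift`, and `total-blocking-time` audit IDs. FID is a field metric and isn’t measured in a lab run, so this sketch uses Total Blocking Time, Google’s lab proxy for it. Check the field names against the current API documentation before relying on this:

```python
import csv
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

# Lighthouse audit IDs to extract, mapped to friendlier column names
AUDITS = {
    "largest-contentful-paint": "LCP (ms)",
    "cumulative-layout-shift": "CLS",
    "total-blocking-time": "TBT (ms)",
}


def build_request_url(page_url, api_key, strategy="mobile"):
    """Build the PageSpeed Insights v5 request URL for one page."""
    query = urlencode({"url": page_url, "key": api_key, "strategy": strategy})
    return f"{PSI_ENDPOINT}?{query}"


def extract_metrics(response):
    """Pull the numeric lab value of each audit out of a PSI JSON response (None if absent)."""
    audits = response.get("lighthouseResult", {}).get("audits", {})
    return {label: audits.get(audit_id, {}).get("numericValue") for audit_id, label in AUDITS.items()}


def run_psi(csv_path, api_key):
    """Test every URL in the CSV's 'url' column and return one metrics row per URL."""
    rows = []
    with open(csv_path, newline="") as f:
        for record in csv.DictReader(f):
            with urlopen(build_request_url(record["url"], api_key), timeout=120) as resp:
                data = json.load(resp)
            rows.append({"url": record["url"], **extract_metrics(data)})
    return rows
```

Calling `run_psi("urls.csv", "YOUR_API_KEY")` returns a list of dicts, which `pandas.DataFrame` can turn straight into the per-URL table described above. Each request can take several seconds, so large URL lists are slow to process.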

For each page, you can also use the API to get the largest contentful paint element, a list of all layout-shifting elements, and a list of third-party blocking tags and unused CSS and JS files.

Other Possibilities

These are just a few examples of the many automation and optimization possibilities available with Python scripts, which include:

  • Image optimization.
  • Combining datasets to draw stronger conclusions.
  • Hreflang validation.
  • Keyword growth calculation.
  • Google Search Console (GSC) data collection.
  • Competitor analysis.

Providing Support for Machine Learning

Python, because of its simple, intuitive, and accessible syntax, is also a popular language for powering machine learning applications.

Furthermore, there are a plethora of useful libraries that can be used when working with and training machine learning models.

What Exactly Is Machine Learning?

Machine learning is defined as “an application of artificial intelligence that provides systems with the ability to automatically learn and improve from experience without the need for explicit programming” (a full definition can be found here).

Machine learning is frequently used to identify patterns in data that can then be used to make predictions.

Two of the main types of machine learning are supervised and unsupervised learning. Supervised learning is trained on labeled data: the algorithm is fed a training set in which each example is paired with the desired output.

In other words, the learning algorithm is already given the answer: when training the model, the correct outcome for each data point is explicitly labeled.
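To make the idea concrete, here is a toy supervised example in plain Python. It is a 1-nearest-neighbor classifier that labels a keyword’s search intent by copying the label of the most similar labeled training keyword, with similarity measured by word overlap (Jaccard similarity). The keywords and labels are invented for illustration. For real work, you would use a library such as scikit-learn rather than hand-rolling this:

```python
def jaccard(a, b):
    """Similarity between two keyword phrases: shared words / all distinct words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a | words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


# Labeled training data: each keyword comes with the "answer" (its search intent)
TRAINING_SET = [
    ("buy running shoes", "transactional"),
    ("running shoes discount code", "transactional"),
    ("how to clean running shoes", "informational"),
    ("what are running shoes made of", "informational"),
    ("nike store near me", "navigational"),
]


def predict_intent(keyword, training_set=TRAINING_SET):
    """1-nearest-neighbor: return the label of the most similar training keyword."""
    _, label = max(training_set, key=lambda pair: jaccard(keyword, pair[0]))
    return label


print(predict_intent("how to lace running shoes"))  # informational
```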

Unsupervised learning, on the other hand, is trained on unlabeled data, allowing the algorithm to act on the data without supervision. This is often used to explore a dataset’s structure or when you don’t have pre-labeled data.
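For contrast, here is an unsupervised sketch: k-means clustering groups unlabeled data points without being told any answers. This plain-Python version clusters pages by two made-up features (word count and monthly sessions, both already scaled to 0-1). Starting from the first k points is a simplification. In practice, scikit-learn’s `KMeans` is the usual choice:

```python
import math


def kmeans(points, k, iterations=20):
    """Group 2-D points into k clusters; returns (centroids, labels)."""
    # Simplification: use the first k points as the starting centroids
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(coords) / len(members) for coords in zip(*members)]
    return centroids, labels


# Unlabeled pages described by (word count, monthly sessions), both scaled to 0-1
pages = [(0.1, 0.2), (0.9, 0.8), (0.15, 0.1), (0.85, 0.9), (0.2, 0.15), (0.95, 0.85)]
centroids, labels = kmeans(pages, k=2)
print(labels)
```

The cluster numbers are arbitrary IDs. The algorithm finds the groups, and it is up to you to interpret what each group means.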

Machine Learning and Python

In a typical machine learning workflow, Python scripts are used to load a dataset, summarize and visualize it, and then train and evaluate different algorithms on it.

The best-performing model can then be used to make predictions.

Examples of Real-World Machine Learning

The use of machine learning on the web is growing all the time, with new models being developed and training data becoming more widely available every day. In some cases, we, as users, even help to train them.

Examples of real-world machine learning include:

  • Google’s RankBrain algorithm.
  • Baidu’s Deep Voice.
  • Twitter’s curated timelines.
  • Netflix and Spotify recommendations.
  • Salesforce Einstein.

Given their ability to solve complex problems, it’s no surprise that machine learning models are being used to make marketers’ lives easier.

As Britney Muller puts it:

“Machine Learning is becoming more accessible and will free us up to work on higher-level strategy.”

This will allow you to spend more time solving problems rather than just identifying them.

Here are some examples of machine learning models used in SEO:

  • Assessing content quality.
  • Identifying keyword opportunities and gaps.
  • Better understanding user engagement.
  • Optimizing title tags.
  • Generating meta descriptions automatically.
  • Transcribing audio.

Here are some examples of Machine Learning in use for SEO tasks that you may have come across.

Predictive Prefetching

Guess.js and other tools build machine learning models from the navigation patterns in website analytics data to predict which pages users are most likely to visit next, then prefetch the resources those pages need.
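Guess.js itself is a JavaScript project, but the core idea can be sketched in Python. The sketch counts which page visitors go to next from each page, which amounts to a first-order Markov chain built from session paths, and then picks the most likely next page to prefetch. This is a simplified illustration with made-up sessions, not Guess.js’s actual implementation:

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count page-to-next-page transitions across all navigation sessions."""
    transitions = defaultdict(Counter)
    for path in sessions:
        for current_page, next_page in zip(path, path[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def predict_next(transitions, page):
    """Return (most likely next page, probability), or (None, 0.0) if the page is unseen."""
    counts = transitions.get(page)
    if not counts:
        return None, 0.0
    next_page, count = counts.most_common(1)[0]
    return next_page, count / sum(counts.values())


sessions = [
    ["/", "/shoes", "/shoes/red"],
    ["/", "/shoes", "/cart"],
    ["/", "/blog"],
    ["/shoes", "/shoes/red", "/cart"],
]
model = build_transitions(sessions)
print(predict_next(model, "/shoes"))
```

In the browser, the predicted page could then be fetched ahead of time, for example with a `<link rel="prefetch">` hint.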

Predicting the next piece of content a user is likely to want to view and adjusting user experience to account for this are other examples of this in practice.

Another is predicting which widgets a user is likely to interact with and tailoring a more personalized experience accordingly.

Internal Linking

Machine learning can assist with internal linking in two ways.

The first is fixing broken links. This involves crawling the site for broken internal links, using an algorithm to suggest the most appropriate replacement page for each one, and then replacing the broken links.
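As a simple stand-in for that suggestion step, the standard library’s `difflib.get_close_matches` can propose the live URL whose path is most similar to the broken one. This is string similarity, not a trained model. The URLs and the 0.6 cutoff are illustrative assumptions:

```python
from difflib import get_close_matches
from urllib.parse import urlparse


def suggest_replacement(broken_url, live_urls, cutoff=0.6):
    """Return the live URL whose path best matches the broken URL's path, or None."""
    # Compare paths only, so the domain doesn't inflate every similarity score
    paths = {urlparse(u).path: u for u in live_urls}
    matches = get_close_matches(urlparse(broken_url).path, paths.keys(), n=1, cutoff=cutoff)
    return paths[matches[0]] if matches else None


live_urls = [
    "https://example.com/blog/seo-guide",
    "https://example.com/shop/red-shoes",
    "https://example.com/contact",
]
print(suggest_replacement("https://example.com/shop/red-shoe", live_urls))
```

Returning `None` below the cutoff means you can flag those links for manual review instead of redirecting them to a weak match.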

The other is suggesting relevant internal links. Tools that do this use algorithms trained on large amounts of data and keep learning, so their suggestions improve over time.

Some can even suggest relevant internal links while an article is still being written.
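Below is a toy version of that suggestion step. It represents each page’s text as word counts and ranks existing pages by cosine similarity to the draft. Real tools use far richer models and would at least remove stop words. The page URLs and texts here are invented:

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0 means no words in common)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_links(draft, pages, top_n=2):
    """Return the URLs of existing pages most similar to the draft (ignoring zero scores)."""
    draft_vector = vectorize(draft)
    scored = [(cosine_similarity(draft_vector, vectorize(text)), url) for url, text in pages.items()]
    scored.sort(reverse=True)
    return [url for score, url in scored[:top_n] if score > 0]


pages = {
    "/blog/running-shoe-guide": "how to choose running shoes for road and trail running",
    "/blog/marathon-training": "a marathon training plan for beginners with weekly running sessions",
    "/recipes/banana-bread": "an easy banana bread recipe with ripe bananas",
}
draft = "our guide to trail running shoes for beginners"
print(suggest_links(draft, pages))
```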

Content Quality

Another example is improving content quality by predicting what users and search engines prefer. You can do this by developing a model that identifies the most important factors.

Search volume and traffic, conversion rate, internal links, bounce rate, time on page, and word count are examples of such factors.

The important factors will then be used to train a machine learning model, which will generate a content quality score for each page.
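Below is a minimal sketch of that idea. It assumes you already have a hypothetical quality label for a handful of pages, such as a 0-1 score from a manual content audit, and a few factors scaled to 0-1 so their weights are comparable. The sketch trains a linear regression with plain gradient descent and then scores a new page. The learned weights also hint at which factors matter most. The page data is invented, and real use would need far more pages and a library such as scikit-learn:

```python
def train_linear_model(features, targets, learning_rate=0.1, epochs=10000):
    """Fit weights and a bias so that sum(w * x) + b approximates each target."""
    n, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        # Prediction error for every page under the current weights
        errors = [sum(w * x for w, x in zip(weights, row)) + bias - y
                  for row, y in zip(features, targets)]
        # Gradient descent step on the (halved) mean squared error
        for j in range(n_features):
            weights[j] -= learning_rate * sum(e * row[j] for e, row in zip(errors, features)) / n
        bias -= learning_rate * sum(errors) / n
    return weights, bias


def quality_score(weights, bias, page_features):
    """Predicted content quality score for one page."""
    return sum(w * x for w, x in zip(weights, page_features)) + bias


# Hypothetical factors per page: [traffic, conversion rate, time on page], scaled to 0-1
FEATURES = [[1.0, 0.8, 0.9], [0.2, 0.1, 0.3], [0.6, 0.5, 0.4],
            [0.1, 0.9, 0.2], [0.8, 0.2, 0.7], [0.4, 0.4, 1.0]]
# Hypothetical audit scores for the same pages
TARGETS = [0.92, 0.19, 0.53, 0.36, 0.60, 0.52]

weights, bias = train_linear_model(FEATURES, TARGETS)
print([round(w, 2) for w in weights], round(bias, 2))
print(round(quality_score(weights, bias, [0.5, 0.5, 0.5]), 2))
```

Because these made-up scores were generated as 0.5 × traffic + 0.3 × conversion rate + 0.2 × time on page, the learned weights come out close to 0.5, 0.3, and 0.2. With real audit data, the largest weights point to the factors that matter most.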

The User Experience

Machine learning is also being used to help improve user experience, and there are numerous examples. For example, Instagram uses sentiment analysis to identify and address bullying language.

Twitter also uses it for image cropping to ensure that images are cropped to display the most important part, such as the text.

Even though the text appears in different places in different images, Twitter crops each one to show the text in the preview. The machine learning model behind this was trained on thousands of images to learn to identify the most important part of an image.

Computer vision is also being used to improve user experience by automatically identifying what is in an image, as well as to make images more accessible by explaining what an image is to users.


I hope this has piqued your interest in learning Python and investigating how it can assist you in automating tasks and analyzing complex data to increase your efficiency.

Finally, keep in mind that you don’t have to learn Python to be a good SEO, but if you’re intrigued or interested, I hope you have fun learning and implementing some Python scripts into your workflow.
