{"id":1876,"date":"2021-11-23T09:04:00","date_gmt":"2021-11-23T09:04:00","guid":{"rendered":"https:\/\/seopolarity.com\/blog\/?p=1876"},"modified":"2021-11-25T09:39:02","modified_gmt":"2021-11-25T09:39:02","slug":"a-data-science-approach-to-internal-link-structure-optimization","status":"publish","type":"post","link":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/","title":{"rendered":"A Data Science Approach to Internal Link Structure Optimization"},"content":{"rendered":"\n<p>Optimizing your internal links is critical if you want your site&#8217;s pages to have enough authority to rank for their target keywords. Internal links are links from one page on your website to another page on the same site.<\/p>\n\n\n\n<p>They matter because Google and other search engines use them to judge how important a page is relative to the other pages on your website.<\/p>\n\n\n\n<p>They also influence how likely users are to find content on your site. Google&#8217;s PageRank algorithm reflects this: it scores a page by how likely a user who randomly follows links is to land on it.<\/p>\n\n\n\n<p>Today, we&#8217;re looking at a data-driven approach to improving a website&#8217;s internal linking for better technical SEO: optimizing how internal link authority is distributed across the site structure.<\/p>\n\n\n\n<p>Read <a href=\"https:\/\/fileproinfo.com\/blog\/a-beginners-guide-to-digital-marketing\/2021\/\" target=\"_blank\" rel=\"noreferrer noopener\">A Beginner\u2019s Guide to Digital Marketing<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-using-data-science-to-improve-internal-link-structures\">Using Data Science to Improve Internal Link Structures<\/h2>\n\n\n\n<p>Our data-driven approach concentrates on just one aspect of optimizing internal link architecture: modeling the distribution of internal links by site depth, then targeting the pages that have too few links for their site depth.<\/p>\n\n\n\n<p>We begin by importing the libraries and data, then clean up the column names and preview the data:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import pandas as pd\r\nimport numpy as np\r\n\r\nsite_name = 'ON24'\r\nsite_filename = 'on24'\r\nwebsite = 'www.on24.com'\r\n\r\n# Import the crawl data (Sitebulb export)\r\ncrawl_data = pd.read_csv('data\/' + site_filename + '_crawl.csv')\r\n# Clean the column names: spaces become underscores; '.', '(' and ')' are removed as literal characters\r\ncrawl_data.columns = crawl_data.columns.str.replace(' ', '_', regex=False)\r\ncrawl_data.columns = crawl_data.columns.str.replace('.', '', regex=False)\r\ncrawl_data.columns = crawl_data.columns.str.replace('(', '', regex=False)\r\ncrawl_data.columns = crawl_data.columns.str.replace(')', '', regex=False)\r\ncrawl_data.columns = map(str.lower, crawl_data.columns)\r\nprint(crawl_data.shape)\r\nprint(crawl_data.dtypes)\r\ncrawl_data\r\n\r\n(8611, 104)\r\n\r\nurl                          object\r\nbase_url                     object\r\ncrawl_depth                  object\r\ncrawl_status                 object\r\nhost                         object\r\n                             ...   
\r\nredirect_type                object\r\nredirect_url                 object\r\nredirect_url_status          object\r\nredirect_url_status_code     object\r\nunnamed:_103                float64\r\nLength: 104, dtype: object<\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"441\" src=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_crawl_data-1024x441.png\" alt=\"\" class=\"wp-image-1880\" srcset=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_crawl_data-1024x441.png 1024w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_crawl_data-300x129.png 300w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_crawl_data-768x331.png 768w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_crawl_data-1536x661.png 1536w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_crawl_data.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<p>The data imported from the Sitebulb desktop crawler application is shown above as a preview. 
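<\/p>\n\n\n\n<p>To show what the column clean-up in the import step does, here is a small sketch wrapped in a hypothetical helper, clean_columns (it is not part of the original code). It applies the same rules in fewer calls: spaces become underscores, the characters &#8216;.&#8217;, &#8216;(&#8217; and &#8216;)&#8217; are removed with one regex character class, and the names are lower-cased. The sample headers are made up to resemble a crawler export.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">
```python
import pandas as pd


def clean_columns(columns):
    # Hypothetical helper mirroring the chained str.replace calls in the import step
    cleaned = pd.Index(columns).str.replace(' ', '_', regex=False)
    # Remove full stops and round brackets in one pass (regex character class)
    cleaned = cleaned.str.replace('[.()]', '', regex=True)
    return list(cleaned.str.lower())


# Made-up headers in the style of a crawler export
print(clean_columns(['Crawl Depth', 'No. Internal Links to URL', 'Unnamed: 103']))
```
<\/pre>\n\n\n\n<p>It returns crawl_depth, no_internal_links_to_url and unnamed:_103, matching the column names that appear later in the post.<\/p>\n\n\n\n<p>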
The crawl data has over 8,000 rows, and not all of them belong to the domain, as resource URLs and external (outbound) link URLs are also included.<\/p>\n\n\n\n<p>It also has over 100 columns, most of which we don&#8217;t need, so we will select a subset of them.<\/p>\n\n\n\n<p>Before we get there, let&#8217;s look at how many site levels there are:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">crawl_depth\r\n0             1\r\n1            70\r\n10            5\r\n11            1\r\n12            1\r\n13            2\r\n14            1\r\n2           303\r\n3           378\r\n4           347\r\n5           253\r\n6           194\r\n7            96\r\n8            33\r\n9            19\r\nNot Set    2351\r\ndtype: int64<\/pre>\n\n\n\n<p>The table shows 15 site levels (0 to 14), plus 2,351 URLs whose crawl depth is &#8216;Not Set&#8217;. These make up the majority of the URLs and were found in the XML sitemap rather than through the site architecture.<\/p>\n\n\n\n<p>You&#8217;ll notice that pandas (the Python data-handling package) sorts the site levels character by character, so levels 10 to 14 appear between 1 and 2.<\/p>\n\n\n\n<p>This is because the site levels are currently stored as text strings rather than numbers. 
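<\/p>\n\n\n\n<p>The short sketch below illustrates the ordering problem on a handful of made-up depth labels (not the crawl data): sorting the raw strings places &#8216;10&#8217; before &#8216;2&#8217;, whereas an ordered pandas Categorical with an explicit category list keeps the numeric order and puts &#8216;Not Set&#8217; last.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">
```python
import pandas as pd

# Illustrative depth labels as they arrive from a crawl export (text, not numbers)
depths = pd.Series(['2', '10', '1', 'Not Set', '0', '3'])

# A plain string sort compares character by character, so '10' lands before '2'
print(sorted(depths))

# Ordered categorical: levels 0 to 14 in numeric order, then 'Not Set'
order = [str(d) for d in range(15)] + ['Not Set']
depths_cat = pd.Series(pd.Categorical(depths, categories=order, ordered=True))
print(list(depths_cat.sort_values()))
```
<\/pre>\n\n\n\n<p>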
We will convert the site levels to a proper numeric order later in the code, because the order affects the data visualizations (&#8216;viz&#8217;).<\/p>\n\n\n\n<p>We&#8217;ll now filter the rows and select the columns. The code below keeps only live (2xx) URLs on the site&#8217;s own host and converts the site levels into an ordered category, so that they sort as 0, 1, 2 &#8230; 14, followed by &#8216;Not Set&#8217;.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Keep live (2xx) URLs and select the relevant columns\nredir_live_urls = crawl_data[['url', 'crawl_depth', 'http_status_code', 'indexable_status', 'no_internal_links_to_url', 'host', 'title']]\nredir_live_urls = redir_live_urls.loc[redir_live_urls.http_status_code.astype(str).str.startswith('2')].copy()\n\n# Order the site levels numerically, with 'Not Set' (sitemap-only URLs) last\ndepth_order = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',\n               '10', '11', '12', '13', '14', 'Not Set']\nredir_live_urls['crawl_depth'] = pd.Categorical(redir_live_urls['crawl_depth'].astype(str),\n                                                categories=depth_order, ordered=True)\n\n# Keep only URLs on the website's own host, then drop the host column\nredir_live_urls = redir_live_urls.loc[redir_live_urls.host == website]\ndel redir_live_urls['host']\nprint(redir_live_urls.shape)\nredir_live_urls\n\n(4055, 6)<\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"335\" src=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/2_redir_live_urls-1024x335.png\" alt=\"\" class=\"wp-image-1881\" srcset=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/2_redir_live_urls-1024x335.png 1024w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/2_redir_live_urls-300x98.png 300w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/2_redir_live_urls-768x252.png 768w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/2_redir_live_urls-1536x503.png 1536w, 
https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/2_redir_live_urls.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<p>After filtering for live URLs on the site&#8217;s own domain and selecting the relevant columns, we have a leaner data frame (think of it as the pandas version of a spreadsheet tab).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Investigating the Distribution of Internal Links<\/h2>\n\n\n\n<p>We&#8217;re now ready to visualize the data and see how internal links are distributed, both overall and by site depth.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">from plotnine import *\r\nimport matplotlib.pyplot as plt\r\npd.set_option('display.max_colwidth', None)\r\n%matplotlib inline\r\n\r\n# Overall distribution of internal links to URL (all site levels)\r\nove_intlink_dist_plt = (ggplot(redir_live_urls, aes(x = 'no_internal_links_to_url')) + \r\n                    geom_histogram(fill = 'blue', alpha = 0.6, bins = 7) +\r\n                    labs(x = '# Internal Links to URL', y = '# URLs') + \r\n                    theme_classic() +            \r\n                    theme(legend_position = 'none')\r\n                   )\r\n\r\nove_intlink_dist_plt<\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"977\" src=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_overall_intlink_dist_plt-1024x977.png\" alt=\"\" class=\"wp-image-1882\" srcset=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_overall_intlink_dist_plt-1024x977.png 1024w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_overall_intlink_dist_plt-300x286.png 300w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_overall_intlink_dist_plt-768x733.png 768w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_overall_intlink_dist_plt-1536x1466.png 1536w, 
https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_overall_intlink_dist_plt.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<p>The histogram shows that most pages have few or no internal links, so improving internal linking is a significant SEO opportunity here.<\/p>\n\n\n\n<p>Next, let&#8217;s look at summary statistics for each site level:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Summary statistics of internal links to URL by site level\r\nredir_live_urls.groupby('crawl_depth', observed=True)['no_internal_links_to_url'].describe()<\/pre>\n\n\n\n<p>The resulting table describes the distribution of internal links at each site level, including the average (mean) and the median (the 50% quantile).<\/p>\n\n\n\n<p>It also shows the variation within each site level (std, the standard deviation), which tells us how closely the pages at that level cluster around the average, i.e. how consistent internal linking is within the level.<\/p>\n\n\n\n<p>Except for the home page (crawl depth 0) and the first-level pages (crawl depth 1), the average ranges from 0 to 4 internal links per URL at each site level.<\/p>\n\n\n\n<p>To see this visually, we can draw a box plot for each site level:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Distribution of internal links to URL by site level\r\nintlink_dist_plt = (ggplot(redir_live_urls, aes(x = 'crawl_depth', y = 'no_internal_links_to_url')) + \r\n                    geom_boxplot(fill = 'blue', alpha = 0.8) +\r\n                    labs(y = '# Internal Links to URL', x = 'Site Level') + \r\n                    theme_classic() +            \r\n                    theme(legend_position = 'none')\r\n                   )\r\n\r\nintlink_dist_plt.save(filename = 'images\/1_intlink_dist_plt.png', 
height=5, width=5, units = 'in', dpi=1000)\r\nintlink_dist_plt<\/pre>\n\n\n\n<p>The plot above confirms our earlier observation that the home page and the pages linked directly from it receive the lion&#8217;s share of the links.<\/p>\n\n\n\n<p>With the scales as they are, we can&#8217;t see how links are distributed at the lower levels. We&#8217;ll fix this by putting the y-axis on a logarithmic scale:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Distribution of internal links to URL by site level (log scale)\r\nfrom mizani.formatters import comma_format\r\n\r\nintlink_dist_plt = (ggplot(redir_live_urls, aes(x = 'crawl_depth', y = 'no_internal_links_to_url')) + \r\n                    geom_boxplot(fill = 'blue', alpha = 0.8) +\r\n                    labs(y = '# Internal Links to URL', x = 'Site Level') + \r\n                    scale_y_log10(labels = comma_format()) + \r\n                    theme_classic() +            \r\n                    theme(legend_position = 'none')\r\n                   )\r\n\r\nintlink_dist_plt.save(filename = 'images\/1_log_intlink_dist_plt.png', height=5, width=5, units = 'in', dpi=1000)\r\nintlink_dist_plt<\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"944\" src=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_intlink_dist_plt-1024x944.png\" alt=\"\" class=\"wp-image-1883\" srcset=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_intlink_dist_plt-1024x944.png 1024w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_intlink_dist_plt-300x277.png 300w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_intlink_dist_plt-768x708.png 768w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_intlink_dist_plt-1536x1416.png 1536w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_intlink_dist_plt.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 
1024px\" \/><\/figure><\/div>\n\n\n\n<p>The graph above shows the same distribution on a logarithmic scale, which makes the averages for the lower levels much easier to read.<\/p>\n\n\n\n<p>The gap between the first two site levels and the rest of the site suggests a skewed distribution.<\/p>\n\n\n\n<p>To bring the distribution closer to normal, I&#8217;ll log-transform the internal link counts.<\/p>\n\n\n\n<p>The code below adds the log-transformed counts as a new column, log_intlinks, and plots them by site level:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Log-transform the internal link counts; adding 1 keeps pages with zero links defined.\r\n# The column name log_intlinks comes from the post; the base-2 log is an assumption.\r\nredir_live_urls['log_intlinks'] = np.log2(redir_live_urls['no_internal_links_to_url'] + 1)\r\n\r\n# Distribution of log internal links to URL by site level\r\nintlink_dist_plt = (ggplot(redir_live_urls, aes(x = 'crawl_depth', y = 'log_intlinks')) + \r\n                    geom_boxplot(fill = 'blue', alpha = 0.8) +\r\n                    labs(y = '# Log Internal Links to URL', x = 'Site Level') + \r\n                    theme_classic() +            \r\n                    theme(legend_position = 'none')\r\n                   )\r\n\r\nintlink_dist_plt<\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"991\" src=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1c_loglinks_dist_plt-1024x991.png\" alt=\"\" class=\"wp-image-1884\" srcset=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1c_loglinks_dist_plt-1024x991.png 1024w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1c_loglinks_dist_plt-300x290.png 300w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1c_loglinks_dist_plt-768x744.png 768w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1c_loglinks_dist_plt-1536x1487.png 1536w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1c_loglinks_dist_plt.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
\/><\/figure><\/div>\n\n\n\n<p>The distribution now looks less skewed: the boxes (interquartile ranges) step down more gradually from one site level to the next.<\/p>\n\n\n\n<p>This sets us up nicely to analyze the data and determine which URLs are under-optimized in terms of internal links.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Quantifying the Problems<\/h2>\n\n\n\n<p>For each site depth, the code below computes the 35th percentile (the 0.35 quantile) of the log-transformed internal link counts, which serves as the cut-off for under-linked URLs at that level.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># quantile_lower is not defined in the original post; this definition is an\r\n# assumption based on the text (the 35th percentile of each site level)\r\ndef quantile_lower(x):\r\n    return x.quantile(0.35)\r\n\r\n# Per-level cut-off for under-linked URLs (35th percentile of log internal links)\r\nquantiled_intlinks = redir_live_urls.groupby('crawl_depth').agg({'log_intlinks': \r\n                                                                 [quantile_lower]}).reset_index()\r\n# Flatten the two-level column names, e.g. into 'log_intlinks_quantile_lower'\r\nquantiled_intlinks.columns = ['_'.join(col) for col in quantiled_intlinks.columns]\r\nquantiled_intlinks = quantiled_intlinks.rename(columns = {'crawl_depth_': 'crawl_depth', \r\n                                                          'log_intlinks_quantile_lower': 'sd_intlink_lowqua'})\r\nquantiled_intlinks<\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"474\" height=\"940\" src=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/2_lowqua.png\" alt=\"\" class=\"wp-image-1885\" srcset=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/2_lowqua.png 474w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/2_lowqua-151x300.png 151w\" sizes=\"auto, (max-width: 474px) 100vw, 474px\" \/><\/figure><\/div>\n\n\n\n<p>The calculations are shown above. 
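<\/p>\n\n\n\n<p>To make the cut-off idea concrete, the sketch below runs the same steps on a tiny made-up data set: it takes log2(links + 1), computes the 35th percentile within each site level, and flags URLs that fall below their level&#8217;s cut-off. The column names follow the post, but the data, the base-2 log and the strict &#8216;below the cut-off&#8217; rule are assumptions for illustration, and groupby().transform() stands in for the post&#8217;s merge step.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">
```python
import numpy as np
import pandas as pd

# Made-up example: four URLs at site level 2 and four at site level 3
df = pd.DataFrame({
    'crawl_depth': ['2', '2', '2', '2', '3', '3', '3', '3'],
    'no_internal_links_to_url': [0, 3, 7, 15, 1, 1, 3, 7],
})

# Assumed transform: log2(links + 1), so pages with zero links map to 0
df['log_intlinks'] = np.log2(df['no_internal_links_to_url'] + 1)

# 35th percentile of log link counts per site level, repeated on every row
df['sd_intlink_lowqua'] = df.groupby('crawl_depth')['log_intlinks'].transform(
    lambda s: s.quantile(0.35))

# Flag (1) URLs whose log link count is strictly below their level's cut-off
df['sd_int_uidx'] = df['log_intlinks'].lt(df['sd_intlink_lowqua']).astype(int)
print(df[['crawl_depth', 'no_internal_links_to_url', 'sd_intlink_lowqua', 'sd_int_uidx']])
```
<\/pre>\n\n\n\n<p>In this toy data, two of the four URLs at each level are flagged, because the interpolated cut-off (2.05 at level 2, 1.05 at level 3) lands just above the second-lowest value.<\/p>\n\n\n\n<p>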
On their own, the quantile values in the table above mean little to an SEO practitioner: they are log-scale values based on an arbitrary percentile, and they serve only as a cut-off for under-linked URLs at each site level.<\/p>\n\n\n\n<p>Now that we have the table, we&#8217;ll join it to the main data set and flag, row by row, whether each URL is under-linked.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># sd_intlinkscount_underover is not defined in the original post; this is an\r\n# assumed version that returns 1 when a URL's log link count is below the\r\n# cut-off for its site level, and 0 otherwise\r\ndef sd_intlinkscount_underover(row):\r\n    return 1 if row['sd_intlink_lowqua'] > row['log_intlinks'] else 0\r\n\r\n# Join the per-level cut-offs to the main data frame, then flag each URL\r\nredir_live_urls_underidx = redir_live_urls.merge(quantiled_intlinks, on = 'crawl_depth', how = 'left')\r\nredir_live_urls_underidx['sd_int_uidx'] = redir_live_urls_underidx.apply(sd_intlinkscount_underover, axis=1)\r\n# URLs with no crawl depth ('Not Set') are always flagged as under-linked\r\nredir_live_urls_underidx['sd_int_uidx'] = np.where(redir_live_urls_underidx['crawl_depth'] == 'Not Set', 1,\r\n                                                   redir_live_urls_underidx['sd_int_uidx'])\r\n\r\nredir_live_urls_underidx<\/pre>\n\n\n\n<p>We now have a data frame in which each under-linked URL is marked with a 1 in the sd_int_uidx column (URLs with a &#8216;Not Set&#8217; crawl depth are always flagged).<\/p>\n\n\n\n<p>This allows us to count the under-linked pages at each site depth:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Summarise sd_int_uidx by site level: URLs flagged, URLs in total, and % flagged\r\nintlinks_agged = redir_live_urls_underidx.groupby('crawl_depth').agg({'sd_int_uidx': ['sum', 'count']}).reset_index()\r\n# Flatten the two-level column names (e.g. 'sd_int_uidx_sum')\r\nintlinks_agged.columns = ['_'.join(col) for col in intlinks_agged.columns]\r\nintlinks_agged = intlinks_agged.rename(columns = {'crawl_depth_': 'crawl_depth'})\r\nintlinks_agged['sd_uidx_prop'] = intlinks_agged.sd_int_uidx_sum \/ intlinks_agged.sd_int_uidx_count * 100\r\nprint(intlinks_agged)<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">    crawl_depth  sd_int_uidx_sum  sd_int_uidx_count  sd_uidx_prop\n0             0                0                  1      0.000000\n1             1               41                 70     58.571429\n2             2               66                303     21.782178\n3             3              110                378     29.100529\n4             4              109                347     31.412104\n5             5               68                253     26.877470\n6             6               63                194     32.474227\n7             7                9                 96      9.375000\n8             8                6                 33     18.181818\n9             9                6                 19     31.578947\n10           10                0                  5      0.000000\n11           11                0                  1      0.000000\n12           12                0                  1      0.000000\n13           13                0                  2      0.000000\n14           14                0                  1      0.000000\n15      Not Set             2351               2351    100.000000<\/pre>\n\n\n\n<p>Even though site depth 1 pages have a higher-than-average number of links per URL, 41 of the 70 pages at that level (about 59%) are under-linked.<\/p>\n\n\n\n<p>Plotting the number of under-linked URLs per site level makes the pattern easier to see:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Plot the number of under-linked URLs by site level (log scale)\ndepth_uidx_plt = (ggplot(intlinks_agged, aes(x = 'crawl_depth', y = 'sd_int_uidx_sum')) +\n                  geom_bar(stat = 'identity', fill = 'blue', alpha = 0.8) +\n                  labs(y = '# Under Linked URLs', x = 'Site Level') +\n                  scale_y_log10() +\n                  theme_classic() +\n                  theme(legend_position = 'none')\n                 )\n\ndepth_uidx_plt.save(filename = 'images\/1_depth_uidx_plt.png', height=5, width=5, units = 
'in', dpi=1000)\ndepth_uidx_plt<\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"959\" src=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_depth_uidx_plt-1024x959.png\" alt=\"\" class=\"wp-image-1886\" srcset=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_depth_uidx_plt-1024x959.png 1024w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_depth_uidx_plt-300x281.png 300w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_depth_uidx_plt-768x720.png 768w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_depth_uidx_plt-1536x1439.png 1536w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/1_depth_uidx_plt.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<p>Apart from the XML sitemap (&#8216;Not Set&#8217;) URLs, the near bell shape of the chart suggests that under-linked URLs are roughly normally distributed across site levels. Site levels 3 and 4 contain the most under-linked URLs (110 and 109, respectively).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Exporting a List of Under-Linked URLs<\/h2>\n\n\n\n<p>Now that we&#8217;ve identified the under-linked URLs at each site level, we can export them and work out how to close the internal linking gaps at each site depth:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Export the under-linked URLs, sorted by site level and then by internal link count\nunderlinked_urls = redir_live_urls_underidx.loc[redir_live_urls_underidx.sd_int_uidx == 1]\nunderlinked_urls = underlinked_urls.sort_values(['crawl_depth', 'no_internal_links_to_url'])\nunderlinked_urls.to_csv('exports\/underlinked_urls.csv')\nunderlinked_urls<\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"442\" src=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/3_export_underlinked_urls-1024x442.png\" alt=\"\" class=\"wp-image-1887\" srcset=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/3_export_underlinked_urls-1024x442.png 1024w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/3_export_underlinked_urls-300x130.png 300w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/3_export_underlinked_urls-768x332.png 768w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/3_export_underlinked_urls-1536x663.png 1536w, https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/3_export_underlinked_urls.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Other Internal Linking Data Science Techniques<\/h2>\n\n\n\n<p>We began by briefly discussing why a site&#8217;s internal links matter, then examined how internal links are distributed across the site by site level.<\/p>\n\n\n\n<p>Finally, before exporting the under-linked URLs for recommendations, we quantified the extent of the under-linking problem both 
numerically and visually.<\/p>\n\n\n\n<p>Of course, link distribution by site level is only one aspect of internal linking that can be explored and analyzed statistically.<\/p>\n\n\n\n<p>Other aspects of internal linking that lend themselves to data science techniques include, but are not limited to:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Off-site (backlink) authority at the page level.<\/li><li>Anchor text relevance.<\/li><li>Search intent.<\/li><li>The user journey.<\/li><\/ul>\n\n\n\n<p>Looking for free SEO tools? Try our free&nbsp;<a href=\"https:\/\/seopolarity.com\/plagiarism-checker\" target=\"_blank\" rel=\"noreferrer noopener\">Plagiarism Checker<\/a>,&nbsp;<a href=\"https:\/\/seopolarity.com\/article-rewriter\" target=\"_blank\" rel=\"noreferrer noopener\">Article Rewriter<\/a>, and&nbsp;<a href=\"https:\/\/seopolarity.com\/word-counter\" target=\"_blank\" rel=\"noreferrer noopener\">Word Counter<\/a>.<\/p>\n\n\n\n<p>Learn more in our <a href=\"https:\/\/seopolarity.com\/blog\/category\/seo\/\">SEO<\/a> category and read <a href=\"https:\/\/seopolarity.com\/blog\/11-methods-for-creating-a-google-algorithm-update-resistant-seo-strategy\/2021\/\">11 Methods for Creating a Google Algorithm Update Resistant SEO Strategy<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn the data-driven approach to improving a website&#8217;s internal linking for more effective technical site 
SEO.<\/p>\n","protected":false},"author":1,"featured_media":1889,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[24],"tags":[],"class_list":["post-1876","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-seo"],"jetpack_publicize_connections":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.9 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>A Data Science Approach to Internal Link Structure Optimization | SEOPolarity<\/title>\n<meta name=\"description\" content=\"Learn the data-driven approach to improving a website&#039;s internal linking for more effective technical site SEO.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A Data Science Approach to Internal Link Structure Optimization\" \/>\n<meta property=\"og:description\" content=\"Learn the data-driven approach to improving a website&#039;s internal linking for more effective technical site SEO.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/\" \/>\n<meta property=\"og:site_name\" content=\"SEOPolarity\" \/>\n<meta property=\"article:published_time\" content=\"2021-11-23T09:04:00+00:00\" \/>\n<meta 
property=\"article:modified_time\" content=\"2021-11-25T09:39:02+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"675\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"seoadmin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"A Data Science Approach to Internal Link Structure Optimization\" \/>\n<meta name=\"twitter:description\" content=\"Learn the data-driven approach to improving a website&#039;s internal linking for more effective technical site SEO.\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"seoadmin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/\"},\"author\":{\"name\":\"seoadmin\",\"@id\":\"https:\/\/seopolarity.com\/blog\/#\/schema\/person\/95a6a9a6680ce217386574a4984fa538\"},\"headline\":\"A Data Science Approach to Internal Link Structure Optimization\",\"datePublished\":\"2021-11-23T09:04:00+00:00\",\"dateModified\":\"2021-11-25T09:39:02+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/\"},\"wordCount\":1044,\"commentCount\":2,\"publisher\":{\"@id\":\"https:\/\/seopolarity.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg\",\"articleSection\":[\"SEO\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/\",\"url\":\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/\",\"name\":\"A Data Science Approach to Internal Link Structure Optimization | 
SEOPolarity\",\"isPartOf\":{\"@id\":\"https:\/\/seopolarity.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg\",\"datePublished\":\"2021-11-23T09:04:00+00:00\",\"dateModified\":\"2021-11-25T09:39:02+00:00\",\"description\":\"Learn the data-driven approach to improving a website's internal linking for more effective technical site SEO.\",\"breadcrumb\":{\"@id\":\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#primaryimage\",\"url\":\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg\",\"contentUrl\":\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg\",\"width\":1200,\"height\":675,\"caption\":\"A Data Science Approach to Internal Link Structure 
Optimization\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/seopolarity.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"A Data Science Approach to Internal Link Structure Optimization\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/seopolarity.com\/blog\/#website\",\"url\":\"https:\/\/seopolarity.com\/blog\/\",\"name\":\"SEO Polarity Blog\",\"description\":\"Free Online SEO Tools &amp; Blogs\",\"publisher\":{\"@id\":\"https:\/\/seopolarity.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/seopolarity.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/seopolarity.com\/blog\/#organization\",\"name\":\"SEO Polarity\",\"url\":\"https:\/\/seopolarity.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/seopolarity.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/10\/seopolarity-logo-header.png\",\"contentUrl\":\"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/10\/seopolarity-logo-header.png\",\"width\":193,\"height\":30,\"caption\":\"SEO 
Polarity\"},\"image\":{\"@id\":\"https:\/\/seopolarity.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/seopolarity.com\/blog\/#\/schema\/person\/95a6a9a6680ce217386574a4984fa538\",\"name\":\"seoadmin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/seopolarity.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/779772115dd1b79d5fbb91a1e2d3acb5318c780d6986d556300e1902f867ee5b?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/779772115dd1b79d5fbb91a1e2d3acb5318c780d6986d556300e1902f867ee5b?s=96&d=mm&r=g\",\"caption\":\"seoadmin\"},\"sameAs\":[\"https:\/\/seopolarity.com\/blog\"],\"url\":\"https:\/\/seopolarity.com\/blog\/author\/seoadmin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"A Data Science Approach to Internal Link Structure Optimization | SEOPolarity","description":"Learn the data-driven approach to improving a website's internal linking for more effective technical site SEO.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/","og_locale":"en_US","og_type":"article","og_title":"A Data Science Approach to Internal Link Structure Optimization","og_description":"Learn the data-driven approach to improving a website's internal linking for more effective technical site 
SEO.","og_url":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/","og_site_name":"SEOPolarity","article_published_time":"2021-11-23T09:04:00+00:00","article_modified_time":"2021-11-25T09:39:02+00:00","og_image":[{"width":1200,"height":675,"url":"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg","type":"image\/jpeg"}],"author":"seoadmin","twitter_card":"summary_large_image","twitter_title":"A Data Science Approach to Internal Link Structure Optimization","twitter_description":"Learn the data-driven approach to improving a website's internal linking for more effective technical site SEO.","twitter_image":"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg","twitter_misc":{"Written by":"seoadmin","Est. reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#article","isPartOf":{"@id":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/"},"author":{"name":"seoadmin","@id":"https:\/\/seopolarity.com\/blog\/#\/schema\/person\/95a6a9a6680ce217386574a4984fa538"},"headline":"A Data Science Approach to Internal Link Structure 
Optimization","datePublished":"2021-11-23T09:04:00+00:00","dateModified":"2021-11-25T09:39:02+00:00","mainEntityOfPage":{"@id":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/"},"wordCount":1044,"commentCount":2,"publisher":{"@id":"https:\/\/seopolarity.com\/blog\/#organization"},"image":{"@id":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#primaryimage"},"thumbnailUrl":"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg","articleSection":["SEO"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/","url":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/","name":"A Data Science Approach to Internal Link Structure Optimization | SEOPolarity","isPartOf":{"@id":"https:\/\/seopolarity.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#primaryimage"},"image":{"@id":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#primaryimage"},"thumbnailUrl":"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg","datePublished":"2021-11-23T09:04:00+00:00","dateModified":"2021-11-25T09:39:02+00:00","description":"Learn the data-driven approach to improving a website's internal linking for more effective technical site 
SEO.","breadcrumb":{"@id":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#primaryimage","url":"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg","contentUrl":"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg","width":1200,"height":675,"caption":"A Data Science Approach to Internal Link Structure Optimization"},{"@type":"BreadcrumbList","@id":"https:\/\/seopolarity.com\/blog\/a-data-science-approach-to-internal-link-structure-optimization\/2021\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/seopolarity.com\/blog\/"},{"@type":"ListItem","position":2,"name":"A Data Science Approach to Internal Link Structure Optimization"}]},{"@type":"WebSite","@id":"https:\/\/seopolarity.com\/blog\/#website","url":"https:\/\/seopolarity.com\/blog\/","name":"SEO Polarity Blog","description":"Free Online SEO Tools &amp; Blogs","publisher":{"@id":"https:\/\/seopolarity.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/seopolarity.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/seopolarity.com\/blog\/#organization","name":"SEO 
Polarity","url":"https:\/\/seopolarity.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/seopolarity.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/10\/seopolarity-logo-header.png","contentUrl":"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/10\/seopolarity-logo-header.png","width":193,"height":30,"caption":"SEO Polarity"},"image":{"@id":"https:\/\/seopolarity.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/seopolarity.com\/blog\/#\/schema\/person\/95a6a9a6680ce217386574a4984fa538","name":"seoadmin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/seopolarity.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/779772115dd1b79d5fbb91a1e2d3acb5318c780d6986d556300e1902f867ee5b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/779772115dd1b79d5fbb91a1e2d3acb5318c780d6986d556300e1902f867ee5b?s=96&d=mm&r=g","caption":"seoadmin"},"sameAs":["https:\/\/seopolarity.com\/blog"],"url":"https:\/\/seopolarity.com\/blog\/author\/seoadmin\/"}]}},"jetpack_featured_media_url":"https:\/\/seopolarity.com\/blog\/wp-content\/uploads\/2021\/11\/a-data-science-approach-to-internal-link-structure-optimization.jpg","jetpack_sharing_enabled":true,"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/seopolarity.com\/blog\/wp-json\/wp\/v2\/posts\/1876","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/seopolarity.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/seopolarity.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/seopolarity.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/seopolarity.com\/blog\/wp-json\/wp\/v2\/comments?post=1876"}],"version-history":[{"count":4,"href":"https:\/\/seopolarity.com\/blog\/wp-json\/wp\/v2\/posts\/1876\/revisions"}],"predecessor-version":[{"id":1888
,"href":"https:\/\/seopolarity.com\/blog\/wp-json\/wp\/v2\/posts\/1876\/revisions\/1888"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/seopolarity.com\/blog\/wp-json\/wp\/v2\/media\/1889"}],"wp:attachment":[{"href":"https:\/\/seopolarity.com\/blog\/wp-json\/wp\/v2\/media?parent=1876"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/seopolarity.com\/blog\/wp-json\/wp\/v2\/categories?post=1876"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/seopolarity.com\/blog\/wp-json\/wp\/v2\/tags?post=1876"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}