Screaming Frog: Business Owners Guide To SEO

This is the first episode in the new series of the Click Intelligence Notebook, focusing on the “Business Owners Guide To SEO”. In this episode, James Owen talks about Screaming Frog, the desktop-based web crawler.

Free download of Screaming Frog

Transcription

00:00 Hi everyone, and welcome to the new series of the Click Intelligence Notebook. In the new series, we’re focusing on the business owner’s guide to SEO.

In this video, I’m going to be talking about Screaming Frog.

Screaming Frog is a desktop-based web crawler that allows you to crawl, explore, and investigate website URL data. In this video, I’m going to be talking to you about the basic features of Screaming Frog, along with some advanced features. So, let’s jump right into it.

Screaming Frog has two license levels. We’ve got a free version and the paid version. The free version allows you to crawl up to five-hundred pages, but doesn’t let you get involved in any interesting configuration in their settings. So, it’s great to get started with if you’re just starting out your journey in SEO or online marketing, but you’re going to need a full license, the paid version, if you want to get the advanced features.

In this video, I’m going to be looking at the paid version. For this example, I’m using pcworld.co.uk. To start crawling, you cut and paste the URL you are about to crawl into the search box and click start. After a few seconds, you will start to see that some of the URLs have been crawled, and the data has started to be collected. This is kind of the main dashboard, and kind of the background of what you are going to see throughout the current process.

So, let’s talk through this. So, at the top navigation, we’ve got address, content, status code, status type, title one, title one length, title pixel width, meta description, meta description length, and so on and so forth. We’ve got keywords also, we’ve talked about H1s, H2s, and also meta robots which, again, are very important, and canonical information. And down to here, any other redirect information, redirect URL.

On the right-hand side we have totals of the data types we are collecting, and just below, if you highlight a URL, you are then presented with the information here in terms of 301s, and any other information that Screaming Frog has been able to pick up. So, we’ll stop it right there, as we’ve got some data to play with here.

So, at the moment, we’ve got this as internal only, so this is only going to show us URLs that are internal URLs of that domain. So, only internal URLs that start with pcworld.co.uk/. If we want to see which crawled URLs are pointing to other websites, as you can see here, PC World are using pcworld.cdns, and .dixons.com, which is going to be an image, and so good use of CDNs there for page load and speed.
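
The internal and external split can be sketched in a few lines of Python. This is only an illustration of the idea, not how Screaming Frog itself decides: the function name `is_internal`, the sample URLs, and the choice to treat subdomains such as `www` as internal are all assumptions.

```python
from urllib.parse import urlparse


def is_internal(url, site_host="pcworld.co.uk"):
    """Return True when the URL's host is site_host or one of its subdomains.

    Treating subdomains as internal is an assumption made for this sketch.
    """
    host = (urlparse(url).hostname or "").lower()
    return host == site_host or host.endswith("." + site_host)


# Hypothetical sample URLs, used only for illustration.
urls = [
    "https://www.pcworld.co.uk/gbuk/index.html",
    "https://www.currys.co.uk/",
]

internal = [u for u in urls if is_internal(u)]
external = [u for u in urls if not is_internal(u)]
```

Here `internal` holds the pcworld.co.uk address and `external` holds the link that points to another domain.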

If you go down here you’ve got Currys, and you get the idea; you can see exactly where they’ve got links pointing from their domain to other domains. Perfect.

We’ve got protocol, response codes, URL, page titles if you want to see that, meta descriptions, meta keywords, H1s, H2s, images, hreflang, AJAX, custom, analytics and Search Console, if you wanted that information.

You can actually also export this information, literally just this table you can see on screen at any time by clicking on the export button here.

So, let’s go down and start looking at a bit more of the advanced features. So at the moment I’ve got this as a straight crawl. I’m not asking Screaming Frog to do anything special. I’m just literally doing a very basic crawl. So what happens if, let’s say, we don’t want to view images, we don’t want to view any of the files, but just the raw HTML, kind of the landing pages that you would view on a website?

So, to do that, we’ve got to make sure we have a sitemap. So, PC World kindly enough have a sitemap, and so we’re going to clear this here, and what we do here, we go to mode, list, and then we have an option here, so from file, enter manually, paste, download sitemap, download sitemap index.

PC World don’t actually have a sitemap index; they just literally have a sitemap URL, which is this. So, in this scenario, let’s say you did have lots of individual sitemaps for your categories, you would then be able to literally select a category sitemap and put it in here to crawl a certain category you wanted to crawl. PC World don’t have that feature for a sitemap, so sitemaps aren’t split in a granular fashion, so we would literally have a sitemap.xml URL, and we would click on this, and it literally brings me twelve-thousand nine-hundred and thirty-four URLs that are in that sitemap. This is literally just the sitemap URLs; this is not the images or any other files, comment file, URLs.
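
If you want to see what “the URLs that are in that sitemap” means outside the tool, here is a minimal Python sketch that reads the `<loc>` entries from a standard sitemap.xml. The function name `sitemap_urls` and the example.com sample are assumptions for illustration; it is not PC World’s sitemap and not Screaming Frog’s own code.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value listed under <url> in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical two-URL sitemap, used only for illustration.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/laptops</loc></url>
</urlset>"""

found = sitemap_urls(sample)
```

The resulting list is the kind of input you would paste into list mode.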

So, we’re going to go like that again. If I was PC World I could easily switch on my sitemaps. See some 301s there, some 302s and some 404s in there. As everyone knows, you will want to make sure you only have 200s, status code 200s, in there where you can.
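
Once you have exported the crawl, picking out the sitemap URLs that are not 200s can be done with a short script. This is a rough sketch only: the column headers `Address` and `Status Code` follow the names mentioned in the video, but their exact spelling in an export file is an assumption, and the sample rows are made up.

```python
import csv
import io


def non_200_rows(csv_text):
    """Return (address, status) pairs for every row whose status is not 200."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["Address"], int(row["Status Code"]))
        for row in reader
        if row["Status Code"] != "200"
    ]


# Hypothetical export, used only for illustration.
export = """Address,Status Code
https://www.example.com/,200
https://www.example.com/old,301
https://www.example.com/gone,404
"""

problems = non_200_rows(export)
```

Anything left in `problems` is a URL you would want to fix or take out of the sitemap.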

So this is a good example of kind of how you would, y’know, give structured information for Screaming Frog to crawl. Another example would be where, let’s say, you’ve got, y’know, you’ve got your sitemap, like so, let’s move it over, in that Excel format and you only wanted to, let’s say, crawl, let’s say, top hundred or something. So, you’d select the top hundred. Well, there’s probably more than a hundred now. Okay, it doesn’t really matter. A hundred odd. Let’s close that down.

I’m going to stop this again, clear. I’m going to upload. We’re going to go paste this time. I’ve just pasted three-hundred or four-hundred URLs from Excel, and, so you can see, that might be useful, let’s say you’ve got all your download crawl and you start splitting out all your URLs into sitemaps yourself. You do that and you kind of click okay. Let it crawl the three-hundred. It might take a bit of time. I’ll show you in a minute, in a little while, how to actually speed up Screaming Frog so you can do this process. So, let’s say that’s fine, we can stop there.

Let’s say you want to create a sitemap out of that data now. You click on sitemaps, create XML sitemap. Yeah, that’s fine. Settings is cool. Desktop, sitemap. Yeah, let’s call it one, for instance, or you could do it as product number pages. Whatever you want to call it. And then you go Save. And that then goes on the desktop in a nice, clean XML format, in an XML file for you. So, that’s really useful.
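
For a sense of what ends up in that XML file, here is a minimal Python sketch that builds a sitemap from a list of URLs. It only writes the `<loc>` entries; the function name `build_sitemap` and the sample URLs are assumptions, and the file Screaming Frog produces may include more fields depending on your settings.

```python
import xml.etree.ElementTree as ET


def build_sitemap(urls):
    """Return a sitemaps.org-style <urlset> document as a string."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for address in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = address
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs, used only for illustration.
xml_out = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/laptops",
])
```

Writing `xml_out` to a file gives you something you could then upload as a sitemap.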

Let’s just jump into some additional features here. So let’s go over to clear, mode, spider. Let’s just grab PC World again, because I want to start looking at, potentially, okay, we can do a normal crawl, but actually we can introduce some additional features. So, let that, get some data there, so let’s do configuration.

So, this is a really useful feature. Yes, you can tell Screaming Frog to only crawl certain URLs if you only input that information in Screaming Frog, but then you can actually say to Screaming Frog “I don’t want to view images. Don’t care about CSS. Don’t want JavaScript. Don’t really care about external links, either.” And you can start to, let’s say, not crawl canonicals. There we go, okay, again, we will resume, and you would actually get a far smaller amount of data and you can refine everything down to what you actually care about.

So, let’s start going over it again. Let’s rethink the configuration. Let’s just click on the other stuff again, because, we don’t really care, that’s fine. We can talk about limits as well, so Limit Search Total, Limit Search Depth, so if you only want to go down three or four categories on the whole website, you can kind of configure it to do what you need to do. Max Length of the Crawl, Limit Max Folder Depth, Limit Number of Query Strings, and so on and so forth. Rendering, so you can change the rendering there. And then a lot more in advanced, so Respect Next/Prev, so, y’know, if you have got a really recent website, y’know, probably your standard features will do you really well. If you have got more than ten-thousand URLs, and you want to start getting really, kind of, advanced with your crawling, then you would kind of delve into this and kind of make some configuration changes.

So, if I was you, I normally hang around in the basics. Play around a bit with my limits. A couple of clients I’d probably work on advanced, but normally it’s just basic limits will get the job done for you.

I think one recent example of when I used the basic configuration is where a client came along and said “Look, we’ve got a new website. It’s on a staging server; Google can’t view it, but can you start with a migration job?” That’s, y’know, completely fine.

So, at that point, what you want to be looking at is, y’know, if it’s being blocked from Google, and you put it in Screaming Frog, you’re basically going to get nothing, it will be zero, because Screaming Frog won’t crawl anything back that Google can’t crawl. However, what you can do in here is go to the ‘follow internal nofollow’ and ‘follow external nofollow’ options and basically click on them to say that you actually want to follow them. So then, at that point, when you go and click on start, Screaming Frog will ignore all the nofollows and basically follow all the URLs. So that’s a really handy tip. So, when you have a staging site, and you think “Well, I can’t crawl this because Google can’t crawl it”, well actually you can configure it so Screaming Frog can crawl it, and you can get all the data that you want and you can then export it, and from there you can start working on your kind of 301 migration document. So, that’s really useful. So, we’ve talked about kind of the basic, limits and advanced preferences within the site configuration.
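
To make the nofollow idea concrete, here is a small Python sketch of a crawler deciding which links to follow. It is an illustration under assumptions, not Screaming Frog’s implementation: the class `LinkCollector`, the function `links_to_crawl` and its `follow_nofollow` switch are hypothetical names that mirror the tick boxes described above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, is_nofollow) pairs from the <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href:
            rel_tokens = (attr.get("rel") or "").lower().split()
            self.links.append((href, "nofollow" in rel_tokens))


def links_to_crawl(html, follow_nofollow=False):
    """Return the hrefs a crawler would follow under the chosen setting."""
    parser = LinkCollector()
    parser.feed(html)
    return [
        href for href, nofollow in parser.links
        if follow_nofollow or not nofollow
    ]


# Hypothetical page fragment, used only for illustration.
page = '<a href="/a">A</a> <a href="/b" rel="nofollow">B</a>'

default_links = links_to_crawl(page)
all_links = links_to_crawl(page, follow_nofollow=True)
```

With the switch off, only `/a` is followed; with it on, both links are, which is the behaviour you want on a site where everything is nofollowed.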

Let’s jump into the speed, which is quite important. So, if you’ve got five-hundred URLs you are probably going to set it off, go and get a cup of tea, come back, it’s done. If you’ve got more, let’s say between ten-thousand and a million URLs, it’s going to take some time. You have probably already, y’know, split out your sitemaps and then you work by crawling each sitemap as you go along. But if you want to speed things up, you can increase the number of threads, so you can go to maybe fifty, and it will be a lot, lot quicker, and you’ll see down here… well, there you go, it speeds up nicely for you.

Again, of course, y’know, on an average website it should be fine. If there are any issues with the URLs or there are lots of redirects, there are lots of canonicals, it’s going to slow down the situation. But, there are things you can do with speed that I’ve said there. So, that’s useful.

So, you’ve got the speed, let’s go into HTTP header. So, let’s say you want to change the… I want to be Googlebot; I want to be BingBot, I want to be Slurp. So, you have the option to basically change your user agent, so you can check how your website responds and how it crawls for different user agents, so that’s useful.
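
Changing the user agent just means sending a different `User-Agent` header with each request. Here is a minimal Python sketch of that idea, assuming the commonly published Googlebot and Bingbot strings; the helper name `request_as` is hypothetical, and nothing is actually sent over the network here.

```python
from urllib.request import Request

# Commonly published crawler user-agent strings (assumed current enough
# for illustration; check the search engines' own documentation).
USER_AGENTS = {
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "bingbot": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
}


def request_as(url, agent):
    """Build (but do not send) a request that identifies as the given agent."""
    return Request(url, headers={"User-Agent": USER_AGENTS[agent]})


# example.com is a placeholder address, used only for illustration.
req = request_as("https://www.example.com/", "googlebot")
```

Passing `req` to `urllib.request.urlopen` would fetch the page with that identity, so you could compare what the site returns for each agent.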

And then the last thing, I think, we want to discuss is probably the proxies, so if you feel that you’re kind of hitting a website quite often, you can always put the address and that kind of IP address in there, as well.

Other things to kind of note on this is the fact that it’s pretty much unlimited up to a point. If you’re on the website and it has kind of ten million URLs, you’re going to split down the sitemaps within this to be able for Screaming Frog to crawl. It will have issues with memory, but there is a really good FAQ section on the Screaming Frog website that tells you how to increase your memory, but you will have to save and then export and then restart from where you started before, so basically your PC doesn’t freeze.
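
Splitting a very large URL list into smaller batches, so each crawl stays within memory, is simple to sketch in Python. The function name `chunk_urls` and the batch size are assumptions; pick a size that suits your own machine.

```python
def chunk_urls(urls, size):
    """Split a list of URLs into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


# Hypothetical URL list, used only for illustration.
all_urls = [f"https://www.example.com/page-{n}" for n in range(5)]
batches = chunk_urls(all_urls, 2)
```

Each batch can then be saved as its own list or sitemap and crawled one at a time.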

So, I think that is kind of the advanced features and basic features of Screaming Frog. It’s a great kind of crawling piece of software. I use it on a daily basis. It’s great for finding kind of major problems, and, yeah, it saves me a huge amount of time.

So, thank you for listening, and we’ll catch you on the next video. Thank you. 15:13

James Owen

James has been involved in SEO and digital marketing projects since 2007. James has led many SEO projects for well-known brands in Travel, Gaming and Retail such as Jackpotjoy, Marriott, InterContinental Hotels, Hotels.com, Expedia, Betway, Gumtree, 888, AX Paris, Ebuyer, eBay, HotelsCombined, Smyths Toys, Lovehoney and Pearson, to name a few. James has also been a speaker at SEO and digital marketing conferences and events such as Brighton SEO.
