There is great value in understanding the hyperlink structure of a website, as I pointed out in my PhD thesis. When I was working on it, extracting the hyperlink structure of a website and visualizing it was a tedious task. Nowadays, however, there are several free tools that make this much easier.
In this post I will show how to visualize the link structure of a website using three tools:
- Screaming Frog SEO Spider to crawl the target website
- Google Refine to clean up data and obtain a file that can be visualized
- Gephi to manipulate and visualize the graph of the website
Screaming Frog SEO Spider is extremely easy to use. Just type the address of the website you want to crawl into the search box and press Start.
Once the software has crawled the target site, export the inlinks of the site using “Advanced Export” -> “Success (2xx) in Links”.
The CSV provided by Screaming Frog SEO Spider contains far more information than we need, which is a CSV file with just two columns: “source page”;“target page”. Google Refine can help us with this clean-up, as can be seen in the following video.
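If you prefer a script to Google Refine, the same reduction can be sketched in a few lines of Python. This is only a rough sketch under assumptions: the column names "Source" and "Destination" and the sample rows are mine, so check them against the header of your own export. The output uses the headers "Source" and "Target", which is how I would label an edge list for Gephi.

```python
import csv
import io

def to_edge_list(export_text, source_col="Source", target_col="Destination"):
    """Reduce a link export to a two-column Source;Target edge list.

    source_col and target_col are assumed column names; adjust them
    to match the header row of the actual export.
    """
    reader = csv.DictReader(io.StringIO(export_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(["Source", "Target"])
    seen = set()
    for row in reader:
        edge = (row[source_col].strip(), row[target_col].strip())
        # Skip rows with an empty cell and links we have already written
        if all(edge) and edge not in seen:
            seen.add(edge)
            writer.writerow(edge)
    return out.getvalue()

# Made-up sample data standing in for the crawler export
sample = (
    "Type,Source,Destination,Anchor\n"
    "HREF,http://example.com/,http://example.com/a,A\n"
    "HREF,http://example.com/,http://example.com/a,A again\n"
    "HREF,http://example.com/a,http://example.com/b,B\n"
)
edge_list = to_edge_list(sample)
print(edge_list)
```

Dropping duplicate links keeps the graph simple; if you want edge weights instead, count the repeats rather than discarding them.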
Gephi will be able to interpret our filtered CSV file and generate a graph out of it. There are a bunch of tutorials on the software’s website that you can read to learn how to lay out and explore your graph. In my case, I’ve calculated two statistics: “Modularity”, which helps to color the nodes according to the cluster they belong to, and “Avg. Path Length”, which also calculates the betweenness centrality, a measure that helps to discover the most relevant nodes in our graph according to their connectivity.
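To give an idea of what betweenness centrality measures, here is a minimal pure-Python sketch for a directed, unweighted link graph, following the well-known algorithm by Brandes. It is an illustration only: the page names are made up, the scores are not normalised, and the numbers Gephi reports may be scaled differently.

```python
from collections import defaultdict, deque

def betweenness(edges):
    """Unnormalised betweenness centrality of a directed, unweighted graph."""
    adj = defaultdict(set)
    nodes = set()
    for s, t in edges:
        adj[s].add(t)
        nodes.update((s, t))
    score = dict.fromkeys(nodes, 0.0)
    for src in nodes:
        # Breadth-first search from src, counting shortest paths (sigma)
        order = []
        preds = defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)
        sigma[src] = 1
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest pages, accumulating each page's
        # share of the shortest paths that pass through it
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != src:
                score[w] += delta[w]
    return score

# Hypothetical site: "/about" sits on the only routes to "/team"
links = [("/", "/about"), ("/", "/blog"), ("/blog", "/about"), ("/about", "/team")]
scores = betweenness(links)
print(scores["/about"])  # 2.0
```

A page scores high when many shortest click paths between other pages run through it, which is why it is a useful signal for spotting the hubs of a site.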
UPDATE: In order to use the SVGPan library with our Gephi-generated SVG file, the tag “<script xlink:href="SVGPan.js"/>” must be added to the SVG file. Moreover, for SVGPan to work, a <g> tag with the id “viewport” (“<g id="viewport">”) must encapsulate all the <g> objects existing in the file. See the source code of the file for more details.
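These two changes can also be scripted. The following is a minimal sketch using Python's standard XML library, assuming that the groups to wrap are direct children of the root <svg> element and that SVGPan.js sits next to the SVG file. The function name and the sample SVG are made up for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
# Keep the usual prefixes when the tree is written back out
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)

def add_svgpan(svg_text, script_href="SVGPan.js"):
    """Add the SVGPan script tag and wrap top-level <g> elements in a viewport group."""
    root = ET.fromstring(svg_text)
    # Assumption: the groups to wrap are direct children of <svg>
    groups = [el for el in list(root) if el.tag == f"{{{SVG_NS}}}g"]
    viewport = ET.Element(f"{{{SVG_NS}}}g", {"id": "viewport"})
    for group in groups:
        root.remove(group)
        viewport.append(group)
    script = ET.Element(f"{{{SVG_NS}}}script", {f"{{{XLINK_NS}}}href": script_href})
    root.insert(0, script)
    root.append(viewport)
    return ET.tostring(root, encoding="unicode")

# Tiny stand-in for an exported SVG file
sample_svg = '<svg xmlns="http://www.w3.org/2000/svg"><g id="edges"/><g id="nodes"/></svg>'
result = add_svgpan(sample_svg)
print(result)
```

For a real file, read its contents into a string, pass it to the function, and write the result to a new file so the original export stays untouched.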