Showing posts with label SEO. Show all posts
Showing posts with label SEO. Show all posts

Monday, January 05, 2015

Find Orphan pages on website

You have a website and want to be sure that there are no Orphan pages? Not sure how to develop it? If so - you found a right place to go :), let's talk about most important steps.
I assume you have a list of all pages you want to verify (otherwise - it will be your first task)

Here is a logic/snippets
  1. We need functionality that can extract HTML from a web page. Later we will scan it and get internal links.
    private String getPageContent(String pageurl) throws Exception {
       StringBuffer buf = new StringBuffer();
       URL url = new URL(pageurl);
       InputStream is = url.openConnection().getInputStream();
       BufferedReader reader = new BufferedReader( new InputStreamReader( is )  );
       String line = null;
       while( ( line = reader.readLine() ) != null )  {
          buf.append(line);
       }
       reader.close();  
       return buf.toString();
    }
  2. Make a logic that can deal with DOM. I use jsoup to manipulate with HTML and I really recommend it (easy and fast). The method below select all links that begin with baseurl (it's domain of your website), in that way we can cut all external links and get only internal links.
    private List<string> getAllInernalLinks(String html, String baseurl) throws Exception {
       List<string> res = new ArrayList<string>();
       String select = "a[href^="+baseurl+"]";
       org.jsoup.nodes.Document dom = Jsoup.parse(html);
       Elements links = dom.select(select);
    
       for (Element link : links) {
          String src = link.attr("href");
          res.add(src);
       }
       return res;
    }
  3. Now we must build a List with all internal links from all pages on your website.
    String List<string> alllinks = getAllInernalLinks(html_from_all_pages, baseurl);
  4. We need to make sure that pageurl can be found in alllinks more then in pagelinks (to avoid case when page has link to itself).
    private boolean isOrphan(List<string> pagelinks, List<string> alllinks, String url) throws Exception {
       if (Collections.frequency(alllinks, url) > Collections.frequency(pagelinks, url)) {
          return false;
       }
       return true;
    }

Tuesday, November 01, 2011

Solution for Lotus Domino to the trailing slash problem

2 months ago I've posted article about solution which can solve our problem in Domino with last trailing slash in URL. As I mentioned in my previous post, Domino does not care about url with/without last trailing slash, both way would work for Domino. Well, some of you maybe even find this feature useful because it gives less problems, however what if we talk about SEO (Search Engine Optimisation)?

There is actually article from google about to slash or not to slash, where they explain that they handle similar URL with and without last trailing slash as 2 different URL, they also propose ways how to solve this. If you ask what is bad here, answer is quite simple: just image that each URL has it's power/score for google. In case if we have only 1 possible URL for our page all score go to it, otherwise we will split them (high score will go to URL which is more often used by people in web).

Now about solution we implemented on our website. We used DSAPI to solve it as it does not do affect speed so much as different solutions. If you want to read more deeply about technical staff, you may want to read this article: solution to the trailing slash problem.

Yes it works like it should, no impact to page load's speed which was "very-very" important to us. And I heared google start to love us a bit more after that.

Related topics
DSAPI for Domino
Rewriting URL in Domino using DSAPI
Replacement for DSAPI in Java/XPages

Tuesday, September 20, 2011

Page-layout rel = «next | prev»

Along with the attribute rel = «canonical» to specify a 'search robot' to duplicate content, it is now possible to use the HTML reference value rel = "next" and rel = "prev" to indicate the position of the current page with neighbors in the navigation box. F.x. you have article splitted on 10 pages and you use next/prev buttons in order to move from one page to another. If you want to be sure that most of 'score' from 'google spider' will go to main page red="next"/"prev" is the very good options.

Here is more details about our example


Now about rel="next" and rel="prev"
www.example.com/article?id=123page=1
<link rel="next" href="http://www.example.com/article?id=123&page=2" />
www.example.com/article?id=123page=2
<link rel="next" href="http://www.example.com/article?id=123&page=3" />
<link rel="prev" href="http://www.example.com/article?id=123&page=1" />
www.example.com/article?id=123page=3
<link rel="next" href="http://www.example.com/article?id=123&page=4" />
<link rel="prev" href="http://www.example.com/article?id=123&page=2" />www.example.com/article?id=123page=10
<link rel="prev" href="http://www.example.com/article?id=123&page=9" />


Few notes:

  • First page should have only rel=«next». 
  • All pages between 1-st and last usually should have both only rel=«next» and rel="prev".
  • Last page should have olny rel=«prev». 
  • Allowed to use the value rel=«previous» as an alternative to rel=”prev”. Not sure if it has sence :) 
  • In case if you fail with that Google will continue to index your content with own heuristic logic.

Tuesday, August 02, 2011

Domino resolve URL with and without trailing as same URL, wrong?

We are faced with a problem (for us) with Domino. The problem is that Domino processes those 2 URL as similar same URL:

http://host/0/unid
http://host/0/unid/

the difference in 'trailing' (for those who thinks in same way as Domino :]). Well, it would be not a problem at all if you do internal staff, even very useful, problems start when we talk about SEO.

It actually affects our websites, as f.x. google 'eats' 2 URL (I mentioned above) as 2 different URL so their 'value' will be split on 2 part + 2 URL with same content also very bad as it is duplicate.

We should be able to either make 301's from the NOT CORRECT to the CORRECT URLs, or respond with a 404 (not found). Both options would be good.

Does anybody know ways how to do it?

[update from 28 October 2011] we've found solution with DSAPI filter, here you can read Solution to the trailing slash problem in Lotus Domino

Thursday, January 06, 2011

Improving SEO for blogspot bloggers. Lesson #1.

Everybody wants to have at least not worst SEO on own blogs. In case if you have you own blog on Domino or another platform, use your hands and do updates but if you are one who use blogspot, here it couple tips I would like to suggest to do. It is quite simple and fast.

1. Title tag. Check it on main page, does it says what you wanted?
looks on different pages as well. If you want to update it go to Design of your blog and open it as HTML, find area with title and replace it on such one

<b:if cond='data:blog.pageType == &quot;item&quot;'>
<title><data:blog.pageName/> | <data:blog.title/></title>
<b:else/>
<title><data:blog.pageTitle/> | Your additional keywords</title>
</b:if>

Actually you are free to put there whatever you want. Mostly all says that ideal length for title tag is 69-70 chars. So remember that and try to make it near to that.

2. By default blogspot does not include meta tags such as description and keywords, so we have to add that manually (at least my blog did not have these tags). Good length for this tags are: meta description ~156, meta keywords is about 180 chars. Here is snapshot of what you need to add to your template.

<b:if cond='data:blog.url == data:blog.homepageUrl'>
<meta content='bla bla bla' name='description'/>
<meta content='lotus, motus, fotus, focus, etc' name='keywords'/>

if you read logic carefully you will see that it says: add meta tags only for main page, that's because otherwise we will get these tags on all pages and that's very bad, google does not like same meta description on pages. that's duplicate and you will decrease PR. Later I will add some information how we can have unique meta tags on pages.

3. sitemap. I believe it is not necessary as we are already you google so it does this automatically, but anywhere, better to do :), I hope you have already enabled RSS on your blog. So just open Google Web Master add you site there and find menu Sitemap and try to add next /atom.xml?redirect=false&start-index=1&max-results=300you can update max-results as u wish. If everything went fine you should see Status 'OK' and number of of URL should be same as you have in you atom.xml.

4. that's is not linked to blogspot problem, but just remember that:
- Images should have title and alt.
- Tag <strong> is better that <b>, but as I see it requires go fix it into HTML of your entry.

Summary
These tips are legal and easy to implement, but remember, unique and interesting content is the key.