By conshelfwebguy on Monday, 02 February 2015
Posted in Technical Issues
Hello,

This is getting really frustrating. The duplicate content issues within EasyBlog are making me want to drop this platform altogether. According to the scan I ran on Siteliner, 54% of my content is flagged as duplicate: http://www.siteliner.com/www.oceannews.com?siteliner=site-duplicate&siteliner-sort=match_words&siteliner-from=1&siteliner-message=frequency

When you do a Google search for site:http://www.oceannews.com you can clearly see Google is indexing these pages as well.

This is a huge problem that I need resolved ASAP. Please help.

Thank you,
John
Hello,

Hm, I'm not really sure how to read the listings at http://www.siteliner.com/www.oceannews.com?siteliner=site-duplicate&siteliner-sort=match_words&siteliner-from=1&siteliner-message=frequency . Would you mind providing some of the URLs that are marked as duplicate?

Also, which sitemap extension are you using, or how are the sitemaps being generated?
·
Tuesday, 03 February 2015 01:56
Hello,

We are using JSitemap Pro. Here's a link to the XML version: https://oceannews.com/sitemap/xml?lang=en

To view the duplicate content for the URLs in that list, you should be able to click the links and it will show you what is matched.

If you search site:http://www.oceannews.com you can see this link still showing up: https://www.oceannews.com/easyblog

and a bunch of pagination pages showing up in the index:
https://www.oceannews.com/defense/Page-54
https://www.oceannews.com/defense/Page-35
https://www.oceannews.com/defense/Page-9
https://www.oceannews.com/defense/Page-59
https://www.oceannews.com/defense/Page-44
and it goes on and on.

Thank you for taking a look into this. Very much appreciated.

John
·
Tuesday, 03 February 2015 02:07
Hello John,

I have been checking your site, and it seems most of these URLs are actually being generated by SH404. For instance, there is no "easyblog" menu item on the site, but when you access https://www.oceannews.com/easyblog , SH404 seems to think you are trying to access EasyBlog.

Can you please try to purge the cache in SH404?
·
Tuesday, 03 February 2015 15:39
I deleted all of the easyblog/xxxxx/xxxx URLs. I didn't want to purge all of the URLs because of the number of them that are indexed.

The biggest issue I see is Google indexing ALL of the pagination pages. It should only be indexing the actual articles and menu items.

The blogger listing URLs are another issue. They seem to get indexed and flagged as duplicate: https://www.oceannews.com/news/blogger/listings/ocean-news-and-technology
·
Tuesday, 03 February 2015 22:11
Hello,

This is a different issue, but a closely related one. The print pages should not be picked up by the index, but they are. Every print page should carry this tag: <META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">. How can this be achieved?

Thank you,
John
·
Tuesday, 03 February 2015 23:17
Hello John,

By default, EasyBlog already adds this meta tag to print pages, as you can see here: http://screen.stackideas.com/2015-02-04_1149.png .
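
If you want to double-check a print page yourself rather than rely on the screenshot, something along these lines could work. This is only a rough sketch in Python using the standard library. The helper names (`RobotsMetaParser`, `robots_directives`, `is_noindex`) are made up for illustration and are not part of EasyBlog. It assumes you have already saved or fetched the page's HTML as a string.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us,
        # so <META NAME="ROBOTS" ...> arrives here as tag "meta".
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives declared in the HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def is_noindex(html):
    """True if the page asks search engines not to index it."""
    return "noindex" in robots_directives(html)


if __name__ == "__main__":
    sample = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"></head></html>'
    print(sorted(robots_directives(sample)))
```

Feeding it the HTML source of a print page tells you whether the tag is actually present in what gets served. Keep in mind that pages already in the index would likely only drop out after they have been crawled again.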
·
Wednesday, 04 February 2015 11:54
Hello John,

I think the biggest issue you face right now is the URLs that are cached in SH404. Most of the duplicate links you reported seem to be older URLs that were cached by SH404.

As for the pagination, I have applied some tweaks on your site (which we will be adding internally as well) so that any page other than the first page gets a noindex added.
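
To illustrate the idea behind the pagination tweak, here is a rough sketch in Python. It is not the actual change made on the site, which lives in EasyBlog's own code and is not shown in this thread. It assumes paginated URLs end in a `Page-N` segment like the ones listed earlier. The function names are hypothetical, and the `noindex, follow` value is an assumption borrowed from the print-page tag mentioned above.

```python
import re
from urllib.parse import urlparse

# Assumption: paginated listing URLs end in a "Page-N" segment,
# e.g. https://www.oceannews.com/defense/Page-54
PAGE_SEGMENT = re.compile(r"^Page-(\d+)$", re.IGNORECASE)


def page_number(url):
    """Return the page number encoded in the URL, or 1 if there is none."""
    path = urlparse(url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    match = PAGE_SEGMENT.match(last_segment)
    return int(match.group(1)) if match else 1


def robots_meta_for(url):
    """Return a robots meta tag for pages past the first, else an empty string."""
    if page_number(url) > 1:
        # Assumed directive: keep following links, but do not index the page.
        return '<meta name="robots" content="noindex, follow" />'
    return ""


if __name__ == "__main__":
    for url in (
        "https://www.oceannews.com/defense",
        "https://www.oceannews.com/defense/Page-9",
    ):
        print(url, "->", robots_meta_for(url) or "(no robots tag)")
```

The first page of each listing stays indexable, while the deeper pagination pages ask to be left out of the index but still let crawlers follow the links through to the articles.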
·
Wednesday, 04 February 2015 12:47