How to add a doctype to your webpage

A doctype is a special declaration at the very top of your webpage source, right above the <html> tag. It tells validators which rules to validate your page against, and tells modern browsers (IE6+, Firefox, NS6+, Opera, IE5 Mac) whether to display your page in Quirks or Standards mode.
Below are the major doctypes you can deploy on your webpage. All of them switch modern browsers into "Standards" mode when used.

 

HTML 5 doctype

HTML 5 advocates the use of the very simple doctype:
<!DOCTYPE HTML>
In fact, the HTML 5 specification refers to the doctype as a "mostly useless, but required, header" whose sole purpose is to ensure browsers render web pages in the correct, standards-compliant mode. The above doctype does exactly that, including in IE8. Ideally this should be your first choice of doctype, unless you need your webpages to validate in pre-HTML 5 versions of the W3C validator (which may still be the case at the time of writing). For future-proofing your web pages, however, this is the doctype to go with.
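To see the doctype in context, here is a minimal sketch of a complete HTML 5 page (the title and heading are just placeholders):

```html
<!DOCTYPE HTML>
<html>
<head>
<meta charset="utf-8">
<title>My page</title>
</head>
<body>
<h1>Hello world</h1>
</body>
</html>
```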

HTML 4.01 Transitional, Strict, Frameset

The HTML 4.01 Transitional doctype supports all attributes of HTML 4.01, including presentational attributes, deprecated elements, and link targets. It is meant for webpages that are transitioning to HTML 4.01 Strict:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
HTML 4.01 Strict is a trimmed-down version of HTML 4.01 with an emphasis on structure over presentation. Deprecated elements and attributes (including most presentational attributes), frames, and link targets are not allowed; CSS should be used to style all elements:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
HTML 4.01 Frameset is identical to Transitional above, except for the use of <frameset> over <body>:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">

XHTML 1.0 Transitional, Strict, Frameset

Use XHTML 1.0 Transitional when your webpage conforms to basic XHTML rules, but still uses some HTML presentational tags for the sake of viewers that don't support CSS:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 Use XHTML 1.0 Strict when your webpage conforms to XHTML rules and uses CSS for full separation between content and presentation:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
XHTML 1.0 frameset is identical to Transitional above, except in the use of the <frameset> tag over <body>:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

XHTML 1.1 DTD

XHTML 1.1 declaration. Visit the W3C site for an overview and what's changed from 1.0:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
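For reference, here's a minimal sketch of a complete page using one of the above XHTML doctypes (the title and content are placeholders). Note the xmlns attribute on the <html> tag, which XHTML documents require:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My XHTML page</title>
</head>
<body>
<p>Hello world</p>
</body>
</html>
```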


Posted at 04:09 |  by Unknown
Echoing server information such as user IP, current date etc using SSI

Another common and practical use of SSI is to display basic information about your server or your visitors, such as the last modified date of the webpage, the current server time, or the visitor's IP address. These are tasks that client-side languages such as JavaScript cannot accomplish, but they can be done using the #echo command of SSI. Here's a quick overview of some of the variables you can use with SSI's #echo to display useful information:

DATE_GMT The current server date, in Greenwich mean time. Format using #config.
DATE_LOCAL The current server date. Format using #config.
DOCUMENT_NAME The file name of the current document.
DOCUMENT_URI The virtual path of the current document.
LAST_MODIFIED The last modified date of the document. Format using #config.
HTTP_REFERER URL of the document the client came from to reach the current document.
REMOTE_ADDR IP address of the visitor.
#flastmod Command to display the last modified date/time of a document or file on the server. Format using #config.
#fsize Command to display the file size of a document or file. Format using #config.

To echo something using SSI, the syntax is:
<!--#echo var="VARIABLE HERE" -->
Let's see exactly how to use these variables.

Echo current server date and time

To display the current server date and time, use either the "DATE_GMT" or "DATE_LOCAL" variable. In its simplest form:
<!--#echo var="DATE_LOCAL" -->
Output: Saturday, 24-Aug-2013 04:49:10 MDT
Not bad for one simple line of code, eh?

Echo last modified date of current document or file

It's very useful at times to show the last modified date of a web page:
This document last modified: 
<!--#echo var="LAST_MODIFIED" -->
Output: This document last modified: Saturday, 04-Mar-2006 01:41:24 MST

Echo last modified date of any document or file

You can also display the last modified date of any document or file on your server besides the current one, by using another command called #flastmod instead of #echo:
greenday.mp3 last modified: <!--#flastmod file="greenday.mp3"-->
Index page last modified: <!--#flastmod virtual="/index.html"-->
Sample output: greenday.mp3 last modified Thursday, 06-Jan-2005 05:35:27 EST.

Echoing visitor IP address

This is also a commonly asked question: how to display the user's IP address: 
Your IP:
 <!--#echo var="REMOTE_ADDR" -->
Output: Your IP: 117.253.218.65

Displaying file size of a document

Finally, you can display the file size of any document on your server by using a different command called #fsize:
This document's file size: 
<!--#fsize file="current.shtml" -->
 The file size of main index page: 
<!--#fsize virtual="/index.shtml" -->
Sample output: This document's file size: 8.4K

Interesting Uses of SSI

The interesting thing to note about SSI's output commands is that they can be embedded anywhere inside your HTML source, even in unexpected places, to do interesting things. For example, you can use SSI's #echo to populate a JavaScript variable with the visitor's IP address, then use JavaScript to react accordingly (the IPs below are just sample placeholders):
<script type="text/javascript">
var userIP="<!--#echo var="REMOTE_ADDR" -->"
var badIPs=["123.123.123.123", "111.111.111.111"] //sample list of bad IPs
for (var i=0; i<badIPs.length; i++){
if (userIP==badIPs[i])
alert("You are not allowed on this site.")
}
</script>
Another unconventional usage is to pass the current server time to the JavaScript Date object, then use JavaScript to display the current live time of your server:
var currenttime="<!--#echo var="DATE_LOCAL" -->"
var serverdate=new Date(currenttime)
//rest of script

 

Using #config to customize time format and more

On the previous page I showed you SSI's ability to output various server information, such as the size of a file or the current date and time. This is all great stuff, but a question that quickly follows is "Can I customize the format of the output, such as the date and time?" Sorry, got to learn to just be content! Just kidding. Yes, it's certainly possible, thanks to another SSI command called #config. Take a look at this:
<!--#config timefmt="%m/%d/%y" -->
<!--#echo var="DATE_LOCAL" -->
Output: 08/24/13
Instead of a long string containing both the date and time, I've used #config to mold the output into exactly the format I want. Let's now look at the various parameters of the #config command at your disposal:

CODE PURPOSE OF CODE Sample output
%a abbreviated weekday name Sun
%A full weekday name Sunday
%b abbreviated month name Jan
%B full month name January
%c locale's appropriate date and time Sun Dec 28 04:45:57 2005
%d day of month - 01 to 31 25
%D date as %m/%d/%y 12/25/05
%e day of month - 1 to 31 25
%H hour - 00 to 23 15
%I hour - 01 to 12 03
%j day of year - 001 to 366 361
%m month of year - 01 to 12 12
%M minute - 00 to 59 09
%n insert a newline character  
%p string containing AM or PM PM
%r time as %I:%M:%S %p 06:08:05 PM
%R time as %H:%M 15:09
%S second - 00 to 59 02
%t insert a tab character  
%T time as %H:%M:%S 15:21:07
%U week number of year (Sunday is the first day of the week) - 00 to 53 52
%w day of week - Sunday=0 0
%W week number of year (Monday is the first day of the week) - 00 to 53 51
%x Country-specific date format 12/25/05
%X Country-specific time format 04:50:29
%y year within century - 00 to 99 05
%Y year as CCYY (4 digits) 2005
%Z timezone name PST

Here are a couple more examples:
<!--#config timefmt="%A %d %B, %Y" -->
<!--#echo var="DATE_LOCAL" -->
Output: Saturday 24 August, 2013
<!--#config timefmt="%D %r"-->
This document last modified:
<!--#echo var="LAST_MODIFIED" -->
Output: This document last modified: 03/04/06 01:41:24 AM

Formatting file size with #config

So far on this page I've only used the #config command to format time-related output. But you can also use it on file size output:
<!--#config sizefmt="abbrev"-->
<!--#fsize file="current.shtml" -->
<!--#config sizefmt="bytes"-->
<!--#fsize file="current.shtml" -->
The first directive tells the server to display the file size in abbreviated form, rounded to the nearest kilobyte. The second displays the size in bytes instead.
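Putting the pieces together, a typical page footer might combine #config, #echo, and #fsize like this (the filename current.shtml follows the earlier examples):

```html
<!--#config timefmt="%B %d, %Y" -->
<!--#config sizefmt="abbrev" -->
<p>
Last updated: <!--#echo var="LAST_MODIFIED" --> |
Page size: <!--#fsize file="current.shtml" -->
</p>
```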

Posted at 03:58 |  by Unknown

Beginner's Guide to SSI (server side includes)

Don't worry, SSI doesn't require a rocket-science degree to understand and use. It is, however, a highly useful feature that lets you perform incredibly time-saving tasks, such as including the contents of an external file across multiple pages on your site, or accessing and displaying server-specific information such as the current server time or a visitor's IP address. In this tutorial I'll introduce newcomers to the wonderful world of SSI! SSI is short for Server Side Includes, by the way.

Does my server support SSI?

The first thing that needs to be settled is whether your server supports SSI and has it enabled. SSI is a Linux/Apache-specific feature, so if you're on a Windows server, for example, you'll need to look for the Windows equivalent of SSI (sorry, not a Windows guy). To test whether your server supports SSI, insert the below code inside a webpage and save the page with a .shtml extension (the most common extension configured to parse SSI by default):
test.shtml source:
<!--#echo var="DATE_LOCAL" -->
When you run test.shtml in your browser, you should see the current date and time of your server displayed:

Saturday, 24-Aug-2013 04:39:16 MDT

If not, you can either ask your web host about SSI support for your account, or try and manually enable SSI, by reading "Enabling SSI on my server."

With that said, let's explore some of SSI's nifty abilities now.

Using SSI to include the contents of an external file

The most common usage of SSI is to include the contents of an external file on a page, or across multiple pages on your site. Modify the external file, and all pages that embed it are updated with the modified information as well. For a site that uses the same header, navigation menu, or footer across pages, for example, this can save you countless hours of time and energy. The syntax to embed the contents of an external file in the current page is:
<!--#include file="external.htm"-->
<!--#include virtual="/external.htm"-->
Which one to use depends on where "external.htm" is located. The first command assumes that the file is located in the same directory as the document including it, while the second syntax uses an absolute reference to "external.htm" starting from your root HTML directory. Typically you'll want the second syntax, as the external files to include will most likely live in a central directory, while the pages including them are scattered across different directories. Here are a couple more examples of the second syntax:
<!--#include virtual="/includes/navbar.txt"-->
<!--#include virtual="../navbar.txt"-->
With the first code, I'm telling the server to look for "navbar.txt" inside the "includes" directory that sits directly beneath the root HTML directory (i.e. http://www.mysite.com/includes), while in the second, I'm simply telling it to look in the parent directory of the page that's including "navbar.txt".

As shown, you're not limited to including just .htm files; other static text files such as .txt work too. You cannot, however, include .cgi files using this syntax without additional configuration on your server. Just FYI, both the left menu and the copyright footer on this page, and across the site, are dynamically included via SSI from two external files. To update the copyright notice, for example, all I have to do is make one simple modification to one of these files, and the change is reflected across the entire site.
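To make this concrete, here's a sketch of a shared footer setup (the file and directory names are hypothetical). The footer lives in one file under /includes, and every page pulls it in with a single #include:

```html
<!-- /includes/footer.txt -->
<p>Copyright 2013 MySite.com. All rights reserved.</p>

<!-- any page on the site, e.g. /articles/page.shtml -->
<html>
<body>
...page content here...
<!--#include virtual="/includes/footer.txt"-->
</body>
</html>
```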

Manually enabling SSI on your server

If you're on a Linux+Apache server that probably supports SSI but just doesn't have it enabled, you can try manually turning it on using the magical little file called .htaccess. Simply create a plain text file called .htaccess, and add the below to it:
AddType text/html .shtml
AddHandler server-parsed .shtml
Options Indexes FollowSymLinks Includes
Then, upload this file to the root HTML directory of your server account, enabling SSI across the entire site for any web page with a .shtml extension. To limit SSI to just a specific subdirectory, upload the file to that directory instead.
Assuming this works, you should also make sure your web host allows you to manually turn on SSI. Some hosts may not like the idea of you turning on a feature without paying more, so to be safe, check with them first.

Enabling SSI on regular html pages

Now, the above requires that your webpages be named with a .shtml extension in order for SSI to be enabled for that page. If you're going to be using SSI across the board, you may want to consider just turning on SSI for regular .html and .htm pages, so you don't have to rename your files or follow the .shtml convention. To do this, add the below code to your .htaccess file:
AddHandler server-parsed .html
AddHandler server-parsed .htm
Voila, SSI should now be enabled for regular HTML pages as well!


Posted at 03:50 |  by Unknown

Introduction to "robots.txt"

There is a hidden, relentless force that permeates the web and its billions of web pages and files, unbeknownst to the majority of us sentient beings. I'm talking about search engine crawlers and robots here. Every day hundreds of them go out and scour the web, whether it's Google trying to index the entire web, or a spam bot collecting any email address it could find for less than honorable intentions. As site owners, what little control we have over what robots are allowed to do when they visit our sites exists in a magical little file called "robots.txt."


"Robots.txt" is a regular text file that, through its name, has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots not to crawl and index certain files or directories within your site, or your site at all. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets you tell Google just that.

Creating your "robots.txt" file

So let's get moving. Create a regular text file called "robots.txt", and make sure it's named exactly that. This file must be uploaded to the root accessible directory of your site, not a subdirectory (i.e. http://www.mysite.com but NOT http://www.mysite.com/stuff/). Only by following both of the above rules will search engines interpret the instructions contained in the file. Deviate from this, and "robots.txt" becomes nothing more than a regular text file, like Cinderella after midnight.

Now that you know what to name your text file and where to upload it, you need to learn what to put in it to send commands to search engines that follow this protocol (formally, the "Robots Exclusion Protocol"). The format is simple enough for most intents and purposes: a User-agent line to identify the crawler in question, followed by one or more Disallow: lines to block it from crawling certain parts of your site.

1) Here's a basic "robots.txt":
User-agent: *
Disallow: /
With the above declared, all robots (indicated by "*") are instructed to not index any of your pages (indicated by "/"). Most likely not what you want, but you get the idea.

2) Let's get a little more discriminating now. While every webmaster loves Google, you may not want Google's Image bot crawling your site's images and making them searchable online, if just to save bandwidth. The below declaration will do the trick:
User-agent: Googlebot-Image
Disallow: /
3) The following disallows all search engines and robots from crawling select directories and pages:
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.htm
4) You can conditionally target multiple robots in "robots.txt." Take a look at the below:
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
This is interesting: here we declare that crawlers in general should not crawl any part of our site, EXCEPT for Google, which is allowed to crawl the entire site apart from /cgi-bin/ and /privatedir/. So the rules of specificity apply, not inheritance.

5) There is a way to use Disallow: to essentially turn it into "Allow all", and that is by not entering a value after the colon (:):
User-agent: *
Disallow: /
User-agent: ia_archiver
Disallow:
Here I'm saying all crawlers should be prohibited from crawling our site, except for Alexa, which is allowed.

6) Finally, some crawlers now support an additional field called "Allow:", most notably, Google. As its name implies, "Allow:" lets you explicitly dictate what files/folders can be crawled. However, this field is currently not part of the "robots.txt" protocol, so my recommendation is to use it only if absolutely needed, as it might confuse some less intelligent crawlers.

Per Google's FAQs for webmasters, the below is the preferred way to disallow all crawlers from your site EXCEPT Google:

User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /

  1. Introduction to "robots.txt"
  2. The "robots" meta tag/ Useful links on "robots.txt"

Introduction to "robots.txt"

There is a hidden, relentless force that permeates the web and its billions of pages and files, unbeknownst to most of us sentient beings. I'm talking about search engine crawlers and robots. Every day hundreds of them scour the web, whether it's Google trying to index the entire web, or a spam bot collecting every email address it can find for less-than-honorable purposes. As site owners, what little control we have over what robots may do when they visit our sites exists in a magical little file called "robots.txt."


"Robots.txt" is a regular text file that through its name, has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots to not crawl and index certain files, directories within your site, or at all. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets you tell Google just that.

Creating your "robots.txt" file

So let's get moving. Create a regular text file named exactly "robots.txt". This file must be uploaded to the root accessible directory of your site, not a subdirectory (ie: http://www.mysite.com but NOT http://www.mysite.com/stuff/). Only by following both of these rules will search engines interpret the instructions contained in the file. Deviate from either, and "robots.txt" becomes nothing more than a regular text file, like Cinderella after midnight.
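The root-of-site rule can be illustrated with Python's standard urllib.parse module; this is just an illustrative sketch, and www.mysite.com is the placeholder domain from above:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    # Crawlers look for robots.txt in exactly one place:
    # the root of the host that serves the page.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://www.mysite.com/stuff/page.htm"))
# http://www.mysite.com/robots.txt
```

No matter how deep the page is, crawlers only ever consult the copy at the host's root.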

Now that you know what to name your text file and where to upload it, you need to learn what to actually put in it to send commands to search engines that follow this protocol (formally the "Robots Exclusion Protocol"). The format is simple enough for most intents and purposes: a User-agent line to identify the crawler in question, followed by one or more Disallow: lines to keep it out of certain parts of your site.

1) Here's a basic "robots.txt":
User-agent: *
Disallow: /
With the above declared, all robots (indicated by "*") are instructed to not index any of your pages (indicated by "/"). Most likely not what you want, but you get the idea.
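If you want to sanity-check a rule set like this before uploading it, Python's standard urllib.robotparser module implements the exclusion protocol. A minimal sketch (the site URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /",
])

# No robot may fetch anything, not even the home page.
print(rules.can_fetch("AnyBot", "http://www.mysite.com/"))          # False
print(rules.can_fetch("AnyBot", "http://www.mysite.com/page.htm"))  # False
```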

2) Let's get a little more discriminatory now. While every webmaster loves Google, you may not want Google's image bot crawling your site's images and making them searchable online, if only to save bandwidth. The declaration below will do the trick:
User-agent: Googlebot-Image
Disallow: /
3) The following disallows all search engines and robots from crawling select directories and pages:
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.htm
4) You can target multiple robots with different rules in "robots.txt". Take a look at the example below:
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
This one is interesting: here we declare that crawlers in general should not crawl any part of our site, EXCEPT for Google's Googlebot, which is allowed to crawl everything apart from /cgi-bin/ and /privatedir/. The most specific matching group wins; rules are not inherited from the "*" group.
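One way to see this group-precedence behavior without waiting for real crawlers is to feed the rules to Python's urllib.robotparser (the URLs and the "SomeBot" name are placeholders):

```python
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
rules.parse("""\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
""".splitlines())

# A generic bot falls under the "*" group and is shut out entirely...
print(rules.can_fetch("SomeBot", "http://www.mysite.com/index.htm"))    # False
# ...while Googlebot matches its own, more specific group.
print(rules.can_fetch("Googlebot", "http://www.mysite.com/index.htm"))  # True
print(rules.can_fetch("Googlebot", "http://www.mysite.com/cgi-bin/x"))  # False
```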

5) There is a way to use Disallow: to essentially mean "allow all", and that is by not entering a value after the colon (:):
User-agent: *
Disallow: /
User-agent: ia_archiver
Disallow:
Here I'm saying all crawlers should be prohibited from crawling our site, except for Alexa's ia_archiver, which is allowed full access.

6) Finally, some crawlers, most notably Google's, support an additional field called "Allow:". As its name implies, "Allow:" lets you explicitly dictate which files and folders may be crawled. This field was not part of the original "robots.txt" protocol, however, so my recommendation is to use it only if absolutely needed, as it might confuse some less intelligent crawlers.

Per Google's FAQs for webmasters, the below is the preferred way to disallow all crawlers from your site EXCEPT Google:

User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
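Python's urllib.robotparser also understands "Allow:" lines, so the Google-only pattern above can be checked the same way (placeholder URLs and bot name again):

```python
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
rules.parse("""\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
""".splitlines())

# Everyone else is blocked, but Googlebot's explicit Allow wins for it.
print(rules.can_fetch("SomeBot", "http://www.mysite.com/page.htm"))    # False
print(rules.can_fetch("Googlebot", "http://www.mysite.com/page.htm"))  # True
```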

  1. Introduction to "robots.txt"
  2. The "robots" meta tag/ Useful links on "robots.txt"

Posted at 03:27 |  by Unknown

The "robots" meta tag/ Useful links on "robots.txt" 

The "robots" meta tag

If your web host prohibits you from uploading "robots.txt" to the root directory, or you simply wish to restrict crawlers from a few select pages on your site, an alternative to "robots.txt" is to use the robots meta tag.

Creating your "robots" meta tag

The "robots" meta tag looks similar to any meta tag, and should be added between the HEAD section of your page(s) in question:
<meta name="robots" content="noindex,nofollow" />
Here's a list of the values you can specify within the "content" attribute of this tag:

Value        Description
(no)index    Determines whether the crawler should index this page. Possible values: "index" or "noindex".
(no)follow   Determines whether the crawler should follow links on this page and crawl them. Possible values: "follow" or "nofollow".

Here are a few examples:

1) This disallows both indexing and following of links by a crawler on that specific page:
<meta name="robots" content="noindex,nofollow" />
2) This disallows indexing of the page, but lets the crawler go on and follow/crawl the links contained within it:
<meta name="robots" content="noindex,follow" />
3) This allows indexing of the page, but instructs the crawler to not crawl links contained within it:
<meta name="robots" content="index,nofollow" />
4) Finally, there is a shorthand way of declaring 1) above (neither index the page nor follow its links):
<meta name="robots" content="none" />

Useful Links on "robots.txt"

  Introduction to "robots.txt"

The "robots" meta tag/ Useful links on "robots.txt"

The "robots" meta tag/ Useful links on "robots.txt" 

The "robots" meta tag

If your web host prohibits you from uploading "robots.txt" to the root directory, or you simply wish to restrict crawlers from a few select pages on your site, an alternative to "robots.txt" is to use the robots meta tag.

Creating your "robots" meta tag

The "robots" meta tag looks similar to any meta tag, and should be added between the HEAD section of your page(s) in question:
<meta name="robots" content="noindex,nofollow" />
Here's a list of the values you can specify within the "contents" attribute of this tag:

Value Description
(no)index Determines whether crawler should index this page. Possible values: "noindex" or "index"
(no)follow Determines whether crawler should follow links on this page and crawl them. Possible values: "nofollow" and "follow."

Here are a few examples:

1) This disallows both indexing and following of links by a crawler on that specific page:
<meta name="robots" content="noindex,nofollow" />
2) This disallows indexing of the page, but lets the crawler go on and follow/crawl links contained within it.
<meta name="robots" content="noindex,follow" />
3) This allows indexing of the page, but instructs the crawler to not crawl links contained within it:
<meta name="robots" content="index,nofollow" />
4) Finally, there is a shorthand way of declaring 1) above (don't index nor follow links on page):
<meta name="robots" content="none">

Useful Links on "robots.txt"

  Introduction to "robots.txt"

Posted at 03:08 |  by Unknown

© 2013 iNet Freaks