
Upload: benjamin-harvey

Post on 27-Dec-2015


Configuring Apache
• Downloading, compiling and installing Apache is straightforward and even somewhat easy
• The real effort is to properly configure Apache
• Configuring Apache consists of
  – specifying directives in the configuration file(s)
  – using containers to override directives for specific directories, files or URLs
  – writing proper rewrite and redirection rules
  – compiling and utilizing important modules
  – making proper use of Apache environment variables
  – properly configuring the Linux environment around Apache
• In this chapter, we start with the basic configurations and explore the main configuration file, httpd.conf
  – in chapter 5, we explore more advanced configurations, and later in the semester we also explore modules, rules, etc.

httpd.conf
• Starting with Apache 2.0, most of the configuration information is stored in this one file
  – this file and other configuration information are placed in the sysconfdir directory (if you specified this during the ./configure Linux command)
  – otherwise, it will be found in your conf directory
• This file contains three types of information
  – documentation (comments, which are lines that begin with the # symbol)
  – directives – settings for various Apache parameters
    • such as DocumentRoot or CustomLog – these directives are system-wide – that is, they impact all of Apache
  – containers – directives that will take effect on specific entities such as directories, files or URLs
• The httpd.conf file can be very large
  – you can reduce its size by dividing the file into several files and using Include statements to attach the other files to the main httpd.conf file
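As a rough illustration of the three kinds of content, here is a minimal httpd.conf fragment. The paths, the private directory and the included file name are assumptions made up for this sketch, not values from the course server.

```apache
# A comment: Apache ignores any line that begins with '#'.

# Server-wide directives (paths are illustrative).
DocumentRoot /var/www/html
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common

# A container: these directives apply only to the named directory.
<Directory /var/www/html/private>
    Options None
</Directory>

# Pull in more configuration from a separate (hypothetical) file.
Include conf/extra/example-site.conf
```

Note that a comment has to be on a line of its own; Apache does not accept a comment after a directive on the same line.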

Basic Directives
• ServerRoot – this specifies the location in the Linux file system under which the server's configuration, log and module files are kept (httpd.conf is normally found in its conf subdirectory)
  – relative paths given in Include or LoadModule statements are resolved from this directory
• DocumentRoot – this specifies the location in the Linux file system where the html files and other web documents will be stored
  – never make ServerRoot and DocumentRoot the same directory as this leads to security holes
    • people could potentially access and alter configuration files
  – to fulfill a request, the server will take the path from the URL and append it to DocumentRoot
    • if DocumentRoot is /var/www/html
    • and the URL is http://www.someserver.com/dir1/dir2/file1.html
    • then the file accessed is /var/www/html/dir1/dir2/file1.html
  – NOTE: DocumentRoot's path should not end with a / to make sure this works correctly!
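The mapping above can be sketched in httpd.conf as follows. The ServerRoot path is an assumed install location; the DocumentRoot and URL are the ones used in the example above.

```apache
# Top of the tree that holds conf/, logs/ and modules/ (assumed install path).
ServerRoot /usr/local/apache2

# Web documents live in a separate tree; note there is no trailing slash.
DocumentRoot /var/www/html

# With these settings, a request for
#   http://www.someserver.com/dir1/dir2/file1.html
# is served from
#   /var/www/html/dir1/dir2/file1.html
```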

More Directives
• ServerName – this is the IP alias of the server
  – in order to establish this, we must also have the ServerName to IP address mapping stored in DNS tables
• DirectoryIndex – a list of files that the server will attempt to find in a given directory if no file name is given in the URL
  – e.g., DirectoryIndex index.html index.shtml index.php
  – if the URL is /someserver.com/foo/ then it will look for the file foo/index.html, foo/index.shtml and foo/index.php (in that order)
    • if no such file is found, Apache usually sends back an error message
    • you can also set it up so that Apache displays a directory listing instead (we'll see that later)
    • whether to display a directory or not is specified by directory (that is, we can set it up so that directory /pub will display its contents but directory /files will return an error)
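A minimal sketch of these directives, assuming a DocumentRoot of /var/www/html with pub and files subdirectories (both directory paths are assumptions for illustration):

```apache
ServerName www.someserver.com

# Files tried, in order, when a URL names a directory.
DirectoryIndex index.html index.shtml index.php

# /pub shows a generated listing when no index file is found.
<Directory /var/www/html/pub>
    Options Indexes
</Directory>

# /files returns an error instead of a listing.
<Directory /var/www/html/files>
    Options -Indexes
</Directory>
```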

And More Directives
• User, Group – define the user and group that will own the Apache processes
  – typically both default to apache or www
  – this is useful when you want Apache to execute scripts and other processes
    • you don't want this to be root as this could cause a security hole
  – if you change this from the default, you have to create the user (and group)
    • in such a case, the user will probably not have a home directory
• ServerSignature – this specifies whether a "signature" should be attached to a web page returned as the result of an error or directory listing
  – you can specify off, on or email such as
    • ServerSignature off
    • ServerSignature EMail (adds a mailto: link to the ServerAdmin address)
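Put together, these settings might look like the sketch below. The account name apache and the administrator address are placeholders, not values taken from the slides.

```apache
# Child processes run as this unprivileged account (must already exist).
User apache
Group apache

# Address used when the signature includes a mailto: link (placeholder).
ServerAdmin webmaster@example.com

# Off, On, or EMail.
ServerSignature EMail
```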

MIME Type Directives
• MIME is Multipurpose Internet Mail Extensions
  – originally it was set up so that email attachments could be handled based on type, but we extend this idea to our web server
    • the mime.types file (stored in the same directory as the httpd.conf file) stores mappings of file extensions to MIME types
  – format is MIME type followed by a list of file extensions as in:
    • text/html html htm
    • text/plain txt text asc
  – you can edit the file to tailor the file extensions as you need to
• DefaultType is a directive to specify the MIME type to use as default when a file is found that has no extension or an unknown extension
  – typically this will be some form of text, such as text/plain or text/html
• AddType is a directive that allows you to add file extensions to a given MIME type as in
  – AddType image/jpeg .jp .jpg .jpeg .jg
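In httpd.conf these directives could be combined as in this sketch, which assumes the Apache 2.2 behavior described above and a mime.types file in the conf directory:

```apache
# Location of the extension-to-type table, relative to ServerRoot.
TypesConfig conf/mime.types

# Type sent for files with no extension or an unknown one.
DefaultType text/plain

# Extra extensions mapped to an existing MIME type.
AddType image/jpeg .jp .jpg .jpeg .jg
```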

TCP Directives
• Listen – to specify (and alter) the IP address(es) and port(s) that Apache will listen to for incoming requests
  – the default port is 80
    • Listen 8080 (listen only to port 8080, on all IP addresses)
    • Listen 1.2.3.4:8080 (listen only to this IP address and port)
    • a port must always be given; an IP address by itself, as in Listen 1.2.3.4, is not accepted
  – by having multiple Listen directives, you specify that Apache should listen to ALL items listed
    • Listen 1.2.3.4:8080
    • Listen 1.2.3.4:80
    • listen to both port 8080 and 80 for IP address 1.2.3.4
• Other TCP directives include
  – KeepAlive – on or off
    • whether a connection should be long lived or not (it is on by default)
  – MaxKeepAliveRequests – defaults to 100
    • if you set this to 0, it allows an unlimited number
  – KeepAliveTimeout – default is 15 (seconds)
    • on servers with heavy traffic, you may want to reduce this
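A sketch of these settings for a busy server follows. The IP address is the placeholder from the slide, and the reduced timeout of 5 seconds is only an illustrative choice, not a recommendation from the course.

```apache
# Accept requests on two ports of one address.
Listen 1.2.3.4:80
Listen 1.2.3.4:8080

# Reuse each connection for up to 100 requests,
# but drop idle connections quickly.
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5
```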

Apache Processes
• When you start Apache, you are invoking a single process
  – which will be denoted in Linux as the parent process
• This parent process will be owned by root and it will spawn some set number of child processes
  – these children will be owned by the Apache user as denoted by the User directive (it is usually named apache)
• Whenever a request arrives, it is serviced through one of the available child processes
  – note: we refer to the parent and children as threads instead of processes as they are all running the same executable code but have their own internal data/stack space
• At any one time, there may be dozens or more children running
  – children threads are persistent – they do not die as soon as they finish a request but instead may persist until they have been around for a while, or until the parent thread kills them off
• To establish the number of children that should be available at any time, use the directives on the next slide

Controlling Child Threads
• StartServers – number of children initially started by Apache's parent (the Apache parent will spawn more children if or as needed)
• MinSpareServers, MaxSpareServers
  – a spare server is a child process not currently in use
  – these two values tell the Apache parent how many idle child threads should exist at any time
  – if three children are currently in use and there are 15 idle children but MaxSpareServers is set to 10, then 5 children should be killed
  – if there are 2 idle children and MinSpareServers is set to 5, then 3 new threads should be spawned
• MaxClients – the maximum number of children that can exist at any one time
• MaxRequestsPerChild – a child will persist until either it has serviced this many requests or the parent has killed it off because there are too many child threads around
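These directives are usually set together. The sketch below assumes the prefork process model of Apache 2.2; the numbers are illustrative, chosen to match the MinSpareServers and MaxSpareServers examples above.

```apache
<IfModule mpm_prefork_module>
    # Children created when the parent starts.
    StartServers            5
    # Keep between 5 and 10 idle children available.
    MinSpareServers         5
    MaxSpareServers        10
    # Never run more than 150 children at once.
    MaxClients            150
    # Retire a child after it has handled 10000 requests.
    MaxRequestsPerChild 10000
</IfModule>
```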

Containers
• The previous directives apply to the server as a whole
• There are directives that can apply to specific resources
  – these directives are placed inside of container definitions
• There are many types of container directives:
  – Directory – apply to anything in this directory (and subdirectories)
  – Files – apply to the listed file(s)
  – IfDefine – apply only if the given parameter in this statement was given on the command line when Apache was invoked
  – IfModule – apply only if specified module is loaded
  – IfVersion – apply only if running specified version of Apache
  – Limit – apply only if the http request used the specified method
  – LimitExcept – apply if the http request did not use the specified method
  – Location – apply only to the given URL(s)
  – Proxy – apply only if URL is of a specified proxy server
  – VirtualHost – apply only to this particular virtual host
    • there are also DirectoryMatch, FilesMatch, LocationMatch and ProxyMatch
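To give a feel for a few of these container types, here is a small sketch. The file name, the TESTING parameter and the use of mod_status are assumptions picked for illustration only.

```apache
# Files: applies to any file with this name, wherever it is found.
<Files "secret.txt">
    Order Allow,Deny
    Deny from All
</Files>

# IfDefine: applies only when Apache was started with -D TESTING.
<IfDefine TESTING>
    LogLevel debug
</IfDefine>

# IfModule + Location: applies to a URL, and only if the module is loaded.
<IfModule mod_status.c>
    <Location /server-status>
        SetHandler server-status
    </Location>
</IfModule>
```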

Need to Restart apachectl
• When Apache first starts (or restarts) through apachectl, the httpd.conf file is read
  – followed by any Include files
  – .htaccess files are not read at this point; Apache reads them from the requested directory (and its parent directories) each time a request is handled
• You will need to restart Apache through apachectl (or stop it and start it anew) if
  – you make any changes to httpd.conf
  – you compile (or recompile) any new modules (LoadModule statements are automatically inserted into httpd.conf, so you will need to quit out of vi if you are currently editing httpd.conf and you compile a module)
• You do not need to restart after you change, delete or add a .htaccess file anywhere under the DocumentRoot directory structure, because these files are re-read on every request

Defining A Container
• Containers are defined using < > notation like html
  – containers have three parts
    • the opening of the container as in <Directory /var/web>
    • the directives for the container, listed one per line
    • the closing of the container </type> as in </Directory>
  – directives vary by container type
  – we will cover the <Directory> container here but more detail will be offered in chapter 5 (and other chapters)
    • the specified directory/file/URL spelling, capitalization and punctuation must match to be applied
    • note that <Directory /> means the root directory; try not to confuse the use of / in the Directory tag with the </Directory> closing tag!
  – containers of the same type cannot be nested
    • in some cases, we can nest certain types within other types (e.g., Limit can go inside of Directory)
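The three parts, and one permitted form of nesting, can be sketched like this. The directory is the /var/web example from above; the choice of methods to restrict is an assumption for illustration.

```apache
# 1. The opening tag names the container type and its target.
<Directory /var/web>
    # 2. The directives, one per line.
    Options Indexes

    # A <Limit> container nested inside <Directory>:
    # refuse requests that use these methods.
    <Limit POST PUT DELETE>
        Order Deny,Allow
        Deny from All
    </Limit>
# 3. The closing tag repeats the type, preceded by a slash.
</Directory>
```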

Using Directory Containers
• The most common uses of the directory container are to specify
  – whether a directory listing is displayed if the user omits the file name or whether an error is returned
    • this is done using the directive Options Indexes
  – to permit the use of symbolic links
    • suppose you set up a symbolic link from /usr/local/apache2/htdocs/icons to /usr/local/apache2/icons
    • if we do not permit the use of symbolic links by Apache, then any reference to this directory or its contents would result in an error

    <Directory /usr/local/apache2/htdocs/icons>
        Options FollowSymLinks
    </Directory>

  – to control who can access resources in the given directory using Allow or Deny (we cover these in more detail later in the semester)

Specifying the Directory(ies)
• Spelling/capitalization must be exact
  – you can specify directory(ies) using shell-style wildcards
    • * for any number of characters, ? for exactly one character, [ ] for a set of characters
  – <Directory "/var/www/html/[Ff]ox*">
    • would fit Fox, fox, Foxr and foxr (and any other name that starts with Fox or fox)
  – the wildcard characters (*, ?) will not match a / symbol
    • so for instance, if you have /[A-Za-z]*/public_html then this will match /home/public_html but not /home/foxr/public_html
  – if you precede the directory with a ~, this allows you to use extended regex as in
  – <Directory ~ "^/www/.*/[0-9]{3}">
    • this regular expression says "any path that starts with /www, is followed by any directory name and ends with a resource (file or directory) whose name is 3 digits"
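The two matching styles can be compared side by side. This is a sketch: the /home layout is an assumption, and the Options values are placeholders for whatever directives you actually need.

```apache
# Shell-style wildcard: '*' does not cross a '/', so this matches
# /home/foxr/public_html but not /home/foxr/sub/public_html.
<Directory "/home/*/public_html">
    Options Indexes
</Directory>

# Extended regular expression form (note the '~').
<Directory ~ "^/www/.*/[0-9]{3}">
    Options None
</Directory>
```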

Continued
• The directory is specified from / (the file system root), not from DocumentRoot
  – if DocumentRoot is /usr/local/apache2/htdocs then you would have to use <Directory /usr/local/apache2/htdocs> for a directive to affect the entire web space
• Log files, error files, icons, cgi scripts, etc., might be in another portion of the file system
  – so the <Directory> container allows you to specify directives that impact these areas without impacting the web pages themselves, for example:
    • <Directory /var/icons>
    • <Directory /var/error>
    • <Directory /usr/local/apache2/cgi-bin>
  – the good news is that you do not have to place all server files under DocumentRoot (which could lead to security issues)
  – the bad news is that you have to remember to specify directories appropriately

Additional Notes on <Directory>
• If you use a regular expression in your <Directory> specification, Apache will first apply any <Directory> definitions that do not have regular expressions
  – for instance <Directory /> and <Directory /usr/local/apache2/htdocs> will both be enacted before <Directory /var/www/*/foo>
• If a directory can be reached both with a normal path and through symbolic links, the <Directory> definition only covers access through the normal path
  – for instance, we define <Directory /usr/local/apache2/htdocs/fox> but there is a symbolic link from /usr/local/apache2/htdocs/zappa to this directory
  – a request that maps to /usr/local/apache2/htdocs/zappa will not apply the <Directory> directive(s)
    • you should discourage the use of symbolic links within the DocumentRoot subtree because of this
• <DirectoryMatch> is the same as <Directory> except that the argument is always a regular expression
  – the <Directory> container may include a regex (after a ~)
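As a short sketch, the regular expression used earlier with <Directory ~ ...> can be written with <DirectoryMatch> instead; the Options value is only a placeholder.

```apache
# Same pattern as <Directory ~ "^/www/.*/[0-9]{3}">, without the '~'.
<DirectoryMatch "^/www/.*/[0-9]{3}">
    Options None
</DirectoryMatch>
```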

Protecting Directories
• It is common for security purposes to define the following

    <Directory />
        # implement Deny first so that Allow will override Deny,
        # but Deny everyone
        Order Deny,Allow
        Deny from All
    </Directory>

  – this means that the Linux root should be inaccessible by everyone
• Now we must explicitly make accessible the actual web portion of the file system (from DocumentRoot down)

    <Directory /usr/local/apache2/htdocs>
        Order Allow,Deny
        Allow from All
    </Directory>

  – we can similarly allow access to other directories under / that may not be part of DocumentRoot, such as user directories under /home
• How can anyone access the Linux root anyway?
  – they should only be able to access DocumentRoot and below because URLs automatically map to DocumentRoot
  – through symbolic links, redirections, aliases and other mechanisms we will see throughout the semester
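Opening up user directories under /home could look like the sketch below. The public_html subdirectory name is an assumption (it is a common convention, but your server may use another).

```apache
# Re-open one specific subtree that the <Directory /> rule closed off.
<Directory /home/*/public_html>
    Order Allow,Deny
    Allow from All
</Directory>
```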

Other Config Files
• The httpd.conf file is not the only file that Apache can load upon starting/restarting
• Apache can load other config files if they are available
  – the directive Include allows you to specify what other files should be loaded from the specified directories
    • Include conf/*.conf will load all conf files found in the conf directory
  – in Apache 2.2, a lot of the directives that used to go into httpd.conf have been moved into other files to be included, particularly httpd-default.conf
• In addition to *.conf files, Apache will automatically load any .htaccess file it finds in any directory/subdirectory under DocumentRoot
  – as long as this is not overridden through a <Directory> directive
    • note that if something is defined in one directory, unless overridden, the same directive is applied in all subdirectories
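A sketch of both mechanisms follows, assuming the Apache 2.2 layout in which the extra files sit in conf/extra and the DocumentRoot is /usr/local/apache2/htdocs:

```apache
# Load one specific extra file, then every .conf file in a directory.
Include conf/extra/httpd-default.conf
Include conf/extra/*.conf

# Tell Apache to ignore .htaccess files in the web space.
<Directory /usr/local/apache2/htdocs>
    AllowOverride None
</Directory>
```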

Modules
• Modules define still more directives that the web administrator can apply
  – modules range in use from
    • authentication modules (to force users to log in)
    • to logging modules (in case you want to override the default logging behavior)
    • to proxy modules
    • to security modules
• In older versions of Apache, you would have to use LoadModule directives to load each module explicitly
  – with Apache 2.2, there is a group of modules known as the Apache core, that are automatically loaded, simplifying our httpd.conf file
• To use modules in older Apache versions, you would have to compile each module you wanted to use, but the core modules are automatically compiled now, simplifying our ./configure command (we explore modules in chapter 5)
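For a module that is not built in, an explicit LoadModule line is still needed. The sketch below uses mod_rewrite purely as an example and assumes it was compiled as a shared object in the modules directory.

```apache
# Load a module that was compiled as a shared object.
LoadModule rewrite_module modules/mod_rewrite.so

# Use its directives only if the module really is loaded.
<IfModule rewrite_module>
    RewriteEngine On
</IfModule>
```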

Overrides
• What happens if a directive is defined in two locations?
  – for example
    • ServerSignature is defined in the server configuration portion
    • a more specific ServerSignature is in a <Directory> container
    • an even more specific value is defined in a subdirectory's .htaccess file
  – which version is applied?
• Typically, any directive defined for a <Directory> (or <Files> or <Location>) container will override the definition from the server configuration
  – an .htaccess directive will not override other definitions unless you have specified that they can be overridden with an AllowOverride directive

Example
• You have defined a directive in httpd.conf (A)
• You have defined a container for the directory <Directory /usr/local/apache2/htdocs/CIT436> where you have the same directive with a new value (B)
• In the subdirectory CIT436 you have a .htaccess file with the same directive given a new value (C)
• Anywhere in the file system, the directive value is A except
  – in /usr/local/apache2/htdocs/CIT436 and its subdirectories where it is B
  – unless you have specified AllowOverride in the CIT436 <Directory> container, in which case in the directory and subdirectories it will be C
• Confused? We will visit AllowOverride in more detail in chapter 5
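The A/B/C scenario can be written out as a sketch, using ServerSignature as the directive. The particular values chosen for A, B and C are assumptions for illustration.

```apache
# (A) server-wide value
ServerSignature On

<Directory /usr/local/apache2/htdocs/CIT436>
    # (B) value for this directory and its subdirectories
    ServerSignature Off
    # Allow .htaccess files from here down to override directives;
    # with "AllowOverride None" the .htaccess value (C) would be ignored.
    AllowOverride All
</Directory>

# (C) contents of /usr/local/apache2/htdocs/CIT436/.htaccess:
#   ServerSignature EMail
```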

Other Apache Programs
• If you examine the Apache bin directory (where you found apachectl), you will also find
  – rotatelogs – used to specify how often Apache log files should be rotated
    • you can indicate which log file(s) and the rotation time in seconds
  – logresolve – a program that takes a log file of IP addresses and translates them to host names (IP aliases)
    • so that you can analyze the hosts from which you have received requests
    • it contains its own cache, so once an IP address has been resolved the cache is consulted for later occurrences in order to limit traffic to your DNS
  – htdigest, htdbm and htpasswd are all used to allow Apache to perform authentication of users (logins) – we will cover some of these in more detail in chapter 9
  – ab – Apache benchmarking tool to gauge Apache's performance
  – apxs – to compile modules
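As one example of how these programs connect to httpd.conf, rotatelogs is normally run through a piped log. In this sketch the log path is an assumption, and 86400 seconds gives a daily rotation.

```apache
LogFormat "%h %l %u %t \"%r\" %>s %b" common

# Pipe the access log through rotatelogs (path relative to ServerRoot),
# starting a new log file every 86400 seconds.
CustomLog "|bin/rotatelogs /var/log/httpd/access_log 86400" common
```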

Environment Variables
• These are variables defined by Apache
  – these variables are not part of the OS environment
    • for instance, you cannot access them from the command line or in other Linux software
  – you can however use these variables in cgi scripts that are invoked from Apache
• Environment variables are typically used only in a restricted set of directives:
  – Allow, Deny
  – BrowserMatch, BrowserMatchNoCase
  – PassEnv (to pass a variable to a script)
  – RewriteRule (to change a URL)
  – SetEnv, SetEnvIf, SetEnvIfNoCase – to create or change the value of an environment variable
  – UnsetEnv
  – LogFormat (and related logging directives)
    • we will explore some of these variables and their usage later in the semester
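A brief sketch of a few of these directives working together follows. The variable names APP_MODE and block_client and the "BadBot" pattern are made up for illustration.

```apache
# Create a variable that CGI scripts invoked by Apache can read.
SetEnv APP_MODE production

# Set a variable only when the User-Agent header matches a pattern.
SetEnvIf User-Agent "BadBot" block_client

# Use the variable in an access rule...
<Directory /usr/local/apache2/htdocs>
    Order Allow,Deny
    Allow from All
    Deny from env=block_client
</Directory>

# ...and in logging: skip requests that set block_client.
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common env=!block_client
```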